Posts

Showing posts with the label Text Preprocessing
NLP by Vinod

A structured public journey from NLP fundamentals to real-world AI systems.

Vinod Codes is where I document my learning in AI, Machine Learning, Deep Learning, Natural Language Processing, Generative AI, and practical projects.

The main series here is NLP by Vinod — a learner-builder journey where I explain concepts with intuition, Python examples, mistakes, GitHub work, and honest implementation notes.

Start here: follow the Foundations Track first, then move into deep learning, transformers, projects, and real-world NLP systems.

Text Preprocessing in NLP - Cleaning Raw Text Before Feature Extraction

After data acquisition, I learned how raw text is cleaned, normalized, tokenized, and prepared before feature extraction. This post connects basic and advanced preprocessing from my notebooks into one clear sequence.
Labels: NLP, Text Cleaning, Tokenization, Stemming
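To make the clean, normalize, tokenize, stem sequence concrete, here is a minimal sketch that assumes only the Python standard library. The function names and the toy suffix stripper are illustrative stand-ins, not the code from the notebooks; a real pipeline would likely swap in a proper stemmer such as NLTK's PorterStemmer.

```python
import re
import unicodedata


def clean(text):
    """Remove markup and URLs that carry no linguistic signal."""
    text = re.sub(r"<[^>]+>", " ", text)       # drop HTML tags
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    return text


def normalize(text):
    """Unify Unicode forms, lowercase, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split into lowercase word tokens, keeping simple contractions."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text)


def toy_stem(token):
    """Toy suffix stripper (an assumption for illustration, not Porter)."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run the full sequence: clean -> normalize -> tokenize -> stem."""
    return [toy_stem(t) for t in tokenize(normalize(clean(text)))]


print(preprocess("<p>Cleaning  the TEXTS at https://example.com now!</p>"))
```

Keeping each stage as its own function makes it easy to inspect the intermediate text and to replace a single step without touching the rest.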

Data Acquisition for NLP - Collecting Text Before Preprocessing

Data acquisition is the first real step before text preprocessing. In this post, I am documenting how I collected data using web scraping, JSON files, SQL, APIs, CSV workflows, and basic EDA.
Labels: NLP, Data Acquisition, Web Scraping, APIs

Python Strings & Regex for NLP - The Real Foundation

Before tokenization, embeddings, transformers, or BERT, every NLP pipeline starts with raw text. This post is my practical breakdown of Python strings, Unicode, regex patterns, and text cleaning for NLP.
Labels: NLP, Python, Regex, Text Preprocessing
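As a small taste of the string, Unicode, and regex work that summary refers to, here is a hedged sketch using only the standard library `re` and `unicodedata` modules. The helper names and sample strings are my own illustrations, not taken from the post itself.

```python
import re
import unicodedata


def strip_accents(text):
    """Decompose characters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def collapse_whitespace(text):
    """Replace any run of whitespace with one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def find_hashtags(text):
    """Return hashtag bodies (without the leading '#')."""
    return re.findall(r"#(\w+)", text)


print(strip_accents("café naïve"))
print(collapse_whitespace("  raw \t text\n here "))
print(find_hashtags("learning #NLP and #regex_101"))
```

Whether stripping accents is appropriate depends on the language and task, so treat that helper as optional rather than a default cleaning step.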