NLP by Vinod

A structured public journey from NLP fundamentals to real-world AI systems.

Vinod Codes is where I document my learning in AI, Machine Learning, Deep Learning, Natural Language Processing, Generative AI, and practical projects.

The main series here is NLP by Vinod — a learner-builder journey where I explain concepts with intuition, Python examples, mistakes, GitHub work, and honest implementation notes.


RNN for Sequence Modeling - Why Neural Networks Need Memory


After PyTorch and ANN training, I moved to Recurrent Neural Networks to understand why normal ANN and CNN models are not enough when the input is ordered sequential data like text.


RNN in deep learning became the next topic in my NLP journey because text is not just a collection of independent words. Text has order. The meaning of a word depends on the words before it, and sometimes even the full sentence context matters. This is where I started seeing the limitation of using only ANN or CNN style thinking for sequential data.

My rough understanding is this: an ANN works well for tabular data, and CNNs are strong for grid-like data such as images. But when the data is sequential, like text, speech or time series such as stock prices, the model needs some way to remember previous steps. A Recurrent Neural Network does this by using a hidden state that is passed from one time step to the next.

In the notebooks, I first tried to understand the RNN forward pass from scratch using NumPy. Then I moved to PyTorch and built small RNN examples for sentiment analysis, machine translation style sequence processing and a simple question-answering system.

What clicked for me:
ANN looks at fixed features. RNN reads a sequence step by step and carries memory through hidden states.
Figure: RNNs process sequential data one step at a time, carrying information forward through hidden states.

01 Why I Needed RNN After ANN and CNN

While learning neural networks, I understood that different data types need different model thinking. Tabular data can often go into ANN. Images are grid-like, so CNNs make sense. But text is ordered. If I change the order of words, the meaning can change.
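
A quick way to see the problem is to compare two sentences that use the same words in a different order. This is a minimal sketch in plain Python; the example sentences are illustrative and are not taken from the notebooks.

```python
from collections import Counter

a = "dog bites man".split()
b = "man bites dog".split()

# A bag-of-words view only counts words, so both sentences look identical.
same_bag = Counter(a) == Counter(b)

# A sequence view keeps positions, so the sentences are different.
same_sequence = a == b

print(same_bag)       # True
print(same_sequence)  # False
```

A model that only sees word counts cannot tell these two sentences apart, which is the gap a sequence model is meant to close.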

Data Type   | Common Model     | Why
Tabular     | ANN              | works with fixed feature columns
Images      | CNN              | captures local spatial patterns
Text        | RNN              | uses word order and previous context
Speech      | RNN-style models | signal changes over time
Time series | Sequence models  | past values influence future values

This comparison helped me understand the real reason behind RNN. It is not just another neural network architecture. It is made for inputs where order matters.

My simple definition: RNN is a neural network for sequential data where the model uses a hidden state to remember information from previous time steps.

02 The Main Idea of Hidden State

The hidden state is the most important idea in a simple RNN. At each time step, the model takes the current input and the previous hidden state. Then it creates a new hidden state.

In a sentence like I love NLP, the model does not read all words as unrelated features. It reads I, then carries some information to love, then carries updated information to NLP.

Token 1: I
Hidden 1: memory after first word
Token 2: love
Hidden 2: updated memory
Prediction: sentiment or class

This flow made RNN easier for me. The same cell is reused at every time step, and the hidden state becomes the memory of what has been seen so far.

What clicked: the hidden state is not the final answer. It is the running summary of the sequence so far.
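
Written as an equation, the update described in this section is usually expressed as below. The symbol names are chosen to match the NumPy code in the next section (W_i for the input weights, W_hh for the hidden weights, b_h for the hidden bias), and the tanh activation and zero initial state are the ones used there.

```latex
h_t = \tanh\left(W_i \, x_t + W_{hh} \, h_{t-1} + b_h\right), \qquad h_0 = 0
```

Here x_t is the vector for the current token and h_{t-1} is the hidden state carried over from the previous step.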

03 RNN Forward Pass from Scratch

In the first notebook, I created a very small sentence with word vectors: I, love and NLP. Then I created input weights, hidden weights, output weights and biases manually using NumPy.

The forward pass became clearer when I wrote it in small steps. First, I initialized the hidden state. Then for every word, I calculated a new hidden state using the current input vector and the previous hidden state. After the final word, I used the last hidden state to make a prediction.

Python
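import numpy as np

# Toy 2-dimensional word vectors
word_vectors = {
    "I": np.array([0.1, 0.2]),
    "love": np.array([0.5, 0.2]),
    "NLP": np.array([0.3, 0.7])
}

sentence = ["I", "love", "NLP"]

# Weights and biases: random placeholder values (input size 2, hidden size 2, output size 1)
rng = np.random.default_rng(0)
W_i = rng.normal(size=(2, 2))    # input -> hidden
W_hh = rng.normal(size=(2, 2))   # hidden -> hidden
b_h = np.zeros(2)                # hidden bias
W_hy = rng.normal(size=(1, 2))   # hidden -> output
b_y = np.zeros(1)                # output bias

h_prev = np.zeros(2)             # initial hidden state

for word in sentence:
    x_t = word_vectors[word]
    # New hidden state from the current input and the previous hidden state
    h_t = np.tanh(
        np.dot(W_i, x_t) +
        np.dot(W_hh, h_prev) +
        b_h
    )
    h_prev = h_t

# Prediction from the last hidden state
y = np.dot(W_hy, h_prev) + b_y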

This small example was important because it showed the mathematical heart of RNN without hiding everything inside PyTorch.

Notebook lesson: when learning RNN, the shapes are easy to confuse. Input vector size, hidden size and output size must match the matrix operations.
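
To make the shape lesson concrete, here is a small sketch that uses deliberately different sizes for input, hidden and output so that each matrix shape is easy to tell apart. The sizes and the names W_hy, b_y and bad_W_i are illustrative assumptions, not values from the notebook.

```python
import numpy as np

input_size, hidden_size, output_size = 2, 3, 1  # illustrative sizes

rng = np.random.default_rng(0)
W_i = rng.normal(size=(hidden_size, input_size))    # input -> hidden
W_hh = rng.normal(size=(hidden_size, hidden_size))  # hidden -> hidden
b_h = np.zeros(hidden_size)
W_hy = rng.normal(size=(output_size, hidden_size))  # hidden -> output
b_y = np.zeros(output_size)

x_t = np.array([0.5, 0.2])      # one word vector, shape (input_size,)
h_prev = np.zeros(hidden_size)  # previous hidden state, shape (hidden_size,)

h_t = np.tanh(W_i @ x_t + W_hh @ h_prev + b_h)
y_t = W_hy @ h_t + b_y

print(h_t.shape)  # (3,)
print(y_t.shape)  # (1,)

# A weight matrix with the wrong number of columns fails immediately.
bad_W_i = rng.normal(size=(hidden_size, input_size + 1))
try:
    bad_W_i @ x_t
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

print(mismatch_detected)  # True
```

The rule of thumb is that each weight matrix has shape (size it produces, size it consumes).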

04 Backpropagation Through Time

After the forward pass, I also touched on Backpropagation Through Time (BPTT). This was difficult because the same RNN weights are used repeatedly across time steps. So during training, the model has to calculate how each time step contributed to the final loss.

My understanding is that BPTT is normal backpropagation applied across the unfolded sequence. If a sentence has three tokens, the RNN can be imagined as three repeated cells sharing the same weights.

Forward Pass

  • read token by token
  • update hidden state
  • create final prediction
  • calculate loss

Backward Pass

  • start from the loss
  • move through time steps
  • calculate gradients
  • update shared weights
What I understood: RNN training is harder than simple ANN training because the model has repeated computation across time.
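
The forward and backward steps listed above can be sketched with PyTorch autograd doing the backward pass. This is a minimal illustration, assuming a single nn.RNNCell unrolled by hand over three random input vectors and a mean squared error loss; none of these choices come from the notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

cell = nn.RNNCell(input_size=2, hidden_size=4)  # one cell, reused at every step
readout = nn.Linear(4, 1)

# Three time steps for a batch of one sequence (random placeholder inputs).
inputs = torch.randn(3, 1, 2)
target = torch.tensor([[1.0]])

# Forward pass: unroll the same cell over time.
h = torch.zeros(1, 4)
for x_t in inputs:
    h = cell(x_t, h)

loss = nn.functional.mse_loss(readout(h), target)

# Backward pass: autograd walks back through all three steps and sums
# each step's contribution into the gradient of the shared weights.
loss.backward()

print(cell.weight_hh.grad.shape)  # torch.Size([4, 4])
print(cell.weight_ih.grad.shape)  # torch.Size([4, 2])
```

Even though the cell ran three times, there is only one gradient per weight matrix, because the three "copies" in the unfolded picture are the same parameters.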

05 RNN for Sentiment Analysis

After the from-scratch part, I moved to a PyTorch RNN sentiment analysis example. I used small sentences such as i love it, i hate it, so good, very bad, awesome movie and worst ever.

The workflow was similar to what I learned earlier in NLP: tokenize text, build vocabulary, convert words to indices, pad sequences, create tensors, build an embedding layer, use RNN and train with a loss function.
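
The first half of that workflow (tokenize, build vocabulary, convert to indices, pad) can be sketched in plain Python using the sentences mentioned above. The labels and the special tokens <pad> and <unk> are assumptions made for this illustration.

```python
sentences = ["i love it", "i hate it", "so good", "very bad", "awesome movie", "worst ever"]
labels = [1, 0, 1, 0, 1, 0]  # assumed labels: 1 = positive, 0 = negative

# 1. Tokenize
tokenized = [s.split() for s in sentences]

# 2. Build vocabulary; index 0 is reserved for padding, 1 for unknown words
vocab = {"<pad>": 0, "<unk>": 1}
for tokens in tokenized:
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)

# 3. Convert words to indices
indexed = [[vocab[tok] for tok in tokens] for tokens in tokenized]

# 4. Pad every sequence to the longest length
max_len = max(len(seq) for seq in indexed)
padded = [seq + [vocab["<pad>"]] * (max_len - len(seq)) for seq in indexed]

print(padded[0])  # [2, 3, 4]
print(padded[2])  # [6, 7, 0]
```

Because every row now has the same length, the padded list can be turned into a single tensor with torch.tensor(padded) and fed to an embedding layer.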

Python
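import torch
import torch.nn as nn

class RNNModel(nn.Module):
    def __init__(self, vocab_size):
        super().__init__()
        # Word index -> 8-dimensional embedding vector
        self.embedding = nn.Embedding(vocab_size, embedding_dim=8)
        # Input size 8, hidden size 16; batch_first expects (batch, seq_len, features)
        self.rnn = nn.RNN(8, 16, batch_first=True)
        # Final hidden state -> one logit for binary sentiment
        self.fc = nn.Linear(16, 1)

    def forward(self, x):
        x = self.embedding(x)               # (batch, seq_len, 8)
        output, hidden = self.rnn(x)        # hidden: (1, batch, 16)
        final_hidden = hidden.squeeze(0)    # (batch, 16)
        logits = self.fc(final_hidden)      # (batch, 1)
        return logits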

This example helped me connect embeddings with RNN. The RNN does not directly understand raw words. Words first become indices, then embeddings, and then those embedding vectors are processed step by step.

Figure: In sentiment analysis, words are converted into embeddings and then passed through RNN hidden states before the final prediction.
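
Tracing the tensor shapes through each layer makes this flow easier to follow. The sketch below uses the same layer sizes as the model above (embedding size 8, hidden size 16); the vocabulary size, the two indexed sentences and the use of padding_idx are assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab_size = 14  # assumed size, including padding and unknown tokens
batch = torch.tensor([[2, 3, 4],    # a 3-token sentence
                      [6, 7, 0]])   # a 2-token sentence padded with index 0

embedding = nn.Embedding(vocab_size, 8, padding_idx=0)
rnn = nn.RNN(8, 16, batch_first=True)
fc = nn.Linear(16, 1)

embedded = embedding(batch)        # (2, 3, 8): one vector per token
output, hidden = rnn(embedded)     # output: (2, 3, 16), hidden: (1, 2, 16)
logits = fc(hidden.squeeze(0))     # (2, 1): one score per sentence
probs = torch.sigmoid(logits)      # between 0 and 1 (untrained, so not meaningful)

print(embedded.shape, output.shape, hidden.shape, logits.shape)
```

A sigmoid over a single logit is one common way to read the output as a positive-sentiment probability; it pairs naturally with a binary cross entropy loss during training.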

06 RNN for Machine Translation Style Sequence Processing

I also tried a small machine translation style example using dummy English and French sentence pairs. The idea was not to build a real translator. The goal was to see how one sequence can be mapped to another sequence.

I built separate vocabularies for English and French, converted both sides into indices, padded them, passed the English sentence through an RNN and predicted output tokens.

Input Side

  • English sentence
  • tokenization
  • English vocabulary
  • integer sequence
  • embedding vectors

Output Side

  • French sentence
  • French vocabulary
  • target tokens
  • cross entropy loss
  • sequence prediction
Important limitation: this was a tiny learning example. Real machine translation needs better architecture, more data, teacher forcing, attention and later transformer-based models.
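
One simple way to wire the input and output sides together is to predict a French token from the RNN hidden state at every source position and to ignore padded positions in the loss. This is only a sketch under that assumption; the class name TinyTranslator, the vocabulary sizes and the dummy index tensors are hypothetical, and the notebook's exact setup may differ.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

PAD = 0
en_vocab_size, fr_vocab_size = 10, 12  # assumed toy vocabulary sizes

# Two dummy sentence pairs, already indexed and padded to length 4.
src = torch.tensor([[1, 2, 3, PAD], [4, 5, PAD, PAD]])  # English indices
tgt = torch.tensor([[6, 7, 8, PAD], [9, 3, PAD, PAD]])  # French indices


class TinyTranslator(nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(en_vocab_size, 8, padding_idx=PAD)
        self.rnn = nn.RNN(8, 16, batch_first=True)
        self.fc = nn.Linear(16, fr_vocab_size)

    def forward(self, x):
        output, _ = self.rnn(self.embedding(x))  # (batch, seq_len, 16)
        return self.fc(output)                   # (batch, seq_len, fr_vocab_size)


model = TinyTranslator()
criterion = nn.CrossEntropyLoss(ignore_index=PAD)  # padded positions do not count

logits = model(src)
# CrossEntropyLoss expects (N, classes) and (N,), so flatten batch and time.
loss = criterion(logits.reshape(-1, fr_vocab_size), tgt.reshape(-1))
predicted = logits.argmax(dim=-1)  # one French index per source position

print(logits.shape, loss.item())
```

This position-by-position mapping forces source and target to have the same length, which is one reason real translation systems use an encoder-decoder design instead.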

07 Building a Simple RNN-Based QA System

The second notebook was a small question-answering system using an RNN. I loaded a CSV dataset of question-answer pairs, tokenized both questions and answers, built vocabulary, converted text into numerical indices and created a custom PyTorch dataset.

This connected many earlier topics together: text preprocessing, vocabulary, indexing, Dataset, DataLoader, embeddings and sequence modeling.
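
The preprocessing steps can be sketched as three small functions. The function name text_to_indices matches the helper used in the dataset class below, but the bodies here, the tokenize and build_vocab helpers, the <UNK> token and the example row are assumptions for illustration rather than the notebook's actual code.

```python
def tokenize(text):
    # Lowercase, drop question marks and split on whitespace (deliberately simple)
    return text.lower().replace("?", "").split()


def build_vocab(pairs):
    # Index 0 is reserved for words never seen during vocabulary building
    vocab = {"<UNK>": 0}
    for question, answer in pairs:
        for token in tokenize(question) + tokenize(answer):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def text_to_indices(text, vocab):
    # Unknown words fall back to the <UNK> index
    return [vocab.get(token, vocab["<UNK>"]) for token in tokenize(text)]


pairs = [("What is the capital of France?", "Paris")]  # made-up example row
vocab = build_vocab(pairs)

print(text_to_indices("What is the capital of France?", vocab))  # [1, 2, 3, 4, 5, 6]
print(text_to_indices("capital of Spain", vocab))                # [4, 5, 0]
```

Questions and answers share one vocabulary here, so the model's output layer can predict any word it has seen on either side.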

Python
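import torch
from torch.utils.data import Dataset

class QADataset(Dataset):
    def __init__(self, df, vocab):
        self.df = df          # DataFrame with "question" and "answer" columns
        self.vocab = vocab    # word -> index mapping

    def __len__(self):
        return self.df.shape[0]

    def __getitem__(self, index):
        # text_to_indices is a helper defined elsewhere in the notebook;
        # it maps a text string to a list of vocabulary indices
        question = text_to_indices(self.df.iloc[index]["question"], self.vocab)
        answer = text_to_indices(self.df.iloc[index]["answer"], self.vocab)
        return torch.tensor(question), torch.tensor(answer)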

Then I created a simple RNN model with an embedding layer, an RNN layer and a linear layer that predicts a vocabulary word from the final hidden state.

Python
import torch.nn as nn

class SimpleRNN(nn.Module):
    def __init__(self, vocab_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim=50)
        self.rnn = nn.RNN(50, 64, batch_first=True)
        self.fc = nn.Linear(64, vocab_size)

    def forward(self, question):
        embedded_question = self.embedding(question)   # (batch, seq_len, 50)
        # nn.RNN returns the hidden state at every step and the final hidden state
        all_hidden, final_hidden = self.rnn(embedded_question)
        output = self.fc(final_hidden.squeeze(0))      # (batch, vocab_size)
        return output
What clicked: the RNN reads the question sequence and the final hidden state is used as a summary to predict an answer token.

08 Training Loop for the QA Model

The training loop followed the same PyTorch pattern I learned in the previous post: clear gradients, run forward pass, calculate loss, run backward pass and update parameters.

Python
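# Assumes model, dataloader, criterion, optimizer and epochs are already defined,
# and that the DataLoader uses batch_size=1 with single-token answers.
for epoch in range(epochs):
    total_loss = 0

    for question, answer in dataloader:
        optimizer.zero_grad()                # clear old gradients

        output = model(question)             # (1, vocab_size)
        loss = criterion(output, answer[0])  # answer[0]: target index, shape (1,)

        loss.backward()                      # backpropagation through time
        optimizer.step()                     # update parameters

        total_loss += loss.item()

    print(f"Epoch {epoch + 1}, loss: {total_loss / len(dataloader):.4f}")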
I also wrote a prediction function that converts a new question into indices, sends it to the model, applies softmax and returns the word with the highest probability.

Notebook lesson: this QA system is simple, but it helped me understand the end-to-end flow from raw question text to indexed sequence to RNN output.
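
The prediction step described above (question to indices, model, softmax, highest-probability word) might look like the sketch below. The toy vocabulary, the TinyQAModel class and the predict function are hypothetical stand-ins; the model is untrained, so the returned word is arbitrary and only the flow matters.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab = {"<UNK>": 0, "what": 1, "is": 2, "nlp": 3, "language": 4}  # toy vocabulary
index_to_word = {i: w for w, i in vocab.items()}


class TinyQAModel(nn.Module):
    def __init__(self, vocab_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, 50)
        self.rnn = nn.RNN(50, 64, batch_first=True)
        self.fc = nn.Linear(64, vocab_size)

    def forward(self, question):
        _, final_hidden = self.rnn(self.embedding(question))
        return self.fc(final_hidden.squeeze(0))


def predict(model, question, vocab):
    tokens = question.lower().replace("?", "").split()
    indices = [vocab.get(t, vocab["<UNK>"]) for t in tokens]
    batch = torch.tensor(indices).unsqueeze(0)  # add batch dimension: (1, seq_len)

    model.eval()
    with torch.no_grad():                       # no gradients needed for inference
        logits = model(batch)                   # (1, vocab_size)
        probs = torch.softmax(logits, dim=1)
        confidence, index = torch.max(probs, dim=1)

    return index_to_word[index.item()], confidence.item()


model = TinyQAModel(len(vocab))  # untrained, so the answer is arbitrary
word, confidence = predict(model, "What is NLP?", vocab)
print(word, round(confidence, 3))
```

Returning the probability alongside the word makes it possible to reject low-confidence answers later, for example by comparing it against a threshold.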

09 Mistakes and Confusions I Noticed

RNN looked simple conceptually, but implementation had several places where I could get confused. Most confusion came from shapes and from understanding what the RNN returns.

Confusions

  • difference between output and final hidden state
  • batch dimension versus sequence dimension
  • why padding is needed for unequal sentence lengths
  • why final hidden state is used for classification
  • why raw words cannot go directly into RNN

Better Thinking

  • text first becomes tokens
  • tokens become indices
  • indices become embeddings
  • RNN creates hidden states
  • final layer makes prediction
Figure: The hardest part of RNN implementation was not the syntax, but understanding tensor shapes, hidden states, padding and sequence dimensions.
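
The first confusion in the list, output versus final hidden state, can be checked directly. This sketch assumes a single-layer, one-directional nn.RNN with batch_first=True and random input data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(4, 5, 8)  # (batch=4, seq_len=5, features=8), random placeholder data

output, hidden = rnn(x)

print(output.shape)  # torch.Size([4, 5, 16]): hidden state at every time step
print(hidden.shape)  # torch.Size([1, 4, 16]): hidden state after the last step only

# For a single-layer, one-directional RNN the last slice of output equals hidden.
last_step_matches = torch.allclose(output[:, -1, :], hidden[0])
print(last_step_matches)  # True
```

One caveat: with padded batches the last time step of a short sentence is a padding position, so the final hidden state has also "read" the padding. Packing sequences with torch.nn.utils.rnn.pack_padded_sequence is the usual way to avoid that.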

10 Limitations of Simple RNN

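After building simple RNN examples, I also understood why RNN is not the final answer for sequence modeling. A basic RNN can struggle with long sequences because the hidden state is overwritten at every step, so information from earlier time steps fades as the sequence grows, and the gradients that should teach the model about those early steps shrink in the same way.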

This connects directly to the next topic: LSTM. LSTM was designed to handle the memory problem better by using gates. So RNN is the correct foundation, but LSTM explains why simple recurrence was not enough.

Simple RNN Problems

  • struggles with long-term dependencies
  • can suffer from vanishing gradients
  • memory becomes weak over long sequences
  • not ideal for complex language tasks

Why LSTM Comes Next

  • uses gates for memory control
  • keeps important information longer
  • handles longer context better
  • improves sequence modeling
My final caution: RNN helped me understand sequence modeling, but for stronger NLP systems, I need to move forward to LSTM, GRU and then transformers.
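
The vanishing gradient problem can be seen with a scalar toy RNN, where the gradient of the last hidden state with respect to the first is just a product of one factor per time step. The weight w = 0.5, the constant input x = 1.0 and the function name are illustrative assumptions.

```python
import math


def gradient_through_time(steps, w=0.5, x=1.0):
    """Return d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x)."""
    h = 0.0
    grad = 1.0
    for _ in range(steps):
        h = math.tanh(w * h + x)
        # Chain rule: each step multiplies the gradient by w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
    return grad


for steps in (1, 5, 20, 50):
    print(steps, gradient_through_time(steps))
```

Every factor here is smaller than 0.5, so the product shrinks exponentially with sequence length. That is why the earliest tokens of a long sequence have almost no influence on the weight updates.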

11 My Final Understanding

My final understanding is that RNNs are important because they build sequence order into the network itself. Instead of treating words as independent features, an RNN reads them one by one and keeps a hidden state as memory.

  • RNN is for ordered data: text, speech and time series need models that respect sequence order.
  • Hidden state acts like memory: the model updates its memory at each time step while reading the sequence.
  • RNN still uses PyTorch basics: embedding, loss, optimizer, backward pass and DataLoader are still part of the workflow.
  • Simple RNN has limits: long context is difficult, so LSTM becomes the next natural topic.

12 GitHub Notebook Connection

This blog explains what I understood from my RNN notebooks. The implementation side is connected to the NLP by Vinod GitHub repository.


NLP by Vinod GitHub Repository

Notebook references: 01_RNN_from_scratch.ipynb and 02-rnn-based-qa-system.ipynb.


13 What Comes Next in the NLP Journey

The next topic is LSTM. After understanding simple recurrence and hidden states, I now want to learn how LSTM improves memory using gates.

  • Vanishing gradients: why simple RNN struggles when sequences become long.
  • LSTM gates: how forget, input and output gates control memory.
  • Better sequence learning: how LSTM handles longer context better than a simple RNN.


RNN made sequence modeling feel practical for the first time.

This topic helped me understand why normal neural networks struggle with ordered text and how RNNs use hidden states to process sequences step by step.
