NLP by Vinod

A structured public journey from NLP fundamentals to real-world AI systems.

Vinod Codes is where I document my learning in AI, Machine Learning, Deep Learning, Natural Language Processing, Generative AI, and practical projects.

The main series here is NLP by Vinod — a learner-builder journey where I explain concepts with intuition, Python examples, mistakes, GitHub work, and honest implementation notes.


PyTorch for Deep Learning - From Tensors to ANN Training

After NLP libraries, I moved into deep learning foundations and learned how PyTorch helps build neural networks using tensors, autograd, training loops, DataLoader, GPU training and hyperparameter tuning.

PyTorch for deep learning became the first practical step for me before going into RNNs and modern NLP architectures. Until now, I was mostly working with preprocessing, embeddings and NLP libraries. But to understand deep learning for NLP properly, I needed to understand how neural networks are actually built, trained, improved and debugged.

My rough understanding is this: PyTorch is one of the best libraries for learning and coding deep learning because it feels close to normal Python while still giving powerful tools for tensors, automatic differentiation, GPU training and neural network building. In these notebooks, I started from basics, moved to tensors, learned autograd, wrote a training pipeline, used Dataset and DataLoader, trained an ANN on GPU, noticed overfitting and then tried improvements and hyperparameter tuning.

This topic was important because I did not want deep learning to feel like a black box. I wanted to see how data becomes tensors, how the forward pass works, how the loss is calculated, how gradients are created, and how the optimizer updates the model parameters.

What clicked for me:
PyTorch is not only about writing a model. It teaches the full training workflow: tensor, forward pass, loss, backward pass, optimizer step and evaluation.

Figure: PyTorch connects tensors, autograd, neural network modules, optimizers, datasets, dataloaders and GPU training into one practical deep learning workflow.

01 Why PyTorch Matters Before RNNs

The next topic in my roadmap is RNN. But before studying sequence models, I need the base deep learning workflow. RNNs are also neural networks. They still use tensors, gradients, loss functions, optimizers and training loops. So learning PyTorch first makes the next topics less confusing.

What PyTorch Gives

  • tensor operations
  • automatic gradients
  • neural network modules
  • optimizers and loss functions
  • GPU acceleration
  • custom datasets and batching

Why It Helps NLP

  • embeddings become tensors
  • text batches need DataLoader
  • sequence models need training loops
  • transformers are built using the same ideas
  • experiments need tuning and debugging

My simple definition: PyTorch is a deep learning library that helps us create tensors, build neural networks, calculate gradients and train models in a flexible Pythonic way.

02 Tensors - The Basic Data Structure

The first serious concept I learned was tensors. A tensor is like a specialized multi-dimensional array designed for mathematical operations. In deep learning, almost everything becomes a tensor: input data, labels, weights, bias values, gradients and model outputs.

A scalar is a zero-dimensional tensor, a vector is one-dimensional, a matrix is two-dimensional, and images or batches can become higher-dimensional tensors.

Python
import torch

a = torch.zeros(2, 3)
b = torch.ones(2, 3)
c = torch.rand(2, 3)

print(a)
print(b.shape)
print(c.dtype)

In the tensor notebook, I practiced creating tensors using empty, zeros, ones, rand, manual_seed and tensor. I also practiced tensor shapes, data types, mathematical operations, reduction operations, comparison functions, in-place operations, copying, reshaping and NumPy conversion.
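A few of those operations can be sketched in one place. This is a minimal illustration, not the notebook code: the variable names and values are made up for the example.

```python
import numpy as np
import torch

torch.manual_seed(42)  # makes rand() reproducible

scalar = torch.tensor(5.0)               # 0-dimensional tensor
vector = torch.tensor([1.0, 2.0, 3.0])   # 1-dimensional tensor
matrix = torch.rand(2, 3)                # 2-dimensional tensor
print(scalar.ndim, vector.ndim, matrix.ndim)

# Reshape keeps the same six values but changes the layout
reshaped = matrix.reshape(3, 2)
print(reshaped.shape)

# Reduction: summing over dim=0 leaves one value per column
col_sums = matrix.sum(dim=0)
print(col_sums.shape)

# In-place operations end with an underscore and modify the tensor itself
vector.add_(1.0)
print(vector)

# NumPy conversion in both directions
arr = np.array([[1, 2], [3, 4]])
from_np = torch.from_numpy(arr)   # shares memory with arr
back_to_np = from_np.numpy()
print(from_np.shape, back_to_np.shape)
```

Printing `.shape` after every step like this is the habit that makes later shape errors in neural networks easier to trace.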

What clicked: tensors are the language of PyTorch. If I understand shapes and tensor operations, debugging neural networks becomes much easier.

03 CPU and GPU Tensors

Another important part was understanding device movement. PyTorch can run tensors and models on CPU or GPU, but the input tensor and the model parameters must be on the same device. If they are not, the forward pass fails with a RuntimeError about tensors being on different devices.

Python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.rand(3, 4)
x = x.to(device)

print(device)
print(x.device)

Practical mistake to avoid: do not keep the model on GPU and the input batch on CPU. Both should be moved to the same device before the forward pass.
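The same-device rule can be sketched with a model and a batch together. The tiny nn.Linear model and the batch shape here are placeholders chosen for the example, not something from the notebooks.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A tiny stand-in model: 4 input features, 2 outputs
model = nn.Linear(4, 2).to(device)

# A batch of 3 samples, created on CPU and then moved to the same device
batch = torch.rand(3, 4).to(device)

# Model and batch live on the same device, so the forward pass works
out = model(batch)

print(next(model.parameters()).device)
print(out.device, out.shape)
```

If the `.to(device)` call on the batch is left out while the model sits on a GPU, the forward pass is where the error appears.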

04 Autograd - How Gradients Are Calculated

Autograd was one of the most important concepts. Earlier, I learned derivatives manually. But in a neural network, there are many parameters and many operations. Manually writing every derivative is not practical.

PyTorch autograd tracks operations on tensors when requires_grad=True. During backward(), it calculates gradients using the computation graph.

Python
x = torch.tensor(3.0, requires_grad=True)

y = x ** 2
y.backward()

print(x.grad)  # tensor(6.), because dy/dx = 2x and x = 3

I also tested chained operations like square, sine and exponential. This helped me understand the computation graph: forward pass creates values, backward pass calculates how much each tracked variable affected the final output.

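What clicked: after backward(), PyTorch fills .grad only for leaf tensors created with requires_grad=True, such as weights and biases. Gradients of intermediate tensors are used during the backward pass but are not kept by default; calling retain_grad() on an intermediate tensor keeps them.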
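A chained example makes the computation graph easier to see. This sketch assumes the chain is square, then sine, then exponential, with x = 1.5 as an arbitrary input; it compares the autograd result with the chain rule written by hand.

```python
import math
import torch

x = torch.tensor(1.5, requires_grad=True)  # leaf tensor we want a gradient for

a = x ** 2          # intermediate (non-leaf) tensor
a.retain_grad()     # ask autograd to keep this intermediate gradient
b = torch.sin(a)
y = torch.exp(b)

y.backward()

# Chain rule by hand: dy/dx = exp(sin(x^2)) * cos(x^2) * 2x
x0 = 1.5
manual = math.exp(math.sin(x0 ** 2)) * math.cos(x0 ** 2) * 2 * x0

print(x.grad.item(), manual)   # the two values should agree
print(x.is_leaf, a.is_leaf)    # x is a leaf, a is not
print(a.grad)                  # available only because of retain_grad()
```

Without the retain_grad() call, a.grad would stay empty even though the gradient passed through a on its way back to x.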

05 A Training Pipeline from Scratch

After tensors and autograd, I moved to the training pipeline. In one notebook, I used a breast cancer dataset and first built the process manually. I converted NumPy arrays into PyTorch tensors, created weights and bias with requires_grad=True, wrote a forward function, calculated loss and updated parameters.

  • Data: features and labels
  • Tensor: convert and scale
  • Forward: predict output
  • Loss: measure error
  • Update: backward and optimizer

Python
class ANNModel:
    def __init__(self, X):
        self.W = torch.rand(X.shape[1], 1, requires_grad=True)
        self.b = torch.rand(1, requires_grad=True)

    def forward(self, X_train):
        z = torch.matmul(X_train, self.W) + self.b
        return torch.sigmoid(z)

This was helpful because I could see the basic logic of a neural network without hiding everything behind built-in classes.
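The loss and update steps of such a manual pipeline can be sketched as below. This is an illustration under assumptions: it uses synthetic data instead of the breast cancer dataset, binary cross-entropy as the loss, and a learning rate of 0.1 for 50 epochs, none of which are confirmed by the notebook.

```python
import torch

torch.manual_seed(0)

# Synthetic stand-in data: 100 samples, 5 features, binary labels
X = torch.randn(100, 5)
true_w = torch.tensor([[1.0], [-2.0], [0.5], [0.0], [3.0]])
y = (X @ true_w > 0).float()

W = torch.rand(5, 1, requires_grad=True)
b = torch.rand(1, requires_grad=True)
lr = 0.1  # assumed learning rate


def forward(X):
    return torch.sigmoid(X @ W + b)


def bce_loss(y_pred, y_true):
    # Clamp predictions so log() never receives exactly 0
    y_pred = torch.clamp(y_pred, 1e-7, 1 - 1e-7)
    return -(y_true * torch.log(y_pred)
             + (1 - y_true) * torch.log(1 - y_pred)).mean()


losses = []
for epoch in range(50):
    loss = bce_loss(forward(X), y)
    loss.backward()              # fills W.grad and b.grad

    with torch.no_grad():        # the update itself must not be tracked
        W -= lr * W.grad
        b -= lr * b.grad

    W.grad.zero_()               # gradients accumulate, so reset them
    b.grad.zero_()
    losses.append(loss.item())

print(losses[0], losses[-1])     # the loss should go down
```

The torch.no_grad() block and the zero_() calls are exactly what optimizer.step() and optimizer.zero_grad() take over once an optimizer is used.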

06 Moving to nn.Module

After writing the pipeline manually, I moved to torch.nn. This is where the code started looking like real deep learning code. I learned that nn.Module is the base class for neural network models. The model has an __init__ method for layers and a forward method for the forward pass.

Python
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, num_features):
        super().__init__()
        self.linear = nn.Linear(num_features, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, features):
        out = self.linear(features)
        out = self.sigmoid(out)
        return out

Then I added a hidden layer with ReLU. After that, I learned nn.Sequential, which makes simple feed-forward networks cleaner when layers are arranged one after another.
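A version with one hidden layer written through nn.Sequential might look like the sketch below. The class name SequentialModel, the hidden size of 3 and the input size of 5 are placeholders for illustration.

```python
import torch
import torch.nn as nn


class SequentialModel(nn.Module):
    def __init__(self, num_features, hidden_size=3):
        super().__init__()
        # Layers run in the order they are listed
        self.network = nn.Sequential(
            nn.Linear(num_features, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 1),
            nn.Sigmoid(),
        )

    def forward(self, features):
        return self.network(features)


model = SequentialModel(num_features=5)
features = torch.rand(10, 5)          # 10 samples, 5 features
predictions = model(features)

# Count trainable parameters: (5*3 + 3) + (3*1 + 1) = 22
num_params = sum(p.numel() for p in model.parameters())
print(predictions.shape, num_params)
```

The forward method shrinks to a single line because nn.Sequential already knows the order of the layers.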

My takeaway: manual code explains the logic, but nn.Module gives a clean structure for real model building.

07 Dataset and DataLoader

Before learning DataLoader, I was thinking of training as looping over the full dataset. But full batch training can be memory inefficient and slow. The notebook made mini-batch training clearer.

The problem with manually slicing data is that it does not give a standard interface for transformations, shuffling, sampling, batching or parallel loading. PyTorch solves this using Dataset and DataLoader.

Python
from torch.utils.data import Dataset, DataLoader

class CustomDataset(Dataset):
    def __init__(self, features, labels):
        self.features = features
        self.labels = labels

    def __len__(self):
        return len(self.features)

    def __getitem__(self, index):
        return self.features[index], self.labels[index]

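# features and labels are assumed to be tensors prepared earlier
dataset = CustomDataset(features, labels)
loader = DataLoader(dataset, batch_size=32, shuffle=True)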

Dataset

  • stores how data is accessed
  • defines length of data
  • returns one sample at a time
  • can include transformations

DataLoader

  • creates batches
  • handles shuffling
  • supports parallel loading
  • feeds batches into training loop

Figure: Dataset controls how samples are accessed, while DataLoader turns those samples into batches for the training loop.
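To see what the loader actually yields, here is a small self-contained sketch. The random features and labels are stand-ins; 100 samples with a batch size of 32 are chosen so the last batch is visibly smaller.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class CustomDataset(Dataset):
    def __init__(self, features, labels):
        self.features = features
        self.labels = labels

    def __len__(self):
        return len(self.features)

    def __getitem__(self, index):
        return self.features[index], self.labels[index]


# Stand-in data: 100 samples with 5 features and a binary label
features = torch.rand(100, 5)
labels = torch.randint(0, 2, (100,))

dataset = CustomDataset(features, labels)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

batch_sizes = []
for batch_features, batch_labels in loader:
    batch_sizes.append(batch_features.shape[0])

print(len(dataset), len(loader), batch_sizes)
```

With 100 samples the loader gives three batches of 32 and a final batch of 4. Passing drop_last=True to DataLoader would skip that smaller batch.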

08 Building an ANN on GPU

The main implementation part was building an Artificial Neural Network and training it on GPU. I used a Fashion-MNIST style dataset where each image becomes a flattened input vector and the model predicts one of ten classes.

The first model had an input layer, two hidden layers and an output layer. The hidden layers used ReLU, and the final layer produced logits for ten classes.

Python
class MyNN(nn.Module):
    def __init__(self, num_features):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(num_features, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 10)
        )

    def forward(self, x):
        return self.model(x)

For multiclass classification, I used CrossEntropyLoss. For optimization, I used SGD first. I moved the model and batches to the selected device.

Python
import torch.optim as optim

model = MyNN(X_train.shape[1]).to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

Important detail: with CrossEntropyLoss, I do not need to manually apply softmax inside the model. The loss expects raw logits and applies log-softmax internally.
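The reason softmax can be left out of the model is easy to check numerically. This sketch uses random logits and made-up targets to show that CrossEntropyLoss on raw logits matches log-softmax followed by negative log-likelihood loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

logits = torch.randn(4, 10)            # 4 samples, 10 classes, raw model outputs
targets = torch.tensor([0, 3, 9, 1])   # made-up class indices

# CrossEntropyLoss works directly on logits
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# The same value computed in two explicit steps
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(loss_ce.item(), loss_manual.item())

# Softmax is still useful after training, when probabilities are needed
probs = F.softmax(logits, dim=1)
predicted = probs.argmax(dim=1)
print(predicted)
```

Because softmax keeps the order of the values, taking argmax of the logits gives the same predicted class as argmax of the probabilities.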

09 Training Loop and Evaluation

The training loop is where all PyTorch pieces connect. Each epoch processes batches. For every batch, I send data to device, run forward pass, calculate loss, clear old gradients, run backward pass and update parameters using the optimizer.

Python
for epoch in range(epochs):
    model.train()

    for batch_features, batch_labels in train_loader:
        batch_features = batch_features.to(device)
        batch_labels = batch_labels.to(device)

        outputs = model(batch_features)
        loss = criterion(outputs, batch_labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

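The evaluation part uses model.eval() and torch.no_grad(). These do two different jobs: model.eval() puts the model in evaluation mode, and torch.no_grad() turns off gradient tracking, which saves memory and time because evaluation never calls backward().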

What clicked: training mode and evaluation mode are different. Some layers like dropout and batch normalization behave differently during training and testing.
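An evaluation loop that follows this pattern could look like the sketch below. The evaluate function, the stand-in nn.Linear model and the random test data are illustrative assumptions, not the notebook's code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, device):
    """Return classification accuracy of model over all batches in loader."""
    model.eval()                      # dropout and batch norm switch to eval behavior
    correct, total = 0, 0
    with torch.no_grad():             # no computation graph is built here
        for batch_features, batch_labels in loader:
            batch_features = batch_features.to(device)
            batch_labels = batch_labels.to(device)

            outputs = model(batch_features)
            predicted = outputs.argmax(dim=1)   # class with the highest logit

            correct += (predicted == batch_labels).sum().item()
            total += batch_labels.size(0)
    return correct / total


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(8, 10).to(device)   # untrained stand-in for a 10-class model
test_data = TensorDataset(torch.rand(50, 8), torch.randint(0, 10, (50,)))
test_loader = DataLoader(test_data, batch_size=16)

accuracy = evaluate(model, test_loader, device)
print(accuracy)
```

Running the same function on the training loader and the test loader gives the two numbers needed to compare train and test accuracy.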

10 Overfitting I Noticed

In the first ANN experiment, I saw that train accuracy became very high, around 99 percent, but test accuracy stayed around 82 percent. That was a clear sign of overfitting.

This was a useful mistake because it showed me that a model can memorize training data without generalizing well to unseen data.

Overfitting Signal

  • training accuracy is very high
  • test accuracy is much lower
  • model memorizes training patterns
  • generalization is weak

Possible Fixes

  • add more data
  • reduce model complexity
  • use dropout
  • use weight decay
  • try batch normalization
  • tune hyperparameters

My notebook lesson: high training accuracy alone is not success. The test result tells whether the model learned useful patterns or only memorized.

11 Improving the ANN

In the improved ANN notebook, I added BatchNorm1d, Dropout and weight_decay. This was my first proper attempt to reduce overfitting using regularization.

Python
class MyNN(nn.Module):
    def __init__(self, num_features):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(num_features, 128),
            nn.BatchNorm1d(128),
            nn.ReLU(),
            nn.Dropout(p=0.3),
            nn.Linear(128, 64),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.Dropout(p=0.3),
            nn.Linear(64, 10)
        )

    def forward(self, x):
        return self.model(x)

Even after improvement, the notebook showed that overfitting was still present. But that itself was a good learning point. Overfitting is not always fixed by adding one trick. It may require better architecture, better data, better search space and better tuning.

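My understanding: dropout randomly zeroes a fraction p of activations during training, so the network cannot depend too much on specific neurons. Weight decay is passed to the optimizer through its weight_decay argument and, with SGD, acts as L2 regularization that pulls weights toward zero.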

Figure: Overfitting forced me to think beyond accuracy and experiment with dropout, batch normalization, weight decay and hyperparameter tuning.
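Two of these regularization tools can be seen in isolation. The sketch below shows how Dropout behaves in training mode versus evaluation mode, and where weight decay is set. The weight_decay value of 1e-4 and the nn.Linear stand-in model are placeholders, not the values from the notebook.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

dropout = nn.Dropout(p=0.3)
x = torch.ones(1, 1000)

# Training mode: roughly 30% of values become 0, the rest are scaled by 1 / (1 - p)
dropout.train()
train_out = dropout(x)
zero_fraction = (train_out == 0).float().mean().item()
print(zero_fraction, train_out.max().item())

# Evaluation mode: dropout does nothing and the input passes through unchanged
dropout.eval()
eval_out = dropout(x)
print(torch.equal(eval_out, x))

# Weight decay is an optimizer argument, not a layer in the model
model = nn.Linear(20, 10)
optimizer = optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)
print(optimizer.param_groups[0]["weight_decay"])
```

This is also why forgetting model.eval() before testing can give noisy results: dropout would still be switching activations off.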

12 Hyperparameter Tuning with Optuna

The last notebook made one question very clear: why did I choose two hidden layers, 128 neurons, this learning rate, this batch size and this dropout value? Earlier, I was choosing these numbers manually. But there is no single fixed answer.

This is why hyperparameter tuning matters. I explored Grid Search, Random Search and Bayesian Search using Optuna. In my notebook, I used Optuna to tune values such as number of hidden layers, neurons per layer, epochs, optimizer, learning rate, batch size, dropout rate and weight decay.

Python
def objective(trial):
    num_hidden_layers = trial.suggest_int("num_hidden_layers", 1, 5)
    neurons = trial.suggest_int("neurons_per_layer", 8, 128, step=8)
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-1, log=True)
    dropout = trial.suggest_float("dropout_rate", 0.1, 0.5, step=0.1)
    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64])

    # build model, train it and return validation accuracy
    return validation_accuracy

What clicked: hyperparameter tuning turns guessing into a more systematic experiment.
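The basic loop behind any of these search methods is sample, train, validate, keep the best. The sketch below uses plain random search, which is a simpler technique swapped in for illustration and not what Optuna does internally. The synthetic data, the Adam optimizer, the 30 epochs and the helper names sample_params and train_and_validate are all assumptions made for the example.

```python
import random
import torch
import torch.nn as nn

random.seed(0)
torch.manual_seed(0)

# Synthetic stand-in data: 300 samples, 10 features, 2 classes
X = torch.randn(300, 10)
y = (X[:, 0] + X[:, 1] > 0).long()
X_train, y_train = X[:200], y[:200]
X_val, y_val = X[200:], y[200:]


def sample_params():
    """Draw one random configuration from the search space."""
    return {
        "neurons_per_layer": random.choice(range(8, 129, 8)),
        "learning_rate": 10 ** random.uniform(-5, -1),   # log-uniform
        "dropout_rate": random.choice([0.1, 0.2, 0.3, 0.4, 0.5]),
    }


def train_and_validate(params, epochs=30):
    """Train a small network with params and return validation accuracy."""
    model = nn.Sequential(
        nn.Linear(10, params["neurons_per_layer"]),
        nn.ReLU(),
        nn.Dropout(params["dropout_rate"]),
        nn.Linear(params["neurons_per_layer"], 2),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=params["learning_rate"])
    criterion = nn.CrossEntropyLoss()

    model.train()
    for _ in range(epochs):            # full-batch training keeps the sketch short
        optimizer.zero_grad()
        loss = criterion(model(X_train), y_train)
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
        preds = model(X_val).argmax(dim=1)
    return (preds == y_val).float().mean().item()


best_params, best_accuracy = None, -1.0
for trial in range(8):
    params = sample_params()
    accuracy = train_and_validate(params)
    if accuracy > best_accuracy:
        best_params, best_accuracy = params, accuracy

print(best_params, best_accuracy)
```

A Bayesian-style search replaces the purely random sample_params step with suggestions informed by earlier trials, which is the part a tuning library handles.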

13 My Final Understanding

My final understanding is that PyTorch is not only a library for writing neural networks. It is a full deep learning workflow. The model is only one part. The real system includes data loading, tensor shapes, device management, loss function, optimizer, autograd, training mode, evaluation mode and tuning.

  • Tensors are the base: data, weights, outputs and gradients all live as tensors in PyTorch.
  • Autograd handles gradients: it tracks the computation graph and calculates gradients during the backward pass.
  • DataLoader makes batching practical: it helps with batching, shuffling and feeding data into the training loop.
  • Overfitting needs experiments: dropout, batch normalization, weight decay and tuning are tools, not magic buttons.

14 GitHub Notebook Connection

This blog explains what I understood from my PyTorch and ANN notebooks. The implementation side is connected to the NLP by Vinod GitHub repository.

Notebook references: 00_basic.ipynb, 01_tensors.ipynb, 02_autograd.ipynb, 03_training_pipeline.ipynb, 04_dataset_dataloader.ipynb, 05_ANN_training_on_GPU.ipynb, 06_ANN_training_on_GPU_improved.ipynb, and 07_ANN_Hyperparameter_tuning.ipynb.

15 What Comes Next in the NLP Journey

The next topic is RNN. After learning PyTorch basics and ANN training, I am ready to move into sequence models, where order matters and text is processed step by step.

  • RNN basics: how recurrent networks process sequences one step at a time.
  • Text as sequence: how tokens or embeddings become ordered inputs for sequence models.
  • LSTM and GRU later: how improved recurrent models solve some limitations of simple RNNs.

PyTorch made deep learning feel like a complete workflow, not just a model.

This topic helped me understand tensors, autograd, Dataset, DataLoader, ANN training, GPU execution, overfitting and hyperparameter tuning before moving into RNNs.
