NLP by Vinod - Deep Learning

PyTorch Foundations

PyTorch for Deep Learning - From Tensors to ANN Training.

After NLP libraries, I moved into deep learning foundations and learned how PyTorch helps build neural networks using tensors, autograd, training loops, DataLoader, GPU training and hyperparameter tuning.

PyTorch Tensors Autograd ANN

PyTorch for deep learning became the first practical step for me before going into RNNs and modern NLP architectures. Until now, I was mostly working with preprocessing, embeddings and NLP libraries. But to understand deep learning for NLP properly, I needed to understand how neural networks are actually built, trained, improved and debugged.

My rough understanding is this: PyTorch is one of the best libraries for learning and coding deep learning because it feels close to normal Python while still giving powerful tools for tensors, automatic differentiation, GPU training and neural network building. In these notebooks, I started from basics, moved to tensors, learned autograd, wrote a training pipeline, used Dataset and DataLoader, trained an ANN on GPU, noticed overfitting and then tried improvements and hyperparameter tuning.

This topic was important because I did not want deep learning to feel like a black box. I wanted to see how data becomes tensors, how the forward pass works, how the loss is calculated, how gradients are created, and how the optimizer updates the model parameters.

What clicked for me:
PyTorch is not only about writing a model. It teaches the full training workflow: tensor, forward pass, loss, backward pass, optimizer step and evaluation.

Figure: PyTorch deep learning workflow showing tensors, autograd, neural network training and GPU execution. PyTorch connects tensors, autograd, neural network modules, optimizers, datasets, dataloaders and GPU training into one practical deep learning workflow.

01 Why PyTorch Matters Before RNNs

The next topic in my roadmap is RNN. But before studying sequence models, I need the base deep learning workflow. RNNs are also neural networks. They still use tensors, gradients, loss functions, optimizers and training loops. So learning PyTorch first makes the next topics less confusing.

What PyTorch Gives

tensor operations
automatic gradients
neural network modules
optimizers and loss functions
GPU acceleration
custom datasets and batching

Why It Helps NLP

embeddings become tensors
text batches need DataLoader
sequence models need training loops
transformers are built using the same ideas
experiments need tuning and debugging

My simple definition: PyTorch is a deep learning library that helps us create tensors, build neural networks, calculate gradients and train models in a flexible Pythonic way.

02 Tensors - The Basic Data Structure

The first serious concept I learned was tensors. A tensor is like a specialized multi-dimensional array designed for mathematical operations. In deep learning, almost everything becomes a tensor: input data, labels, weights, bias values, gradients and model outputs.

A scalar is a zero-dimensional tensor, a vector is one-dimensional, a matrix is two-dimensional, and images or batches can become higher-dimensional tensors.

          
        
Python

import torch

a = torch.zeros(2, 3)
b = torch.ones(2, 3)
c = torch.rand(2, 3)

print(a)
print(b.shape)
print(c.dtype)

In the tensor notebook, I practiced creating tensors using empty, zeros, ones, rand, manual_seed and tensor. I also practiced tensor shapes, data types, mathematical operations, reduction operations, comparison functions, in-place operations, copying, reshaping and NumPy conversion.
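A few of these operations can be sketched in one place. This is a minimal illustration with made-up values, not the notebook's own code; the variable names are chosen only for the example.

```python
import numpy as np
import torch

torch.manual_seed(0)  # make the random tensor reproducible

x = torch.arange(6, dtype=torch.float32)  # shape (6,), values 0..5
m = x.reshape(2, 3)                       # same values arranged as 2 rows, 3 columns

row_sums = m.sum(dim=1)    # reduction across columns: one value per row
col_means = m.mean(dim=0)  # reduction across rows: one value per column

r = torch.rand(2, 3)
mask = r > 0.5             # comparison returns a boolean tensor

c = m.clone()              # independent copy of m
c.add_(1)                  # trailing underscore means in-place; m is unchanged

arr = m.numpy()            # NumPy array sharing memory with the CPU tensor
back = torch.from_numpy(np.ones((2, 3), dtype=np.float32))  # NumPy to tensor

print(m.shape, m.dtype)
print(row_sums, col_means)
```

Printing the shape and dtype after each step is a cheap habit that catches most shape bugs early.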

What clicked: tensors are the language of PyTorch. If I understand shapes and tensor operations, debugging neural networks becomes much easier.

03 CPU and GPU Tensors

Another important part was understanding device movement. PyTorch can run tensors and models on CPU or GPU, but the model and the tensors it receives must be on the same device. Otherwise the forward pass fails with a RuntimeError about tensors being on different devices.

          
        
Python

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.rand(3, 4)
x = x.to(device)

print(device)
print(x.device)

Practical mistake to avoid: do not keep the model on GPU and the input batch on CPU. Both should be moved to the same device before the forward pass.
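A minimal sketch of that rule, assuming a single nn.Linear layer as a stand-in model (the layer sizes are made up for illustration):

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)   # parameters now live on `device`
batch = torch.rand(3, 4).to(device)  # input batch moved to the same device

out = model(batch)                   # works because both sides match
print(out.shape, out.device)
```

The same two `.to(device)` calls work unchanged on a machine without a GPU, because `device` falls back to CPU.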

04 Autograd - How Gradients Are Calculated

Autograd was one of the most important concepts. Earlier, I learned derivatives manually. But in a neural network, there are many parameters and many operations. Manually writing every derivative is not practical.

PyTorch autograd tracks operations on tensors when requires_grad=True. During backward(), it calculates gradients using the computation graph.

          
        
Python

x = torch.tensor(3.0, requires_grad=True)

y = x ** 2
y.backward()

print(x.grad)

I also tested chained operations like square, sine and exponential. This helped me understand the computation graph: forward pass creates values, backward pass calculates how much each tracked variable affected the final output.
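A chain like that can be checked against the chain rule by hand. The sketch below assumes the order square, then sine, then exponential, and an input of 1.5 picked only for the example.

```python
import math
import torch

x = torch.tensor(1.5, requires_grad=True)

y = x ** 2        # square
z = torch.sin(y)  # sine
w = torch.exp(z)  # exponential

w.backward()      # walks the graph backwards and fills x.grad with dw/dx

# Chain rule by hand: dw/dx = exp(sin(x^2)) * cos(x^2) * 2x
manual = math.exp(math.sin(1.5 ** 2)) * math.cos(1.5 ** 2) * 2 * 1.5

print(x.grad.item(), manual)
```

The two printed numbers should agree up to float32 rounding, which is a useful sanity check that autograd is doing the same calculus I learned manually.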

What clicked: after backward(), .grad is filled only for leaf tensors created with requires_grad=True, such as trainable parameters. Intermediate tensors do not keep their gradients by default unless I call retain_grad() on them.
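To see the difference between a leaf tensor and an intermediate tensor, here is a small sketch with made-up values. It uses retain_grad() to opt in to keeping the gradient of the intermediate result.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)  # leaf tensor, created directly
y = x ** 2                                 # intermediate (non-leaf) tensor
y.retain_grad()                            # opt in to keeping y.grad
z = 2 * y

z.backward()

print(x.is_leaf, y.is_leaf)
print(x.grad)  # dz/dx = 4x = 12
print(y.grad)  # dz/dy = 2
```

Without the retain_grad() call, y.grad would stay empty after backward(), while x.grad would still be filled.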

05 A Training Pipeline from Scratch

After tensors and autograd, I moved to the training pipeline. In one notebook, I used a breast cancer dataset and first built the process manually. I converted NumPy arrays into PyTorch tensors, created weights and bias with requires_grad=True, wrote a forward function, calculated loss and updated parameters.

Data: features and labels

Tensor: convert and scale

Forward: predict output

Loss: measure error

Update: backward and optimizer

          
        
Python

class ANNModel:
    def __init__(self, X):
        self.W = torch.rand(X.shape[1], 1, requires_grad=True)
        self.b = torch.rand(1, requires_grad=True)

    def forward(self, X_train):
        z = torch.matmul(X_train, self.W) + self.b
        return torch.sigmoid(z)

This was helpful because I could see the basic logic of a neural network without hiding everything behind built-in classes.
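The loss and update steps that go with this forward function can be sketched as follows. This is an illustration under assumptions, not the notebook's code: it uses synthetic data instead of the breast cancer dataset, zero initialization instead of torch.rand, and a learning rate and epoch count picked for the example.

```python
import torch

torch.manual_seed(0)

# Synthetic stand-in for scaled features and 0/1 labels
X = torch.randn(200, 5)
true_w = torch.tensor([[1.5], [-2.0], [0.5], [0.0], [1.0]])
y = (X @ true_w > 0).float()  # shape (200, 1)

W = torch.zeros(5, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 0.1

def forward(X):
    return torch.sigmoid(X @ W + b)

def bce(y_pred, y_true, eps=1e-7):
    # binary cross-entropy written by hand; clamp avoids log(0)
    y_pred = torch.clamp(y_pred, eps, 1 - eps)
    return -(y_true * torch.log(y_pred) + (1 - y_true) * torch.log(1 - y_pred)).mean()

losses = []
for epoch in range(100):
    loss = bce(forward(X), y)
    loss.backward()        # fills W.grad and b.grad
    with torch.no_grad():  # the update itself must not be tracked
        W -= lr * W.grad
        b -= lr * b.grad
    W.grad.zero_()         # gradients accumulate, so reset them each step
    b.grad.zero_()
    losses.append(loss.item())

accuracy = ((forward(X) > 0.5).float() == y).float().mean().item()
print(losses[0], losses[-1], accuracy)
```

Writing the update under torch.no_grad() and zeroing the gradients by hand is exactly the work that optimizer.step() and optimizer.zero_grad() take over later.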

06 Moving to nn.Module

After writing the pipeline manually, I moved to torch.nn. This is where the code started looking like real deep learning code. I learned that nn.Module is the base class for neural network models. The model has an __init__ method for layers and a forward method for the forward pass.

          
        
Python

import torch.nn as nn

class Model(nn.Module):
    def __init__(self, num_features):
        super().__init__()
        self.linear = nn.Linear(num_features, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, features):
        out = self.linear(features)
        out = self.sigmoid(out)
        return out

Then I added a hidden layer with ReLU. After that, I learned nn.Sequential, which makes simple feed-forward networks cleaner when layers are arranged one after another.
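A hidden layer with ReLU written with nn.Sequential might look like the sketch below. The input size of 30 and the hidden size of 16 are assumptions for illustration, not values from the notebook.

```python
import torch
import torch.nn as nn

num_features = 30  # assumed number of input features
hidden = 16        # assumed hidden layer size

model = nn.Sequential(
    nn.Linear(num_features, hidden),
    nn.ReLU(),
    nn.Linear(hidden, 1),
    nn.Sigmoid(),  # binary output as a probability
)

x = torch.rand(8, num_features)  # a made-up batch of 8 samples
probs = model(x)

# 30*16 + 16 weights and biases in the first layer, 16*1 + 1 in the second
n_params = sum(p.numel() for p in model.parameters())
print(probs.shape, n_params)
```

nn.Sequential fits when data flows straight through the layers. Anything with branches or custom logic still needs a forward method in an nn.Module subclass.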

My takeaway: manual code explains the logic, but nn.Module gives a clean structure for real model building.

07 Dataset and DataLoader

Before learning DataLoader, I was thinking of training as looping over the full dataset. But full batch training can be memory inefficient and slow. The notebook made mini-batch training clearer.

The problem with manually slicing data is that it does not give a standard interface for transformations, shuffling, sampling, batching or parallel loading. PyTorch solves this using Dataset and DataLoader.

          
        
Python

from torch.utils.data import Dataset, DataLoader

class CustomDataset(Dataset):
    def __init__(self, features, labels):
        self.features = features
        self.labels = labels

    def __len__(self):
        return len(self.features)

    def __getitem__(self, index):
        return self.features[index], self.labels[index]

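# X_train and y_train are assumed to be tensors prepared earlier
dataset = CustomDataset(X_train, y_train)
loader = DataLoader(dataset, batch_size=32, shuffle=True)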

Dataset

stores how data is accessed
defines length of data
returns one sample at a time
can include transformations

DataLoader

creates batches
handles shuffling
supports parallel loading
feeds batches into training loop
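To make the split of responsibilities concrete, the sketch below wraps made-up tensors in the same kind of custom Dataset and iterates over the batches. The data sizes are assumptions for the example.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class CustomDataset(Dataset):
    def __init__(self, features, labels):
        self.features = features
        self.labels = labels

    def __len__(self):
        return len(self.features)

    def __getitem__(self, index):
        # one sample at a time; DataLoader stacks samples into batches
        return self.features[index], self.labels[index]

torch.manual_seed(0)
features = torch.rand(100, 4)         # made-up data: 100 samples, 4 features
labels = torch.randint(0, 2, (100,))  # made-up binary labels

dataset = CustomDataset(features, labels)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

batch_shapes = [tuple(batch_features.shape) for batch_features, _ in loader]
print(len(dataset), len(loader), batch_shapes)
```

With 100 samples and a batch size of 32, the last batch holds only 4 samples. Passing drop_last=True to DataLoader would skip that smaller batch.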

08 Building an ANN on GPU

The main implementation part was building an Artificial Neural Network and training it on GPU. I used a Fashion-MNIST style dataset where each image becomes a flattened input vector and the model predicts one of ten classes.

The first model had an input layer, two hidden layers and an output layer. The hidden layers used ReLU, and the final layer produced logits for ten classes.

          
        
Python

class MyNN(nn.Module):
    def __init__(self, num_features):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(num_features, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 10)
        )

    def forward(self, x):
        return self.model(x)

For multiclass classification, I used CrossEntropyLoss. For optimization, I used SGD first. I moved the model and batches to the selected device.

          
        
Python

import torch.optim as optim

model = MyNN(X_train.shape[1]).to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

Important detail: with CrossEntropyLoss, I do not need to manually apply softmax inside the model.
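The reason is that CrossEntropyLoss applies log-softmax internally. A small check with made-up logits and targets shows that it matches log_softmax followed by the negative log-likelihood loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

logits = torch.randn(4, 10)           # raw model outputs for 4 samples, 10 classes
targets = torch.tensor([3, 0, 9, 1])  # class indices, not one-hot vectors

ce = nn.CrossEntropyLoss()(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# Softmax keeps the ordering, so argmax on logits gives the predicted class
predicted = logits.argmax(dim=1)

print(ce.item(), manual.item(), predicted)
```

Adding a softmax layer at the end of the model would therefore apply softmax twice during training, which usually hurts learning.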

09 Training Loop and Evaluation

The training loop is where all PyTorch pieces connect. Each epoch processes batches. For every batch, I send data to device, run forward pass, calculate loss, clear old gradients, run backward pass and update parameters using the optimizer.

          
        
Python

for epoch in range(epochs):
    model.train()

    for batch_features, batch_labels in train_loader:
        batch_features = batch_features.to(device)
        batch_labels = batch_labels.to(device)

        outputs = model(batch_features)
        loss = criterion(outputs, batch_labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

The evaluation part uses model.eval() and torch.no_grad(). The second one matters because evaluation does not need gradients, so skipping the tracking saves memory and time.

What clicked: training mode and evaluation mode are different. Some layers like dropout and batch normalization behave differently during training and testing.
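An evaluation loop along these lines could look like the sketch below. The evaluate helper, the small model and the random test data are all assumptions made for illustration, not the notebook's code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, device):
    """Return accuracy of `model` over all batches in `loader`."""
    model.eval()               # dropout and batch norm switch to inference behavior
    correct, total = 0, 0
    with torch.no_grad():      # no computation graph is built here
        for batch_features, batch_labels in loader:
            batch_features = batch_features.to(device)
            batch_labels = batch_labels.to(device)
            outputs = model(batch_features)
            predicted = outputs.argmax(dim=1)  # class with the largest logit
            correct += (predicted == batch_labels).sum().item()
            total += batch_labels.size(0)
    return correct / total

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.manual_seed(0)

# Untrained stand-in model and random data, only to exercise the loop
model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 10)).to(device)
test_data = TensorDataset(torch.rand(64, 20), torch.randint(0, 10, (64,)))
test_loader = DataLoader(test_data, batch_size=16)

accuracy = evaluate(model, test_loader, device)
print(accuracy)
```

Because the helper leaves the model in evaluation mode, the training loop has to call model.train() again at the start of the next epoch.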

10 Overfitting I Noticed

In the first ANN experiment, I saw that train accuracy became very high, around 99 percent, but test accuracy stayed around 82 percent. That was a clear sign of overfitting.

This was a useful mistake because it showed me that a model can memorize training data without generalizing well to unseen data.

Overfitting Signal

training accuracy is very high
test accuracy is much lower
model memorizes training patterns
generalization is weak

Possible Fixes

add more data
reduce model complexity
use dropout
use weight decay
try batch normalization
tune hyperparameters

My notebook lesson: high training accuracy alone is not success. The test result tells whether the model learned useful patterns or only memorized.

11 Improving the ANN

In the improved ANN notebook, I added BatchNorm1d, Dropout and weight_decay. This was my first proper attempt to reduce overfitting using regularization.

          
        
Python

class MyNN(nn.Module):
    def __init__(self, num_features):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(num_features, 128),
            nn.BatchNorm1d(128),
            nn.ReLU(),
            nn.Dropout(p=0.3),
            nn.Linear(128, 64),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.Dropout(p=0.3),
            nn.Linear(64, 10)
        )

    def forward(self, x):
        return self.model(x)

Even after improvement, the notebook showed that overfitting was still present. But that itself was a good learning point. Overfitting is not always fixed by adding one trick. It may require better architecture, better data, better search space and better tuning.

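My understanding: dropout randomly zeroes a fraction of activations during training so the network cannot depend on specific neurons, while weight decay penalizes large weights through the optimizer, which for plain SGD is equivalent to L2 regularization.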
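Both ideas can be seen in a few lines. The sketch below is an illustration with made-up sizes; the weight_decay value of 1e-4 is an assumption, not the value from the notebook.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

drop = nn.Dropout(p=0.3)
x = torch.ones(1000)

drop.train()
train_out = drop(x)  # roughly 30% zeros, survivors scaled by 1 / (1 - 0.3)

drop.eval()
eval_out = drop(x)   # dropout does nothing in evaluation mode

zero_fraction = (train_out == 0).float().mean().item()
print(zero_fraction, train_out.max().item(), eval_out.min().item())

# Weight decay is an optimizer argument, not a layer
model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)
```

The scaling of the surviving activations keeps the expected sum the same in training and evaluation, which is why no extra correction is needed at test time.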

Figure: PyTorch ANN improvement showing dropout, batch normalization, weight decay and Optuna hyperparameter tuning. Overfitting forced me to think beyond accuracy and experiment with dropout, batch normalization, weight decay and hyperparameter tuning.

12 Hyperparameter Tuning with Optuna

The last notebook made one question very clear: why did I choose two hidden layers, 128 neurons, this learning rate, this batch size and this dropout value? Earlier, I was choosing these numbers manually. But there is no single fixed answer.

This is why hyperparameter tuning matters. I explored Grid Search, Random Search and Bayesian Search using Optuna. In my notebook, I used Optuna to tune values such as number of hidden layers, neurons per layer, epochs, optimizer, learning rate, batch size, dropout rate and weight decay.

          
        
Python

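import optuna

def objective(trial):
    num_hidden_layers = trial.suggest_int("num_hidden_layers", 1, 5)
    neurons = trial.suggest_int("neurons_per_layer", 8, 128, step=8)
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-1, log=True)
    dropout = trial.suggest_float("dropout_rate", 0.1, 0.5, step=0.1)
    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64])

    # train_and_validate is a placeholder: build model, train it and return validation accuracy
    return train_and_validate(num_hidden_layers, neurons, lr, dropout, batch_size)

# maximize because the objective returns an accuracy; n_trials is an example value
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)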

What clicked: hyperparameter tuning turns guessing into a more systematic experiment.
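The same idea can be shown without any tuning library. The sketch below is a plain Random Search, which is simpler than the Bayesian-style search Optuna offers, and it runs on synthetic data with a small search space chosen only for the example. The build_model and objective helpers are hypothetical names, not the notebook's code.

```python
import random
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
random.seed(0)

# Synthetic 3-class data standing in for the real dataset
X = torch.randn(300, 8)
y = (X[:, 0] > 0).long() + (X[:, 1] > 0).long()  # labels 0, 1 or 2
X_train, y_train, X_val, y_val = X[:200], y[:200], X[200:], y[200:]

def build_model(num_hidden_layers, neurons, dropout):
    layers, in_features = [], 8
    for _ in range(num_hidden_layers):
        layers += [nn.Linear(in_features, neurons), nn.ReLU(), nn.Dropout(dropout)]
        in_features = neurons
    layers.append(nn.Linear(in_features, 3))  # logits for 3 classes
    return nn.Sequential(*layers)

def objective(params):
    """Train one configuration and return its validation accuracy."""
    model = build_model(params["num_hidden_layers"], params["neurons"], params["dropout"])
    optimizer = optim.SGD(model.parameters(), lr=params["lr"])
    criterion = nn.CrossEntropyLoss()
    model.train()
    for _ in range(30):  # full-batch steps keep the sketch short
        optimizer.zero_grad()
        loss = criterion(model(X_train), y_train)
        loss.backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():
        predicted = model(X_val).argmax(dim=1)
    return (predicted == y_val).float().mean().item()

best_score, best_params = -1.0, None
for _ in range(5):  # 5 random trials
    params = {
        "num_hidden_layers": random.randint(1, 3),
        "neurons": random.choice([8, 16, 32]),
        "lr": 10 ** random.uniform(-3, -1),  # log-uniform sample
        "dropout": random.choice([0.1, 0.2, 0.3]),
    }
    score = objective(params)
    if score > best_score:
        best_score, best_params = score, params

print(best_score, best_params)
```

Optuna keeps the same objective-function shape but replaces the random sampling with a sampler that uses earlier trial results to pick the next configuration.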

13 My Final Understanding

My final understanding is that PyTorch is not only a library for writing neural networks. It is a full deep learning workflow. The model is only one part. The real system includes data loading, tensor shapes, device management, loss function, optimizer, autograd, training mode, evaluation mode and tuning.

Tensors are the base

Data, weights, outputs and gradients all live as tensors in PyTorch.

Autograd handles gradients

It tracks the computation graph and calculates gradients during backward pass.

DataLoader makes batching practical

It helps with batching, shuffling and feeding data into the training loop.

Overfitting needs experiments

Dropout, batch normalization, weight decay and tuning are tools, not magic buttons.

14 GitHub Notebook Connection

This blog explains what I understood from my PyTorch and ANN notebooks. The implementation side is connected to the NLP by Vinod GitHub repository.

NLP by Vinod GitHub Repository

Notebook references: 00_basic.ipynb, 01_tensors.ipynb, 02_autograd.ipynb, 03_training_pipeline.ipynb, 04_dataset_dataloader.ipynb, 05_ANN_training_on_GPU.ipynb, 06_ANN_training_on_GPU_improved.ipynb, and 07_ANN_Hyperparameter_tuning.ipynb.


15 Related Reading

NLP learning roadmap

The roadmap that connects this PyTorch topic to the complete NLP by Vinod journey.

NLP libraries

The previous topic where I explored NLTK, spaCy, TextBlob and Stanza before moving into deep learning.

word embeddings in NLP

The earlier topic where text representation moved from sparse features to dense vectors.

16 What Comes Next in the NLP Journey

The next topic is RNN. After learning PyTorch basics and ANN training, I am ready to move into sequence models, where order matters and text is processed step by step.

RNN basics

How recurrent networks process sequences one step at a time.

Text as sequence

How tokens or embeddings become ordered inputs for sequence models.

LSTM and GRU later

How improved recurrent models solve some limitations of simple RNNs.



Vinod Codes | AI Engineering & Data Science

A structured public journey from NLP fundamentals to real-world AI systems.