PyTorch for Deep Learning - From Tensors to ANN Training
After NLP libraries, I moved into deep learning foundations and learned how PyTorch helps build neural networks using tensors, autograd, training loops, DataLoader, GPU training and hyperparameter tuning.
PyTorch for deep learning became the first practical step for me before going into RNNs and modern NLP architectures. Until now, I was mostly working with preprocessing, embeddings and NLP libraries. But to understand deep learning for NLP properly, I needed to understand how neural networks are actually built, trained, improved and debugged.
My rough understanding is this: PyTorch is one of the best libraries for learning and coding deep learning because it feels close to normal Python while still giving powerful tools for tensors, automatic differentiation, GPU training and neural network building. In these notebooks, I started from basics, moved to tensors, learned autograd, wrote a training pipeline, used Dataset and DataLoader, trained an ANN on GPU, noticed overfitting and then tried improvements and hyperparameter tuning.
This topic was important because I did not want deep learning to feel like a black box. I wanted to see how data becomes tensors, how the forward pass works, how the loss is calculated, how gradients are created, and how the optimizer updates the model parameters.
PyTorch is not only about writing a model. It teaches the full training workflow: tensor, forward pass, loss, backward pass, optimizer step and evaluation.
01 Why PyTorch Matters Before RNNs
The next topic in my roadmap is RNN. But before studying sequence models, I need the base deep learning workflow. RNNs are also neural networks. They still use tensors, gradients, loss functions, optimizers and training loops. So learning PyTorch first makes the next topics less confusing.
What PyTorch Gives
- tensor operations
- automatic gradients
- neural network modules
- optimizers and loss functions
- GPU acceleration
- custom datasets and batching
Why It Helps NLP
- embeddings become tensors
- text batches need DataLoader
- sequence models need training loops
- transformers are built using the same ideas
- experiments need tuning and debugging
02 Tensors - The Basic Data Structure
The first serious concept I learned was tensors. A tensor is like a specialized multi-dimensional array designed for mathematical operations. In deep learning, almost everything becomes a tensor: input data, labels, weights, bias values, gradients and model outputs.
A scalar is a zero-dimensional tensor, a vector is one-dimensional, a matrix is two-dimensional, and images or batches can become higher-dimensional tensors.
import torch
a = torch.zeros(2, 3)
b = torch.ones(2, 3)
c = torch.rand(2, 3)
print(a)
print(b.shape)
print(c.dtype)
In the tensor notebook, I practiced creating tensors using empty, zeros, ones, rand, manual_seed and tensor. I also practiced tensor shapes, data types, mathematical operations, reduction operations, comparison functions, in-place operations, copying, reshaping and NumPy conversion.
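A small sketch of a few of these operations is below. The values and variable names are my own illustration, not copied from the notebook.

```python
import torch

# A 2x3 tensor built from a nested list
t = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

# Shape and data type
print(t.shape, t.dtype)  # torch.Size([2, 3]) torch.float32

# Reduction operations
total = t.sum()            # sum of all elements
col_mean = t.mean(dim=0)   # mean down each column

# Reshaping: 2x3 -> 3x2, same six values
reshaped = t.reshape(3, 2)

# Copying, then an in-place operation (trailing underscore modifies the tensor itself)
copy = t.clone()
copy.add_(10)              # t is unchanged because copy is a separate tensor

# NumPy conversion: a CPU tensor and its NumPy array share memory
arr = t.numpy()
back = torch.from_numpy(arr)
```

The trailing underscore in add_ is the PyTorch convention for in-place operations, which is why I clone first when I want to keep the original values.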
03 CPU and GPU Tensors
Another important part was understanding device movement. PyTorch can run tensors and models on CPU or GPU. But the tensors and the model must be on the same device. Otherwise, PyTorch raises a RuntimeError saying that it expected all tensors to be on the same device.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.rand(3, 4)
x = x.to(device)
print(device)
print(x.device)
04 Autograd - How Gradients Are Calculated
Autograd was one of the most important concepts. Earlier, I learned derivatives manually. But in a neural network, there are many parameters and many operations. Manually writing every derivative is not practical.
PyTorch autograd tracks operations on tensors when requires_grad=True. During backward(), it calculates gradients using the computation graph.
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2
y.backward()   # computes dy/dx = 2x at x = 3
print(x.grad)  # tensor(6.)
I also tested chained operations like square, sine and exponential. This helped me understand the computation graph: forward pass creates values, backward pass calculates how much each tracked variable affected the final output.
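To check the chained case, here is a minimal sketch that compares the autograd result with the chain rule written by hand. The exact order of operations (square, then sine, then exponential) and the input value are my assumptions for illustration.

```python
import math
import torch

x = torch.tensor(2.0, requires_grad=True)

# Forward pass: y = exp(sin(x^2)), built from three tracked operations
a = x ** 2
b = torch.sin(a)
y = torch.exp(b)

# Backward pass: fills x.grad with dy/dx
y.backward()

# Chain rule by hand: dy/dx = exp(sin(x^2)) * cos(x^2) * 2x, evaluated at x = 2
manual = math.exp(math.sin(4.0)) * math.cos(4.0) * 2 * 2.0

print(x.grad.item(), manual)
```

Both numbers should agree up to float32 rounding, which shows that backward() is applying the same chain rule, just automatically.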
05 A Training Pipeline from Scratch
After tensors and autograd, I moved to the training pipeline. In one notebook, I used a breast cancer dataset and first built the process manually. I converted NumPy arrays into PyTorch tensors, created weights and bias with requires_grad=True, wrote a forward function, calculated loss and updated parameters.
class ANNModel:
    def __init__(self, X):
        self.W = torch.rand(X.shape[1], 1, requires_grad=True)
        self.b = torch.rand(1, requires_grad=True)
    def forward(self, X_train):
        z = torch.matmul(X_train, self.W) + self.b
        return torch.sigmoid(z)
This was helpful because I could see the basic logic of a neural network without hiding everything behind built-in classes.
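The manual update step can be sketched like this. I use synthetic data here instead of the breast cancer dataset so the example runs on its own, and the loss function, learning rate and epoch count are illustrative assumptions, not the notebook's exact values.

```python
import torch

torch.manual_seed(0)

# Synthetic stand-in data (assumption): 100 samples, 5 features, binary labels
X = torch.randn(100, 5)
y = (X[:, 0] + X[:, 1] > 0).float().unsqueeze(1)

# Parameters tracked by autograd
W = torch.rand(5, 1, requires_grad=True)
b = torch.rand(1, requires_grad=True)
lr = 0.1

def forward(X):
    return torch.sigmoid(torch.matmul(X, W) + b)

def bce_loss(y_pred, y_true, eps=1e-7):
    # Binary cross-entropy written by hand; clamp avoids log(0)
    y_pred = torch.clamp(y_pred, eps, 1 - eps)
    return -(y_true * torch.log(y_pred) + (1 - y_true) * torch.log(1 - y_pred)).mean()

losses = []
for epoch in range(50):
    loss = bce_loss(forward(X), y)   # forward pass and loss
    loss.backward()                  # gradients land in W.grad and b.grad
    with torch.no_grad():            # the update itself must not be tracked
        W -= lr * W.grad
        b -= lr * b.grad
    W.grad.zero_()                   # clear gradients, otherwise they accumulate
    b.grad.zero_()
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Writing the update and the gradient reset by hand made it clear what optimizer.step() and optimizer.zero_grad() do for me later.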
06 Moving to nn.Module
After writing the pipeline manually, I moved to torch.nn. This is where the code started looking like real deep learning code. I learned that nn.Module is the base class for neural network models. The model has an __init__ method for layers and a forward method for the forward pass.
import torch.nn as nn
class Model(nn.Module):
    def __init__(self, num_features):
        super().__init__()
        self.linear = nn.Linear(num_features, 1)
        self.sigmoid = nn.Sigmoid()
    def forward(self, features):
        out = self.linear(features)
        out = self.sigmoid(out)
        return out
Then I added a hidden layer with ReLU. After that, I learned nn.Sequential, which makes simple feed-forward networks cleaner when layers are arranged one after another.
nn.Module gives a clean structure for real model building.
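A hidden layer with ReLU, written with nn.Sequential, can look roughly like the sketch below. The feature count and the hidden layer size are arbitrary values I picked for illustration.

```python
import torch
import torch.nn as nn

num_features = 30  # assumed feature count, for illustration only

model = nn.Sequential(
    nn.Linear(num_features, 3),  # hidden layer with 3 units (arbitrary choice)
    nn.ReLU(),
    nn.Linear(3, 1),             # output layer
    nn.Sigmoid(),                # probability for binary classification
)

batch = torch.rand(8, num_features)
out = model(batch)
print(out.shape)  # torch.Size([8, 1])

# Count trainable parameters: weights and biases of both Linear layers
n_params = sum(p.numel() for p in model.parameters())
print(n_params)
```

With nn.Sequential there is no need to write a forward method, because the layers are simply applied in the order they are listed.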
07 Dataset and DataLoader
Before learning DataLoader, I was thinking of training as looping over the full dataset. But full batch training can be memory inefficient and slow. The notebook made mini-batch training clearer.
The problem with manually slicing data is that it does not give a standard interface for transformations, shuffling, sampling, batching or parallel loading. PyTorch solves this using Dataset and DataLoader.
from torch.utils.data import Dataset, DataLoader
class CustomDataset(Dataset):
    def __init__(self, features, labels):
        self.features = features
        self.labels = labels
    def __len__(self):
        return len(self.features)
    def __getitem__(self, index):
        return self.features[index], self.labels[index]
# features and labels are tensors prepared earlier (for example the training split)
dataset = CustomDataset(features, labels)
loader = DataLoader(dataset, batch_size=32, shuffle=True)
Dataset
- stores how data is accessed
- defines length of data
- returns one sample at a time
- can include transformations
DataLoader
- creates batches
- handles shuffling
- supports parallel loading
- feeds batches into training loop
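To see what the DataLoader actually returns, here is a small self-contained sketch. The random tensors are placeholder data, not the notebook's dataset.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class CustomDataset(Dataset):
    def __init__(self, features, labels):
        self.features = features
        self.labels = labels

    def __len__(self):
        return len(self.features)

    def __getitem__(self, index):
        return self.features[index], self.labels[index]

# Placeholder data (assumption): 100 samples with 4 features each
features = torch.rand(100, 4)
labels = torch.randint(0, 2, (100,))

dataset = CustomDataset(features, labels)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

batch_sizes = []
for batch_features, batch_labels in loader:
    # Each iteration yields one mini-batch of features and matching labels
    batch_sizes.append(batch_features.shape[0])

print(batch_sizes)  # [32, 32, 32, 4]
```

The last batch is smaller because 100 is not divisible by 32. DataLoader keeps it by default, and drop_last=True would discard it.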
08 Building an ANN on GPU
The main implementation part was building an Artificial Neural Network and training it on GPU. I used a Fashion-MNIST style dataset where each image becomes a flattened input vector and the model predicts one of ten classes.
The first model had an input layer, two hidden layers and an output layer. The hidden layers used ReLU, and the final layer produced logits for ten classes.
class MyNN(nn.Module):
    def __init__(self, num_features):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(num_features, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 10)
        )
    def forward(self, x):
        return self.model(x)
For multiclass classification, I used CrossEntropyLoss. For optimization, I used SGD first. I moved the model and batches to the selected device.
import torch.optim as optim
model = MyNN(X_train.shape[1]).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
Because CrossEntropyLoss applies log-softmax to the logits internally, I do not need to manually apply softmax inside the model.
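A quick way to convince myself of this is to compare CrossEntropyLoss on raw logits with log-softmax followed by negative log-likelihood. The logits and targets below are made-up values for the check.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Made-up batch: 4 samples, 10 classes, raw scores (logits) from a model
logits = torch.randn(4, 10)
targets = torch.tensor([3, 0, 9, 1])

# CrossEntropyLoss takes raw logits directly
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# The same quantity in two explicit steps
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(loss_ce.item(), loss_manual.item())
```

The two values match, so adding a softmax layer before CrossEntropyLoss would apply softmax twice and distort the loss.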
09 Training Loop and Evaluation
The training loop is where all PyTorch pieces connect. Each epoch processes batches. For every batch, I send data to device, run forward pass, calculate loss, clear old gradients, run backward pass and update parameters using the optimizer.
for epoch in range(epochs):
    model.train()
    for batch_features, batch_labels in train_loader:
        batch_features = batch_features.to(device)
        batch_labels = batch_labels.to(device)
        outputs = model(batch_features)
        loss = criterion(outputs, batch_labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
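The evaluation part uses model.eval() and torch.no_grad(). model.eval() switches layers such as Dropout and BatchNorm to their inference behaviour, and torch.no_grad() turns off gradient tracking, which evaluation does not need and which saves memory.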
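An evaluation function along these lines can be sketched as below. The function name evaluate, the tiny linear model and the random test data are stand-ins I made up so the example runs by itself.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, device):
    """Return classification accuracy of model over all batches in loader."""
    model.eval()                      # inference behaviour for dropout/batchnorm
    correct, total = 0, 0
    with torch.no_grad():             # no computation graph is built here
        for batch_features, batch_labels in loader:
            batch_features = batch_features.to(device)
            batch_labels = batch_labels.to(device)
            outputs = model(batch_features)       # logits, shape (batch, classes)
            predicted = outputs.argmax(dim=1)     # class with the highest logit
            correct += (predicted == batch_labels).sum().item()
            total += batch_labels.size(0)
    return correct / total

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in model and data (assumptions): 20 features, 10 classes, 64 samples
model = nn.Linear(20, 10).to(device)
test_data = TensorDataset(torch.rand(64, 20), torch.randint(0, 10, (64,)))
test_loader = DataLoader(test_data, batch_size=16)

accuracy = evaluate(model, test_loader, device)
print(accuracy)
```

Running the same function on the training loader and the test loader gives the two numbers needed to compare training accuracy with test accuracy.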
10 Overfitting I Noticed
In the first ANN experiment, I saw that train accuracy became very high, around 99 percent, but test accuracy stayed around 82 percent. That was a clear sign of overfitting.
This was a useful mistake because it showed me that a model can memorize training data without generalizing well to unseen data.
Overfitting Signal
- training accuracy is very high
- test accuracy is much lower
- model memorizes training patterns
- generalization is weak
Possible Fixes
- add more data
- reduce model complexity
- use dropout
- use weight decay
- try batch normalization
- tune hyperparameters
11 Improving the ANN
In the improved ANN notebook, I added BatchNorm1d, Dropout and weight_decay. This was my first proper attempt to reduce overfitting using regularization.
class MyNN(nn.Module):
    def __init__(self, num_features):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(num_features, 128),
            nn.BatchNorm1d(128),
            nn.ReLU(),
            nn.Dropout(p=0.3),
            nn.Linear(128, 64),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.Dropout(p=0.3),
            nn.Linear(64, 10)
        )
    def forward(self, x):
        return self.model(x)
Even after improvement, the notebook showed that overfitting was still present. But that itself was a good learning point. Overfitting is not always fixed by adding one trick. It may require better architecture, better data, better search space and better tuning.
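Two of these regularization pieces are easy to inspect in isolation. The sketch below shows how Dropout behaves in training mode versus evaluation mode, and where weight_decay is passed. The weight decay value and the small linear model are assumptions for illustration, not the notebook's settings.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

drop = nn.Dropout(p=0.3)
x = torch.ones(1000)

# Training mode: roughly 30% of values become 0, the rest are scaled by 1 / (1 - 0.3)
drop.train()
train_out = drop(x)
zero_fraction = (train_out == 0).float().mean().item()

# Evaluation mode: dropout is switched off and the input passes through unchanged
drop.eval()
eval_out = drop(x)

# weight_decay is an optimizer argument that adds an L2 penalty on the parameters
model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)

print(zero_fraction)
print(torch.equal(eval_out, x))
```

This is also why forgetting model.eval() before testing gives misleading accuracy: dropout would still be randomly zeroing activations.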
12 Hyperparameter Tuning with Optuna
The last notebook made one question very clear: why did I choose two hidden layers, 128 neurons, this learning rate, this batch size and this dropout value? Earlier, I was choosing these numbers manually. But there is no single fixed answer.
This is why hyperparameter tuning matters. I explored Grid Search, Random Search and Bayesian Search using Optuna. In my notebook, I used Optuna to tune values such as number of hidden layers, neurons per layer, epochs, optimizer, learning rate, batch size, dropout rate and weight decay.
def objective(trial):
    num_hidden_layers = trial.suggest_int("num_hidden_layers", 1, 5)
    neurons = trial.suggest_int("neurons_per_layer", 8, 128, step=8)
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-1, log=True)
    dropout = trial.suggest_float("dropout_rate", 0.1, 0.5, step=0.1)
    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64])
    # build model, train it and return validation accuracy
    return validation_accuracy
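The "build model" step needs a network whose depth and width depend on the suggested values. One possible way to do that is sketched below. The helper name build_model and its signature are hypothetical, and the layer pattern follows my improved ANN rather than any fixed rule.

```python
import torch
import torch.nn as nn

def build_model(num_features, num_hidden_layers, neurons, dropout, num_classes=10):
    """Hypothetical helper: stack identical hidden blocks, then an output layer."""
    layers = []
    in_dim = num_features
    for _ in range(num_hidden_layers):
        layers += [
            nn.Linear(in_dim, neurons),
            nn.BatchNorm1d(neurons),
            nn.ReLU(),
            nn.Dropout(p=dropout),
        ]
        in_dim = neurons              # next block takes this block's output
    layers.append(nn.Linear(in_dim, num_classes))  # logits for each class
    return nn.Sequential(*layers)

# Example values of the kind a trial might suggest (assumed, not tuned results)
model = build_model(num_features=784, num_hidden_layers=3, neurons=64, dropout=0.2)
out = model(torch.rand(16, 784))
print(out.shape)  # torch.Size([16, 10])
```

Inside objective, the suggested values would be passed to a helper like this. The search itself is then started with optuna.create_study(direction="maximize") followed by study.optimize(objective, n_trials=...), where the number of trials is up to the experiment.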
13 My Final Understanding
My final understanding is that PyTorch is not only a library for writing neural networks. It is a full deep learning workflow. The model is only one part. The real system includes data loading, tensor shapes, device management, loss function, optimizer, autograd, training mode, evaluation mode and tuning.
14 GitHub Notebook Connection
This blog explains what I understood from my PyTorch and ANN notebooks. The implementation side is connected to the NLP by Vinod GitHub repository.
Notebook references: 00_basic.ipynb, 01_tensors.ipynb, 02_autograd.ipynb, 03_training_pipeline.ipynb, 04_dataset_dataloader.ipynb, 05_ANN_training_on_GPU.ipynb, 06_ANN_training_on_GPU_improved.ipynb, and 07_ANN_Hyperparameter_tuning.ipynb.
15 What Comes Next in the NLP Journey
The next topic is RNN. After learning PyTorch basics and ANN training, I am ready to move into sequence models, where order matters and text is processed step by step.