
Simple PyTorch Project

Overview

This guide will walk you through a very simple PyTorch training pipeline. Accompanying code for this article can be found here:
https://git.arts.ac.uk/ipavlov/WikiMisc/blob/main/SimpleCNN.ipynb

Loading Libraries

Every Python project starts by loading all the relevant libraries. In our case, the code for that is:

import torch
from glob import glob
import cv2
import albumentations
from albumentations.pytorch import ToTensorV2
from torch.utils.data import Dataset, DataLoader
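Since later parts of this guide send the model and data to the GPU with .cuda(), it is worth checking up front that PyTorch can actually see a CUDA device. A quick, optional check:

#Print the PyTorch version and whether a CUDA-capable GPU is available
print(torch.__version__)
print(torch.cuda.is_available())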

Reading the Dataset

For this example we will use the MNIST dataset, which contains 70,000 images of handwritten digits from 0 to 9. The specific version of MNIST used in this example can be found here: https://www.kaggle.com/datasets/alexanderyyy/mnist-png
Download the dataset and unpack it in the same directory as your Jupyter notebook. The unpacked dataset consists of train and test folders, containing the images for model training and evaluation respectively. Both the train and test folders have 10 sub-folders, one for each digit.
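If you want to confirm the layout before moving on, a small optional check like the one below (assuming the archive was unpacked next to the notebook as ./mnist_png/, the path used later in this guide) lists the digit sub-folders of each split:

import os

#Each split should contain 10 sub-folders named 0 to 9
for split in ['train', 'test']:
    print(split, sorted(os.listdir('./mnist_png/' + split)))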

To train our model, we need to know the file names and labels of all the images in the dataset. A simple way to do this is demonstrated in the code below:

def readMnist(folder):
    filenames = [] #List for image filenames
    labels = [] #List for image labels
    
    folderNameLen = len(folder) 
    
    #Reads all the filenames in a given folder recursively
    for filename in glob(folder + '/**/*.png', recursive=True): 
        filenames += [filename]
        #Get the label of the image from its file path (the name of the digit sub-folder)
        labels += [int(filename[folderNameLen:folderNameLen+1])]         
    return filenames, labels

trainFiles, trainLabels = readMnist('./mnist_png/train/')
testFiles, testLabels = readMnist('./mnist_png/test/')
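A quick sanity check after reading the file lists can catch path problems early. MNIST has 60,000 training and 10,000 test images, and each label should match the digit folder in the corresponding file path:

#Optional sanity check on the file lists and labels
print(len(trainFiles), len(testFiles))   #expected: 60000 10000
print(trainFiles[0], trainLabels[0])     #the label should match the digit folder in the path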

Note: Different datasets will require different approaches.

Dataset Class

The Dataset class provides the necessary functionality for our training and evaluation pipeline, such as loading images and labels and applying image transformations.

class MnistDataset(Dataset):
    def __init__(self, filepaths, labels, transform):
        self.labels = labels
        self.filepaths = filepaths
        self.transform = transform
        
    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        #Load the image as a single-channel grayscale array
        image = cv2.imread(self.filepaths[idx], 0)

        #Apply the transformations and scale pixel values from [0, 255] to [0, 1]
        image = self.transform(image=image)["image"]/255.
        label = self.labels[idx]
        
        return image, label

# Usually transformations would include data augmentation tricks, 
# but for this example we will limit ourselves to just converting image data from
# NumPy array to PyTorch tensor.
transform = albumentations.Compose(
    [
        ToTensorV2()
    ]
)   
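To see what the transform produces, you can apply it to a single image. The snippet below is optional and assumes the file lists from the previous section; ToTensorV2 returns the raw uint8 values, which is why MnistDataset divides by 255 in __getitem__:

#Load one image as grayscale and apply the transform
sampleImage = cv2.imread(trainFiles[0], 0)
sampleTensor = transform(image=sampleImage)["image"]
print(sampleTensor.shape, sampleTensor.dtype)   #expected: torch.Size([1, 28, 28]) torch.uint8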

To train with mini-batches we also need to introduce a DataLoader into our pipeline. First, we instantiate Dataset objects for the train and test splits; the DataLoaders themselves are created right after.

#Instantiate Dataset objects for train and test datasets.
trainDataset = MnistDataset(trainFiles, trainLabels, transform)
testDataset = MnistDataset(testFiles, testLabels, transform)
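The training loop further below iterates over train_dataloader and test_dataloader, so we also wrap the datasets in DataLoader objects. The batch size of 64 and the shuffling of the training data are reasonable choices for this example, not fixed requirements:

#Wrap the datasets in DataLoaders to get (shuffled) mini-batches
train_dataloader = DataLoader(trainDataset, batch_size=64, shuffle=True)
test_dataloader = DataLoader(testDataset, batch_size=64, shuffle=False)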

Model Class

Our model classifies the input images into 10 different classes. Below is the code for our simple convolutional neural network; the comments in the code provide additional explanation.

class CNN(torch.nn.Module):
    # Inside of __init__ we define the structure of our neural network.
    # Think of this as a collection of all the potential layers and modules
    # that we will use during the feedforward process.
    def __init__(self):
        super().__init__() #Needed to initialize torch.nn.Module correctly
        
        # Our first convolutional block. torch.nn.Sequential is a container
        # that executes the modules inside of it sequentially.
        # This convolutional block consists of a simple convolutional layer,
        # a ReLU activation function, and a max pooling operation.
        self.conv1 = torch.nn.Sequential(
            torch.nn.Conv2d(
                in_channels=1,
                out_channels=16,
                kernel_size=5,
                stride=1,
                padding=2,
            ),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2),
        )
        # Our second convolutional block.
        self.conv2 = torch.nn.Sequential(        
           torch.nn.Conv2d(16, 32, 5, 1, 2),    
           torch.nn.ReLU(),                     
           torch.nn.MaxPool2d(2),               
        )
        # Fully connected layer that outputs scores for the 10 classes.
        # After two 2x2 max pooling operations the 28x28 input is reduced to 7x7,
        # and the second convolutional block outputs 32 channels, hence 32 * 7 * 7 input features.
        self.out = torch.nn.Linear(32 * 7 * 7, 10)
    
    # forward is the function used for the feedforward operation of our model.
    # Its input argument x is a batch of images from the MNIST dataset.
    # Inside this function we apply the modules we defined in __init__ to the input images.
    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        #This line flattens the tensor from 4 dimensions (batch, channels, height, width) to 2.
        x = torch.flatten(x, start_dim=1)     
        output = self.out(x)
        return output

# This line creates an object of our convolutional neural network class.
# We use .cuda() to send our model to the GPU (this requires a CUDA-capable GPU).
model = CNN().cuda()
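A quick, optional way to confirm that the layer dimensions line up (in particular the 32 * 7 * 7 input size of the final linear layer) is to push a dummy batch through the model:

#A dummy batch of 8 single-channel 28x28 images should produce an output of shape (8, 10)
dummy = torch.randn(8, 1, 28, 28).cuda()
print(model(dummy).shape)   #expected: torch.Size([8, 10])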

Training and Validation

Below is the code for our training and validation procedure.

#Here we define the cross-entropy loss function, which we will use for loss calculation.
loss_fn = torch.nn.CrossEntropyLoss() 

#This is our optimization algorithm. In this example we use stochastic gradient descent (SGD).
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

num_epochs = 25

#We will execute our training inside of a loop. Each iteration is a new epoch.
for epoch in range(num_epochs):
    print('Epoch:', epoch)
    
    running_loss = 0
    model = model.train() #sets our model to training mode.
    for i, data in enumerate(train_dataloader): #we will iterate over our dataloader to get batched data.
        x, y = data
        #Don't forget to send your images and labels to the same device as your model. In our case it's a GPU.
        x = x.cuda()
        y = y.cuda()

        #Resets gradients
        optimizer.zero_grad()
        #output of our CNN model
        outputs = model(x)
        #Here we calculate the loss value
        loss = loss_fn(outputs, y)
        
        loss.backward() #Backpropagation: computes the gradients
        optimizer.step() #Updates the model weights using the computed gradients
        
        running_loss += loss.item()
    
    print(running_loss/len(train_dataloader)) #average training loss for current epoch
    
    model = model.eval() #sets our model to evaluation mode.
    test_acc = 0
    test_running_loss = 0
    #torch.no_grad() disables gradient tracking, which is not needed during evaluation
    with torch.no_grad():
        for i, data in enumerate(test_dataloader):
            x, y = data
            x = x.cuda()
            y = y.cuda()

            outputs = model(x)
            loss = loss_fn(outputs, y)

            test_running_loss += loss.item()
            #We apply softmax here to get the probabilities for each class
            probs = torch.nn.functional.softmax(outputs, dim=1)
            #We select the class with the highest probability as our final prediction
            pred = torch.argmax(probs, dim=1)
            test_acc += torch.sum(pred == y).item()

    #Average evaluation loss and evaluation accuracy for this epoch
    print(test_running_loss/len(test_dataloader), test_acc/len(testDataset))
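After training, the same model can be used to classify individual images. The sketch below reuses the transform and one file from the test set; it mirrors the softmax and argmax steps from the validation loop above:

#Classify a single test image with the trained model
model = model.eval()
with torch.no_grad():
    image = cv2.imread(testFiles[0], 0)
    x = transform(image=image)["image"].unsqueeze(0).cuda()/255.
    probs = torch.nn.functional.softmax(model(x), dim=1)
    pred = torch.argmax(probs, dim=1).item()
print('Predicted:', pred, 'Actual:', testLabels[0])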