Convolutional Neural Networks (CNNs) have become the standard for tasks involving image recognition because of their ability to automatically detect important features in visual data. They are widely used in applications such as object detection, facial recognition, and self-driving cars.
Starting with the most fundamental task of digit recognition using the MNIST dataset, which is considered the "Hello World" of computer vision, this project aims to build a strong understanding of CNNs in preparation for more complex image recognition tasks in the future.
Achieved 84th place on Kaggle!
Rundown
Import Datasets
The datasets are downloaded from the Digit Recognizer competition on Kaggle.
```python
import pandas as pd

# train.csv: a label column plus 784 pixel columns per row;
# test.csv: the 784 pixel columns only.
train = pd.read_csv('digit-recognizer/train.csv')
test = pd.read_csv('digit-recognizer/test.csv')
```
Dataset and Data Loader
The Dataset and DataLoader classes encapsulate the process of pulling your data from storage and exposing it to your training loop in batches.
The Dataset is responsible for accessing and processing single instances of data.
The DataLoader pulls instances of data from the Dataset (either automatically or with a sampler that you define), collects them in batches, and returns them for consumption by your training loop. The DataLoader works with all kinds of datasets, regardless of the type of data they contain.
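As a toy illustration of that pipeline (not from this project's code), tensors can be wrapped in a `TensorDataset` and batched with a `DataLoader`:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# 100 fake 28x28 grayscale images with random labels, served in batches of 32
toy_set = TensorDataset(torch.randn(100, 1, 28, 28), torch.randint(0, 10, (100,)))
toy_loader = DataLoader(toy_set, batch_size=32, shuffle=True)

for images, labels in toy_loader:
    print(images.shape)  # torch.Size([32, 1, 28, 28])
    break
```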
Define two Dataset classes along with their transformations and normalization.

Note that the data transformation and augmentation implemented here do not permanently add new samples to the dataset. Instead, the transformations are applied on the fly during training: each time a batch is requested from the DataLoader, the transformations are applied to that batch dynamically, so they take up no additional memory or storage.
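Below is a minimal sketch of such a pair of classes. The class names, the small random rotation used for augmentation, and the standard MNIST normalization constants (0.1307, 0.3081) are illustrative assumptions rather than the exact choices used here:

```python
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

# Augment only the training data; both pipelines share the normalization.
training_transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomRotation(10),               # augmentation choice assumed
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),  # common MNIST mean/std
])
testing_transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])

class TrainingDataset(Dataset):
    def __init__(self, df, transform=None):
        self.labels = df['label'].values
        self.features = df.drop(columns=['label']).values.astype('uint8')
        self.transform = transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        image = self.features[idx].reshape(28, 28)
        if self.transform:
            image = self.transform(image)
        return image, self.labels[idx]

class TestingDataset(Dataset):
    def __init__(self, df, transform=None):
        self.features = df.values.astype('uint8')
        self.transform = transform

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        image = self.features[idx].reshape(28, 28)
        if self.transform:
            image = self.transform(image)
        return image

training_set = TrainingDataset(train, transform=training_transform)
testing_set = TestingDataset(test, transform=testing_transform)
testing_loader = DataLoader(testing_set, batch_size=64, shuffle=False)  # batch size assumed
```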
Visualization
Visualize the data as a sanity check.
```python
import matplotlib.pyplot as plt

# Plot the first 30 digits with their labels in a 3x10 grid
plt.figure(figsize=(15, 6))
for i in range(30):
    plt.subplot(3, 10, i+1)
    plt.imshow(training_set.features[i].reshape((28, 28)), cmap=plt.cm.binary)
    plt.title(training_set.labels[i])
    plt.axis('off')
plt.subplots_adjust(wspace=-0.1, hspace=-0.1)
plt.show()
```
Below is a function that resets the model's weights so that each KFold split trains from scratch.
```python
def reset_weights(m):
    # Re-initialize every child layer that supports reset_parameters()
    for layer in m.children():
        if hasattr(layer, 'reset_parameters'):
            layer.reset_parameters()
```
Structure
KFold cross-validation from scikit-learn is used to guard against overfitting. The training set is shuffled and split into 5 folds; in each round, 4 folds (80% of the data) train the model and the remaining fold (20%) validates it, as sketched below.
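A minimal sketch of how the folds could be wired up with `SubsetRandomSampler`; the batch size and random seed are assumptions, and `model` refers to the network instantiated in the training setup that follows:

```python
import numpy as np
from sklearn.model_selection import KFold
from torch.utils.data import DataLoader, SubsetRandomSampler

kfold = KFold(n_splits=5, shuffle=True, random_state=42)  # seed assumed
for fold, (train_idx, val_idx) in enumerate(kfold.split(np.arange(len(training_set)))):
    # Separate loaders draw the 80% training and 20% validation splits
    # from the same underlying Dataset.
    training_loader = DataLoader(training_set, batch_size=64,
                                 sampler=SubsetRandomSampler(train_idx))
    validation_loader = DataLoader(training_set, batch_size=64,
                                   sampler=SubsetRandomSampler(val_idx))
    model.apply(reset_weights)  # fresh weights for every fold
    # ... train, validate, and save model_mnist_{fold+1}.pt ...
```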
Training
Below are the parameters and variables for the training.
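As a sketch of that setup, the loss function, optimizer, learning rate, and epoch count shown here are assumptions; only the per-fold loss lists are implied by the plotting code that follows:

```python
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Net().to(device)            # Net is the CNN defined for this project
criterion = nn.CrossEntropyLoss()   # standard choice for 10-class digits
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr assumed
epochs = 30                         # epoch count assumed

training_losses_list = []    # one list of per-epoch losses for each fold
validation_losses_list = []  # likewise for the validation splits
```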
Once all folds have run, the per-fold loss curves are averaged element-wise and plotted on the figure's final panel:

```python
training_losses_average = [sum(x)/len(x) for x in zip(*training_losses_list)]
validation_losses_average = [sum(x)/len(x) for x in zip(*validation_losses_list)]
ax6.plot(training_losses_average, label='training_loss')
ax6.plot(validation_losses_average, label='validation_loss')
ax6.set_title('Average')
ax6.legend()
plt.tight_layout()
```
Ensemble
I ensemble the 5 models saved from training, averaging their outputs with equal weight (0.2 each), to make the final predictions on the testing set.
```python
import numpy as np

models = []
for fold in range(5):
    model = Net().to(device)
    model.load_state_dict(torch.load(f'model_mnist_{fold+1}.pt', weights_only=False))
    model.eval()  # inference mode: freezes dropout and batch-norm statistics
    models.append(model)

# Average the five models' outputs with equal weights (0.2 each).
pred = np.zeros((28000, 10))
with torch.no_grad():
    for model in models:
        start = 0  # offset of the current batch within the 28,000 test rows
        for features in testing_loader:
            features = features.to(device)
            outputs = model(features)
            pred[start:start + len(outputs)] += 0.2 * outputs.cpu().numpy()
            start += len(outputs)
```
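The averaged scores can then be turned into a submission by taking the argmax of each row; a sketch, using the competition's required `ImageId,Label` columns (the output file name is an assumption):

```python
submission = pd.DataFrame({
    'ImageId': np.arange(1, len(pred) + 1),  # Kaggle's IDs are 1-based
    'Label': pred.argmax(axis=1),            # highest averaged score wins
})
submission.to_csv('submission.csv', index=False)
```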