This project implements image classification models for two datasets from the MedMNIST collection: BloodMNIST and BreastMNIST. Classification is performed with two different methodologies: Convolutional Neural Networks (CNN) and Vision Transformers (ViT). The project explores how effective these models are on small datasets, applying techniques tailored specifically to small-data training.
- BloodMNIST: A dataset for classifying blood cell images.
- BreastMNIST: A dataset for classifying breast ultrasound images.
Both datasets are part of the MedMNIST project, which offers a collection of common medical image classification datasets.
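MedMNIST distributes each dataset as a single `.npz` archive with `train_images`/`train_labels` (and matching `val_`/`test_`) arrays. The sketch below shows one way such a file can be loaded and normalized with numpy; the file written here is a small synthetic stand-in with made-up shapes, not a real MedMNIST download.

```python
import numpy as np

# Synthetic stand-in for a real bloodmnist.npz download (hypothetical
# sizes; real MedMNIST files use the same key layout but more samples).
rng = np.random.default_rng(0)
np.savez(
    "bloodmnist_demo.npz",
    train_images=rng.integers(0, 256, size=(100, 28, 28, 3), dtype=np.uint8),
    train_labels=rng.integers(0, 8, size=(100, 1), dtype=np.uint8),
)

data = np.load("bloodmnist_demo.npz")
x = data["train_images"].astype(np.float32) / 255.0  # scale pixels to [0, 1]
y = data["train_labels"].ravel()                      # flatten to class indices
print(x.shape, y.shape)
```

The same pattern applies to BreastMNIST by swapping the file name; the `medmnist` Python package can also download and wrap these files directly.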
This project employs two primary methodologies for model training:
- Convolutional Neural Networks (CNN): Implemented as the baseline model for classifying images from both datasets.
- Vision Transformers (ViT): Two variants are tested:
  - Vanilla ViT
  - Shifted Patch Tokenization with Locality Self-Attention (LSA): Techniques designed to improve ViT classification performance on small datasets.
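The two small-dataset techniques above can be sketched in a few lines of numpy. This is an illustrative sketch, not the repository's implementation: the function names are ours, and `np.roll` wraps pixels around the border, a simplification of the paper's zero-padded crop-shift.

```python
import numpy as np

def shifted_patch_tokenize(img, patch=7):
    """Shifted Patch Tokenization sketch: shift the image by half a
    patch in the four diagonal directions, concatenate the shifts with
    the original along the channel axis, then cut into flat patch
    tokens. The richer tokens give each patch more spatial context."""
    H, W, C = img.shape
    s = patch // 2
    shifts = [(-s, -s), (-s, s), (s, -s), (s, s)]
    stacked = np.concatenate(
        [img] + [np.roll(img, sh, axis=(0, 1)) for sh in shifts], axis=-1
    )  # (H, W, 5C)
    ph, pw = H // patch, W // patch
    tokens = stacked[: ph * patch, : pw * patch]
    tokens = tokens.reshape(ph, patch, pw, patch, 5 * C)
    return tokens.transpose(0, 2, 1, 3, 4).reshape(ph * pw, patch * patch * 5 * C)

def locality_self_attention(q, k, v, temperature):
    """Locality Self-Attention sketch: a learnable temperature replaces
    the fixed 1/sqrt(d) scale, and the diagonal of the attention logits
    is masked so no token attends to itself, sharpening attention on
    the other tokens."""
    logits = q @ k.T / temperature
    np.fill_diagonal(logits, -np.inf)              # diagonal masking
    logits -= logits.max(axis=-1, keepdims=True)   # numerically stable softmax
    w = np.exp(logits)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

# Demo on a BloodMNIST-sized 28x28 RGB image (random data).
rng = np.random.default_rng(0)
img = rng.random((28, 28, 3)).astype(np.float32)
tokens = shifted_patch_tokenize(img, patch=7)   # 16 tokens, dim 7*7*15
d = 32
q, k, v = (rng.normal(size=(16, d)) for _ in range(3))
out = locality_self_attention(q, k, v, temperature=np.sqrt(d))
print(tokens.shape, out.shape)
```

In the full model, the flat tokens would be linearly projected to the embedding dimension and the temperature would be a trained parameter rather than the fixed `sqrt(d)` used here.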
- The results show that Vision Transformers outperform the CNN baseline, particularly on the BreastMNIST dataset.
- Shifted Patch Tokenization with LSA yielded the best performance of all models tested on these small datasets.
Contributions are welcome! Feel free to open issues or submit pull requests.
This project is licensed under the MIT License. See the LICENSE file for more details.
- MedMNIST for providing the datasets.