Binary classification with Machine Learning: Neural Networks for classifying Chihuahuas and Muffins

[Article by Elia Togni]

Introduction to Image Classification

Image Classification is one of the most fundamental and widely studied tasks in machine learning. It refers to the ability to understand an image as a whole and assign it to a specific category.

In image classification, the goal is to predict the class to which an input image belongs. While humans excel at this task, mimicking human perception is challenging for machines. Traditional computer vision techniques rely on local descriptors (algorithms or methods that capture information about the local appearance or features of specific regions in an image) to find similarities between images. However, advances in technology have shifted the focus toward Deep Learning, which automatically extracts representative features and patterns from images.

One of the most prominent tools in modern image classification is the Convolutional Neural Network (CNN), a specialized class of neural networks designed for visual data analysis. CNNs are characterized by their ability to apply the convolution operation to extract features from small portions of input images, making them ideal for tasks such as image recognition, object detection, and more.
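
To make the convolution operation concrete, here is a minimal sketch (the 3x3 vertical-edge kernel is an illustrative choice, not something from the article) that slides a filter over a toy grayscale image with TensorFlow:

import numpy as np
import tensorflow as tf

# A toy 8x8 grayscale "image": left half dark, right half bright.
image = np.zeros((8, 8), dtype=np.float32)
image[:, 4:] = 1.0

# A 3x3 vertical-edge kernel (illustrative choice, not from the article).
kernel = np.array([[-1., 0., 1.],
                   [-1., 0., 1.],
                   [-1., 0., 1.]], dtype=np.float32)

# conv2d expects NHWC input and HWIO filters.
x = image.reshape(1, 8, 8, 1)
w = kernel.reshape(3, 3, 1, 1)

feature_map = tf.nn.conv2d(x, w, strides=1, padding="VALID")
print(feature_map[0, :, :, 0].numpy())  # strong response along the vertical edge

A CNN learns many such kernels from data instead of hand-crafting them, which is what replaces the local descriptors of traditional computer vision.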

Chihuahuas vs. Muffins

This article explores the challenge of binary classification, which involves predicting one of two possible outcomes. Specifically, the task is to classify images as either chihuahuas or muffins. While this might seem trivial for humans, it poses a unique challenge for machine learning models due to the visual similarities between the two categories.

We compare the performance of a Multi-Layer Neural Network (MLNN) and a Convolutional Neural Network (CNN), focusing on their suitability for image classification and their ability to generalize well to unseen data.

Dataset Overview

This project is based on the Chihuahua vs. Muffin dataset from Kaggle. The dataset consists of two folders, one for training and one for testing, each containing images of both classes. After combining and inspecting these folders, the dataset was found to include (a short snippet for reproducing these counts follows the list):

  • 3199 chihuahua images
  • 2718 muffin images
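
A few lines of Python suffice to reproduce the counts, assuming the Kaggle download is laid out as train/ and test/ directories, each with chihuahua/ and muffin/ subfolders (the exact layout and file extensions may differ):

from pathlib import Path

# Hypothetical layout; adjust the root and subfolder names to the actual download.
root = Path("chihuahua-muffin")

for label in ("chihuahua", "muffin"):
    # Combine the train and test splits before counting, as done in the article.
    n = sum(1 for split in ("train", "test")
            for _ in (root / split / label).glob("*.jpg"))
    print(label, n)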

Data Cleaning

A manual inspection of the dataset revealed several issues, such as mislabeled or irrelevant images. Each image was classified into one of three groups:

  • Correctly labeled images
  • Incorrectly labeled images, such as muffins in the chihuahua folder
  • Unrelated images, such as drawings or entirely unrelated objects

Some examples of images removed include:

Chihuahua Visual Inspection

Muffin Visual Inspection

After cleaning, the dataset was restructured to ensure only relevant images for each class were included. Images containing both chihuahuas and muffins or completely unrelated content were discarded.

Addressing Data Imbalance

Classifiers perform best when the dataset is balanced. To assess balance, the Imbalance Ratio was calculated:

Imbalance Ratio: $\rho = \frac{\text{Number of Instances in Majority Class}}{\text{Number of Instances in Minority Class}}$

For this dataset, $\rho = 1.22$, which is sufficiently close to 1, indicating no significant imbalance.
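
As a quick sanity check, the ratio is just a division; on the raw (pre-cleaning) counts above it comes out slightly lower, so the reported $\rho = 1.22$ presumably reflects the dataset after cleaning:

def imbalance_ratio(n_majority: int, n_minority: int) -> float:
    return n_majority / n_minority

# Raw counts from the dataset overview (before cleaning).
print(round(imbalance_ratio(3199, 2718), 2))  # 1.18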

Preprocessing

Preprocessing Pipeline

The following steps were applied to preprocess the images (a minimal code sketch follows the list):

  1. Image Resizing: All images were resized to $128 \times 128$ to ensure uniform input dimensions.
  2. Cropping and Padding: Large white borders were removed to help the model focus on relevant features. Zero-padding was used to maintain aspect ratios.
  3. Normalization: Pixel values were scaled to $[0, 1]$ to improve convergence.
  4. Data Augmentation: Techniques such as rotation, flipping, and brightness adjustments were used to increase dataset variability and reduce overfitting.
  5. Segmentation: Simple Linear Iterative Clustering (SLIC) was employed to isolate key regions in the image, removing irrelevant details.
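
A minimal sketch of steps 1-4 using TensorFlow/Keras utilities; the augmentation ranges are illustrative assumptions rather than values stated in the article, white-border cropping is omitted, and step 5 would rely on scikit-image's skimage.segmentation.slic rather than Keras:

import tensorflow as tf
from tensorflow.keras import layers

IMG_SIZE = 128  # target size from step 1

def resize_with_padding(image):
    # Steps 1-2: zero-pad to a square (preserving the aspect ratio),
    # then resize to 128x128. White-border cropping is omitted here.
    side = tf.maximum(tf.shape(image)[0], tf.shape(image)[1])
    image = tf.image.resize_with_crop_or_pad(image, side, side)
    return tf.image.resize(image, (IMG_SIZE, IMG_SIZE))

# Step 3 (scaling to [0, 1]) is handled by the Rescaling layer inside the
# models below. Step 4: augmentation; the ranges are assumed values.
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),      # up to +/- 10% of a full turn
    layers.RandomBrightness(0.2),
])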

Chihuahua Pre and Post Processing

Dataset Variants

From the preprocessing steps, three dataset variants were created:

  1. Original RGB Images
  2. Augmented RGB Images
  3. Segmented RGB Images

Models

Multi-Layer Neural Network (MLNN)

The MLNN model consists of fully connected dense layers, batch normalization, dropout, and activation functions. It was configured to test various architectures:

Code Implementation

from tensorflow.keras import layers, models, Input
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import BinaryAccuracy, Precision, Recall, AUC

def MLNN_model(input_size, num_classes, hidden_layer_units, hidden_activation,
               output_activation, dropout_perc, n_channels, loss, learning_rate=0.001):
    """
    Builds a Multi-Layer Neural Network (MLNN).
    Note: num_classes is unused; binary classification uses a single output unit.
    """
    # Flatten the image into a vector and scale pixel values to [0, 1].
    input_layer = Input(shape=(input_size[0], input_size[1], n_channels))
    flatten_layer = layers.Flatten()(input_layer)
    hidden_layer = layers.Rescaling(1./255)(flatten_layer)

    # Pre-activation blocks: batch norm -> activation -> dropout -> dense.
    for units in hidden_layer_units:
        hidden_layer = layers.BatchNormalization()(hidden_layer)
        hidden_layer = layers.Activation(hidden_activation)(hidden_layer)
        hidden_layer = layers.Dropout(dropout_perc)(hidden_layer)
        hidden_layer = layers.Dense(units, activation=hidden_activation)(hidden_layer)

    # Single sigmoid unit for the binary output.
    output_layer = layers.Dense(1, activation=output_activation)(hidden_layer)
    model = models.Model(inputs=input_layer, outputs=output_layer)
    model.compile(optimizer=Adam(learning_rate=learning_rate), loss=loss,
                  metrics=[BinaryAccuracy(), Precision(), Recall(), AUC()])

    return model
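For reference, the 512_256_128 variant reported in the results can be instantiated as follows (only the layer sizes come from the results table; the dropout percentage and activations are assumptions):

model = MLNN_model(
    input_size=(128, 128), num_classes=2,
    hidden_layer_units=[512, 256, 128],  # the 512_256_128 variant
    hidden_activation='relu', output_activation='sigmoid',
    dropout_perc=0.2,                    # assumed value
    n_channels=3, loss='binary_crossentropy',
)
model.summary()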

MLNN Training Summary

MLNN Summary

Convolutional Neural Network (CNN)

CNNs are designed to handle spatial data and are more effective for image classification. This baseline CNN includes convolutional layers, pooling layers, and a dense layer for classification.

Code Implementation

from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import BinaryAccuracy, Precision, Recall, AUC

def CNN_model(input_size=(128, 128), n_channels=3, conv_filters=[32, 64, 128], kernel_size=(3, 3),
              pool_size=(2, 2), dense_units=128, output_activation='sigmoid', loss='binary_crossentropy',
              learning_rate=0.001):
    """
    Builds a Convolutional Neural Network (CNN).
    """
    model = models.Sequential()
    # Scale pixel values to [0, 1].
    model.add(layers.Rescaling(1./255, input_shape=(input_size[0], input_size[1], n_channels)))

    # Stacked convolution + max-pooling blocks extract progressively
    # higher-level features while shrinking the spatial dimensions.
    for filters in conv_filters:
        model.add(layers.Conv2D(filters, kernel_size, activation='relu'))
        model.add(layers.MaxPooling2D(pool_size))

    # Flatten the feature maps and classify with a single sigmoid unit.
    model.add(layers.Flatten())
    model.add(layers.Dense(dense_units, activation='relu'))
    model.add(layers.Dense(1, activation=output_activation))
    model.compile(optimizer=Adam(learning_rate=learning_rate), loss=loss,
                  metrics=[BinaryAccuracy(), Precision(), Recall(), AUC()])

    return model
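The article does not show the training call; a minimal sketch using tf.keras.utils.image_dataset_from_directory (the directory name, batch size, validation split, and epoch count are all assumptions) might look like:

import tensorflow as tf

# Hypothetical layout: data/chihuahua and data/muffin.
train_ds, val_ds = tf.keras.utils.image_dataset_from_directory(
    "data", image_size=(128, 128), batch_size=32, label_mode="binary",
    validation_split=0.2, subset="both", seed=42,
)

model = CNN_model()
history = model.fit(train_ds, validation_data=val_ds, epochs=20)  # epochs assumed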

CNN Training Summary

CNN Summary

TogNet

TogNet is a custom CNN designed to address overfitting. Dropout layers were added to improve generalization.

Code Implementation

from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import BinaryAccuracy, Precision, Recall, AUC

def TogNet_model(input_size=(128, 128), n_channels=3, conv_filters=[32, 64, 64], kernel_size=(3, 3),
                 pool_size=(2, 2), dense_units=128, hidden_activation='relu', output_activation='sigmoid',
                 loss='binary_crossentropy', learning_rate=0.001, dropout_rate=0.2):
    """
    Builds the TogNet CNN.
    """
    model = models.Sequential()
    # Scale pixel values to [0, 1].
    model.add(layers.Rescaling(1./255, input_shape=(input_size[0], input_size[1], n_channels)))

    # Convolution + pooling blocks, each followed by dropout to curb overfitting.
    for filters in conv_filters:
        model.add(layers.Conv2D(filters, kernel_size, activation=hidden_activation))
        model.add(layers.MaxPooling2D(pool_size))
        model.add(layers.Dropout(dropout_rate))

    # Dense head, again regularized with dropout before the sigmoid output.
    model.add(layers.Flatten())
    model.add(layers.Dense(dense_units, activation=hidden_activation))
    model.add(layers.Dropout(dropout_rate))
    model.add(layers.Dense(1, activation=output_activation))

    model.compile(optimizer=Adam(learning_rate=learning_rate), loss=loss,
                  metrics=[BinaryAccuracy(), Precision(), Recall(), AUC()])

    return model

TogNet Training Summary

TogNet Summary

Results

Performance Metrics

Model              Dataset    Binary Accuracy  Precision  Recall  AUC
MLNN 512_256_128   RGB        80.38%           78.02%     80.08%  88.01%
MLNN 512_256_128   Augmented  82.92%           89.06%     80.95%  91.54%
CNN 32_64_128      RGB        92.88%           91.73%     95.48%  97.01%
CNN 32_64_128      Augmented  93.61%           94.18%     92.27%  96.60%
TogNet             RGB        92.88%           92.45%     94.36%  97.37%
TogNet             Augmented  91.53%           89.40%     92.86%  97.03%

Comparative Visualizations

Loss and Accuracy Trends for TogNet

Metrics Comparisons for MLNN Variants

Conclusion

This study demonstrates that CNNs, particularly the custom TogNet architecture, outperform MLNNs in binary image classification. Data augmentation improved model performance, while segmentation showed mixed results. Future work will explore advanced architectures and larger datasets to further enhance classification accuracy.

All the code is available on GitHub.
