CIFAR-10 Image Diffusion Model

Deep Learning Image Generation

Project Overview

This project implements a diffusion model for image generation, trained on the CIFAR-10 dataset. Diffusion models are a class of generative models that learn to gradually denoise random noise into coherent images, and they have become the foundation for state-of-the-art image generation systems such as DALL-E and Stable Diffusion.

The model was developed as the final project for my Machine Learning class at UNC Charlotte. It is based on Hugging Face's Diffusion Course and demonstrates the core concepts of diffusion-based image generation on a smaller scale, generating 32x32 pixel images across 10 object categories (airplanes, cars, birds, cats, dogs, etc.).

Courses I followed during the development of this project:

  • Hugging Face Diffusion Models Course
  • NVIDIA Generative AI with Diffusion Models

How Diffusion Models Work

Diffusion models work through a two-phase process:

• Forward Diffusion - Gradually add noise to training images over many steps until they become pure random noise
• Reverse Diffusion - Train a neural network to reverse this process, learning to denoise images step by step

Once trained, the model can start with random noise and iteratively denoise it to generate new, realistic images that resemble the training data.
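The forward process above has a convenient closed form: a noisy image at any step t can be sampled directly from the clean image, without looping through all intermediate steps. A minimal PyTorch sketch, assuming the standard DDPM linear beta schedule (the schedule values and step count here are illustrative, not the project's exact settings):

```python
import torch

# Linear beta schedule: the noise variance added at each of T steps
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # cumulative product, "alpha-bar"

def forward_diffuse(x0, t, noise=None):
    """Sample x_t ~ q(x_t | x_0) in one shot:
    x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, eps ~ N(0, I)."""
    if noise is None:
        noise = torch.randn_like(x0)
    ab = alpha_bars[t].view(-1, 1, 1, 1)  # broadcast over (C, H, W)
    return ab.sqrt() * x0 + (1 - ab).sqrt() * noise

# A batch of 4 stand-in 32x32 RGB "images"
x0 = torch.randn(4, 3, 32, 32)
t = torch.randint(0, T, (4,))   # a random timestep per image
xt = forward_diffuse(x0, t)     # same shape as x0, progressively noisier with t
```

During training, the network sees `xt` and `t` and is optimized to predict the `noise` that was mixed in; at large t, `alpha_bars[t]` is close to zero, so `xt` is almost pure noise.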

Technical Implementation

Model Architecture

• UNet Architecture - Four down-sampling and up-sampling blocks with skip connections
• Framework - PyTorch for model implementation and training
• Image Size - 32x32 pixels (CIFAR-10 resolution)
• Training - 200 epochs on the full CIFAR-10 dataset

Dataset

The CIFAR-10 dataset consists of 60,000 32x32 color images across 10 classes:

• Airplane, Automobile, Bird, Cat, Deer, Dog, Frog, Horse, Ship, Truck

Results & Observations

The generated images are of mixed quality but are generally recognizable as CIFAR-10 images. Interestingly, the "automobile" and "truck" classes perform best, likely because these objects have a more rigid, predictable structure than organic subjects such as animals.

Future Improvements

• Experiment with larger, higher-resolution datasets for better image quality
• Integrate CLIP for text-to-image generation capabilities

Technologies Used

Python, PyTorch, Diffusion Models, UNet, Gradio, Hugging Face, CIFAR-10