CIFAR-10 Image Diffusion Model

Deep Learning Image Generation

Project Overview

This project implements a diffusion model for image generation, trained on the CIFAR-10 dataset. Diffusion models are a class of generative models that learn to gradually denoise random noise into coherent images, and they have become the foundation for state-of-the-art image generation systems such as DALL-E and Stable Diffusion.

The model was developed as the final project for my Machine Learning class at UNC Charlotte. It is based on Hugging Face's Diffusion Course and demonstrates the core concepts of diffusion-based image generation on a smaller scale, generating 32x32 pixel images across 10 object categories (airplanes, cars, birds, cats, dogs, etc.).

Courses I followed during the development of this project:

  • Hugging Face Diffusion Models Course
  • NVIDIA Generative AI with Diffusion Models

How Diffusion Models Work

Diffusion models work through a two-phase process:

• Forward Diffusion - Gradually add noise to training images over many steps until they become pure random noise
• Reverse Diffusion - Train a neural network to reverse this process, learning to denoise images step by step

Once trained, the model can start with random noise and iteratively denoise it to generate new, realistic images that resemble the training data.
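The forward process above has a convenient closed form: a noisy image at any step t can be sampled directly from the clean image, without looping through all intermediate steps. A minimal PyTorch sketch, assuming the standard DDPM linear beta schedule (the schedule values and step count here are illustrative, not the project's exact settings):

```python
import torch

# Linear beta schedule: the noise variance added at each of T steps
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # cumulative product, "alpha-bar"

def forward_diffuse(x0, t, noise=None):
    """Sample x_t ~ q(x_t | x_0) in one shot:
    x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, eps ~ N(0, I)."""
    if noise is None:
        noise = torch.randn_like(x0)
    ab = alpha_bars[t].view(-1, 1, 1, 1)  # broadcast over (C, H, W)
    return ab.sqrt() * x0 + (1 - ab).sqrt() * noise

# A batch of 4 stand-in 32x32 RGB "images"
x0 = torch.randn(4, 3, 32, 32)
t = torch.randint(0, T, (4,))   # a random timestep per image
xt = forward_diffuse(x0, t)     # same shape as x0, progressively noisier with t
```

During training, the network sees `xt` and `t` and is optimized to predict the `noise` that was mixed in; at large t, `alpha_bars[t]` is close to zero, so `xt` is almost pure noise.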

Technical Implementation

Model Architecture

• UNet Architecture - Four down-sampling and up-sampling blocks with skip connections
• Framework - PyTorch for model implementation and training
• Image Size - 32x32 pixels (CIFAR-10 resolution)
• Training - 200 epochs on the full CIFAR-10 dataset

Dataset

The CIFAR-10 dataset consists of 60,000 32x32 color images across 10 classes:

• Airplane, Automobile, Bird, Cat, Deer, Dog, Frog, Horse, Ship, Truck

Results & Observations

The generated images are of mixed quality but are generally recognizable as CIFAR-10 images. Interestingly, the "automobile" and "truck" classes perform best, likely because these objects have a more rigid, predictable structure than organic subjects such as animals.

Future Improvements

• Experiment with larger, higher-resolution datasets for better image quality
• Integrate CLIP for text-to-image generation capabilities

Technologies Used

Python, PyTorch, Diffusion Models, UNet, Gradio, Hugging Face, CIFAR-10