Data First, ML Later: Create Killer Image Datasets with This Guide!

Creating an effective embedded computer vision model starts with understanding the specific business use case you are trying to address. Whether it’s automating quality inspection in a factory, detecting emotions in pets, or monitoring environmental conditions, defining a clear business goal is crucial before diving into dataset creation.

Once the business need is established, building a high-quality dataset becomes the foundation for success. While pre-made datasets exist, collecting your own allows you to tailor the dataset specifically to your unique requirements, ensuring that the model is highly effective for your application. In this guide, we’ll show you how to gather and prepare images using embedded-friendly tools like OpenMV cameras, Raspberry Pi, or even a smartphone.

1. Collecting Images

The first step is to brainstorm the different classes you want to classify. Whether you’re classifying emotions, product categories, or other types of objects, defining the classes should be straightforward. Once you have your classes, start collecting images for your dataset. To build a robust image classifier, you’ll need around 50 images per category (or “class”). For example, you could classify pet emotions such as ‘happy’, ‘anxious’, ‘curious’, and ‘sleepy’.
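Once you start collecting, a short script can sanity-check that each class folder is approaching the ~50-image target. This is a minimal sketch; the `dataset/` folder name and one-folder-per-class layout are assumptions matching the pet-emotion example:

```python
from pathlib import Path

# Assumed layout: one folder per class, e.g. dataset/happy/*.png
DATASET_DIR = Path("dataset")
MIN_IMAGES = 50  # rough target per class

def count_images(dataset_dir: Path) -> dict[str, int]:
    """Count image files in each class subfolder."""
    exts = {".png", ".bmp", ".jpg", ".jpeg"}
    return {
        class_dir.name: sum(1 for f in class_dir.iterdir() if f.suffix.lower() in exts)
        for class_dir in sorted(dataset_dir.iterdir())
        if class_dir.is_dir()
    }

if __name__ == "__main__" and DATASET_DIR.is_dir():
    for name, n in count_images(DATASET_DIR).items():
        status = "OK" if n >= MIN_IMAGES else f"need {MIN_IMAGES - n} more"
        print(f"{name}: {n} images ({status})")
```

Running it periodically while you shoot tells you at a glance which classes still need images.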

Make sure that:

  • The images are clear and consistent.
  • The object is centered in the frame.
  • Lighting is even across all images.
  • The background is consistent.

This helps make the training process easier for simpler models. If you’re trying to classify more complex distinctions, like different dog breeds, you’d likely need a larger dataset and a more sophisticated model.

Importance of Background Samples

Adding a “background” class to your dataset is often highly beneficial, especially when building robust classification models. A background class consists of images in which none of the target objects are present. For instance, if you are classifying pet emotions, capture some shots of just the plain background without any pets. These negative examples teach the classifier to distinguish relevant objects from their absence, reducing the likelihood of false positives.

Without a background class, the model is forced to assign an empty frame to one of the available target classes, leading to incorrect results. Training explicitly with background images lets the model recognize when the target object is missing from the frame, improving accuracy and making your system more reliable in real-world scenarios where objects may or may not be present. In short, a well-defined background class ensures the model learns not only what to classify but also when there is nothing to classify.
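For the pet-emotion example, the resulting folder layout (class names here are illustrative) would look like:

```
dataset/
├── happy/
├── anxious/
├── curious/
├── sleepy/
└── background/    <- images with no pet present
```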

Using Tools to Collect Images

OpenMV Camera

To capture images with an OpenMV camera, follow these steps for best results:

  1. Install the OpenMV IDE: Begin by downloading and installing the OpenMV IDE on your computer. This software is essential for interacting with the OpenMV camera and writing scripts to capture images.
  2. Connect the OpenMV Camera: Plug the OpenMV camera into your computer using a USB cable. Make sure that all necessary drivers are installed correctly, as this will allow the IDE to communicate with the camera.
  3. Use a microSD Card for Storage: If you are using an OpenMV H7 model, it’s crucial to use a microSD card for storing images since the internal storage is limited. Make sure the microSD card is formatted with the FAT32 file system for compatibility.
  4. Adjust Camera Focus: Depending on the object size and distance, adjust the focus of the camera by gently rotating the lens. The focus adjustment is critical for obtaining clear images that are essential for training.
  5. Run the ImageCapture Script: Open the ImageCapture script provided in the OpenMV IDE. This script is used to automate the process of taking pictures. You may need to modify the script to crop images to the correct size, such as 96×96 pixels or whatever input size your model requires. Ensure consistency in capturing conditions like lighting, background, and distance to improve model accuracy.
  6. Capture Images: Position the object to be captured in front of the camera. The object should occupy a large portion of the frame, and the background should be consistent. Run the script to capture multiple images of the object, ensuring that all the images are similar in terms of lighting and distance.
  7. Transfer Images: After capturing, remove the microSD card from the camera and insert it into your computer to transfer the images. Organize them into folders corresponding to each class for easy processing later.

Smartphone or Webcam

Using a smartphone or webcam is one of the most accessible methods to collect images for your dataset. Here’s how you can make the most of it:

  1. Capture High-Quality Images: Use your smartphone camera or a high-resolution webcam to take pictures of the objects you want to classify. Modern smartphones often have excellent camera capabilities, which can provide clear and sharp images for training.
  2. Maintain Consistent Conditions: To ensure your dataset is of high quality, maintain consistent conditions while capturing photos. Keep the object centered, use a uniform background, and ensure the lighting is consistent across all images. Natural light can be good, but avoid shadows and glare that could impact the quality.
  3. Use a Tripod or Stabilizer: If possible, use a tripod or stabilizer to keep the camera steady. This will help maintain a consistent distance and angle, which is important for reducing variation in your dataset.
  4. Edit and Resize Images: After capturing the images, use photo editing software (such as GIMP, Photoshop, or even simple online tools) to crop and resize the images to the appropriate size for your neural network. For instance, resize them to 224×224 if using MobileNet or ResNet. Save the images in a lossless format such as PNG or BMP to retain maximum quality.
  5. Organize Images by Class: Create a well-organized folder structure on your computer, with each class having its own folder. This makes the subsequent stages of processing and training the dataset much easier.
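When cropping and resizing in step 4, cropping to a centered square first avoids distorting the object's aspect ratio. The helper below is a sketch of that geometry; the Pillow call shown in the comment is one way you might apply it, and the function name is our own:

```python
def center_square_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)

# With Pillow (not included here), it could be applied as:
#   img.crop(center_square_box(*img.size)).resize((224, 224)).save("out.png")

print(center_square_box(4032, 3024))  # a typical 12 MP smartphone photo
# → (504, 0, 3528, 3024)
```

Batch-applying the same crop-then-resize to every image keeps framing consistent across the whole dataset.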

Raspberry Pi

If you have a Raspberry Pi with a camera module, it can be an excellent option for capturing images. Follow these steps for using Raspberry Pi effectively:

  1. Set Up the Raspberry Pi: Begin by installing the latest version of Raspberry Pi OS (Raspbian) on your microSD card. Insert the microSD card into your Raspberry Pi, and boot it up. Make sure your Raspberry Pi is connected to a power source and monitor for easy setup.
  2. Enable the Camera Module: Connect the official Raspberry Pi Camera Module to the Raspberry Pi. Run sudo raspi-config from the terminal, go to ‘Interfacing Options’, and enable the camera. You may need to reboot the Raspberry Pi to apply these changes.
  3. Install Required Packages: Install Python 3 and the PiCamera library, which allows you to interact with the camera module using Python scripts. Use the command sudo apt-get install python3-picamera to install the required library.
  4. Preview the Camera Feed: Write a Python script to preview the camera feed and adjust the positioning of the object. This will help you ensure that the images you capture are clear and properly framed.
  5. Capture Images: Write a Python script to capture the images you need. Make sure the images are named sequentially for easy identification.
  6. Resize and Format: Once images are captured, you can use Python scripts or image processing tools to resize the images to the appropriate dimensions for your model, such as 96×96 or 224×224 pixels. You can also use tools like OpenCV in Python to automate this step.
  7. Organize and Transfer Images: After capturing the required number of images, organize them by class in separate folders. You can transfer these images to your primary computer using SCP (Secure Copy Protocol) or a USB drive for further processing and training.
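Steps 4 and 5 can be combined into one short PiCamera script. This is a hedged sketch: the resolution, paths, and timing are assumptions, it runs only on the Pi itself, and on recent Raspberry Pi OS releases the newer Picamera2 library replaces the legacy PiCamera interface shown here:

```python
# Runs on the Raspberry Pi with the camera module attached.
from time import sleep
from picamera import PiCamera

camera = PiCamera()
camera.resolution = (640, 480)

camera.start_preview()   # check framing on an attached display
sleep(5)                 # give the sensor time to adjust exposure

for i in range(50):      # ~50 images per class, named sequentially
    camera.capture('/home/pi/dataset/happy/img_%03d.jpg' % i)
    sleep(2)             # reposition the subject between shots

camera.stop_preview()
camera.close()
```

Afterwards, a command like `scp -r pi@<pi-address>:~/dataset ./` copies the class folders to your main computer for step 7.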

Be Patient and Iterate

Collecting your dataset will take time and patience, but it is a vital step in achieving high-quality results. Consistency in image collection (uniform lighting, object position, and background) helps build a more reliable model.

However, if your goal is a more versatile dataset that can handle varied conditions, such as different lighting scenarios, object positions, and backgrounds, you will need a significantly larger dataset. This means collecting thousands of images for each class, which in turn may require a more complex model capable of generalizing across diverse conditions. For instance, in pet emotion classification, capturing emotions like ‘happy’ or ‘anxious’ under different lighting, angles, and environments will enhance the robustness of your model, but it also necessitates more data and computational resources. The larger and more varied the dataset, the better your model will perform in real-world applications, adapting to different scenarios and achieving greater accuracy.

2. Formatting Your Dataset

After collecting the images, you’ll need to prepare them. The essential steps are:

  • Resize each image to the input size required by your neural network model; different architectures expect different sizes. Small embedded models often use 96×96 pixels, while larger architectures require more. For example:
      • MobileNet: 224×224
      • ResNet: 224×224
      • VGG: 224×224
      • Inception: 299×299
      • EfficientNet: varies by version, e.g., EfficientNet-B0 uses 224×224
  • Save the images as BMP or PNG. BMP is great for examining raw data, while PNG is compressed but still lossless, making it well suited for storage and model training.

Selecting the correct input size ensures the images are compatible with the chosen model, leading to better training results.

  • Convert to Grayscale if needed, especially if the chosen model architecture accepts grayscale images. This allows you to use a simpler model, as grayscale models are lighter compared to RGB models.

The key is to keep the photos as consistent as possible. Stick with the same background, distance, and lighting conditions to ensure quality data for training.
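The grayscale savings mentioned above are easy to quantify: a raw 8-bit input buffer needs width × height × channels bytes. A quick calculation (the function name is our own) shows why grayscale at small resolutions suits constrained devices:

```python
def input_buffer_bytes(width: int, height: int, channels: int) -> int:
    """Size in bytes of a raw 8-bit-per-channel input tensor."""
    return width * height * channels

for name, (w, h, c) in {
    "96x96 grayscale": (96, 96, 1),
    "96x96 RGB": (96, 96, 3),
    "224x224 RGB (MobileNet)": (224, 224, 3),
}.items():
    kib = input_buffer_bytes(w, h, c) / 1024
    print(f"{name}: {kib:.1f} KiB")
# → 96x96 grayscale: 9.0 KiB
# → 96x96 RGB: 27.0 KiB
# → 224x224 RGB (MobileNet): 147.0 KiB
```

A 96×96 grayscale input is a third the size of its RGB counterpart and a small fraction of a 224×224 RGB tensor, which matters when the model must fit in a microcontroller's RAM.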

3. Pre-made Datasets

If you want to start experimenting before collecting your own data, you can use pre-made datasets. Pre-made datasets are incredibly useful for gaining experience in building machine learning models without the initial time investment required for collecting and preparing your own dataset. They also provide a good benchmark to evaluate your model’s performance compared to established standards.

These datasets cover a wide range of topics, from general object classification to more specialized areas like facial emotion recognition, animals, and street view imagery. Leveraging these datasets allows you to quickly get started with training, testing, and understanding the data pipeline for machine learning projects.

Below is a list of top websites where you can find ready-to-use image classification datasets:

  1. Kaggle Datasets
  2. Google Dataset Search
  3. Open Images Dataset
  4. UCI Machine Learning Repository
  5. VisualData
  6. Academic Torrents
  7. Data.gov

About the author

Sophia Bennett is an art historian and freelance writer with a passion for exploring the intersections between nature, symbolism, and artistic expression. With a background in Renaissance and modern art, Sophia enjoys uncovering the hidden meanings behind iconic works and sharing her insights with art lovers of all levels. When she’s not visiting museums or researching the latest trends in contemporary art, you can find her hiking in the countryside, always chasing the next rainbow.