Journey into Visual AI: Exploring FiftyOne Together — Part III Preparing a Computer Vision Challenge.

Published at
1/6/2025
Categories
computervision
ai
machinelearning
datascience
Author
jguerrero-voxel51
Author: Paula Ramos (Senior DevRel and Applied AI Research Advocate at Voxel51)

This blog is part of the series “Journey into Visual AI: Exploring FiftyOne Together,” in which I want to bring my experience using FiftyOne in multiple stages. Don’t miss the previous blogs here:

Blog 1: Journey into Visual AI: Exploring FiftyOne Together — Part I Introduction.

Blog 2: Journey into Visual AI: Exploring FiftyOne Together — Part II Getting Started

In this Blog 3, we’ll explore the new Elderly Action Recognition Challenge I’m working on, its goals, the challenges we face, and how the open-source community can collaborate to address them. At the end of this blog, I hope you are interested in participating in the challenge and bringing your ideas to AI for Good.

From the early days of my professional career, I’ve been passionate about applications for automated systems. In recent years, my focus has naturally gravitated toward cutting-edge AI trends. Yet, despite the advancements, many unresolved challenges remain. I vividly remember working during my master’s degree on a system designed to detect falls in the elderly. The idea involved developing a sensor-based belt that activated an inflatable device to prevent injuries. Such concepts have matured over the years, and companies now market similar solutions.

However, with the rise of computer vision and robotics to assist humans in their daily lives, we face a new challenge: leveraging camera-based technology to detect human actions. I recall my first blog with OpenVINO, “Human Action Recognition,” where I implemented an encoder-decoder architecture to generate embeddings from 16 frames and determine actions captured in videos. You cannot miss that notebook—I have my son in there, lovely!

Paula and Paula’s son in human action recognition.

Since then, models have evolved dramatically, with new architectures released nearly every week. This rapid evolution in model development raises the question: Can we generate reliable data at a pace that matches this rate of innovation?

What Is the Elderly Action Recognition Challenge?

This challenge aims to tackle one of the most critical applications in human action recognition: identifying activities of daily living (ADLs) and fall detection for the elderly. The competition invites participants to train models on a significant, generic benchmark of human action recognition and apply transfer learning using a subset of data and class labels specific to elderly-related actions.

Key Details:

Goals: Enable more efficient and accurate recognition of elderly actions, addressing real-world healthcare and assisted living challenges.

Deadlines: Submissions close February 15, 2025.

Evaluation: Given the path to an mp4 file, the evaluation script should take the video as input and output a category and a label. The evaluation framework will use the following metrics to ensure a fair and comprehensive assessment:

Submissions must include: 1) an Eval submission CSV JSON file with the prediction results over the Evaluation Dataset, 2) a Hugging Face link to your PyTorch model weights, and 3) a PDF report documenting the data curation process and the datasets used.

Target Audience: The challenge is open to AI researchers, students, developers, and enthusiasts interested in advancing action recognition in critical domains.

Here is the submission platform: https://eval.ai/web/challenges/challenge-page/2427/overview

Discord Channel: https://discord.com/channels/1266527359511564372/1319053378843836448

Note: This challenge is part of the Computer Vision for Smalls Workshop (CV4Smalls) hosted at WACV 2025.
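
To make the evaluation contract above more concrete, here is a minimal sketch of a script that takes video paths and writes one category and one label per video to a CSV file. Everything specific in it is an assumption for illustration: the column names, the dummy_predict stand-in, and the "Locomotion" category are hypothetical, and the official submission format is the one defined on the challenge page.

```python
import csv
from pathlib import Path


def dummy_predict(video_path):
    """Stand-in for a real model: returns a (category, label) pair."""
    # Hypothetical fixed output; a real model would decode and classify the video
    return "Locomotion", "walking"


def write_predictions(video_paths, out_csv, predict_fn=dummy_predict):
    """Run predict_fn on each video and write one CSV row per video."""
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        # Hypothetical column names, not the official submission schema
        writer.writerow(["video", "category", "label"])
        for video_path in video_paths:
            category, label = predict_fn(video_path)
            writer.writerow([Path(video_path).name, category, label])
```

Swapping dummy_predict for a function that wraps your trained PyTorch model keeps the file-writing logic unchanged.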

Video from “Video capture of the circumstances of falls in elderly people residing in long-term care: an observational study”

Human Action Recognition in the Era of Vision Transformers

The development of models for human action recognition has significantly transformed in the era of Vision Transformers (ViTs). While convolutional architectures laid the foundation, ViTs have introduced a new paradigm with their ability to effectively capture long-range dependencies and process spatiotemporal data.

However, this challenge seeks a solution that doesn’t necessarily rely on Vision Transformers. It is open to different approaches, even the more simplistic ones, emphasizing the solution's practicality and accessibility rather than exclusively adopting cutting-edge architectures.

Data complexity and model generalization are the main challenges in model development and deployment. Handling spatiotemporal data is resource-intensive and demands robust architectures, and achieving high accuracy across diverse datasets remains difficult.
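
To give a rough sense of why spatiotemporal data is resource-intensive, the sketch below estimates the memory footprint of a single clip tensor. The 16-frame clip length echoes the notebook mentioned earlier; the 224x224 resolution, three channels, float32 precision, and batch size of 32 are illustrative assumptions, not challenge requirements.

```python
def clip_bytes(frames=16, height=224, width=224, channels=3, bytes_per_value=4):
    """Bytes needed to hold one uncompressed clip tensor in memory."""
    return frames * height * width * channels * bytes_per_value


one_clip = clip_bytes()
batch = 32 * one_clip  # assumed batch size of 32 clips

print(f"one clip:    {one_clip / 2**20:.1f} MiB")  # 9.2 MiB
print(f"batch of 32: {batch / 2**20:.1f} MiB")     # 294.0 MiB
```

This counts only the input tensor; activations and gradients during training add considerably more.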

On the data side, dataset creation has unfortunately not kept pace with model development, and the available data is still too restrictive. Open-access datasets for elderly action recognition are limited, presenting challenges for reproducibility and benchmarking.

Current Data for Detecting ADLs and Falls in the Elderly

The availability of open-access datasets for elderly action recognition is a critical bottleneck. Most existing datasets have limitations in scale, diversity, or licensing. The key issues I can identify after preparing this material for potential participants of the challenge are: 

  • Data Limitations: Many datasets lack coverage of diverse scenarios or fail to represent real-world variability.
  • Licensing Challenges: Open-access datasets often have restrictive licenses, limiting their utility for commercial or collaborative applications.

The Role of FiftyOne in Video Data Management

As you can see in my previous blogs, FiftyOne is a powerful open-source tool for handling and analyzing data. The new aspect of this blog is that FiftyOne can also process video data, offering critical functionality for dataset curation and exploration in complex datasets.

With FiftyOne, we can create video datasets and streamline importing, organizing, and visualizing video data. It makes it easy to manage the metadata associated with datasets, enabling better insights and analysis. Its data curation tools also help us visualize, clean, filter, and curate video datasets efficiently, ensuring high-quality inputs for model training.

Here, you can find extra resources for video management with FiftyOne:

Getting Hands-On: Exploring ADL and Fall Detection Datasets

For this demonstration, we’ll dive into the GMDCSA24 dataset, which provides a comprehensive collection of elderly activity and fall detection videos.

Dataset Description:

  • Contains 160 videos (mp4) covering diverse indoor scenarios related to ADLs (81 videos) and falls (79 videos).
  • Includes rich metadata for better context and model interpretability.
  • Each video could have two or more actions.
  • Activities: Drinking, eating, exercising, reading, sitting, sleeping, standing, walking, writing.
  • Fall classes: Fall backward (BW), fall forward (FW), fall sideways (SW).

Using FiftyOne, we’ll navigate this dataset, showcasing how to explore its structure, visualize key insights, and prepare it for training robust AI models.

Step 1 – Defining Path for Dataset and Checking if Dataset Exists:  

After installing the required libraries and importing the necessary modules, the first step is to define the dataset path and create a new dataset. To avoid conflicts with previous executions, we first check if a dataset with the same name already exists. If it does, we delete it to start fresh.

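# Import the modules used throughout this walkthrough
import os
import re

import pandas as pd
import fiftyone as fo

# Define the path to your dataset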
dataset_path = "/path/to/the/GMDCSA24/folder"  # Replace with the actual path
dataset_name = "ADL_Fall_Videos"

# Check if the dataset already exists
if fo.dataset_exists(dataset_name):
    # Delete the existing dataset
    fo.delete_dataset(dataset_name)

# Create a FiftyOne dataset
fo_dataset = fo.Dataset(dataset_name)
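
The loop in Step 3 expects one folder per subject, each containing an ADL folder, a Fall folder, and at least one CSV label file. Before building the dataset, it can help to confirm that the download matches this layout. The helper below is a small sketch of such a check; check_layout is a hypothetical name, not part of FiftyOne or the dataset tooling.

```python
from pathlib import Path


def check_layout(dataset_path):
    """Return a list of human-readable problems found under dataset_path."""
    problems = []
    root = Path(dataset_path)

    # Each direct subfolder is treated as one subject
    for subject_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for subset in ("ADL", "Fall"):
            if not (subject_dir / subset).is_dir():
                problems.append(f"{subject_dir.name}: missing {subset} folder")

        if not list(subject_dir.glob("*.csv")):
            problems.append(f"{subject_dir.name}: no CSV label file")

    return problems
```

An empty list means every subject folder has the pieces the Step 3 loop looks for.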

Step 2 – Setting up helper functions:

To process the dataset effectively, we define two key helper functions:

2.1 Function to Parse the Classes

This function extracts action names and their respective time ranges from the dataset. Since each video can include multiple actions, the label file specifies which actions occur at specific timestamps. We use this information to split videos into smaller clips and prepare a new dataset based on these segments.

# Function to parse the Classes column
def parse_classes(classes_str):
    actions = []
    if pd.isna(classes_str):
        return actions

    # Find every "action[time ranges]" entry in the whole string. Matching on the
    # full string (rather than splitting on ';' first) keeps an entry intact when
    # its brackets contain several ';'-separated ranges.
    for match in re.finditer(r"([^;\[\]]+)\[([^\]]*)\]", classes_str):
        action = match.group(1).strip()  # Extract action name
        time_ranges = match.group(2)  # Extract time ranges within brackets

        # Split time ranges by ';' and process each range
        for time_range in time_ranges.split(";"):
            time_match = re.match(r"(\d+(?:\.\d+)?) to (\d+(?:\.\d+)?)", time_range.strip())
            if time_match:
                start_time = float(time_match.group(1))
                end_time = float(time_match.group(2))

                # Skip invalid ranges where the start comes after the end
                if start_time > end_time:
                    continue

                actions.append({"action": action, "start_time": start_time, "end_time": end_time})

    return actions
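
To see what this kind of parsing produces, here is a standalone sketch that runs the same idea on a single label string. The sample string is hypothetical and only illustrates the "action[start to end]" pattern described above; real entries in the CSV files may differ in detail.

```python
import re

# Hypothetical label string in the "action[start to end; ...]" style
sample_classes = "Sitting[0 to 4.5]; Walking[5 to 9; 12 to 15.5]"

entry_pattern = re.compile(r"([^;\[\]]+)\[([^\]]*)\]")
range_pattern = re.compile(r"(\d+(?:\.\d+)?) to (\d+(?:\.\d+)?)")

segments = []
for entry in entry_pattern.finditer(sample_classes):
    action = entry.group(1).strip()
    # One action can carry several time ranges inside its brackets
    for start, end in range_pattern.findall(entry.group(2)):
        segments.append((action, float(start), float(end)))

for segment in segments:
    print(segment)
# ('Sitting', 0.0, 4.5)
# ('Walking', 5.0, 9.0)
# ('Walking', 12.0, 15.5)
```

Each tuple corresponds to one clip that can later be cut from the original video.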

2.2 Function to Map Actions to Categories

One of the goals of the challenge is to categorize actions. This function maps each action to a predefined category to ensure the action recognition task also includes a higher-level classification.
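
The Step 3 code calls a helper named get_category, whose body is not reproduced in this text. The sketch below shows one way such a helper could look. The category names and the grouping of actions are placeholders chosen for illustration, not the official challenge taxonomy, so check the challenge page for the real categories before relying on them.

```python
# Hypothetical mapping: category names are placeholders, not the official taxonomy
ACTION_TO_CATEGORY = {
    "walking": "Locomotion",
    "standing": "Locomotion",
    "sitting": "Locomotion",
    "sleeping": "Rest",
    "drinking": "Eating and drinking",
    "eating": "Eating and drinking",
    "exercising": "Leisure",
    "reading": "Leisure",
    "writing": "Leisure",
    "bw": "Fall",
    "fw": "Fall",
    "sw": "Fall",
}


def get_category(action):
    """Map an action name to a higher-level category, ignoring case and spaces."""
    key = str(action).strip().lower()
    # Unknown actions fall back to a catch-all category instead of raising
    return ACTION_TO_CATEGORY.get(key, "Other")
```

Keeping the mapping in a plain dictionary makes it easy to swap in the official category list without touching the rest of the pipeline.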

Step 3 – Iteration in the Main Folders, Per Subject, and Splitting Video by Actions Using FiftyOne.

This section combines several important tasks:

Adding Samples to the Dataset: We read the dataset from CSV files, extract metadata (e.g., file name, action, and description), and add these as new samples. We also enrich the metadata by adding fields like subject, type_of_activity (e.g., ADL or Fall), and categories derived from the actions.

Splitting Videos into Clips: We split videos into smaller clips using the parsed action information for each specific action. This is achieved by creating a metadata field called events, which stores the timestamps and frames corresponding to each action.

Exporting the Dataset: The updated dataset can be exported into a FiftyOne format or a Classification Directory Tree after processing. The latter option is especially useful for working with split clips instead of full videos.

# Iterate through the main folders (one per subject)
for subject_folder in os.listdir(dataset_path):
    subject_path = os.path.join(dataset_path, subject_folder)

    if not os.path.isdir(subject_path):
        continue

    # Extract the subject number from the folder name
    subject_number = subject_folder.split("_")[-1]  # Adjust the split logic if needed

    # Look for ADL and Fall folders and CSV files
    adl_folder = os.path.join(subject_path, "ADL")
    fall_folder = os.path.join(subject_path, "Fall")

    label_files = [f for f in os.listdir(subject_path) if f.endswith(".csv")]

    # Load metadata from CSV files
    for label_file in label_files:
        label_path = os.path.join(subject_path, label_file)
        labels_df = pd.read_csv(label_path)
        print(label_path)

        for _, row in labels_df.iterrows():
            file_name = row["File Name"]
            length = row["Length (seconds)"]
            time_of_recording = row["Time of Recording"]
            attire = row["Attire"]
            description = row["Description"]
            classes = row[" Classes"]

            # Parse the Classes column
            parsed_classes = parse_classes(classes)

            # Determine the file's path
            # Check the file name only, so folder names in the path cannot interfere
            if "ADL" in label_file:
                video_path = os.path.join(adl_folder, file_name)
                subset = "ADL"
            elif "Fall" in label_file:
                video_path = os.path.join(fall_folder, file_name)
                subset = "Fall"
            else:
                continue

            if not os.path.exists(video_path):
                print(f"Video file not found: {video_path}")
                continue

            # Create a FiftyOne sample
            metadata = fo.VideoMetadata.build_for(video_path)
            sample = fo.Sample(filepath=video_path, metadata=metadata)

            #temporaldetection using actions detections on labeled dataset
            temp_detections = []

            for action in parsed_classes:
                start_time = float(action["start_time"])
                end_time = float(action["end_time"])

                # Check if end_time exceeds video duration
                if end_time > metadata.duration:
                    end_time = metadata.duration

                event = fo.TemporalDetection.from_timestamps(
                            [start_time, end_time],
                            label=action["action"],
                            sample=sample,
                            )
                temp_detections.append(event)

            sample["events"] = fo.TemporalDetections(detections=temp_detections)

            # Add metadata to the sample
            sample["subset"] = subset
            sample["subject_number"] = subject_number
            sample["length"] = length
            sample["time_of_recording"] = time_of_recording
            sample["attire"] = attire
            sample["description"] = description
            sample["classes"] = classes
            #sample["events"] = events

            # Assign category based on actions
            categories = [get_category(action["action"]) for action in parsed_classes]
            sample["category"] = list(set(categories))  # Deduplicate categories

            # Add the sample to the dataset
            fo_dataset.add_sample(sample)

# Populate any metadata that is still missing, once, after all samples are added
fo_dataset.compute_metadata()

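
The events field stores each action as a range of frames derived from its start and end timestamps. The sketch below illustrates that conversion with plain arithmetic so you can sanity-check the ranges you see in the app. It is an approximation for intuition only, not FiftyOne's internal implementation, and timestamp_to_frame and clip_support are hypothetical helper names.

```python
def timestamp_to_frame(timestamp, fps, total_frames):
    """Map a timestamp in seconds to a 1-based frame number within the video."""
    frame = int(round(timestamp * fps)) + 1
    # Clamp so that ranges running past the end of the video stay valid
    return max(1, min(frame, total_frames))


def clip_support(start_time, end_time, fps, total_frames):
    """Return the [first_frame, last_frame] range for one action."""
    return [
        timestamp_to_frame(start_time, fps, total_frames),
        timestamp_to_frame(end_time, fps, total_frames),
    ]


# A 10-second video at 30 fps has 300 frames
print(clip_support(0.0, 2.0, fps=30, total_frames=300))   # [1, 61]
print(clip_support(9.0, 20.0, fps=30, total_frames=300))  # [271, 300]
```

The second call shows the same safeguard as the loop above: an end time beyond the video duration is clipped to the last frame.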

Step 4 – Launch the App

Once the dataset is prepared, you can interact with it programmatically by launching the FiftyOne app. This allows you to explore the dataset visually, create views, and export those views to various formats for further analysis or sharing.

The FiftyOne app provides a highly interactive way to:

  • Inspect the dataset and its metadata.
  • Visualize events and clips.
  • Filter and sort data based on specific criteria.
  • Export customized views to your desired format.
session = fo.launch_app(fo_dataset)

view = fo_dataset.to_clips("events")
session.view = view
print(view)

After launching the app, I can see my new metadata and events in the menu on the left, along with the rest of the dataset’s metadata, which I added through the code shared above and in the notebook.

Shuffle (51), Checking new metadata and sample[“events”]

Step 5 – Exporting Clips and Single Actions

Using TemporalDetections, we can focus on specific ranges of frames within the original videos, corresponding to individual actions. The events field in the metadata marks these individual events with precise timestamps, enabling clear segmentation.

Selecting just a particular event in the metadata “sitting” and checking for other events in the original video.

After this process, we can export only the relevant clips and single actions instead of entire videos with complex labels. This streamlined dataset structure is ideal for training machine learning models or for submission to challenges that require precise action recognition.

view.export(
    export_dir="/path/to/the/GMDCSA24/new_folder",
    dataset_type=fo.types.VideoClassificationDirectoryTree,
)

By isolating and exporting these segments, we reduce dataset size and improve clarity and usability for downstream tasks.
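
A classification directory tree stores one folder per class with the clips inside it, so a quick way to check the export is to count the clips in each class folder. The sketch below does that with the standard library only; count_clips_per_class is a hypothetical helper, and it assumes the exported clips carry an .mp4 extension.

```python
from collections import Counter
from pathlib import Path


def count_clips_per_class(export_dir, extensions=(".mp4",)):
    """Count video files in each class folder of an exported directory tree."""
    counts = Counter()
    for class_dir in Path(export_dir).iterdir():
        if not class_dir.is_dir():
            continue
        # Only files with a matching extension are counted as clips
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir() if f.suffix.lower() in extensions
        )
    return counts
```

Calling count_clips_per_class on the export folder gives a quick view of class balance before training.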

FiftyOne can manage many kinds of datasets. In this notebook, we built a custom dataset by adding each sample individually. Exporting it in FiftyOne’s own format makes it easy to reload later and use more of FiftyOne’s capabilities. For more information about which datasets FiftyOne can manage, take a look at this page.

export_dir = "/path/to/the/GMDCSA24/new_folder_FO_Dataset"
fo_dataset.export(
    export_dir=export_dir,
    dataset_type=fo.types.FiftyOneDataset,
)

Additional resources:

  1. Notebook for digesting GMDCSA24 Dataset: https://github.com/voxel51/fiftyone-examples/blob/master/examples/elderly_action_recognition.ipynb
  2. GMDCSA24 Dataset: https://github.com/ekramalam/GMDCSA24-A-Dataset-for-Human-Fall-Detection-in-Videos
  3. Tips and tricks for human action recognition with FiftyOne: https://voxel51.com/blog/exploring-ucf101-youtube-based-action-recognition-dataset/
  4. Try the FiftyOne APP in a browser: https://try.fiftyone.ai/
  5. EAR Challenge: https://voxel51.com/computer-vision-events/elderly-action-recognition-challenge-wacv-2025/
  6. FiftyOne Documentation: https://docs.voxel51.com/

Just wrapping up! 😀

Thank you for joining me in exploring the Elderly Action Recognition Challenge and the powerful tools FiftyOne provides for dataset preparation and video data management. We have learned how to build a dataset from complex video annotations, launch the FiftyOne app, and export actionable clips. Along the way, we’ve seen how FiftyOne streamlines the complexities of handling video datasets.

I invite you to participate in the challenge, test the notebook shared in this blog, and share your experience with FiftyOne.

I would love to hear about your experiences! Please share your thoughts, ask questions, and provide testimonials. Your insights might help others in our next posts. Don’t forget to participate in the challenge and try out the notebook I have created for you all.

Together, we can innovate in action recognition and make meaningful contributions to AI for Good. Let’s build something impactful!

Stay tuned for the next post, in which we’ll explore FiftyOne’s advanced features and evaluate the model.

Let’s make this journey with FiftyOne a collaborative and enriching experience. Happy coding!

Stay Connected:

What is next?

I’m excited to share more about my journey at Voxel51! 🚀 If you’d like to follow along as I explore the world of AI and grow professionally, feel free to connect or follow me on LinkedIn. Let’s inspire each other to embrace change and reach new heights!

You can find me at some Voxel51 events (https://voxel51.com/computer-vision-events/), or if you want t
