💧 📉 💧 Are you wasting money & time: does your data have a leak? 💧 📉 💧

Published at 12/12/2024
Categories: opensource, computervision, machinelearning, leakysplits
Author: jasoncorso

New open source AI feature alert! 💧🔔💧🔔💧🔔💧🔔

Generalization in machine learning models is still poorly understood. Because of this, the status quo is to verify our models heuristically on holdout test sets and hope that this check has some bearing on performance in the wild. Of course, this means that faulty testing carries a huge cost, both in critical MLE time and in error-filled data and annotation.

One common failure mode of testing is when the test split is afflicted with data leakage. When testing on such a split, there is no guarantee that generalization is being verified. In fact, in the extreme case, no new information is gained about the model's performance outside the training set. Supervised models learn the minimal discriminative features needed to make a decision, and if those features leak into the test set, they can build a dangerous, false sense of confidence in the model. Don't let this happen to you.
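To make the extreme case concrete, here is a toy sketch (not taken from the post or from FiftyOne): a hypothetical "model" that simply memorizes its training samples. On a test split made of leaked copies of training samples it scores perfectly, while on genuinely unseen samples it can only fall back to a constant guess. The data, the labels, and the `fit_memorizer`, `predict`, and `accuracy` helpers are all illustrative assumptions.

```python
from typing import Dict, List, Tuple

Features = Tuple[float, ...]
Sample = Tuple[Features, str]  # (feature vector, label)


def fit_memorizer(train: List[Sample]) -> Dict[Features, str]:
    """Hypothetical 'model' that just memorizes every training sample."""
    return {features: label for features, label in train}


def predict(model: Dict[Features, str], features: Features, default: str = "cat") -> str:
    # Returns the memorized label if seen in training, otherwise a constant guess.
    return model.get(features, default)


def accuracy(model: Dict[Features, str], samples: List[Sample]) -> float:
    if not samples:
        return 0.0
    correct = sum(predict(model, f) == y for f, y in samples)
    return correct / len(samples)


train = [((0.1, 0.9), "dog"), ((0.8, 0.2), "cat"), ((0.4, 0.6), "dog"), ((0.7, 0.1), "bird")]
model = fit_memorizer(train)

# Leaky test split: exact copies of training samples.
leaked_test = [train[0], train[2], train[3]]
# Clean test split: samples the model has never seen.
clean_test = [((0.15, 0.85), "dog"), ((0.75, 0.25), "cat"), ((0.65, 0.05), "bird")]

print(f"accuracy on leaked test split: {accuracy(model, leaked_test):.2f}")  # 1.00
print(f"accuracy on clean test split:  {accuracy(model, clean_test):.2f}")  # 0.33
```

The leaked split reports perfect accuracy yet says nothing about how the model handles new data; only the clean split exposes the gap.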

Leaky splits can be the bane of ML models, giving a false sense of confidence and a nasty surprise in production. The image on this post is a sneak peek at what you can expect (this example is taken from ImageNet 👀).

Check out this Leaky-Splits blog post by my friend and colleague Jacob Sela
https://medium.com/voxel51/on-leaky-datasets-and-a-clever-horse-18b314b98331

Jacob is also the lead developer behind the new open source Leaky-Splits feature in FiftyOne, available in version 1.1.

This function allows you to automatically:
🕵 Detect data leakage in your dataset splits
🪣 Clean your data from these leaks
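The post does not describe how the Leaky-Splits feature works internally, so the following is only a minimal sketch of the general detect-and-clean idea under stated assumptions: every sample already has an embedding vector, a test sample counts as "leaked" if its cosine similarity to any training sample meets a hypothetical threshold of 0.95, and "cleaning" means dropping those test samples. The `find_leaks` and `remove_leaks` helpers and the toy embeddings are hypothetical and are not FiftyOne's API.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_leaks(
    train_emb: Sequence[Sequence[float]],
    test_emb: Sequence[Sequence[float]],
    threshold: float = 0.95,  # hypothetical cutoff; would need tuning per embedding model
) -> List[int]:
    """Indices of test samples whose most similar training sample meets the threshold."""
    leaks = []
    for i, test_vec in enumerate(test_emb):
        # Brute force: compare this test sample against every training sample.
        best = max((cosine_similarity(test_vec, train_vec) for train_vec in train_emb), default=0.0)
        if best >= threshold:
            leaks.append(i)
    return leaks


def remove_leaks(test_emb: Sequence[Sequence[float]], leak_indices: Sequence[int]) -> List[Sequence[float]]:
    """Return the test split with the flagged samples dropped."""
    flagged = set(leak_indices)
    return [vec for i, vec in enumerate(test_emb) if i not in flagged]


train_emb = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
test_emb = [
    [0.99, 0.01, 0.0],  # near-duplicate of the first training sample
    [0.0, 0.0, 1.0],    # unrelated to anything in train
    [0.5, 0.5, 0.7],    # partially similar, but below the threshold
]

leaks = find_leaks(train_emb, test_emb)
clean_test = remove_leaks(test_emb, leaks)
print("leaked test indices:", leaks)  # [0]
print("clean test split:", clean_test)  # [[0.0, 0.0, 1.0], [0.5, 0.5, 0.7]]
```

This pairwise comparison grows with the product of the split sizes, so at dataset scale a nearest-neighbor index is the more practical choice; the similarity threshold also depends heavily on which embeddings you use.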

This will help you:
βœ”οΈ Build trust in your data
πŸ“Š Get more accurate evaluations

And, it's open source. Check it out on GitHub.


From your friends at Voxel51
