Datasets for Computer Vision (5)

Published at

12/18/2024

*Memos:

My post explains MNIST, EMNIST, QMNIST, ETLCDB, Kuzushiji and Moving MNIST.
My post explains Fashion-MNIST, Caltech 101, Caltech 256, CelebA, CIFAR-10 and CIFAR-100.
My post explains Oxford-IIIT Pet, Oxford 102 Flower, Stanford Cars, Places365, Flickr8k and Flickr30k.
My post explains ImageNet, LSUN and MS COCO.
My post explains Image Classification(Recognition), Object Localization, Object Detection and Image Segmentation.
My post explains Keypoint Detection(Landmark Detection), Image Matching, Object Tracking, Stereo Matching, Video Prediction, Optical Flow and Image Captioning.

(1) PASCAL VOC(Pattern Analysis, Statistical Modelling, and Computational Learning Visual Object Classes)(2005):

(2) SUN Database(Scene UNderstanding database)(2010):

(3) Kinetics(2017):

has short video clips of human actions and comes as 3 datasets, Kinetics-400, Kinetics-600 and Kinetics-700: *Memos:
- Each video clip lasts around 10 seconds.
- Kinetics-400(2017) has 306,245 video clips, each labeled with one of 400 categories(classes).
- Kinetics-600(2018) has 495,547 video clips, each labeled with one of 600 categories.
- Kinetics-700(2019) has 545,317 video clips, each labeled with one of 700 categories.
is used for Video Classification.
is Kinetics() in PyTorch.
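
As a rough sketch of how `Kinetics()` might be used, the snippet below wraps `torchvision.datasets.Kinetics` in a small helper. The helper names `load_kinetics` and `approx_hours`, the `KINETICS_CLIPS` table and the default of 16 frames per clip are assumptions made for illustration, not part of torchvision. It also assumes the videos are already under `root` in the layout torchvision expects and that a video backend such as PyAV is installed.

```python
# Clip counts as listed above; used only for validation and a rough size estimate.
KINETICS_CLIPS = {"400": 306_245, "600": 495_547, "700": 545_317}


def approx_hours(num_classes="400", seconds_per_clip=10):
    """Rough total duration in hours, assuming every clip lasts ~10 seconds."""
    return KINETICS_CLIPS[num_classes] * seconds_per_clip / 3600


def load_kinetics(root, num_classes="400", split="train", frames_per_clip=16):
    """Hypothetical helper around torchvision.datasets.Kinetics."""
    if num_classes not in KINETICS_CLIPS:
        raise ValueError(f"num_classes must be one of {sorted(KINETICS_CLIPS)}")
    if split not in ("train", "val", "test"):
        raise ValueError("split must be 'train', 'val' or 'test'")
    # Imported lazily so the helpers above work without torchvision installed.
    from torchvision.datasets import Kinetics

    return Kinetics(
        root=root,
        frames_per_clip=frames_per_clip,  # assumed value; tune for your model
        num_classes=num_classes,
        split=split,
    )
```

With the data in place, `load_kinetics("data/kinetics", "400")` should return a dataset whose items are `(video, audio, label)` tuples. Note that `num_classes` is a string, not an integer.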

(4) Cityscapes(2016):

has 25,000 annotated images of urban street scenes for semantic understanding, with 30 classes grouped into 8 categories. *5,000 images are finely annotated and 20,000 images are coarsely annotated.
is used for Image Segmentation.
is Cityscapes() in PyTorch. *How to set up the dataset isn't explained.
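
One possible setup, as a hedged sketch: `torchvision.datasets.Cityscapes` has no `download` option, so the archives have to be obtained from the Cityscapes website and extracted under `root` first. It then reads images from `leftImg8bit/<split>` and annotations from `gtFine/<split>` or `gtCoarse/<split>`. The helpers `expected_dirs` and `load_cityscapes` and the `VALID_SPLITS` table below are hypothetical names introduced for illustration.

```python
import os

# Splits accepted by torchvision for each annotation mode.
VALID_SPLITS = {
    "fine": ("train", "test", "val"),
    "coarse": ("train", "train_extra", "val"),
}


def expected_dirs(root, mode="fine", split="train"):
    """Return the (images_dir, targets_dir) pair that Cityscapes() looks for."""
    if mode not in VALID_SPLITS:
        raise ValueError("mode must be 'fine' or 'coarse'")
    if split not in VALID_SPLITS[mode]:
        raise ValueError(f"split for mode={mode!r} must be one of {VALID_SPLITS[mode]}")
    target_folder = "gtFine" if mode == "fine" else "gtCoarse"
    return (
        os.path.join(root, "leftImg8bit", split),
        os.path.join(root, target_folder, split),
    )


def load_cityscapes(root, mode="fine", split="train"):
    """Hypothetical helper: check the folder layout, then build the dataset."""
    for directory in expected_dirs(root, mode, split):
        if not os.path.isdir(directory):
            raise FileNotFoundError(f"missing Cityscapes folder: {directory}")
    # Imported lazily so the layout check works without torchvision installed.
    from torchvision.datasets import Cityscapes

    return Cityscapes(root=root, split=split, mode=mode, target_type="semantic")
```

With the folders in place, `load_cityscapes("data/cityscapes", mode="fine", split="train")` should yield `(image, segmentation_mask)` pairs. Passing `mode="coarse"` selects the coarsely annotated images instead.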

Fine-annotated images:

Coarse-annotated images:

dev-resources.site