Browse AI and machine learning datasets spanning images, text, and audio to find the right data for training and testing your models
Large-scale visual recognition dataset containing over 14 million images across 20,000+ categories, among the most widely used datasets in computer vision
Common Objects in Context dataset developed by Microsoft, providing object detection, segmentation, and captioning annotations, widely used as a benchmark for computer vision tasks
Wikipedia text corpus containing encyclopedia articles in multiple languages, commonly used for training language models and knowledge extraction
English speech dataset derived from audiobooks, containing approximately 1,000 hours of read speech sampled at 16 kHz
Large-scale web crawl data containing billions of web pages, a common data source for training large language models
Large-scale search and question answering dataset developed by Microsoft, based on real Bing search queries
Large-scale speaker identification dataset containing speech from thousands of celebrities, extracted from YouTube videos
Large-scale image captioning dataset developed by Google, containing approximately 3.3 million image-caption pairs
Small dataset of 60,000 32x32 color images across 10 categories, commonly used as a benchmark for image classification algorithms