Jump in and drown in all the data
Here is a list of the data available.
“Data is the crude oil of the 21st Century, and analytics is the combustion engine.” –Gartner
I’d like to see how different people work on the same set of data:
Instead of downloading the data yourself, note that Floydhub.com already hosts these image datasets on its servers for use by Machine Learning code:
The MNIST data (from LeCun, the Godfather of ML) has 55,000 28x28 pixel images of hand-written digits. Each image is labeled with the number written in the image.
This lists methods by their error rate.
MNIST shown with a “flashlight” visualization in TensorBoard, by Dandelion at the TensorFlow Dev Summit, Feb. 2017.
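To make the MNIST layout described above concrete, here is a minimal sketch of how one 28x28 labeled image is commonly prepared for a model. The image is synthetic (a made-up vertical stroke, not real MNIST data), and the helper names `flatten` and `one_hot`, as well as the choice of one-hot label encoding, are assumptions for illustration only.

```python
# Sketch only: a synthetic stand-in for one MNIST example, not real data.
ROWS, COLS = 28, 28   # image size stated in the text
NUM_CLASSES = 10      # digits 0 through 9


def flatten(image):
    """Turn a 28x28 grid of pixel values into one 784-element list."""
    return [pixel for row in image for pixel in row]


def one_hot(label, num_classes=NUM_CLASSES):
    """Encode a digit label as a list holding a single 1 (an assumed encoding)."""
    return [1 if i == label else 0 for i in range(num_classes)]


# A blank image with one vertical stroke, loosely resembling a "1".
image = [[0.0] * COLS for _ in range(ROWS)]
for r in range(4, 24):
    image[r][14] = 1.0

vector = flatten(image)   # what a simple classifier would receive as input
target = one_hot(1)       # the label "1" as a 10-element list

print(len(vector))        # 784
print(target)
```

Flattening discards the 2-D neighborhood of each pixel, which is acceptable for a simple classifier but is the reason convolutional models keep the 28x28 shape instead.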
COCO is an image recognition, segmentation, and captioning dataset. It has more than 300,000 images, each containing multiple objects, across 80 object categories.
Imagenet VGG Very Deep 19 is a pre-trained ConvNet model with 19 weight layers.
CALTECH 101/256 contains pictures of objects belonging to 101/256 categories.
CIFAR 10/100 is a subset of the 80 million tiny images dataset.
Cats vs Dogs Redux: Kernels Edition is the dataset for Kaggle’s famous Dogs vs Cats competition.
KONECT (the Koblenz Network Collection) from the Institute of Web Science and Technologies at the University of Koblenz–Landau collects large network datasets of all types in order to perform research in network science and related fields.
Google digitized (scanned) a large collection of books, including those of the 20th century, and turned them into n-grams at
https://books.google.com/ngrams/ with counts of how often each word occurred across the books.
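As a rough illustration of what an n-gram count is, the sketch below tallies runs of n consecutive words in a short made-up sentence. It is not how Google builds its corpus; the function name `ngram_counts`, the lowercasing, and the whitespace tokenization are simplifying assumptions.

```python
from collections import Counter


def ngram_counts(text, n=1):
    """Count each run of n consecutive words (lowercased, split on whitespace)."""
    words = text.lower().split()
    # Zip the word list against shifted copies of itself to form n-word windows.
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Made-up sample text, not taken from Google Books.
sample = "the cat sat on the mat and the cat slept"

unigrams = ngram_counts(sample, 1)
bigrams = ngram_counts(sample, 2)

print(unigrams["the"])      # 3
print(bigrams["the cat"])   # 2
```

A real corpus would also need punctuation handling and per-year grouping, which this sketch leaves out.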
Wordnet defines affect scores, a mood score for words.
http://www.makeovermonday.co.uk/data has one visualization makeover every week (52 a year).
IEX (Investors Exchange) has real-time stock exchange data.
Google Big Data
us_budget has the dollar outlays of each bureau within every agency (branch) of the US government, by year from 1962 to 2021.
Allen Institute (ai2)
http://news.google.com/archivesearch has 200 years of archives
http://www.ibiblio.org/slanews/internet/intarchives.htm has links to global archives
http://searches.rootsweb.ancestry.com/ssdi.html Roots web
http://search.ancestry.com/search/db.aspx?dbid=3693 US Social Security Death Masterfile Index covers 1935 to 2014
http://www.worldcat.org/default.jsp “lets you search the collections of libraries in your community and thousands more around the world.”
Zip codes by state, latitude, longitude
First names registered in each state, by year, in the US from Google Big Data
Musicbase from a game
- Reduction (generalize synonyms)