Journey of Analytics

Deep dive into data analysis tools, theory and projects

50 free Datasets for Data Science Projects


Here are the top 50 websites for gathering datasets for your data science projects, whether you work in R, Python, SAS, Excel or another programming language or statistical software. Best part: these are all free, free, free!

The sources are divided into five broad categories:

  1. Government & UN/Global Organizations
  2. Academic Websites
  3. Kaggle & Data Science Websites
  4. Curated Lists
  5. Miscellaneous


Government and UN/World Bank websites:


Academic websites:

  • Yelp academic data – link
  • University of California, Irvine – link
  • Harvard University – link
  • Harvard Dataverse database – link
  • MIT – link1 and link2
  • University of North Carolina, adolescent health – link
  • Mars Crater Study: a global database of over 300,000 Mars craters 1 km or larger. Links to the descriptive guide and the dataset.
  • Click Dataset from Indiana University (~2.5 TB) – link.
  • Pew Research Data – Pew Research is an organization that studies topics of public interest. Its studies gauge trends in areas such as the internet and technology, global attitudes, religion, and social and demographic trends. Remarkably, it not only publishes these reports but also makes all of its datasets publicly available for download!
  • Million Song Dataset from Columbia University, including data on song tracks and their artists/composers.


Kaggle & Data Science resources:

A few of my favourite datasets from the Kaggle website are listed here. Note that Kaggle recently announced an open data platform, so you may see many new datasets there in the coming months.

  • Walmart recruiting at stores – link
  • Airbnb new user booking predictions – link
  • US Department of Education scorecard – link
  • Titanic Survival Analysis – link
  • – link
  • edX – link
  • Airbnb – link
  • Datasets on Climate information, human genome data, Enron email information, etc – link
  • Gapminder – link


Curated Lists:

  • KDnuggets provides a great list of datasets from almost every field imaginable: space, music, books, etc. It may repeat some datasets from the list above – link
  • An eclectic mix of datasets about gun ownership, NYPD crime rates, college student study habits and caffeine concentrations in popular beverages – link
  • Data Science Central has also curated a list of free datasets – link
  • List of open datasets from DataFloq – link
  • Curated list of datasets by Sammy Chen (@transwarpio). This list is organized by topic, so definitely take a look.


Miscellaneous:

  • MRI brain scan images and data – link
  • Economic, education, health and other datasets from Quandl. Note that the site also offers additional datasets through a premium (paid) tier.
  • Google repository of digitized books and Ngram Viewer – link.

  • Database with geographical information – link
  • Loan information from Lending Club – link
  • Google Public Data – Google has a search engine specifically for publicly available data. This is a good place to start because you can search a large number of datasets in one place.
  • Statista – This site aggregates thousands of datasets and offers access as a paid service. However, some of the datasets are available for free.
  • Internet usage data from the Center for Applied Internet Data Analysis (CAIDA) – link.
  • Yahoo offers some interesting datasets, with the caveat that you need to be affiliated with an accredited educational institution (as a student or professor). You can view the datasets here.
  • Enron Emails aggregated as a dataset.
  • Public datasets from Amazon – see link.

This post builds on an earlier post, listing 25 sites, that was published on our old blog site. You can view that post here.
