Big data sources
Kerri Murphy wrote:
What sites do you use to obtain large sets of data for your students working with Big Data?
We find a lot of sites that use big data to show simulations and other data displays, but many times the sites don’t allow access to the data sets.
- https://arc.net/l/quote/puvwyzhi — 10 Great Places to Find Free Datasets (or Google it)
- https://data.world/search?type=resources — Data.world datasets: Free - Up to 3 private projects & datasets, 100MB per project / dataset, & 3 integrations
- https://www.kaggle.com/datasets — Kaggle datasets: A ‘…huge repository of community-published models, data & code’ from ‘…the largest AI & ML community’
- https://data.gov/ — Data.gov is the ‘Home of the U.S. Government’s Open Data’
- https://datahub.io/collections — The collection on Datahub.io ‘…presents collections of high quality datasets organized by topic.’
- https://github.com/search?q=datasets&type=repositories — GitHub search for datasets within repositories
- https://datasetsearch.research.google.com/ — Google dataset search
- https://data.boston.gov/ — Analyze Boston: the City of Boston's open data hub has some great datasets
- https://datausa.io/ — Data USA (and https://datausa.io/visualize)
- https://data.mass.gov/ — The Massachusetts Data Hub
- https://isenseproject.org/ — Any project from iSENSE (‘…a web system for sharing and visualizing scientific data… intended to be a resource for middle school and high school science, math, and engineering instruction’) (login required)
- https://www.fda.gov/industry/fda-basics-industry/search-databases — FDA databases
- https://www.cdc.gov/surveillancepractice/data.html — CDC databases
- https://github.com/usgpo/cataloging-records — Federal Depository Library Program (FDLP) cataloging records
- https://ourworldindata.org/ — ‘Research and data to make progress against the world’s largest problems… 12,873 charts across 115 topics — [a]ll free: open access and open source’
- https://data.census.gov/ ‘the new platform to access data and digital content from the U.S. Census Bureau’ (also raw USCB public-use datasets)
- https://data.un.org/ ‘UNdata… brings international statistical databases within easy reach of users through a single-entry point.’
- https://www.audubon.org/news/how-use-ebird — Audubon eBird databases
- https://www.audubon.org/native-plants — Audubon plant databases
- https://t.ly/g3fvf — Mobile CSP list of data sources for the data unit
- https://exploringcs.org/wp-content/uploads/2019/07/ECS-v9.0-final.pdf#page=194 Exploring Computer Science (ECS): Unit 5, Computing & Data Analysis, is part of the curriculum and contains data sources and links
- https://opendata.cityofnewyork.us/data/ — NYC Open Data: ‘The Open Data Team at the NYC Office of Technology and Innovation (OTI)… works with City agencies to identify and make data available, coordinate platform operations and improvements, and promote the use of Open Data both within government and throughout NYC.’
- https://archive.ics.uci.edu/ — UC Irvine Machine Learning Repository
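Many of the portals above offer a CSV download, so a first classroom exercise might be simply counting what is in a file. The following is a minimal sketch using only the Python standard library; the function name `summarize_csv`, the column names, and the sample rows are all made up for illustration and do not come from any of the sites listed.

```python
import csv
import io
from collections import Counter


def summarize_csv(text_stream, column):
    """Return (row count, Counter of the values found in one column)."""
    reader = csv.DictReader(text_stream)  # first line is treated as the header
    counts = Counter()
    rows = 0
    for row in reader:
        rows += 1
        counts[row[column]] += 1
    return rows, counts


# Made-up sample standing in for a downloaded file; not real data.
SAMPLE = """district,request_type
north,pothole
south,streetlight
north,pothole
"""

rows, counts = summarize_csv(io.StringIO(SAMPLE), "request_type")
print(rows, counts.most_common(1))
```

For a real download, students could pass an open file instead of the in-memory sample, for example `open("downloaded.csv", newline="", encoding="utf-8")`, where the filename is hypothetical and the column name has to match the file's actual header.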
Thanks for additional contributions from Deborah Boisvert, J Reuther, Elaine Griggs, Beryl Hoffman, Beatriz Mendez, Danielle Theissen, Joshua Hans, Mitch Middler.
Note: https://www.universalhub.com/2024/bps-open-source-curriculum-open-source 2024/04/06 BPS to open source a curriculum on open-source data — ‘…a $500,000 grant from the state Department of Education to develop a lesson plan for teaching students how to use open data sources’ (including the Analyze Boston datasets listed above).
#data