Welcome to the Healthcare Data Library

2,200+ Clean, Current, Enriched, and Expert Curated Datasets for Data Scientists
  • Healthcare: 1,200+
  • Life Science: 400+
  • Terminology: 500+
  • Society: 200+

How It Works

The data is available under two types of licenses:
  • Research
  • Commercial

Welcome to the Land of Clean Data!

Each dataset goes through 3 levels of quality review
  • 2 Manual reviews are done by domain experts
  • Then, an automated set of 60+ validations verifies that every datum matches its metadata and defined constraints
Data is normalized into one unified type system
  • All dates, units, codes, and currencies follow the same format
  • All null values are normalized to the same value
  • All dataset and field names are SQL and Hive compliant
Data and Metadata
  • Data is available in both CSV and Apache Parquet format, optimized for high read performance on distributed Hadoop, Spark & MPP clusters
  • Metadata is provided in the open Frictionless Data standard, and every field is normalized & validated
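
As an illustration of what validating data against its metadata can look like, here is a minimal sketch using only the Python standard library. It assumes a Frictionless Table Schema style descriptor; the field names, sample rows, and the `validate_rows` helper are hypothetical, and it covers only two constraints (`required` and `enum`), not the 60+ validations mentioned above.

```python
import csv
import io
from datetime import date

# Hypothetical Table Schema descriptor; the field names are invented for illustration.
SCHEMA = {
    "fields": [
        {"name": "provider_id", "type": "integer", "constraints": {"required": True}},
        {"name": "state_code", "type": "string", "constraints": {"enum": ["CA", "NY", "TX"]}},
        {"name": "enrollment_date", "type": "date"},
    ]
}

# Casters for the few Table Schema types used in this sketch.
CASTERS = {
    "integer": int,
    "string": str,
    "date": date.fromisoformat,
}


def validate_rows(csv_text, schema):
    """Return (row_number, field_name, message) for every violation found."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        for field in schema["fields"]:
            name = field["name"]
            raw = row.get(name)
            constraints = field.get("constraints", {})
            # Empty or absent cells only fail when the field is required.
            if raw is None or raw == "":
                if constraints.get("required"):
                    errors.append((row_number, name, "missing required value"))
                continue
            # The value must parse as the declared type.
            try:
                value = CASTERS[field["type"]](raw)
            except ValueError:
                errors.append((row_number, name, f"not a valid {field['type']}"))
                continue
            # The parsed value must be one of the allowed values, if any are listed.
            if "enum" in constraints and value not in constraints["enum"]:
                errors.append((row_number, name, "value not in enum"))
    return errors


# Invented sample data: row 2 lacks an ID, row 3 has a bad state code and a bad date.
SAMPLE = """provider_id,state_code,enrollment_date
1001,CA,2019-04-01
,NY,2019-05-17
1003,ZZ,not-a-date
"""

errors = validate_rows(SAMPLE, SCHEMA)
for error in errors:
    print(error)
```

A check like this reports problems per row and per field, so a failing datum can be traced back to the exact cell that violates the schema.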
Data Updates
  • Data updates support replace-on-update: outdated foreign keys are deprecated, not deleted

Welcome to Expert Curated Data!

Field names, descriptions, and normalized values are chosen by people who actually understand their meaning

Healthcare & life science experts add categories, search keywords, descriptions and more to each dataset

Both manual and automated data enrichment is supported for clinical codes, providers, drugs, and geo-locations

The data is always kept up to date – even when the source requires manual effort to get updates

Support for data subscribers is provided directly by the domain experts who curated the data sets

Every data source’s license is manually verified to allow for royalty-free commercial use and redistribution

Welcome to Easy to Use Data!

Format, Download and Updates
  • Read CSV or Parquet data with one-liners from the standard libraries of Python, R, SAS, SPSS, or Spark;
  • Full download of data enables you to get the most out of your memory, database, or cluster;
  • Subscribe to dataset updates to automate them.
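
For example, reading a CSV dataset in Python can be sketched as follows. The column names and rows are invented stand-ins for a downloaded file, and the in-memory string replaces a real file path so that the snippet runs on its own.

```python
import csv
import io

# Stand-in for a downloaded file; the columns and rows are invented for illustration.
SAMPLE_CSV = """drug_code,drug_name,strength_mg
0001,Aspirin,81
0002,Ibuprofen,200
"""

# With a real download, io.StringIO(SAMPLE_CSV) would be replaced by
# open("<your file>.csv", newline="", encoding="utf-8").
with io.StringIO(SAMPLE_CSV) as handle:
    rows = list(csv.DictReader(handle))

print(len(rows))             # number of data rows read
print(rows[0]["drug_name"])  # values are keyed by the header row
```

If pandas is installed, `pandas.read_csv` and `pandas.read_parquet` load a file into a DataFrame in a single call; the Parquet reader additionally needs an engine such as pyarrow.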
Analysis
  • 26 out-of-the-box integrations with the world’s most popular analytics tools, via our data.world partnership;
  • SQL and SPARQL queries via a web UI or REST API.
Standardized and Complete Schemas
  • Need to load 1,000 datasets into a SQL or Hive DB? Create and populate all tables with one script, thanks to the complete & standardized schemas in metadata.
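
A minimal sketch of that idea, assuming the metadata follows the Frictionless Data Package layout (`resources`, each with a `name` and a `schema` listing `fields`). The dataset names, fields, type mapping, and inline `data` rows below are hypothetical, and SQLite stands in for whichever SQL or Hive database is actually used.

```python
import sqlite3

# Hypothetical descriptor in the style of a Frictionless Data Package.
# Rows are inlined under "data" (without a header row) to keep the sketch self-contained.
PACKAGE = {
    "resources": [
        {
            "name": "providers",
            "schema": {"fields": [
                {"name": "provider_id", "type": "integer"},
                {"name": "provider_name", "type": "string"},
            ]},
            "data": [[1001, "Example Clinic"], [1002, "Sample Hospital"]],
        },
        {
            "name": "drugs",
            "schema": {"fields": [
                {"name": "drug_code", "type": "string"},
                {"name": "strength_mg", "type": "number"},
            ]},
            "data": [["0001", 81.0]],
        },
    ]
}

# Assumed mapping from Table Schema types to SQLite column types.
SQL_TYPES = {
    "integer": "INTEGER",
    "number": "REAL",
    "string": "TEXT",
    "date": "TEXT",
    "boolean": "INTEGER",
}


def create_table_sql(resource):
    """Build a CREATE TABLE statement from one resource's schema."""
    columns = ", ".join(
        f'"{field["name"]}" {SQL_TYPES.get(field["type"], "TEXT")}'
        for field in resource["schema"]["fields"]
    )
    return f'CREATE TABLE "{resource["name"]}" ({columns})'


def load_package(connection, package):
    """Create and populate one table per resource in the package."""
    for resource in package["resources"]:
        connection.execute(create_table_sql(resource))
        placeholders = ", ".join("?" for _ in resource["schema"]["fields"])
        connection.executemany(
            f'INSERT INTO "{resource["name"]}" VALUES ({placeholders})',
            resource["data"],
        )
    connection.commit()


connection = sqlite3.connect(":memory:")
load_package(connection, PACKAGE)

table_names = [
    row[0]
    for row in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
provider_count = connection.execute("SELECT COUNT(*) FROM providers").fetchone()[0]
print(table_names, provider_count)
```

Because every table is derived from the same kind of descriptor, the loop does not change as the number of datasets grows.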
Enriched Metadata
  • Don’t know the jargon? Our experts curate extra search terms so that you can also find “NPPES” by searching for “all US doctors” or “national providers database”.
  • Not sure what the data is about? Metadata is provided in human-readable PDF in addition to JSON.

26 out-of-the-box data integrations

What customers are saying

The data sets were clean, easy to access and easy to use. It was a joy to be able to use the data provided.
Eric Rothman
Co-Founder, Threat Sync
The data sets make excellent reference data and are at their most powerful when combined with unstructured data – to bring order to the chaos if you will.
Mark Pinches
Founder, Alderley.ai
The provided data sets were of good quality, clean and ready to use. The access method was extremely easy to understand, as well as the search engine.
Roxana Radu
Project Manager, The Synergyst
Many people told me the datasets were great and very easy to use.
Jason Jim
HopHacks Organizer