

Each dataset is manually curated by our team of doctors, pharmacists, public health & medical billing experts

Explore The Data Library


Healthcare

1000+ datasets

Guidelines, Measures, Outcomes, Hospitals, Providers, Cost, Billing, Payments, Population Health

Browse Healthcare Datasets

Life Science

350+ datasets

Research, Clinical Trials, Food, Drug Safety, Drug Pricing, Genomics, Medical Devices

Browse Life Science Datasets


Terminology

350+ datasets

SNOMED, RxNorm, LOINC, ICD, CPT, MeSH, CMT, Genetic Associations, UMLS by Semantic Type, Bill Codes

Browse Terminology Datasets

Open Knowledge

200+ datasets

Census, Geography, Economy, Climate, Demographics, Geo-Enrichment…

Browse Open Knowledge Datasets

Let us do the “dirty work” for you

We are experts in cleaning data and preparing it for analysis.

Welcome to the Land of Clean Data!

Each dataset goes through 3 levels of quality review

- 2 Manual reviews are done by domain experts
- Then, an automated set of 60+ validations verifies that every datum matches its metadata & defined constraints
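
To make the idea of an automated validation concrete, here is a minimal sketch of one such check. It is not the actual validation suite: the schema, the field names (`provider_id`, `readmission_rate`) and the `validate_row` helper are hypothetical, and only the constraint keys (`required`, `pattern`, `minimum`, `maximum`) are taken from the Frictionless Data Table Schema standard.

```python
import re

# Hypothetical Table Schema descriptor; the field names are illustrative only.
SCHEMA = {
    "fields": [
        {"name": "provider_id", "type": "string",
         "constraints": {"required": True, "pattern": "[0-9]{6}"}},
        {"name": "readmission_rate", "type": "number",
         "constraints": {"minimum": 0, "maximum": 100}},
    ]
}


def validate_row(row, schema):
    """Return a list of constraint violations for one row of raw string values."""
    errors = []
    for field in schema["fields"]:
        name = field["name"]
        rules = field.get("constraints", {})
        raw = row.get(name, "")
        if raw == "":
            # Empty values only violate the schema when the field is required.
            if rules.get("required"):
                errors.append(f"{name}: required value is missing")
            continue
        value = raw
        if field.get("type") == "number":
            try:
                value = float(raw)
            except ValueError:
                errors.append(f"{name}: {raw!r} is not a number")
                continue
        # Table Schema patterns must match the whole value, hence fullmatch.
        if "pattern" in rules and not re.fullmatch(rules["pattern"], raw):
            errors.append(f"{name}: {raw!r} does not match {rules['pattern']}")
        if "minimum" in rules and value < rules["minimum"]:
            errors.append(f"{name}: {value} is below the minimum {rules['minimum']}")
        if "maximum" in rules and value > rules["maximum"]:
            errors.append(f"{name}: {value} is above the maximum {rules['maximum']}")
    return errors


good = validate_row({"provider_id": "010001", "readmission_rate": "15.2"}, SCHEMA)
bad = validate_row({"provider_id": "", "readmission_rate": "150"}, SCHEMA)
print(good)  # no violations expected for the well-formed row
print(bad)   # one message per violated constraint
```

A real pipeline would cover many more constraint types (for example `enum` and `unique`) and report row numbers alongside each message.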

Data is normalized into one unified type system

- All dates, units, codes, and currencies use the same format
- All null values are normalized to the same value
- All dataset and field names are SQL and Hive compliant
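
As an illustration of what this kind of normalization involves, the sketch below maps several spellings of "missing" to a single null and several date layouts to ISO 8601. The page does not say which null representation or date format is actually used, so the token list, the accepted formats, and both function names are assumptions made for the example.

```python
from datetime import datetime

# Assumed spellings of "missing"; a real source may use others.
NULL_TOKENS = {"", "na", "n/a", "null", "none", "not available", "-"}

# Assumed input layouts, tried in order; the output is always ISO 8601.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")


def normalize_null(value):
    """Return None for any known null token, else the trimmed value."""
    trimmed = value.strip()
    return None if trimmed.lower() in NULL_TOKENS else trimmed


def normalize_date(value):
    """Return the date as YYYY-MM-DD, None for nulls, or raise ValueError."""
    trimmed = normalize_null(value)
    if trimmed is None:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(trimmed, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")


print(normalize_date("07/01/2016"))
print(normalize_date("Not Available"))
```

Raising on an unrecognized layout, rather than passing the value through, keeps a malformed date from silently entering a column that is supposed to be uniform.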

Data and Metadata

- Data is available in both CSV and Apache Parquet format, optimized for high read performance on distributed Hadoop, Spark & MPP clusters
- Metadata is provided in the open Frictionless Data standard, and every field in it is normalized & validated

Data Updates

- Data updates support replace-on-update: outdated foreign keys are deprecated, not deleted

Welcome to Expert Curated Data!

Each dataset is manually curated by our team of doctors, pharmacists, public health & medical billing experts

Field names, descriptions, and normalized values are chosen by people who actually understand their meaning

Healthcare & life science experts add categories, search keywords, descriptions and more to each dataset

Both manual and automated data enrichment are supported for clinical codes, providers, drugs, and geo-locations

The data is always kept up to date – even when the source requires manual effort to get updates

Support for data subscribers is provided directly by the domain experts who curated the data sets

Every data source’s license is manually verified to allow for royalty-free commercial use and redistribution

Welcome to Easy to Use Data!

26 out-of-the-box data integrations

Format, Download and Updates

- Read CSV or Parquet data with one-liners from the standard libraries of Python, R, SAS, SPSS, or Spark;
- Full download of data enables you to get the most out of your memory, database, or cluster;
- Subscribe to dataset updates to automate them.
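
For example, reading a downloaded CSV in Python might look like the sketch below. It uses only the standard `csv` module. The column names and values are made up for illustration and stand in for a real file, which you would open with `open(path, newline="")` instead of the in-memory string used here.

```python
import csv
import io

# Stand-in for a downloaded dataset file; columns and values are invented.
SAMPLE_CSV = io.StringIO(
    "State,Measure_Name,Score\n"
    "MD,Example readmission measure,21.4\n"
    "VA,Example readmission measure,22.1\n"
)

# The one-liner: every row becomes a dict keyed by the header names.
rows = list(csv.DictReader(SAMPLE_CSV))

for row in rows:
    # csv returns strings, so numeric columns need an explicit conversion.
    print(row["State"], float(row["Score"]))
```

Python's standard library has no Parquet reader, so the Parquet files need a third-party package such as pyarrow or pandas.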


- 26 out-of-the-box integrations with the world’s most popular analytics tools, via our data.world partnership;
- SQL and SPARQL queries via a web UI or REST API.

Standardized and Complete Schemas

- Need to load 1,000 datasets into a SQL or Hive DB? Create and populate all tables with one script, thanks to the complete & standardized schemas in metadata.
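
A rough sketch of such a script is shown below, assuming each dataset ships with a Frictionless-style descriptor that has a `name` and a `schema` with typed `fields`. The descriptor, the type mapping, and the `create_table_sql` and `load` helpers are hypothetical, and SQLite stands in for whichever SQL or Hive database you actually use.

```python
import sqlite3

# Assumed mapping from Table Schema types to SQLite column types.
TYPE_MAP = {"string": "TEXT", "integer": "INTEGER", "number": "REAL",
            "boolean": "INTEGER", "date": "TEXT"}

# Hypothetical descriptor; real ones would be read from the metadata JSON.
RESOURCE = {
    "name": "readmissions_by_state",
    "schema": {"fields": [
        {"name": "state", "type": "string"},
        {"name": "measure", "type": "string"},
        {"name": "score", "type": "number"},
    ]},
}


def create_table_sql(resource):
    """Build a CREATE TABLE statement from the descriptor's field list."""
    columns = []
    for field in resource["schema"]["fields"]:
        sql_type = TYPE_MAP.get(field["type"], "TEXT")
        columns.append('"' + field["name"] + '" ' + sql_type)
    return 'CREATE TABLE "' + resource["name"] + '" (' + ", ".join(columns) + ")"


def load(conn, resource, rows):
    """Create the table, then insert rows (dicts keyed by field name)."""
    conn.execute(create_table_sql(resource))
    names = [field["name"] for field in resource["schema"]["fields"]]
    placeholders = ", ".join("?" for _ in names)
    insert_sql = 'INSERT INTO "' + resource["name"] + '" VALUES (' + placeholders + ")"
    conn.executemany(insert_sql, ([row[n] for n in names] for row in rows))


conn = sqlite3.connect(":memory:")
# Made-up row, for illustration only.
load(conn, RESOURCE, [{"state": "MD", "measure": "example", "score": 21.4}])
count = conn.execute('SELECT COUNT(*) FROM "readmissions_by_state"').fetchone()[0]
print(create_table_sql(RESOURCE))
print(count)
```

Looping `load` over every descriptor is what turns this into a single script for the whole library. The table and column names can be used unchanged because they are already SQL and Hive compliant.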

Enriched Metadata

- Don’t know the jargon? Our experts curate extra search terms so that you can also find “NPPES” by searching for “all US doctors” or “national providers database”.
- Not sure what the data is about? Metadata is provided as a human-readable PDF in addition to JSON.


What customers are saying

The data sets were clean, easy to access and easy to use. It was a joy to be able to use the data provided.

Eric Rothman, Co-Founder, Threat Sync

The data sets make excellent reference data and are at their most powerful when combined with unstructured data – to bring order to the chaos if you will.

Mark Pinches, Founder, Alderley.ai

The provided data sets were of good quality, clean and ready to use.
The access method was extremely easy to understand, as well as the search engine.

Roxana Radu, Project Manager, The Synergyst

Many people told me the datasets were great and very easy to use.

Jason Jim, HopHacks Organizer

Healthcare Data In Use

CMS Provider Performance Application

The CMS Provider Performance application, developed by John Snow Labs in collaboration with Qlik, allows healthcare organizations to understand readmission rates and benchmark how others in their peer-group compare.

For hospitals, it is essential to analyze and continuously improve the performance of care, the treatment outcomes, and patients’ assessments of their care experience.

With this data, any hospital or service provider in the US healthcare market can see how it performs against regional and national averages.

The Data Packages used for this application are:

Readmissions & Deaths

This data package shows, in a standardized manner, how treatments for ailments compare across US hospitals, using data from the Centers for Medicare & Medicaid Services (CMS). The data is updated annually (July).

  • Readmissions and Deaths by Hospital
  • Readmissions and Deaths by National
  • Readmissions and Deaths by State

Hospital Readmissions Reduction Program

This data tracks the US Government’s program in which payments for services are linked to the quality of hospital care. The data is updated annually.

  • Hospital Excess Readmissions Reduction Program
  • Hospital General Information and Performance Measures Comparison

Medicare Health Outcomes Survey (HOS)

The HOS is the first patient-reported outcomes measure used in Medicare managed care. Through the HOS, Medicare has gathered reliable and clinically meaningful health status data from the Medicare Advantage program to use in quality improvement activities, pay for performance, program oversight, and public reporting, and to improve health.

  • Medicare Health Outcomes Survey 2012 to 2014
  • Medicare Health Outcomes Survey 2013 to 2015