NLP Case Studies


Many critical facts required by healthcare AI applications like patient risk prediction, cohort selection and clinical decision support are locked in unstructured free-text data. Recent advances in deep learning have raised the bar on achievable accuracy for tasks like biomedical named entity recognition, assertion status detection, entity resolution, de-identification and others. This case study presents the first industrial-grade implementation of these new results and its application at scale.


Roche is the world’s #1 company for in-vitro diagnostics and its medicines are used to treat over 130 million people each year. It’s building a clinical decision support product portfolio, starting with oncology. Roche is using Spark NLP for Healthcare to extract clinical facts from pathology and radiology reports. The case study covers the design of the deep learning pipelines used to simplify training, optimization, and inference of such domain-specific models at scale.

Roche applies Spark NLP for Healthcare to extract clinical facts from pathology and radiology reports, and to simplify training, optimization, and inference of such domain-specific models at scale.

Data scientist for diagnostic information solutions at Roche


SelectData provides clinical coding, audit, and revenue cycle management services to the home health and hospice industry. Automating parts of the coding workflow – from diagnosis and medication extraction to coder assignment – required a deep understanding of a variety of noisy, long, scanned, free-text patient records and reports. It also required domain expertise, since the context, vocabulary, and meaning of the text are healthcare- and specialty-specific.


Spark NLP for Healthcare was used to provide accurate, scalable, and healthcare-specific pipelines for OCR, sentence segmentation, spell checking, biomedical named entity recognition, assertion status (negation) detection, and entity resolution (to ICD and NDC codes). John Snow Labs’ AI Platform was used to develop, deploy, and operate the custom models within the required privacy, security, compliance, and scalability environment.
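To make the staged design concrete, here is a minimal sketch in plain Python of how sentence segmentation, entity recognition, negation detection, and code resolution can be chained. It is not the Spark NLP for Healthcare API and not SelectData's implementation: the names `TOY_ICD10`, `NEGATION_CUES`, `Fact`, `split_sentences`, and `extract_facts` are hypothetical, the lexicon holds only two entries, and dictionary lookup stands in for the trained models a production pipeline would use.

```python
import re
from dataclasses import dataclass

# Toy lexicon (assumption): two surface terms mapped to their ICD-10 codes.
TOY_ICD10 = {"hypertension": "I10", "type 2 diabetes": "E11.9"}

# Simplified negation cues (assumption): a cue before the term negates it.
NEGATION_CUES = re.compile(r"\b(?:no|denies|without)\b")


@dataclass
class Fact:
    term: str      # matched surface term
    code: str      # resolved ICD-10 code from the toy lexicon
    negated: bool  # True if a negation cue precedes the term


def split_sentences(text):
    """Naive sentence segmentation on terminal punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def extract_facts(text):
    """Chain segmentation, dictionary lookup, negation check, and resolution."""
    facts = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        for term, code in TOY_ICD10.items():
            idx = lowered.find(term)
            if idx == -1:
                continue
            # Only text before the term, within the same sentence, is checked.
            negated = bool(NEGATION_CUES.search(lowered[:idx]))
            facts.append(Fact(term, code, negated))
    return facts


note = "Patient has hypertension. Denies type 2 diabetes."
facts = extract_facts(note)
for fact in facts:
    print(fact)
```

Keeping each stage as a separate function mirrors the pipeline idea: any single step, such as the negation check, can be replaced with a stronger model without touching the others.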

Spark NLP augments the SelectData Data Science Platform to extract fuzzy, implied, and complex facts from home health patient records.

Chief information officer at SelectData


Answering questions accurately based on information from financial documents, which can be a hundred or more pages long, is a challenge even for human domain experts. While traditional rule-based or expression-matching techniques work for simple fields in templated documents, it is harder to infer facts based on implied statements, on the absence of certain statements, or on the combination of other facts. Answering such questions at a very high level of accuracy requires state-of-the-art deep learning techniques applied to NLP.
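The gap between templated fields and implied facts can be illustrated with a small sketch. The field label, the sample strings, and the helper `extract_rate` are all hypothetical and chosen only for illustration. A single pattern recovers the value when the document follows the template, and finds nothing when the same fact is stated indirectly.

```python
import re

# Hypothetical templated field of the form "Interest Rate: 4.25%".
RATE_PATTERN = re.compile(r"Interest Rate:\s*(\d+(?:\.\d+)?)\s*%")


def extract_rate(text):
    """Return the rate as a float if the templated field is present, else None."""
    match = RATE_PATTERN.search(text)
    return float(match.group(1)) if match else None


# Made-up inputs: one templated, one stating the same kind of fact indirectly.
templated = "Loan Amount: $250,000\nInterest Rate: 4.25%\n"
implied = "The borrower pays interest at one quarter point above the 4% base rate."

print(extract_rate(templated))  # the pattern matches the templated field
print(extract_rate(implied))    # no templated field, so nothing is extracted
```

The second input still carries a rate, but recovering it means combining two statements, which is the kind of inference expression matching cannot perform.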

Spark NLP was used to augment the UiPath smart data extraction platform in order to automatically infer fuzzy, implied, and complex facts from long financial documents. This case study covers the technical challenges, the architecture of the full solution, and lessons learned that you can directly apply to your next data extraction project.

UiPath is excited to support this technology partnership and the seamless integration of John Snow Labs’ state-of-the-art NLP technology inside UiPath Activities. The joint capability is already providing value to business customers and is broadly applicable.

Senior Manager for partnerships and alliances at UiPath