
State of the Art Natural Language Processing

John Snow Labs' Spark NLP is an open source text processing library for Python, Java, and Scala. It provides production-grade, scalable, and trainable versions of the latest research in natural language processing.
Most Widely Used in the Enterprise

Widely deployed production-grade codebase.

New releases every 2 weeks since 2017.

Growing community.

State of the Art

First production-grade versions of novel deep learning NLP research.

Fine-tune pre-trained models to fit your data.

Speed & Scale

Spark NLP trained 80x faster than spaCy when run locally on 2.6 MB of data.

Scale to a Spark cluster with zero code changes.

The most widely used NLP library in the Enterprise, by far

Why Spark NLP?


Spark NLP achieved the best accuracy on multiple public academic benchmarks.

To the left are F1 scores for the Named Entity Recognition task on the CoNLL 2003 dataset.
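For readers unfamiliar with the metric, here is a minimal sketch of how an entity-level F1 score is commonly computed for Named Entity Recognition: a predicted entity counts as correct only when its span and label both match a gold entity exactly. The `entity_f1` helper and the sample entities are made up for illustration; this is not the evaluation script behind the scores referenced above.

```python
from typing import Set, Tuple

# An entity is (sentence_index, start_token, end_token, label).
# All values below are invented for illustration.
Entity = Tuple[int, int, int, str]


def entity_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using exact span-and-label matching."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = {(0, 0, 1, "PER"), (0, 4, 4, "ORG"), (1, 2, 2, "LOC"), (1, 5, 6, "MISC")}
# The second prediction has the right span but the wrong label, so it does not count.
predicted = {(0, 0, 1, "PER"), (0, 4, 4, "LOC"), (1, 2, 2, "LOC")}

precision, recall, f1 = entity_f1(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# prints: precision=0.667 recall=0.500 f1=0.571
```

Because F1 is the harmonic mean of precision and recall, a model cannot score well by over-predicting entities or by predicting only the easy ones.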


Zero code changes are needed to scale a pipeline to any Spark cluster.
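As a rough sketch of what this can look like in Python, the pipeline definition below makes no reference to where it runs. It assumes the `spark-nlp` and `pyspark` packages are installed; the function names `build_pipeline` and `run_locally` and the sample sentence are illustrative, not taken from this page.

```python
def build_pipeline():
    """Assemble a small Spark NLP pipeline: document -> sentences -> tokens."""
    # Imported lazily so this module can be loaded on a machine without Spark.
    from pyspark.ml import Pipeline
    from sparknlp.annotator import SentenceDetector, Tokenizer
    from sparknlp.base import DocumentAssembler

    document = DocumentAssembler().setInputCol("text").setOutputCol("document")
    sentences = SentenceDetector().setInputCols(["document"]).setOutputCol("sentence")
    tokens = Tokenizer().setInputCols(["sentence"]).setOutputCol("token")
    return Pipeline(stages=[document, sentences, tokens])


def run_locally():
    """Fit and apply the pipeline on a one-row DataFrame in a local session."""
    import sparknlp

    # sparknlp.start() creates a local SparkSession. On a cluster, the session
    # would typically be provided by the environment (an assumption here).
    spark = sparknlp.start()
    data = spark.createDataFrame(
        [["Spark NLP runs on Spark. It scales with the cluster."]]
    ).toDF("text")
    model = build_pipeline().fit(data)
    model.transform(data).select("token.result").show(truncate=False)
```

Calling `run_locally()` exercises the pipeline on a single machine. Under the assumption above, the same `build_pipeline()` would be reused unchanged on a cluster, with only the source of the Spark session differing.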


Optimized builds for the latest chips from Intel (CPU), Nvidia (GPU), Apple (M1/M2), and AWS (Graviton) enable the fastest training & inference of state-of-the-art models.

This benchmark compares the speed of image transformer inference on the 34k ImageNet dataset on a single machine. Spark NLP is 34% faster than Hugging Face when running on a single CPU, and 51% faster on a single GPU.

Out Of The Box Functionality

Entity Recognition
Split Text
  • Sentence Detector
  • Deep Sentence Detector
  • Tokenizer
  • nGram Generator
Understand Grammar
  • Stemmer
  • Lemmatizer
  • Part of Speech Tagger
  • Dependency Parser
Information Extraction
Clean Text
  • Spell Checking
  • Spell Correction
  • Normalizer
  • Stopword Cleaner
Find in Text
  • Text Matcher
  • Regex Matcher
  • Date Matcher
  • Chunker
Sentiment Analysis
Information Extraction

Trainable to understand your language

Spark NLP is optimized for training domain-specific NLP models, so you can adapt it to learn the nuances of jargon and documents you must support.

Run privately on an optimized build

Spark NLP always runs on your infrastructure. It requires no data sharing, no admin privileges, no calls to a cloud API, and not even an Internet connection.

Optimized builds ensure you get the most out of whichever hardware you use.

Introducing Spark NLP at Top Level AI Conferences

Spark NLP: How Roche automates knowledge extraction from pathology and radiology reports

Spark NLP in action: Intelligent, high-accuracy fact extraction from long financial documents

Spark NLP in action: How SelectData uses AI to better understand home health patients
