John Snow Labs' Spark NLP is an open-source text processing library for Python and Scala, built on top of Apache Spark and TensorFlow. It provides production-grade versions of the latest research in natural language processing, raising the bar on accuracy, speed, and scalability.

Unmatched Speed & Scale

Spark NLP trained 80x faster than spaCy when run locally on 2.6 MB of data.

Scale to a Spark cluster with zero code changes.

State-of-the-art accuracy

First production-grade versions of novel deep learning NLP research.

Start from pre-trained models and train them further to fit your data.

Most widely used in the enterprise

Widely deployed production-grade codebase.

New releases every 2 weeks since 2017.

Growing community.

Spark NLP 2.0 obtained the best-performing results in academic, peer-reviewed evaluations.


Spark NLP is 1-2 orders of magnitude faster than spaCy when training NLP models locally.


Zero code changes are needed to scale a pipeline to any Spark cluster.
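
To make the scaling claim concrete, here is a minimal sketch in Python. It assumes the pipeline is defined once and that only the Spark master URL differs between a laptop and a cluster. The environment variables `SPARK_MASTER` and `SPARK_NLP_PACKAGE`, the helper names, and the session settings are illustrative assumptions, not part of Spark NLP; check the installation guide for the session configuration your version needs.

```python
import os


def resolve_master(default="local[*]"):
    """Return the Spark master URL from the hypothetical SPARK_MASTER variable."""
    return os.environ.get("SPARK_MASTER", default)


def build_pipeline():
    """Define the NLP pipeline once; nothing here depends on where Spark runs."""
    # Imported lazily so this module can be loaded where Spark is not installed.
    # Call this only after a SparkSession exists.
    from pyspark.ml import Pipeline
    from sparknlp.annotator import SentenceDetector, Tokenizer
    from sparknlp.base import DocumentAssembler

    document = DocumentAssembler().setInputCol("text").setOutputCol("document")
    sentences = SentenceDetector().setInputCols(["document"]).setOutputCol("sentence")
    tokens = Tokenizer().setInputCols(["sentence"]).setOutputCol("token")
    return Pipeline(stages=[document, sentences, tokens])


def main():
    from pyspark.sql import SparkSession

    # Assumption: SPARK_NLP_PACKAGE holds the Maven coordinate of the Spark NLP
    # package that matches your Spark and Scala versions (KeyError if unset).
    spark = (
        SparkSession.builder
        .appName("spark-nlp-scaling-sketch")
        .master(resolve_master())  # the only setting that changes per environment
        .config("spark.jars.packages", os.environ["SPARK_NLP_PACKAGE"])
        .getOrCreate()
    )

    data = spark.createDataFrame(
        [("Spark NLP runs on Apache Spark. The same pipeline runs anywhere.",)],
        ["text"],
    )
    model = build_pipeline().fit(data)
    model.transform(data).select("token.result").show(truncate=False)
    spark.stop()
```

Calling `main()` with `SPARK_MASTER` unset runs the pipeline on all local cores. Pointing the variable at a cluster master runs the same `build_pipeline()` code there, with the pipeline definition untouched.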

Trainable to understand your language

Spark NLP is optimized for training domain-specific NLP models, so you can adapt it to learn the nuances of jargon and documents you must support.

We all speak many languages…

