Spark NLP Community
Free & Open Source
State-of-the-art natural language processing for Python, Java, or Scala
  • Named entity recognition
  • Document classification
  • Sentiment Analysis
  • Text mining
  • 20+ other text mining annotators
  • 100+ pre-trained models
  • 45+ supported languages
  • Train your own NLP models
  • Build your own NLP pipelines
  • Run locally or on any Spark cluster
  • CPU & GPU optimized builds
  • Keep your data & models private
Spark NLP for Healthcare
Per Server / Per Year
State-of-the-art clinical & biomedical natural language processing
  • Clinical entity recognition
  • Clinical entity linking
  • Assertion status detection
  • Relation extraction
  • + everything in the community edition
  • + 50 pre-trained models including drugs, diseases, anatomy, treatments, demography, vitals, labs & more
  • Map entity to standard terminologies or train to fit your own
  • Designed for processing PHI: No data sharing or Internet connection needed
Spark NLP & OCR
Per Server / Per Year
Scalable, private, and highly accurate OCR and de-identification
  • Everything in the community edition +
  • Optical character recognition
  • Image enhancement & pre-processing
  • Annotate & generate PDF documents
  • De-identify tables, free text & images
  • 15+ image processing algorithms for OCR from low-quality documents
  • Support text, PDF, image & DICOM files
  • Identify image regions & coordinates
  • Identify sensitive data based on HIPAA, CCPA, GDPR, or custom requirements
  • De-identify fields by deletion, masking, obfuscation, or generalization



Free, forever, unlimited, for personal and commercial use. Spark NLP is released under an Apache 2.0 open-source license – including the pre-trained models and documentation.

Each license includes the software libraries in all supported languages, the pre-trained models that are included with it, premium support, and all updates to the software & models that are released during the subscription period.

Spark NLP for Healthcare and Spark NLP & OCR are licensed as annual subscriptions, payable once a year in full. There are two license types: Per Server, which allows use of the software on one machine; and Per Cluster, which allows use of the software on one Apache Spark cluster of any size.

No. The only limitation is that each license allows using the software on one server or one cluster, based on the license type you choose.

The software will stop processing documents, for both training and inference. If you choose to buy a license, we will provide you with new credentials that reactivate it. Otherwise, you must uninstall the software. In either case, data you have already processed is yours to keep.

Running the Software

Python, Java, and Scala.

Spark 2.3.x and 2.4.x.

We officially support AWS, Azure, Databricks, Cloudera, and GCP.

Yes. Spark NLP is used heavily in high-compliance industries like healthcare, life science, finance, and insurance where on-premise deployments are common. Most single-machine, Spark, Hadoop, and Kubernetes distributions are supported.

Yes. Make sure to allocate enough memory & compute power for your use case.

This depends heavily on your use case. For training custom models based on the BERT family of embeddings, at least 8 cores and 64 GB of memory are recommended. For inference, as little as 1 core and 8 GB may be enough. Using GPUs will usually provide faster execution at a higher cost.


The cost depends on which edition you need (Healthcare or OCR), the type of license (per server or per cluster), the level of support (8x5 or 24x7), and the number of licenses you need. Please email us with those details and we’ll reply with an exact quote.

Online bank transfers (ACH or wire), checks, and all major credit cards.

Yes! Please email us to describe your situation and needs.


No. You install and run the software on your infrastructure. The software does not “call home” and no data or results are sent to John Snow Labs.

You do. We will never even see them.

This is not a SaaS solution – instead, you run the software on your infrastructure. Nothing ever gets sent to John Snow Labs or another third party. Spark NLP is designed for high-compliance, locked-down environments.

No, not after the initial installation and download of pre-trained models.



Yes. Spark NLP is designed to enable you to train & tune your own models for most tasks.

The full list is available here. Expect the list to keep growing over time.


Email support@johnsnowlabs.com, call us at +1-302-786-5227, or start a chat on spark-nlp.slack.com. Paying customers get a private Slack channel where they can ask questions privately.

Same-business-day 8x5 support is included with all subscriptions. We can also provide 24x7 support for production systems – please email us if you require it.

Yes. Spark NLP in Action includes links to runnable Google Colab notebooks in Python.