Simple
Generate & run over 50 test types on the most popular NLP libraries & tasks with 1 line of code
Comprehensive
Test all aspects of model quality – robustness, bias, fairness, representation, accuracy – before going to production
100% Open Source
The full code base is open under the Apache 2.0 license, designed for easy extension and AI community collaboration
50+ Out-of-The-Box Test Types
Robustness
This movie was beyond horrible → NEGATIVE
This mvie wsa beyond hroieble → NEUTRAL
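The pair of sentences above illustrates a robustness test: the input is perturbed with typos and the prediction is expected to stay the same. A minimal sketch of that idea follows, in plain Python. The helper names `add_typos`, `is_robust`, and `toy_predict` are hypothetical and are not part of the nlptest API; the toy classifier only stands in for a real model.

```python
import random


def add_typos(text, rate=0.3, seed=0):
    """Swap two adjacent inner letters in a random subset of words.

    Hypothetical helper for illustration; not the library's own perturbation.
    """
    rng = random.Random(seed)
    words = []
    for word in text.split():
        # Only touch words long enough to keep first and last letters fixed
        if len(word) > 3 and rng.random() < rate:
            i = rng.randrange(1, len(word) - 2)
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        words.append(word)
    return " ".join(words)


def is_robust(predict, text, perturb):
    """Return True when the prediction is unchanged on the perturbed text."""
    return predict(text) == predict(perturb(text))


def toy_predict(text):
    # Stand-in classifier (assumption): keys on one exact word, so typos break it
    return "NEGATIVE" if "horrible" in text else "NEUTRAL"


original = "This movie was beyond horrible"
# Reuse the perturbed sentence shown on the page as a fixed perturbation
passed = is_robust(toy_predict, original, lambda _: "This mvie wsa beyond hroieble")
print(passed)  # False: the label flips from NEGATIVE to NEUTRAL
```

A real test suite would run many such perturbed inputs and report the fraction whose prediction is unchanged.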





Fairness
Coverage
She's a massive fan of football (SPORT)
She's a massive fan of cricket (ANIMAL)
Age Bias
An old man with Parkinson's (DISEASE)
A young man with Parkinson's (OTHER)
Origin Bias
The company's CEO is British → NEUTRAL
The company's CEO is Syrian → NEGATIVE





Ethnicity Bias
Jonas Smith is flying tomorrow → NEUTRAL
Abdul Karim is flying tomorrow → NEGATIVE
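The origin and ethnicity examples above follow one pattern: a single attribute (a nationality or a name) is swapped in an otherwise identical sentence, and any change in the prediction points to bias. The sketch below shows that pattern under stated assumptions. The function `check_invariance` and the deliberately biased `toy_predict` are hypothetical and are not taken from nlptest.

```python
def check_invariance(predict, template, values):
    """Fill the template with each value and compare the predictions.

    Returns the per-value predictions and whether they all agree.
    """
    predictions = {value: predict(template.format(value=value)) for value in values}
    invariant = len(set(predictions.values())) == 1
    return predictions, invariant


def toy_predict(text):
    # Deliberately biased stand-in model (assumption, for illustration only)
    return "NEGATIVE" if "Syrian" in text else "NEUTRAL"


predictions, invariant = check_invariance(
    toy_predict,
    "The company's CEO is {value}",
    ["British", "Syrian"],
)
print(predictions)  # {'British': 'NEUTRAL', 'Syrian': 'NEGATIVE'}
print(invariant)    # False: the prediction depends on the swapped attribute
```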





Accuracy
Gender Representation
Data Leakage
Write Once, Test Everywhere
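from nlptest import Harness
# The same Harness interface wraps models from different libraries; pick one:
h = Harness(model='ner', hub='johnsnowlabs')                    # John Snow Labs
h = Harness(model='dslim/bert-base-NER', hub='transformers')    # Hugging Face Transformers
h = Harness(model='en_core_web_sm', hub='spacy')                # spaCy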
Auto-Generate Test Cases
h.generate().run().report()
Category | Test Type | Pass Rate | Minimum Pass Rate | Pass |
---|---|---|---|---|
Robustness | Add Typos | 0.50 | 0.65 | False |
Bias | Ethnicity | 0.85 | 0.75 | True |
Representation | Gender | 0.80 | 0.75 | True |
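The report compares each test type's pass rate with its minimum pass rate. A rough sketch of how such a summary could be computed from individual test results is shown here. The `summarize` function, the tuple layout, and the field names are assumptions for illustration, not the library's internal implementation.

```python
def summarize(results, min_pass_rates):
    """Aggregate per-test results into report rows.

    results: list of (category, test_type, passed) tuples (assumed layout).
    min_pass_rates: dict mapping test_type to its required pass rate.
    """
    counts = {}
    for category, test_type, passed in results:
        total, ok = counts.get((category, test_type), (0, 0))
        counts[(category, test_type)] = (total + 1, ok + int(passed))

    report = []
    for (category, test_type), (total, ok) in counts.items():
        rate = ok / total
        minimum = min_pass_rates[test_type]
        report.append({
            "category": category,
            "test_type": test_type,
            "pass_rate": rate,
            "minimum_pass_rate": minimum,
            # A test type passes when its rate meets or exceeds the minimum
            "pass": rate >= minimum,
        })
    return report


results = [
    ("Robustness", "Add Typos", True),
    ("Robustness", "Add Typos", False),
]
rows = summarize(results, {"Add Typos": 0.65})
print(rows[0]["pass_rate"], rows[0]["pass"])  # 0.5 False
```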
Auto-Correct Models with Data Augmentation
h.augment(input='training_dataset', output='augmented_dataset')
new_model = nlp.fit('augmented_dataset')
Harness(model=new_model, hub='johnsnowlabs').load('testcases.csv').run()
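Conceptually, augmentation of this kind adds perturbed copies of training samples so that a retrained model sees the variations it previously failed on. The sketch below illustrates the idea in plain Python. The `augment` function, its `proportion` parameter, and the upper-casing perturbation are hypothetical choices and do not describe what `h.augment` does internally.

```python
import random


def augment(samples, perturb, proportion=0.5, seed=0):
    """Return the original samples plus perturbed copies of a random subset.

    samples: list of (text, label) pairs; labels are kept unchanged.
    """
    rng = random.Random(seed)
    k = round(len(samples) * proportion)
    chosen = rng.sample(samples, k)
    return samples + [(perturb(text), label) for text, label in chosen]


training = [
    ("This movie was beyond horrible", "NEGATIVE"),
    ("A wonderful film", "POSITIVE"),
    ("It was fine", "NEUTRAL"),
    ("Terrible acting", "NEGATIVE"),
]
# Upper-casing stands in for any perturbation (typos, name swaps, ...)
augmented = augment(training, str.upper, proportion=0.5)
print(len(augmented))  # 6: four originals plus two perturbed copies
```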
Before
Category | Test Type | Pass |
---|---|---|
Robustness | Add Typos | |
Bias | Ethnicity | |
Representation | Gender |
After
Category | Test Type | Pass |
---|---|---|
Robustness | Add Typos | |
Bias | Ethnicity | |
Representation | Gender |
Integrate Testing into CI/CD or MLOps
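from metaflow import FlowSpec, step
from nlptest import Harness

class DataScienceWorkFlow(FlowSpec):

    @step
    def train(self): ...

    @step
    def run_tests(self):
        # Generate and run the test suite against the trained model
        harness = Harness(model=self.model, data="data.csv")
        self.report = harness.generate().run().report()

    @step
    def deploy(self):
        # Deploy only when the test score clears the threshold
        if self.report["score"] > self.test_threshold: ...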