
Generated Text Detection

Python 3.11 · CUDA 12.2

This repository contains the HTTP service for the Generated Text Detector.
To integrate the detector with your project on the SuperAnnotate platform, please follow the instructions in our Tutorial.

How it works

Model

The Generated Text Detection model is based on a fine-tuned RoBERTa Large architecture. Trained on a diverse dataset sourced from multiple open datasets, it excels at classifying text inputs as either generated/synthetic or human-written.
For more details and access to the model, visit its Hugging Face Model Hub page.
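
If you want to call the model directly rather than through the service, a minimal sketch with Hugging Face transformers is below. The model id SuperAnnotate/ai-detector and the sequence-classification head are assumptions; check the Model Hub page for the exact identifier and any custom loading code the checkpoint requires.

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    MODEL_ID = "SuperAnnotate/ai-detector"  # assumed id -- confirm on the Hub page

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()

    text = "Text to classify goes here."
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits

    # Map logits to a generated-text probability; the exact head layout
    # (single-logit vs. two-class) depends on the checkpoint.
    if logits.shape[-1] == 1:
        score = torch.sigmoid(logits)[0, 0].item()
    else:
        score = torch.softmax(logits, dim=-1)[0, 1].item()
    print(f"generated_score = {score:.3f}")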

How to run it

API Service Configuration

You can deploy the service wherever it is convenient; one straightforward option is an EC2 instance. Learn about instance creation and setup here.
Hardware requirements depend on your deployment type. Recommended EC2 instances for deployment type 2:

NOTES:

  • To verify that everything is functioning correctly, call the healthcheck endpoint (see the sketch after this list).
  • Also, ensure that the port on which your service is deployed (8080 by default) is open to the public network. Refer to this tutorial for guidance on opening a port on an EC2 instance.
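
A minimal liveness probe, assuming the service runs on the default port; replace localhost with your instance's address, and note verify=False accounts for the self-signed certificate created below:

    import requests

    # Self-signed certificate from the setup step, hence verify=False
    resp = requests.get("https://localhost:8080/healthcheck", verify=False)
    resp.raise_for_status()
    print(resp.json())  # expected: {"healthy": true}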

General Pre-requirements

  1. Clone this repo and move to the root folder
  2. Create an SSL certificate. Certificates are required to make the connection secure; this is mandatory for integration with the SuperAnnotate platform.
  • Generate a self-signed SSL certificate with the following command: openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365 -nodes
  3. Install the necessary dependencies (see the options below)

As python file

  1. Install requirements: pip install -r generated_text_detector/requirements.txt
  2. Set the Python path variable: export PYTHONPATH="."
  3. Run the API: uvicorn --host 0.0.0.0 --port 8080 --ssl-keyfile=./key.pem --ssl-certfile=./cert.pem generated_text_detector.fastapi_app:app
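
Equivalently, the app can be launched from Python with uvicorn's programmatic API; a sketch mirroring the CLI flags above (the CLI command remains the documented route):

    import uvicorn

    # Same settings as the CLI invocation above
    uvicorn.run(
        "generated_text_detector.fastapi_app:app",
        host="0.0.0.0",
        port=8080,
        ssl_keyfile="./key.pem",
        ssl_certfile="./cert.pem",
    )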

As docker containers

GPU Version

  1. Build image: sudo docker build -t generated_text_detector:GPU -f Dockerfile_GPU .
  2. Run container: sudo docker run --gpus all -e DETECTOR_CONFIG_PATH="etc/configs/detector_config.json" -p 8080:8080 -d generated_text_detector:GPU

CPU Version

  1. Build image: sudo docker build -t generated_text_detector:CPU -f Dockerfile_CPU .
  2. Run container: sudo docker run -e DETECTOR_CONFIG_PATH="etc/configs/detector_config.json" -p 8080:8080 -d generated_text_detector:CPU

Performance

Benchmark

The model was evaluated on a benchmark collected from the same datasets used for training, alongside a closed subset of SuperAnnotate data.
However, there are no direct intersections of samples between the training data and the benchmark.
The benchmark comprises 1k samples, with 200 samples per category.
The model's performance is compared with open-source solutions and popular API detectors in the table below:

Model/API             Wikipedia   Reddit QA   SA instruction   Papers   Average
Hello-SimpleAI        0.97        0.95        0.82             0.69     0.86
RADAR                 0.47        0.84        0.59             0.82     0.68
GPTZero               0.72        0.79        0.90             0.67     0.77
Originality.ai        0.91        0.97        0.77             0.93     0.89
LLM content detector  0.88        0.95        0.84             0.81     0.87

Time performance

Two inference modes are available: CPU and GPU. The table below shows the time performance of the service deployed in each mode.

Method   RPS
GPU      10
CPU      0.9

*In this test, request texts averaged 500 tokens
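
A rough way to reproduce these figures is to fire concurrent requests at /detect and divide by wall-clock time. A sketch, assuming a local deployment and placeholder sample text:

    import time
    from concurrent.futures import ThreadPoolExecutor

    import requests

    URL = "https://localhost:8080/detect"   # placeholder host/port
    PAYLOAD = {"text": "some text " * 100}  # roughly a few hundred tokens

    def call(_):
        # verify=False because of the self-signed certificate
        return requests.post(URL, json=PAYLOAD, verify=False).status_code

    N = 50
    start = time.time()
    with ThreadPoolExecutor(max_workers=8) as pool:
        statuses = list(pool.map(call, range(N)))
    elapsed = time.time() - start
    print(f"{N / elapsed:.2f} requests/second; status codes: {set(statuses)}")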

Endpoints

The following endpoints are available in the Generated Text Detection service:

  • GET /healthcheck:

    • Summary: Ping
    • Description: Liveness check
    • Input Type: None
    • Output Type: JSON
    • Output Values:
      • {"healthy": True}
    • Status Codes:
      • 200: Successful Response
  • POST /detect:

    • Summary: Main detection endpoint
    • Description: Detects generated text and returns a report with a generated score and a predicted author
    • Input Type: JSON with a string field text
    • Input Value Example: {"text": "some text"}
    • Output Type: JSON with two fields:
      • generated_score: a float value from 0 to 1
      • author: one of the following string values:
        • LLM Generated
        • Probably LLM Generated
        • Not sure
        • Probably human written
        • Human
    • Output Value Example:
      • {"generated_score": 0, "author": "Human"}
    • Status Codes:
      • 200: Successful Response
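
For reference, a minimal Python client for the /detect endpoint; the host and port are placeholders, and verify=False matches the self-signed certificate from the setup steps:

    import requests

    resp = requests.post(
        "https://localhost:8080/detect",  # placeholder host/port
        json={"text": "some text"},
        verify=False,                     # self-signed certificate
    )
    resp.raise_for_status()
    report = resp.json()
    print(report["generated_score"], report["author"])
    # e.g. 0.0 Human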