AI API Reference & Documentation
Complete reference for Clavrit's production-ready computer vision APIs — Detection, Tracking, OCR, Human Activity Recognition, and Best Frame Selection, all hosted at aimodels.clavrit.com.
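Authentication: send your key in the X-API-Key header. Upload endpoints expect this header; health-check requirements vary by API (see each section).
Health check: call the GET /health endpoint on any API to confirm the service is running before uploading media.
Uploads: send multipart/form-data POST requests with your image or video in the file field (or the video field for Best Frame Selection).
Best Frame Selection API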
Extract the top 3 highest-quality frames from a video using sharpness detection, motion analysis, and noise filtering — with async processing and job tracking.
Overview
The Best Frame Selection API is a production-grade video processing service built with FastAPI. It samples frames from an uploaded video, scores each sampled frame for sharpness, motion, and noise, and returns the three highest-quality frames, so only the clearest, most information-rich frames come back.
Uploads are processed asynchronously in the background: the upload request returns a job ID immediately, so large files and many concurrent uploads do not block the API. This makes the service a good fit for content moderation, video summarization, surveillance analytics, and ML preprocessing workflows.
Key Features
- Asynchronous background processing: returns a job_id immediately, without blocking
- Sharpness, motion, and noise quality scoring per sampled frame
- Supports MP4, AVI, MOV, MKV, WebM input formats
- Returns top 3 frames as JPEG/PNG with frame position and timestamp metadata
- Configurable file size limits and frame selection strategies via environment variables
- Built-in health monitoring endpoint for load balancers and orchestrators
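This document does not say how sharpness is measured or how the three frames are ranked. As a rough illustration of per-frame sharpness scoring followed by top-3 selection, the sketch below uses the variance of the Laplacian, a common sharpness proxy (blurry frames have weak edges and therefore low variance). The function names and the nested-list grayscale frame format are assumptions for illustration, and motion and noise scoring are deliberately left out.

```python
import heapq
from typing import List, Sequence, Tuple

Frame = List[List[float]]  # hypothetical format: grayscale rows of equal length


def laplacian_variance(frame: Frame) -> float:
    """Sharpness proxy: variance of the 4-neighbour Laplacian over interior pixels."""
    h, w = len(frame), len(frame[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (frame[y - 1][x] + frame[y + 1][x]
                   + frame[y][x - 1] + frame[y][x + 1] - 4 * frame[y][x])
            responses.append(lap)
    if not responses:  # frame too small to have interior pixels
        return 0.0
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def top_k_sharpest(frames: Sequence[Frame], k: int = 3) -> List[Tuple[int, float]]:
    """Return (frame_index, score) pairs for the k sharpest frames, best first."""
    scored = ((i, laplacian_variance(f)) for i, f in enumerate(frames))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

A perfectly flat frame scores 0.0, while a frame full of hard edges scores high. In production code the same measure is usually computed with an image library rather than pure Python loops.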
Authentication & Rate Limiting
Authentication is controlled by the API_KEYS environment variable on the server. If API_KEYS is not set, the API allows unrestricted access. When it is set, every request except the health check must include a valid key in this header:
X-API-Key: <your-secret-key>
An invalid or missing key returns 401 Unauthorized.
| Method | Rate Limit | Scope |
|---|---|---|
| GET | 60 requests / minute | Per API key |
| POST | 10 requests / minute | Per API key |
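To stay under the POST limit of 10 requests per minute, a client can throttle itself before it ever receives a 429. The sketch below is a minimal sliding-window limiter; it assumes the server counts requests over a rolling 60-second window, which this document does not confirm (the server might use fixed windows instead). The class name is hypothetical, and the injectable clock exists only to make it testable.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Client-side limiter: at most `limit` calls in any rolling `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.calls: Deque[float] = deque()  # timestamps of recent calls

    def _evict(self, now: float) -> None:
        # Forget calls that have slid out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits in the window, else False."""
        now = self.clock()
        self._evict(now)
        if len(self.calls) < self.limit:
            self.calls.append(now)
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until the next call would be allowed (0.0 if allowed now)."""
        now = self.clock()
        self._evict(now)
        if len(self.calls) < self.limit:
            return 0.0
        return self.window - (now - self.calls[0])
```

For example, create post_limiter = SlidingWindowLimiter(limit=10) and, before each upload, call time.sleep(post_limiter.wait_time()) until post_limiter.try_acquire() returns True. Use a separate instance with limit=60 for GET requests.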
Endpoints
All endpoints are prefixed with /api/. An interactive Swagger UI explorer is available at /docs when the server is running.
Liveness and readiness check for load balancers, monitoring tools, and container orchestrators. No authentication required.
Response
{
"status": "ok"
}
Upload a video file for processing. This starts asynchronous frame extraction and immediately returns a job_id that you use to poll for status.
Request
POST /best_frame_selection/api/best-frame/
Content-Type: multipart/form-data
X-API-Key: <your-api-key>
// Form field
video: <video-file>
Request Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| video | binary | Required | Video file (MP4, AVI, MOV, MKV, WebM) |
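The upload can be sent with the Python standard library alone. The sketch below encodes a single-part multipart/form-data body by hand and attaches the X-API-Key header. The helper name and the generic application/octet-stream part type are assumptions, and the request is only built here, not sent. Because only the form field name differs, the same helper should also fit the other upload endpoints with field="file".

```python
import urllib.request
import uuid
from typing import Optional


def build_upload_request(url: str, filename: str, data: bytes,
                         field: str = "video",
                         api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST with one file part."""
    boundary = uuid.uuid4().hex  # random boundary that will not appear in headers
    # Note: filename is inserted verbatim; escape quotes for untrusted names.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    if api_key:
        headers["X-API-Key"] = api_key
    return urllib.request.Request(url, data=head + data + tail,
                                  headers=headers, method="POST")
```

For example, assuming the service is reachable over HTTPS, pass "https://aimodels.clavrit.com/best_frame_selection/api/best-frame/" as the URL, send the result with urllib.request.urlopen, and read job_id and status_url from json.load on the response.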
Response
{
"status": "accepted",
"job_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"status_url": "/api/best-frame/a1b2c3d4-e5f6-7890-abcd-ef1234567890/status"
}
Poll the status_url to track processing. When the job completes, the status response includes the three extracted frame URLs along with frame position and timestamp metadata.
Error Handling
| Code | Error | Description | Common Causes | Fix |
|---|---|---|---|---|
| 400 | Bad Request | Invalid or malformed request | Unsupported file type or file exceeds limit | Check file type and size |
| 401 | Unauthorized | Authentication failed | Missing or invalid X-API-Key header | Add a valid X-API-Key |
| 422 | Unprocessable | Missing or incorrect request data | No file attached, wrong field name | Use field name video |
| 429 | Too Many Requests | Rate limit exceeded | More than 10 POST/min per key | Reduce request frequency |
| 500 | Server Error | Unexpected server failure | Corrupt video, model crash, disk error | Retry or contact support |
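Polling status_url should respect the error table above: back off on 429 and 5xx, and stop on errors that a retry will not fix. The status response schema is not documented here, so the sketch below assumes the body has a status field that eventually reads "completed" or "failed". Those values and the poll_job name are hypothetical, so check /docs for the real schema. The fetch function is injected so the polling logic can be tested without a network.

```python
import time
from typing import Any, Callable, Dict, Tuple

# fetch() performs one GET on status_url and returns (HTTP status code, parsed JSON body).
Fetch = Callable[[], Tuple[int, Dict[str, Any]]]

TERMINAL_STATES = {"completed", "failed"}  # hypothetical values; confirm in /docs


def poll_job(fetch: Fetch, interval: float = 2.0, max_attempts: int = 30,
             max_delay: float = 60.0,
             sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll a Best Frame job until it reaches a terminal state."""
    delay = interval
    for _ in range(max_attempts):
        code, body = fetch()
        if code == 429 or code >= 500:
            # Rate-limited or server error: retry with exponential backoff.
            delay = min(delay * 2, max_delay)
        elif code == 200:
            if body.get("status") in TERMINAL_STATES:
                return body
            delay = interval  # still running: poll at the normal pace
        else:
            # 400/401/422 and similar will not succeed on retry.
            raise RuntimeError(f"status check failed with HTTP {code}: {body}")
        sleep(delay)
    raise TimeoutError(f"job not finished after {max_attempts} polls")
```

A real fetch could wrap urllib.request.urlopen and catch urllib.error.HTTPError, whose code attribute carries the status for 4xx and 5xx responses.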
Detection API
AI-powered object detection for images and videos. Detects helmets, license plates, and 83 object classes. Returns bounding boxes, category labels, and confidence scores as structured JSON.
Overview
The Detection API is a high-performance, AI-powered computer vision system that detects a wide range of objects, including helmets worn by riders, motorcycle license plates, car license plates, and 83 standard object classes. The service is built on FastAPI, runs state-of-the-art object detection models, and returns all results as structured JSON.
Use Cases
| Domain | Application |
|---|---|
| Traffic Monitoring | Automated helmet and license plate detection at road intersections |
| Law Enforcement | Identifying compliance violations from camera feeds in real time |
| Smart City Analytics | Data collection for urban mobility and road safety insights |
| Surveillance | Smart camera integration for continuous area monitoring |
Key Outputs per Detection
- Bounding box coordinates [x1, y1, x2, y2] per detected object
- Object category / class label (e.g., "helmet", "car")
- Confidence score (0.0–1.0)
- Per-frame results when processing video inputs
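Given the response layout documented for POST /detection/detect/media (a data array of frames, each with a detections list), a client can flatten and filter the results in a few lines. The sketch below keeps detections at or above a confidence threshold and derives box width and height from the [x1, y1, x2, y2] corners. The helper name and the 0.5 default are arbitrary illustrative choices. It assumes image responses share the same data layout, which this document does not confirm.

```python
from typing import Any, Dict, List


def confident_detections(response: Dict[str, Any],
                         min_conf: float = 0.5) -> List[Dict[str, Any]]:
    """Flatten a detection response into detections with confidence >= min_conf."""
    results = []
    for frame in response.get("data", []):
        for det in frame.get("detections", []):
            if det["confidence"] < min_conf:
                continue  # drop low-confidence detections
            x1, y1, x2, y2 = det["bbox"]  # corner format: top-left, bottom-right
            results.append({
                "frame": frame.get("frame"),  # None if a frame number is absent
                "class": det["class"],
                "confidence": det["confidence"],
                "width": x2 - x1,
                "height": y2 - y1,
            })
    return results
```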
Authentication & Rate Limiting
X-API-Key: <your-api-key>
Example: X-API-Key: demo_abc123
An invalid or missing key returns 401 Unauthorized.
| Method | Rate Limit | Scope |
|---|---|---|
| GET | 60 requests / minute | Per API key |
| POST | 10 requests / minute | Per API key |
Endpoints
Check if the API is running and the detection model is loaded. No parameters required.
{
"status": "ok"
}
Processes a video or image file and performs object detection. Returns per-frame detection results including bounding boxes, class labels, and confidence scores.
Request
POST /detection/detect/media
Content-Type: multipart/form-data
X-API-Key: <your-api-key>
// Form field
file: <video-or-image-file>
Request Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| file | binary | Required | Video (MP4, AVI, MOV, MKV, WebM) or image file |
Response
{
"type": "video",
"frames": 1,
"data": [
{
"frame": 50,
"detections": [
{
"class": "helmet",
"confidence": 0.94,
"bbox": [120, 45, 210, 180]
}
]
}
]
}
Error Handling
| Code | Error | Description | Common Causes | Fix |
|---|---|---|---|---|
| 400 | Bad Request | Invalid or malformed request | Unsupported file type, file > 10 MB | Check file type and size |
| 401 | Unauthorized | Authentication failed | Missing or wrong X-API-Key | Add valid X-API-Key header |
| 422 | Unprocessable | Missing or incorrect data | No file attached, wrong field name | Use field name file |
| 500 | Server Error | Unexpected server failure | Model crash, corrupt file, disk error | Retry or check server logs |
Appendix — All 83 Supported Object Classes
person · bicycle · car · motorcycle · airplane · bus · train · truck · boat · traffic light · fire hydrant · stop sign · parking meter · bench · bird · cat · dog · horse · sheep · cow · elephant · bear · zebra · giraffe · backpack · umbrella · handbag · tie · suitcase · frisbee · skis · snowboard · sports ball · kite · baseball bat · baseball glove · skateboard · surfboard · tennis racket · bottle · wine glass · cup · fork · knife · spoon · bowl · banana · apple · sandwich · orange · broccoli · carrot · hot dog · pizza · donut · cake · chair · couch · potted plant · bed · dining table · toilet · tv · laptop · mouse · remote · keyboard · cell phone · microwave · oven · toaster · sink · refrigerator · book · clock · vase · scissors · teddy bear · hair drier · toothbrush · helmet · bike (custom) · car license plate · bike license plate
Human Activity Recognition API
Classify human activities across 400 Kinetics-400 categories from uploaded videos using the MMAction2 / TSN deep learning pipeline. Returns the predicted activity label as JSON.
Overview
The Human Activity Recognition API is a high-performance, production-ready computer vision system that leverages a pretrained deep learning model from the OpenMMLab ecosystem — specifically the MMAction2 framework with a Temporal Segment Network (TSN) model trained on the Kinetics-400 dataset (400 human activity classes).
How It Works
- Upload — The client uploads a video file via the POST endpoint.
- Store — The video is temporarily stored on the server for processing.
- Feature Extraction — The pretrained TSN model processes the video and extracts spatiotemporal features.
- Inference — The model classifies the video into one of 400 Kinetics-400 activity categories.
- Response — The predicted activity label is returned as a JSON response.
- Cleanup — The temporary file is deleted from the server after processing.
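The store, inference, and cleanup steps above follow a common temporary-file pattern: write the upload to disk, run the model on the file path, and delete the file even if inference fails. The sketch below is a minimal server-side illustration. The predict callable is a hypothetical stand-in for the TSN model, since the actual MMAction2 inference call is not shown in this document.

```python
import os
import tempfile
from typing import Callable, Dict


def classify_upload(data: bytes, suffix: str,
                    predict: Callable[[str], str]) -> Dict[str, str]:
    """Store the upload in a temp file, run predict(path), always delete the file."""
    fd, path = tempfile.mkstemp(suffix=suffix)  # Store: unique temp file on disk
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        label = predict(path)  # Feature extraction + inference (stand-in)
        return {"activity": label}  # Response shape matches the documented JSON
    finally:
        os.remove(path)  # Cleanup runs even if inference raised
```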
Key Features
- 400 Kinetics-400 activity classes — playing guitar, running, swimming, yoga, basketball, and more
- Confidence-ranked class scores across all categories, from which the top-ranked label is returned
- Temporally consistent inference via MMAction2 backbone + head pipeline
- Low-latency video inference with CPU/GPU support
- Supports MP4, AVI, MOV video formats
Authentication & Rate Limiting
X-API-Key: <your-api-key>
Example: X-API-Key: demo_abc123
An invalid or missing key returns 401 Unauthorized.
| Method | Rate Limit | Scope |
|---|---|---|
| GET | 60 requests / minute | Per API key |
| POST | 10 requests / minute | Per API key |
Endpoints
Check if the API service is running. Requires the X-API-Key header.
| Header | Value |
|---|---|
| X-API-Key | Your API key |
{
"status": "ok"
}
Upload a video file and receive the predicted human activity label from the Kinetics-400 classification model.
Request
POST /human_activity/activity-prediction
Content-Type: multipart/form-data
X-API-Key: <your-api-key>
// Form field
file: <video-file>
Request Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| file | binary | Required | Video file (MP4, AVI, MOV) |
Response
{
"activity": "playing guitar"
}
The activity field contains one of the 400 Kinetics-400 class labels, for example "playing guitar", "playing basketball", "jogging", or "yoga".
Error Handling
| Status Code | Meaning |
|---|---|
| 400 | Bad Request — invalid or missing video file |
| 401 | Unauthorized — invalid or missing API key |
| 404 | Not Found |
| 422 | Validation Error — missing or incorrect request data |
| 500 | Internal Server Error — model failure or corrupt video |
OCR API
Extract text from images (JPG, PNG, BMP, TIFF, WebP) and DOCX documents using a deep learning OCR pipeline — DB++ for text detection, CRNN for text recognition.
Overview
The OCR API Service is a scalable platform built with FastAPI that exposes a single REST endpoint, /api/ocr, for extracting text from both images and .docx files. The endpoint detects the input type and routes the file to the matching pipeline. The service also provides asynchronous processing for high concurrency, API key-based authentication, and structured logging with request tracing.
The core processing uses a deep learning pipeline: DB++ for accurate text detection and CRNN for reliable text recognition, supported by image preprocessing for improved performance. Real-world use cases include license plate recognition, street sign detection, and document digitization.
Supported Input Types
| Type | Formats | Max Size |
|---|---|---|
| Images | JPG, PNG, BMP, TIFF, WebP | 10 MB |
| Documents | DOCX | 10 MB |
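Checking type and size on the client avoids spending a rate-limited POST on a request that would come back as 400. The sketch below encodes the limits from the table above. Accepting the .jpeg and .tif spellings, treating 10 MB as 10 × 1024 × 1024 bytes, and checking by file extension are all assumptions; the server may inspect file content instead. The helper name is hypothetical.

```python
import os

# Limits from the Supported Input Types table; .jpeg/.tif aliases are assumed.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".tif", ".webp", ".docx"}
MAX_BYTES = 10 * 1024 * 1024  # 10 MB, assuming MB means MiB


def check_ocr_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError for inputs the OCR API would likely reject with 400."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        raise ValueError(f"file is {size_bytes} bytes; limit is {MAX_BYTES}")
```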
Authentication & Rate Limiting
X-API-Key: <your-api-key>
Example: X-API-Key: demo_abc123
An invalid or missing key returns 401 Unauthorized.
| Method | Rate Limit | Scope |
|---|---|---|
| GET | 60 requests / minute | Per API key |
| POST | 10 requests / minute | Per API key |
Endpoints
Checks whether the API service is running and the EasyOCR model is successfully loaded.
{
"status": "ok",
"model": "loaded"
}
Upload an image or DOCX file to extract all contained text. Returns a structured JSON response with each detected text segment as an array entry.
Request
POST /ocr/api/ocr
Content-Type: multipart/form-data
X-API-Key: <your-api-key>
// Form field
file: <image-or-docx-file>
Request Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| file | file | Required | Image (JPG/PNG/BMP/TIFF/WebP) or DOCX document. Max 10 MB. |
Response
{
"status": "success",
"detected_type": "document",
"input": {
"original_filename": "report.docx",
"saved_filename": "7e2c977b-f5a7-4caf-9ddf-e6ddefbde99b.docx"
},
"output": [
{ "text": "AI Models Server Deployment Guide" },
{ "text": "Purpose & Scope:" },
{ "text": "This document explains the deployment process..." }
]
}
Error Handling
| Code | Error | Description | Common Causes | Fix |
|---|---|---|---|---|
| 400 | Bad Request | Invalid or malformed request | Unsupported file type, file > 10 MB | Check file type and size |
| 401 | Unauthorized | Authentication failed | Missing or wrong X-API-Key | Add valid X-API-Key header |
| 422 | Unprocessable | Missing or incorrect data | No file attached, wrong field name | Use field name file |
| 500 | Server Error | Unexpected OCR or server failure | Model crash, corrupt image, disk error | Retry or check server logs |
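Because the success response shown above returns each detected text segment as a separate output entry, clients that want plain text have to join the segments themselves. A minimal sketch follows. Treating any status other than "success" as an error is an assumption, since only the success response is documented here, and the helper name is hypothetical.

```python
from typing import Any, Dict


def ocr_text(response: Dict[str, Any], sep: str = "\n") -> str:
    """Join the text segments of an OCR response, preserving their order."""
    if response.get("status") != "success":
        raise ValueError(f"OCR did not succeed: {response.get('status')!r}")
    return sep.join(segment["text"] for segment in response.get("output", []))
```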
Tracking API
Real-time multi-object detection and tracking across video frames. Returns an annotated MP4 video with bounding boxes and temporally consistent tracking IDs drawn on every detected object.
Overview
The Tracking API is a high-performance computer vision service for real-time object detection and multi-object tracking. It processes uploaded video files and returns an annotated video with precisely localized bounding boxes, semantic class labels, and tracking IDs that stay consistent across frames.
Key Features
- Object detection + multi-object tracking in a single pipeline
- Persistent tracking IDs maintained across all frames
- Supports 80 standard object classes
- Input: MP4, AVI, MOV, MKV, WebM video files
- Output: Annotated MP4 video (binary download) with bounding boxes and IDs
- Optimized for CPU/GPU; horizontally scalable via Docker
Input / Output Summary
| Property | Details |
|---|---|
| Input format | MP4, AVI, MOV, MKV, WebM |
| Output format | Annotated MP4 video (binary download) |
| Annotations | Bounding boxes + tracking IDs per object, per frame |
| Classes supported | 80 classes (see Appendix) |
Authentication & Rate Limiting
X-API-Key: <your-api-key>
Example: X-API-Key: demo_abc123
An invalid or missing key returns 401 Unauthorized.
| Method | Rate Limit | Scope |
|---|---|---|
| GET | 60 requests / minute | Per API key |
| POST | 10 requests / minute | Per API key |
Endpoints
Check whether the API is running and the tracking model is loaded. No parameters required.
{
"status": "ok",
"model_loaded": true
}
Upload a video file and receive a downloadable annotated video with bounding boxes and persistent tracking IDs drawn on all detected objects across every frame.
Request
POST /tracking/api/track-video/
Content-Type: multipart/form-data
X-API-Key: <your-api-key>
// Form field
file: <video-file>
Request Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| file | file | Required | Video file (MP4, AVI, MOV, MKV, WebM) |
Response
The response body is an annotated video/mp4 file with bounding boxes and tracking IDs rendered on all objects. This is not a JSON response.
Processing Flow
- Upload video — Client sends the video via multipart/form-data
- Validate file — Format and size checks are performed server-side
- Frame-by-frame processing — Each frame is analyzed sequentially
- Object detection — Detection model identifies all objects in each frame
- Assign tracking IDs — Tracker assigns persistent, unique IDs across frames
- Draw annotations — Bounding boxes and IDs are rendered onto each frame
- Return video — The processed annotated MP4 is streamed back as a binary download
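This document does not name the tracker that assigns IDs. To illustrate how IDs can persist across frames, the sketch below uses simple greedy IoU (intersection-over-union) matching. Each box in the current frame takes the ID of the not-yet-matched previous-frame box it overlaps most, provided the overlap reaches a threshold; otherwise it starts a new track. This is a deliberately simplified stand-in, not the service's algorithm. Production trackers typically add motion prediction and tolerate missed detections. Class and function names are hypothetical.

```python
from itertools import count
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # [x1, y1, x2, y2] corner format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


class IoUTracker:
    """Greedy IoU matcher: a box keeps its ID while it overlaps its last position."""

    def __init__(self, threshold: float = 0.3) -> None:
        self.threshold = threshold
        self.tracks: Dict[int, Box] = {}  # track ID -> box in the previous frame
        self._ids = count(1)  # source of fresh IDs: 1, 2, 3, ...

    def update(self, boxes: List[Box]) -> List[int]:
        """Assign an ID to each box in the current frame, in input order."""
        unmatched = dict(self.tracks)
        new_tracks: Dict[int, Box] = {}
        assigned: List[int] = []
        for box in boxes:
            best_id: Optional[int] = None
            best_iou = self.threshold
            for tid, prev in unmatched.items():
                score = iou(box, prev)
                if score >= best_iou:
                    best_id, best_iou = tid, score
            if best_id is None:
                best_id = next(self._ids)  # no sufficient overlap: new track
            else:
                del unmatched[best_id]  # each old track matches at most once
            new_tracks[best_id] = box
            assigned.append(best_id)
        self.tracks = new_tracks  # tracks unseen in this frame are dropped
        return assigned
```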
Error Handling
| Status Code | Meaning |
|---|---|
| 400 | Bad Request — invalid input file |
| 401 | Unauthorized — invalid or missing API key |
| 403 | Forbidden — access denied |
| 413 | Payload Too Large — file exceeds the size limit |
| 422 | Validation Error — missing or malformed request body |
| 500 | Internal Server Error — model failure or processing error |
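Because a successful tracking response is a binary video/mp4 body rather than JSON, clients should stream it to disk in chunks instead of parsing it. The sketch below checks the content type first, since an unexpected type most likely means an error body; the document does not specify the error body format. The helper name and the 1 MiB chunk size are illustrative choices.

```python
import shutil
from typing import BinaryIO, Optional


def save_annotated_video(stream: BinaryIO, content_type: Optional[str],
                         dest_path: str, chunk_size: int = 1 << 20) -> int:
    """Copy a video/mp4 response body to dest_path in chunks; return bytes written."""
    if not (content_type or "").startswith("video/mp4"):
        # Anything else is unexpected here; error bodies are likely JSON.
        raise ValueError(f"expected video/mp4, got {content_type!r}")
    with open(dest_path, "wb") as out:
        shutil.copyfileobj(stream, out, chunk_size)  # never holds the whole file in memory
        return out.tell()  # file position equals bytes written
```

With urllib, for example, pass the object returned by urllib.request.urlopen as stream and resp.headers.get("Content-Type") as content_type.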
Appendix — All 80 Supported Classes
person · bicycle · car · motorcycle · airplane · bus · train · truck · boat · traffic light · fire hydrant · stop sign · parking meter · bench · bird · cat · dog · horse · sheep · cow · elephant · bear · zebra · giraffe · backpack · umbrella · handbag · tie · suitcase · frisbee · skis · snowboard · sports ball · kite · baseball bat · baseball glove · skateboard · surfboard · tennis racket · bottle · wine glass · cup · fork · knife · spoon · bowl · banana · apple · sandwich · orange · broccoli · carrot · hot dog · pizza · donut · cake · chair · couch · potted plant · bed · dining table · toilet · tv · laptop · mouse · remote · keyboard · cell phone · microwave · oven · toaster · sink · refrigerator · book · clock · vase · scissors · teddy bear · hair drier · toothbrush