Clavrit AI Model Hub

AI API Reference & Documentation

Complete reference for Clavrit's production-ready computer vision APIs — Detection, Tracking, OCR, Human Activity Recognition, and Best Frame Selection, all hosted at aimodels.clavrit.com.

Base URL: https://aimodels.clavrit.com

Available APIs

🎬

Best Frame Selection

Extract the top 3 highest-quality frames from a video using sharpness, motion & noise analysis. Async processing with job tracking.

GET · POST · FastAPI · Async

🔍

Detection API

Detect helmets, license plates, and 83 object classes in images and videos. Returns bounding boxes, labels, and confidence scores.

GET · POST · 83 Classes · JSON

🏃

Human Activity Recognition

Classify human activities across 400 Kinetics-400 categories from uploaded video files using MMAction2 / TSN deep learning.

GET · POST · 400 Activities

📄

OCR API

Extract text from images (JPG, PNG, BMP, TIFF, WebP) and DOCX documents using DB++ detection and CRNN recognition.

GET · POST · Images · DOCX · 10 MB

📍

Tracking API

Multi-object detection and tracking across video frames with temporally consistent IDs. Returns an annotated MP4 video file.

GET · POST · 80 Classes · Video Out

Quick Start

Obtain an API Key

Contact Clavrit to receive your X-API-Key. All endpoints require this header for authentication.

Check Service Health

Hit the GET /health endpoint on any API to confirm the service is running before uploading media.

Upload Your Media

Send a multipart/form-data POST request with your image or video in the file form field. The Best Frame Selection API uses a field named video instead.

Handle the Response

Every API returns structured JSON except Tracking, which returns a binary video file. Check the HTTP status code to detect errors.
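
The Quick Start steps can be sketched with Python's standard library alone. This is a minimal sketch, not an official client: the helper names (build_multipart, build_upload_request, send) are hypothetical, and only the base URL, the X-API-Key header, and the file / video field names come from this page.

```python
import urllib.request
import uuid

BASE_URL = "https://aimodels.clavrit.com"


def build_multipart(field_name, filename, payload,
                    content_type="application/octet-stream"):
    """Encode one file as multipart/form-data; returns (body, Content-Type value).

    Assumes the filename contains no double quotes or line breaks.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def build_upload_request(path, api_key, field_name, filename, payload):
    """Prepare (but do not send) a POST upload for one of the endpoints."""
    body, content_type = build_multipart(field_name, filename, payload)
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": content_type},
    )


def send(request, timeout=120.0):
    """Send a prepared request; returns (HTTP status, raw body bytes).

    urllib raises urllib.error.HTTPError for 4xx and 5xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status, resp.read()
```

A caller would read the media file as bytes, pass "file" as the field name (or "video" for Best Frame Selection), and hand the prepared request to send.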

API Reference · Computer Vision

Best Frame Selection API

Extract the top 3 highest-quality frames from a video using sharpness detection, motion analysis, and noise filtering — with async processing and job tracking.

Overview

The Best Frame Selection API is a video processing service built with FastAPI. It analyzes an uploaded video and extracts the three highest-quality frames. Sampled frames are scored for sharpness, motion, and noise, so that only the clearest and most information-rich frames are returned.

The system processes uploads asynchronously to handle high concurrency and large file sizes without performance degradation, making it ideal for content moderation, video summarization, surveillance analytics, and ML preprocessing workflows.

Key Features

Asynchronous background processing — returns a job_id immediately, no blocking
Sharpness, motion & noise quality scoring per sampled frame
Supports MP4, AVI, MOV, MKV, WebM input formats
Returns top 3 frames as JPEG/PNG with frame position and timestamp metadata
Configurable file size limits and frame selection strategies via environment variables
Built-in health monitoring endpoint for load balancers and orchestrators
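
To make the idea of per-frame sharpness scoring concrete, here is a minimal sketch that uses the variance of the Laplacian, a common sharpness measure. It is an assumption that the service uses anything similar: the page does not name its scoring method, and the motion and noise terms are omitted here. Frames are plain 2D lists of grayscale values, and both function names are hypothetical.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Sharpness score: variance of the 4-neighbour Laplacian over interior pixels.

    `gray` is a 2D list of grayscale values with at least 3 rows and 3 columns.
    """
    height, width = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return pvariance(responses)


def top_frames(frames, k=3):
    """Return (frame index, score) pairs for the k sharpest frames, best first."""
    scored = [(index, laplacian_variance(frame)) for index, frame in enumerate(frames)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

A flat frame scores zero because the Laplacian vanishes everywhere, while a frame with strong edges scores high, so ranking by this value prefers crisp frames over blurred ones.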

Authentication & Rate Limiting

Authentication is controlled by the API_KEYS environment variable on the server. If API_KEYS is not set, the API allows unrestricted access. When set, every request must include a valid key in the header below.

Required Header: X-API-Key: <your-secret-key>
An invalid or missing key returns 401 Unauthorized.

Method	Rate Limit	Scope
`GET`	60 requests / minute	Per API key
`POST`	10 requests / minute	Per API key

Exceeding the rate limit returns 429 Too Many Requests. Limits are applied independently per API key.
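
A client can stay under these limits by throttling itself before each call. The sketch below is one way to do that with a sliding 60-second window. Whether the server counts requests in a sliding or a fixed window is not stated here, so treat the window model as an assumption; the class and function names are hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard allowing at most `limit` calls per `window` seconds."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.calls = deque()  # timestamps of calls still inside the window

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()  # forget calls that have left the window
        if len(self.calls) < self.limit:
            return 0.0
        return self.window - (now - self.calls[0])

    def record(self):
        """Register that a call is being made now."""
        self.calls.append(self.clock())


# Limits from the table above, tracked separately per HTTP method.
LIMITERS = {"GET": SlidingWindowLimiter(60), "POST": SlidingWindowLimiter(10)}


def throttle(method):
    """Block until a request with this method fits in the window, then record it."""
    limiter = LIMITERS[method]
    delay = limiter.wait_time()
    if delay > 0:
        time.sleep(delay)
    limiter.record()
```

Calling throttle("POST") immediately before each upload keeps a single-process client from triggering 429 responses under this window model.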

Endpoints

All endpoints are prefixed with /api/ under the /best_frame_selection service path. An interactive Swagger UI explorer is available at /docs when the server is running.

GET https://aimodels.clavrit.com/best_frame_selection/api/health

Liveness and readiness check for load balancers, monitoring tools, and container orchestrators. No authentication required.

Response

JSON · 200 OK

{
  "status": "ok"
}

POST https://aimodels.clavrit.com/best_frame_selection/api/best-frame/

Upload a video file for processing. Initiates asynchronous frame extraction and returns a job_id immediately for polling status.

Request

HTTP

POST /best_frame_selection/api/best-frame/
Content-Type: multipart/form-data
X-API-Key: <your-api-key>

// Form field
video: <video-file>

Request Parameters

Field	Type	Required	Description
`video`	binary	Required	Video file (MP4, AVI, MOV, MKV, WebM)

Response

JSON · 200 Accepted

{
  "status": "accepted",
  "job_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status_url": "/api/best-frame/a1b2c3d4-e5f6-7890-abcd-ef1234567890/status"
}

Poll the status_url to track processing. When complete, the response will include the three extracted frame URLs along with frame position and timestamp metadata.
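
The polling loop might look like the sketch below. Several details are assumptions, because the status response schema is not shown on this page: the terminal values in DONE_STATES and FAILED_STATES are hypothetical, and status_url is assumed to be relative to the /best_frame_selection service path. The fetch_status argument is any callable that performs the GET and returns the decoded JSON.

```python
import time

# Assumed prefix for the relative status_url returned by the upload call.
SERVICE_ROOT = "https://aimodels.clavrit.com/best_frame_selection"
DONE_STATES = {"completed", "done", "success"}  # hypothetical terminal values
FAILED_STATES = {"failed", "error"}             # hypothetical terminal values


def status_url_for(job):
    """Build an absolute URL from the upload response's status_url field."""
    return SERVICE_ROOT + job["status_url"]


def poll_job(fetch_status, interval=2.0, timeout=300.0,
             sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until the job reaches a terminal state.

    Returns the final payload, raises RuntimeError on a failed job and
    TimeoutError if the job is still running after `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        payload = fetch_status()
        state = str(payload.get("status", "")).lower()
        if state in DONE_STATES:
            return payload
        if state in FAILED_STATES:
            raise RuntimeError(f"job failed: {payload}")
        if clock() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)
```

A 2-second interval produces about 30 GET requests per minute, which stays inside the 60 requests / minute GET limit.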

Error Handling

Code	Error	Description	Common Causes	Fix
400	Bad Request	Invalid or malformed request	Unsupported file type or file exceeds limit	Check file type and size
401	Unauthorized	Authentication failed	Missing or invalid `X-API-Key` header	Add a valid `X-API-Key`
422	Unprocessable	Missing or incorrect request data	No file attached, wrong field name	Use field name `video`
429	Too Many Requests	Rate limit exceeded	More than 10 POST/min per key	Reduce request frequency
500	Server Error	Unexpected server failure	Corrupt video, model crash, disk error	Retry or contact support
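
The table can be turned into a small client-side decision helper. In this sketch, 429 and 500 are treated as retryable, following the "Reduce request frequency" and "Retry or contact support" guidance. The exponential backoff schedule and the function names are assumptions, not documented behavior.

```python
RETRYABLE = {429, 500}

# Fix column from the error table above, keyed by status code.
ADVICE = {
    400: "Check file type and size",
    401: "Add a valid X-API-Key",
    422: "Use field name video",
    429: "Reduce request frequency",
    500: "Retry or contact support",
}


def backoff_delays(attempts, base=1.0, cap=30.0):
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, capped."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


def next_action(status, attempt, max_attempts=4):
    """Decide what to do after an error response.

    `attempt` counts retries already made, starting at 0.
    Returns ("retry", delay_seconds) or ("fail", advice).
    """
    if status in RETRYABLE and attempt < max_attempts:
        return ("retry", backoff_delays(max_attempts)[attempt])
    return ("fail", ADVICE.get(status, "Unexpected status"))
```

Client errors such as 400, 401, and 422 fail immediately, since repeating the same request cannot succeed until the request itself is corrected.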

API Reference · Computer Vision

Detection API

AI-powered object detection for images and videos. Detects helmets, license plates, and 83 object classes. Returns bounding boxes, category labels, and confidence scores as structured JSON.

Overview

The Detection API is a high-performance, AI-powered computer vision system designed to detect a wide range of objects, including helmets worn by riders, motorcycle license plates, car license plates, and 83 standard object classes. It is built on FastAPI and driven by state-of-the-art object detection models. All responses are structured JSON.

Use Cases

Domain	Application
Traffic Monitoring	Automated helmet and license plate detection at road intersections
Law Enforcement	Identifying compliance violations from camera feeds in real time
Smart City Analytics	Data collection for urban mobility and road safety insights
Surveillance	Smart camera integration for continuous area monitoring

Key Outputs per Detection

Bounding box coordinates [x1, y1, x2, y2] per detected object
Object category / class label (e.g., "helmet", "car")
Confidence score (0.0 – 1.0)
Per-frame results when processing video inputs

Authentication & Rate Limiting

Required Header: X-API-Key: <your-api-key>
Example: X-API-Key: demo_abc123 — Invalid or missing key returns 401 Unauthorized.

Method	Rate Limit	Scope
`GET`	60 requests / minute	Per API key
`POST`	10 requests / minute	Per API key

Endpoints

GET https://aimodels.clavrit.com/detection/health

Check if the API is running and the detection model is loaded. No parameters required.

JSON · 200 OK

{
  "status": "ok"
}

POST https://aimodels.clavrit.com/detection/detect/media

Processes a video or image file and performs object detection. Returns per-frame detection results including bounding boxes, class labels, and confidence scores.

Request

HTTP

POST /detection/detect/media
Content-Type: multipart/form-data
X-API-Key: <your-api-key>

// Form field
file: <video-or-image-file>

Request Parameters

Field	Type	Required	Description
`file`	binary	Required	Video (MP4, AVI, MOV, MKV, WebM) or image file

Response

JSON · 200 OK

{
  "type": "video",
  "frames": 1,
  "data": [
    {
      "frame": 50,
      "detections": [
        {
          "class": "helmet",
          "confidence": 0.94,
          "bbox": [120, 45, 210, 180]
        }
      ]
    }
  ]
}
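
A response of this shape can be flattened into one row per detection. The following is a minimal sketch using the sample above. The function name and the 0.5 default threshold are hypothetical, and the bbox values are assumed to be pixel coordinates.

```python
import json

SAMPLE = json.loads("""
{
  "type": "video",
  "frames": 1,
  "data": [
    {
      "frame": 50,
      "detections": [
        {"class": "helmet", "confidence": 0.94, "bbox": [120, 45, 210, 180]}
      ]
    }
  ]
}
""")


def confident_detections(response, min_confidence=0.5):
    """Flatten per-frame results into rows at or above min_confidence."""
    rows = []
    for frame in response.get("data", []):
        for det in frame.get("detections", []):
            if det["confidence"] < min_confidence:
                continue
            x1, y1, x2, y2 = det["bbox"]
            rows.append({
                "frame": frame["frame"],
                "class": det["class"],
                "confidence": det["confidence"],
                "width": x2 - x1,    # box width, assuming pixel coordinates
                "height": y2 - y1,   # box height, assuming pixel coordinates
            })
    return rows


rows = confident_detections(SAMPLE)
```

For the sample, rows holds a single helmet detection on frame 50 with a 90 by 135 box.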

Error Handling

Code	Error	Description	Common Causes	Fix
400	Bad Request	Invalid or malformed request	Unsupported file type, file > 10 MB	Check file type and size
401	Unauthorized	Authentication failed	Missing or wrong `X-API-Key`	Add valid `X-API-Key` header
422	Unprocessable	Missing or incorrect data	No file attached, wrong field name	Use field name `file`
500	Server Error	Unexpected server failure	Model crash, corrupt file, disk error	Retry or check server logs

Appendix — All 83 Supported Object Classes

person · bicycle · car · motorcycle · airplane · bus · train · truck · boat · traffic light · fire hydrant · stop sign · parking meter · bench · bird · cat · dog · horse · sheep · cow · elephant · bear · zebra · giraffe · backpack · umbrella · handbag · tie · suitcase · frisbee · skis · snowboard · sports ball · kite · baseball bat · baseball glove · skateboard · surfboard · tennis racket · bottle · wine glass · cup · fork · knife · spoon · bowl · banana · apple · sandwich · orange · broccoli · carrot · hot dog · pizza · donut · cake · chair · couch · potted plant · bed · dining table · toilet · tv · laptop · mouse · remote · keyboard · cell phone · microwave · oven · toaster · sink · refrigerator · book · clock · vase · scissors · teddy bear · hair drier · toothbrush · helmet · bike (custom) · car license plate · bike license plate

API Reference · Computer Vision

Human Activity Recognition API

Classify human activities across 400 Kinetics-400 categories from uploaded videos using the MMAction2 / TSN deep learning pipeline. Returns the predicted activity label as JSON.

Overview

The Human Activity Recognition API is a high-performance, production-ready computer vision system that leverages a pretrained deep learning model from the OpenMMLab ecosystem — specifically the MMAction2 framework with a Temporal Segment Network (TSN) model trained on the Kinetics-400 dataset (400 human activity classes).

How It Works

Upload — The client uploads a video file via the POST endpoint.
Store — The video is temporarily stored on the server for processing.
Feature Extraction — The pretrained TSN model processes the video and extracts spatiotemporal features.
Inference — The model classifies the video into one of 400 Kinetics-400 activity categories.
Response — The predicted activity label is returned as a JSON response.
Cleanup — The temporary file is deleted from the server after processing.
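
The store, infer, and clean up steps above follow a familiar temporary-file pattern, sketched below. This is not the service's actual code: predict is a hypothetical stand-in for the TSN model call, and the function name is invented for illustration.

```python
import os
import tempfile


def predict_from_upload(data, suffix, predict):
    """Store uploaded bytes in a temp file, run the model, then always clean up.

    `predict` is a hypothetical callable that takes a file path and
    returns an activity label.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)               # Store
        return {"activity": predict(path)}   # Feature extraction + inference
    finally:
        os.remove(path)                      # Cleanup, even if inference fails
```

Putting the removal in a finally block means the temporary video is deleted whether the model returns a label or raises an error.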

Key Features

400 Kinetics-400 activity classes — playing guitar, running, swimming, yoga, basketball, and more
Confidence-ranked class scores across all categories
Temporally consistent inference via MMAction2 backbone + head pipeline
Low-latency video inference with CPU/GPU support
Supports MP4, AVI, MOV video formats

Authentication & Rate Limiting

Required Header: X-API-Key: <your-api-key>
Example: X-API-Key: demo_abc123 — Invalid or missing key returns 401 Unauthorized.

Method	Rate Limit	Scope
`GET`	60 requests / minute	Per API key
`POST`	10 requests / minute	Per API key

Endpoints

GET https://aimodels.clavrit.com/human_activity/health

Check if the API service is running. Requires the X-API-Key header.

Header	Value
`X-API-Key`	Your API key

JSON · 200 OK

{
  "status": "ok"
}

POST https://aimodels.clavrit.com/human_activity/activity-prediction

Upload a video file and receive the predicted human activity label from the Kinetics-400 classification model.

Request

HTTP

POST /human_activity/activity-prediction
Content-Type: multipart/form-data
X-API-Key: <your-api-key>

// Form field
file: <video-file>

Request Parameters

Field	Type	Required	Description
`file`	binary	Required	Video file (MP4, AVI, MOV)

Response

JSON · 200 OK

{
  "activity": "playing guitar"
}

The activity field contains one of the 400 Kinetics-400 class labels — e.g., "running", "doing yoga", "playing basketball", "swimming".

Error Handling

Status Code	Meaning
400	Bad Request — invalid or missing video file
401	Unauthorized — invalid or missing API key
404	Not Found
422	Validation Error — missing or incorrect request data
500	Internal Server Error — model failure or corrupt video

API Reference · Document Processing

OCR API

Extract text from images (JPG, PNG, BMP, TIFF, WebP) and DOCX documents using a deep learning OCR pipeline — DB++ for text detection, CRNN for text recognition.

Overview

The OCR API Service is a scalable platform built using FastAPI that provides a unified REST API for extracting text from images and .docx files. The system features intelligent routing via /api/ocr, asynchronous processing for high concurrency, API key-based authentication, and structured logging with request tracing.

The core processing uses a deep learning pipeline: DB++ for accurate text detection and CRNN for reliable text recognition, supported by image preprocessing for improved performance. Real-world use cases include license plate recognition, street sign detection, and document digitization.

Supported Input Types

Type	Formats	Max Size
Images	JPG, PNG, BMP, TIFF, WebP	10 MB
Documents	DOCX	10 MB

Authentication & Rate Limiting

Required Header: X-API-Key: <your-api-key>
Example: X-API-Key: demo_abc123 — Invalid or missing key returns 401 Unauthorized.

Method	Rate Limit	Scope
`GET`	60 requests / minute	Per API key
`POST`	10 requests / minute	Per API key

Endpoints

GET https://aimodels.clavrit.com/ocr/health

Checks whether the API service is running and the OCR model is loaded.

JSON · 200 OK

{
  "status": "ok",
  "model": "loaded"
}

POST https://aimodels.clavrit.com/ocr/api/ocr

Upload an image or DOCX file to extract all contained text. Returns a structured JSON response with each detected text segment as an array entry.

Request

HTTP

POST /ocr/api/ocr
Content-Type: multipart/form-data
X-API-Key: <your-api-key>

// Form field
file: <image-or-docx-file>

Request Parameters

Field	Type	Required	Description
`file`	file	Required	Image (JPG/PNG/BMP/TIFF/WebP) or DOCX document. Max 10 MB.

Response

JSON · 200 OK

{
  "status": "success",
  "detected_type": "document",
  "input": {
    "original_filename": "report.docx",
    "saved_filename": "7e2c977b-f5a7-4caf-9ddf-e6ddefbde99b.docx"
  },
  "output": [
    { "text": "AI Models Server Deployment Guide" },
    { "text": "Purpose & Scope:" },
    { "text": "This document explains the deployment process..." }
  ]
}
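
Because each text segment arrives as its own array entry, most callers will want to join them back into one string. Here is a minimal sketch using the sample above. The function name is hypothetical, and treating any status other than "success" as an error is an assumption.

```python
import json

SAMPLE = json.loads("""
{
  "status": "success",
  "detected_type": "document",
  "input": {
    "original_filename": "report.docx",
    "saved_filename": "7e2c977b-f5a7-4caf-9ddf-e6ddefbde99b.docx"
  },
  "output": [
    {"text": "AI Models Server Deployment Guide"},
    {"text": "Purpose & Scope:"},
    {"text": "This document explains the deployment process..."}
  ]
}
""")


def extract_text(response, separator="\n"):
    """Join the text of every output segment, in the order returned."""
    if response.get("status") != "success":
        raise ValueError(f"OCR did not succeed: {response.get('status')!r}")
    return separator.join(segment["text"] for segment in response.get("output", []))


text = extract_text(SAMPLE)
```

For the sample, text contains three lines, starting with "AI Models Server Deployment Guide".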

Error Handling

Code	Error	Description	Common Causes	Fix
400	Bad Request	Invalid or malformed request	Unsupported file type, file > 10 MB	Check file type and size
401	Unauthorized	Authentication failed	Missing or wrong `X-API-Key`	Add valid `X-API-Key` header
422	Unprocessable	Missing or incorrect data	No file attached, wrong field name	Use field name `file`
500	Server Error	Unexpected OCR or server failure	Model crash, corrupt image, disk error	Retry or check server logs

API Reference · Computer Vision

Tracking API

Real-time multi-object detection and tracking across video frames. Returns an annotated MP4 video with bounding boxes and temporally consistent tracking IDs drawn on every detected object.

Overview

The Tracking API is a high-performance, enterprise-grade computer vision system engineered to deliver real-time object detection and multi-object tracking over video streams. It processes uploaded video files and returns an annotated video with precisely localized bounding boxes, semantic class labels, and temporally consistent tracking identifiers (IDs) across frames.

Key Features

Object detection + multi-object tracking in a single pipeline
Persistent tracking IDs maintained across all frames
Supports 80 standard object classes
Input: MP4, AVI, MOV, MKV, WebM video files
Output: Annotated MP4 video (binary download) with bounding boxes and IDs
Optimized for CPU/GPU; horizontally scalable via Docker

Input / Output Summary

Property	Details
Input format	MP4, AVI, MOV, MKV, WebM
Output format	Annotated MP4 video (binary download)
Annotations	Bounding boxes + tracking IDs per object, per frame
Classes supported	80 classes (see Appendix)

Authentication & Rate Limiting

Required Header: X-API-Key: <your-api-key>
Example: X-API-Key: demo_abc123 — Invalid or missing key returns 401 Unauthorized.

Method	Rate Limit	Scope
`GET`	60 requests / minute	Per API key
`POST`	10 requests / minute	Per API key

Endpoints

GET https://aimodels.clavrit.com/tracking/health

Check whether the API is running and the tracking model is loaded. No parameters required.

JSON · 200 OK

{
  "status": "ok",
  "model_loaded": true
}

POST https://aimodels.clavrit.com/tracking/api/track-video/

Upload a video file and receive a downloadable annotated video with bounding boxes and persistent tracking IDs drawn on all detected objects across every frame.

Request

HTTP

POST /tracking/api/track-video/
Content-Type: multipart/form-data
X-API-Key: <your-api-key>

// Form field
file: <video-file>

Request Parameters

Field	Type	Required	Description
`file`	file	Required	Video file (MP4, AVI, MOV, MKV, WebM)

Response

Binary response — The API returns a downloadable video/mp4 file with bounding boxes and tracking IDs rendered on all objects. This is not a JSON response.
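
Since the body is binary, a client should write it to disk in chunks rather than decode it as JSON. The sketch below checks the declared content type first and then copies the stream. The function name and chunk size are hypothetical. The body argument can be any object with a read(size) method, such as the response returned by urllib.request.urlopen.

```python
import io

CHUNK_SIZE = 64 * 1024  # bytes read per iteration


def save_annotated_video(body, content_type, dest):
    """Copy a video/mp4 response body into `dest`; returns the bytes written."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "video/mp4":
        # An error response is likely JSON, so refuse to save it as a video.
        raise ValueError(f"expected video/mp4, got {content_type!r}")
    written = 0
    while chunk := body.read(CHUNK_SIZE):
        dest.write(chunk)
        written += len(chunk)
    return written


# In-memory demonstration; a real caller would pass the HTTP response
# and a file opened with open("tracked.mp4", "wb").
demo_out = io.BytesIO()
copied = save_annotated_video(io.BytesIO(b"fake-mp4-bytes"), "video/mp4", demo_out)
```

Reading in fixed-size chunks keeps memory use flat even when the annotated video is large.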

Processing Flow

Upload video — Client sends the video via multipart/form-data
Validate file — Format and size checks are performed server-side
Frame-by-frame processing — Each frame is analyzed sequentially
Object detection — Detection model identifies all objects in each frame
Assign tracking IDs — Tracker assigns persistent, unique IDs across frames
Draw annotations — Bounding boxes and IDs are rendered onto each frame
Return video — The processed annotated MP4 is streamed back as a binary download
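
The "Assign tracking IDs" step can be illustrated with a deliberately simple tracker that matches each new box to the previous frame's box with the highest overlap. This greedy IoU matcher is a stand-in chosen for clarity. The page does not name the tracker the service uses, and a production tracker would also handle occlusion and re-identification. All names are hypothetical.

```python
from itertools import count


def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class GreedyIouTracker:
    """Carries IDs from one frame to the next by greedy best-IoU matching."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # minimum overlap to count as the same object
        self.tracks = {}            # track ID -> box from the previous frame
        self._ids = count(1)

    def update(self, boxes):
        """Assign an ID to each box of the current frame, in input order."""
        unmatched = dict(self.tracks)
        next_tracks, assigned = {}, []
        for box in boxes:
            best_id, best_score = None, self.threshold
            for track_id, previous in unmatched.items():
                score = iou(box, previous)
                if score >= best_score:
                    best_id, best_score = track_id, score
            if best_id is None:
                best_id = next(self._ids)   # no overlap: start a new track
            else:
                del unmatched[best_id]      # each track matches at most once
            next_tracks[best_id] = box
            assigned.append(best_id)
        self.tracks = next_tracks           # tracks unseen this frame are dropped
        return assigned
```

Calling update once per frame with that frame's detected boxes yields IDs that stay stable while an object moves only slightly between frames.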

Error Handling

Status Code	Meaning
400	Bad Request — invalid input file
401	Unauthorized — invalid or missing API key
403	Forbidden — access denied
413	Payload Too Large — file exceeds the size limit
422	Validation Error — missing or malformed request body
500	Internal Server Error — model failure or processing error
