API for object detection, depth estimation, and distance prediction using computer vision models
This API provides access to computer vision models for real-time image processing.
The API supports both HTTP and WebSocket protocols:
Process a single image via HTTP request
Stream images for real-time processing via WebSocket
Process a single image for object detection, depth estimation, and distance prediction.
Content-Type: multipart/form-data
Parameter | Type | Required | Description |
---|---|---|---|
file | File | Yes | The image file to process (JPEG, PNG) |
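As a rough illustration of the request described above, the following sketch uploads one image using only the Python standard library. The URL `http://localhost:8000/predict` and the use of the POST method are assumptions, since this page does not state the host, path, or method; only the `file` field name and the `multipart/form-data` content type come from the table. The helper names `build_multipart` and `detect` are hypothetical.

```python
import json
import os
import urllib.request
import uuid

# Hypothetical endpoint URL: the documentation does not state the host or path.
API_URL = "http://localhost:8000/predict"


def build_multipart(field_name, filename, data, content_type="image/jpeg"):
    """Encode one file as a multipart/form-data body.

    Returns a tuple (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def detect(image_path, url=API_URL, timeout=30.0):
    """Send an image file in the `file` field and return the decoded JSON."""
    with open(image_path, "rb") as fh:
        data = fh.read()
    # The parameter table lists JPEG and PNG as accepted formats.
    mime = "image/png" if image_path.lower().endswith(".png") else "image/jpeg"
    body, header = build_multipart("file", os.path.basename(image_path), data, mime)
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": header},
        method="POST",  # assumed: file uploads are normally sent with POST
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running at the assumed address, `detect("street.jpg")` would return the response dictionary. A library such as `requests` would shorten the upload code; the standard library is used here only to keep the sketch dependency-free.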
Returns a JSON object containing:
Field | Type | Description |
---|---|---|
objects | Array | Array of detected objects with their properties |
objects[].class | String | Class of the detected object (e.g., 'car', 'person') |
objects[].distance_estimated | Number | Estimated distance of the object |
objects[].features | Object | Features used for the distance prediction: bounding box (xmin, ymin, xmax, ymax, width, height) and depth statistics (mean_depth, depth_mean_trim, depth_median) |
frame_id | Number | ID of the processed frame (0 for HTTP requests) |
timings | Object | Processing time for each step (decode_time, models_time, process_time, json_time) and the overall total_time |
{
  "objects": [
    {
      "class": "car",
      "distance_estimated": 15.42,
      "features": {
        "xmin": 120.5,
        "ymin": 230.8,
        "xmax": 350.2,
        "ymax": 480.3,
        "mean_depth": 0.75,
        "depth_mean_trim": 0.72,
        "depth_median": 0.71,
        "width": 229.7,
        "height": 249.5
      }
    },
    {
      "class": "person",
      "distance_estimated": 8.76,
      "features": {
        "xmin": 450.1,
        "ymin": 200.4,
        "xmax": 510.8,
        "ymax": 380.2,
        "mean_depth": 0.58,
        "depth_mean_trim": 0.56,
        "depth_median": 0.55,
        "width": 60.7,
        "height": 179.8
      }
    }
  ],
  "frame_id": 0,
  "timings": {
    "decode_time": 0.015,
    "models_time": 0.452,
    "process_time": 0.063,
    "json_time": 0.021,
    "total_time": 0.551
  }
}
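One way a client might consume this response is to map each entry of `objects` to a small record and pick the closest detection. The sketch below uses only field names that appear in the example response; the `Detection` class and the `parse_detections` and `nearest` helpers are hypothetical names introduced for illustration, and the sample payload is an abbreviated copy of the example.

```python
import json
from dataclasses import dataclass


@dataclass
class Detection:
    label: str           # value of the "class" field
    distance: float      # value of "distance_estimated"
    box: tuple           # (xmin, ymin, xmax, ymax)
    depth_median: float  # value of features["depth_median"]


def parse_detections(payload):
    """Convert a decoded response dict into a list of Detection records."""
    detections = []
    for obj in payload.get("objects", []):
        feats = obj["features"]
        detections.append(
            Detection(
                label=obj["class"],
                distance=obj["distance_estimated"],
                box=(feats["xmin"], feats["ymin"], feats["xmax"], feats["ymax"]),
                depth_median=feats["depth_median"],
            )
        )
    return detections


def nearest(detections):
    """Return the detection with the smallest estimated distance, or None."""
    return min(detections, key=lambda d: d.distance, default=None)


# Abbreviated copy of the example response.
sample = json.loads("""
{"objects": [
  {"class": "car", "distance_estimated": 15.42,
   "features": {"xmin": 120.5, "ymin": 230.8, "xmax": 350.2, "ymax": 480.3,
                "depth_median": 0.71}},
  {"class": "person", "distance_estimated": 8.76,
   "features": {"xmin": 450.1, "ymin": 200.4, "xmax": 510.8, "ymax": 380.2,
                "depth_median": 0.55}}
 ],
 "frame_id": 0}
""")

closest = nearest(parse_detections(sample))
print(closest.label, closest.distance)  # person 8.76
```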
Status Code | Description |
---|---|
200 | OK - Request was successful |
400 | Bad Request - Empty file or invalid format |
500 | Internal Server Error - Processing error |
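A client could translate the status codes above into exceptions so that callers can tell a bad upload apart from a server-side failure. This is only a sketch: the exception class names and the `raise_for_status` helper are hypothetical, and the handling of codes outside the table is an assumption.

```python
class DetectionAPIError(Exception):
    """Base class for errors reported by the API (hypothetical name)."""

    def __init__(self, status, message):
        super().__init__(f"{status}: {message}")
        self.status = status


class InvalidImageError(DetectionAPIError):
    """400: the uploaded file was empty or in an invalid format."""


class ProcessingError(DetectionAPIError):
    """500: the server failed while processing the image."""


def raise_for_status(status, detail=""):
    """Return None on 200; raise an exception matching the documented codes."""
    if status == 200:
        return None
    if status == 400:
        raise InvalidImageError(status, detail or "empty file or invalid format")
    if status == 500:
        raise ProcessingError(status, detail or "processing error")
    # Assumption: any code not listed in the table is treated as a generic error.
    raise DetectionAPIError(status, detail or "unexpected status code")
```

A caller might catch `InvalidImageError` to prompt for a different file, while treating `ProcessingError` as a condition to log and possibly retry later.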
Stream images over a persistent connection and receive a result for each frame as it is processed.
Note: Because the connection stays open between frames, WebSocket avoids the overhead of issuing a new HTTP request per image. Use this endpoint for video feeds or whenever you need to process multiple images in rapid succession.
Send binary image data directly over the WebSocket connection:
The WebSocket API returns the same JSON structure as the HTTP API, with incrementing frame_id values.
{
  "objects": [
    {
      "class": "car",
      "distance_estimated": 14.86,
      "features": {
        "xmin": 125.3,
        "ymin": 235.1,
        "xmax": 355.7,
        "ymax": 485.9,
        "mean_depth": 0.77,
        "depth_mean_trim": 0.74,
        "depth_median": 0.73,
        "width": 230.4,
        "height": 250.8
      }
    }
  ],
  "frame_id": 42,
  "timings": {
    "decode_time": 0.014,
    "models_time": 0.445,
    "process_time": 0.061,
    "json_time": 0.020,
    "total_time": 0.540
  }
}
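The streaming exchange described above (binary image in, JSON result out) could be driven from Python roughly as follows. This sketch assumes the third-party `websockets` package, a hypothetical URL `ws://localhost:8000/ws` (the page does not state the host or path), and that the server sends exactly one reply per frame. The `stream_frames` helper and its `connect` parameter are hypothetical; the parameter exists only so the connection factory can be swapped out.

```python
import asyncio
import json

# Hypothetical endpoint URL: the documentation does not state the host or path.
WS_URL = "ws://localhost:8000/ws"


async def stream_frames(frames, url=WS_URL, connect=None):
    """Send each frame as one binary message and collect the JSON replies.

    `frames` is an iterable of bytes objects holding encoded images.
    `connect` defaults to `websockets.connect`; any callable returning an
    async context manager with `send` and `recv` coroutines also works.
    """
    if connect is None:
        import websockets  # third-party package: pip install websockets

        connect = websockets.connect

    results = []
    async with connect(url) as ws:
        for frame in frames:
            await ws.send(frame)     # bytes are sent as a binary message
            reply = await ws.recv()  # assumed: one JSON reply per frame
            results.append(json.loads(reply))
    return results
```

With a server running at the assumed address, `asyncio.run(stream_frames([jpeg_bytes]))` would return a list of response dictionaries whose `frame_id` values increase from frame to frame. Waiting for each reply before sending the next frame keeps the sketch simple; a real-time client might instead send and receive concurrently.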
You can also test the API interactively through its Swagger UI.