
Batch Persistent Worker

Run a long-lived HTTP transcription worker that accepts multiple jobs without restarting, reducing turnaround time and improving GPU utilisation.

Available from version 15.5.0

A batch persistent worker (also called an HTTP batch worker) is a long-running transcription service that accepts jobs over HTTP. Unlike standard batch workers, it stays alive between jobs — meaning you pay the startup cost once, not on every request.


Why use a persistent worker?

| | Standard batch | Persistent worker |
|---|---|---|
| Startup cost | Per job | Once |
| Memory usage | One container per job | Multiple jobs share one container |
| GPU utilisation | Interrupted between jobs | Continuous |
| Best for | Large, infrequent files | High throughput or smaller files |

The persistent worker is especially beneficial for smaller audio files, where startup overhead would otherwise dominate total turnaround time.


Starting the worker

docker run -it \
-e LICENSE_TOKEN=$TOKEN_VALUE \
-p PORT:18000 \
batch-asr-transcriber-en:15.5.0 \
--run-mode http \
--parallel=4 \
--all-formats /output_dir_name

Parameters

| Parameter | Description |
|---|---|
| --parallel | Number of parallel engines (each engine maps to one GPU connection). Increase this to improve throughput, up to your GPU's capacity. |
| --all-formats | Directory where all job outputs and logs are saved. If omitted, defaults to /tmp/jobs. See Generating multiple transcript formats for details. |
| PORT | The local port forwarded to the container's internal port (18000). |

To use a different internal port, set the SM_BATCH_WORKER_LISTEN_PORT environment variable.


Submitting a job

curl -X POST address.of.container:PORT/v2/jobs \
-H 'X-SM-Processing-Data: {"parallel_engines": 2, "user_id": "MY_USER_ID"}' \
-F 'config={
"type": "transcription",
"transcription_config": {
"language": "en",
"diarization": "speaker",
"operating_point": "enhanced"
}
}' \
-F 'data_file=@~/audio_file.mp3'

Response codes

| Code | Meaning |
|---|---|
| 201 | Job accepted. Returns {"job_id": "abcdefgh01"} |
| 400 | Invalid request |
| 503 | Server busy: not enough free engines |
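As a client-side sketch, the request shown above can be assembled programmatically before handing it to any HTTP client. The endpoint, header, and fields are as documented; the helper name build_job_request is ours:

```python
import json


def build_job_request(base_url, config, audio_path,
                      parallel_engines=None, user_id=None):
    """Assemble the URL, headers, and multipart fields for POST /v2/jobs.

    Processing data (parallel_engines, user_id) is sent as JSON in the
    X-SM-Processing-Data header; the job config is a JSON form field.
    """
    processing = {}
    if parallel_engines is not None:
        processing["parallel_engines"] = parallel_engines
    if user_id is not None:
        processing["user_id"] = user_id

    headers = {}
    if processing:
        headers["X-SM-Processing-Data"] = json.dumps(processing)

    fields = {
        "config": json.dumps(config),
        "data_file": audio_path,  # path of the file to upload as multipart data
    }
    return f"{base_url}/v2/jobs", headers, fields
```

The returned pieces map directly onto the curl flags above: headers to -H, fields to -F.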

Managing capacity

The worker processes multiple jobs concurrently, up to the --parallel limit you set at startup.

Each job can request multiple engines using the parallel_engines value in the X-SM-Processing-Data header. More engines per job means faster turnaround for that job, at the cost of reduced concurrency for others.

To check available capacity before submitting, query the /jobs health endpoint. The unused_engines field tells you how many engines are free.

If a job requests more engines than are currently available, it will be rejected:

HTTP 503: {"detail": "Server busy: 8 engines not available (2 engines in use, 5 parallel allowed)"}
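The capacity check described above can be sketched as a small helper that inspects the parsed body of GET /jobs (field names as documented; the function name can_accept is ours):

```python
def can_accept(health, requested_engines):
    """Return True if the worker has enough free engines for a new job.

    `health` is the parsed JSON body of GET /jobs, which reports
    max_engines and unused_engines alongside the active job list.
    """
    return requested_engines <= health["unused_engines"]
```

Checking before submitting avoids a round trip that would end in a 503, though a race is still possible if other clients submit concurrently.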

Requesting parallel engines

curl -X POST address.of.container:PORT/v2/jobs \
-H 'X-SM-Processing-Data: {"parallel_engines": 2}' \
-F 'config={"type": "transcription", "transcription_config": {"language": "en"}}' \
-F 'data_file=@~/audio_file.mp3'

Speaker identification

Speaker identification is enabled with the same configuration as the one-shot batch container. To enable per-customer encrypted identifiers (as used in our SaaS offering), pass a user_id in the X-SM-Processing-Data header.

curl -X POST address.of.container:PORT/v2/jobs \
-H 'X-SM-Processing-Data: {"user_id": "MY_USER_ID"}' \
-F 'config={
"type": "transcription",
"transcription_config": {
"language": "en",
"diarization": "speaker",
"operating_point": "enhanced"
}
}' \
-F 'data_file=@~/audio_file.mp3'

For details on secrets management, refer to the Speaker identification documentation.


Job API reference

GET /v2/jobs

Returns a list of jobs.

Query parameters:

| Parameter | Description |
|---|---|
| created_before | ISO 8601 datetime. Only return jobs created before this time. |
| limit | Maximum number of jobs to return (1–100). |

Example response:

{
"jobs": [
{
"id": "191f47e4a4204fa4ac2b",
"created_at": "2026-03-18T19:27:42.436Z",
"data_name": "5_min",
"text_name": null,
"duration": 300,
"status": "RUNNING",
"config": {
"type": "transcription",
"transcription_config": {
"language": "en",
"diarization": "speaker",
"operating_point": "enhanced"
}
}
},
{
"id": "6dcb02e0dc5943e2b643",
"created_at": "2026-03-18T19:27:47.550Z",
"data_name": "5_min",
"text_name": null,
"duration": 300,
"status": "RUNNING",
"config": {
"type": "transcription",
"transcription_config": {
"language": "en",
"diarization": "speaker",
"operating_point": "enhanced"
}
}
}
]
}
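To show how the list response might be consumed, here is a small sketch (the helper name status_counts is ours) that tallies jobs by status:

```python
from collections import Counter


def status_counts(jobs_response):
    """Count jobs in each status from a parsed GET /v2/jobs response body."""
    return Counter(job["status"] for job in jobs_response["jobs"])
```

Applied to the example response above, this reports two RUNNING jobs; a Counter returns 0 for statuses that are absent, so no key checks are needed.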

GET /v2/jobs/{job_id}

Returns the status of a specific job.

Example response:

{
"job": {
"id": "191f47e4a4204fa4ac2b",
"created_at": "2026-03-18T19:27:42.436Z",
"data_name": "5_min",
"duration": 300,
"status": "DONE",
"config": {
"type": "transcription",
"transcription_config": {
"language": "en",
"diarization": "speaker",
"operating_point": "enhanced"
}
},
"request_id": "191f47e4a4204fa4ac2b"
}
}
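A client typically polls this endpoint until the job leaves the RUNNING state. A minimal polling sketch, assuming RUNNING is the only non-terminal status (the doc shows RUNNING and DONE) and taking the fetch as a callable so any HTTP client can be plugged in:

```python
import time


def wait_for_job(fetch_status, poll_interval=2.0, timeout=600.0):
    """Poll a job until it reaches a terminal status.

    fetch_status() should GET /v2/jobs/{job_id} and return the parsed
    "job" object; anything other than RUNNING is treated as terminal.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status()
        if job["status"] != "RUNNING":
            return job
        time.sleep(poll_interval)
    raise TimeoutError("job did not finish before timeout")
```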

GET /v2/jobs/{job_id}/transcript

Returns the transcript for a completed job.

Query parameters:

| Parameter | Options |
|---|---|
| format | json, txt, srt |

Error responses:

| Code | Reason |
|---|---|
| 404 | Job not found, job not yet complete (includes current status), or unsupported format |
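Building the transcript URL is just a matter of appending the format query parameter; a sketch (the helper name transcript_url is ours) that also rejects formats the endpoint does not support:

```python
from urllib.parse import urlencode


def transcript_url(base_url, job_id, fmt="json"):
    """Build the GET URL for a job's transcript in the given format."""
    if fmt not in ("json", "txt", "srt"):
        raise ValueError(f"unsupported format: {fmt}")
    return f"{base_url}/v2/jobs/{job_id}/transcript?{urlencode({'format': fmt})}"
```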

GET /v2/jobs/{job_id}/log

Returns the processing logs for a specific job.


Health endpoints

The worker exposes three health endpoints on the same port as job submission.

These endpoints are designed to work as liveness and readiness probes in a Kubernetes cluster.

GET /jobs

Returns current engine usage and a list of active jobs. Use unused_engines to determine how many engines you can request for the next job.

Example response:

{
"active_jobs": [
{ "job_id": "f8a564954b334eecb823", "parallel_engines": 1 },
{ "job_id": "29351ae8cf2c4e8694f0", "parallel_engines": 1 }
],
"max_engines": 8,
"unused_engines": 6
}

GET /live

Liveness probe. Returns 200 when all container services are running and healthy.

{ "live": true }

GET /ready

Readiness probe. Returns 200 when at least one engine slot is free, 503 when all engines are occupied.

{
"ready": true,
"engines_used": 2
}

Environment variables

| Variable | Description |
|---|---|
| SM_BATCH_WORKER_LISTEN_PORT | Override the default internal port (18000). |
| SM_BATCH_WORKER_MAX_JOB_HISTORY | Maximum number of completed job records to retain in memory. |