Audio to Text

This endpoint transcribes an audio file into text using the specified model and parameters.

Request Body Parameters

file (file): The audio file object (not a filename) to transcribe, in one of the following formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.

model (text): The ID of the model to use. Currently, only whisper-1 (powered by our open-source Whisper V2 model) is available.

prompt (text): Optional text to guide the model's style or to continue the previous audio clip. The prompt should match the audio language.

response_format (text): The output format, available as json, text, srt, verbose_json, or vtt.

temperature (text): The sampling temperature, between 0 and 1. Higher values (e.g., 0.8) produce more random output, while lower values (e.g., 0.2) produce more focused, deterministic output. If set to 0, the model uses log probability to raise the temperature automatically until certain thresholds are met.
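The automatic temperature increase can be pictured as a retry loop. The following is only a sketch, modeled on the fallback strategy of the open-source Whisper code; the hosted endpoint may behave differently. The `decode` callable, the temperature schedule, and the threshold value are all assumptions made for illustration.

```python
def decode_with_fallback(decode,
                         temperatures=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
                         logprob_threshold=-1.0):
    """Retry decoding at rising temperatures until the result looks confident.

    `decode` is a hypothetical callable mapping a temperature to a tuple
    (text, average_log_probability). The schedule and threshold are
    illustrative defaults, not documented values of this API.
    """
    result = None
    for temperature in temperatures:
        text, avg_logprob = decode(temperature)
        result = (text, temperature)
        # Stop as soon as the average log probability clears the threshold.
        if avg_logprob >= logprob_threshold:
            break
    # If no attempt clears the threshold, the last attempt is returned.
    return result
```

In this sketch a request with temperature 0 starts deterministic and only becomes more random when the decoder's own confidence is too low.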

language (text): The language of the input audio. Supplying it as an ISO-639-1 code (for example, en) improves accuracy and latency.
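The constraints listed above lend themselves to a quick client-side check before uploading. Below is a minimal Python sketch. It assumes the audio format can be inferred from the file extension (the service may inspect the content instead), and the helper name `build_transcription_fields` is hypothetical, not part of the API.

```python
from pathlib import Path

# Values copied from the parameter descriptions above.
SUPPORTED_AUDIO_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def build_transcription_fields(audio_path, model="whisper-1", prompt=None,
                               response_format="json", temperature=0.0,
                               language=None):
    """Validate the documented parameters and return the non-file form fields."""
    # Assumption: the extension is a good enough proxy for the audio format.
    extension = Path(audio_path).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_AUDIO_FORMATS:
        raise ValueError(f"unsupported audio format: {extension!r}")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    # Shape check only: two letters, not a lookup against the ISO-639-1 list.
    if language is not None and not (len(language) == 2 and language.isalpha()):
        raise ValueError("language should be a two-letter ISO-639-1 code")

    fields = {
        "model": model,
        "response_format": response_format,
        "temperature": str(temperature),
    }
    # Optional fields are only sent when provided.
    if prompt:
        fields["prompt"] = prompt
    if language:
        fields["language"] = language.lower()
    return fields
```

Rejecting bad input locally avoids uploading a large audio file only to have the request refused.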

curl --location --request POST 'https://api.elkapi.com/v1/audio/transcriptions' \
  --header 'Authorization: Bearer {{api-key}}' \
  --form 'file=@""' \
  --form 'model="whisper-1"' \
  --form 'prompt="eiusmod nulla"' \
  --form 'response_format="json"' \
  --form 'temperature="0"' \
  --form 'language=""'
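For callers who prefer Python to curl, the same request can be sketched with only the standard library. The helper names `encode_multipart` and `transcribe` are illustrative and not part of the API. The URL, header, and field names come from the curl example, and `api_key` is a placeholder you supply yourself.

```python
import uuid
import urllib.request
from pathlib import Path

API_URL = "https://api.elkapi.com/v1/audio/transcriptions"  # from the curl example


def encode_multipart(fields, file_name, file_bytes):
    """Return (body, content_type) for a multipart/form-data request."""
    boundary = uuid.uuid4().hex
    parts = []
    # One part per plain text field (model, prompt, response_format, ...).
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f'\r\n\r\n{value}\r\n'.encode("utf-8")
        )
    # The audio itself goes into the "file" part as raw bytes.
    parts.append(
        (f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
         f'filename="{file_name}"\r\n'
         'Content-Type: application/octet-stream\r\n\r\n').encode("utf-8")
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(audio_path, api_key, **fields):
    """POST the audio file and return the raw response body as text."""
    path = Path(audio_path)
    fields.setdefault("model", "whisper-1")
    body, content_type = encode_multipart(fields, path.name, path.read_bytes())
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

A call might look like `transcribe("talk.mp3", api_key, response_format="json", language="en")`. The body is returned undecoded because its shape depends on the chosen response_format.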

{ "text": "Imagine the wildest idea that you've ever had, and you're curious about how it might scale to something that's a 100, a 1,000 times bigger. This is a place where you can get to do that." }
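Reading the transcript out of the response takes a few lines of Python. This sketch relies on the `text` key shown in the sample above for the json format. It assumes that the other formats (text, srt, vtt) arrive as a plain-text body, and it leaves verbose_json undecoded because its structure is not described here. The function name `extract_text` is hypothetical.

```python
import json


def extract_text(response_body, response_format="json"):
    """Return the transcript from a response body.

    Only the json shape is taken from the sample response; for every other
    format the body is returned unchanged (an assumption, see above).
    """
    if response_format == "json":
        return json.loads(response_body)["text"]
    return response_body
```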
