Audio Transcription Auto Detection of Language

In the current version, when I POST to /v1/audio/transcriptions without the language parameter, the response is nonsense. I'm automating speech-to-text and don't know in advance what language each audio file is in. Could you suggest an approach for this?

Currently, vLLM's /v1/audio/transcriptions endpoint does not support automatic language detection for Whisper and similar models. If you omit the language parameter, the model defaults to English, which can produce incorrect or "nonsense" transcriptions for non-English audio. The vLLM team has acknowledged this limitation and advises always specifying the language explicitly for accurate results. There is ongoing discussion and partial work toward adding language detection, but it is not yet available in a main release.
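For example, here is a minimal sketch of an explicit-language request. It assumes a vLLM OpenAI-compatible server at http://localhost:8000 serving openai/whisper-large-v3; adjust the base_url and model name to match your deployment:

```python
from openai import OpenAI

# Assumed: vLLM server started with an OpenAI-compatible endpoint,
# e.g. `vllm serve openai/whisper-large-v3`; adjust to your setup.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("audio.mp3", "rb") as f:
    result = client.audio.transcriptions.create(
        model="openai/whisper-large-v3",  # whatever model your server loads
        file=f,
        language="de",  # ISO 639-1 code of the spoken language
    )

print(result.text)
```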

As a workaround, run an external language detection step on each audio file before sending it to vLLM, then pass the detected language as the language parameter in your transcription request. This is currently the recommended approach to avoid incorrect transcriptions and operational issues. A sketch of that detection step follows below.
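One way to do the detection locally is with a small Whisper model via the openai-whisper package (`pip install openai-whisper`); this is just one option, and any detector that returns an ISO 639-1 code would slot into the workflow the same way:

```python
import whisper

# A small model is sufficient for language identification.
detector = whisper.load_model("tiny")

def detect_language(path: str) -> str:
    # Whisper identifies the language from the first 30 seconds of audio.
    audio = whisper.pad_or_trim(whisper.load_audio(path))
    mel = whisper.log_mel_spectrogram(audio).to(detector.device)
    _, probs = detector.detect_language(mel)
    # Return the most probable language code, e.g. "de", "tr", "en".
    return max(probs, key=probs.get)

lang = detect_language("audio.mp3")
# ...then send the transcription request with language=lang,
# as in the request example shown above.
```

Running the tiny model adds some latency and a local GPU/CPU cost per file, but it is typically much cheaper than transcribing twice or fixing garbled output downstream.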
