Qwen3-ASR-Flash is Alibaba's automatic speech recognition service, built on the Qwen3-Omni foundation and trained on tens of millions of hours of multimodal speech data. The model handles 11 languages — including Chinese (with Cantonese, Sichuanese, Minnan, and Wu dialects), English, Arabic, French, German, Spanish, Italian, Portuguese, Russian, Japanese, and Korean — with automatic language detection so no manual configuration is needed for mixed-language audio. The model is designed for difficult acoustic conditions: it transcribes lyrics over background music, handles noisy and far-field recordings, filters silence and non-speech audio, and accepts arbitrary context text (names, jargon, domain terminology) to bias recognition toward specific vocabulary.