2nd MLC-SLM Challenge Launches to Advance Multilingual Conversational Speech Understanding

by Joseph Wilson

The 2nd Multilingual Conversational Speech Language Model Challenge (MLC-SLM Challenge 2026) is now officially open. Aimed at researchers and practitioners worldwide, this year’s challenge releases a multilingual conversational speech training set covering 14 languages and approximately 2,100 hours of audio, and introduces upgraded tasks in speaker diarization, automatic speech recognition, and dialogue understanding. Together, these updates are intended to push Speech LLMs beyond transcription toward deeper conversational understanding.


A Stronger Foundation from the First Edition

The direction of the second edition is grounded in the outcomes of the first MLC-SLM Challenge, which was held as a satellite event of Interspeech 2025. The inaugural challenge attracted 78 teams from 13 countries and regions, generated 489 valid leaderboard submissions across two tracks, and received 14 high-quality technical reports. Its summary paper has also been accepted by ICASSP 2026. The first edition showed that while speech recognition performance had already made clear progress, speaker diarization remained a major challenge in complex multilingual, multi-turn conversational settings. These findings helped define the focus of the new edition: not only to expand the scale of the dataset, but also to make both the data and the task design more reflective of real-world conversations.


Expanded Data for Real-World Conversational Research

Based on that goal, the 2026 challenge introduces major upgrades at the data level. The training set covers English, French, German, Italian, Portuguese, Spanish, Japanese, Korean, Russian, Thai, and Vietnamese, as well as the newly added Tagalog, Urdu, and Turkish. English accounts for around 500 hours; French, Portuguese, and Spanish each contribute about 200 hours; and each of the remaining languages contributes roughly 100 hours, bringing the total to approximately 2,100 hours. All recordings are natural two-speaker conversations built around randomly assigned topics, making the dataset better suited to real-world conversational research.
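As a quick sanity check on those figures, the short Python sketch below tallies the approximate hours per language. It assumes that every language other than English, French, Portuguese, and Spanish contributes about 100 hours, which is one reading of the description above; the variable names are illustrative and the numbers are the rounded values quoted in the text, not official statistics.

```python
# Approximate training hours per language, using the rounded figures quoted above.
# Assumption: all languages not singled out contribute about 100 hours each.
ABOUT_200_HOURS = ["French", "Portuguese", "Spanish"]
ABOUT_100_HOURS = [
    "German", "Italian", "Japanese", "Korean", "Russian",
    "Thai", "Vietnamese", "Tagalog", "Urdu", "Turkish",
]

hours = {"English": 500}
hours.update({lang: 200 for lang in ABOUT_200_HOURS})
hours.update({lang: 100 for lang in ABOUT_100_HOURS})

total_languages = len(hours)
total_hours = sum(hours.values())

print(total_languages, total_hours)  # prints: 14 2100
```

Under that assumption the tally matches the stated totals of 14 languages and roughly 2,100 hours.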

Broader Language and Accent Diversity

Beyond expanded language coverage and scale, the dataset also strengthens accent and regional diversity. The English portion includes American, British, Australian, Indian, and Filipino English, while Canadian French, Mexican Spanish, and Brazilian Portuguese have also been added. This makes the dataset valuable not only for cross-lingual training, but also for studying how models generalize across regional language varieties.

Two Core Research Tracks

The value of these upgrades lies not only in expanding the dataset itself, but also in supporting more challenging research tasks. The challenge includes two core tracks: Multilingual Conversational Speech Diarization and Recognition and Multilingual Conversational Speech Understanding. In Track 1, systems must model both who spoke when and what was said without access to pre-segmented utterances or speaker labels. Track 2 focuses on acoustic and semantic understanding of entire conversations, pushing research beyond transcription toward more complete dialogue-level understanding.

Registration Now Open

Registration for the 2nd MLC-SLM Challenge is now open. According to the challenge schedule, the training data will be released on April 10, 2026, the development set and baseline systems will be released on April 24, the evaluation set and leaderboard will open on June 15, and the workshop will take place on October 2 at Interspeech 2026.

● March 30, 2026: Registration opens

● April 10, 2026: Training data release

● April 24, 2026: Development set and baseline system release

● June 15, 2026: Evaluation set release and leaderboard open

● June 25, 2026: Leaderboard freeze and paper submission portal opens (CMT system)

● July 10, 2026: Paper submission deadline

● July 20, 2026: Notification of acceptance

● October 2, 2026: Workshop date

By offering open data, realistic tasks, and an international exchange platform, the challenge aims to bring together more research teams to advance multilingual conversational speech language modeling. The launch of the second edition also provides a new benchmark for pushing Speech LLMs beyond transcription toward deeper dialogue understanding.

Registration Link:
Google Form: https://forms.gle/jfAZ95abGy4ZiNHo7

Official Website:
https://www.datatang.com/mcslm

Contact information:

mlc-slmw@nexdata.ai

Slack Link: https://join.slack.com/t/mlc-slm/shared_invite/zt-3u6aoxhxh-4WrpJ5M9DBrxR5FB30~iWg
