Code-switching (CS) is the phenomenon of alternating between two or more languages or language varieties within a single conversation, sentence, or even word. It is the natural mode of communication for hundreds of millions of multilingual speakers worldwide, and it presents fundamental challenges for speech technology systems that are typically designed for monolingual input. Code-switched speech requires an automatic speech recognition (ASR) system to handle the phonologies, vocabularies, and grammars of multiple languages simultaneously, and to switch seamlessly between them at unpredictable points.
Types and Linguistic Patterns
"I finished my homework. Enseguida vamos al parque."
Intra-sentential: language switch within a sentence
"I was going to the tienda to buy some groceries."
Tag-switching: insertion of a tag in a different language
"That was amazing, n'est-ce pas?"
Constraints: Matrix Language Frame model (Myers-Scotton, 1993)
Linguistic theories of code-switching, such as Myers-Scotton's Matrix Language Frame (MLF) model, posit that one language (the matrix language) provides the grammatical framework while the other (the embedded language) contributes content morphemes at permitted structural positions. These theories predict where switches can and cannot occur, providing linguistic constraints that can inform computational models. However, the diversity of code-switching patterns across language pairs and communities means that no single model captures all observed behavior.
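A highly simplified sketch can make the MLF idea concrete: if system (function) morphemes must come from the matrix language while content morphemes may come from the embedded language, a candidate mixed sentence can be checked against that constraint. The tag set, the morpheme classification, and the function names below are invented for illustration; the real MLF model is considerably more nuanced.

```python
# Simplified, hypothetical sketch of an MLF-style constraint check:
# system (function) morphemes must come from the matrix language,
# while content morphemes may come from the embedded language.
# The tag set and classification here are illustrative only.

FUNCTION_TAGS = {"DET", "AUX", "ADP", "PRON", "CONJ"}  # system morphemes

def violates_mlf(tokens, matrix_lang):
    """Return words that violate the (simplified) MLF constraint.

    tokens: list of (word, pos_tag, lang) triples.
    """
    violations = []
    for word, pos, lang in tokens:
        if pos in FUNCTION_TAGS and lang != matrix_lang:
            violations.append(word)
    return violations

# Embedding a Spanish content noun in an English frame: allowed.
ok = violates_mlf(
    [("I", "PRON", "en"), ("was", "AUX", "en"), ("going", "VERB", "en"),
     ("to", "ADP", "en"), ("the", "DET", "en"), ("tienda", "NOUN", "es")],
    matrix_lang="en")

# Switching the determiner ("la tienda" in an English frame): flagged.
bad = violates_mlf(
    [("to", "ADP", "en"), ("la", "DET", "es"), ("tienda", "NOUN", "es")],
    matrix_lang="en")
```

Constraint checks of this kind have been used to filter or score candidate switch points when generating synthetic code-switched data.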
ASR for Code-Switched Speech
Building ASR systems for code-switched speech involves several challenges. The acoustic models must handle phoneme inventories from multiple languages, including sounds that exist in one language but not the other. The language model must assign reasonable probabilities to mixed-language sequences, which are rare in monolingual training corpora. The pronunciation lexicon must cover words from all involved languages. End-to-end models that jointly learn acoustic and language modeling have shown promise for CS-ASR, as they can implicitly learn the patterns of language alternation without explicit language boundary annotation.
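One concrete consequence for end-to-end models is that the output layer must cover symbols from both languages so that mixed-language transcripts are representable. The following is a minimal sketch, assuming a character-level vocabulary; the function name and the reserved-symbol conventions are illustrative, not from any particular toolkit.

```python
# Hypothetical sketch: build a joint output vocabulary for an
# end-to-end CS-ASR model by merging the character inventories
# of per-language text corpora.

def build_joint_vocab(corpora):
    """Merge character inventories from per-language transcript lists.

    corpora: dict mapping language code -> list of transcripts.
    Returns a sorted symbol list with blank and space reserved first.
    """
    symbols = set()
    for lang, texts in corpora.items():
        for text in texts:
            symbols.update(text.lower())
    symbols.discard(" ")  # space gets its own reserved symbol
    return ["<blank>", "<space>"] + sorted(symbols)

vocab = build_joint_vocab({
    "en": ["I was going to the store"],
    "es": ["enseguida vamos al parque"],
})
# A single softmax over this merged vocabulary can emit
# mixed-language output without an explicit language switch token.
```

In practice, subword units (e.g. BPE) trained on pooled bilingual text are more common than raw characters, but the merging principle is the same.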
A prerequisite for many CS processing tasks is knowing which language is being spoken at each point in the utterance. Frame-level or word-level language identification in code-switched speech is far more challenging than utterance-level language ID for monolingual speech, because the system must detect switches that can occur at any word boundary (or even within words in the case of morphological code-switching). Acoustic-based language ID struggles because many phonemes are shared across languages, while lexical-based approaches fail when the ASR transcript is unreliable. Joint acoustic-lexical models that combine both sources of evidence tend to perform best.
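One simple way to combine the two evidence sources, sketched below under assumed inputs, is a weighted log-linear interpolation of per-word acoustic language posteriors and lexical language probabilities. The weight `alpha` and the posterior values are illustrative, not tuned or taken from any system.

```python
import math

# Hypothetical sketch of joint acoustic-lexical word-level language ID:
# combine per-word acoustic language posteriors with lexical evidence
# (how strongly the hypothesized word belongs to each language) via a
# weighted log-linear interpolation, then pick the argmax per word.

def joint_lid(acoustic_post, lexical_post, alpha=0.6):
    """Assign a language to each word from two posterior streams.

    acoustic_post, lexical_post: lists of dicts lang -> probability.
    alpha: weight on the acoustic evidence (assumed, not tuned).
    """
    labels = []
    for ac, lex in zip(acoustic_post, lexical_post):
        scores = {
            lang: alpha * math.log(ac[lang] + 1e-10)
            + (1 - alpha) * math.log(lex[lang] + 1e-10)
            for lang in ac
        }
        labels.append(max(scores, key=scores.get))
    return labels

# The acoustics are ambiguous on word 2, but the lexicon is confident,
# so the combined score resolves the switch:
labels = joint_lid(
    acoustic_post=[{"en": 0.9, "es": 0.1}, {"en": 0.5, "es": 0.5}],
    lexical_post=[{"en": 0.8, "es": 0.2}, {"en": 0.05, "es": 0.95}],
)
```

The example shows why the combination helps: neither stream alone labels both words correctly, but each compensates for the other's ambiguity.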
Data scarcity is the primary bottleneck for CS speech technology. Code-switched speech is predominantly conversational and informal, making it difficult to collect and transcribe at scale. Researchers have addressed this through data augmentation (splicing monolingual utterances to simulate code-switching), synthetic code-switching in text (for language model training), and multilingual pre-training that provides shared representations across languages. Shared tasks such as the LREC-COLING code-switching workshops have established benchmarks for popular language pairs including Mandarin-English, Spanish-English, and Hindi-English.
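A minimal sketch of one of the text-side augmentation strategies mentioned above: generating synthetic code-switched sentences for language model training by substituting dictionary words with their translations at some rate. The bilingual dictionary, switch probability, and function name are invented for illustration; real systems use alignment-based substitution and often linguistic constraints on switch points.

```python
import random

# Hypothetical sketch of synthetic code-switched text generation:
# replace English words that appear in a small bilingual dictionary
# with their Spanish translations, with probability switch_prob.

BILINGUAL_DICT = {"store": "tienda", "park": "parque", "homework": "tarea"}

def synthesize_cs(sentence, switch_prob=0.5, rng=None):
    """Replace dictionary words with translations at random."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    out = []
    for word in sentence.split():
        trans = BILINGUAL_DICT.get(word.lower())
        if trans is not None and rng.random() < switch_prob:
            out.append(trans)
        else:
            out.append(word)
    return " ".join(out)

mixed = synthesize_cs("I went to the store near the park", switch_prob=1.0)
# With switch_prob=1.0 every dictionary word is switched:
# "I went to the tienda near the parque"
```

Text generated this way is usually mixed into the language model's training data alongside real monolingual and (where available) real code-switched text.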
Beyond ASR, code-switching affects every stage of the NLP pipeline: part-of-speech tagging, parsing, named entity recognition, and sentiment analysis all must handle mixed-language input. The study of computational approaches to code-switching also connects to sociolinguistic questions about when and why speakers switch languages, contributing to our understanding of multilingual cognition and communication.