Translation Memory

Translation memory (TM) systems store pairs of source and target language segments — typically sentences — that have been previously translated and approved by human translators. When a new source segment is encountered, the TM system searches for similar or identical segments in its database and presents the stored translations as suggestions. This technology, which emerged in the 1980s, has become indispensable in professional translation workflows, dramatically increasing translator productivity for repetitive and technical content.

Fuzzy Matching

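Edit-distance similarity:

sim(s, s') = 1 − editDist(s, s') / max(|s|, |s'|)

Exact match: sim = 1.0 (100% match)
Fuzzy match: threshold ≤ sim < 1.0 (threshold is typically 0.7, giving a 70–99% range)
No match: sim < threshold

Here editDist is the edit distance computed at the word or character level, and |s| is the length of segment s in the same units (words or characters).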

The core function of a TM system is segment retrieval based on similarity. Exact matches (100% match) reuse the stored translation directly. Fuzzy matches, meaning segments that are similar but not identical to stored entries, are presented with the differences highlighted, so the translator can edit the suggested translation rather than translate from scratch. Most systems use edit distance (Levenshtein distance), typically at the word level, as the primary similarity metric, though modern systems also incorporate sub-segment matching, terminology recognition, and semantic similarity measures.
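The similarity score and match categories described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular TM tool is implemented: the function names, whitespace tokenization, and the default threshold of 0.7 are assumptions made for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(source, candidate):
    """Word-level sim = 1 - editDist / max(len); whitespace tokenization is an assumption."""
    s, c = source.split(), candidate.split()
    longest = max(len(s), len(c))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(s, c) / longest


def classify(sim, threshold=0.7):
    """Map a similarity score to the match categories used in this section."""
    if sim == 1.0:
        return "exact"
    if sim >= threshold:
        return "fuzzy"
    return "none"


def best_match(source, memory, threshold=0.7):
    """Return (sim, stored_source, stored_target) for the best entry at or above threshold, else None.

    `memory` is a hypothetical in-memory list of (source, target) pairs.
    """
    best = None
    for stored_source, stored_target in memory:
        sim = similarity(source, stored_source)
        if sim >= threshold and (best is None or sim > best[0]):
            best = (sim, stored_source, stored_target)
    return best
```

For example, "Click the Save button" and "Click the Cancel button" differ by one substitution over four words, so the score is 1 − 1/4 = 0.75, a fuzzy match at the 0.7 threshold. A linear scan like this is only practical for small memories; larger ones would need some form of indexing, which is outside the scope of this sketch.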

TM in Professional Translation

Translation memory technology is most effective for content with high repetition rates, such as software documentation, legal texts, product manuals, and regulatory filings. In these domains, TM leverage rates (the proportion of new text covered by exact or fuzzy matches) can reach 60–80% or higher, yielding substantial productivity gains and cost savings. TM systems also promote terminological consistency across large translation projects and multiple translators, which is critical for technical and legal content.
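As a rough illustration of the leverage rate defined above, the following sketch computes the share of words that fall in segments with an exact or fuzzy match. Weighting by word count, the 0.7 threshold, and the input shape are assumptions for the example; projects may equally count segments or characters.

```python
def leverage_rate(segments, threshold=0.7):
    """Fraction of words covered by exact or fuzzy TM matches.

    `segments` is a hypothetical list of (word_count, best_similarity) pairs,
    one per new source segment.
    """
    segments = list(segments)
    total_words = sum(words for words, _ in segments)
    if total_words == 0:
        return 0.0
    matched_words = sum(words for words, sim in segments if sim >= threshold)
    return matched_words / total_words


# Two matched segments (10 + 20 words) out of 50 words in total.
project = [(10, 1.0), (20, 0.85), (15, 0.4), (5, 0.0)]
rate = leverage_rate(project)
```

Here `rate` is 30 / 50 = 0.6, that is, a 60% leverage rate.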

TMX Standard

The Translation Memory eXchange (TMX) format, developed by the Localization Industry Standards Association (LISA), is the standard XML format for exchanging translation memory data between tools. TMX files store translation units (source-target segment pairs) along with metadata including creation date, creator, language codes, and change tracking information. This interoperability standard allows translators to move their valuable TM assets between different CAT (Computer-Assisted Translation) tools.
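To make the structure concrete, here is a minimal sketch that reads translation units from a small TMX document using Python's standard library. The sample content, the tool and user names in it, and the `read_tmx` helper are invented for illustration. The sketch also assumes language codes match exactly (real files often use regional codes such as `en-US`).

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample data in the general shape of a TMX 1.4 file.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example-tool" creationtoolversion="1.0"
          segtype="sentence" o-tmf="example" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu creationdate="20240101T120000Z" creationid="translator1">
      <tuv xml:lang="en"><seg>Click the Save button.</seg></tuv>
      <tuv xml:lang="de"><seg>Klicken Sie auf die Schaltfläche Speichern.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_tmx(xml_text, src_lang="en", tgt_lang="de"):
    """Return a list of (source, target, creationdate, creationid) tuples."""
    root = ET.fromstring(xml_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG, "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline markup.
                segs[lang] = "".join(seg.itertext())
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang],
                          tu.get("creationdate"), tu.get("creationid")))
    return pairs


units = read_tmx(SAMPLE_TMX)
```

Each tuple in `units` pairs a source segment with its target plus the creation metadata stored on the translation unit, which is the information a CAT tool would import into its own memory.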

Integration with Machine Translation

Modern translation workflows increasingly combine TM with MT. In a typical hybrid workflow, exact and high-fuzzy TM matches are used directly, while segments with low or no TM matches are processed by an MT engine, and the translator post-edits the MT output. This approach leverages the reliability of TM for repetitive content and the broad coverage of MT for novel text. Some systems implement adaptive MT that learns from translator corrections in real time, effectively bridging the gap between TM and MT.
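The routing logic of such a hybrid workflow might look like the sketch below. It is an illustration under stated assumptions: the 0.85 cutoff for "high fuzzy" is arbitrary, `mt_translate` is a placeholder callable standing in for whatever MT engine is used, and the result labels are made up for the example.

```python
def word_similarity(a, b):
    """Word-level edit-distance similarity: 1 - editDist / max(len)."""
    s, t = a.split(), b.split()
    if not s and not t:
        return 1.0
    prev = list(range(len(t) + 1))
    for i, x in enumerate(s, start=1):
        curr = [i]
        for j, y in enumerate(t, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (x != y)))
        prev = curr
    return 1.0 - prev[-1] / max(len(s), len(t))


def route_segment(source, memory, mt_translate, high_fuzzy=0.85):
    """Reuse a TM match when it is exact or high-fuzzy; otherwise fall back to MT.

    `memory` is a hypothetical list of (source, target) pairs and
    `mt_translate` any callable mapping a source string to a draft translation.
    """
    best_sim, best_target = 0.0, None
    for stored_source, stored_target in memory:
        sim = word_similarity(source, stored_source)
        if sim > best_sim:
            best_sim, best_target = sim, stored_target
    if best_target is not None and best_sim >= high_fuzzy:
        origin = "tm_exact" if best_sim == 1.0 else "tm_fuzzy"
        return {"target": best_target, "origin": origin, "similarity": best_sim}
    # Low or no TM match: send to MT and flag the output for post-editing.
    return {"target": mt_translate(source), "origin": "mt_post_edit",
            "similarity": best_sim}


def fake_mt(text):
    """Stand-in for a real MT engine, used only to make the example runnable."""
    return "[MT draft] " + text


memory = [("Click the Save button to store changes",
           "Klicken Sie auf Speichern, um die Änderungen zu sichern")]
exact = route_segment("Click the Save button to store changes", memory, fake_mt)
fuzzy = route_segment("Click the Save button to store settings", memory, fake_mt)
novel = route_segment("Restart the server now", memory, fake_mt)
```

In this example the first segment is routed as an exact match, the second as a high-fuzzy match (one word changed out of seven), and the third goes to the placeholder MT function for post-editing.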

Research has also explored using TM as a knowledge source for neural MT. Translation pairs retrieved from TM can be provided as additional context to the NMT model, improving translation quality for domain-specific content. Conversely, MT output can be used to pre-populate TM databases, particularly for new domains or language pairs where no prior translations exist. The convergence of TM and MT represents a significant trend in the translation industry.
