Technical Writing | June 5, 2020

What Is Translation Memory Technology & How It Works

If you work with structured content like DITA, you already understand the power of modularity and reuse. You create content in small, manageable components that can be assembled and published in countless combinations. This same principle of intelligent reuse should apply to your translation workflow. Sending entire documents for translation every time a small component changes is inefficient. Translation memory technology works hand-in-hand with your structured content strategy. It breaks down content into segments, stores the translations, and automatically reuses them wherever that component appears. This creates a seamless, efficient process that preserves the integrity of both your content and its translations.

Stop Translating from Scratch. Start Using Translation Memory.

Take a look at a few of the most formidable foes of translation:

Time, cost, inconsistency.

Historically, these three have plagued translation efforts everywhere, so much so that some organizations avoid the hassle by not translating their content at all.

The problem is, avoiding translation removes an organization’s content from countless potential user bases across the world.

However, the answer isn’t poor translation either. Poor translation conveys a careless, inaccurate message to foreign audiences. Oh, and you’re still paying for that message. Apprehension is only natural.

It’s crucial to have translation and localization partners aligned with your organization’s goals, budget, and content objectives. Partners that make sure you’re translating only what you need and translating it accurately.

We’re no longer copying translations page by page like medieval monks hunched over illuminated manuscripts meant for a few literate aristocrats!

We have technology. We have translation memory.

For localizing and translating content, translation memory is your most powerful ally against the aforementioned foes of time, cost, and inconsistency.

So, What Exactly Is Translation Memory?

Translation memory (TM) is a collection of translated pieces of text that are stored in a database so they can be recalled and reused later. The stored pieces can be sentences, paragraphs, sections, or really any string of text.

The original (source) content in this database stays permanently linked to its translation. That link is what makes TM useful: when a piece of text has already been translated, its translation can be reused in as many places as the text appears, rather than the source text being translated over and over again.

Translation memory is fascinating, but what it allows you to do with your content is a technical superpower. Heretto’s Localization Manager takes translation memory and harnesses these superpowers so you have them right at your fingertips. These are a few of my favorite things.

Core Concepts and Terminology

To really get a handle on how translation memory works its magic, it helps to know the lingo and the tools of the trade. Let's break down the key components that make this technology so effective for managing multilingual content.

CAT Tools, Termbases, and Translation Management Systems

At its heart, a translation memory (TM) is a database. It stores segments of text, such as sentences or paragraphs, that you've already translated. Each entry in this database is a pair containing the original "source" text and its "target" translation. These pairs are called "translation units." TMs are the engine inside specialized software called computer-assisted translation (CAT) tools. When a translator works on a new document, the CAT tool checks each segment against the TM. If it finds an identical segment, that's a "100% match." If it finds something similar but not quite the same, it's a "fuzzy match," which the translator can then adapt. A termbase complements the TM: it is a glossary of approved translations for individual terms rather than whole segments. This entire process is often managed within a larger framework, like a Translation Management System (TMS), which oversees the localization workflow from start to finish.
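To make the idea of translation units concrete, here is a minimal sketch in Python. The class names `TranslationUnit` and `TranslationMemory` are invented for this illustration and do not come from any particular CAT tool; a real TM also stores metadata such as languages, dates, and authors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TranslationUnit:
    """One source segment paired with its approved target translation."""
    source: str
    target: str


class TranslationMemory:
    """A toy in-memory TM keyed by the exact source text (hypothetical design)."""

    def __init__(self) -> None:
        self._units = {}

    def add(self, source: str, target: str) -> None:
        self._units[source] = TranslationUnit(source, target)

    def exact_match(self, source: str) -> Optional[TranslationUnit]:
        # A "100% match" here simply means the source strings are identical.
        return self._units.get(source)


tm = TranslationMemory()
tm.add("Press the power button.", "Appuyez sur le bouton d'alimentation.")

hit = tm.exact_match("Press the power button.")
miss = tm.exact_match("Press the reset button.")
print(hit.target if hit else "no match")
print(miss.target if miss else "no match")
```

The second lookup finds nothing because this sketch only handles identical segments; similar-but-different segments are the job of fuzzy matching.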

Understanding Segments, Units, and Ownership

It's important to know that translation memories work with "segments," not just individual words. A segment is typically a sentence or a complete thought, which allows the TM to preserve context and grammar, ensuring that reused translations make sense. By remembering these past translations, human translators can focus their expertise on new content instead of re-translating the same phrases over and over. These stored translation units are a valuable asset. And it's an asset that you, the client, typically own, not the translation provider. This ownership is key because it ensures consistency in language and tone across all your projects, even if you switch vendors. Reusing these pre-approved translations also leads to significant cost savings, as work based on TM matches is usually billed at a much lower rate than translating new content from scratch.

Why Structured Content Is Translation Memory's Best Friend

DITA (Darwin Information Typing Architecture) is a topic-based structured content standard. Topics make it easier to develop content in modular pieces. The “A” in DITA stands for Architecture for good reason: structured content is built from small topics that can stand on their own. Just as bricks are stacked to build a wall, a whole piece of content is built by stacking topics, which we call content components.

Those components can then be pieced together to make a document. When content is stored in components, translated content is much easier to manage and track in your translation memory.

With a topic-based structured content standard like DITA XML, your content is organized to support the full power of translation memory, making each subsequent job quicker to complete, track, and reuse.

Stop Paying to Translate the Same Sentence Twice

In Heretto’s Localization Manager, we have a nice way of making sure you never translate the same content twice. Once you establish your translation memory, the system crawls through your content and gives three different localization statuses: Current, Out-Of-Date, or Unavailable.

You’ve probably sussed this out, but those color-coded statuses pull from your translation memory to let you know whether the selected parts of your content are currently translated, have out-of-date translations that could use some tweaks, or have never been translated.

This way, when it’s time to package and ship your content to your translators, you will only select the content that needs to be translated. Inconsistency gets crushed here and you’ll never unnecessarily translate the same string of words again. Everything is stored in your translation memory for future use and you can see the status of everything in it.
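One way to picture how a system could arrive at those three statuses is to remember which version of the source text each translation was made from. The sketch below is an assumption made for illustration only, not a description of how Localization Manager is implemented; the function names `fingerprint` and `localization_status` are hypothetical.

```python
import hashlib
from typing import Optional


def fingerprint(text: str) -> str:
    """Hash the source text so that changes can be detected cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def localization_status(current_source: str, translated_from: Optional[str]) -> str:
    """Classify one component for one target language.

    translated_from is the fingerprint of the source text that the stored
    translation was made from, or None if no translation exists yet.
    """
    if translated_from is None:
        return "Unavailable"
    if translated_from == fingerprint(current_source):
        return "Current"
    return "Out-Of-Date"


old_source = "Connect the cable to port A."
new_source = "Connect the cable to port B."
stored = fingerprint(old_source)  # the translation was made from the old text

print(localization_status(old_source, stored))  # Current
print(localization_status(new_source, stored))  # Out-Of-Date
print(localization_status(new_source, None))    # Unavailable
```

Only components that come back as Out-Of-Date or Unavailable would need to go into the package for your translators.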

Ready to Go Global? How Translation Memory Scales With You

You’re building an asset that will grow with you. Everyone starts with no translation memory, similar to how everyone starts with no content. It’s a process!

When your content is written, stored, and organized in components, translation memory works with a more manageable library of translated content. You may start with a few pieces of translated content in your translation memory, but, over time, that library of translated content will grow.

As it grows, your translation jobs will be quicker and quicker. This is especially important because as you scale, your content delivery will also need to scale. Think of how important a robust translation memory is for scaling your content to global markets. The more you add, the more you can reuse later. Your growing content library and translation memory will have you doing less work as time goes on.

---

Whether your organization has five people or 5,000, translation memory will make the process smoother when it’s time to get your content accurately translated. Coupled with structured content, it lets you choose what you translate, save those translations, and reuse them wherever and whenever you want.

Tired of reading about it and want to see it in action? Meet with one of our experts to see the power of built-in translation memory in our Component Content Management System.

How Translation Memory Works in Detail

At its core, translation memory works by breaking down source text into smaller chunks called "segments." When you submit new content for translation, the system scans its database to see if it has ever translated that exact segment—or a similar one—before. This process is managed within a Computer-Assisted Translation (CAT) tool, which presents these matches to the human translator. The translator can then accept, edit, or reject the suggestion. This simple but powerful mechanism is what drives the efficiency of TM. It’s not just about finding identical sentences; the system is smart enough to identify partial similarities, which still saves a significant amount of time and effort.

The Matching Process

The real magic of translation memory lies in its matching process. When a translator begins working on a document, the TM software automatically compares each new segment of source text against the database of previously translated segments. It then categorizes the results based on how closely they match. This isn't a simple yes-or-no check; the system provides a detailed analysis, often including a percentage score, to show the degree of similarity. This allows translators to quickly assess whether a stored translation can be reused as-is, needs a slight modification, or if they need to start from scratch, ensuring they focus their expertise where it's needed most.

Exact and In-Context Exact (ICE) Matches

The best-case scenario in the matching process is a 100% match, also known as an exact match. This means the new source segment is identical to a segment already stored in the translation memory. An even better result is an In-Context Exact (ICE) match. An ICE match is a 100% match that also appears with the exact same surrounding segments as it did in a previous translation. This is crucial for technical documentation, where a term’s meaning can change based on its context. Leveraging ICE matches ensures the highest level of consistency and accuracy, which is much easier to achieve when you create structured content in modular components.
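The difference between an exact match and an ICE match can be sketched by storing each segment together with its neighbors. This is a simplified illustration: it assumes "context" means the immediately preceding and following segments, whereas real tools define context in their own ways. The names `build_contextual_tm` and `classify` are hypothetical.

```python
from typing import Optional


def build_contextual_tm(sources, targets):
    """Key each translation by (previous segment, segment, next segment)."""
    tm = {}
    for i, (src, tgt) in enumerate(zip(sources, targets)):
        prev_seg = sources[i - 1] if i > 0 else None
        next_seg = sources[i + 1] if i + 1 < len(sources) else None
        tm[(prev_seg, src, next_seg)] = tgt
    return tm


def classify(tm, prev_seg: Optional[str], seg: str, next_seg: Optional[str]) -> str:
    if (prev_seg, seg, next_seg) in tm:
        return "ICE"    # same text with the same neighbors
    if any(key[1] == seg for key in tm):
        return "exact"  # same text, but in a different context
    return "none"


sources = ["Open the cover.", "Remove the filter.", "Close the cover."]
targets = ["Öffnen Sie die Abdeckung.", "Entfernen Sie den Filter.", "Schließen Sie die Abdeckung."]
tm = build_contextual_tm(sources, targets)

print(classify(tm, "Open the cover.", "Remove the filter.", "Close the cover."))  # ICE
print(classify(tm, None, "Remove the filter.", None))                             # exact
print(classify(tm, None, "Replace the filter.", None))                            # none
```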

Fuzzy Matches and Concordance Search

When a segment is similar but not identical to one in the TM, it’s called a "fuzzy match." The system assigns a percentage to indicate how close the match is—for example, an 85% match might only have one different word. The translator can then edit the existing translation to fit the new context, which is still much faster than translating the entire sentence from the beginning. For more specific queries, translators can use a concordance search to look up how individual words or phrases have been translated in the past, even if they don't appear in a full segment match. This helps maintain consistency with key terminology across all documentation.
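Fuzzy matching and concordance search can both be sketched with Python's standard library. The scoring below uses `difflib.SequenceMatcher` on words, which is a stand-in chosen for this example; commercial CAT tools use their own scoring algorithms, so their percentages will differ. The 75% threshold and the function names are also assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Word-level similarity between two segments, from 0 to 100."""
    return 100 * SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def best_fuzzy_match(segment, tm, threshold=75.0):
    """Return (score, source, target) for the closest stored segment, or None."""
    scored = [(similarity(segment, src), src, tgt) for src, tgt in tm.items()]
    scored = [item for item in scored if item[0] >= threshold]
    return max(scored, default=None)


def concordance(term, tm):
    """List stored units whose source text contains the term."""
    return [(src, tgt) for src, tgt in tm.items() if term.lower() in src.lower()]


# Sample data for illustration only.
tm = {
    "Press the power button to start.": "Drücken Sie zum Starten die Ein/Aus-Taste.",
    "Remove the filter.": "Entfernen Sie den Filter.",
}

match = best_fuzzy_match("Press the reset button to start.", tm)
if match:
    score, source, target = match
    print(f"{score:.0f}% match: {source} -> {target}")

print(concordance("filter", tm))
```

Here the new sentence differs from a stored one by a single word, so it clears the threshold and the translator gets the stored translation as a starting point. The concordance search finds the stored unit that mentions "filter" even though no full segment matches.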

Building Your First Translation Memory with Alignment

If your organization already has a library of translated documents but is new to using a TM system, you don't have to start from zero. You can build your first translation memory using a process called "alignment." Alignment tools take your existing source documents and their corresponding translations, automatically matching up the segments side-by-side to create translation units. This process populates your TM database with all your past work, turning your existing content into a valuable, reusable asset. It’s an effective way to jumpstart your localization efforts and begin seeing the cost and time benefits of TM immediately.
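As a rough sketch of what alignment does, the example below splits a source document and its translation into sentences and pairs them by position. This is deliberately naive: it assumes both texts have the same number of sentences in the same order, and real alignment tools handle merged, split, and reordered sentences far more robustly. The function names are hypothetical.

```python
import re


def split_segments(text: str):
    """Naive sentence splitter: break after ., !, or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def align(source_text: str, target_text: str):
    """Pair source and target sentences by position to form translation units."""
    source_segments = split_segments(source_text)
    target_segments = split_segments(target_text)
    if len(source_segments) != len(target_segments):
        # A mismatch means the simple positional assumption does not hold.
        raise ValueError("Segment counts differ; this pair needs manual review.")
    return list(zip(source_segments, target_segments))


pairs = align(
    "Turn off the device. Unplug the cable.",
    "Schalten Sie das Gerät aus. Ziehen Sie das Kabel ab.",
)
for source, target in pairs:
    print(f"{source} -> {target}")
```

Even with better tooling, aligned units are usually worth a human spot check before they go into the TM, since a misaligned pair becomes a wrong "100% match" later.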

Translation Memory vs. Machine Translation

It’s easy to confuse translation memory with machine translation (MT), but they are fundamentally different technologies. Translation memory is a database of human-approved translations. It reuses content that a professional translator has already created and verified, ensuring consistency and quality. Think of it as a smart copy-and-paste tool that never forgets a translation. In contrast, machine translation, like Google Translate, uses artificial intelligence to generate a new translation automatically, without any prior human input for that specific text. While MT can be useful for getting the gist of a text, TM is essential for the precision and control required in professional and technical documentation. They can be used together, but TM is built on human expertise, while MT is built on algorithms.

Challenges and Costs of Translation Memory

While translation memory is a powerful tool, it’s not a "set it and forget it" solution. Implementing and maintaining a TM system comes with its own set of challenges and costs. The initial setup requires an investment in software and potentially in the alignment of existing translated assets. Furthermore, the quality of the TM is entirely dependent on the quality of the translations fed into it. Without proper oversight and maintenance, a translation memory can become cluttered with outdated or incorrect entries, which can diminish its value over time. Recognizing these potential hurdles is the first step toward creating a sustainable and effective localization strategy.

Potential Drawbacks to Consider

Before diving in, it's important to be aware of the potential drawbacks of relying on a translation memory. The primary risk is the propagation of errors. If an incorrect translation is saved to the TM, that mistake can be automatically reused across multiple documents until someone catches it and corrects it. This highlights the need for a robust review process. Additionally, there's a risk that focusing too heavily on reusing existing segments can lead to translations that feel disjointed or unnatural. A successful TM strategy requires a balance between leveraging past work and ensuring the final text flows smoothly and reads well for the target audience.

The "Peep-Hole" Effect and Error Repetition

One specific challenge translators face is the "peep-hole" effect. When working segment by segment, it can be difficult to see the broader context of the document, which can lead to translations that are technically correct but stylistically inconsistent. This is also how errors get repeated. If a mistranslated term or an awkward phrase is saved as a 100% match, it will be suggested again and again, reinforcing the initial mistake. Strong content governance and clear style guides are essential to prevent these issues and ensure that your TM remains a reliable and accurate resource for your entire team.

Ongoing Maintenance and Technology Costs

A translation memory is a living database that requires ongoing care. As your products evolve and terminology is updated, your TM must be maintained to reflect those changes. This maintenance is often a manual process of reviewing, editing, and cleaning out old or incorrect entries. Without regular upkeep, the TM can become less effective, with fuzzy match percentages dropping and inconsistencies creeping in. This maintenance effort, combined with software licensing and update costs, should be factored into your overall localization budget. Treating your TM as a critical content asset is key to maximizing its long-term return on investment.
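Part of that upkeep can be assisted by simple checks. As one hedged example, the sketch below flags source segments that have more than one distinct translation stored, which is a common sign that an entry needs review. The function name `find_conflicts` and the normalization rules (lowercasing and collapsing whitespace) are assumptions made for this illustration.

```python
from collections import defaultdict


def find_conflicts(units):
    """Report normalized source segments that have more than one distinct target."""
    targets_by_source = defaultdict(set)
    for source, target in units:
        normalized = " ".join(source.split()).lower()
        targets_by_source[normalized].add(target.strip())
    return {
        source: sorted(targets)
        for source, targets in targets_by_source.items()
        if len(targets) > 1
    }


# Sample data for illustration only.
units = [
    ("Save your changes.", "Enregistrez vos modifications."),
    ("Save your changes.", "Sauvegardez vos changements."),
    ("save  your changes.", "Enregistrez vos modifications."),
    ("Close the window.", "Fermez la fenêtre."),
]

conflicts = find_conflicts(units)
for source, targets in conflicts.items():
    print(source, "->", targets)
```

A check like this only surfaces candidates. Deciding which translation is correct still takes a reviewer who knows the product and the style guide.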

The Evolution of Translation Memory Technology

Translation memory technology is not a new invention; it has a long history of helping organizations streamline their localization efforts. The core concept has been around for decades, but advancements in software and computing power have made it more accessible and powerful than ever before. From its early days as a complex system used by large corporations to its current form as an integrated feature in modern content management systems, TM has continuously evolved. Understanding this evolution helps appreciate the stability and maturity of the technology that technical documentation teams rely on today for global content delivery.

A Brief History

The idea of reusing previous translations to speed up new projects emerged in the 1970s, but it wasn't until the rise of personal computers in the 1990s that translation memory became a commercially viable tool. Early systems were groundbreaking, allowing translators to build databases of their work for the first time. This marked a major shift in the localization industry, moving from a completely manual process to one assisted by technology. Over the years, these tools have become more sophisticated, offering better match analysis, greater integration with other systems, and more user-friendly interfaces, solidifying TM's role as a cornerstone of modern translation workflows.

Technical Standards and File Compatibility

To ensure that translation memories could be shared between different software tools and vendors, the industry developed technical standards. The most important of these is TMX (Translation Memory eXchange), an XML-based format that allows for the seamless exchange of TM data between different CAT tools. This interoperability is critical for companies that work with multiple translation agencies or use a variety of software in their content lifecycle. A platform that supports standards like TMX, such as a Component Content Management System, ensures that your valuable translation assets are never locked into a single proprietary system, giving you flexibility and control over your localization process.
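To show roughly what a TMX file contains, here is a minimal sketch that writes translation units with Python's built-in `xml.etree.ElementTree`. The header values (such as the tool name `example-script`) are placeholders invented for this example, and the helper `to_tmx` is hypothetical; consult the TMX specification and your tool's documentation for the attributes your workflow actually requires.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this namespaced attribute as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def to_tmx(units, source_lang: str, target_lang: str) -> str:
    """Serialize (source, target) pairs as a minimal TMX 1.4 document."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",  # placeholder value
        "creationtoolversion": "0.1",      # placeholder value
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en-US",
        "srclang": source_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in units:
        tu = ET.SubElement(body, "tu")  # one translation unit
        for lang, text in ((source_lang, source), (target_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


tmx_xml = to_tmx(
    [("Press the power button.", "Appuyez sur le bouton d'alimentation.")],
    source_lang="en-US",
    target_lang="fr-FR",
)
print(tmx_xml)
```

Each `tu` element is one translation unit, and each `tuv` inside it holds the segment text for one language, which mirrors the source and target pairing described earlier.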

Frequently Asked Questions

How is translation memory different from machine translation like Google Translate?

Think of translation memory as a database of your own pre-approved, human-translated content. It intelligently reuses exact phrases and sentences that your team has already verified. Machine translation, on the other hand, uses AI to generate a brand new translation from scratch. Translation memory provides consistency and control based on past human work, while machine translation offers a quick, automated interpretation.

We have a lot of translated documents but have never used a TM system. Do we have to start over?

Not at all. You can use a process called "alignment" to build your first translation memory from your existing content. An alignment tool takes your source documents and their corresponding translations and matches them up, segment by segment. This process populates your new TM database, effectively turning all your past translation work into a valuable, reusable asset from day one.

Why is structured content so important for getting the most out of translation memory?

Structured content is created in small, self-contained components or topics. This modular approach works perfectly with translation memory, which also breaks content down into small segments. When you update a single component, you only need to translate that one small piece of text. The TM then recognizes that translated segment wherever the component is reused, which dramatically reduces costs and ensures consistency across all your documentation.

What happens to our translation memory if we decide to switch to a new translation provider?

Your translation memory is an asset that belongs to your organization, not your translation vendor. As long as your TM is managed in a standard format like TMX (Translation Memory eXchange), you can easily transfer it to a new partner or a different software tool. This portability gives you complete control and prevents you from being locked into a single provider.

How do you prevent a mistake from being saved and reused over and over again?

Preventing errors from spreading requires a strong review and governance process. New translations should always be reviewed by a human expert before they are saved to the memory. It's also wise to perform regular maintenance on your TM to clean out any outdated or incorrect entries. Just like any other critical content asset, your translation memory needs consistent oversight to maintain its quality and reliability.

Key Takeaways

  • Automate reuse and cut translation costs: Translation memory stores your human-approved translations in a database. This allows you to automatically reuse content across projects, ensuring you only pay to translate new or updated text while maintaining brand consistency.
  • Combine structured content with TM for maximum efficiency: Modular content, like DITA XML, is perfectly suited for translation memory. Breaking content into smaller, reusable components creates more opportunities for exact matches, which speeds up localization and improves accuracy.
  • Treat your translation memory as a strategic asset: Your TM is a valuable resource that grows with your content library. To protect its integrity, it requires ongoing maintenance and governance to ensure all stored translations remain accurate and up-to-date.
