Creating a Glossary for Machine Translation: Best Practices
Glossaries for real-time translation are different from those used to translate structured text. When you create a machine translation glossary for use with Language I/O through the glossary imposition process, consider the basic guidelines outlined in this article.
What to include in a glossary for Machine Translation
As a rule, only add terms to a Machine Translation glossary to cover the following situations:
- Customer-specific terminology, especially “do not translate” terms.
  For example, the following terms are specific to Language I/O and would be set as “do not translate”:
  - Language I/O (the company name)
  - Discovery (the name of one of Language I/O’s subscription plans)
  - Rosey (the name of the Language I/O robot)
  - Luna (the name of the Language I/O bot)
- Domain-specific terminology, especially “do not translate” terms.
  For example, for a company like Language I/O, these could be relevant terms:
  - multi-lingual bot
  - SaaS
  - machine translation
- User Interface (UI) options
  For example, Language I/O could add some of the user interface options from the golinguist portal to its own glossary:
  - the "Submit" button
  - the "Compare Translation Vendors" checkbox
- Ambiguous terms
  One of the examples in the glossary template is the term “dashboard”, which refers to the Language I/O dashboards used to visualize word count usage. If this term were not defined in the glossary, it could be translated into Spanish in several different ways, such as "salpicadero" (as in a car dashboard) or "tablero".
- Terms that have proven to be problematic, based on feedback from agents using Language I/O.
- Company preferences for addressing end users
  For example:
  - Greetings:
    - Hi there!
    - Dear CustomerName
    - Welcome!
  - Closing:
    - Best regards
    - Sincerely
Best Practices
Less is More
It is better to start small and focus on key terminology, rather than trying to start with a very extensive list of terms. If you add too many terms to a glossary for Machine Translation without a human in the loop, it can reduce the overall fluency of the Machine Translated content. For existing customers, Language I/O also offers a Self-Improving Glossary process (SIGLO) that uses Artificial Intelligence to automatically identify new candidate terms for the glossary over time. This keeps your glossary evolving at the same pace as your content.
Focus on compound terms with two or more words
Avoid adding single-word terms: these tend to be generic words that do not belong in a glossary. The exceptions to this rule are terms that could be ambiguous (like “dashboard” above) and terms that must not be translated.
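If your glossary is long, a short script can help you find single-word entries to review against this rule. The following is a minimal Python sketch, not a Language I/O tool: `flag_single_word_terms` and the `exceptions` set are hypothetical names, and you would fill the set yourself with the ambiguous or “do not translate” single words you intend to keep.

```python
# Hypothetical helper (not a Language I/O tool): flag single-word glossary
# terms so a person can decide whether each one really belongs in the glossary.
def flag_single_word_terms(source_terms, exceptions=frozenset()):
    """Return single-word source terms that are not listed as known exceptions.

    `exceptions` is an assumed, user-maintained set of single words that are
    kept on purpose, such as ambiguous terms or "do not translate" names.
    """
    flagged = []
    for term in source_terms:
        cleaned = term.strip()
        # Exactly one whitespace-separated word counts as a single-word term.
        if len(cleaned.split()) == 1 and cleaned not in exceptions:
            flagged.append(cleaned)
    return flagged


if __name__ == "__main__":
    # Made-up sample terms for illustration.
    terms = ["dashboard", "machine translation", "Rosey", "account", "Submit button"]
    print(flag_single_word_terms(terms, exceptions={"dashboard", "Rosey"}))
```

The script only flags candidates; whether a single word such as "SaaS" stays in the glossary is still a judgment call.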
Add whole sentences if needed
You can add whole sentences to your glossary. These could be useful in the following scenarios:
- scripted sentences that could be problematic for Machine Translation
- recurring sentences in your communications that are important for conveying your tone of voice, for example:
  - "Welcome to Language I/O, the future of conversational translation!"
  - "Thanks. I've connected you to one of our specialists who'll look after you from here."
Terms with variations based on context
Any term for which Language I/O must allow variations according to context (Do Not Mutate=False) must be entered in all the forms required in the source language.
For example:
- in English:
  - Enter any key terms that are nouns in both their singular and plural forms.
  - If you need to add a specific verb that requires a very specific translation in your domain, add the most common forms of the verb to the glossary, for example:
- in Finnish, Turkish, or any other highly inflected source language, you must add the most common forms or cases of each term that are likely to appear in your content.
Pay attention to capitalization
Make sure that the source and target terms use the correct capitalization for the language and part of speech. For example, in English, only capitalize the first letter of proper nouns, and only use all capitals where required for acronyms. Terms in other languages are also typically written in lowercase, with the exception of German, which requires initial capitalization for all nouns.
In the Ignore Case column of the glossary template, only use FALSE for proper nouns, acronyms, or any entries that require camelCase or other very specific capitalization.
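You can spot-check the Ignore Case column with a short script before sending the glossary. This Python sketch is hypothetical and not part of the Language I/O template: it assumes each row is a dict with "Source" and "Ignore Case" keys and that Ignore Case holds the text TRUE or FALSE. It flags FALSE entries that contain no capital letters, and TRUE entries whose capitalization looks like an acronym or camelCase. It cannot recognize proper nouns such as "Rosey", so it supplements a manual review rather than replacing it.

```python
import re

# Hypothetical check, not part of the Language I/O template.
# Assumes each row is a dict with "Source" and "Ignore Case" keys,
# where "Ignore Case" holds the text "TRUE" or "FALSE".

# A capital letter directly after another letter or digit suggests an
# acronym (SaaS, UI) or camelCase, which normally needs Ignore Case = FALSE.
INNER_CAPITAL = re.compile(r"\w[A-Z]")


def check_ignore_case(rows):
    """Return warnings for rows whose Ignore Case value and capitalization disagree."""
    warnings = []
    for row in rows:
        term = row["Source"]
        ignore_case = row["Ignore Case"].strip().upper()
        if ignore_case == "FALSE" and term == term.lower():
            # FALSE is meant for proper nouns, acronyms and camelCase,
            # all of which contain at least one capital letter.
            warnings.append(f"{term!r}: Ignore Case is FALSE but the term has no capitals")
        elif ignore_case == "TRUE" and INNER_CAPITAL.search(term):
            warnings.append(f"{term!r}: looks like an acronym or camelCase; consider FALSE")
    return warnings


if __name__ == "__main__":
    # Made-up sample rows for illustration.
    sample = [
        {"Source": "SaaS", "Ignore Case": "TRUE"},
        {"Source": "dashboard", "Ignore Case": "FALSE"},
        {"Source": "Language I/O", "Ignore Case": "FALSE"},
        {"Source": "machine translation", "Ignore Case": "TRUE"},
    ]
    for warning in check_ignore_case(sample):
        print(warning)
```

For the sample rows, the script warns about "SaaS" (marked TRUE) and "dashboard" (marked FALSE) and accepts the other two entries.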
Examples
Incorrect glossary entry:
Correct glossary entries:
Final Checks
After you finish building your glossary and before you send it back to Language I/O, check it against the following criteria:
- Make sure that there are no duplicate entries.
  - In Excel, select the range of cells that may contain duplicate values, then click Data > Remove Duplicates.
- Make sure that there is only one term entry per cell. For example:
- Make sure that there are not two or more target terms that correspond to the same source term.
  Note: You can have two or more different source terms with the same target term.
- Do not add any extra information to the source or target terms. If you need to give the Language I/O team additional information or context to help them process your glossary, add it in the Explanation column.
- Make sure that there are no line breaks within cells. In Excel:
  - Select the area where you want to identify line breaks.
  - Press Ctrl + F to open the Find and Replace dialog, and select the Replace tab.
  - In the "Find what" field, press Ctrl + J to enter a line break; in the "Replace with" field, enter a space. Then click Replace All.
- Run a spell check for each language:
  - Select the range of cells that contain terms in the same language.
  - Click Review > Spelling.
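If you export the glossary to CSV, some of these checks can also be scripted. The Python sketch below is a hypothetical helper, not a Language I/O tool. It assumes a CSV with one language pair and two columns named "Source" and "Target" (adjust these to match the actual template), and it reports duplicate entries, source terms with more than one target term, and line breaks inside cells. It compares terms exactly, so "Dashboard" and "dashboard" count as different source terms.

```python
import csv
import io
from collections import defaultdict


def final_checks(rows):
    """Return a list of problems found in glossary rows.

    Hypothetical helper: assumes each row is a dict with "Source" and "Target"
    keys, for example as produced by csv.DictReader on a CSV export.
    """
    problems = []
    seen_pairs = set()
    targets_by_source = defaultdict(set)

    for row_no, row in enumerate(rows, start=2):  # spreadsheet row 1 is the header
        source = row["Source"]
        target = row["Target"]

        # Line breaks inside a cell (what Ctrl + J finds in Excel).
        for column, value in (("Source", source), ("Target", target)):
            if "\n" in value or "\r" in value:
                problems.append(f"row {row_no}: line break in {column}")

        pair = (source.strip(), target.strip())
        if pair in seen_pairs:
            problems.append(f"row {row_no}: duplicate entry {pair[0]!r}")
        seen_pairs.add(pair)
        targets_by_source[pair[0]].add(pair[1])

    # Several targets for one source term are not allowed;
    # several source terms sharing one target are fine.
    for source, targets in targets_by_source.items():
        if len(targets) > 1:
            problems.append(f"{source!r} has {len(targets)} target terms: {sorted(targets)}")
    return problems


if __name__ == "__main__":
    # Made-up sample export for illustration.
    sample = (
        "Source,Target\n"
        "dashboard,tablero\n"
        "dashboard,tablero\n"
        "dashboard,salpicadero\n"
        '"Submit\nbutton",botón Enviar\n'
    )
    for problem in final_checks(csv.DictReader(io.StringIO(sample))):
        print(problem)
```

For the made-up sample, the script reports the repeated "dashboard" row, the line break in "Submit button", and the two competing targets for "dashboard". The one-term-per-cell rule and spelling still need a manual check.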
Next Steps
You are now ready to share your glossary for Machine Translation with Language I/O.
If you have any questions about how to create your glossary for Machine Translation, contact Language I/O Support by going to languageio.zendesk.com and clicking Submit a request in the top-right corner.