Idiom World Server Alignment Tutorial

Introduction

I am writing this document in the hope that my experience with previous Idiom alignment tasks will help you understand the process a little better.

Feel free to correct me if you see something wrong in the tutorial.

If you have any additional questions regarding Idiom WS, please visit http://www.idiominc.com/resources/documentation/.

First, let’s talk about when asset alignment is needed.

When your company decides to migrate to Idiom, a few things need to be completed before you are ready for production work:

1. Tool Migration

2. TM Migration

3. Workflow Migration

4. Training

Asset alignment is needed as part of TM Migration.

I will briefly talk about all of them, but my main focus in this tutorial will be TM Migration.

1. Tool Migration

The most common reasons Idiom customers adopt the system are its enhanced TM technology, consolidated production environment, extendable SDK, and perhaps the appeal of something new :) . However, while migrating your legacy tools to Idiom, you will probably find that some of them have big advantages over what Idiom offers, or are hard (or awkward) to implement in Idiom.

My suggestion is to build Idiom around your legacy tools if you believe there is a big benefit in keeping them.

Also, once you become familiar with Idiom, you may find an easier way to port your legacy tools to Idiom later on.

Besides customized autoactions, you might need to create the following tools to support your production work:

Linkage Creation Tool

Project Creation Tool

Search Tool

Import & Export WS Objects

2. Translation Memory Migration

There are a lot of things to consider when it comes to TM Migration. Here are some questions you might need to ask:

1. Are the translations in your legacy TMs good, and identical to the translations in your localized files? Have linguistic bug fixes been propagated to the TMs?

2. Do your TMs contain all previous translations?

3. Do your localized files contain any linguistic hacks that your TMs cannot hold? How extensive are they?

4. Do you have matching English and localized files to use for alignment? Are they heavily customized?

5. Do you have linguists who can help resolve alignment issues?

Examine the questions above, and any others that come up, and decide whether alignment is feasible for you.

There are many ways to create ICE TMs from your legacy assets and TMs. Here are three I can think of:

1. Asset Alignment

2. Translation from scratch

3. 3rd party TM Migration

The three above are TM migration methodologies, not strategies. A good TM migration strategy is the first thing to work out before you do anything else. Since there is no silver bullet that makes all of this work at once without running into some problems, you will probably need to adopt some sort of hybrid model.

Before I go into the alignment topic, I would like to mention that once you start your alignment, there is no going back :), so it is extremely important to tune your filters as well as you can before you begin.

Here is one approach from my experience, but please do your own research, since there are many other ways to achieve this:

1. Start Asset Alignment using the Idiom Align autoaction

What’s needed: English Assets, Matching Localized Assets

This will result in two sets of files:

- Perfectly Aligned (Idiom Align status: Perfect)

Even though Idiom returned a ‘Perfect’ status, some of these assets will be wrongly aligned. You can find them by comparing the two sets of files: the Idiom-generated localized files vs. the localized files you used for alignment (Unix diff, an Araxis Merge report, WinDiff, etc. will all work).
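The comparison can be scripted as a first pass before opening a visual diff tool. The following is a minimal Python sketch, assuming the Idiom-generated files and the original localized files sit in two directory trees with identical relative paths; the function name `classify_perfect_assets` is hypothetical. It compares files byte by byte, so differences in encoding or line endings also count as mismatches.

```python
import filecmp
import os
import sys


def classify_perfect_assets(idiom_dir, localized_dir):
    """Split 'Perfect'-status assets into well-aligned and wrongly-aligned.

    Assumes (hypothetically) that idiom_dir holds the Idiom-generated localized
    files and localized_dir the original localized files used for alignment,
    both under the same relative paths.
    """
    well_aligned, wrongly_aligned, missing = [], [], []
    for root, _dirs, files in os.walk(idiom_dir):
        for name in files:
            generated = os.path.join(root, name)
            rel_path = os.path.relpath(generated, idiom_dir)
            original = os.path.join(localized_dir, rel_path)
            if not os.path.exists(original):
                # No counterpart to compare against; needs a manual look.
                missing.append(rel_path)
            elif filecmp.cmp(generated, original, shallow=False):
                # shallow=False compares file contents, not just os.stat() data.
                well_aligned.append(rel_path)
            else:
                wrongly_aligned.append(rel_path)
    return sorted(well_aligned), sorted(wrongly_aligned), sorted(missing)


if __name__ == "__main__" and len(sys.argv) == 3:
    good, bad, missing = classify_perfect_assets(sys.argv[1], sys.argv[2])
    print(f"well-aligned: {len(good)}, wrongly-aligned: {len(bad)}, missing: {len(missing)}")
    for path in bad:
        print("DIFFERS:", path)
```

Run it with the Idiom output directory and the original localized directory as arguments, then open only the files reported as DIFFERS in Araxis Merge or your diff tool of choice.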

  • Well-aligned: the Idiom-aligned files are identical to the corresponding original localized assets.
    • If the ‘Update TM?’ option of your Save autoaction is set to ‘Yes’, the TMs associated with these assets should show all the TM entries populated by the alignment. Nothing more to do.
  • Wrongly-aligned: the Idiom-aligned files are not identical to the corresponding original localized assets.
    • There are two ways to fix this:
      • Remove only the wrongly-aligned segments from the TMs (I found scripting languages like Perl or Python make this easier) and translate them using your legacy TMs (if your linguists do not have the bandwidth to take on this task, send the segments to your vendor for translation).
      • Have your linguists go through the comparison report and fix the segments directly in the TMs (if your linguists have some bandwidth, this is a good way to get them involved in the review process from the beginning).
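For the first fix, removing only the wrongly-aligned segments, a small script over a TMX export of the TM is one option. This is a minimal Python sketch, assuming the Idiom TM has been exported to TMX and will be re-imported afterwards, and that you have collected the English source text of the bad segments from the comparison report; `remove_bad_units` and the `en-US` default are hypothetical choices, not part of Idiom. It uses the standard library's ElementTree, which does not keep a DOCTYPE line or XML comments, so check the output against what your import expects.

```python
import xml.etree.ElementTree as ET

# ElementTree stores xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def segment_text(seg):
    """Flatten a <seg> element (including inline tags like <bpt>/<ept>) to text."""
    return "".join(seg.itertext()) if seg is not None else ""


def remove_bad_units(tmx_in, tmx_out, bad_sources, source_lang="en-US"):
    """Drop every <tu> whose source-language segment is in bad_sources.

    Returns the number of translation units removed.
    """
    tree = ET.parse(tmx_in)
    body = tree.getroot().find("body")
    if body is None:
        raise ValueError(f"{tmx_in}: no <body> element found")
    removed = 0
    # Iterate over a copy so removing elements does not disturb the loop.
    for tu in list(body.findall("tu")):
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG) or tuv.get("lang")  # older TMX used plain "lang"
            if lang and lang.lower() == source_lang.lower():
                if segment_text(tuv.find("seg")) in bad_sources:
                    body.remove(tu)
                    removed += 1
                break  # only the source-language variant is checked
    tree.write(tmx_out, encoding="utf-8", xml_declaration=True)
    return removed
```

For example, `remove_bad_units("fr_FR.tmx", "fr_FR.cleaned.tmx", {"Save"})` writes a copy of the TM without the units whose English segment is exactly "Save", ready to be translated again from the legacy TMs.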

- Partially Aligned (Idiom Align status: Partial)

  • If you have a separate budget assigned to this task, send these files out to a vendor and ask your linguists to review the results.
  • In my experience, it is usually easier and faster to have these segments translated (using your current TMX TMs) and reviewed than to have them aligned (copying and pasting to match the original files, using the comparison report) and reviewed. This depends on how easy it is to fix the misaligned segments. If you choose the latter, “Copy & Paste & Matching”, your linguists or vendor translators will need to go through the comparison report and propagate the matched translations to the Idiom TM to create ICE entries. Although it is time-consuming, the benefit is that your Idiom TMs will exactly match the localized files in your source control system: a perfect alignment scenario.

Note: This step can be skipped if you decide to translate these segments as you run into them during production. If you take care of them up front, there is no additional work later on. Since the purpose of alignment is to create ICE matches for your existing assets, you can either create them all at the beginning or create them as you work on the assets.

In my view, an alignment success rate of 80-90% is pretty good; with well-structured assets, it can go a lot higher.
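To make this rule of thumb concrete, here is one way to compute the rate. It assumes (my interpretation, not an Idiom metric) that only ‘Perfect’ assets that also passed the diff check count as successes; the helper name is hypothetical.

```python
from collections import Counter


def alignment_success_rate(statuses, wrongly_aligned=0):
    """Fraction of assets that aligned cleanly.

    statuses: one Idiom Align status per asset, e.g. "Perfect" or "Partial".
    wrongly_aligned: number of "Perfect" assets that later failed the diff check.
    """
    counts = Counter(statuses)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    if wrongly_aligned > counts["Perfect"]:
        raise ValueError("more wrongly-aligned assets than 'Perfect' ones")
    return (counts["Perfect"] - wrongly_aligned) / total
```

For example, 880 ‘Perfect’ and 120 ‘Partial’ assets, with 30 of the ‘Perfect’ ones wrongly aligned, give (880 - 30) / 1000 = 0.85, i.e. 85%.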

*Alignment Problems:

Idiom Alignment works similarly to Trados WinAlign: it first segments both the source and the target and then matches the segments, so swapped tags cannot be aligned without human intervention or major tweaks to the Alignment AA.

1. Might be easily enhanced in Alignment AA

- removed or additional attributes in a tag

2. Might need significant tweaks in Alignment AA or need to be reviewed

- tag swapping (no additional tags), tag swapping (some additional tags), segment breakers (additional/removed periods, commas, etc.), etc.

3. Can be fixed before alignment

- Validation Errors

4. Localized File Bugs: need to be reviewed

- duplicated tags, empty translations, etc
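Several of these problem types can be spotted automatically by comparing the tags in each source/target segment pair before deciding what to send for review. The following is a rough Python heuristic, assuming segments are available as plain strings with HTML/XML-style inline tags; the function names and category labels are hypothetical, and it does not detect segment-breaker differences.

```python
import re
from collections import Counter

# Matches opening, closing and self-closing markup tags, e.g. <b>, </b>, <br/>.
TAG_RE = re.compile(r"</?[A-Za-z][^<>]*>")
NAME_RE = re.compile(r"</?[A-Za-z][\w:.-]*")


def tag_names(segment):
    """Tags of a segment in order, reduced to '<name' or '</name' (no attributes)."""
    return [NAME_RE.match(tag).group(0) for tag in TAG_RE.findall(segment)]


def classify_pair(source, target):
    """Roughly bucket a source/target segment pair into the problem types above."""
    if not target.strip():
        return "empty translation"
    if TAG_RE.findall(source) == TAG_RE.findall(target):
        return "tags match"
    src_names, tgt_names = tag_names(source), tag_names(target)
    if src_names == tgt_names:
        # Same tags in the same order, so only the attributes differ.
        return "attributes changed"
    src_count, tgt_count = Counter(src_names), Counter(tgt_names)
    if src_count == tgt_count:
        return "tag swapping"
    if any(tgt_count[n] > src_count[n] for n in tgt_count if src_count[n] > 0):
        return "duplicated tags"
    return "added or removed tags"
```

Pairs labelled "attributes changed" fall into category 1, "tag swapping" and "added or removed tags" into category 2, and "duplicated tags" and "empty translation" into category 4.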

2. Test the ICE segments created by the alignment

Push all the source files you used for alignment through Idiom. All assets should be ICE’d, and the regenerated localized files should be identical to the localized files you used for the alignment. The latter may not hold in some cases: linguists may have corrected some segments in the TMs after finding problems in the localized files, so the output no longer matches the original localized files. If #1 and #2 were done correctly, you only need to check that all the assets are ICE’d with the TMs.

3. Convert your 3rd party TMX TMs to Idiom (optional)

The converted TM serves as a secondary Idiom TM for better leveraging.

There are two ways to import 3rd party TMX files to Idiom:

1. Import it directly into Idiom using the Web UI.
2. Use Trados2Idiom utility

java com.idiominc.ws.autoalignment.Trados2Idiom <-guaranteedEntriesOnly> <-showStatsSegments> <-statsFile file> <TRADOS TMX file> <error file> <WorldServer-friendly TMX file>

Note: #2 above offers two more options than #1: suppressing non-guaranteed entries and capturing statistics.
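When there are many TMX files to convert, the Trados2Idiom command can be wrapped in a small batch script. This Python sketch only assembles the arguments shown in the command synopsis; it assumes the WorldServer jars containing `com.idiominc.ws.autoalignment.Trados2Idiom` can be passed via `classpath` (or are already on the CLASSPATH), and the helper names and the `.errors.txt` / `.ws.tmx` file naming are hypothetical.

```python
import subprocess
from pathlib import Path

MAIN_CLASS = "com.idiominc.ws.autoalignment.Trados2Idiom"


def trados2idiom_command(trados_tmx, error_file, output_tmx,
                         guaranteed_only=False, show_stats_segments=False,
                         stats_file=None, classpath=None):
    """Build the Trados2Idiom argument list for one TMX file (flags as in the synopsis)."""
    cmd = ["java"]
    if classpath:
        cmd += ["-cp", classpath]  # assumed location of the WorldServer jars
    cmd.append(MAIN_CLASS)
    if guaranteed_only:
        cmd.append("-guaranteedEntriesOnly")
    if show_stats_segments:
        cmd.append("-showStatsSegments")
    if stats_file:
        cmd += ["-statsFile", str(stats_file)]
    cmd += [str(trados_tmx), str(error_file), str(output_tmx)]
    return cmd


def convert_all(tmx_dir, out_dir, **options):
    """Run the converter on every *.tmx file in tmx_dir (hypothetical batch helper)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for tmx in sorted(Path(tmx_dir).glob("*.tmx")):
        cmd = trados2idiom_command(tmx, out / (tmx.stem + ".errors.txt"),
                                   out / (tmx.stem + ".ws.tmx"), **options)
        subprocess.run(cmd, check=True)  # stop on the first failed conversion
```

If you pass `stats_file` to `convert_all`, every run is given the same file name, so build a per-TMX stats file instead if you need per-file statistics.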

3. Workflow Migration

This is basically mapping what you currently do onto workflow steps.

- Glossarization

- Pulling Drop

- Sending Drop out

- Receiving Drop

- Linguistic Review

- File Integration with Source Control System

- Cost Analysis

- Bug Fixes

- etc.

4. Training

Everyone involved in the localization process should participate in several rounds of demos and sessions to understand the system before they start using it in the production environment.

1. Linguists

  • Translating
  • Linguistic Review
  • Linguistic Bug Fixes - a single or global terminology replacement
  • Working with TMs & TDs
  • Working with Browser & Desktop Workbench
    • Tag Re-ordering, Merge & Split, Communication, Error Checking, Spell Checking, etc

2. Project Managers

  • Planning
  • Managing projects
  • Issue Tracking
  • Managing vendors and translators
  • Cost Models
  • Service Desk

3. Engineers

  • Managing World Servers
  • World Server SDK
  • Managing Tools - Linkage, Project Creation, Asset Search, etc.

4. L10N Requesters from other departments

  • Service Desk

5. Vendors

  • Translation Tools - moving from 3rd party translation tools to Idiom Browser or Desktop Workbench
    • Tag Re-ordering, Merge & Split, Communication, Error Checking, Spell Checking, etc

mukkamu