Tagging existing content in a translation memory
Reference Number: AA-00265 Created: 11-10-2012 12:52 Last Updated: 30-08-2013 10:37

Title:  Tagging existing content in a translation memory

Context: 

You are leveraging older content that comes from other tools, or you have started using the built-in custom regex tagger. You can now tag content in your source files, but the existing translation memory holds the same content in a form that is NOT tagged.

- Your original document contains entries like "%1" or '%1' or just %1

- You tagged those entries using the custom regex tagger, showing only simple tags, like this:



BUT… your translation memory still contains the raw "%1" and similar text:


How can you increase your leverage by creating memoQ tags in the TM?… how can you boldly tag where no one has tagged before?


Howto:

You can edit a TM the same way you do for a document in memoQ:

1. Export your TM as TMX. Go to Project home > Translation memories. Select your TM, then click Export to TMX.

2. Create a project or use an existing project with the same language combination as your TM. Import the TMX file. memoQ supports TMX file import in projects. You can edit the imported TMX file as any other document you imported into a memoQ project.

You can then run the custom regex tagger on the imported TMX. You can also edit the TMX file freely in a regex-enabled text editor such as Notepad++ (http://notepad-plus-plus.org/).

If you want to deal with tags outside memoQ, you can do the following:

1 - Finding out how memoQ marks tags in its TMs

• Create a simple document, with one sentence containing all the cases you want to cover, or if you prefer, one sentence per case to cover.

• Import it into memoQ and use the custom regex tagger to mark your tags the way you want

• Confirm the segment(s) to a freshly created TM

• Export that TM as TMX and open the exported file in Notepad++. It should show something like this:


• There, you can see how memoQ marks its custom tags:

They are <ph> tags, with displaytext and val as attributes. displaytext is the text displayed in the translation UI and val is the actual text which is tagged (and will be exported).


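The mq:rxt tag is stored as text inside the <ph> element, and TMX is XML, so its angle brackets, quotes, ampersands and similar characters must be escaped. That's why you see &quot; for quotes ("), &lt; for <, etc. A quote that is part of the tagged text itself is escaped twice, which is why it appears as &amp;quot; inside val.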

<ph>&lt;mq:rxt displaytext=&quot; <your_tag_text> &quot; val=&quot; <actual_text_tagged> &quot; /&gt;</ph>
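The double escaping is easy to get wrong by hand, so here is a minimal Python sketch that builds such a tag from a display text and a tagged value. The function name `make_rxt_tag` is made up for this illustration, and the output format is assumed to match the template above; compare it with your own exported TMX before relying on it.

```python
from xml.sax.saxutils import escape

# Quote characters are escaped in addition to &, < and > (handled by escape itself).
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def make_rxt_tag(display_text: str, value: str) -> str:
    """Return a <ph> element wrapping an mq:rxt tag, as raw TMX text.

    Hypothetical helper: the layout follows the template shown in this article.
    """
    # First pass: escape the two attribute values of the inner mq:rxt tag.
    inner = '<mq:rxt displaytext="{}" val="{}" />'.format(
        escape(display_text, QUOTE_ENTITIES),
        escape(value, QUOTE_ENTITIES),
    )
    # Second pass: the inner tag is plain text inside <ph>, so escape it again.
    return "<ph>{}</ph>".format(escape(inner, QUOTE_ENTITIES))


if __name__ == "__main__":
    print(make_rxt_tag("1", '"%1"'))
```

For the value "%1" (with double quotes) and the display text 1, this prints `<ph>&lt;mq:rxt displaytext=&quot;1&quot; val=&quot;&amp;quot;%1&amp;quot;&quot; /&gt;</ph>`.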


2 - Search text and replace it with "tagged version" using regular expressions

• In our example, we want to replace "%1", '%1' and %1, in that specific order (to avoid confusion)


Search          Replace with…
"%(\d)"         <ph>&lt;mq:rxt displaytext=&quot;(plq\1)&quot; val=&quot;&amp;quot;%(plq\1)&amp;quot;&quot; /&gt;</ph>
'%(\d)'         <ph>&lt;mq:rxt displaytext=&quot;(pla\1)&quot; val=&quot;&amp;apos;%(pla\1)&amp;apos;&quot; /&gt;</ph>
%(\d)           <ph>&lt;mq:rxt displaytext=&quot;(\1)&quot; val=&quot;%(\1)&quot; /&gt;</ph>
pl[aq](\d)      \1







Here, the last entry removes the temporary "pla" and "plq" markers. They stand in front of the digit so that the third entry cannot match the %1 inside an already tagged "%1" or '%1'.

We have to use this trick because Notepad++ (at least in the version used here) does not support lookbehind operators.
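If you prefer a script to a text editor, the same replacements can be sketched in Python. This is an alternative to the Notepad++ procedure, not the method described above: it uses a single regular expression whose alternatives are tried in order, so no temporary markers are needed. The names `tag_placeholders` and `_to_tag` are made up for this sketch, and it assumes that quotes in the raw TMX may appear either literally or as &quot; / &apos; entities.

```python
import re
from xml.sax.saxutils import escape, unescape

TO_ENTITY = {'"': "&quot;", "'": "&apos;"}
FROM_ENTITY = {"&quot;": '"', "&apos;": "'"}

PLACEHOLDER = re.compile(
    r"(?P<ph><ph\b[^>]*>.*?</ph>)"                  # an existing <ph> element: skipped
    r"|(?P<q>\"|'|&quot;|&apos;)%(?P<n1>\d)(?P=q)"  # "%1" or '%1', literal or entity quotes
    r"|%(?P<n2>\d)",                                # bare %1
    re.DOTALL,
)


def _to_tag(match):
    """Turn one match into a <ph> tag, leaving existing <ph> elements untouched."""
    if match.group("ph"):
        return match.group(0)
    # The actual text being tagged, with entity quotes turned back into characters.
    actual = unescape(match.group(0), FROM_ENTITY)
    digit = match.group("n1") or match.group("n2")
    # Escape once for the attribute value, then once more for the <ph> content.
    inner = '<mq:rxt displaytext="{}" val="{}" />'.format(
        digit, escape(actual, TO_ENTITY)
    )
    return "<ph>{}</ph>".format(escape(inner, TO_ENTITY))


def tag_placeholders(tmx_text):
    """Apply the tagging to the whole TMX text, like a replace-all in an editor."""
    return PLACEHOLDER.sub(_to_tag, tmx_text)


if __name__ == "__main__":
    print(tag_placeholders("<seg>Open \"%1\" or '%2' or %3</seg>"))
```

Because existing <ph> elements are matched first and returned unchanged, running the function a second time on its own output changes nothing. Like the table above, it only handles single-digit placeholders. Work on a copy of the exported file.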

• Now that all entries have been replaced, create a new TM and import your TMX file into it. The content will then be properly "tagged" in the matches.
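Before importing the edited file, it may be worth checking that it is still well-formed XML, since a single unescaped quote or ampersand in a replacement can break it. The sketch below uses Python's standard xml.etree.ElementTree module to parse the TMX text and list the decoded content of every <ph> element found inside a <seg>. The function name `list_ph_tags` and the sample TMX are made up for illustration; a real memoQ export contains more elements and attributes.

```python
import xml.etree.ElementTree as ET


def list_ph_tags(tmx_text):
    """Parse TMX text and return the decoded text of each <ph> inside a <seg>.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed XML.
    """
    root = ET.fromstring(tmx_text)
    return [ph.text or "" for seg in root.iter("seg") for ph in seg.iter("ph")]


if __name__ == "__main__":
    # Hypothetical, heavily shortened TMX used only to exercise the function.
    sample = (
        '<tmx version="1.4"><header/><body><tu><tuv xml:lang="en"><seg>Open '
        "<ph>&lt;mq:rxt displaytext=&quot;1&quot; val=&quot;%1&quot; /&gt;</ph>"
        " now</seg></tuv></tu></body></tmx>"
    )
    for tag in list_ph_tags(sample):
        print(tag)
```

Each printed line is the inner tag as memoQ would read it after XML decoding, which makes it easy to confirm that displaytext and val hold what you expect.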




Comments (2)
  • [Denis Hay]: Context should also be tagged 08-07-2014 10:52

    Hi Cherif,

    If you're tagging everything, it makes sense that the previous/following strings used for context are also tagged, as they would be in your document!

  • [Cherif]: Tagging-existing-content-in-a-translation-memory 14-11-2012 12:25

    Hi,

    Thank you for the tutorial, this was really helpful!

    Although there is a problem for TM using context.
    memoQ refuses to upload my tagged .tmx because using your method also tags the "x-context-pre" and "x-context-post" entries and I believe those should remain plain text.

    Do you have an idea to replace the wanted entries with tagged entries without modifying the in context ones ?
    I guess I need to isolate those in context strings somehow, I am not sure how to proceed.
