The first question you might ask when talking about templates in Across is: why should I invest my precious time in creating something that already exists? While it is true that Across comes with templates to segment HTML and XML, you should always keep in mind that those templates are for standard use and thus pretty generic. I, for instance, created my own for a knowledge system that uses basic HTML only. This means there will be no scripting and nearly no attributes that have to be translated. So for the sake of code safety it is much easier to simply hide everything the translators will never need. It makes their work easier and it lets me sleep better.

The following text refers to HTML, but it works exactly the same way for XML.

So let us have a look into that special case of restricted HTML, because it shows which steps need to be taken to create a good template. First of all you need to determine which tags will be used in your documents. In my case that is:

  • br, hr
  • div, p, table, tbody, tr, td
  • span, a

As you can see, all text formatting is done via the span tag, which greatly reduces the number of tags we need. Now we can proceed to determining the attributes we will need to translate. Attributes are added per tag, so in my case that is:

  • a: href, name

Everything else has attributes which should not be touched. Span, for instance, is used to assign CSS styles. One could debate whether it is a good idea to translate href and name (for document-internal links), but an author who doesn’t speak my language will find it incredibly hard to work with German anchor names.
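If you are not sure which tags and attributes your documents really contain, a small script can take the inventory for you. The following is only a minimal sketch using Python's standard html.parser module; it has nothing to do with Across itself, and the sample document and the names TagInventory and inventory are made up for illustration.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser


class TagInventory(HTMLParser):
    """Count every tag and record which attributes appear on it."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()
        self.attributes = defaultdict(set)

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lower case.
        self.tags[tag] += 1
        for name, _value in attrs:
            self.attributes[tag].add(name)


def inventory(html_text):
    """Return (tag counts, attribute names per tag) for one document."""
    parser = TagInventory()
    parser.feed(html_text)
    parser.close()
    per_tag = {tag: sorted(names) for tag, names in parser.attributes.items()}
    return parser.tags, per_tag


# Hypothetical sample document, invented for this example.
SAMPLE = (
    '<div><p>See <a href="#anker" name="start">the '
    '<span class="b">intro</span></a>.<br></p><hr></div>'
)

tags, attributes = inventory(SAMPLE)
print(sorted(tags))   # every tag name that occurs in the sample
print(attributes)     # attribute names found, grouped by tag
```

Run over a representative set of documents, the two lists tell you which tags the template has to know and which attributes you have to decide about.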

Now to the practice part. In the menu bar, go to Tools > System settings and in the tree Document settings to the left, select Tagged HTML or Tagged XML, depending on what you need. There click New to create a new filter template.

Let’s start by adding a new HTML tag by clicking on Add. The name you see there is the tag, so let’s start with br. Content type will be empty, because there is nothing in between br tags. It’s the same for hr, by the way. Element type is set to external, because we won’t find br (or hr) in the middle of a sentence. This might of course be different in your setting, but external means that the tag will not be found inside a sentence. The consequence is that an external tag provokes a new segment.

We will keep the external setting at unconditional, but we will change it from normal to hidden. Hidden means that the tag will not appear for the translator, but it will be saved and later exported to the translated document. So why can we hide those tags? The reason is pretty simple. The position of these tags is defined by the end of the previous segment and the beginning of the next segment, and there is nothing to translate in there. This reduces the chance that someone meddles with the tags and causes damage.

Next we will move to div, p, etc. Those are external tags as well, because they all divide the document. They are also unconditional, but they are normal and not hidden. If you hide these tags, then Across will think that everything between <p> and </p> needs to be hidden, which is clearly not what we intend. After all the text between those tags needs to be translated. Will your text be full of useless tags? The answer is no. By making this tag external we told Across that this tag will not be found inside a sentence, which means it’s not relevant for translation. Therefore Across will automatically hide it from the translator.

The last set of tags is span and a. Both may appear inside a sentence, which is why they need to be set to internal and normal. Setting a tag to internal will make it visible in crossDesk, but the translator still cannot change the tag itself. All they can do is move it around, which is useful because links and certain formats can appear at different places in a translation. Small hint: you can add the tag to the target editor by double-clicking on it in the segment window.
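To make the difference between external and internal tags more tangible, here is a toy model in Python. It is only a sketch of the idea as described above (external tags start a new segment, internal tags stay inside the segment as movable placeholders) and not how Across actually works internally. The TEMPLATE dictionary, the Segmenter class and the sample text are all hypothetical, and treating unknown tags as external is an assumption of this sketch.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the template: tag -> (element type, visibility).
TEMPLATE = {
    "br": ("external", "hidden"),
    "hr": ("external", "hidden"),
    "div": ("external", "normal"),
    "p": ("external", "normal"),
    "table": ("external", "normal"),
    "tbody": ("external", "normal"),
    "tr": ("external", "normal"),
    "td": ("external", "normal"),
    "span": ("internal", "normal"),
    "a": ("internal", "normal"),
}


class Segmenter(HTMLParser):
    """Split text into segments at external tags, keep internal tags inline."""

    def __init__(self):
        super().__init__()
        self.segments = []
        self._current = []

    def flush(self):
        # Close the running segment, skipping empty ones.
        text = "".join(self._current).strip()
        if text:
            self.segments.append(text)
        self._current = []

    def _handle_tag(self, tag, placeholder):
        # Assumption of this sketch: unknown tags behave like external ones.
        element_type, _visibility = TEMPLATE.get(tag, ("external", "normal"))
        if element_type == "internal":
            self._current.append(placeholder)  # stays visible in the segment
        else:
            self.flush()  # external tag: segment boundary

    def handle_starttag(self, tag, attrs):
        self._handle_tag(tag, f"<{tag}>")

    def handle_endtag(self, tag):
        self._handle_tag(tag, f"</{tag}>")

    def handle_data(self, data):
        self._current.append(data)


def segment(html_text):
    parser = Segmenter()
    parser.feed(html_text)
    parser.close()
    parser.flush()
    return parser.segments


# Hypothetical sample, invented for this example.
SAMPLE = (
    '<div><p>Read the <a href="#anker">manual</a> first.<br>'
    'Then <span class="b">start</span>.</p><p>Done.</p></div>'
)

for number, text in enumerate(segment(SAMPLE), start=1):
    print(number, text)
```

In this model the br in the middle of the first paragraph splits it into two segments, while a and span remain inside their segments as placeholders the translator could move but not edit.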

Now to the next part, attributes. Attributes are parts of code found inside tags, but not all need to be translated. If you don’t need an attribute translated, just don’t add it and Across will handle it by itself. In my case the href and the name attributes will need to be translated for the reasons mentioned earlier. All we need to do for that is select the a tag in the list and click Edit. Then we click the Attribute tab and Add. Now enter the attribute name and click Save. You will now be able to select the attribute in the list and choose Translatable. This means that the attribute value can be translated, so if I have an anchor with name="Anker", Across will show that Anker can be translated.
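If you want to check beforehand which attribute values would end up in front of the translator, you can list them with a few lines of Python. Again, this is just a sketch built on the standard html.parser module; the TRANSLATABLE mapping mirrors my own choice of href and name on the a tag, and the class name and sample text are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical mirror of the template: tag -> attributes marked as translatable.
TRANSLATABLE = {"a": {"href", "name"}}


class AttributeCollector(HTMLParser):
    """Collect the values of all attributes marked as translatable."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        wanted = TRANSLATABLE.get(tag, set())
        for name, value in attrs:
            # Attributes without a value (value is None) are skipped.
            if name in wanted and value:
                self.found.append((tag, name, value))


def translatable_attributes(html_text):
    parser = AttributeCollector()
    parser.feed(html_text)
    parser.close()
    return parser.found


# Hypothetical sample, invented for this example.
SAMPLE = (
    '<p><a name="Anker"></a>Siehe <a href="#Anker" class="link">oben</a>.</p>'
    '<span class="fett">Text</span>'
)

for tag, name, value in translatable_attributes(SAMPLE):
    print(f"{tag} {name} = {value}")
```

Everything that is not listed, such as class on a or on span, would stay untouched. Keep in mind that a translated anchor name and the href pointing to it have to be changed consistently, otherwise the internal link breaks.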

The last settings can be found by clicking on Configure. In Splitting properties I activated splitting of paragraphs into sentences, because most of the time the number of sentences will be equal in source and target text. Of course there will be exceptions, but it’s always possible to merge segments. Removing white spaces is a bit more tricky, though. When I tried that with my test documents, certain tags were not treated correctly, so not all of the text was shown. If I ever find out why, I’ll update this document.

In Advanced Settings you will find a lot of, as the name suggests, advanced options. Script translation is not important in my case, and charset and encoding should be set to never change: the target documents will be used in the same knowledge base as the source documents, so nothing needs to be changed. The standard element type is not important either, because only a very restricted set of HTML will be used. Convert character entities is completely deactivated in my case, because the document encoding supports every special character used in the documents. What I did activate is Normalize white spaces because, contrary to removing white spaces in the Splitting properties, this doesn’t seem to affect tags.
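I don't know what exactly Across does when it normalizes white spaces. My assumption is that runs of spaces, tabs and line breaks in the text are collapsed into a single space while the tags themselves are left alone. Under that assumption the idea can be sketched like this; the function name and the sample are hypothetical, and the simple pattern would fail on attribute values that contain a ">" character.

```python
import re

# Naive tag matcher: everything from "<" to the next ">".
TAG_PATTERN = re.compile(r"(<[^>]+>)")


def normalize_whitespace(html_text):
    """Collapse whitespace runs in text, leave tags exactly as they are."""
    normalized = []
    # Splitting on a capturing group keeps the tags in the result list.
    for part in TAG_PATTERN.split(html_text):
        if TAG_PATTERN.fullmatch(part):
            normalized.append(part)  # tag: untouched
        else:
            normalized.append(re.sub(r"\s+", " ", part))  # text: collapsed
    return "".join(normalized)


# Hypothetical sample, invented for this example.
SAMPLE = '<p>Hello   \n  world <a  href="#x">link</a></p>'

print(normalize_whitespace(SAMPLE))
```

The double space inside the a tag survives, only the text between the tags is cleaned up.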

So that’s it. We created a custom template that helps us filter custom HTML or XML documents. It makes translating easier because you don’t see every part of the code and it makes the code safer because things that should not be changed are simply locked.

Questions? Like? Hate? Feel free to comment and I will try to answer as soon as possible or edit my text.