Wednesday, May 26, 2010

Are There Any Standards in Translation?

One of the things that struck me at the aQuatic conference I attended recently was how empowering functioning standards can be. In a presentation on TCS2 and globalization, MK Gupta of Adobe provided some clear examples. This may not be a big deal in the content management world, but it certainly is in professional translation workflows.

What a huge improvement over the sorry mess that we call standards in localization e.g. TMX, TTX, TBX etc… I loved the fact that I can edit a document downstream with an application that did not create the original data and send it on to others who can continue the editing in other preferred applications. I think this is a big deal. I think this is the future, as data flows more freely in and out of organizations.

I do not know very much about translation industry standards except that they do not work very well, and I invite anybody who reads this to come forward and comment, or even write a guest post, to explain to me and others who are interested what does work. I was involved with the logical file system standard ISO 9660 early on in my career, so I know what a real working standard looks and feels like. This standard allows CDs to be read across hundreds of millions of devices, across multiple versions of PC, Mac, Unix and mainframe operating systems. Data recorded on a DOS PC in 1990 can be read today on a Mac or Windows 7 machine without problem. (Though if you saved WordPerfect files you may still have a problem.) The important factor is that your data is safe if it can be read today.

The value of standards is very clear in the physical world: electric power plugs, shipping containers, tires, CD and DVD discs etc… Life would indeed be messy if these things were not standardized. Even in communications we have standards that enable us to easily communicate: GSM, TCP/IP, HTTP, SMTP and the whole set in the OSI layers. Even regular people know and care about some of these. These standards make many things possible: exchange, interoperability, integration into larger business processes, evolving designs and architecture. In the software world it gets murkier: standards are often de facto (RTF, SQL?, PDF, DOC?, DOCX?) or just really hard to define. In software it is easier to stray, so MP3 becomes WMA and AIFF, and there is always a reason to move away from the original standard, usually involving words like "better" and "improved". The result: you cannot easily move your music collection from iPod to Zune or vice versa, or to a new, better technology, without some pain. You are stuck with data silos or a significant data conversion task.

The closest we have to a standard in translation is TMX 1.4 (not the others) and, with all due respect to the good folks at LISA, it is a pretty lame “standard”, mostly because it is not standard, and mostly because some vendors choose to break away from the LISA specification. It does sort of work but is far from transparent. SDL has its own variant and so do others, and data interchange and exchange is difficult without some kind of normalization and conversion effort, even amongst SDL products!!! And data exchange among tools usually means at least some loss in data value. Translation tools often trap your data in a silo because the vendors WANT to lock you in and make it painful for you to leave. (Yes Mark, I mean you.) To be fair, this is the strategy that IBM, Microsoft and especially Apple follow too. (Though I have always felt that SDL is more akin to DEC.) Remember that a specification is not a standard: it has to actually be USED as a matter of course by many to really be a standard.
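
As a minimal sketch of the kind of normalization effort this involves, the Python snippet below reads a small TMX-style sample and extracts source/target pairs. Everything in it is hypothetical and invented for illustration: the sample document, the tool name in the header, and the `read_pairs` helper. It assumes just two variations between exporters (inconsistent casing of language codes, and a plain `lang` attribute used in place of `xml:lang`) and does not describe any particular vendor's output.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample: two translation units written in slightly
# different "dialects" (xml:lang vs. lang, mixed-case language codes).
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="hypothetical-tool" creationtoolversion="1.0"
          segtype="sentence" o-tmf="none" adminlang="en-US"
          srclang="en-US" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="EN-US"><seg>Hello world</seg></tuv>
      <tuv xml:lang="fr-fr"><seg>Bonjour le monde</seg></tuv>
    </tu>
    <tu>
      <tuv lang="en-us"><seg>Save <bpt i="1">&lt;b&gt;</bpt>now<ept i="1">&lt;/b&gt;</ept></seg></tuv>
      <tuv lang="FR-FR"><seg>Enregistrer</seg></tuv>
    </tu>
  </body>
</tmx>"""


def plain_text(seg):
    """Return the segment text with inline code elements dropped."""
    parts = [seg.text or ""]
    for child in seg:
        # Skip the inline element itself, keep the text that follows it.
        parts.append(child.tail or "")
    return "".join(parts)


def read_pairs(tmx_text, src="en-us", tgt="fr-fr"):
    """Extract (source, target) pairs, normalizing language codes to lowercase."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        by_lang = {}
        for tuv in tu.findall("tuv"):
            # Accept either attribute spelling (an assumption of this sketch).
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                by_lang[lang] = plain_text(seg)
        if src in by_lang and tgt in by_lang:
            pairs.append((by_lang[src], by_lang[tgt]))
    return pairs


for source, target in read_pairs(SAMPLE_TMX):
    print(source, "=>", target)
```

Dropping the inline codes keeps the example short, but it is also a small instance of the loss in data value mentioned above: a real converter would have to map those codes rather than discard them.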

In a world with ever increasing amounts of data, the data is more important than the application that created it.

For most people it is becoming more and more about the data.  That is where the long-term value is. As tools evolve I want to be able to take my data to new and better applications easily. I want my data to be in a state where it does not matter if I change my application tool, and all related in-line applications can easily access my data and further process it as needed. I want to be able to link my data up, down, backwards and forward in the business process chain I live in, and I want to be able to do this without asking the vendor(s). I care about my data, not the vendor or the application I am using. If better tools appear, I want to be able to leave with my data, intact and portable.

So what would that look like in the translation world? If a real standard existed for translation data I would be able to move my data from Authoring and IQ systems to CMS to TM to TMS to DTP or MT or Web sites and back with relative ease. And the people in the chain would be able to use whatever tool they preferred without issue. (Wouldn’t that be nice?) It could mean that translators could use whatever single TM tool they preferred for every job they did. The long-term leverage possible from this could be huge in terms of productivity improvements, potential new applications and making translation ubiquitous. The graphic below is my mental picture of it. (Who knows if it really makes sense?)

None of the “standards” in the picture today would be able to do this, and perhaps real standards will come from the CMS world or elsewhere where standards are more critical. @Localization pointed out a good article on translation related standards at Sun. I think a strong and generic XML foundation (DITA compliant according to an IBM expert I talked to) will be at the heart of a “meaningful” standard. Ultan (aka @localization) has an interesting blog entry on DITA that warns about believing the (over) promises. I keep hearing that XLIFF and possibly OAXAL could lead us to the promised land, but of course it requires investment. To work, any of these needs commitment and collaboration from multiple parties, and this is where the industry falls short. We need a discussion focused on the data and keeping it safe and clean, not the tools. Let them add value within the tool, but they should always hand over a standard format so other apps can use it. Again, Ultan, who knows much more about this issue than I do, says: "We need to move from bringing data to people to bringing people to data. Forget XML as a transport. Use it as structure...:)"

Meanwhile, others are figuring out what XML based standards can do. XBRL is set to become the standard way of recording, storing and transmitting business financial information.  It is capable of use throughout the world, whatever the language of the country concerned, for a wide variety of business purposes.  It will deliver major cost savings and gains in efficiency, improving processes in companies, governments and other organizations.  Check out this link to see how powerful and revolutionary this already is and will continue to be.

As we move to more dynamic content and into intelligent data applications in the “semantic web” of the future, standards are really going to matter, as continuous data interchange between key applications from content creation to SMT and Websites will be necessary. I for one hope that the old vanguard (yes it starts with an S) does not lead us into yet another rabbit hole with no light in sight. You can vote by insisting on standards-based data preservation and making it a big deal for any product you buy and use. I hope you do.

I would love to hear from any others who have an opinion on this, as you may have gathered I am really fuzzy on how to proceed on the standards issue. (My instincts tell me the two that matter the most are generic and standard XML and XLIFF, but what do I know?). Please enlighten me (us).


  1. Thank you for a great article.

    I completely agree with you that standards are very important for efficiency.

    My feedback on your article is: forget XML, TTX, TMX etc. Forget file formats and their applications. The next (great) step is to have everything online. As you mention, applications become less and less important. I would even say that they disappear! You simply get access to "applications" and content and translation jobs in browsers and then you work in them. You see online TEnT tools appear everywhere - we even made our own such tool at LanguageWire: AGITO Translate. So now our translators do not need to have anything besides a browser to work for us. And we even made an online tool to work with InDesign documents, so you do not have to work with indd/inx files anymore (inspired by Google Docs). You simply log in and modify InDesign files with only a browser.
    My point is: you need to take the thoughts in your article a step further: it is only about content - applications will disappear. And when content, workflows and tools are presented in a browser and not as different file formats for different applications - it will all be standardized and simplified. The revolution has begun: all the technical challenges we have today will move away from the translator's desktop - and be handled by real technicians, and then the translators will be asked only to do what they do best: translate...

  2. Don't they say that "Standards are so good, that's why everyone wants to have their own?"

    I am not an expert in the topic, I am just another victim. SDL used to promote interoperability and wanted strict enforcement of TMX when they had SDLX and Trados owned the market. Once SDL became the market-maker, that requirement became irrelevant.

    I agree with you that the content is more important and more relevant than the container. Or - as you say - data is more important than the application.

    Because of the progress of collaboration and cloud computing, I think that translation technologies are approaching - maybe they are smack in the middle - of the saturation and decline stage of product lifecycles.

    The characteristics of that stage are:

    1. costs become counter-optimal
    2. sales volumes decline or stabilize
    3. prices and profitability diminish
    4. profit becomes more a challenge of production/distribution efficiency than increased sales

    As I have been saying for the last 18 months, I believe that by 2015 translation memories as we know them will become free or irrelevant. We will just do things differently.

  3. There is a lot of truth in this article: translation standards can use a lot of improvement. There are also a few exaggerations: The current standards work well in many cases.

    But I understand, the point here is to underline the fact that the situation could be much better. And I agree.

    Maybe the first step toward such a goal could be participation. I may be wrong but I don't think Asia-Online is active in any standard committee currently. I'm sure your experience and insight in creating 'standards that work' would be helpful there.

  4. @Thor Thanks for your comment. I agree that a revolution is underway and big changes are coming. I do think that for data to flow easily in the cloud through different value-adding processes you do need to establish a useful standard. It is just like HTTP and HTML, which enable lots of web content to be put up: even there many people do not follow the standards, but there is enough standards structure in place that millions of sites can be added every day.

    @Renato I think the old approach may be reaching the maturity stage but we have barely started on the new connected and collaborative stage. Collaboration between people and business processes will require better data connectivity and interchange to really have translation flowing everywhere.

    @Anonymous I would like to clarify that this is a PERSONAL blog and opinion and is not necessarily aligned or representative of Asia Online's views on this matter.

    Asia Online is involved with LISA and provides feedback to the team there, thus my comment about the good folks there. And I am sure AO would offer suggestions if asked, but I am not sure that they have the expertise to create a comprehensive standard that works, as they could only provide perspective on the MT to TM/TMS connectivity.

    Again, my views here are those of an outside observer (not Asia Online). I wrote this to raise awareness about the question and the possibility of developing a real translation segment standard. I clearly do not have a solution but hope that others might pick up on the question and that we could build a collaborative dialog. I believe that we (professional translation industry) would all benefit from such an initiative.

  5. > ... And I am sure AO would offer suggestions
    > if asked, but I am not sure that they have
    > the expertise to create a comprehensive
    > standard that works, as they could only
    > provide perspective on the MT to TM/TMS
    > connectivity.

    I think the statement above illustrates one of the main reasons why the current translation standards are not what they could be. If a professional or an organization feels standards are important for their work they need to be *pro-active*, not to wait to "be asked" to give an opinion. Standards at LISA or OASIS are done by volunteers from various companies, not by "LISA" or "OASIS": those are just the facilitators of the work.
    Also no-one has an all-encompassing knowledge to provide a comprehensive solution. But that should not be a reason to avoid bringing the specific and precious expertise they have.

    > ...others might pick up on the question and
    > that we could build a collaborative dialog.
    > I believe that we (professional translation
    > industry) would all benefit from such an
    > initiative.

    One of the problems of developing standards is that there are often too many initiatives because many people may be reluctant to join existing efforts and prefer to start from scratch. There are existing dialogs going on: maybe joining them would be more useful in the long term than starting a new one.

    For example, you mentioned the need for a "real translation segment standard". Currently work is being done for XLIFF 2.0, and it includes a possible sub-set for marking up codes inside segments; in other words a way to represent a "segment". Maybe that is what you have in mind. Looking at the wiki here it seems such sub-set could be also used by TMX 2.0. So, there are things happening, and I'm sure just about any professional from the translation industry could be useful in that effort.

    Sorry if I sound a bit harsh :) It's actually nice to see that someone like you, Kirti, is aware of the problem and takes the time to discuss it. But to me this is also old news. Creating standards requires people, pro-active people, who volunteer their time, who are evangelists inside their companies. There is a small set of groups/people already involved: joining them may be a good way to make things progress. In short: the talk is fine, but at some point one has to walk the talk.

  6. I am heartened to hear these questions posed. Not only are standards a way of establishing value, but they are an excellent way to respond to clients about quality concerns. Thank you for that contribution. I would be interested in hearing more about linguistic review.

    Posted by Miria Vargas

  7. Great post, as usual now. We're getting accustomed.
    I don't believe in the semantic Web. I think it will erase the linguistic approach to search engines that allows the Web to be considered as an immense corpus, but this is almost off-topic. Anyway, the rush to the semantic Web will hopefully and eventually lead to some standardization in data representation and in content creation.
    So far, translation standards have worked well because of the industry's fragmentation. The example of SDL that Renato brought in is an indirect demonstration. Think of SGML: the strength of a giant (specifically IBM with GML) was necessary to have it made an ISO standard. Without SGML we would not have HTML and the Web, and probably could not have XML.
    So, does any superior, independent entity exist in the translation industry to enforce any standard whatsoever? Is LISA actually independent? Does it have (kept) the necessary authoritativeness and strength? Will any from the myriad industry associations ever have them?
    I strongly believe that once again standards will come from outside the industry, to face with the inconsistencies, weakness, and unsubstantiality of even the major players.

  8. Hi Kirti,

    It is all about the data...just like cars are about people and not petrol...fuel is the standard and people are the content. We need good fuel to get the people around. Substandard fuels will damage the engine...

    I found the article intriguing, if slightly aggressive, with perhaps misplaced aggression towards the standards bodies (have I misinterpreted?). It has been stated ad nauseam that unlike the CD industry, the translation industry is far more fragmented...if the CD companies did not make their CD players compatible then no one would buy them....

    There needs to be a large gravitational lensing force acting upon the many players in the industry.

    Regarding Cloud based translation and standards... I know that the company I spend my days working in has built exactly this using the LISA standards. They may not be pristine standards but they are a big step in the right direction and better than us developing our own.

    Re LISA, it would help if there were more funds available to the standards committees, this way they could maybe get paid for their work and spend more time on it.

    In conclusion, I agree with your points and am looking forward to the next few years. There are many new and exciting developments. I know that the XTM development team and others are pushing the envelope in cloud based translation tools and 2010 is set to be a big year.

    Every Cloud has a 011100110110100101101100011101100110010101110010 lining.

    All the best,

  9. I agree with Kirti on the woeful set of standards in place in the GILT industry today. Sven Andra made the same point at a recent IMUG Meeting. Clearly it is time to revisit the set of standards we have in place in the industry. The problem is that there is no one place to go to for standards. LISA? OASIS? W3C? I think it is time to form an ad hoc committee of all the interested parties to decide (a) what standards need to be created or revised and (b) what organization will sponsor the new round of standards.

    Posted by Merle Tenney

  10. @Merle: Why not discuss all this with the existing groups?

    Maybe a good place for expressing those ideas could be the upcoming XLIFF symposium in Limerick in September. I'm sure telecommuting there should be possible if one cannot go in person.

    It would be a little strange to start yet another group to solve, among other things, the problem of having fragmented standards.

  11. Hi Kirti,

    Please note that OAXAL (Open Architecture for XML Authoring and Localization), covering TMX, XLIFF, SRX, GMX-V, xml:tm, W3C ITS as well as Unicode and Unicode TR#29, does actually provide the comprehensive solution that you seek. OAXAL is an OASIS TC recommendation. OAXAL fully supports the online architecture that you seek. I have personally implemented it a number of times, both tightly integrated with a CMS (EMC's Docato, IxiaSoft's TextML and Alfresco) and as a comprehensive LSP workflow system: XTM from XML-INTL.

    I have found that working with existing Open Standards is very effective and that they do provide the required structures, even though as you point out TMX has limitations. Even with these limitations it still provides an invaluable service. LISA OSCAR is actually working currently on TMX 2.0 which will address most of the previous limitations and will synchronize inline element markup with XLIFF 2.0.

  12. @Elliot It certainly was not my intention to say that the existing efforts to develop standards are invalid. I assure you there is no aggression towards the standards bodies. Just an observation that they seem to be stuck and not making progress.

    I don't really have answers, but I feel this question is very important in helping to build real scalability for the whole translation process in a much more dynamic content flow.

    @Merle I think you have a point, and it would probably make most sense to see if the existing efforts could develop a collaboration model that not only unites the various initiatives that are already in place, but also connects them to initiatives in the content creation world (and content consumption web standards). With all due respect, SDL is no IBM and we would all be better off getting an open, community based initiative going. Possibly this could be led by people from the CMS world?

    @Anonymous The reason I wrote this was, I feel that we all (enterprise users, vendors, tools developers, translators) could benefit from a real (i.e. widely used, almost mandatory structural element) standard like XLIFF 2.0 (I think?). Many of us just want to be able to pass data to and fro with ease, nothing more. Save our data and take it to new tools like IQ, SMT and more open TM products. The standards initiatives seem to have some idea of the talk but why are so few walking the walk?

    What will it take to get lots of people walking?

    Does it make sense for the existing translation bodies to communicate and establish a common objective of universal interchange and storage?

    As @Luigi says, if we don't do this somebody else will, because the forces that need and demand real easy data flow/interchange are stronger than any of the L10N standards bodies, tools vendors and "experts". We can be architects of this or possibly be marginalized because we never got this done.

    I will restate the Buckminster Fuller quotes that I think are so appropriate for this:

    "We are called to be architects of the future, not its victims."

    "Don't fight forces, use them." (This line is also frequently used by martial arts and Zen masters.)

  13. If XLIFF was a customer deliverable installed on the customer's site, you'd see how fast it would become a standard. Looking to the localization industry to create standards is like looking to the tail to create a standard for dog-wagging. Regardless, I still think what we need to look at here is data - how it's structured, searched, retrieved without ever being sent to anybody else in the first place.

  14. Hi Kirti,

    Thank you for raising these important issues. Part of the problem, I feel, is not the lack of standards: over the last decade we have established all of the core ones and provided in OAXAL a framework for how to use them in a unified architecture. The problem is a lack of certification. Rodolfo Raya has provided some very useful tools for validating TMX and XLIFF files and all credit to him for his efforts. A formal certification effort would improve things considerably.

    Best Regards,


  15. Merle mentioned the standards organizations, LISA, OASIS, and W3C.

    I just thought I would add ISO TC37, which is the ISO technical committee that produces standards for language resources.

    Kara Warburton

  16. Kirti,

    one of the standards that is missing is a single segment interchange protocol for public TMs as well as machine translation engines. Translators should be able to use their TM/editor of choice (Trados/Wordfast/OmegaT...) and have that software offer them candidates from various sources (public TMs, private TMs, SMT engines, RBMT engines...). Several tools now interface with Google translate, a few are starting to interface with ProMT. Are we going to see that communication progress on a pair-by-pair basis? (engine/DB+translator tool) Or will we be lucky enough to have LISA offer a single-segment optimized version of TMX packaged with a transfer protocol, which would allow every translator tool to query all the sources a translator wants to work with?

  17. @Thomas

    I agree that a single segment interchange protocol would be very handy for interchange, and long term storage as well. If the metadata structure was rich enough it would also allow packages of TM to be created in much more intelligent ways and also be very useful to data sharing initiatives like TDA or other more open collaborations.

  18. @Thomas: I think a segment-level standard is what the XLIFF TC tries to implement, working with OSCAR, for the 2.0 version.

  19. TAUS also had an interesting comment on this issue in this link which essentially supports many of the comments made here but also suggests that a large repository of TM could in fact become a way in which a solid TMX and XLIFF standard could come into being.

  20. Some interesting updates to this at

  21. I just received the following e-mail from LISA after they announced that the organization was being disbanded. I think it is relevant to the discussion that is going on here.

    Dear colleagues,

    In light of LISA’s insolvency, there is considerable interest in the fate of the OSCAR standards portfolio and where either a part or all of the works will reside going forward.

    It is important to understand that in order for any of the LISA standards to continue, members must become active by participating regularly to help define the deliverables and support the development work. There are very qualified and highly experienced organizations who are willing to carry the work forward by placing a focus on localization industry standards.

    Kara Warburton and Arle Lommel have helped to identify three primary organizations for where work on these standards (and other related localization standards) can continue. Each of these groups has expressed interest in working with LISA standards and, with members who are committed to the ongoing development of the standards.

    The following options (in no order of preference) are:
    European Telecommunications Standards Institute (ETSI)

    At least four (4) LISA Members whose companies are already members of ETSI could join an Industry Specification Group (ISG) for localization standards-focused work. ETSI views localization as a strategic sector to broaden its already substantial works in interoperability formats.

    1. No additional fees (beyond general membership) would be required to join their localization ISG. There are no issues with (LISA) trademarks, assets, nor liability. The OSCAR work continues under a new roof. With over 700 members in 62 countries and an annual turnover of 27 million Euros, ETSI is a financially sound non-profit organization specialized in interoperability standards for the communications industry.

    1. ISGs have their own membership which may consist of both ETSI Members and Non-members (under certain conditions), they have their own voting rules, they can decide their own work programme, and approve their own deliverables.

    1. ISGs have access to all ETSI services: web, portal & docbox (public on-line work program), ISG mailing list and mail archives, Conf Call, Virtual meetings (GotoMeeting & webex), secretariat meeting support (registration, badges, coffee, ETSI room allocations, maps, hotels support, etc.), member registration & invoicing, drafting and editing support, pdf conversion, public catalogue provisioning, free on-line download of all standards, quarterly publication on DVD of standards for members, etc.

    1. Common ETSI-LISA members are: Adobe, Cisco, Hewlett-Packard, Huawei Technologies Co. Ltd., Intel Corporation, Alcatel-Lucent, Siemens PLM Software, Siemens Enterprise Comms., Siemens AG, Nokia Siemens Networks, Sony Computer Entertainment America, Sony Europe Limited, WHP and Lionbridge. Any four of these companies can be the founders of the Localization ISG.

    For more information about ETSI please feel free to contact Patrick Guillemin, Senior Officer for Strategy and New Initiatives or Gaby Lenhart, Senior Research Officer for Strategy and New Initiatives

    (Continued below)

  22. OASIS


    LISA Members who are also members of the OASIS Consortium may proceed as follows. Note that a quick-start procedure is in place which only requires five (5) existing members to:

    1. form a Technical Committee (TC) in OASIS. No money is needed to form the TC. There are no issues with (LISA) trademarks, assets, nor liability. The OSCAR work continues under a new roof.

    1. OASIS would receive the OSCAR specification(s) as a donation;

    1. the TC’s charter would pledge to continue the work(s) using the standards (specifications) as a starting point.

    1. common OASIS-LISA members are: Adobe, Alcatel-Lucent, Avaya Business Communications, CA Technologies, Cisco Systems, Inc. EMC, Fuji Xerox, Hewlett-Packard, Intel, SAS, and SDL.

    For an example of a related TC, review the OAXAL TC as an example of how this can be structured and put in practice:

    For more information about OASIS please feel free to contact Laurent Liscia, Executive Director or James Bryce Clark, General Counsel

    Object Management Group (OMG)

    The OMG is an international not-for-profit software consortium that sets standards in the area of distributed object computing. They focus on programming languages, language mappings, data modeling, metadata, and business modeling. Its board of directors includes many large companies:

    OMG is a liaison to TC37 as well as with a number of other ISO committees. It is also a liaison with OASIS, CEN, ECMA and the W3C in addition to other organizations (see link above). It has a “reciprocal membership arrangement” with a number of standards organizations, including OASIS and W3C (see link above). Thus it is well connected in the standards world. OMG is extremely productive.

    OMG does not have as substantial a membership overlap with LISA as the other two options. Currently only Adobe Systems and Hewlett Packard are members of both organizations. The lowest membership fee with voting rights is $3000 USD. There is an observer membership fee of $2,100 and a university member fee of $550.

    All published OMG specifications, RFIs and RFPs are freely available to the public on the OMG’s document server.

    For more information about OMG please feel free to contact Richard Mark Soley, CEO or Bill Hoffmann, President/COO

    NOTE: Consistent with Article 31 of the LISA Statutes, all standards developed and ratified by the OSCAR Steering Committee are, by definition, open and freely available to the general public and may not make use of patent-encumbered technologies except under royalty-free (RF) terms. We would anticipate that a similar RF license will continue to exist in the future. Each of these organizations would be able to provide this framework.

    Once again, we encourage you to become active supporters of these and other related localization standards by joining one or more of the above groups, as appropriate for your company.

    If you have any questions about any of the options above, please contact Kara Warburton or Arle Lommel. In the best interests of the industry, both Kara and Arle are prepared to offer technical guidance and can help assure that quality localization standards will be developed.


    Michael Anobile
    (former) LISA Managing Director

  23. This is from a LinkedIn discussion on the impact of LISA's demise at

    Hi! I am just joining this discussion on interoperability standards, so I have a lot of background reading to do. For now, I will simply mention that there is an existing organization, of which TAUS is already a member, that could serve to coordinate work on standards. It is called the T&I Summit advisory council (TISAC). Its administrative home is a neutral body, the Center for Language Studies at BYU (which has no commercial interest in the standards question). The members of TISAC include (1) TAUS, (2) the international federation of translators (FIT), to represent the interests of individual translators, (3) ALC (which represents a number of language service providers), (4) AMTA (a machine translation association), (5) a representative of ISO TC 37 (where most language standards are developed), (6) the Canadian federal translation bureau (a huge client), and quite a few other organizations (over 15 total). I will be discussing with the representatives of these TISAC member organizations whether they would like TISAC to coordinate work on standards, such as sharing ideas on what standards are needed, promoting the use of existing standards and providing input to standards bodies. TISAC is not a standards body like ISO, OASIS, and OMG. Thus, TISAC would not want to actually develop standards. It seems that there is a need for two very separate roles: coordinating standards and developing standards. As some of the recent posts to this discussion have pointed out, "This overarching org would be non-profit, a coordinating body, a place where we could shape our industry and profession". TISAC is an overarching organization that a number of organizations interested in translation have already voluntarily joined in order to coordinate in various ways. TISAC is surely not the entire answer, but it could be part of the answer.
    Posted by Alan K. Melby