The design of Content Translation

Content Translation is a tool that facilitates the translation of Wikipedia articles. The initial beta version was launched for several languages and has already been used to create more than one thousand articles.

Designing this new translation experience has been an exciting journey, but before going into the details you can check how the tool works in the video below.

The translation process with Content Translation is quite simple: click on a paragraph to get an initial automatic translation, improve it, and repeat for the next paragraph until you are ready to publish your new article. The tool feels natural to our users and saves them time:

I can now translate a 20-line article in less than 5 minutes, saving lots of time.

However, not so long ago, when I started the design process, a translation tool for Wikipedia was just a blurry idea full of interesting challenges.

The problem to solve: translate the sum of all human knowledge

Translating Wikipedia articles was not a new concept. Multilingual users currently make about 30% of Wikipedia edits. However, little support was provided in the software for the process of translation. As a result, users had to take care of many boring aspects that prevented them from focusing just on crafting great content.

There have been previous attempts by big names in the tech industry to improve this process. Google Translator Toolkit and Microsoft Research Wikibhasha are interesting tools in this area, but they did not achieve wide adoption. As detailed next, we think the reason is that these tools were probably too rigid to fit "the wiki way" of translating.

Design principles: translating the wiki way

What is “the wiki way” of translating? What does it mean to translate in the Wikipedia context? What is different from translating other kinds of content, such as technical documentation or user interface messages?

The project started with many unanswered questions, but we relied on the design process to lead us to clarity. The support and appreciation for design shown by all the members of the Language team was key to figuring out the right answers. Simplifying the experience for users often means moving the complexity from the user to the software, which the engineers on the Language team always saw as a challenge worth taking on.

During the project we collected a lot of information from many different sources: conversations with experts from the Language team, existing documentation on community conventions and recommendations for translation, research on multilingual users' behaviour, and interactions with members of the community with different levels of expertise in translation (below you can see a screenshot from our round table with some of the early adopters of our tool).

We identified three different user profiles for the tool: the casual translator (a multilingual editor not always confident about the proper way to translate some words or sentences), the advanced translator (focused on increasing the language coverage and expecting a fluent process that does not get in the way), and the new editor (for whom translation could be a simpler route to start contributing than starting from scratch). Casual translators represented our main target audience, but we wanted to keep a balance between the simplicity new users needed and the shortcuts that speed up the process for advanced translators.

We organised periodic user research sessions to better understand the different user needs during the existing translation process, and to validate new ideas on how to improve this process. We recruited participants through a survey and organised 17 research sessions. Most of the sessions were conducted remotely using Google Hangouts with participants from all around the world.

As the project evolved, we learnt more about the current process of translation, and several ideas were explored on how to improve it. As a consequence, a greater percentage of each session was devoted to the testing of more detailed prototypes. Once the initial implementation for Content Translation was available, it was also included as part of the sessions.

The research sessions were instrumental in understanding the particularities of translating in the Wikipedia context, and they guided the design of the translation experience. Below you can find some of the principles we applied in the design of Content Translation.

Freedom of translation. There is a significant diversity in Wikipedia content across languages. On average, two articles from different languages on the same topic have just 41% of common content. In contrast to other kinds of content, such as software user interface strings or technical documentation, Wikipedia articles in different languages are not intended to be exact translations that are always kept in sync.

In order to support that content diversity, Content Translation does not force users to translate the full article. Users can add one paragraph at a time to the translation, deciding how much to translate. When a paragraph is added, an initial automatic translation is provided (if it is available for the language), and users are free to correct words, rearrange sentences or start with the source text or an empty paragraph if that is preferred.

Provide context information. In Content Translation, the original article and the translation are shown side-by-side. In addition, each paragraph is dynamically aligned vertically with the corresponding translated paragraph, regardless of the difference in length. This allows users to quickly get an overview of what has already been translated and what has not (with just a single scrollbar). This is one of those details most users will not even notice because it just feels natural.

Contextual information is provided at different levels to reduce the need for the user to navigate and reorient. When working on a sentence in the translation, the tool will highlight the corresponding sentence in the original document to allow translators to easily check the original context. In addition, when manipulating the content, options such as exploring linked articles (in both languages) are provided to anticipate the user's next steps in order to make the experience more fluent.

Focus on the translation. During user observations we identified steps in the translation process that could be automated. Users spend time making sure each link they translated points to the correct article in the target Wikipedia, and recreating the text formatting that was lost when using an external translation service. They also look for categories available in the target Wikipedia to properly classify the translated article, and save constantly during the process to avoid losing their work.

Content Translation deals with those aspects automatically. When a paragraph is added, the initial translation preserves the text formatting, and links point to the right articles in the target Wikipedia when those articles exist. In a similar way, categories that exist in the target Wikipedia are added to the article, and user modifications to the translated content are saved automatically.
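To make the link handling more concrete, here is a minimal sketch in Python. It is not the tool's actual implementation: the function name adapt_links, the title_map dictionary (a stand-in for whatever lookup tells us which article in the target Wikipedia corresponds to a source article), and the use of wikitext-style [[links]] are all assumptions made for illustration.

```python
import re
from typing import Dict

# Matches wikitext-style links: [[Title]] or [[Title|label]].
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def adapt_links(text: str, title_map: Dict[str, str]) -> str:
    """Point each link at the target-language article if one is known.

    title_map is a hypothetical lookup from source article titles to
    target article titles. Links without a known target are reduced
    to their plain label so no broken link is created.
    """

    def replace(match: "re.Match[str]") -> str:
        source_title = match.group(1).strip()
        label = match.group(2) or source_title
        target_title = title_map.get(source_title)
        if target_title is None:
            return label  # no article in the target wiki: keep plain text
        if target_title == label:
            return f"[[{target_title}]]"
        return f"[[{target_title}|{label}]]"

    return WIKILINK.sub(replace, text)


adapted = adapt_links(
    "The [[Moon]] orbits [[Earth|our planet]] near [[Foo]].",
    {"Moon": "Luna", "Earth": "Tierra"},
)
```

In this sketch the link labels are left untouched; in a real translation they would be part of the text that machine translation and the user work on.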

In addition to removing distractions for the user, it was important to focus on the scenarios the tool was intended to solve and, more importantly, those it was not. Working closely with the product manager, we decided to focus on the creation of new articles by a single user. Other scenarios we identified initially (such as multiple users editing the same translation in real time, or extending existing articles) had to wait. Focusing on one scenario allowed the team to iterate faster, provide value to our users early, and learn from the actual use of the tool to inform future steps.

Quality is key. One of the concerns raised early by the participants was about machine translation quality. Several users were concerned about the potential proliferation of low-quality content in Wikipedia articles if the translation process was made so easy and machine translation was in the mix.

In order to respond to that concern, Content Translation keeps track of the amount of text that is added using machine translation without further modification by the users. When the amount of text exceeds a given threshold, the users are shown a warning message. This message is intended to educate the users about the purpose of the tool and to encourage them to focus on quality more than quantity. Finally, if a user publishes an article with a high level of automatically generated content, the resulting article can be marked for the community to review. So far we have not received complaints about the quality of the articles produced with the tool. Three months after launch, only two of the 708 published translations had been deleted.
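The idea behind the warning can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Content Translation computes it: the Paragraph class, the character-based ratio, the exact-match test for "unmodified", and the 0.75 threshold are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical threshold, chosen only for illustration.
WARNING_THRESHOLD = 0.75


@dataclass
class Paragraph:
    machine_text: Optional[str]  # initial automatic translation, None if unavailable
    current_text: str            # what the translator has in the editor right now


def unmodified_mt_ratio(paragraphs: List[Paragraph]) -> float:
    """Share of the translation (by characters) that is untouched machine output."""
    total = sum(len(p.current_text) for p in paragraphs)
    if total == 0:
        return 0.0
    unmodified = sum(
        len(p.current_text)
        for p in paragraphs
        if p.machine_text is not None and p.current_text == p.machine_text
    )
    return unmodified / total


def should_warn(paragraphs: List[Paragraph],
                threshold: float = WARNING_THRESHOLD) -> bool:
    """True when untouched machine translation exceeds the threshold."""
    return unmodified_mt_ratio(paragraphs) > threshold


draft = [
    Paragraph(machine_text="abcd", current_text="abcd"),         # left as is
    Paragraph(machine_text="efgh", current_text="EFGH edited"),  # improved by the user
]
```

Treating any edit as "modified" is the simplest possible rule; a more forgiving version could compare the machine output and the current text with a similarity measure instead of an exact match.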

Exploring solutions: iterate, iterate, and iterate

The above principles and solutions were the result of many iterations. We explored multiple options for several aspects from the content layout to how to show multiple translation suggestions. For those explorations I like to apply quick sketching techniques such as 6-8-5, where you timebox yourself to quickly explore as many different design directions as possible. Below is an example of 6 different ways to show language information for the translation content.

Once we had some promising candidate ideas, I prepared some mockups at a higher fidelity level and illustrated how they would work in a video. The video, which was useful to frame the conversations with the Language team, is shown below.

In order to test our ideas with users, we built several prototypes at different fidelity levels (and using different tools). These ranged from basic clickable prototypes made with Pencil to an advanced HTML prototype where users could have rich interactions with the content (e.g., typing and applying different language tools to the content). The latter went through 40 revisions to simulate new features based on the testing sessions (including the translations needed to test with users in different languages). Finally, some small prototypes with a very specific focus were made with Hype to test aspects such as the card-based layout for language tools or how to add links using text auto-completion.

As the tool was taking shape, the design team at the Wikimedia Foundation was also growing. This resulted in feedback from all my great colleagues that was very useful for improving the tool, reviewing the testing process, and aligning the designs better with the general design direction for Wikimedia projects.

More details on the design of Content Translation can be found on the project page and in this research paper.

Next steps

Content Translation has the potential to reduce the language barriers in Wikipedia. Currently the amount of knowledge you can access depends heavily on the languages you are able to speak. Although it has been a great start, there are many steps still to take on this road.

For the initial stage of the process, covered in this post, the main focus was to create a translation editor that made translation easier and faster for our users. Our results show that once users start a translation with the editor they are able to complete it successfully and users are happy with it.

The goal for the next stage is to bring new users to the translation process and keep them translating by helping them find interesting content to translate. Some of this work has already started with the great help of Nirzar Pangarkar and is already showing an increase in the number of translators using the tool, but there are many questions yet to be answered.

You can get the latest news about Content Translation through this Twitter account, and if you are around Mexico this summer we can discuss more details about the design of the tool during my talk at Wikimania 2015.
