Description of B-Translator

Table of Contents

The goals of the project

This software helps to get feedback about l10n (translations of the programs). It also helps to unify all the different translations and to ensure consistency among the translations. It is intended to be used for the translations of programs into Albanian, but it can be used for any other languages as well.

The motivation for developing such a software is that the traditional (current) l10n workflow requires highly dedicated people, and does not allow (or at least does not facilitate) small contributions from random people that do not have such a high dedication, determination and enough free time.

Also, the process of reviewing and correcting translations is not easy and does not facilitate the feedback from the users of the translated programs. Although the translators are usually very good and professional, they can make mistakes too, and sometimes they may miss the best translation for some certain terms. Some feedback from the crowd of the users would be more than welcome, if there are tools to collect and facilitate it.

Another problem with translations is that sometimes they are not consistent. The same string has different translations in different programs, and sometimes even the same translator may have provided different translations for the same string in different cases. This happens mainly because each program/project has its own translations and there is no central repository for all the translations.

To summarize, the problems that this software tries to solve are these:

  • Getting feedback about the translations from a wide crowd of people and users. This feedback can be in terms of votes for the best translation (when there are more than one translations for the same string), or it can be a new alternative translation (for an existing translation), or it can be a new translation suggestion (for a string that is not translated yet).
  • Helping to ensure consistency among the translations.
  • Merging translations from different sources (for example translations made on Launchpad and those made on KDE or GNOME).

What means B-Translator

The codename B-Translator can be decoded like Bee Translator, since it aims at collecting very small translation contributions from a wide crowd of people and to dilute them into something useful.

It can also be decoded like Be Translator, as an invitation to anybody to give his small contribution for translating programs or making their translations better.

If you could come up with some other interesting explanations, please let me know.

How it works

Build a dictionary of l10n strings

The source of the translation data used by the software are the POT/PO files of the projects. The PO template files (POT) contain the list of translatable strings of a project (in English), and the PO translation files contain the strings and the corresponding translations for a certain language. (More information and details about PO/POT formats and the translation process is provided by `info gettext`.)

These PO files are imported into the DB of the software. This import creates a dictionary of strings and their corresponding translations. The same string can be used in more than one projects, but in the dictionary it is stored only once. However, if the same string has different translations in several projects, all of the distinct translations will be stored into the DB.

Collect feedback from users/reviewers

These strings and the corresponding translations are presented for review to a large community of reviewers/users. The reviewers indicate which translation they think is the best by voting for it. They can also suggest any new translations (or suggest translations for strings that are yet un-translated). These new translations and the votes/likes of the reviewers are stored in the DB as well

The review process happens slowly and gradually during a long time. We can assume that each reviewer checks only one string each day, and that there is a very large number of reviewers that give feedback each day. The feedback can be collected through different channels, like web interface, social networks (Facebook, Google+, Twitter), email, mobile apps, etc.

Export the revised translations

Besides the dictionary of strings and translations, the import of PO files saves also the structure of these files and all the relevant data that are needed to export them again from the DB. However, during the export of the PO files, the most voted translations for each string are retrieved from the DB, instead of the original translations that were imported. This is how the input/feedback of the reviewers is transfered into the PO files. These exported PO files can then be uploded/commited into the repositories of the corresponding projects.

The process/workflow for a project without translation

According to the steps decsribed above, the process/workflow for a project that has no translation yet, would be like this:

  1. Checkout POT files from the repository of the project.
  2. Import them into the DB.
  3. Over some time, collect translation suggestions from the users. These translations can also be reviewed and evaluated by other users.
  4. Export the PO files from the DB.
  5. Review, fix and reformat them as needed.
  6. Upload/commit the PO files into the repository of the project.
  7. When a new POT file is released, start over again from the begining (but this time we also import the PO file, besides the POT file).

This process works well if there are no traditional translators to the project, and there is no other translation workflow happening concurrently (in parallel) with this one. Otherwise there would be a need to integrate these two workflows so that they don't override each-other.

Exporting only the latest suggestions (diffs)

In practice actually there is an existing translation workflow for almost all the projects. This translation is done either by using a Pootle system or by using PO editors. So, it is important that our workflow integrates with this existing workflow.

This integration is helped by exporting diffs instead of exporting PO files. These diffs are retrieved by the maintainers of the existing translation workflow (translators), and they contain the latest translation suggestions made by the reviewers through the feedback system. Such diffs can then be easily checked by the translators, and if they find them appropriate they can apply them to the PO files on the existing workflow.

Diffs are made between the current state of translations and the last snapshot of the translations. This ensures that diffs do not contain any suggestions that have been included already in the previous diffs, and so making more easy the work of the translators. The translator is usually interested only on the last diff, however the previous diffs are saved in the DB as well, in order to have a full history of the suggested translations over the time. Whenever a translator checks the latest diff, he should also make a snapshot, so that the translations that have been already suggested to him are not suggested again. Making a snapshot will also generate the diff with the previous snapshot and store this diff on the DB as well.

The process/workflow for an integrated translation

The process/workflow for the case when the feedback provided by the system is integrated in the mainstream translation workflow is like this:

  1. Checkout the latest version of the POT and PO files from the repository of the project.
  2. Import POT files and PO files into the DB.
  3. Over some time, collect votes and new translation suggestions from the users.
  4. Time after time (for example each month), the mainstream translator checks out the last diffs, containing the latest suggestions (and makes a snapshot as well).
  5. The translator reviews the latest suggestions and applies them in the mainstream translation, if he finds them appropriate.
  6. Periodically (for example once or twice a year) go back to steps (1) and (2) and import the POT and PO files again. This re-import may introduce new strings and translaions, but will not affect the existing strings, translations and votes.

Functional requirements

Open access

Everybody should be able to use the system for the purpose of getting translation suggestions for a certain string, even unauthenticated (anonymous/guest) users. Furthermore, it should be possible to use an API (web services), so that these suggestions can be retrieved and used even by external applications.

Authenticated voting

Submitting votes or new suggestions will be allowed only for the subscribed users (which have agreed to help and contribute). No contributions from anonymous/guests will be accepted.

Tracking votes

Votes and suggestions will not be anonymous. For each vote or suggestion, the user who submitted it will be recorded and saved. This will allow the user to see all the strings that he has already voted for, and also to change any of the votes, if he later changed his mind. At the same time it will prevent multiple votes by the same user for the same translation.

Highly customizable

The system will have a flexible configuration and customization page. This means that the user should be able to customize how much he would like to help and contribute. For example:

  • how many translation votes per day (an upper limit)
  • which communication means he preferes (email, facebook, google+, twitter, website, android app, iPhone app, etc.)
  • which projects or packages he would like to focus on (for example, if the user selects the package KDE, only strings that belong to a project on this package will be sent to him for review and feedback)
  • which languages he would like to use as primary and secondary source languages (for example a user that is not confident in English, may choose to use French as a primary language and Italian+Spanish as secondary/helper languages)
  • sequential or random selection of strings (random is the default, but if the user is interested in just one or a few projects, he may prefer to review the strings sequentially)

Evaluation algorithms

The contribution and performance of the users should be measured and evaluated using certain algorithms and/or heuristics. The users will be awarded points based on their performance. Probably some rewarding mechanizms can be integrated later for the top contributers.

Detailed and comprehensive reporting and statistics

Different kinds of reports and statistics related to users, projects, activity etc. should be supported and provided. (What are exactly these reports? To be elaborated.)

Integration with the existing workflow of the project translations

Project translators will continue to work with their prefered tools (like Pootle, Lokalize, etc.). They will also continue to use their prefered workflows (the way that they coordinate their translation work with each-other and with the project releases).

This system should help them to get feedback and possibly any new suggestions or translations from a big crowd of the contributers. The system should provide means and tools for easy integration with the workflow of the project translations.

For example, it should allow the translation maintainers to import their existing translation files (PO files), and to export translation files that contain the most voted translations, as well as new suggestions (for translated strings) or new translations (for untranslated strings). It should also allow them to get the latest changes (suggestions, translations, etc.) since the last time that they checked, or since a predefined moment in the past.

The latest changes should be exported in a format that is easy to review, modify and apply (diff or ediff).

Author: Dashamir Hoxha <dashohoxha@gmail.com>

Date: 2014-06-23 19:32:12 CEST

HTML generated by org-mode 6.33x in emacs 23