Using csv2po

po2csv allows you to create CSV files from PO files, and csv2po converts the edited CSV files back to PO. This allows you to send translation work to translators who do not or cannot use PO editors but who can use a spreadsheet.

Quickstart

  1. pofilter --fuzzy --review -t untranslated <po-dir> <po-filtered-dir> (this step is optional)
  2. divide into sections
  3. po2csv <po-dir|po-filtered-dir> <csv-out>
  4. edit in Excel or OpenOffice.org Calc
  5. csv2po --charset=windows-1250 -t <templates> <csv-in> <po-in> (you must work against a template directory; the charset option corrects problems with character sets)
  6. phase - to do basic checks and sort out encoding issues
  7. pomerge --mergeblank=no -t <po-dir> <po-in> <po-dir>
  8. cvs diff -u > x.diff --- check the changes
  9. cvs ci --- commit changes

Detailed Description

po2csv allows you to send CSV files, which can be edited in any spreadsheet, to a translator. This document outlines the process to follow from the raw PO files → CSV files → back to PO. We also look at the case where you have submitted only a subset of the PO files for translation and need to integrate the results.

Creating a subset

This step is optional.

To send a translator only those messages that are untranslated, fuzzy or need review run:

pofilter --fuzzy --review -t untranslated <po-dir> <po-filtered-dir>

Divide into sections

You might want to divide the work into sections if you are apportioning it to different translators. In that case create new directories:

eg. po-filtered-dir-1 po-filtered-dir-2
or  po-filtered-dir-bob po-filtered-dir-mary

Copy files from po-filtered-dir to po-filtered-dir-N in a way that balances the work or apportions the amounts you want for each translator. Try to keep sections together and not break them up too much, e.g. give one translator all the OpenOffice.org Calc work rather than splitting it between two people. This is just a simple measure to ensure consistency.

Now continue as normal: convert to CSV and perform wordcounts for each separate directory.
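
If you have many files, the copying can be scripted. The following is only a sketch: split_sections is a hypothetical helper, not part of the toolkit, and it assumes that each top-level entry of the source directory (a file or a whole subdirectory) should stay with one translator. It deals entries out in turn, so it balances the number of entries, not the number of words.

```shell
# Hypothetical helper: copy the contents of <src> into N section
# directories named <src>-1 ... <src>-N.
split_sections() {
    src=$1
    count=$2
    i=0
    # Work on top-level entries so that a whole subdirectory
    # (for example one application) goes to a single translator.
    for entry in "$src"/*
    do
        [ -e "$entry" ] || continue    # empty directory: nothing to copy
        n=$(( i % count + 1 ))
        mkdir -p "$src-$n"
        cp -R "$entry" "$src-$n/"
        i=$(( i + 1 ))
    done
}

# Example (directory name is an assumption):
# split_sections po-filtered-dir 2
```

Check the wordcount of each resulting directory afterwards and move files by hand if the split is too uneven.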

Creating the CSV files

po2csv <po-dir|po-filtered-dir> <csv-out>

This will create a set of CSV files in csv-out, which you can compress using zip (we use zip because most people are Windows users).

Creating a wordcount

Professional translators work on source word counts, so we create a wordcount to go with the files:

pocount $(find <po-dir|po-filtered-dir> -name "*.po")

We work on source words regardless of whether the string is fuzzy or not. You might want to get a lower rate for work on fuzzy strings.

Place the wordcount file in both the PO and CSV directories so that you can find it later. Check the numbers to make sure you haven’t inadvertently included something that you didn’t want.
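
Saving the count and copying it to both places can be done in one step. This is a minimal sketch: write_wordcount and the file name wordcount.txt are our own choices for illustration, and it assumes that pocount is on your PATH and accepts a list of files as in the command above.

```shell
# Hypothetical helper: save the pocount output for a directory of PO
# files and place a copy next to the matching CSV files.
write_wordcount() {
    po_dir=$1
    csv_dir=$2
    # find ... -exec copes with spaces in file names
    find "$po_dir" -name "*.po" -exec pocount {} + > "$po_dir/wordcount.txt" &&
    cp "$po_dir/wordcount.txt" "$csv_dir/wordcount.txt"
}

# Example (directory names are assumptions):
# write_wordcount po-filtered-dir csv-out
```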

Package the CSV files

zip -r9 work.zip <csv-out>

Translating

Translators can use most spreadsheets. Excel works well. However, there are a few problems with spreadsheets:

Converting Excel spreadsheets to CSV files

You can, and should, keep your files as CSV files. However, many translators are not experts at using their spreadsheet, so many files will come back saved as XLS files. Converting them by hand is tedious and error prone. Rather make use of xlHtml, which can do all the work for you.

xlhtml -xp:0 -csv file.xls > file.csv
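
When a translator returns a whole directory of XLS files, the same command can be run in a loop. A sketch only: xls_to_csv is a hypothetical helper, and it assumes xlhtml is installed and takes the options shown above.

```shell
# Hypothetical helper: convert every .xls file in a directory to a
# .csv file with the same base name.
xls_to_csv() {
    dir=$1
    for xls in "$dir"/*.xls
    do
        [ -f "$xls" ] || continue    # no .xls files: the pattern stays literal
        csv="${xls%.xls}.csv"
        xlhtml -xp:0 -csv "$xls" > "$csv" || echo "conversion failed: $xls" >&2
    done
}

# Example (directory name is an assumption):
# xls_to_csv csv-in
```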

Converting CSV back to PO

Extract the CSV files; here we assume they are in csv-in.

csv2po --charset=windows-1250 -t <templates> <csv-in> <po-in>

This will create new PO files in po-in based on the data in the CSV files in csv-in, merged with the template files in templates. You shouldn’t run csv2po without templates, as using them allows you to preserve the original file layout. Only run it without -t if you are dealing with a subset of the PO files that you will merge back using pomerge.

Note (1): running csv2po using the input PO files as templates gives spurious results. It should probably be made to work but doesn’t.

Note (2): you might have encoding problems with the returned files. Use the --charset option to convert the file from another encoding (all PO files are created using UTF-8). Usually Windows users will be using something like WINDOWS-1250. Check the file after conversion to see that the characters are in fact correct; if not, try another encoding.
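
If you are not sure which encoding a translator used, it can help to look at the same few lines decoded in several ways before running csv2po. A sketch, assuming iconv is available and knows the charset names you pass; try_charsets is a hypothetical helper and the charsets in the example are only guesses at likely candidates.

```shell
# Hypothetical helper: show the first lines of a file decoded under
# each candidate charset, so you can pick the --charset value by eye.
try_charsets() {
    file=$1
    shift
    for charset in "$@"
    do
        echo "== $charset =="
        # iconv fails on bytes that are not valid in the source charset
        head -n 5 "$file" | iconv -f "$charset" -t UTF-8 || echo "(not valid $charset)"
    done
}

# Example (file name is an assumption):
# try_charsets csv-in/file.csv WINDOWS-1250 WINDOWS-1252 ISO-8859-2
```

The charset whose output shows the accented characters you expect is the one to pass to --charset.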

Checking the new PO files

FIXME: we don’t have the progress script anymore, but it might be good to look at how to do this with phase.

We run the progress script against the files as this allows the gettext tools to pick up encoding and other errors.

Manually edit the files to correct these or use iconv to convert between character sets.

Removing fuzzies

When you merge back work that you know is good, you want to make sure that it overrides the fuzzy status of the existing translations. To do that you need to remove the “#, fuzzy” markers.

This is best performed against files that are under CVS control, otherwise you cannot tell what changed.

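# Shell variable names cannot contain hyphens, so we use underscores
po_in_dir=your-incoming-po-files
po_dir=your-existing-po-files

# For each incoming PO file, drop the "#, fuzzy" lines from the matching
# existing file (any other flags on such a line are dropped with it)
for pofile in `cd "$po_in_dir"; find . -name "*.po"`
do
       grep -E -v "^#, fuzzy" < "$po_dir/$pofile" > "$po_dir/${pofile}.unfuzzy" && \
       mv "$po_dir/${pofile}.unfuzzy" "$po_dir/$pofile"
done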
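
A flag line can carry more than one flag, for example “#, fuzzy, c-format”, and deleting the whole line then loses the other flags. The following variant is a sketch of how to remove only the word fuzzy; strip_fuzzy is a hypothetical helper and it assumes flags are written in the usual “#, flag, flag” form.

```shell
# Hypothetical helper: print a PO file with the fuzzy flag removed,
# keeping any other flags that share the "#," line.
strip_fuzzy() {
    # 1: a line holding only the fuzzy flag is deleted
    # 2: fuzzy as the first of several flags
    # 3: fuzzy in the middle of the flag list
    # 4: fuzzy as the last flag
    sed -e '/^#, fuzzy$/d' \
        -e 's/^#, fuzzy, /#, /' \
        -e '/^#,/s/, fuzzy,/,/' \
        -e '/^#,/s/, fuzzy$//' "$1"
}

# Example (file names are assumptions):
# strip_fuzzy po-dir/file.po > po-dir/file.po.unfuzzy
```

If GNU gettext is installed, msgattrib --clear-fuzzy should do a similar job.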

Merging PO files into the main PO files

This step would not be necessary if the CSV contained the complete PO file. It is only needed when the translator has been editing a subset of the whole PO file.

pomerge --mergeblank=no -t po-dir -i po-in -o po-dir 

This will take the PO files from po-in and merge them with those in po-dir, using po-dir as the template, i.e. overwriting files in po-dir. It will also ignore entries that have a blank msgstr, i.e. it will not merge untranslated items. The default behaviour of pomerge is to take all changes from po-in and apply them to po-dir; by overriding this we can ignore all untranslated items.

There should be an option to override the status of the destination PO files with that of the input PO. At present this works for setting fuzzy status, but you cannot remove fuzzy status.

Therefore all your entries that were fuzzy in the destination will still be fuzzy even though the input was corrected. If you are confident that all your input is correct, then go back to the previous section on removing fuzzies.