Luca's meaningless thoughts  

Translation of e-mails using Mutt

by Leandro Lucarella on 2012- 09- 24 12:45 (updated on 2012- 10- 02 12:58)
tagged e-mail, en, floss, gmail, google translate, mutt, python, release, script, software, translate - with 0 comment(s)

Update

New translation script here, see the bottom of the post for a description of the changes.

I don't like to trust my important data to big companies like Google. That's why even when I have a GMail, I don't use it as my main account. I'm also a little old fashion for some things, and I like to use Mutt to check my e-mail.

But GMail have a very useful feature, at least it became very useful since I moved to a country which language I don't understand very well yet, that's not available in Mutt: translation.

But that's the good thing about free software and console programs, they are usually easy to hack to get whatever you're missing, so that's what I did.

The immediate solution in my mind was: download some program that uses Google Translate to translate stuff, and pipe messages through it using a macro. Simple, right? No. At least I couldn't find any script to do the translation, because Google Translate API is now paid.

So I tried to look for alternatives, first for some translation program that worked locally, but at least in Ubuntu's repositories I couldn't find anything. Then for online services alternatives, but nothing particularly useful either. So I finally found a guy that, doing some Firebuging, found how to use the free Google translate service. Using that example, I put together a 100 SLOC nice general Python script that you can use to translate stuff, piping them through it. Here is a trivial demonstration of the script (gt, short for Google Translate... Brilliant!):

$ echo hola mundo | gt
hello world
$ echo hallo Welt | gt --to fr
Bonjour tout le monde

And here is the output of gt --help to get a better impression on the script's capabilities:

usage: gt [-h] [--from LANG] [--to LANG] [--input-file FILE]
          [--output-file FILE] [--input-encoding ENC] [--output-encoding ENC]

Translate text using Google Translate.

optional arguments:
  -h, --help            show this help message and exit
  --from LANG, -f LANG  Translate from LANG language (e.g. en, de, es,
                        default: auto)
  --to LANG, -t LANG    Translate to LANG language (e.g. en, de, es, default:
                        en)
  --input-file FILE, -i FILE
                        Get text to translate from FILE instead of stdin
  --output-file FILE, -o FILE
                        Output translated text to FILE instead of stdout
  --input-encoding ENC, -I ENC
                        Use ENC caracter encoding to read the input (default:
                        get from locale)
  --output-encoding ENC, -O ENC
                        Use ENC caracter encoding to write the output
                        (default: get from locale)

You can download the script here, but be warned, I only tested it with Python 3.2. It's almost certain that it won't work with Python < 3.0, and there is a chance it won't work with Python 3.1 either. Please report success or failure, and patches to make it work with older Python versions are always welcome.

Ideally you shouldn't abuse Google's service through this script, if you need to translate massive texts every 50ms just pay for the service. For me it doesn't make any sense to do so, because I'm not using the service differently, when I didn't have the script I just copy&pasted the text to translate to the web. Another drawback of using the script is I couldn't find any way to make it work using HTTPS, so you shouldn't translate sensitive data (you shouldn't do so using the web either, because AFAIK it travels as plain text too).

Anyway, the final step was just to connect Mutt with the script. The solution I found is not ideal, but works most of the time. Just add these macros to your muttrc:

macro index,pager <Esc>t "v/plain\n|gt|less\n" "Translate the first plain text part to English"
macro attach <Esc>t "|gt|less\n" "Translate to English"

Now using Esc t in the index or pager view, you'll see the first plain text part of the message translated from an auto-detected language to English in the default encoding. In the attachments view, Esc t will pipe the current part instead. One thing I don't know how to do (or if it's even possible) is to get the encoding of the part being piped to let gt know. For now I have to make the pipe manually for parts that are not in UTF-8 to call gt with the right encoding options. The results are piped through less for convenience. Of course you can write your own macros to translate to another language other than English or use a different default encoding. For example, to translate to Spanish using ISO-8859-1 encoding, just replace the macro with this one:

macro index,pager <Esc>t "v/plain\n|gt -tes -Iiso-8859-1|less\n" "Translate the first plain text part to Spanish"

Well, that's it! I hope is as useful to you as is being to me ;-)

Update

Since picking the right encoding for the e-mail started to be a real PITA, I decided to improve the script to auto-detect the encoding, or to be more specific, to try several popular encodings.

So, here is the help message for the new version of the script:

usage: gt [-h] [--from LANG] [--to LANG] [--input-file FILE]
          [--output-file FILE] [--input-encoding ENC] [--output-encoding ENC]

Translate text using Google Translate.

optional arguments:
  -h, --help            show this help message and exit
  --from LANG, -f LANG  Translate from LANG language (e.g. en, de, es,
                        default: auto)
  --to LANG, -t LANG    Translate to LANG language (e.g. en, de, es, default:
                        en)
  --input-file FILE, -i FILE
                        Get text to translate from FILE instead of stdin
  --output-file FILE, -o FILE
                        Output translated text to FILE instead of stdout
  --input-encoding ENC, -I ENC
                        Use ENC caracter encoding to read the input, can be a
                        comma separated list of encodings to try, LOCALE being
                        a special value for the user's locale-specified
                        preferred encoding (default: LOCALE,utf-8,iso-8859-15)
  --output-encoding ENC, -O ENC
                        Use ENC caracter encoding to write the output
                        (default: LOCALE)

So now by default your locale's encoding, utf-8 and iso-8859-15 are tried by default (in that order). These are the defaults that makes more sense to me, you can change the default for the ones that makes sense to you by changing the script or by using -I option in your macro definition, for example:

macro index,pager <Esc>t "v/plain\n|gt -IMS-GREEK,IBM-1148,UTF-16BE|less\n"

Weird choice of defaults indeed :P