Luca's meaningless thoughts

Translation of e-mails using Mutt

by Leandro Lucarella on 2012-09-24 12:45 (updated on 2012-10-02 12:58)
tagged e-mail, en, floss, gmail, google translate, mutt, python, release, script, software, translate - with 0 comment(s)


New translation script here; see the bottom of the post for a description of the changes.

I don't like to trust my important data to big companies like Google. That's why, even though I have a GMail account, I don't use it as my main account. I'm also a little old-fashioned about some things, and I like to use Mutt to check my e-mail.

But GMail has a very useful feature that's not available in Mutt: translation. At least it became very useful once I moved to a country whose language I don't understand very well yet.

But that's the good thing about free software and console programs, they are usually easy to hack to get whatever you're missing, so that's what I did.

The immediate solution in my mind was: download some program that uses Google Translate to translate stuff, and pipe messages through it using a macro. Simple, right? No. At least I couldn't find any script to do the translation, because the Google Translate API is now paid.

So I tried to look for alternatives. First I looked for a translation program that worked locally, but at least in Ubuntu's repositories I couldn't find anything. Then I looked for alternative online services, but found nothing particularly useful either. Finally I found a guy who, doing some Firebugging, had worked out how to use the free Google Translate service. Using that example, I put together a nice, general 100 SLOC Python script that you can use to translate text by piping it through the script. Here is a trivial demonstration of the script (gt, short for Google Translate... Brilliant!):

$ echo hola mundo | gt
hello world
$ echo hallo Welt | gt --to fr
Bonjour tout le monde

And here is the output of gt --help to give a better impression of the script's capabilities:

usage: gt [-h] [--from LANG] [--to LANG] [--input-file FILE]
          [--output-file FILE] [--input-encoding ENC] [--output-encoding ENC]

Translate text using Google Translate.

optional arguments:
  -h, --help            show this help message and exit
  --from LANG, -f LANG  Translate from LANG language (e.g. en, de, es,
                        default: auto)
  --to LANG, -t LANG    Translate to LANG language (e.g. en, de, es, default:
  --input-file FILE, -i FILE
                        Get text to translate from FILE instead of stdin
  --output-file FILE, -o FILE
                        Output translated text to FILE instead of stdout
  --input-encoding ENC, -I ENC
                        Use ENC caracter encoding to read the input (default:
                        get from locale)
  --output-encoding ENC, -O ENC
                        Use ENC caracter encoding to write the output
                        (default: get from locale)
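
The help text above has the shape of argparse output, so here is a minimal sketch of a parser that accepts the same options. It is not the real gt source: the function name build_parser and the dest names are made up for illustration, and the "en" default for --to is an assumption based on the Mutt macros further down, since the help text is cut off at that point.

```python
import argparse


def build_parser():
    """Build a parser whose options mirror the gt --help text (a sketch)."""
    parser = argparse.ArgumentParser(
        prog="gt", description="Translate text using Google Translate.")
    # "from" is a Python keyword, so the value is stored under another name.
    parser.add_argument("--from", "-f", dest="from_lang", metavar="LANG",
                        default="auto", help="Translate from LANG language")
    # Assumption: the default target language is English.
    parser.add_argument("--to", "-t", dest="to_lang", metavar="LANG",
                        default="en", help="Translate to LANG language")
    parser.add_argument("--input-file", "-i", metavar="FILE",
                        help="Get text to translate from FILE instead of stdin")
    parser.add_argument("--output-file", "-o", metavar="FILE",
                        help="Output translated text to FILE instead of stdout")
    # None here stands for "get the encoding from the locale".
    parser.add_argument("--input-encoding", "-I", metavar="ENC",
                        help="Character encoding used to read the input")
    parser.add_argument("--output-encoding", "-O", metavar="ENC",
                        help="Character encoding used to write the output")
    return parser


args = build_parser().parse_args(["-tes", "-Iiso-8859-1"])
print(args.from_lang, args.to_lang, args.input_encoding)
```

Short options take their value with or without a space, which is why a compact form like -tes works.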

You can download the script here, but be warned, I only tested it with Python 3.2. It's almost certain that it won't work with Python < 3.0, and there is a chance it won't work with Python 3.1 either. Please report success or failure, and patches to make it work with older Python versions are always welcome.

Ideally you shouldn't abuse Google's service through this script; if you need to translate massive texts every 50ms, just pay for the service. For me paying doesn't make any sense, because I'm not using the service any differently: before I had the script, I just copied and pasted the text to translate into the web page. Another drawback of the script is that I couldn't find any way to make it work over HTTPS, so you shouldn't translate sensitive data with it (you shouldn't do so using the web either, because AFAIK it travels as plain text too).

Anyway, the final step was just to connect Mutt with the script. The solution I found is not ideal, but works most of the time. Just add these macros to your muttrc:

macro index,pager <Esc>t "v/plain\n|gt|less\n" "Translate the first plain text part to English"
macro attach <Esc>t "|gt|less\n" "Translate to English"

Now, using Esc t in the index or pager view, you'll see the first plain text part of the message translated from an auto-detected language to English in the default encoding. In the attachments view, Esc t will pipe the current part instead. One thing I don't know how to do (or whether it's even possible) is to get the encoding of the part being piped and pass it to gt. For now, for parts that are not in UTF-8, I have to write the pipe by hand to call gt with the right encoding options. The results are piped through less for convenience. Of course you can write your own macros to translate to a language other than English or to use a different default encoding. For example, to translate to Spanish using ISO-8859-1 encoding, just replace the macro with this one:

macro index,pager <Esc>t "v/plain\n|gt -tes -Iiso-8859-1|less\n" "Translate the first plain text part to Spanish"
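
On the open question of finding the part's encoding: one possible workaround, sketched below, is to hand a script the whole raw message instead of a single part, and let Python's standard email module locate the first text/plain part and read its declared charset. This is only a sketch under that assumption. The function name first_plain_text is hypothetical, it is not part of gt, and I haven't wired it into a Mutt macro.

```python
import email


def first_plain_text(raw_bytes, fallback="utf-8"):
    """Return the decoded text of the first text/plain part, or None."""
    msg = email.message_from_bytes(raw_bytes)
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            # decode=True undoes base64 / quoted-printable transfer encoding
            payload = part.get_payload(decode=True)
            # Use the charset declared in the part's Content-Type header,
            # or the fallback when none is declared.
            charset = part.get_content_charset() or fallback
            return payload.decode(charset, errors="replace")
    return None


raw = (b"Content-Type: text/plain; charset=iso-8859-1\r\n"
       b"Content-Transfer-Encoding: quoted-printable\r\n"
       b"\r\n"
       b"hola se=F1or\r\n")
print(first_plain_text(raw))
```

The decoded text could then be written out as UTF-8 and piped on to gt, so no encoding options would be needed in the macro.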

Well, that's it! I hope it is as useful to you as it has been to me ;-)


Since picking the right encoding for the e-mail started to be a real PITA, I decided to improve the script to auto-detect the encoding, or to be more specific, to try several popular encodings.

So, here is the help message for the new version of the script:

usage: gt [-h] [--from LANG] [--to LANG] [--input-file FILE]
          [--output-file FILE] [--input-encoding ENC] [--output-encoding ENC]

Translate text using Google Translate.

optional arguments:
  -h, --help            show this help message and exit
  --from LANG, -f LANG  Translate from LANG language (e.g. en, de, es,
                        default: auto)
  --to LANG, -t LANG    Translate to LANG language (e.g. en, de, es, default:
  --input-file FILE, -i FILE
                        Get text to translate from FILE instead of stdin
  --output-file FILE, -o FILE
                        Output translated text to FILE instead of stdout
  --input-encoding ENC, -I ENC
                        Use ENC caracter encoding to read the input, can be a
                        comma separated list of encodings to try, LOCALE being
                        a special value for the user's locale-specified
                        preferred encoding (default: LOCALE,utf-8,iso-8859-15)
  --output-encoding ENC, -O ENC
                        Use ENC caracter encoding to write the output
                        (default: LOCALE)

So now your locale's encoding, utf-8 and iso-8859-15 are tried by default (in that order). These are the defaults that make the most sense to me; you can change them to the ones that make sense to you by editing the script or by using the -I option in your macro definition, for example:

macro index,pager <Esc>t "v/plain\n|gt -IMS-GREEK,IBM-1148,UTF-16BE|less\n"

Weird choice of defaults indeed :P
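
For the curious, the "try several encodings" idea can be sketched in a few lines of Python. This is not the actual code from gt, just a minimal illustration of the technique; the function name decode_with_fallback is made up, and LOCALE is handled the way the help text describes it.

```python
import locale


def decode_with_fallback(data, encodings=("LOCALE", "utf-8", "iso-8859-15")):
    """Decode bytes with the first encoding in the list that succeeds."""
    for name in encodings:
        if name == "LOCALE":
            # Special value: the user's locale-specified preferred encoding
            name = locale.getpreferredencoding(False)
        try:
            return data.decode(name)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding, try the next candidate
            continue
    raise ValueError("none of the encodings could decode the input")


# A comma separated list, as it would be given to the -I option
candidates = "utf-8,iso-8859-15".split(",")
print(decode_with_fallback(b"ma\xf1ana", candidates))
```

The order matters: a single-byte encoding such as iso-8859-15 will decode almost any input without complaining, so it works best at the end of the list, after the stricter candidates have had their chance.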

How can you not love FLOSS?

by Leandro Lucarella on 2010-06-12 00:11 (updated on 2010-06-12 00:11)
tagged en, floss, jabber, mcabber, migration, psi, python, script - with 0 comment(s)

Let me tell you my story.

I'm moving to a new jabber server, so I had to migrate my contacts. I have several jabber accounts, collected over the years (I started using jabber a long time ago, around 2001 [1]; in those days ICQ interoperability was an issue =P), with a bunch of contacts each, so manual migration was out of the question.

First I thought "this is gonna get ugly", so I considered using some XMPP Python library to do the work by talking directly to the servers, but then I remembered two key facts:

  1. I use Psi, which likes XML a lot, and it has a roster cache in a file.
  2. I use mcabber, which has a FIFO for injecting commands via the command line.

With these two facts in mind, the migration was as easy as a Python script of less than 25 SLOC, without any external dependencies (just the Python stdlib):

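# Written for Python 2 (it uses print statements)
import sys
import xml.etree.ElementTree as et

def ns(s):
        # Prefix a tag name with the XML namespace, in ElementTree's {uri}tag form
        return '{}' + s

tree = et.parse(sys.argv[1])

# The first child of the root element holds the accounts
accounts = tree.getroot()[0]

for account in accounts.getchildren():
        roster_cache = account.find(ns('roster-cache'))
        if roster_cache is None:
                # No cached roster for this account, nothing to migrate
                continue
        for contact in roster_cache:
                name = contact.findtext(ns('name')).strip().encode('utf-8')
                jid = contact.findtext(ns('jid')).strip().encode('utf-8')
                # Add the contact, then jump to it so /move acts on it
                print '/add', jid, name
                print '/roster search', jid
                g = contact.find(ns('groups')).findtext(ns('item'))
                if g is not None:
                        group = g.strip().encode('utf-8')
                        print '/move', group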


Now all you have to do is know where your Psi accounts.xml file is (usually ~/.psi/profiles/<your_profile_name>/accounts.xml), and where your mcabber FIFO is (usually ~/.mcabber/mcabber.fifo, but maybe you have to configure mcabber first) and run:

python /path/to/script.py /path/to/accounts.xml > /path/to/mcabber.fifo

You can omit the > /path/to/mcabber.fifo part at first if you want to take a peek at which mcabber commands will be executed, and if you are happy with the results, run the full command to execute them.

The nice thing is that it's very easy to customize if you know a little Python. For example, I didn't want to migrate one account; adding this line just below the first for did the trick (the account is named Bad Account in the example):

if account.findtext(ns('name')).strip() == 'Bad Account': continue

Adding similar simple lines you can filter unwanted users, or groups, or whatever.
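
To make the filtering idea concrete, here is a sketch of the same logic as a Python 3 function with account and group filters. It is an illustration, not the script from this post: the SAMPLE document is a hypothetical, heavily simplified stand-in for a real Psi accounts.xml (invented element names such as a0 and c0, no XML namespace), and roster_commands, skip_accounts and skip_groups are names made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a Psi accounts.xml file
SAMPLE = """
<psi>
  <accounts>
    <a0>
      <name>Good Account</name>
      <roster-cache>
        <c0><name>Alice</name><jid>alice@example.org</jid>
            <groups><item>Friends</item></groups></c0>
        <c1><name>Bob</name><jid>bob@example.org</jid><groups/></c1>
      </roster-cache>
    </a0>
    <a1>
      <name>Bad Account</name>
      <roster-cache>
        <c0><name>Eve</name><jid>eve@example.org</jid><groups/></c0>
      </roster-cache>
    </a1>
  </accounts>
</psi>
"""


def roster_commands(xml_text, skip_accounts=(), skip_groups=()):
    """Yield mcabber commands for every contact that passes the filters."""
    accounts = ET.fromstring(xml_text)[0]
    for account in accounts:
        if (account.findtext("name") or "").strip() in skip_accounts:
            continue  # filter out whole accounts
        roster_cache = account.find("roster-cache")
        if roster_cache is None:
            continue
        for contact in roster_cache:
            name = (contact.findtext("name") or "").strip()
            jid = (contact.findtext("jid") or "").strip()
            group = (contact.findtext("groups/item") or "").strip()
            if group in skip_groups:
                continue  # filter out contacts in unwanted groups
            yield "/add {} {}".format(jid, name)
            yield "/roster search {}".format(jid)
            if group:
                yield "/move {}".format(group)


for command in roster_commands(SAMPLE, skip_accounts={"Bad Account"}):
    print(command)
```

Keeping the filters as parameters means the command generation can be checked on a small sample before anything is written to the FIFO.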

And all of this is thanks to:

Thank god for that!


Few people will be interested in this, but I think the ones who are will appreciate this link :) (in Spanish):