Luca's meaningless thoughts

Translation of e-mails using Mutt

by Leandro Lucarella on 2012-09-24 12:45 (updated on 2012-10-02 12:58)
tagged e-mail, en, floss, gmail, google translate, mutt, python, release, script, software, translate - with 0 comment(s)

Update

New translation script here; see the bottom of the post for a description of the changes.

I don't like to trust my important data to big companies like Google. That's why, even though I have a GMail account, I don't use it as my main one. I'm also a little old-fashioned about some things, and I like to use Mutt to check my e-mail.

But GMail has a very useful feature that's not available in Mutt: translation. At least it became very useful since I moved to a country whose language I don't understand very well yet.

But that's the good thing about free software and console programs: they are usually easy to hack to add whatever you're missing, so that's what I did.

The immediate solution in my mind was: download some program that uses Google Translate to translate stuff, and pipe messages through it using a macro. Simple, right? No. I couldn't find any script to do the translation, because the Google Translate API is now a paid service.

So I looked for alternatives, first for some translation program that worked locally, but at least in Ubuntu's repositories I couldn't find anything. Then for alternative online services, but nothing particularly useful either. I finally found a guy who, doing some Firebugging, figured out how to use the free Google Translate web service. Using that example, I put together a nice, general, 100-SLOC Python script that you can use to translate stuff by piping it through. Here is a trivial demonstration of the script (gt, short for Google Translate... Brilliant!):

$ echo hola mundo | gt
hello world
$ echo hallo Welt | gt --to fr
Bonjour tout le monde
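
In case you're curious about how such a script can talk to the free service, here is a minimal sketch of a gt-like stdin/stdout filter. This is not the actual script: the endpoint, query parameters and headers are assumptions about what the unofficial web service looked like at the time, and the real response needs more parsing than this.

#!/usr/bin/env python3
# Minimal sketch of a gt-like filter, NOT the real script. The URL and
# query parameters are assumptions about the unofficial Google
# Translate web endpoint circa 2012.
import sys
import urllib.parse
import urllib.request

def translate(text, src='auto', dst='en'):
    query = urllib.parse.urlencode(
            dict(client='t', sl=src, tl=dst, text=text))
    request = urllib.request.Request(
            'http://translate.google.com/translate_a/t?' + query,
            # The service rejects urllib's default user agent.
            headers={'User-Agent': 'Mozilla/5.0'})
    with urllib.request.urlopen(request) as response:
        # The raw reply is a JSON-ish blob; the real script still has
        # to extract the translated text from it.
        return response.read().decode('utf-8')

if __name__ == '__main__':
    print(translate(sys.stdin.read()))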

And here is the output of gt --help, to give a better impression of the script's capabilities:

usage: gt [-h] [--from LANG] [--to LANG] [--input-file FILE]
          [--output-file FILE] [--input-encoding ENC] [--output-encoding ENC]

Translate text using Google Translate.

optional arguments:
  -h, --help            show this help message and exit
  --from LANG, -f LANG  Translate from LANG language (e.g. en, de, es,
                        default: auto)
  --to LANG, -t LANG    Translate to LANG language (e.g. en, de, es, default:
                        en)
  --input-file FILE, -i FILE
                        Get text to translate from FILE instead of stdin
  --output-file FILE, -o FILE
                        Output translated text to FILE instead of stdout
  --input-encoding ENC, -I ENC
                        Use ENC character encoding to read the input (default:
                        get from locale)
  --output-encoding ENC, -O ENC
                        Use ENC character encoding to write the output
                        (default: get from locale)
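
Just to illustrate, combining a few of those options (the file names here are made up), you could translate a German text file to Spanish, reading it as ISO-8859-1, like this:

$ gt -f de -t es -I iso-8859-1 -i brief.txt -o carta.txt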

You can download the script here, but be warned, I only tested it with Python 3.2. It's almost certain that it won't work with Python < 3.0, and there is a chance it won't work with Python 3.1 either. Please report success or failure, and patches to make it work with older Python versions are always welcome.

Ideally you shouldn't abuse Google's service through this script; if you need to translate massive amounts of text every 50ms, just pay for the service. For me paying doesn't make any sense, because I'm not using the service any differently: before I had the script, I just copied & pasted the text to translate into the web page. Another drawback is that I couldn't find any way to make the script work over HTTPS, so you shouldn't translate sensitive data (you shouldn't do so using the web interface either, because AFAIK it travels as plain text too).

Anyway, the final step was just to connect Mutt with the script. The solution I found is not ideal, but it works most of the time. Just add these macros to your muttrc:

macro index,pager <Esc>t "v/plain\n|gt|less\n" "Translate the first plain text part to English"
macro attach <Esc>t "|gt|less\n" "Translate to English"

Now, using Esc t in the index or pager view, you'll see the first plain text part of the message translated from an auto-detected language to English, in the default encoding. In the attachments view, Esc t will pipe the current part instead. One thing I don't know how to do (or whether it's even possible) is to get the encoding of the part being piped, to pass it on to gt. For now I have to build the pipe manually for parts that are not in UTF-8, calling gt with the right encoding options. The results are piped through less for convenience. Of course, you can write your own macros to translate to a language other than English or to use a different default encoding. For example, to translate to Spanish using the ISO-8859-1 encoding, just replace the macro with this one:

macro index,pager <Esc>t "v/plain\n|gt -tes -Iiso-8859-1|less\n" "Translate the first plain text part to Spanish"

Well, that's it! I hope it's as useful to you as it's being to me ;-)

Update

Since picking the right encoding for each e-mail started to be a real PITA, I decided to improve the script to auto-detect the encoding or, to be more specific, to try several popular encodings.

So, here is the help message for the new version of the script:

usage: gt [-h] [--from LANG] [--to LANG] [--input-file FILE]
          [--output-file FILE] [--input-encoding ENC] [--output-encoding ENC]

Translate text using Google Translate.

optional arguments:
  -h, --help            show this help message and exit
  --from LANG, -f LANG  Translate from LANG language (e.g. en, de, es,
                        default: auto)
  --to LANG, -t LANG    Translate to LANG language (e.g. en, de, es, default:
                        en)
  --input-file FILE, -i FILE
                        Get text to translate from FILE instead of stdin
  --output-file FILE, -o FILE
                        Output translated text to FILE instead of stdout
  --input-encoding ENC, -I ENC
                        Use ENC character encoding to read the input, can be a
                        comma separated list of encodings to try, LOCALE being
                        a special value for the user's locale-specified
                        preferred encoding (default: LOCALE,utf-8,iso-8859-15)
  --output-encoding ENC, -O ENC
                        Use ENC character encoding to write the output
                        (default: LOCALE)

So now your locale's encoding, utf-8 and iso-8859-15 are tried by default (in that order). These are the defaults that make the most sense to me; you can change them to the ones that make sense to you by editing the script or by using the -I option in your macro definition, for example:

macro index,pager <Esc>t "v/plain\n|gt -IMS-GREEK,IBM-1148,UTF-16BE|less\n"

Weird choice of defaults indeed :P
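
In case you wonder, the auto-detection is nothing fancy; the idea is roughly the following (a minimal sketch, not the actual script's code):

import locale

def decode_with_fallback(data, encodings=('LOCALE', 'utf-8', 'iso-8859-15')):
    # Try each encoding in turn and return the first successful
    # decode. 'LOCALE' stands for the user's locale-preferred
    # encoding, mirroring the script's special value.
    error = None
    for enc in encodings:
        if enc == 'LOCALE':
            enc = locale.getpreferredencoding()
        try:
            return data.decode(enc)
        except (UnicodeDecodeError, LookupError) as e:
            error = e
    raise error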

The day CouchSurfing died

by Leandro Lucarella on 2012-09-13 16:08 (updated on 2012-09-14 13:54)
tagged bewelcome, couchsurfing, cs, en, freedom, personal information, privacy - with 0 comment(s)

Some time ago CouchSurfing announced that they would become a socially responsible B-Corporation. In case you forgot about it, here is what the creators of this utopian project said then:

I believed them, and when everybody went bananas about it I thought there was some extremism and overreaction. After all, you need money to maintain the infrastructure for a service like CS, and if this change helped with that, it was fine with me, as long as the spirit of the community stayed the same.

Unfortunately, now I think all those people were right. A few days ago CS announced a change in its terms of use and privacy policy. The new ones include terms as stupid and abusive as:

5.3 Member Content License. If you post Member Content to our Services, you hereby grant us a perpetual, worldwide, irrevocable, non-exclusive, royalty-free and fully sublicensable license to use, reproduce, display, perform, adapt, modify, create derivative works from, distribute, have distributed and promote such Member Content in any form, in all media now known or hereinafter created and for any purpose, including without limitation the right to use your name, likeness, voice or identity.

They also removed all mention of being a socially responsible B-Corp, so where is this heading? I don't have a Facebook account because I appreciate my privacy and feel like FB never cared about it (among other things), but these CS terms make FB look like the EFF!

Here is one of the many related discussions about the issue inside CS forums:

http://www.couchsurfing.org/group_read.html?gid=7621&post=13000298

People are even filing complaints with their respective countries' data protection and privacy agencies. But I see little sense in that, at least from the point of view of staying part of CS. Even if they fix the terms now to be more reasonable, I don't want to be part of it any more.

So, from tomorrow, all the content you leave in CS will be theirs forever, irrevocably. Unfortunately there is no way to opt out, or to keep the old content under the old terms, so they leave me no option but to remove all the content I don't want to give them perpetual, irrevocable, sublicensable, etc. rights to. And that's what I'm doing right now.

I think for now I will keep my account open (with fake information about me, even though that violates the terms and conditions, which say I should provide truthful information) because there is still a great community behind it and I don't want to lose all contact with it. But my plan is to start using BeWelcome.org instead, hoping they don't eventually follow the same path CS did (their website is open source, so at least there is a chance to clone the service if they do).

So, thanks for all the good times CS, and RIP!

Update

It looks like CS needs more time to assess the situation: the terms and conditions page now says the new terms will be applicable starting on the 21st of September instead of the 14th.

Also, the issue hit the media, at least in Germany (Google Translate is your friend).

Anyway, as I said before, even if they fix the terms, it's game over for me: all trust in CS being a community rather than just another company trying to do data mining is gone.