Luca's meaningless thoughts

CDGC done

by Leandro Lucarella on 2010-09-28 15:16 (updated on 2010-09-28 15:16)
tagged cdgc, d, dgc, done, en, gc - with 0 comment(s)

I'm sorry about the quick and uninformative post, but I've been without Internet for almost 2 weeks and I have to finish the first complete draft of my thesis in a little over a week, so I don't have much time to write here.

The thing is, to avoid the nasty effect of memory usage being too high for certain programs when using eager allocation, I've made the GC minimize the heap more often. Some tests are still a little slower with CDGC, but only those that stress the GC without doing any actual work, so I think it's OK: in those cases the extra overhead of being concurrent is bigger than the gain (which is nonexistent, because there is nothing to do in parallel with the collector).
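Here is a minimal sketch of what minimizing the heap means, written in Python for illustration (CDGC itself is written in D). It assumes a simplified model where the heap is a list of fixed-size pools, each described only by how many live bytes it holds after a collection; `minimize` and its `keep_free` parameter are hypothetical, not CDGC's API. Pools with no live data can be given back to the OS, while keeping a spare free pool avoids unmapping memory that might be needed again right away.

```python
def minimize(pools_live, keep_free=1):
    """Return the pools that survive heap minimization.

    pools_live: live bytes per pool after a collection (hypothetical model).
    keep_free: how many completely free pools to keep around as spare.
    """
    kept = []
    spare = 0
    for live in pools_live:
        if live > 0:
            kept.append(live)   # pool still holds live objects: it must stay
        elif spare < keep_free:
            kept.append(live)   # keep a free pool as spare capacity
            spare += 1
        # otherwise the free pool is released (munmap() in a real GC)
    return kept


print(minimize([0, 4096, 0, 8, 0]))               # [0, 4096, 8]
print(minimize([0, 4096, 0, 8, 0], keep_free=0))  # [4096, 8]
```

Running something like this after every collection, rather than only now and then, is the kind of change that keeps the extra pools created by eager allocation from piling up.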

Finally, I've implemented early collection, which didn't prove very useful, and tried to keep a better heap occupancy factor with the new min_free option, without much success either (it looks like the real winner was eager allocation).
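The min_free idea boils down to simple arithmetic. Assuming (this is my reading, not a documented definition) that min_free is the minimum percentage of the heap that should be free after a collection, the heap needs at least live / (1 - min_free/100) bytes, rounded up to whole pools. The function name and the pool-based model are hypothetical.

```python
def pools_needed(live_bytes, pool_size, min_free_pct):
    """Smallest number of pools leaving at least min_free_pct% of the heap free.

    free / total >= p  <=>  total >= live / (1 - p), with p = min_free_pct / 100.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    if not 0 <= min_free_pct < 100:
        raise ValueError("min_free_pct must be in [0, 100)")
    numerator = live_bytes * 100                  # live * 100
    denominator = (100 - min_free_pct) * pool_size  # (100 - p) * pool size
    return max(1, -(-numerator // denominator))   # ceiling division


print(pools_needed(30, 4, 25))  # 10 (40 units: 30 live + 10 free = 25% free)
print(pools_needed(30, 4, 0))   # 8 (just enough room for the live data)
```

If the heap has fewer pools than this after a collection, the collector would grow it, so that the next collection is not triggered too soon by a too-tight heap.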

I'm sorry I don't have time to show you some graphs this time. Of course the work is not really finished; there are still plenty of things to be done, but I think the GC has come to a point where it can be really useful, and I have to finish my thesis :)

After I'm done, I hope I can work on integrating the GC into Tango and/or Druntime (where Sean Kelly has already done a first approach).

Telecom, you SON OF A BITCH

by Leandro Lucarella on 2010-09-27 23:10 (updated on 2010-09-27 23:10)
tagged en, es, rant, self, telecom - with 0 comment(s)

Note

English version below.

Whoa, maybe I went a little too far with the title... Or maybe not.

I've been without a phone (and therefore without Internet, since I have ADSL) for 13 days. Actually, I've been having problems with the phone since June: on rainy days the line got noisy until it went dead (oddly, the phone went dead but the Internet didn't), and then it came back on its own.

The fantastic company, so beloved and known by all, called Telecom Argentina, works so well that you call 114 (well, some acquaintance calls, because you have no phone), an answering machine picks up and tells you your complaint was registered. Nobody mentions timeframes; nobody mentions anything. Don't even bother looking for a physical customer service office, because at least the bill doesn't list any; it's a ghost company.

I called several times about the rain-related outages, but nothing: they just threaten to charge you for the visit if the problem is inside your house, so they tell you to check carefully that everything is fine where the cable enters your house. The problem is when they never told you where they put Telecom's magic little box that separates your kingdom from theirs. Having a very tall tiled roof doesn't help, because that box is usually up on the roof.

After trying everything, I called and asked for a technician, threats be damned. But no, of course, they can't schedule an appointment for a given day and time... Well, nobody gives you a time, but they don't even give you a day. The way it works is: they send someone whenever they feel like it; if you're home, fine, and if not, only then do they see about arranging a day. Super efficient! And it's not like they take just one day to show up, so missing a visit is no small thing.

So one day they showed up around 8 in the morning, and I was more asleep than someone who isn't quite as asleep as I was, which was pretty darn asleep, and I didn't hear the doorbell. But they were kind enough to call my cell phone until I woke up, at which point, after grumbling a bit at them for being jerks and showing up whenever they please, I told them I'd be right out, to give me 10 minutes to get up. Ahhhh, Telecom's kindness: they rang the doorbell impatiently a couple more times while I was getting dressed, and when I went out they were already gone. I called Telecom to see what was going on, and obviously nobody knew anything; it's the company with the worst internal communication in the world, or with the worst customer service in the world (I lean toward the latter). Also, nobody tells you anything about timeframes either; I explicitly asked whether it could take a day, a week or a month, and the shameless guy said "yes". Anything is possible in the wonderful world of Telecom.

Finally they called me to schedule the visit, but it had rained the day before and nothing had happened, so I canceled it (because the last thing I needed was for them to come, not find the problem, and try to charge me, saying it was my problem).

Anyway, the days went by, September 14 came, and the phone went dead at 16:00, never to come back, so I called again. On Friday the 17th, with the speed of a sedated slug, the technician arrived (again without notice; he caught me by pure luck just before I left for work) and wanted to come into the house, but I warned him there was nothing to see inside, since Telecom never even told me where they had put the box. So, totally lost, he went to the front door to check an external box, with his little 1.5-meter ladder that barely let him reach that box (he couldn't get anywhere near my roof, which is about 5 or 6 meters high). And there was a dial tone there. With a worried look, he told me he had to ask other people to fix it, to which, with a resigned look, I asked "and when will they come?", and he answered, firmly following Telecom policy, "no idea, you know how it is now with the rain, all these things come up and there's not much staff", at which point I begged him, with a desperate look, to at least give me a minimum or a maximum, and he said at least a week... Brilliant!

I called Telecom again, this time the sales department, because at repairs an answering machine says "your line is already being repaired" and hangs up on you, and they told me they didn't know anything and that I should call repairs. Absolute PHE-NOM-E-NA.

Over the weekend, looking out the window by pure chance, I noticed a cable coming out of the back wall, running all along the back dividing wall to my neighbor's garden, hanging a long way until it climbed up to her roof, but with a lovely splice right in the middle, without even a bit of tape. Just like that, tough-guy style, out in the open.

My eyes lit up a little, but the outrage nearly killed me. Long story short, I borrowed a long ladder from my parents, asked my neighbor for permission, and this afternoon I could finally get my hands on it, and voilà! I have phone/Internet after 13 days of outage, thanks to my idle habit of looking out the window, my neighbor who was really cool about it, and my parents who were super cool and brought me the ladder. NO thanks to Telecom. Now I'll have to keep fighting so that, first, they discount the days without service; second, they pay me the penalty of twice the proportional monthly fee for each day without service, for not fixing it within 3 days; and finally, they come and do a proper installation, not that mess of hanging cables and splices they made (there are 2 more identical cables hanging in my neighbor's garden).

Anyway, you've been warned: if you have problems with Telecom, start looking for a way to fix them yourselves...

English

I'm sorry, but I'll only write a short version in English. I was without phone and Internet for 13 days (and had been having problems whenever it rained since June) because the damn Telecom Argentina did my phone installation by running a spliced wire, hanging in my neighbor's backyard, that eventually rusted and finally broke.

They just don't respond and won't do anything; I had to fix it myself. Now I have to keep trying to get them to come and do a proper installation, since what I did is a patch over a patch.

El hombre de al lado

by Leandro Lucarella on 2010-09-14 00:06 (updated on 2010-09-14 00:06)
tagged alberto laiseca, daniel aráoz, el hombre de al lado, es, movie, sergio pángaro - with 0 comment(s)

Yesterday I went to see El hombre de al lado, and it's honestly highly recommended. It's a black comedy with a very simple plot (a conflict between two neighbors because one of them opens a window in the shared dividing wall) that highlights the quality of the characters. Daniel Aráoz, from Córdoba, is truly flawless in his role; that alone makes it worth it.

I won't say much more about it; if you want even more spoilers you can search Google or watch the official trailer, although I'm not sure I'd recommend it, because to me the trailer is not very representative of the movie (they sell it as a sort of action/horror movie, and it's nothing like that).

As a curiosity, the original music is courtesy of Sergio Pángaro (Baccarat, San Martín Vampire, Las Mil Copas), who also has a small cameo. Here is what seems to be a deleted scene from the movie (or behind-the-scenes footage) where he sings for a while. You can watch it without fear; it's spoiler-free.

Another gem is that the credits thank Alberto Laiseca (whose unmistakable mustache you'll be able to spot in the video above if you look carefully). I've never read anything by him, but maybe I should (even though I don't usually read much), because I used to enjoy his short Cuentos de Terror (Horror Stories) segments that aired on I.Sat (and that you can relive thanks to YouTube).

64 bits support for mutest

by Leandro Lucarella on 2010-09-13 19:51 (updated on 2010-09-13 19:51)
tagged 64 bits, en, mkmutest, mutest, x86_64 - with 0 comment(s)

To all the millions of mutest users who were losing their minds trying to figure out why mkmutest was choking on 64-bit OSs: your days of suffering are over, since I added support for 64-bit OSs (Linux).

Yeah, a great day of pure joy for all of humanity!

Truly concurrent GC using eager allocation

by Leandro Lucarella on 2010-09-10 03:01 (updated on 2010-09-10 03:01)
tagged cdgc, concurrent, d, dgc, eager allocation, en, fork - with 0 comment(s)

Finally, I got the first version of CDGC with truly concurrent garbage collection, in the sense that all the threads of the mutator (the program itself) can run in parallel with the collector (well, only the mark phase to be honest :).

You might want to read a previous post about CDGC where I achieved some sort of concurrency by making only the stop-the-world time very short, but the thread that triggered the collection (and any other thread needing any GC service) had to wait until the collection finished. The thread that triggered the collection had to wait for the collection to finish to fulfill its memory allocation request (the collection was triggered because memory was exhausted), while any other thread needing any GC service had to acquire the global GC lock (damn global GC lock!).
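The forking idea can be modelled in a few lines of Python, as a rough sketch only: CDGC is written in D and forks the whole process, while here the "heap" is a hypothetical dict mapping object ids to the ids they reference, and all function names are made up. fork() gives the child a copy-on-write snapshot of the heap; the child runs the mark phase and sends the marked ids back through a pipe, while the parent (the mutator) keeps running. Objects allocated after the fork are not in the snapshot, so the sweep only considers objects that existed at fork time. POSIX-only (os.fork).

```python
import json
import os


def mark(heap, roots):
    """Mark phase: ids of every object reachable from the roots."""
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj not in marked:
            marked.add(obj)
            stack.extend(heap.get(obj, ()))
    return marked


def start_collection(heap, roots):
    """Fork a child that marks a copy-on-write snapshot of the heap."""
    snapshot_ids = set(heap)           # objects that exist at fork() time
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:                       # child: sees the heap as of fork()
        try:
            os.close(read_fd)
            with os.fdopen(write_fd, "w") as out:
                json.dump(sorted(mark(heap, roots)), out)
        finally:
            os._exit(0)                # never return into the parent's code
    os.close(write_fd)
    return pid, read_fd, snapshot_ids


def finish_collection(heap, pid, read_fd, snapshot_ids):
    """Read the child's mark results and sweep unmarked snapshot objects."""
    with os.fdopen(read_fd) as inp:
        marked = set(json.load(inp))
    os.waitpid(pid, 0)
    garbage = snapshot_ids - marked    # objects allocated after fork() survive
    for obj in garbage:
        del heap[obj]
    return garbage


heap = {1: [2], 2: [3], 3: [], 4: []}  # object 4 is unreachable
pid, fd, snap = start_collection(heap, roots=[1])
heap[5] = []                           # the mutator keeps allocating...
heap[1].append(5)                      # ...and mutating while the child marks
print(sorted(finish_collection(heap, pid, fd, snap)))  # [4]
print(sorted(heap))                                    # [1, 2, 3, 5]
```

The snapshot is what makes this safe without a write barrier: an object that was unreachable at fork() time can never become reachable again, so anything the child did not mark can be freed, while objects allocated after the fork are simply kept until the next collection.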

To avoid this issue, I took a simple approach that I call eager allocation, which consists of spawning the mark phase concurrently while also allocating a new memory pool, so the memory request can be fulfilled instantly. By doing so, not only can the thread that triggered the collection keep going without waiting for the collection to finish, but the global GC lock is also released, so any other thread can use any GC service, and even allocate more memory, since a new pool was allocated.

If memory is exhausted again before the collection finishes, a new pool is allocated, so everything can keep running. The obvious (bad) consequence of this is potential memory bloat. Since memory usage is minimized from time to time, this effect should not be too harmful, though. But let's see the results; there are plenty of things to analyze in them (many not even related to concurrency).
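The allocation policy described above can be sketched with a toy model, assuming a heap made of equally sized pools and sizes counted in arbitrary units; the class and method names are hypothetical, not CDGC's. When a request doesn't fit, a collection is started (if none is running), but the allocator doesn't wait for it: it adds pools until the request fits, which is exactly where the potential memory bloat comes from.

```python
class EagerHeap:
    """Toy model of eager allocation (not CDGC's real data structures)."""

    def __init__(self, pool_size):
        self.pool_size = pool_size
        self.pools = 1
        self.used = 0
        self.collecting = False   # is a concurrent mark phase running?
        self.collections = 0

    @property
    def capacity(self):
        return self.pools * self.pool_size

    def alloc(self, size):
        if self.used + size > self.capacity:
            if not self.collecting:
                # Start the (forked) mark phase, but don't wait for it.
                self.collecting = True
                self.collections += 1
            # Eager allocation: grow the heap so the request is served now.
            while self.used + size > self.capacity:
                self.pools += 1
        self.used += size

    def finish_collection(self, freed):
        """Called when the concurrent collection ends and frees memory."""
        self.used -= freed
        self.collecting = False


heap = EagerHeap(pool_size=100)
for _ in range(25):
    heap.alloc(10)                   # 250 units requested, one collection running
print(heap.pools, heap.collections)  # 3 1
heap.finish_collection(freed=200)
print(heap.used, heap.capacity)      # 50 300
```

In this contrived run the heap ends up 6 times bigger than the live data, which is why minimizing the heap from time to time matters so much with this approach.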

First, a couple of comments about the plots:

  • Times of Dil are multiplied by a factor of 0.1 in all the plots, times of rnddata are too, but only in the pause time and stop-the-world plots. This is only to make the plots more readable.
  • The unreadable labels rotated 45 degrees say: stw, fork and ea. Those stand for Stop-the-world (the basic collector), fork only (concurrent but without eager allocation) and eager allocation respectively. You can click on the images to see a little more readable SVG version.
  • The plots are for one CPU only, because using more CPUs doesn't change much (for these plots).
  • The times were taken from a single run, unlike the total run time plots I usually post. Since a single run has multiple collections, the information about min, max, average and standard deviation still applies to the single run.
  • Stop-the-world time is the time during which no mutator thread can run. This is not related to the global GC lock; it is the time the threads are really, really paused (this is necessary even for the forking GC, to take a snapshot of the threads' CPU registers and stacks). So the time no mutator thread can do any useful work might be much longer than this, because of the GC lock. That time is what I call pause time. The maximum pause time is probably the most important variable for a GC that tries to minimize pauses, like this one: it is the maximum time a program will stay totally unresponsive (important for a server, a GUI application, a game or any interactive application).
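To make the difference between stop-the-world time and pause time concrete, here is a deliberately simplified model (my assumption, for illustration only, not measured CDGC behaviour) of one collection in each configuration, given the durations of fork(), the mark phase and the sweep phase. It assumes the sweep runs in a mutator thread holding the global GC lock, and the function name is hypothetical.

```python
def collection_times(config, fork_time, mark_time, sweep_time):
    """Return (stop_the_world, worst_pause) for one collection.

    Simplified model:
    - "stw":  threads are stopped while marking; the triggering thread then
              sweeps holding the GC lock, so it waits for mark + sweep.
    - "fork": the world is stopped only for fork(), but the triggering thread
              still waits for the child's mark phase and the sweep.
    - "ea":   eager allocation lets the triggering thread continue right after
              fork(); the worst pause is the longer of fork() and a sweep that
              holds the global GC lock.
    """
    if config == "stw":
        return mark_time, mark_time + sweep_time
    if config == "fork":
        return fork_time, fork_time + mark_time + sweep_time
    if config == "ea":
        return fork_time, max(fork_time, sweep_time)
    raise ValueError(f"unknown configuration: {config!r}")


for config in ("stw", "fork", "ea"):
    print(config, collection_times(config, fork_time=1, mark_time=10, sweep_time=5))
# stw (10, 15)
# fork (1, 16)
# ea (1, 5)
```

In this model the ea pause grows with the sweep: with a long sweep, for example on a heap bloated by eager allocation, `collection_times("ea", 1, 10, 50)` gives `(1, 50)`.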
Stop-the-world time for 1 CPU

The stop-the-world time is reduced so much that you can hardly see the times of the fork and ea configurations. It's reduced in all tests by a big margin, except for mcore and bigarr. For the former it even increased a little; for the latter it was reduced, but only very slightly (and only for the ea* configuration, so it might be a bad measurement). What is really being measured here is the Linux fork() time: when the program manages so little data that the mark phase itself is faster than a fork(), this is what happens. The good news is that the pause times are small enough in those cases, so no harm is done (except for adding a little more total run time to the program).

Note Dil's maximum stop-the-world time: it's 0.2 seconds, which looks pretty big, huh? Well, now remember that this time was multiplied by 0.1, so the real maximum stop-the-world time for Dil is 2 seconds, and remember this is the minimum amount of time the program is unresponsive! Thank god it's not an interactive application :)

Time to take a look at the real pause time:

Pause time for 1 CPU

OK, this is a little more confusing... The only strong pattern is that the pause time doesn't change (much) between the stw and fork configurations. This seems to make sense, as both configurations must wait for the whole collection to finish (I really don't know what's happening with the bh test).

For most tests (7), the pause time is much smaller for the ea configuration; 3 tests have much bigger times for it, one is bigger but similar (again mcore), and then there is the weird case of bh. The 7 tests where the time is reduced are the ones that make sense; that's what I was looking for. So let's see what's happening with the remaining 3, and for that, let's take a look at the amount of memory the programs use, to see if the memory bloat from allocating extra pools is significant.

Maximum heap size (MB)
Program    stw    ea  ea/stw
dil        216   250    1.16
rnddata    181   181       1
voronoi     16    30    1.88
tree         7   114    16.3
bh          80    80       1
mcore       30    38    1.27
bisort      30    30       1
bigarr      11   223    20.3
em3d        63    63       1
sbtree      11   122    11.1
tsp         63    63       1
split       39    39       1

See any relation between the plot and the table? I do. It looks like some programs are not able to minimize their memory usage, and because of that, the sweep phase (which still has to run in a mutator thread, holding the global GC lock) is taking ages. An easy approach to try is to trigger memory minimization not only when big objects are allocated (as it is now), but that could lead to more mmap()/munmap() calls than necessary. And there are still problems with pools that are kept alive because a very small object in them is still alive, which this does not solve.
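The last problem is easy to see with another toy model (hypothetical names, 1 MiB pools assumed): minimization can only release pools with no live data at all, so a handful of tiny surviving objects spread over many pools keeps all of them mapped, even though the heap is almost empty.

```python
POOL_SIZE = 1 << 20  # assumed pool size: 1 MiB


def releasable(pools_live):
    """Indexes of pools that minimization could return to the OS."""
    return [i for i, live in enumerate(pools_live) if live == 0]


def occupancy(pools_live):
    """Fraction of the mapped heap that holds live data."""
    return sum(pools_live) / (len(pools_live) * POOL_SIZE)


# After a collection: three pools pinned by a single 16-byte object each.
pools = [16, 16, 0, 16]
print(releasable(pools))          # [2]
print(f"{occupancy(pools):.6f}")  # 0.000011
```

Only pool 2 can be unmapped; the other three stay mapped for a total of 48 live bytes, and minimizing more often doesn't change that.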

So I think a longer-term solution would be to also introduce what I call early collection, meaning triggering a collection before memory is exhausted. That would be the next step for CDGC.
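Early collection only changes when a collection is started. Here is a minimal sketch, with a hypothetical `early_pct` threshold (the percentage of the heap in use that triggers a collection ahead of time) and assuming at most one concurrent collection runs at a time.

```python
def collection_trigger(used, request, capacity, collecting, early_pct=None):
    """Decide whether an allocation should start a collection.

    Returns "exhausted" (the classic trigger: the request doesn't fit),
    "early" (heap usage crossed the hypothetical early_pct threshold) or None.
    A running collection is left alone; with eager allocation the allocator
    would grow the heap instead of starting a second one.
    """
    if collecting:
        return None
    after = used + request
    if after > capacity:
        return "exhausted"
    if early_pct is not None and after * 100 >= capacity * early_pct:
        return "early"
    return None


print(collection_trigger(70, 10, 100, False, early_pct=75))  # early
print(collection_trigger(70, 10, 100, False))                # None
print(collection_trigger(95, 10, 100, False, early_pct=75))  # exhausted
```

Combined with eager allocation, the hope is that a collection started early finishes its mark phase before new pools have to be allocated at all.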

Finally, let's take a look at the total run time of the test programs using the basic GC and CDGC with concurrent marking and eager allocation. This time, let's see what happens with 2 CPUs (and 25 runs):

Total run time for 2 CPUs (25 runs)

Wow! It looks like this is getting really juicy (with exceptions, as usual :)! Dil's time is reduced to about 1/3, and voronoi's to 1/10!!! Split and mcore both have their times considerably reduced, but that's because of another small optimization (unrelated to what we are seeing today), so forget about those two. Same for rnddata, which is reduced because of precise heap scanning. But other tests increased their run time; most notably, bigarr takes almost twice as long. Looking at the maximum heap size table, one can find some answers for this too. Another ugly side of eager allocation.

For completeness, let's see what happens with the number of collections triggered during the program's life. Here is the previous table with this new data added:

               Max heap (MB)           Collections
Program    stw    ea  ea/stw     stw    ea  ea/stw
dil        216   250    1.16      62    50    0.81
rnddata    181   181       1      28    28       1
voronoi     16    30    1.88      79    14    0.18
tree         7   114    16.3     204    32    0.16
bh          80    80       1      27    27       1
mcore       30    38    1.27      18    14    0.78
bisort      30    30       1      10    10       1
bigarr      11   223    20.3     305    40    0.13
em3d        63    63       1      14    14       1
sbtree      11   122    11.1     110    33     0.3
tsp         63    63       1      14    14       1
split       39    39       1       7     7       1

See how the number of collections is reduced roughly in proportion to the increase in heap size. When the heap size explodes, even though the number of collections is greatly reduced, the sweep time takes over and the total run time increases, especially in those tests where the program does almost nothing but use the GC (as in sbtree and bigarr). That's why I like Dil and voronoi the most as key tests: they do quite a lot of real work besides asking for memory or using other GC services.
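To check that relation with the numbers from the table above, one can multiply each test's heap growth (ea/stw) by its collection ratio (ea/stw): a product near 1 means the collections dropped in proportion to the heap growth. The values below are copied from the table; the function name is just for illustration.

```python
# Values copied from the table above:
# (max heap stw MB, max heap ea MB, collections stw, collections ea)
RESULTS = {
    "dil": (216, 250, 62, 50),
    "rnddata": (181, 181, 28, 28),
    "voronoi": (16, 30, 79, 14),
    "tree": (7, 114, 204, 32),
    "bh": (80, 80, 27, 27),
    "mcore": (30, 38, 18, 14),
    "bisort": (30, 30, 10, 10),
    "bigarr": (11, 223, 305, 40),
    "em3d": (63, 63, 14, 14),
    "sbtree": (11, 122, 110, 33),
    "tsp": (63, 63, 14, 14),
    "split": (39, 39, 7, 7),
}


def growth_vs_collections(results):
    """Per test: heap growth, collection ratio and their product (ea vs stw)."""
    out = {}
    for name, (heap_stw, heap_ea, coll_stw, coll_ea) in results.items():
        growth = heap_ea / heap_stw
        ratio = coll_ea / coll_stw
        out[name] = (growth, ratio, growth * ratio)
    return out


for name, (growth, ratio, product) in growth_vs_collections(RESULTS).items():
    print(f"{name:8} heap x{growth:5.2f}  collections x{ratio:4.2f}  product {product:4.2f}")
```

For dil and mcore the product is close to 1 (about 0.93 and 0.99); for voronoi it is about 0.33 (collections dropped even more than the heap grew), while for tree, bigarr and sbtree it is between roughly 2.6 and 3.3 (the heap grew more than collections dropped).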

This confirms that the performance gain is not strictly related to the added concurrency, but comes from a nice (finally! :) side effect of eager allocation: removing some pressure from the GC by increasing the heap size a little (Dil gets a 3x boost in run time for as little as 1.16x the memory usage; voronoi gets 10x at the expense of almost doubling the heap; I think both are good trade-offs). This shows another weak point of the GC: sometimes the heap is way too tight, triggering a lot of collections, which leads to a lot of GC run time overhead. Nothing is done right now to keep a good heap occupancy ratio.

But is there any real speed (in total run time terms) improvement because of the added concurrency? Let's see the run time for 1 CPU:

Total run time for 1 CPU (25 runs)

It looks like there is, especially for my two favourite tests: both Dil and voronoi get a 30% speed boost! That's not bad, not bad at all...

If you want to try it, the repository has been updated with these latest changes :). If you do, please let me know how it went.

Scola

by Leandro Lucarella on 2010-09-07 20:31 (updated on 2010-09-10 12:15)
tagged 2010, argentina, basket, deporte, es, luis scola, mundial, scola, turquía - with 0 comment(s)

What Luis Scola is doing is brutal, beastly, monstrous...

Unstoppable, out of this world!

Update

Oops! I jinxed him... Sorry, Luisito!

The Wilderness Downtown

by Leandro Lucarella on 2010-09-01 01:40 (updated on 2010-09-01 01:40)
tagged arcade fire, chris milk, en, html5, music, the wilderness downtown, video, we used to wait - with 0 comment(s)

The Wilderness Downtown is a new interactive film by Chris Milk (made in HTML5 as a Google Chrome Experiment), and the new video for the song We Used To Wait by Arcade Fire. Judging from this and the Unstaged show, it looks like they are willing to play with new technologies.

I like the video, for some reason it reminds me of House Of Cards (maybe because Google was involved too). The downside is, it only works on Chrome / Chromium.