Tuesday, June 23, 2009

EuroPython

EuroPython is coming. We have two 30-minute talks that we will present. In addition, the sprint takes place on the 29th of June (there will be no-one from the team on the 28th of June), as well as on the 3rd and 4th of July.

JIT progress

In the last few days I finally understood how to do virtualizables. Now the frame overhead is gone. This was done with the help of discussions with Samuele, porting ideas from PyPy's first JIT attempt.

This is of course work in progress, but it works in PyPy (modulo a few XXXs, but no bugs so far). The performance of the resulting code is quite good: even with Boehm (the GC that is easy to compile to but gives a slowish pypy-c), a long-running loop typically runs 50% faster than CPython. That's "baseline" speed, moreover: we will get better speed-ups by applying optimizations on the generated code. Doing so is in progress, but it suddenly became easier because that optimization phase no longer has to consider virtualizables -- they are now handled earlier.

Update: Virtualizables are basically a way to avoid frame overhead. The frame object is allocated and has a pointer, but the JIT is free to unpack its fields (for example Python-level locals) and store them somewhere else (stack or registers). Each external (out of JIT) access to a frame managed by the JIT needs to go via special accessors that can ask the JIT where those variables are.
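
To make the idea a bit more concrete, here is a toy sketch in plain Python. It is not PyPy's actual code: ToyJIT, ToyFrame, jit_enter, jit_leave, get_local and set_local are all hypothetical names invented for this illustration, and a dictionary stands in for the stack slots or registers that a real JIT would use.

```python
class ToyJIT:
    """Hypothetical stand-in for the JIT; it holds the unpacked locals."""

    def __init__(self):
        # Pretend this dict is a set of machine registers or stack slots.
        self.registers = {}


class ToyFrame:
    """Hypothetical frame whose locals can be taken over by a ToyJIT."""

    def __init__(self, local_vars):
        self._locals = dict(local_vars)
        self._jit = None  # set while the JIT manages this frame

    def jit_enter(self, jit):
        # The JIT unpacks the fields and keeps them outside the frame.
        jit.registers = self._locals
        self._locals = None
        self._jit = jit

    def jit_leave(self):
        # The JIT writes the current values back into the frame.
        self._locals = self._jit.registers
        self._jit.registers = {}
        self._jit = None

    # Special accessors: every access from outside the JIT goes through
    # these, so it never matters where the values currently live.
    def get_local(self, name):
        if self._jit is not None:
            return self._jit.registers[name]
        return self._locals[name]

    def set_local(self, name, value):
        if self._jit is not None:
            self._jit.registers[name] = value
        else:
            self._locals[name] = value


frame = ToyFrame({"i": 0})
jit = ToyJIT()
frame.jit_enter(jit)
jit.registers["i"] += 10          # JIT-compiled code works on its own copy
seen_inside = frame.get_local("i")  # outside access asks the JIT
frame.jit_leave()
seen_after = frame.get_local("i")   # value is back in the frame
```

The point of the sketch is only the shape of the contract: code running inside the JIT touches its own storage directly, while everyone else goes through the accessors and gets a consistent answer either way.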

Tuesday, June 16, 2009

News from the jit front

As usual, progress is going slower than predicted, but nevertheless, we're working hard to make some progress.

We recently managed to make our nice GCs cooperate with our JIT. This is one point from our detailed plan. As of now, we have a JIT with GCs and no optimizations. It already speeds up some things, while slowing down others. The main reason for this is that the JIT generates assembler which is kind of ok, but it does not do the same level of optimizations gcc would do.

So the current status of the JIT is that it can produce assembler out of executed python code (or any interpreter written in RPython actually), but the results are not high quality enough since we're missing optimizations.

The current plan looks as follows:

  • Improve the handling of GCs in the JIT by inlining malloc fast paths; that should speed things up by a constant, not too big factor.
  • Write a simplified Python interpreter, which will be a base for experiments and a way to make sure that our JIT does correct things with regard to optimizations. That would work as a mid-level integration test.
  • Think about ways to inline loop-less python functions into their parent's loop.
  • Get rid of frame overhead (by virtualizables)
  • Measure, write benchmarks, publish
  • Profit
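
For readers wondering what a "malloc fast path" looks like, here is a toy sketch in plain Python of a generic bump-pointer allocator. This is an assumption about the general technique, not a description of PyPy's GC code: ToyNursery, malloc and _malloc_slow are hypothetical names, addresses are plain integers, and the "collection" in the slow path just resets the pointer.

```python
class ToyNursery:
    """Hypothetical bump-pointer allocation area of a fixed size."""

    def __init__(self, size):
        self.size = size
        self.free = 0          # next free offset
        self.collections = 0   # how many times the slow path ran

    def malloc(self, nbytes):
        # Fast path: one addition and one comparison. This is the part
        # that would be worth inlining into JIT-generated code.
        result = self.free
        new_free = result + nbytes
        if new_free <= self.size:
            self.free = new_free
            return result
        # Slow path: kept out of line because it is rare and large.
        return self._malloc_slow(nbytes)

    def _malloc_slow(self, nbytes):
        if nbytes > self.size:
            raise MemoryError("object does not fit in the toy nursery")
        # Pretend a collection emptied the nursery, then allocate again.
        self.collections += 1
        self.free = nbytes
        return 0


nursery = ToyNursery(size=32)
first = nursery.malloc(16)    # fast path
second = nursery.malloc(16)   # fast path, nursery is now full
third = nursery.malloc(8)     # does not fit, takes the slow path
```

Inlining only the first few lines of malloc means the common case costs a handful of instructions and no call, which is why the gain is a constant factor rather than anything dramatic.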

Cheers,
fijal