Friday, April 27, 2012

STM update (and thanks everybody)

A short update on the Software Transactional Memory (STM) side. Let me remind you that the work is to add STM internally into PyPy, with the goal of letting the user's programs run on multiple cores after a minor adaptation. (The goal is not to expose STM to the user's program.) I will soon write some official documentation that explains in more detail exactly what you get. For now you can read the previous blog posts, and you can also find technical details in the call for donation itself; or directly look at how I adapted the examples linked to later in this post.

I have now reached the point where the basics seem to work. There is no integration with the JIT so far; moreover the integration with the Garbage Collection subsystem is not finished right now, but at least it is "not crashing in my simple tests and not leaking memory too quickly". (It means that it is never calling __del__ so far, although it releases memory; and when entering transactional mode or when going to the next transaction, all live objects become immortal. This should still let most not-too-long-running programs work.)

If you want to play with it, you can download this binary (you need to put it in a place with the paths lib-python and lib_pypy, for example inside the main directory from a regular nightly tarball or from a full checkout). This version was compiled for Linux x86 32-bit from the stm-gc branch on the 25th of April. It runs e.g. the modified version of richards. This branch could also be translated for Linux x86-64, but not for other OSes nor other CPUs for now.

The resulting pypy-stm exposes the same interface as the pure Python transaction module, which is an emulator (running on CPython or any version of PyPy) which can be used to play around and prepare your programs. See the comments in there. A difference is that the real pypy-stm doesn't support epoll right now, so it cannot be used yet to play with a branch of Twisted that was already adapted (thanks Jean-Paul Calderone); but that's coming soon. For now you can use it to get multi-core usage on purely computational programs.
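
As a rough illustration of the programming model, here is a minimal sketch, assuming an interface with an add() call that queues a piece of work and a run() call that executes everything queued. The class TransactionQueue and its methods are hypothetical stand-ins written for this example. It is not the transaction module itself, and it runs the queued work sequentially in a shuffled order rather than on multiple cores.

```python
import random


class TransactionQueue:
    """Hypothetical, sequential stand-in for a transaction scheduler."""

    def __init__(self, seed=None):
        self._pending = []
        self._rng = random.Random(seed)

    def add(self, func, *args):
        # Queue func(*args) to be run later as one transaction.
        self._pending.append((func, args))

    def run(self):
        # Run queued transactions one at a time, in arbitrary order,
        # until none is left (transactions may queue more work).
        while self._pending:
            index = self._rng.randrange(len(self._pending))
            func, args = self._pending.pop(index)
            func(*args)


def square_into(results, n):
    # Each transaction writes to its own key, so the order does not matter.
    results[n] = n * n


queue = TransactionQueue(seed=0)
results = {}
for n in range(5):
    queue.add(square_into, results, n)
queue.run()
print(sorted(results.items()))
```

The point of the sketch is only that the program must give the same answer whatever order the queued pieces run in; that is the kind of adaptation a purely computational program would need.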

I did, for example, adapt part of PyPy's own translation toolchain: see the tweak in rpython/. Lines 273-281 are all that I needed to add, and they are mostly a "simplification and parallelization" of the lines above. There are a few more places in the whole toolchain that could be similarly modified, but overall it is just that: a few places. I did not measure performance, but I checked that it is capable of using multiple cores in the RTyping step of translation, with --- as expected --- some still-reasonable number of conflicts, particularly at the beginning when shared data structures are still being built.

On a few smaller, more regular examples like richards, I did measure the performance. It is not great, even taking into account that it has no JIT so far. Running pypy-stm with one thread is roughly 5 times slower than running a regular PyPy with no JIT (it used to be better in previous versions, but they didn't have any GC; nevertheless, I need to investigate). However, it does seem to scale. At least, it scales roughly as expected on my 2-real-cores, 4-hyperthreaded-cores laptop (i.e. for N between 1 and 4, the N-threaded pypy-stm performs similarly to N independent pypy-stm's running one thread each).

And finally...

...a big thank you to everyone who contributed some money to support this! As you see on the PyPy site, we got more than 6700$ so far in only 5 or 6 weeks. Thanks to that, my contract started last Monday, and I am now paid a small salary via the Software Freedom Conservancy (thanks Bradley M. Kuhn for organizational support from the SFC). Again, thank you everybody!

UPDATE: The performance regression was due to disabling an optimization, the method cache, which caused non-deterministic results --- the performance could vary by a factor of two. Today, as a workaround, I made the method cache transaction-local; it is only effective for transactions that run for long enough (maybe 0.1ms or 1ms), but at least it is there in this situation. In the version of richards presented above, the transactions are too short to make a difference (around 0.015ms).

Tuesday, April 17, 2012

NumPy on PyPy progress report


A lot of things happened in March, like PyCon. I was also busy doing other things (pictured), so apologies for the late numpy status update.

However, a lot of things have happened and numpy continues to be one of the main points of entry for hacking on PyPy. Apologies to all the people whose patches I don't review in a timely manner, but seriously, you do a lot of work.

This list of changes is definitely not exhaustive, and I might be forgetting important contributions. In a loose order:

  • Matti Picus made the out parameter work for a lot of (but not all) functions.

  • We merged record dtypes support. The only missing dtypes left are complex (important), datetime (less important) and object (which will probably never be implemented because it makes very little sense and is a mess with moving GCs).

  • Taavi Burns and others implemented lots of details, including lots of ufuncs. On the completely unscientific measure of "implemented functions" on the numpypy status page, we're close to 50% of numpy working. In reality it might be more or less, but after complex dtypes we're getting very close to running real programs.

  • Bool indexing of arrays of the same size should work, leaving only arrays-of-ints indexing as the last missing element of fancy indexing.

  • I did some very early experiments on SSE. This work is seriously preliminary - in fact the only implemented operation is addition of float single-dimension numpy arrays. However, results are encouraging, given that our assembler generator is far from ideal:



    [Table: benchmark timings with columns PyPy SSE, GCC non-looped and GCC looped.]

    The benchmark repo is available. GCC was run with -O3, no further options specified. PyPy was run with default options. The SSE work lives in the backend-vector-ops branch, but it's not working completely yet.

    One might argue that the C and Python versions are not the same code - indeed they are not. It just shows one possible approach to writing numeric code.
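
To make the two kinds of fancy indexing mentioned in the list above concrete, here is a small pure-Python sketch of their semantics on flat lists. The helper names bool_index and int_index are made up for this illustration; they are not part of numpy or numpypy, and they only mimic what a[mask] and a[indices] mean for one-dimensional arrays.

```python
def bool_index(values, mask):
    """Keep the items whose mask entry is true (like a[mask])."""
    if len(values) != len(mask):
        raise IndexError("mask must have the same size as the array")
    return [v for v, keep in zip(values, mask) if keep]


def int_index(values, indices):
    """Pick items by position, repeats allowed (like a[indices])."""
    return [values[i] for i in indices]


data = [10.0, 20.0, 30.0, 40.0]
selected = bool_index(data, [True, False, True, False])
picked = int_index(data, [3, 0, 0])
print(selected, picked)
```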

The next step is to continue implementing missing features such as:

  • specialised arrays, i.e. masked arrays and matrices
  • core modules such as fft, linalg, random.
  • numpy's testing framework

The future is hard to predict, but we're not far off!


UPDATE: Indeed, string and unicode dtypes are not supported yet. They're as important as the complex dtype.

Friday, April 13, 2012

PyCon 2012 wrap up

So, PyCon happened. This was the biggest PyCon ever and probably the biggest gathering of Python hackers ever.

From the PyPy perspective, a lot at PyCon was about PyPy. Listing things:

  • David Beazley presented an excellent keynote describing his experience diving head-first into PyPy and at least partly failing. He, however, did not fail to explain bits and pieces about PyPy's architecture. Video is available.
  • We gave tons of talks, including the tutorial, why pypy by example and pypy's JIT architecture
  • We had a giant influx of new committers, easily doubling the number of pull requests ever created for PyPy. The main topics for newcomers were numpy and py3k, disproving what David said about PyPy being too hard to dive into ;)
  • Guido argued in his keynote that Python is not too slow. In the meantime, we're trying to prove him correct :-)

We would like to thank everyone who talked to us, shared ideas and especially those who participated in sprints - we're always happy to welcome newcomers!

I'm sure there are tons of things I forgot, but thank you all!

Cheers, fijal

Friday, April 6, 2012

Py3k status update #3

This is the third status update about my work on the py3k branch, which I can work on thanks to all of the people who donated to the py3k proposal.

A lot of work has been done during the last month: as usual, the list of changes is too big to be reported in a detailed way, so this is just a summary of what happened.

One of the most active areas was killing old and deprecated features. In particular, we killed support for the __cmp__ special method and its cousins, the cmp builtin function and the cmp keyword argument for list.sort() and sorted(). Killing is easy, but then you have to fix all the places which break because of this: fixing all the types which relied on __cmp__ to be comparable, fixing all the tests which tried to order objects which are no longer orderable, and implementing new behavior like forbidding calling hash() on objects which implement __eq__ but not __hash__.
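
For readers porting code, the following sketch shows what these removals mean in plain Python 3. It relies only on standard features: the key argument, functools.cmp_to_key, and the rule that defining __eq__ without __hash__ makes instances unhashable. The class and function names are invented for the example.

```python
from functools import cmp_to_key


def compare_lengths(a, b):
    # Old-style comparison function: negative, zero or positive result.
    return (len(a) > len(b)) - (len(a) < len(b))


words = ["pypy", "stm", "numpy"]
# sorted() no longer accepts cmp=...; wrap the function with cmp_to_key.
by_length = sorted(words, key=cmp_to_key(compare_lengths))


class Point:
    def __init__(self, x):
        self.x = x

    # __eq__ is defined but __hash__ is not, so instances are unhashable.
    def __eq__(self, other):
        return isinstance(other, Point) and self.x == other.x


try:
    hash(Point(1))
    hashable = True
except TypeError:
    hashable = False

print(by_length, hashable)
```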

Among the other features, we killed lots of now-gone functions in the operator module, the builtins apply(), reduce() and buffer, and the os.* functions to deal with temporary files, which have been deprecated in favour of the new tempfile module.
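
As a hedged illustration of what replaces these builtins in Python 3 code, the sketch below uses only standard library calls (functools.reduce, memoryview, tempfile.TemporaryFile). The function add3 and the variable names are made up for the example.

```python
import functools
import operator
import tempfile


def add3(a, b, c):
    return a + b + c


args = (1, 2, 3)
# apply(add3, args) becomes a plain call with argument unpacking.
total = add3(*args)

# The reduce() builtin is gone; functools.reduce() does the same job.
product = functools.reduce(operator.mul, [1, 2, 3, 4], 1)

# buffer is gone; memoryview gives a view on bytes-like objects.
view_bytes = memoryview(b"pypy")[1:3].tobytes()

# The os.* temporary-file helpers are gone; use the tempfile module.
with tempfile.TemporaryFile() as tmp:
    tmp.write(b"hello")
    tmp.seek(0)
    content = tmp.read()

print(total, product, view_bytes, content)
```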

The other topic which can't be missing from a py3k status update is, as usual, string-vs-unicode. In this round, we fixed bugs in string formatting (in particular to teach format() to always use unicode strings) and various corner cases about when to call the (possibly overridden) __str__ method on subclasses of str. Believe me, you don't want to know the precise rules :-).

Other features which we worked on and fixed tests for include, but are not limited to, marshal, hashlib, zipimport, _socket and itertools, plus the habitual endless list of tests which fail for shallow reasons such as syntactic differences, int vs long, range() vs list(range()) etc. As a result, the number of failing tests dropped from 650 to 235: we are beginning to see the light at the end of the tunnel :-)

Benjamin finished implementing Python 3 syntax. Most of it was small cleanups and tweaks to be compatible with CPython, such as making True and False keywords and preventing . . . (note spaces between dots) from being parsed as Ellipsis. Larger syntax additions included keyword-only arguments and function annotations.
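
A short sketch of the two larger syntax additions, as they behave in standard Python 3. The function scale and its parameters are invented for the example; the checks rely only on documented behavior (keyword-only parameters after a bare *, the __annotations__ attribute, and keyword.iskeyword).

```python
import keyword


def scale(value: float, *, factor: float = 2.0) -> float:
    # 'factor' is keyword-only because it comes after the bare '*'.
    return value * factor


doubled = scale(1.5)
tripled = scale(1.5, factor=3.0)

try:
    scale(1.5, 3.0)  # passing factor positionally is rejected
    positional_ok = True
except TypeError:
    positional_ok = False

# Annotations are stored on the function object, not enforced.
annotations = scale.__annotations__

# True and False are keywords in Python 3, not ordinary names.
true_false_are_keywords = keyword.iskeyword("True") and keyword.iskeyword("False")
```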

Finally, we did some RPython fixes, so that it is possible again to translate PyPy in the py3k branch. However, the resulting binary is a strange beast which mixes Python 2 and Python 3 semantics, so it is unusable for anything but showing friends how cool it is.

I would like to underline that I was not alone in doing all this work. In particular, a lot of people joined the PyPy sprint at PyCon and worked on the branch, as you can clearly see in this activity graph. I would like to thank all who helped!

Antonio and Benjamin

Thursday, April 5, 2012

PyPy sprint in Leipzig, Germany (June 22-27)

The next PyPy sprint will be held --- for the first time in a while --- in a place where we haven't been so far: Leipzig, Germany, at the Python Academy's Teaching Center. It will take place from the 22nd to the 27th of June 2012, before EuroPython. Thanks to Mike Müller for organizing it!

This is a fully public sprint: everyone is welcome to join us. All days are full sprint days, so it is recommended to arrive on the 21st and leave on the 28th.

Topics and goals

Open. Here are some goals:

  • numpy: progress towards completing the numpypy module; try to use it in real code
  • stm: progress on Transactional Memory; try out the transaction module on real code.
  • jit optimizations: there are a number of optimizations we can still try out or refactor.
  • work on various, more efficient data structures for the Python language. A good example would be lazy string slicing/concatenation or more efficient objects.
  • any other PyPy-related topic is fine too.
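
To give an idea of what "lazy string concatenation" in the list above could mean, here is a minimal sketch at the Python level. The class LazyConcat is hypothetical and written only for this illustration; it does not describe how PyPy implements or would implement strings internally.

```python
class LazyConcat:
    """Hypothetical string-like object that delays concatenation."""

    def __init__(self, *pieces):
        self._pieces = list(pieces)
        self._flat = None

    def __add__(self, other):
        # Only record the pieces; no characters are copied yet.
        return LazyConcat(*self._pieces, other)

    def __len__(self):
        # The length is known without building the final string.
        return sum(len(piece) for piece in self._pieces)

    def __str__(self):
        # Build the flat string once, on first use, and cache it.
        if self._flat is None:
            self._flat = "".join(str(piece) for piece in self._pieces)
            self._pieces = [self._flat]
        return self._flat


text = LazyConcat("py") + "py" + "-stm"
print(len(text), str(text))
```

The design choice shown is to pay for the join only when the characters are actually needed, which helps code that builds a string from many small additions.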


For students, we have the possibility to support some costs via PyPy funds. Additionally, we can support you in applying for grants from the PSF and other sources.


If you'd like to come, please sign up either by announcing yourself on pypy-dev, or by directly adding yourself to the list of people. (We need to have a head count for the organization.) If you are new to the project please drop a note about your interests and post any questions.


For more information, please see the sprint announcement.