Tuesday, July 10, 2012

Py3k status update #5

This is the fifth status update about our work on the py3k branch, which we
can work on thanks to all of the people who donated to the py3k proposal.

Apart from the usual "fix shallow py3k-related bugs" part, most of my work in
this iteration has been to fix the bootstrap logic of the interpreter, in
particular to setup the initial sys.path.

Until few weeks ago, the logic to determine sys.path was written entirely
at app-level in pypy/translator/goal/app_main.py, which is automatically
included inside the executable during translation. The algorithm is more or
less like this:

  1. find the absolute path of the executable by looking at sys.argv[0]
    and cycling through all the directories in PATH
  2. starting from there, go up in the directory hierarchy until we find a
    directory which contains lib-python and lib_pypy

This works fine for Python 2 where the paths and filenames are represented as
8-bit strings, but it is a problem for Python 3 where we want to use unicode
instead. In particular, whenever we try to encode a 8-bit string into an
unicode, PyPy asks the _codecs built-in module to find the suitable
codec. Then, _codecs tries to import the encodings package, to list
all the available encodings. encodings is a package of the standard
library written in pure Python, so it is located inside
lib-python/3.2. But at this point in time we yet have to add
lib-python/3.2 to sys.path, so the import fails. Bootstrap problem!

The hard part was to find the problem: since it is an error which happens so
early, the interpreter is not even able to display a traceback, because it
cannot yet import traceback.py. The only way to debug it was through some
carefully placed print statement and the help of gdb. Once found the
problem, the solution was as easy as moving part of the logic to RPython,
where we don't have bootstrap problems.

Once the problem was fixed, I was able to finally run all the CPython test
against the compiled PyPy. As expected there are lots of failures, and fixing
them will be the topic of my next months.

8 comments:

Anonymous said...

Would be nice to have a PyPy distribution embeded in OpenOffice 3.4.2

haypo said...

I solved a similar issue in Python 3.2. Python 3 did use the wrong encoding to encode/decode filenames. When I tried to use the filesystem encoding instead, I had an ugly bootstrap issue with encodings implemented in Python (whereas ASCII, latin1 and utf-8 are implemented in C with a fast-path).

The solution is to use C function to encode to/decode from the locale encoding, because the filesystem encoding is the locale encoding. mbstowcs() and wcstombs() are used until the Python codec machinery is ready.

Anonymous said...

Did you try to compare PyPy to Pythran? According to his author, Pythran is on some benchmarks 30x faster than PyPy: http://linuxfr.org/users/serge_ss_paille/journaux/pythran-python-c#comment-1366988

see also the manual here: https://github.com/serge-sans-paille/pythran/blob/master/MANUAL

What do you think of this approach of translating Python to C++ ?

Maciej Fijalkowski said...

@Anonymous - there is extremely little point in comparing python with whatever-looks-like-python-but-is-not. It's beyond the scope of this blog for sure.

Anonymous said...

To be fair to @Anonymous, the pypy developers commonly compare pypy to C in benchmarks so it's not so unreasonable. The point is that only that one should understand that they are different languages, not that all comparisons between languages are pointless.

Maciej Fijalkowski said...

Oh yes sure. It's as producting to compare pypy to shedskin as it is to compare pypy with g77. It still *is* or might be a valuable comparison, but it is important to keep in mind that those languages are different.

ArneBab said...

Any news on the py3k side?

That’s actually what’s most interesting to me on a practical level and it would be nice to know how long it will still take till I can test it :)

Antonio Cuni said...

@arne due to EuroPython and some personal issues not much has happened on the py3k side in the past month.

It is hard to give estimates about when things will be ready, because it depends a lot on how much time I'll be able to dedicate on it. At this point, most of the major features are implemented and I am fixing all the smaller ones which are highlighted by failing CPython tests. However, sometimes a small feature might take much more time to fix than a big one