Wednesday, June 17, 2015

PyPy and ijson - a guest blog post

This gem was posted in the ijson issue tracker after some discussion on #pypy, and Dav1dde kindly allowed us to repost it here:

"So, I was playing around with parsing huge JSON files (19 GiB; my test file is ~520 MiB) and wanted to try some sample code with PyPy. It turns out PyPy needed ~1:30 to 2:00 minutes, whereas CPython 2.7 needed ~13 seconds (the pure-Python implementation was equally slow on both interpreters, at ~8 minutes).

"Apparently ctypes is really bad performance-wise, especially on PyPy. So I made a quick CFFI mockup: https://gist.github.com/Dav1dde/c509d472085f9374fc1d

Before:

CPython 2.7:
    python -m emfas.server size dumps/echoprint-dump-1.json
    11.89s user 0.36s system 98% cpu 12.390 total

PyPy:
    python -m emfas.server size dumps/echoprint-dump-1.json
    117.19s user 2.36s system 99% cpu 1:59.95 total

After (CFFI):

CPython 2.7:
    python jsonsize.py ../dumps/echoprint-dump-1.json
    8.63s user 0.28s system 99% cpu 8.945 total

PyPy:
    python jsonsize.py ../dumps/echoprint-dump-1.json
    4.04s user 0.34s system 99% cpu 4.392 total

"
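To put the quoted timings side by side, the wall-clock totals can be turned into ratios. This is a small sketch; TOTALS, to_seconds and speedup are illustrative names, and since the "before" and "after" runs used different commands (python -m emfas.server size versus jsonsize.py), the ratios are only rough.

```python
# Wall-clock "total" values copied from the `time` output quoted above.
TOTALS = {
    ("CPython 2.7", "before"): "12.390",
    ("PyPy", "before"): "1:59.95",
    ("CPython 2.7", "after"): "8.945",
    ("PyPy", "after"): "4.392",
}


def to_seconds(total: str) -> float:
    """Convert a `time` total such as '12.390' or '1:59.95' into seconds."""
    seconds = 0.0
    for part in total.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def speedup(slow: tuple, fast: tuple) -> float:
    """How many times faster run `fast` finished than run `slow`."""
    return to_seconds(TOTALS[slow]) / to_seconds(TOTALS[fast])


if __name__ == "__main__":
    # PyPy before vs. after the CFFI rewrite: about 27.3x
    print(round(speedup(("PyPy", "before"), ("PyPy", "after")), 1))
    # CPython 2.7 vs. PyPy, both using CFFI: about 2.0x
    print(round(speedup(("CPython 2.7", "after"), ("PyPy", "after")), 1))
```

In other words, on this workload PyPy went from roughly ten times slower than CPython to roughly twice as fast.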



Dav1dde goes into more detail in the issue itself, but we want to emphasize a few significant points from this brief exchange:
  • His CFFI implementation is faster than the ctypes one even on CPython 2.7.
  • PyPy + CFFI is faster than CPython even when using C code to do the heavy parsing.
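To make the ctypes-versus-CFFI point concrete, here is a minimal micro-benchmark sketch that calls the same C function through both foreign-function interfaces. It is not the code from Dav1dde's gist: it uses the C library's strlen as a stand-in, and it assumes a Unix-like system, where ctypes.CDLL(None) and cffi's ffi.dlopen(None) both reach the C library. cffi is a third-party package, so the cffi path is skipped if it is not installed. The names strlen_ctypes, strlen_cffi and compare are illustrative.

```python
import ctypes
import timeit

# Stand-in C function: strlen from the C library (not the gist's parser code).
# Assumption: a Unix-like OS, where CDLL(None) exposes symbols already loaded
# into the interpreter process, including libc. This does not work on Windows.
libc = ctypes.CDLL(None)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def strlen_ctypes(data: bytes) -> int:
    """Call strlen through ctypes."""
    return libc.strlen(data)


try:
    from cffi import FFI  # third-party; optional for this sketch
except ImportError:
    FFI = None

if FFI is not None:
    ffi = FFI()
    # Declare the C signature once; cffi converts arguments based on it.
    ffi.cdef("size_t strlen(const char *s);")
    C = ffi.dlopen(None)  # ABI mode: on Unix, loads the standard C library

    def strlen_cffi(data: bytes) -> int:
        """Call strlen through cffi."""
        return C.strlen(data)
else:
    strlen_cffi = None


def compare(n: int = 100_000) -> dict:
    """Time n calls through each available FFI; returns seconds per backend."""
    data = b"x" * 64
    timings = {"ctypes": timeit.timeit(lambda: strlen_ctypes(data), number=n)}
    if strlen_cffi is not None:
        timings["cffi"] = timeit.timeit(lambda: strlen_cffi(data), number=n)
    return timings


if __name__ == "__main__":
    for backend, seconds in compare().items():
        print(f"{backend}: {seconds:.3f}s")
```

The sketch uses cffi's ABI mode for brevity; cffi also has an API mode that compiles a small C extension, which its documentation describes as the faster option. Absolute timings vary by machine and interpreter, so it is worth running under both CPython and PyPy rather than trusting any single number.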
The PyPy Team

Monday, June 1, 2015

PyPy 2.6.0 release

PyPy 2.6.0 - Cameo Charm

We’re pleased to announce PyPy 2.6.0, only two months after PyPy 2.5.1. We are particularly happy to update cffi to version 1.1, which makes the popular ctypes alternative even easier to use, and to support the new vmprof statistical profiler.
You can download the PyPy 2.6.0 release here:
We would like to thank our donors for their continued support of the PyPy project, including those who donate to our three sub-projects, as well as our volunteers and contributors.
Thanks also to Yury V. Zaytsev and David Wilson who recently started running nightly builds on Windows and MacOSX buildbots.
We’ve shown quite a bit of progress, but we’re slowly running out of funds. Please consider donating more, or, even better, convince your employer to donate, so we can finish those projects! The three sub-projects are:
  • Py3k (supporting Python 3.x): We have released a Python 3.2.5-compatible version we call PyPy3 2.4.0, and are working toward a Python 3.3-compatible version.
  • STM (software transactional memory): We have released a first working version, and continue to try out promising new paths toward a fast multithreaded Python.
  • NumPy: using it requires installing our fork of upstream NumPy, available on Bitbucket.
We would also like to encourage new people to join the project. PyPy has many layers and we need help with all of them: PyPy and RPython documentation improvements, tweaking popular modules to run on PyPy, or general help with making RPython’s JIT even better. Nine new people have contributed since the last release; you too could be one of them.

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7. It’s fast (pypy and cpython 2.7.x performance comparison) due to its integrated tracing JIT compiler.
This release supports x86 machines on most common operating systems (Linux 32/64, Mac OS X 64, Windows, OpenBSD, FreeBSD), as well as newer ARM hardware (ARMv6 or ARMv7, with VFPv3) running Linux.
While we support 32-bit Python on Windows, work on a native 64-bit Windows Python has stalled; we would welcome a volunteer to handle that. We also welcome developers with other operating systems or dynamic languages to see what RPython can do for them.

Highlights

  • Python compatibility:
    • Improve support for TLS 1.1 and 1.2
    • Windows downloads now package a pypyw.exe in addition to pypy.exe
    • Support for the PYTHONOPTIMIZE environment variable (which affects the built-in __debug__ constant)
    • Issues reported with our previous release were resolved after reports from users on our issue tracker at https://bitbucket.org/pypy/pypy/issues or on IRC at #pypy.
  • New features:
    • Add preliminary support for vmprof, a new lightweight statistical profiler designed to accommodate profiling JITted code
  • Numpy:
    • Support for object dtype via a garbage collector hook
    • Support for .can_cast and .min_scalar_type as well as beginning a refactoring of the internal casting rules
    • Better support for subtypes, via the __array_interface__ and __array_priority__ attributes and the __array_wrap__ method (still a work-in-progress)
    • Better support for ndarray.flags
  • Performance improvements:
    • Slight improvement in frame sizes, improving some benchmarks
    • Internal refactoring and cleanups leading to improved JIT performance
    • Improved IO performance of zlib and bz2 modules
    • We continue to improve the JIT’s optimizations. Our benchmark suite is now over 7 times faster than CPython
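The PYTHONOPTIMIZE item refers to standard CPython behavior: setting the variable to a non-empty string is equivalent to passing -O, which sets the built-in __debug__ to False and strips assert statements. The sketch below runs a small probe in a child interpreter with and without the variable. PROBE and run_probe are illustrative names; the driver needs Python 3.7+ for subprocess.run(capture_output=...), while the probe itself is also valid Python 2.7, so a pypy executable could be substituted for sys.executable to check the same behavior.

```python
import os
import subprocess
import sys

# Probe program: report __debug__ and whether assert statements still run.
PROBE = """\
print(__debug__)
try:
    assert False
    print("asserts stripped")
except AssertionError:
    print("asserts enabled")
"""


def run_probe(optimize=None):
    """Run PROBE in a fresh interpreter, optionally with PYTHONOPTIMIZE set."""
    env = dict(os.environ)
    env.pop("PYTHONOPTIMIZE", None)  # start from a known state
    if optimize is not None:
        env["PYTHONOPTIMIZE"] = optimize  # same effect as the -O flag
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    print(run_probe())     # ['True', 'asserts enabled']
    print(run_probe("1"))  # ['False', 'asserts stripped']
```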
Please try it out and let us know what you think. We welcome success stories, experiments, and benchmarks. We know you are using PyPy, so please tell us about it!
Cheers
The PyPy Team