IronPython hammers CPython when not mutating class attributes
Earlier today I posted the second article in what is turning out to be a short series in the investigation into why the performance of IronPython is around 100× slower than CPython, when running the front-end of my OWL BASIC compiler.
The most informative comment was from Curt Hagenlocher who works on IronPython in the Visual Studio Managed Languages group at Microsoft.
Curt suggested,
Try storing the counter as a global variable instead of a class-level member of Node — I think you’ll notice a dramatic improvement.
The modified benchmark program looks like this:
counter = 0
class Node(object):
def __init__(self, children):
global counter
counter += 1
self._children = children
def make_tree(depth):
if depth > 1:
return Node ([make_tree(depth - 1), make_tree(depth - 1)])
else:
return Node([])
def main(argv=None):
global counter
if argv is None:
argv = sys.argv
depth = int(argv[1]) if len(argv) > 1 else 10
root = make_tree(depth)
print counter
return 0
if __name__ == '__main__':
import sys
sys.exit(main())
A dramatic improvement!
Well, Curt wasn’t wrong. This made a phenomenal difference with IronPython completing in only 12% of the time taken by CPython – over 8× faster with a binary tree depth of 20.
Let’s look in detail at the results. All results are from a dual quad-core 1.86 GHz Xeon with 4 GB RAM, and as before each benchmark was run five times, and the shortest time of the five taken.
The three test environments are:
- Python 2.5.2 x86 32-bit
- Jython 2.5rc2 on Java Hotspot 1.6 32-bit
- IronPython 2.0 on .NET 2.0 x64

Figure 1. The relative performance of the three main Python implementations on a benchmark that uses a global counter, rather than mutating a class attribute.
Here we can see how IronPython’s performance has been improved hugely by this simple change. Although startup time dominates for the smaller problem size, now both Jython and IronPython surpass CPython at around half-a-million nodes.
Removing start-up time, which may be irrelevant for long-running processes, gives us the following chart:

Figure 2. The performance of the three main Python implementations excluding start-up time.
Again there is a lot of noise in the data below 1000 nodes, but it is clear that Jython scales better than IronPython, which in turn is scaling better than CPython.
Up until now I’ve been using a log-log scale in the charts because of the wide variation in performance between the different implementations, but now the performance gap is much closer, it’s difficult to get a sense of just how much faster IronPython is on the modified benchmark. Let’s throw in a log-linear plot to help us appreciate what’s going on:

Figure 3. A linear representation of the same data as in Figure 2, to highlight the performance multiple between IronPython and CPython in the larger tests.
It’s perhaps easier to see now that IronPython is doing in 14 seconds what takes CPython 114 seconds to achieve!
Finally, let’s plot those results as we did before, as multiples of CPython performance:

Figure 4. Execution time of Jython and IronPython as multiples of CPython performance.
It is easy to see that in this chart, once we pass half-a-million tree nodes (a tree depth of 19) that both Jython and IronPython are significantly beating CPython.
Explanation
Curt Hagenlocher offers the explanation in a comment in the thread on Reddit.
In this particular case, IronPython is slow because of the update to Node.counter. Currently, any update to a class will increment the version number for the class, which will have the effect of invalidating any rules compiled for that class. Effectively, the same rules are getting compiled over and over again. Moving the counter to a global should result in performance on par with that of CPython.
which is absolutely correct, except that he’s underselling the relative gain. IronPython is not only on a par with CPython, it can outperform it by a factor of eight!
With this knowledge in hand, I can now approach optimization of my OWL BASIC compiler, which lies back at the start of this illuminating tale.
Conclusion
- Avoiding mutation of Python class attributes can have significant benefits for IronPython performance.
- Both IronPython and Jython scale better than CPython by this benchmark, and have superior performance for large trees of nodes.







Recent Comments