<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Sufficiently Small &#187; CLR</title>
	<atom:link href="http://www.smallshire.org.uk/sufficientlysmall/tag/clr/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.smallshire.org.uk/sufficientlysmall</link>
	<description>sin(x) = x</description>
	<lastBuildDate>Sun, 11 Apr 2010 19:36:07 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Control Flow Graph Linearisation in OWL BASIC</title>
		<link>http://www.smallshire.org.uk/sufficientlysmall/2010/02/14/control-flow-graph-linearisation-in-owl-basic/</link>
		<comments>http://www.smallshire.org.uk/sufficientlysmall/2010/02/14/control-flow-graph-linearisation-in-owl-basic/#comments</comments>
		<pubDate>Sun, 14 Feb 2010 19:27:22 +0000</pubDate>
		<dc:creator>Robert Smallshire</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[OWL BASIC]]></category>
		<category><![CDATA[computing]]></category>
		<category><![CDATA[BBCBASIC]]></category>
		<category><![CDATA[CLR]]></category>

		<guid isPermaLink="false">http://www.smallshire.org.uk/sufficientlysmall/?p=450</guid>
		<description><![CDATA[To compile the code comprising an OWL BASIC procedure, function or main program into CIL, we must linearise the Control Flow Graph (CFG) representing the program statements.  The CFG undergoes many transformations during compilation, for example to eliminate unreachable code or convert GOSUB routines into named procedures.  Generation of CIL using Reflection.Emit requires [...]]]></description>
			<content:encoded><![CDATA[<p>To compile the code comprising an OWL BASIC procedure, function or main program into CIL, we must linearise the <a href="http://en.wikipedia.org/wiki/Control_flow_graph">Control Flow Graph</a> (CFG) representing the program statements.  The CFG undergoes many transformations during compilation, for example to eliminate unreachable code or convert <code>GOSUB</code> routines into named procedures.  Generation of CIL using Reflection.Emit requires that we define branch targets in advance of generating branch instructions or marking the target instruction, and of course we want to do this in a manner which minimises the number of branches required to represent the code.  The structure of the graph may be quite complex, especially for traditional BASIC spaghetti code which uses <code>GOTO</code> excessively rather than more structured alternatives such as procedures and functions or the control structures introduced in BBC BASIC V.</p>
<p>Consider the following procedure from Sphinx Adventure.  It contains three loops, one on line 271 formed with a <code>GOTO</code> back to the start of the line, and two <code>REPEAT .. UNTIL</code> loops.</p>
<pre class="brush: bbc;">
266 DEF PROCL(L)
267 LOCAL I,J:CO=0:CN=0
268 IF L=1 THEN278
269 PRINT': RESTORE L: IF O?31&lt;&gt;0 THEN O?31=0:DW=1
270 READ R$,R$:R$=&quot;You are &quot;+R$
271 IF LEN(R$)+ POS&gt;CO-CN+39 THEN R$= FNS(R$,39+CO-CN):CO=CO+39: GOTO271
272 PRINT R$: IFL=136 OR L=15 THEN O?56=L
273 IF L=16 AND FL=1 THEN PRINT&quot;The walls are very hot!&quot; ELSE IF L=16 THEN PRINT &quot;The walls are steaming!&quot;
274 IF L&lt;&gt;3 AND L&lt;&gt;142 AND L&lt;&gt;143 THEN PROCEX(L): IF ABS(L-19)=1 AND CH=1 THEN PRINT ELSE IF ABS(L-42)=1 AND VO=1 THEN PRINT
275 IF CH=1 AND ABS(L-19)=1 THEN PROC R(22): PRINT &quot;chasm.&quot;:O?53=L
276 IF VO=1 AND ABS(L-42)=1 THEN PROCR(22): PRINT&quot;glacier.&quot;:O?53=L
277 IF L=26 OR L=27 THEN O?53=L
278 J=0:I=0:CO=0
279 REPEAT:J=J+1: IF O?J=L  THEN CO=CO+1
280 UNTIL J=52: IF CO=0 AND L=1 THEN PROCR(L): GOTO 284 ELSE IF CO=0 AND L&lt;&gt;1 THEN 284 ELSE PRINT:MAX=CO
281 IF L=1 THEN PROCR(3) ELSE PROCR(4)
282 CN=0:CO=MAX: REPEAT I=I+1: IF O?I=L  THEN PROCOT(I,CO):CO=CO-1
283 UNTIL I=52
284 IF D&lt;&gt;0 THEN O?31=L
285 IF CF=1 AND L=94 THEN PRINT'&quot;The casket is open.&quot;
286 IF L=24 AND SA=1 THEN PRINT'&quot;The safe door is open.&quot;
287 PRINT: ENDPROC
</pre>
<p>The CFG for this code is shown below.  Each program statement is shown as a purple box, with control flow to the following statement(s). Conditionals are shown in diamond boxes.  The numbers in each purple box are source line numbers, where known.</p>
<p>Careful comparison of the source above and the diagram below will reveal some of the transformations that have been applied to the program; for example, <code>READ R$,R$</code> on line 270 has been transformed into two consecutive assignment statements which actually take the form <code>R$ = READ()</code> where <code>READ()</code> is a function not available in the source language.</p>
<p>The <i>statement</i> level CFG has been analysed to identify <a href="http://en.wikipedia.org/wiki/Basic_block">basic blocks</a>, shown as yellow group nodes, thereby defining a higher level <i>basic-block</i> level CFG.  Each basic block has only one entry point statement; none of the other statements within the basic block is the destination of a jump instruction. Furthermore, each block has only one exit point.</p>
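<p>As a rough illustration of how basic blocks can be recovered from a statement level CFG, here is a minimal Python sketch.  It is not the OWL BASIC implementation: the dictionary-of-successors representation and the names <code>find_basic_blocks</code>, <code>successors</code> and <code>entry</code> are assumptions made for this example.  A statement is taken to start a new block if it is the entry point, if it does not have exactly one predecessor, or if its single predecessor has more than one successor.</p>
<pre class="brush: python;">
def find_basic_blocks(successors, entry):
    '''Group the statements of a CFG into basic blocks.

    successors maps every statement to a list of its successor statements
    (a hypothetical representation; every target must also be a key).
    Returns a list of blocks, each a list of statements in execution order.
    '''
    # Build the reverse mapping so that predecessors can be counted.
    predecessors = dict((s, []) for s in successors)
    for s, targets in successors.items():
        for t in targets:
            predecessors[t].append(s)

    def is_leader(s):
        # A leader is the first statement of a basic block.
        if s == entry:
            return True
        preds = predecessors[s]
        if len(preds) != 1:
            return True
        # The sole predecessor branches, so it must end its own block.
        return len(successors[preds[0]]) != 1

    blocks = []
    for s in successors:
        if not is_leader(s):
            continue
        block = [s]
        # Extend the block while control flows unconditionally into
        # a statement which is not itself a leader.
        while len(successors[block[-1]]) == 1:
            following = successors[block[-1]][0]
            if is_leader(following):
                break
            block.append(following)
        blocks.append(block)
    return blocks
</pre>
<p>For the hypothetical graph <code>{1: [2], 2: [3, 4], 3: [5], 4: [5], 5: [6], 6: []}</code> with entry statement 1, this groups the statements into the blocks [1, 2], [3], [4] and [5, 6].</p>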
<p><i>More text follows this long diagram&#8230;</i></p>
<p><img src="/images/Sphinx_basic_blocks.png" alt="Control Flow Graph for PROC L in Sphinx Adventure" /></p>
<p>Generating the CIL code for a single basic block is straightforward enough &#8211; we can simply iterate through the statements comprising the basic block in order and generate the code for each in turn.  However, there are many possible orders in which the code for the basic blocks themselves could be represented in the CIL, since we can branch from the end of any block to the next block, although of course we must start at the <i>entry block</i> for the procedure. Although any order starting with the entry block can be made to work, where possible we would like program control to flow smoothly from the end of a block to one of its successors without requiring a branch.</p>
<p>At first sight, some sort of topological ordering would seem to be appropriate, but a topological ordering is only well defined for a <a href="http://en.wikipedia.org/wiki/Directed_acyclic_graph">directed acyclic graph</a> (DAG), and a DAG this program is not.  The key to this conundrum is to reduce the directed graph to a DAG by identifying <a href="http://en.wikipedia.org/wiki/Strongly_connected_components">strongly connected components</a> (SCCs).  By contracting each SCC to a single node we obtain what is called the <i>condensation</i> of the CFG which <i>will</i> be a DAG. To the resulting DAG we can apply a topological ordering. The ordering of vertices within each SCC is chosen by starting at the vertex with the greatest <a href="http://en.wikipedia.org/wiki/Indegree#Indegree_and_outdegree">in-degree</a>.</p>
<p>In order to identify and contract the SCCs we use an implementation of <a href="http://en.wikipedia.org/wiki/Tarjan's_strongly_connected_components_algorithm">Tarjan&#8217;s algorithm</a> during a depth-first traversal of the CFG.  The reverse post-ordering of the primary depth-first traversal is used to generate the topological ordering of the condensed CFG.</p>
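<p>The following is a minimal Python sketch of this idea, not the OWL BASIC source; the function names and the dictionary-of-successors graph are assumptions made for illustration.  It relies on the fact that Tarjan&#8217;s algorithm completes an SCC only after every SCC reachable from it, so reversing the order in which components are emitted gives a topological ordering of the condensation.  For simplicity, the vertices within each SCC are listed in discovery order rather than starting from the vertex of greatest in-degree.</p>
<pre class="brush: python;">
def strongly_connected_components(successors, entry):
    '''Find the SCCs reachable from entry using the algorithm of Tarjan.

    Returns a list of SCCs (each a list of vertices) in reverse
    topological order of the condensation.
    '''
    index = {}        # discovery number of each visited vertex
    lowlink = {}      # smallest discovery number known to be reachable
    stack = []
    on_stack = set()
    components = []

    def visit(v):
        index[v] = lowlink[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in successors[v]:
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:
            # v is the root of an SCC, so pop its members off the stack.
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            component.reverse()   # restore discovery order, root first
            components.append(component)

    visit(entry)
    return components


def linearise(successors, entry):
    '''Order vertices so that SCCs appear in topological order.'''
    order = []
    for component in reversed(strongly_connected_components(successors, entry)):
        order.extend(component)
    return order
</pre>
<p>Because <code>visit</code> is recursive, a very deep graph could exceed the recursion limit of the Python interpreter; an explicit stack would avoid that at the cost of some clarity.</p>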
<p>The resulting ordering of basic blocks is shown in the diagram by the numeric labels to the top-left of each. This will be the order in which the CIL code for them is generated, and it can be seen that in about half of the cases, fall through from one block to the next (consecutive block numbers) without explicit branching can be exploited.  Future optimisations will focus on further simplifying the generated code by removing vertices, such as block 31, which contain only jumps.</p>
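<p>One simple way to compare candidate orderings is to count how many consecutive pairs of blocks are joined by an edge in the CFG, since each such pair is an opportunity for fall through.  The helper below is a sketch under that assumption; <code>count_fall_throughs</code> is a hypothetical name and is not part of OWL BASIC.</p>
<pre class="brush: python;">
def count_fall_throughs(successors, order):
    '''Count consecutive pairs (a, b) in order where b is a successor of a.'''
    count = 0
    for a, b in zip(order, order[1:]):
        if b in successors[a]:
            count += 1
    return count
</pre>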
<hr/>Copyright &copy; 2010 <strong><a href="http://www.smallshire.org.uk/sufficientlysmall">Sufficiently Small</a></strong>. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator, the site you are looking at is guilty of copyright infringement. Please contact legal@smallshire.org.uk so we can take legal action immediately.<br/><span style="float: right;font-size: 7pt"><a href="http://blog.taragana.com/index.php/archive/wordpress-plugins-provided-by-taraganacom/">Plugin</a> by <a href="http://www.taragana.com/">Taragana</a></span>]]></content:encoded>
			<wfw:commentRss>http://www.smallshire.org.uk/sufficientlysmall/2010/02/14/control-flow-graph-linearisation-in-owl-basic/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>IronPython 2.0 and Jython 2.5 performance compared to Python 2.5</title>
		<link>http://www.smallshire.org.uk/sufficientlysmall/2009/05/22/ironpython-2-0-and-jython-2-5-performance-compared-to-python-2-5/</link>
		<comments>http://www.smallshire.org.uk/sufficientlysmall/2009/05/22/ironpython-2-0-and-jython-2-5-performance-compared-to-python-2-5/#comments</comments>
		<pubDate>Fri, 22 May 2009 11:34:28 +0000</pubDate>
		<dc:creator>Robert Smallshire</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[IronPython]]></category>
		<category><![CDATA[Jython]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[computing]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[BBCBASIC]]></category>
		<category><![CDATA[CLR]]></category>
		<category><![CDATA[dotNET]]></category>

		<guid isPermaLink="false">http://www.smallshire.org.uk/sufficientlysmall/?p=118</guid>
		<description><![CDATA[IronPython 2.0 can be hundreds of times slower than CPython on some microbenchmarks.  Jython 2.5 can scale better than CPython on those same benchmarks.]]></description>
			<content:encoded><![CDATA[<p>My <a href="http://www.smallshire.org.uk/sufficientlysmall/2009/05/17/the-performance-of-python-jython-and-ironpython/">previous post</a> covering the performance problems I&#8217;ve been experiencing with IronPython raised some questions about whether the low performance was an effect peculiar to my system, or to my program, the <a href="http://www.smallshire.org.uk/sufficientlysmall/2007/06/10/writing-a-bbc-basic-compiler-for-the-clr/">OWL BASIC</a> compiler, where the problem was first noticed. To briefly recap, I&#8217;d determined that IronPython was around 100× slower than CPython on the same program.</p>
<p>Since then, I&#8217;ve had time to reproduce the results with a small and completely unremarkable Python program, and also to run the tests on a different system. I had suspected that in the OWL BASIC compiler, my Python visitor implementation, which is used in applying transformations to the abstract syntax tree, was to blame. I set about condensing a tree visitor down to a small example, but I never got that far.  It is sufficient to simply <i>build</i> a large binary tree to demonstrate the dramatic differences in the performance characteristics of the three main Python implementations.</p>
<h2>The benchmark</h2>
<p>Here is that test program, which just builds a simple binary tree of objects to the requested depth.</p>
<pre class="brush: python;">
import sys

class Node(object):
    # Class-level count of every Node constructed so far.
    counter = 0

    def __init__(self, children):
        Node.counter += 1
        self._children = children

def make_tree(depth):
    # Recursively build a full binary tree; a depth of one is a single leaf.
    if depth &gt; 1:
        return Node([make_tree(depth - 1), make_tree(depth - 1)])
    else:
        return Node([])

def main(argv=None):
    if argv is None:
        argv = sys.argv
    # The tree depth is the only command line argument, defaulting to ten.
    depth = int(argv[1]) if len(argv) &gt; 1 else 10

    root = make_tree(depth)
    print Node.counter
    return 0

if __name__ == '__main__':
    sys.exit(main())
</pre>
<p>The program builds a binary tree to the depth supplied as the only command line argument, or ten if one is not supplied. It counts the number of nodes as they are built. Remember that the merits or otherwise of this program are not the point! The point is the performance difference between the Python implementations when it is run.</p>
<p>My benchmarking approach has been to run this script five times for each tree depth from a depth of one, upwards to 22, or until my patience was exhausted.  I&#8217;ve taken the minimum time from each run of five. Since there is a non-linear relationship between the depth of the tree and the number of nodes contained therein, logarithmic axes are used in all the charts that follow.</p>
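<p>The measurement procedure described above could be automated along the following lines.  This is a sketch rather than the harness actually used for these results: the function names, the use of wall-clock time around a child process, and the interpreter and script names in the usage note are all assumptions.  It also records the node count for each depth, which for this program is 2<sup>depth</sup> &#8722; 1.</p>
<pre class="brush: python;">
import subprocess
import time


def node_count(depth):
    # A full binary tree of the given depth contains 2**depth - 1 nodes.
    return 2 ** depth - 1


def best_time(command, repeats=5):
    '''Run command repeats times; return the minimum elapsed seconds.'''
    timings = []
    for _ in range(repeats):
        start = time.time()
        subprocess.call(command)
        timings.append(time.time() - start)
    return min(timings)


def benchmark(interpreter, script, max_depth=22):
    '''Return a list of (depth, nodes, seconds) for depths 1 to max_depth.'''
    results = []
    for depth in range(1, max_depth + 1):
        seconds = best_time([interpreter, script, str(depth)])
        results.append((depth, node_count(depth), seconds))
    return results
</pre>
<p>With the test program saved under the hypothetical name <code>tree.py</code>, a call such as <code>benchmark('python', 'tree.py')</code> would produce one series; repeating it with the IronPython and Jython launchers gives the other two.  Timings taken this way include virtual machine start-up.</p>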
<h2>64 bit Windows Vista x64</h2>
<p>Here are the results for the first test machine &#8211; with dual quad-core 1.86 GHz Xeons with 4 GB RAM running Vista x64, testing IronPython 2.0.0.0 on .NET 2.0, Jython 2.5rc2 on Java Hotspot 1.6.0 and Python 2.5.2.</p>
<div class="wp-caption alignnone" style="width: 610px"><img alt="Create time for a binary tree including Python virtual machine startup on Windows Vista x64 with 1.86 GHz Xeon processors." src="/sufficientlysmall/wp-content/ipy_performance/tree_x64_inclusive.png" title="Binary tree creation on x64" width="600" height="450" /><p class="wp-caption-text">Figure 1. Creation time for a binary tree including Python virtual machine startup on Windows Vista x64 with 1.86 GHz Xeon processors.</p></div>
<p>In Figure 1 we see that above 1000 nodes or so (tree depth of 10) performance for IronPython begins to degrade rapidly. CPython holds out for another two orders of magnitude before the significant costs begin to be felt. It&#8217;s interesting to see that although Jython is in the middle of the pack, it scales much better than CPython, surpassing it at around half-a-million nodes (tree depth of 19).</p>
<p>In my application &#8212; a compiler &#8212; virtual machine (VM) start-up time is important; however, in many long-running applications this is not the case, so it is interesting to subtract VM start-up time from each series, which we see in Figure 2, below.</p>
<div class="wp-caption alignnone" style="width: 610px"><img alt="By subtracting VM start-up time, we get a picture more interesting for long-running processes." src="/sufficientlysmall/wp-content/ipy_performance/tree_x64_exclusive.png" title="Execution time excluding VM start-up, on Vista x64 with 1.86 GHz Xeon processors" width="600" height="450" /><p class="wp-caption-text">Figure 2. By subtracting VM start-up time, we get a picture more interesting for long-running processes.</p></div>
<p>Below 100 tree nodes, there is a lot of noise in these measurements. Above 100 nodes it&#8217;s easy to see that the blue IronPython curve is at least two chart divisions above the red CPython curve: that&#8217;s two orders of magnitude or 100× slower, and getting relatively worse as the size of the tree increases.</p>
<h2>32 bit Windows XP x86</h2>
<p>Responses to my earlier article suggested that trying IronPython 2.0.1 with Ngen&#8217;ed binaries on x86 may make a difference.  Well, to cut a long story short, it doesn&#8217;t, but here are the details.  These tests were run on a 900 MHz Pentium M Centrino laptop with 768 MB RAM, and so cannot be directly compared with those above, although it&#8217;s notable that a one-year-old workstation is only twice as fast as a five-year-old laptop.  Moore&#8217;s law certainly isn&#8217;t delivering here!</p>
<div class="wp-caption alignnone" style="width: 610px"><img alt="The performance profiles are very similar with IronPython 2.0.1 on x86." src="/sufficientlysmall/wp-content/ipy_performance/tree_x86_exclusive.png" title="Performance for building a binary tree on a 900 MHz Pentium M." width="600" height="450" /><p class="wp-caption-text">The performance profiles are very similar with IronPython 2.0.1 on x86.</p></div>
<p>On x86, IronPython is still 100× slower than CPython, and Jython still scales better.  It seems the essence of this benchmark is not dependent on the hardware or CLR platform on which it is run.</p>
<p>I&#8217;ll close by re-presenting the data in the x86 benchmarks as multiples of CPython performance, because it dramatically demonstrates the different responses to the scale of the problem size for IronPython and Jython. Again we see Jython catching up with CPython at a tree depth of 19, just as we saw on x64, and IronPython delivering performance 6000× worse than CPython at a tree depth of 15. A tree of this size, with around thirty thousand nodes, is very similar in scale to the AST sizes found in the OWL BASIC compiler during compilation of large programs.</p>
<div class="wp-caption alignnone" style="width: 610px"><img alt="Performance of IronPython and Jython as multiples of CPython performance." src="/sufficientlysmall/wp-content/ipy_performance/tree_x86_relative.png" title="Performance of IronPython and Jython as multiples of CPython performance." width="600" height="450" /><p class="wp-caption-text">Performance of IronPython and Jython as multiples of CPython performance.</p></div>
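<p>The two derived views used in these charts, times with VM start-up removed and times expressed as multiples of CPython, amount to simple arithmetic on the raw series.  The sketch below shows one way to compute them; the function names are hypothetical, and the start-up time is assumed to have been measured separately, since how it was obtained is not described here.</p>
<pre class="brush: python;">
def exclusive_times(inclusive, startup):
    '''Remove a fixed start-up time from each timing, clamping at zero.'''
    return [max(t - startup, 0.0) for t in inclusive]


def slowdown(times, baseline):
    '''Express each time as a multiple of the baseline at the same depth.

    Entries where the baseline is zero are reported as None.
    '''
    return [float(t) / b if b else None for t, b in zip(times, baseline)]
</pre>
<p>Clamping at zero matters for the smallest trees, where measurement noise can make a run appear faster than the start-up time itself.</p>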
<h2>Conclusions</h2>
<ul>
<li>
IronPython can be <strong>very</strong> slow, even on programs in the microbenchmark category, which are doing standard operations such as building trees. Presumably there are still significant optimizations to be made in IronPython to bring its performance closer to that of the other Python implementations.  Hopefully, this example and the measurements can contribute to that improvement.
</li>
<li>
Jython may scale better than Python if your application exercises Python in similar ways to this benchmark.  Speculatively, that <i>could</i> have implications for projects such as <a href="http://www.scons.org/">SCons</a>, which build large in-memory graphs.
</li>
<li>I suppose if nothing else we have demonstrated in passing that Java <i>can</i> be faster than C for some non-trivial programs (like a Python interpreter) running a trivial program, like this benchmark.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.smallshire.org.uk/sufficientlysmall/2009/05/22/ironpython-2-0-and-jython-2-5-performance-compared-to-python-2-5/feed/</wfw:commentRss>
		<slash:comments>29</slash:comments>
		</item>
	</channel>
</rss>
