Archive

Archive for the ‘IronPython’ Category

Specification of rich comparison protocol use by the Python standard library

March 12th, 2011 Robert Smallshire No comments

I’m in the process of writing some code which ideally would work unchanged on CPython 2.x, CPython 3.x IronPython 2.x and Jython 2.x. Given the nature of the code I’m writing, that seems eminently possible with very few workarounds.

One of the changes between Python 2 and Python 3 was the removal of the cmp() function and its corresponding __cmp__() special method. Python 3 requires the use of the rich comparison operators __lt__(), __gt__(), __ge__(), __le__(), __eq__() and __ne__(). This change is well publicised and is fine so far as it goes; it’s usually no problem to modify the code accordingly. However, the same document goes on to say,

Use __lt__() for sorting, __eq__() with __hash__(), and other rich comparisons as needed.

This is clear guidance, if not a specification, that sort routines will use __lt__() for sorting. We don’t have to look much further for a statement closer to a specification though. In the Python Sorting HOWTO we can find the following clear declaration:

The sort routines are guaranteed to use __lt__() when making comparisons between two objects. So, it is easy to add a standard sort order to a class by defining an __lt__() method.

Unfortunately, this fact is not mentioned in the documentation for list.sort() or the sorted() built-in, so it’s difficult to find. Even more unfortunately for me, the documentation for the standard library heapq module does not specify at all what protocols it requires of the objects it will be ordering.

Now all of this became an issue for me today when I implemented an important optimisation which required that I implement an incremental (i.e. lazy) partial sort rather than a complete eager sort of a sequence. Partial sorts can be efficiently implemented using heaps – hence my encounter with the heapq module. My migration from sort() to heapq worked fine on CPython 2.x and CPython 3.x but when I ran my tests with IronPython I got failures for some sorting results.

Soon I determined that the problem was that, as per the documentation for sorting, I’d implemented only __lt__() and __eq__() rather than the full set of six rich comparison operators. This worked fine with my original code and with my heapq based sort in CPython because those sort implementation do indeed only call __lt__() and __eq__(). However, the IronPython implementation of heapq also calls __gt__() and so failed. Note that IronPython hasn’t done anything wrong here – it’s following the letter of the canonical CPython documentation, if not the spirit. I’m not sure it’s fair to consider this a bug in IronPython – rather it exposes a small but important weakness in the CPython documentation.

There are perhaps two morals to this story: The first is to always implement all or nothing when in comes to the rich comparison operators just like your mother taught you. The second is that we’d avoid these sort of difficulties if the Python documentation was a little more explicit about what protocols are expected of objects by its algorithms and containers.

The existence of multiple Python implementations has been extremely effective at defining what it means to be Python as opposed to CPython. Many inconsistencies such as the one noted above have been flushed out over the years. For this reason alone I’m hesitant to suggest that the folks porting the standard library from CPython to IronPython or Jython should actually read the CPython implementations to ensure compatibility. I think it’s much preferable that such things are reimplemented from the documentation and test suite with consequential improvements for both.

Categories: computing, IronPython, Python, software Tags:

OWL BASIC produces its first executable

August 4th, 2009 Robert Smallshire 6 comments

After a long haul, and diversions into other more important projects — including starting a family — OWL BASIC today produced its first executable. Its not much. In fact its hardly anything. Just 2048 bytes of Windows PE executable containing the global variable declarations from Acornsoft’s 1982 Sphinx Adventure. Each file of BASIC source code will be converted to a single .NET static class, with the global variables as private static fields.

The first executable produced from OWL BASIC.

The first executable produced from OWL BASIC.

Above you can see the executable loaded up into .NET Reflector, which can be used to introspect the executable, and in this case attempt to disassemble it into C#. Now we see what makes .NET such a great platform for compiler construction; below is the IronPython source code for the embryonic assembly generation function. It clocks in at fewer than ten lines of code to create an assembly, create a module, create a class, add one private static field to it for each global variable, and save the result as an .exe.

def generateAssembly(name, global_symbols):
    domain = Thread.GetDomain()
    assembly_name = AssemblyName(name)
    assembly_builder = domain.DefineDynamicAssembly(assembly_name, AssemblyBuilderAccess.RunAndSave)
    module_builder = assembly_builder.DefineDynamicModule(name + ".exe")
    type_builder = module_builder.DefineType(name, TypeAttributes.Class | TypeAttributes.Public, object().GetType())

    # Add global variables to the class
    for symbol in global_symbols.symbols.values():
        field_builder = type_builder.DefineField(symbol.name, ctsType(symbol),
                                                 FieldAttributes.Private | FieldAttributes.Static)

    result = type_builder.CreateType()
    assembly_builder.Save(name + ".exe")

where global_symbols is the global symbol table constructed during traversal of the Abstract Syntax Tree and the Control Flow Graph and the ctsType function maps OWL BASIC types to their equivalent Common Type System types for .NET. Everything else is provided by Reflection.Emit and other parts of .NET.

Its interesting that no validation was applied to the variable names supplied to Reflection.Emit. As you can see, the variable names still include the sigil suffixes for variable typing (e.g. $ for string) and Reflector happily dissassembles these into invalid C# identifiers. For the final version these names will need to be mangled (Hungarian notation?), or merely de-sigiled if no conflicts result, for compatibility with other .NET languages and tools.

Categories: .NET, computing, IronPython, OWL BASIC, Python Tags:

String compatibility between Python implementations

June 18th, 2009 Robert Smallshire 3 comments

Jython and IronPython run on platforms where strings are unicode capable by default. Both implementations have chosen to make str essentially an alias for unicode in Python source code. The bytes type, introduced in PEP358 as part of transition to fully unicode Python 3.0, is unambiguously a sequence of single byte values. We can see in the table below that Jython and IronPython are caught between what is on the one hand most practical for interopability with existing code and their host platforms, and on the other hand the Right Thing as delivered by Python 3.0.

Jython 2.5 IronPython 2.6 CPython 2.6 CPython 3.0
str multibyte multibyte byte multibyte
unicode multibyte multibyte multibyte multibyte
bytes byte byte byte byte

It seems clear that if you need to write code that is portable between the different Python implementations you should steer clear str and use bytes and unicode to unambigiously express your intent.

Of course, this is impossible since the Python Standard Library is littered with uses of str. For example, in IronPython pickle.dumps() returns str just like Python 2.6 but the str is actually has multibyte storage. IronPython hides this well, but the abstraction can leak, resulting in much confusion. Again Python 3.0 does what is right, and pickle.dumps() returns a bytes instance.

These difficulties are most likely to occur when interfacing with native Java or .NET APIs that expect byte arrays, for example when pickling to database blobs.

In Jython an str instance can be converted to a Java byte array as follows.

>>> import jarray
>>> a = jarray.array("This is  string", 'b')
>>> a
array('b', [84, 104, 105, 115, 32, 105, 115, 32, 32, 115, 116, 114, 105, 110, 103])

The equivalent in IronPython, as provided by Michael Foord, being,

>>> from System import Array, Byte
>>> a = Array[Byte](tuple(Byte(ord(c)) for c in "This is a string"))
>>> a
Array[Byte]((<System.Byte object at 0x000000000000002B [84]>, <System.Byte object at 0x000000000000002C [104]>, <System.Byte object at 0x000000000000002D [105]>, <System.Byte object at 0x000000000000002E [115]>, <System.Byte object at 0x000000000000002F [32]>, <System.Byte object at 0x0000000000000030 [105]>, <System.Byte object at 0x0000000000000031 [115]>, <System.Byte object at 0x0000000000000032 [32]>, <System.Byte object at 0x0000000000000033 [97]>, <System.Byte object at 0x0000000000000034 [32]>, <System.Byte object at 0x0000000000000035 [115]>, <System.Byte object at 0x0000000000000036 [116]>, <System.Byte object at 0x0000000000000037 [114]>, <System.Byte object at 0x0000000000000038 [105]>, <System.Byte object at 0x0000000000000039 [110]>, <System.Byte object at 0x000000000000003A [103]>))

Going back we can use identical code in IronPython and Jython.

>>> s = ''.join(chr(c) for c in a)
>>> s
'This is a string'