
String compatibility between Python implementations


Jython and IronPython run on platforms where strings are Unicode-capable by default. Both implementations have chosen to make str essentially an alias for unicode in Python source code. The bytes type, introduced in PEP 358 as part of the transition to a fully Unicode Python 3.0, is unambiguously a sequence of single-byte values. We can see in the table below that Jython and IronPython are caught between, on the one hand, what is most practical for interoperability with existing code and with their host platforms, and, on the other hand, the Right Thing as delivered by Python 3.0.

Type      Jython 2.5   IronPython 2.6   CPython 2.6   CPython 3.0
str       multibyte    multibyte        byte          multibyte
unicode   multibyte    multibyte        multibyte     multibyte
bytes     byte         byte             byte          byte
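
For comparison, here is a minimal sketch of the CPython 3.0 column in practice, assuming a CPython 3.x interpreter: str holds Unicode code points, bytes holds integers in range(256), and every move between the two names an encoding explicitly.

```python
# Minimal sketch of the str/bytes split, assuming CPython 3.x
# (in CPython 2.6, bytes is merely an alias for str).

text = "caf\u00e9"             # str: a sequence of Unicode code points
data = text.encode("utf-8")    # bytes: a sequence of integers in range(256)

print(type(text).__name__, len(text))   # str 4
print(type(data).__name__, len(data))   # bytes 5 (U+00E9 takes two bytes in UTF-8)
print(list(data))                       # [99, 97, 102, 195, 169]
print(data.decode("utf-8") == text)     # True
```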

It seems clear that if you need to write code that is portable between the different Python implementations, you should steer clear of str and use bytes and unicode to express your intent unambiguously.
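
One way to express that intent is to route every conversion through a pair of small helpers. The names to_bytes and to_text are hypothetical (they are not from any library), UTF-8 is an assumed default encoding, and the sketch targets CPython 3.x.

```python
# Hypothetical helpers (not part of any library) that make the text/bytes
# boundary explicit instead of relying on whatever "str" means locally.

def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding text with an explicit codec."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)          # already binary; normalise to bytes
    return value.encode(encoding)    # text: encode explicitly


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding binary data with an explicit codec."""
    if isinstance(value, (bytes, bytearray)):
        return value.decode(encoding)
    return value                     # already text


payload = to_bytes("This is a string")
print(payload)            # b'This is a string'
print(to_text(payload))   # This is a string
```

Keeping the encoding as a visible parameter means a reader never has to guess which str semantics a given interpreter applies.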

Of course, this is impossible, since the Python Standard Library is littered with uses of str. For example, in IronPython pickle.dumps() returns str just as CPython 2.6 does, but that str actually has multibyte storage. IronPython hides this well, but the abstraction can leak, resulting in much confusion. Again, Python 3.0 does the right thing: pickle.dumps() returns a bytes instance.
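
A quick check of the Python 3 behaviour, assuming CPython 3.x; the record being pickled is an arbitrary example.

```python
import pickle

record = {"name": "widget", "sizes": [1, 2, 3]}

# In CPython 3.x, pickle.dumps() returns bytes, never str.
blob = pickle.dumps(record)
print(type(blob).__name__)   # bytes

# Round-tripping through loads() restores an equal object.
restored = pickle.loads(blob)
print(restored == record)    # True
```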

These difficulties are most likely to occur when interfacing with native Java or .NET APIs that expect byte arrays, for example when pickling to database blobs.
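
As a sketch of the blob case, the example below stores a pickle in a BLOB column using the standard sqlite3 module. SQLite, the table name objects and its columns are assumptions standing in for whatever database and Java or .NET driver you actually use; the point is only that the value handed to the driver is bytes, so nothing has to guess how a str should be laid out.

```python
import pickle
import sqlite3

# In-memory database standing in for a real blob store (an assumption;
# the post does not name a database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, data BLOB)")

obj = {"name": "widget", "sizes": [1, 2, 3]}
blob = pickle.dumps(obj)   # bytes in CPython 3.x

# A bytes parameter is bound as a SQLite BLOB.
conn.execute("INSERT INTO objects (id, data) VALUES (?, ?)", (1, blob))

(stored,) = conn.execute("SELECT data FROM objects WHERE id = 1").fetchone()
print(type(stored).__name__)         # bytes
print(pickle.loads(stored) == obj)   # True
conn.close()
```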

In Jython, a str instance can be converted to a Java byte array as follows:

>>> import jarray
>>> a = jarray.array("This is a string", 'b')
>>> a
array('b', [84, 104, 105, 115, 32, 105, 115, 32, 97, 32, 115, 116, 114, 105, 110, 103])
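
For comparison, CPython's standard array module can build the same signed-byte ('b') array, provided the text is encoded to bytes first. This sketch assumes CPython 3.x and ASCII input; whether such an array can be handed directly to a Java API is a Jython question it does not address.

```python
from array import array

text = "This is a string"

# Encode explicitly, then build a signed-byte ('b') array, mirroring the
# jarray.array(..., 'b') call above. ASCII is assumed, as in the example.
a = array("b", text.encode("ascii"))
print(a.tolist())
# [84, 104, 105, 115, 32, 105, 115, 32, 97, 32, 115, 116, 114, 105, 110, 103]
```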

The equivalent in IronPython, as provided by Michael Foord, is:

>>> from System import Array, Byte
>>> a = Array[Byte](tuple(Byte(ord(c)) for c in "This is a string"))
>>> a
Array[Byte]((<System.Byte object at 0x000000000000002B [84]>, <System.Byte object at 0x000000000000002C [104]>, <System.Byte object at 0x000000000000002D [105]>, <System.Byte object at 0x000000000000002E [115]>, <System.Byte object at 0x000000000000002F [32]>, <System.Byte object at 0x0000000000000030 [105]>, <System.Byte object at 0x0000000000000031 [115]>, <System.Byte object at 0x0000000000000032 [32]>, <System.Byte object at 0x0000000000000033 [97]>, <System.Byte object at 0x0000000000000034 [32]>, <System.Byte object at 0x0000000000000035 [115]>, <System.Byte object at 0x0000000000000036 [116]>, <System.Byte object at 0x0000000000000037 [114]>, <System.Byte object at 0x0000000000000038 [105]>, <System.Byte object at 0x0000000000000039 [110]>, <System.Byte object at 0x000000000000003A [103]>))

Going back, we can use identical code in IronPython and Jython:

>>> s = ''.join(chr(c) for c in a)
>>> s
'This is a string'
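
One caveat: Java's byte type is signed, so any byte of 0x80 or above comes back as a negative number, and chr() raises ValueError on it. Even with unsigned values, chr() produces one character per byte instead of decoding multibyte sequences. A minimal sketch, assuming CPython 3.x and UTF-8 data, masks each value with 0xFF and decodes explicitly; signed_bytes_to_text is a hypothetical helper name.

```python
from array import array

def signed_bytes_to_text(values, encoding="utf-8"):
    """Convert a sequence of signed byte values (-128..127) back to text.

    Bytes >= 0x80 arrive as negative numbers when stored signed, as Java
    does; masking with 0xFF maps them back to 0..255 before decoding.
    """
    return bytes(v & 0xFF for v in values).decode(encoding)

# "café" encoded as UTF-8 and held in a signed-byte array.
a = array("b", "caf\u00e9".encode("utf-8"))
print(a.tolist())                # [99, 97, 102, -61, -87]
print(signed_bytes_to_text(a))   # café
```
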
  1. June 18th, 2009 at 20:40 | #1

    Please note that str and unicode are no longer equivalent types in Jython 2.5. We changed this for compatibility with Python. Also, for Java integration, we now convert java.lang.String to/from unicode.

    However, we still use String internally to represent str, which requires 2 bytes per actual byte used. This will almost certainly be changed soon to byte[], perhaps as early as 2.5.1, so as to improve performance and reduce memory overhead.

  2. June 19th, 2009 at 10:48 | #2

    The repr for jarray.array (in your example) suggests that array.array('b') is compatible with jarray.array; in other words, that the arrays you get from the standard Python array module are in fact the same as native Java arrays. I was just thinking that would be a nice feature for interoperability.

    Do you know if that is indeed the case? Can one pass an array.array('b') to a Java function that expects a Java array?

  3. June 19th, 2009 at 18:31 | #3

    @David Jones:
    I’d recommend using the array module instead of jarray. The array module has two advantages: it’s standard Python, and it interoperates to/from Java array objects. Meanwhile, the jarray module is being phased out, if slowly.
