OWL BASIC produces its first executable
After a long haul, and diversions into other, more important projects (including starting a family), OWL BASIC today produced its first executable. It’s not much. In fact, it’s hardly anything: just 2048 bytes of Windows PE executable containing the global variable declarations from Acornsoft’s 1982 Sphinx Adventure. Each file of BASIC source code will be converted to a single .NET static class, with the global variables as private static fields.

The first executable produced from OWL BASIC.
Above you can see the executable loaded up into .NET Reflector, which can be used to introspect the executable, and in this case attempt to disassemble it into C#. Now we see what makes .NET such a great platform for compiler construction; below is the IronPython source code for the embryonic assembly generation function. It clocks in at fewer than ten lines of code to create an assembly, create a module, create a class, add one private static field to it for each global variable, and save the result as an .exe.
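from System.Threading import Thread
from System.Reflection import AssemblyName, TypeAttributes, FieldAttributes
from System.Reflection.Emit import AssemblyBuilderAccess

def generateAssembly(name, global_symbols):
    domain = Thread.GetDomain()
    assembly_name = AssemblyName(name)
    # RunAndSave: the assembly can be used in-process and also written to disk
    assembly_builder = domain.DefineDynamicAssembly(assembly_name, AssemblyBuilderAccess.RunAndSave)
    # Passing a file name makes the module persistable; the one-argument overload is transient
    module_builder = assembly_builder.DefineDynamicModule(name, name + ".exe")
    # One public class per BASIC source file, derived from System.Object
    type_builder = module_builder.DefineType(name, TypeAttributes.Class | TypeAttributes.Public, object().GetType())
    # Add global variables to the class; ctsType (defined elsewhere) maps OWL BASIC types to CTS types
    for symbol in global_symbols.symbols.values():
        type_builder.DefineField(symbol.name, ctsType(symbol),
                                 FieldAttributes.Private | FieldAttributes.Static)
    # Bake the type, then write the assembly out as an .exe
    type_builder.CreateType()
    assembly_builder.Save(name + ".exe")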
Here global_symbols is the global symbol table constructed while traversing the Abstract Syntax Tree and the Control Flow Graph, and the ctsType function maps OWL BASIC types to their equivalent .NET Common Type System (CTS) types. Everything else is provided by Reflection.Emit and other parts of .NET.
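The body of ctsType isn’t shown. As a rough, hypothetical sketch of what such a mapping might look like, the function below works from a variable’s name rather than a symbol object and assumes the classic BBC BASIC sigils (% for integer, $ for string, no suffix for real) plus the ~ reference sigil proposed for OWL BASIC. Choosing System.Int32 and System.Double is an assumption, and type names are returned as strings so the sketch runs in plain Python.

```python
# Hypothetical sigil -> CTS type-name table; the real ctsType may differ.
SIGIL_TO_CTS = {
    "%": "System.Int32",   # integer variables, e.g. room%
    "$": "System.String",  # string variables, e.g. name$
    "~": "System.Object",  # proposed OWL BASIC object reference, e.g. form~
}
DEFAULT_CTS = "System.Double"  # no sigil: a real (floating-point) variable


def cts_type_name(name):
    """Return the CTS type name for a BASIC variable, judged by its sigil suffix."""
    if name and name[-1] in SIGIL_TO_CTS:
        return SIGIL_TO_CTS[name[-1]]
    return DEFAULT_CTS


if __name__ == "__main__":
    for var in ["room%", "name$", "form~", "score"]:
        print(var, "->", cts_type_name(var))
```

Under IronPython the returned string could be turned into an actual type with System.Type.GetType; array variables such as A%() would need extra handling.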
It’s interesting that no validation was applied to the variable names supplied to Reflection.Emit. As you can see, the variable names still include the sigil suffixes that indicate variable type (e.g. $ for string), and Reflector happily disassembles these into invalid C# identifiers. For the final version these names will need to be mangled (Hungarian notation?), or merely de-sigiled if no conflicts result, for compatibility with other .NET languages and tools.
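One possible de-sigiling scheme, sketched under stated assumptions: drop the sigil whenever the bare stem is unique, and fall back to a Hungarian-style type prefix only for stems shared by several variables. The prefixes (int, str, obj, real) and the helper names are hypothetical, not OWL BASIC’s actual scheme.

```python
import re
from collections import Counter

SIGILS = "%$~"
# Hypothetical Hungarian-style prefixes, used only when stripping sigils would clash.
PREFIXES = {"%": "int", "$": "str", "~": "obj", "": "real"}


def split_sigil(name):
    """Split a BASIC variable name into (stem, sigil); the sigil may be empty."""
    if name and name[-1] in SIGILS:
        return name[:-1], name[-1]
    return name, ""


def to_identifier(text):
    """Replace characters that are not legal in a C# identifier with '_'."""
    ident = re.sub(r"[^A-Za-z0-9_]", "_", text)
    return "_" + ident if ident[:1].isdigit() else ident


def mangle_names(names):
    """Map each distinct BASIC name to a CLR-friendly field name.

    A sigil is simply dropped when its stem is unique; stems shared by several
    variables (e.g. A% and A$) get a type prefix instead.
    """
    stem_counts = Counter(split_sigil(n)[0] for n in names)
    mangled = {}
    for name in names:
        stem, sigil = split_sigil(name)
        if stem_counts[stem] == 1:
            mangled[name] = to_identifier(stem)
        else:
            mangled[name] = to_identifier(PREFIXES[sigil] + stem)
    return mangled
```

For example, mangle_names(["A%", "A$", "room%", "B"]) returns {"A%": "intA", "A$": "strA", "room%": "room", "B": "B"}. A prefixed name could still clash with a user variable that happens to be called intA, so a real implementation would need one more conflict check.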

If I may speculate about .NET (about which I know essentially nothing) based on my JVM knowledge (a tiny bit more than nothing): in the JVM ‘$’ is perfectly valid in an identifier, and Java the language accepts it too, but by convention it is left to generated code (the compiler uses it to mangle inner class names). Possibly something similar is going on in .NET? In any case, I agree it would be embarrassing to have identifiers that one could not access from C#.
Congrats by the way. The first executable is always a good milestone to reach.
Hi Rob,
Unless you have done so already, do you intend to implement some or all of the BBC BASIC for Windows features that are not available in ARM BBC BASIC 5 (and certainly not in 8-bit BBC BASIC II), such as structures, the ^ ‘address of’ operator, and the function form of the DIM keyword, DIM(), which, amongst other things, returns the size of an array or structure? And will it include an inline IA-32/x86 assembler?
Regards,
David.
@David Williams
Yes, the intent is to cover the new syntax introduced by BBC BASIC for Windows, and much of that is in place already. Furthermore, there will be small OWL BASIC-specific syntax extensions to improve interoperability with .NET. For example, there will be a new reference type which will hold object references, possibly using a tilde sigil, so A~ would be a reference to a .NET object. Also, FN and PROC will need to accept .NET class and namespace qualifiers to allow calls on .NET methods, such as PROC System.Console.WriteLine(A$), theta = FN Math.Tanh(angle) or PROCmethod(object~, arg1%, arg2%). You’ll also notice that a space is permitted between the keyword and the identifier name to aid readability. I’ve also done some work on how the BB4W notion of structs maps onto CLR value types (structs in C#), and I think fairly seamless interop is possible. Another difference is that strings will be Unicode.
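To make the proposed qualified call syntax concrete, here is a hypothetical regular-expression sketch that splits a PROC or FN call into its keyword, optional .NET qualifier, method name and arguments, allowing the optional space after the keyword. It illustrates the syntax described above; it is not OWL BASIC’s parser.

```python
import re

# Hypothetical grammar: PROC or FN, optional whitespace, a dotted name, optional (args).
CALL_RE = re.compile(
    r"^(PROC|FN)\s*"                        # keyword, optionally followed by spaces
    r"([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)"    # dotted name, e.g. System.Console.WriteLine
    r"\s*(?:\((.*)\))?$"                    # optional parenthesised argument list
)


def parse_call(text):
    """Split a PROC/FN call into (keyword, qualifier, method, args)."""
    match = CALL_RE.match(text.strip())
    if match is None:
        raise ValueError("not a PROC/FN call: " + text)
    keyword, dotted, args = match.groups()
    *qualifiers, method = dotted.split(".")
    qualifier = ".".join(qualifiers) or None   # None means a plain BASIC PROC/FN
    # Naive split: does not handle commas inside nested calls or string literals.
    arg_list = [a.strip() for a in args.split(",")] if args else []
    return keyword, qualifier, method, arg_list
```

For instance, parse_call("PROC System.Console.WriteLine(A$)") gives ("PROC", "System.Console", "WriteLine", ["A$"]), while PROCmethod(object~, arg1%, arg2%) yields a qualifier of None, marking an ordinary BASIC procedure.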
Some BBC BASIC constructs, notably BB4W pointer syntax and the indirection operators, may result in the generation of code which must be marked as ‘unsafe’ to the runtime. I’m finding it a fine line between implementing a language compiler and implementing a BBC Micro or RISC OS emulator. I’m trying to stay true to the language whilst still providing something that is in some sense useful and can run real programs. Right now I’m working on improving the type inferencing system to see if it’s possible to determine at compile time whether indirection operators index into a pre-allocated block such as DIM block% 1024, which is safe, or whether they just manipulate some system memory directly, which is unsafe. One of my test programs is Acornsoft’s Sphinx Adventure, which seems to store data in the BBC MOS Econet workspace. To support cases such as this, I think I’ll be providing a compile-time flag or pragma which fakes a BBC Micro 32 kB (or larger!) memory map, which is peanuts these days. On a related note, PAGE, HIMEM, TOP and LOMEM will probably return dummy but consistent values without much meaning, since I’m not reproducing the BBC BASIC stack or heap structures.
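A toy version of the safe/unsafe decision, assuming the offset is a compile-time constant and that DIM name% n reserves n + 1 bytes, as in BBC BASIC. Only the ? (byte) and ! (four-byte word) operators are covered; the function name and its string results are hypothetical, and real type inferencing would have to track values flowing through the whole program.

```python
# Width in bytes of the BBC BASIC indirection operators handled here:
# ? reads or writes one byte, ! a four-byte word.
WIDTHS = {"?": 1, "!": 4}


def classify_indirection(dim_blocks, base, op, offset):
    """Classify base<op>offset (e.g. block%!20) as "safe" or "unsafe".

    dim_blocks maps variable names to the size given in DIM name% size;
    offset is None when it is not a compile-time constant.
    """
    if base not in dim_blocks or offset is None:
        return "unsafe"              # unknown address: could be raw system memory
    reserved = dim_blocks[base] + 1  # DIM block% 1024 reserves bytes 0..1024
    in_bounds = 0 <= offset and offset + WIDTHS[op] <= reserved
    return "safe" if in_bounds else "unsafe"
```

With dim_blocks = {"block%": 1024}, a word access block%!1020 is classified as safe, block%!1022 as unsafe (it would run past byte 1024), and any access through an address such as &E00 that is not a known DIM block as unsafe.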
There definitely won’t be an inline assembler for native machine code; however, in the spirit of previous incarnations of BBC BASIC, an assembler for the host system will be provided, which in this case means CIL, the code of the CLR’s stack-based virtual machine. In practice, assembly will be delegated to the Microsoft or Mono tools for that purpose. That said, if you need to call into native code, I think I’ll have SYS do the equivalent of a CLR P/Invoke. I’ve already had the compiler running on architectures other than IA-32: it’s developed primarily on an amd64 computer, but I’ve also had it running on my 64-bit Sunblade SPARC under Mono! In theory, it should also run unchanged on the ARM or PowerPC CLR implementations…
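Since assembly would be handed off to the Microsoft or Mono tools, what a compiler (or a BASIC assembler block) would essentially produce is ILAsm source text. The sketch below is a hypothetical illustration of that: it emits ILAsm for a class shaped like the one generateAssembly builds, with each global as a private static field and an empty Main entry point. The ILAsm type keyword chosen for each sigil is an assumption.

```python
# Hypothetical sigil -> ILAsm type keyword table (an assumption for illustration).
IL_TYPES = {"%": "int32", "$": "string", "~": "object"}


def il_type(name):
    """ILAsm type keyword for a BASIC variable; no sigil means a real (float64)."""
    return IL_TYPES.get(name[-1:], "float64")


def emit_ilasm(class_name, global_names):
    """Return ILAsm source for a class holding each global as a private static field."""
    lines = [
        ".assembly extern mscorlib {}",
        ".assembly %s {}" % class_name,
        ".class public %s extends [mscorlib]System.Object" % class_name,
        "{",
    ]
    for name in global_names:
        # Single quotes let ILAsm accept sigils such as $ and % in field names.
        lines.append("  .field private static %s '%s'" % (il_type(name), name))
    lines += [
        "  .method public static void Main() cil managed",
        "  {",
        "    .entrypoint",
        "    ret",
        "  }",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(emit_ilasm("Sphinx", ["room%", "name$", "score"]))
```

Saved as Sphinx.il, the output could then be passed to ilasm; quoting the field names lets the sigils survive, much as Reflection.Emit accepted them without validation.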
I’m glad to hear that the BBC Basic/CLR project is going forward and that you have decided to give it a name, ‘Owl Basic’, which suggests you are not going to slavishly follow the road to exact compatibility with the ARM interpreter.
If I read the last post correctly Owl Basic will permit a space between the keyword ‘proc’ and the procedure name, i.e. def proc DoThis, rather than def proc_DoThis. This is a welcome change and has one very interesting consequence.
I use BB4W with an external editor, PSPad. This editor allows the use of user-produced syntax files and control over keyword colouring and font. As it operates on whole words, it cannot colour the keyword ‘proc’ in proc_DoThis. Allowing a space after proc will allow syntax colouring to be used.
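The whole-word point can be illustrated with a regular-expression word boundary, which is roughly how word-based highlighters pick out keywords (an assumption about PSPad’s internals, which aren’t described here). Because _ counts as a word character, proc_DoThis contains no standalone keyword, whereas proc DoThis does.

```python
import re

# \b marks a word boundary; letters, digits and "_" all count as word characters.
KEYWORD = re.compile(r"\bPROC\b", re.IGNORECASE)


def highlightable(line):
    """True if PROC appears as a whole word that a word-based highlighter could colour."""
    return KEYWORD.search(line) is not None


for line in ["DEF PROC_DoThis", "DEF PROCDoThis", "DEF PROC DoThis"]:
    print(line, "->", highlightable(line))
```

Run as written, this prints False for the first two lines and True for the third.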
PSPad allows an external compiler (or interpreter) to be called so it will only be a matter of pointing it at the Owl Basic compiler.
If you are willing to go a little further in dropping complete compatibility with the ARM interpreter, the way functions are defined and called would benefit from a bit of tidying up. In particular, an ‘endfunc’ and a specific keyword to return a value from the function would bring it closer to modern ways of doing things. It might mean changing ‘return’ for variables passed by reference to ‘ref’.
What’s the ‘C’ stand for?
‘Corporation’ surely? As in British Broadcasting Corporation BASIC.