RPython is not a Language: An introduction to the RPython language
The exact definition is “RPython is everything that our translation toolchain can accept” :)
The above quote is from the coding guidelines for RPython. RPython is not a typical language, in that it is not described by a syntax, but is defined by whether or not a tool chain can compile the code.
RPython is a subset of the Python language. That is, any RPython code can run in a Python interpreter. The difference is that you can compile RPython code, with the RPython tool-chain down to C code. So the advantage of RPython is speed after compilation, and the disadvantage is that you cannot use all of Python's features.
The borders and rules of RPython can be blurry, context dependent, and very difficult to understand. Teaching RPython by example can be very tricky as sometimes code will work, and sometimes it wont! Therefore, tutorials for RPython are very different from other languages, e.g. the vague rules expressed in the coding guidelines.
RPython is a language that is largely learnt by trial and error. I have only just scraped the surface, but I will post what I have learnt so far.
To demonstrate RPython I will give a few examples. The main code will be wrapped inside the main function, with other functions defined outside, i.e.:
#FUNCTIONS HERE def main(argv): #MAIN CODE HERE return 0 def target(*args): return main, None if __name__ == '__main__': import sys main(sys.argv)
To set up the RPython tool-chain to compile these examples, please look at my other post.
The first example shows the dependence on context the code can have:
#FUNCTIONS def add(x,y): return x + y #MAIN CODE print add(1,2)
This code is RPython code as it will compile and output the integer 3. However, if you add the code
add('Graham','Jenson'), it will fail to compile with a large stack trace including the error:
[translation:ERROR] In <FunctionGraph of (rtest:9)main at 0x103f7dc58>: [translation:ERROR] Happened at file rtest.py line 11 [translation:ERROR] [translation:ERROR] print add(1,2) [translation:ERROR] ==> print add('Graham','Jenson')
Now, if you delete the line
add(1,2) it will compile again. This is because RPython will generate statically typed functions, and if its input or return value are different at different parts of your code, then it will throw an error.
HINT 1: Remember the types of your functions
Many functions that are defined in the core Python may work in RPython, may not work, or may work with certain arguments. For example
'Graham Jenson'.split() is not RPython, but
'Graham Jenson'.split(' ') is.
HINT 2: Be careful when using previously defined functions
The addition to this hint is that many of Python's built-in functions that will not work, e.g.
The work-around to this is to redefine these functions yourself, there may be a RPython utility library somewhere with these defined.
HINT 3: Redefine built-in functions that you want to use
As an example, here is the function I use to read_lines from a file and return a list:
import os def read_lines(inputfile): f = os.open(inputfile, os.O_RDONLY, 0777) x = os.read(f,1) lines =  tmpstr = "" while x != '': if x == '\n': lines.append(tmpstr) tmpstr = "" else: tmpstr += x x = os.read(f,1) lines.append(tmpstr) f.close() return lines
Higher Order Functions
Lambdas in RPython seem not to work. However, you can still define and use higher-order functions (functions that take functions as parameters).
Here is an RPython example using a custom
#FUNCTIONS def map(fun,ls): nls =  for l in ls: nls.append(fun(l)) return nls def add_one(x): return x + 1 #MAIN CODE map(add_one, [1,2,3])
HINT 4: Don't use lambdas, use functions
Watch out when using operators as Python will attempt to call 'rich' methods that may not be defined e.g. for
== operator call the
#FUNCTION def hello() return 'hi' #MAIN CODE hello() == None
This will break with the error:
MissingRTypeOperation: unimplemented operation: 'eq' on (<CharRepr Char>, <NoneFrozenPBCRepr Void>)
You can fix it by replacing the
== is an
hello() is None.
Note: for some reason
'a' == None is RPython, does anyone know why.
HINT 5: Use explicit functions instead of operators
I found the biggest advantage to working in RPython is the fact that at any point you can execute your code in the python interpreter and see what happens. This means tools like the
assert statement can be used to help write valid Python code, but are ignored by the RPython compiler when optimisation is required.
My workflow quickly became
- Write and get it working in Python
- Compile with RPython tool-chain and fix any errors that occur
- Rinse and Repeat
HINT 6: Make it work in Python before worrying about making it work in RPython
SAT Solver: Case Study
Previously, I posted an article about RPython in which I compared the performance between interpreted python and compiled RPython when calculating Fibonacci numbers. I noted that this was a very basic experiment and further, more complicated examples were needed.
To compare Python against compiled RPython in a more complicated function I created an object oriented, conflict driven, learning SAT solver called SATRPy (SATrippy).
If you do not know what a Boolean Satisfiability (SAT) problem is, it is basically trying to answer the question
For a given Boolean formula, can you assign values to the variables to make the equation true?
If the answer is yes the problem is said to be satisfiable, if it is no then the problem is said to be unsatisfiable. A SAT solver is a function that tries to find if a problem is satisfiable or not. The SAT problem is most (in)famous as it is the first identified NP-Complete problem. I have studied SAT problems towards my research into Component Dependency Resolution and found the algorithms and heuristics around this domain very interesting.
RPython SAT Solver
I found it fun to implement a complex function using RPython . The lack of the dynamic aspects of Python can be worked around, and once you get used to not using them.
The biggest annoyance for me was the time it takes to compile the source. Generally, I would compile just to identify problems with the RPython code. This problem is a result of the lack of definition for the language. That is, to find out if my code is RPython, I have to ask RPython.
The most interesting RPython 'hack' I used to get SATRPy working involved the heap implementation I used. Previously when implementing a SAT solver I found the priority queue ( that is used to order literals in order to be chosen) to be incredibly slow. This was because I was using the built in priority queue that came in the Java standard library. To solve the issue I ended up borrowing the heap queue from SAT4J.
To get an efficient heap implementation I edited the heapq core implementation. First by removing the reference to the C implementation (I am going to compile the heap to C with RPython anyway). Then I changed the
cmp_lt function from
return (x < y) if hasattr(x, '__lt__') else (not y <= x)
to the RPython friendly
return x.heur() < y.heur()
heur() returns the value of the heuristic that is used to sort the items.
Experiment & Results
I ran my SAT solver against 200 (100 satisfiable, and 100 unsatisfiable) Uniform Random 3-SAT problems, each have 75 variables and 325 clauses. This set was taken from the SATLIB Benchmark suite.
In addition to testing my solver using both the pypy interpreter and after compiling with the RPython tool-chain, I tested it against the MiniSat solver.
To time the results I ran the solvers over all the files and timed them with the unix
time function. I used the
real time value as the measurement.
The results are:
pypy Interpreted Solver
- Satisiable problems: 259.177s
- Unsatisiable problems: 742.133s
Compiled RPython Solver
- Satisiable problems: 10.514s
- Unsatisiable problems: 59.729s
- Satisiable problems: 0.295s
- Unsatisiable problems: 0.441s
These results show that for satisfiable problems the Compiled RPython code is about 25 times faster than the interpreted code, and 12 times faster for the unsatisfiable problems. It also shows how much room for improvement there is for my algorithms and heuristics, as minisat was 35 times faster in satisfiable problems, and 150 times faster for unsatisfiable problems.
The more complicated the algorithm the more is gained by compiling to RPython. The simple calculation of Fibonacci numbers in is 12 times faster after compilation, but finding if a SAT problem is satisfiable is 25 times faster after compiled RPython code.
This experiment measured the gain of using RPython as opposed to interpreted python. Although we compared my solver to minisat, this is not a fair test as minisat uses many different optimised algorithms, heuristics, and data structures. A more interesting experiment would be to re-implement minisat directly into RPython. Then we could see what is lost when using RPython instead of coding directly in C.