Many business problems boil down to reading data from somewhere, transforming it, and writing it somewhere else.
We could implement the transformations in code, but non-technical users might like to see and edit the rules without having to deploy a new build.
Here’s an example rule:
- Read the value at
https://example.com/foo/bar/x
, refer to it asval
. - Return
10*(1.0/val)
.
Or, a more complicated rule:
- Read the value at
https://example.com/foo/bar/x
, refer to it asval_x
. - Read the value at
https://example.com/foo/bar/y
, refer to it asval_y
. - Return the average of
1.0/val_x
and1.0/val_y
.
The question is how to flatten this out into a format suitable for a CSV file, to allow users to easily update the rules using Excel.
One approach would be to implement an expression DSL, but this gets a bit painful when the input space is cells in a spreadsheet or CSV file. There are also questions about encoding the order of evaluation.
Reverse Polish notation is a simple way to encode arbitrary mathematical formulas in a flat sequence. Here’s how to write the first example:
CONSTANT 1
GET_DATA /foo/bar/x
DIV
CONSTANT 10
MUL
This sequence of operations will do the following:
- Push 1 onto the stack.
- Get data from the URL, push onto the stack.
- Perform the divide operation (1.0 divided by the value we got from the URL), and push that onto the stack.
- Push 10 onto the stack.
- Multiply the result from step 3 by 10.
We might use the following format in a CSV file. The ORDER column ensures the correct sequencing of operations.
ID | OUTPUT_ID | ORDER | OP | CONST | DATA_SOURCE |
---|---|---|---|---|---|
KEY001 | RULE001 | 0 | CONSTANT | 1 | |
KEY002 | RULE001 | 1 | GET_DATA | /foo/bar/x | |
KEY003 | RULE001 | 2 | DIV | ||
KEY004 | RULE001 | 3 | CONSTANT | 10 | |
KEY005 | RULE001 | 4 | MUL |
And here is how to encode the second example:
ID | OUTPUT_ID | ORDER | OP | CONST | DATA_SOURCE |
---|---|---|---|---|---|
KEY0200 | RULE002 | 0 | CONSTANT | 1 | |
KEY0201 | RULE002 | 1 | GET_DATA | /foo/bar/x | |
KEY0202 | RULE002 | 2 | DIV | ||
KEY0203 | RULE002 | 3 | CONSTANT | 1 | |
KEY0204 | RULE002 | 4 | GET_DATA | /foo/bar/y | |
KEY0205 | RULE002 | 5 | DIV | ||
KEY0206 | RULE002 | 6 | PLUS | ||
KEY0207 | RULE002 | 7 | CONSTANT | 2 | |
KEY0208 | RULE002 | 8 | DIV |
Evaluating a sequence of operations is straightforward. Start with an empty stack (in Python, this is just a normal list). If the next operation is CONSTANT or GET_DATA, push the value onto the stack. Otherwise, an operation like PLUS will need two operands, so pop two things off the stack and then do that actual operation. As a bonus, we can render a normal mathematical expression as we go: instead of putting a floating point number onto the stack, put a string onto the stack.
Here is the entire evaluator:
def eval_rule(rule):
s = []
expr = []
for (_, x) in rule.sort_values('ORDER').iterrows():
op = x['OP']
if op == 'GET_DATA':
s.append(x['GET_DATA'])
expr.append('(GET_DATA: ' + str(x['DATA_SOURCE']) + ')')
elif op == 'CONSTANT':
s.append(x['CONST'])
expr.append(str(x['CONST']))
elif op == 'MUL':
b = s.pop()
a = s.pop()
s.append(a*b)
b2 = expr.pop()
a2 = expr.pop()
expr.append('(' + a2 + '*' + b2 + ')')
elif op == 'PLUS':
b = s.pop()
a = s.pop()
s.append(a+b)
b2 = expr.pop()
a2 = expr.pop()
expr.append('(' + a2 + '+' + b2 + ')')
elif op == 'MINUS':
b = s.pop()
a = s.pop()
s.append(a-b)
b2 = expr.pop()
a2 = expr.pop()
expr.append('(' + a2 + '-' + b2 + ')')
elif op == 'DIV':
denominator = s.pop()
numerator = s.pop()
s.append(numerator/denominator)
denominator2 = expr.pop()
numerator2 = expr.pop()
expr.append('(' + numerator2 + '/' + denominator2 + ')')
else:
raise ValueError('Unknown operator: ' + op)
if len(s) != 1:
raise ValueError('Expected one item on the evaluation stack, but found: ' + str(s))
if len(expr) != 1:
raise ValueError('Expected one item on the expression stack, but found: ' + str(expr))
return s[0], expr[0]
Just for fun, we will also evaluate the example from the Wikipedia page on Reverse Polish notation:
ID | OUTPUT_ID | ORDER | OP | CONST | DATA_SOURCE |
---|---|---|---|---|---|
KEY0101 | RULE003 | 0 | CONSTANT | 15 | |
KEY0102 | RULE003 | 1 | CONSTANT | 7 | |
KEY0103 | RULE003 | 2 | CONSTANT | 1 | |
KEY0104 | RULE003 | 3 | CONSTANT | 1 | |
KEY0105 | RULE003 | 4 | PLUS | ||
KEY0106 | RULE003 | 5 | MINUS | ||
KEY0107 | RULE003 | 6 | DIV | ||
KEY0108 | RULE003 | 7 | CONSTANT | 3 | |
KEY0109 | RULE003 | 8 | MUL | ||
KEY0110 | RULE003 | 9 | CONSTANT | 2 | |
KEY0111 | RULE003 | 10 | CONSTANT | 1 | |
KEY0112 | RULE003 | 11 | CONSTANT | 1 | |
KEY0113 | RULE003 | 12 | PLUS | ||
KEY0114 | RULE003 | 13 | PLUS | ||
KEY0153 | RULE003 | 14 | MINUS |
Here’s the output:
$ python3 rpn.py
RULE001
((1.0/(GET_DATA: /foo/bar/x))*10.0)
0.23809523809523808
RULE002
(((1.0/(GET_DATA: /foo/bar/x))+(1.0/(GET_DATA: /foo/bar/y)))/2.0)
0.02857142857142857
RULE003
(((15.0/(7.0-(1.0+1.0)))*3.0)-(2.0+(1.0+1.0)))
5.0
Reverse Polish Notation gives us a compact way to represent a sequence of operators with no ambiguity about the order of evaluation.