An expression evaluator in CSV

Many business problems boil down to reading data from somewhere, transforming it, and writing it somewhere else.

We could implement the transformations in code, but non-technical users might like to see and edit the rules without having to deploy a new build.

Here’s an example rule:

  1. Read the value at http://example.com/foo/bar/x, refer to it as val.
  2. Return 10*(1.0/val).

Or, a more complicated rule:

  1. Read the value at http://example.com/foo/bar/x, refer to it as val_x.
  2. Read the value at http://example.com/foo/bar/y, refer to it as val_y.
  3. Return the average of 1.0/val_x and 1.0/val_y.

The question is how to flatten this out into a format suitable for a CSV file, to allow users to easily update the rules using Excel.

One approach would be to implement an expression DSL, but this gets a bit painful when the input space is cells in a spreadsheet or CSV file. There are also questions about encoding the order of evaluation.

Reverse Polish notation is a simple way to encode arbitrary mathematical formulas in a flat sequence. Here’s how to write the first example:

CONSTANT 1
GET_DATA /foo/bar/x
DIV
CONSTANT 10
MUL

This sequence of operations will do the following:

  1. Push 1 onto the stack.
  2. Get data from the URL, push onto the stack.
  3. Perform the divide operation (1.0 divided by the value we got from the URL), and push that onto the stack.
  4. Push 10 onto the stack.
  5. Multiply the result from step 3 by 10.

We might use the following format in a CSV file. The ORDER column ensures the correct sequencing of operations.

IDOUTPUT_IDORDEROPCONSTDATA_SOURCE
KEY0001RULE0010CONSTANT1 
KEY0002RULE0011GET_DATA /foo/bar/x
KEY0003RULE0012DIV  
KEY0004RULE0013CONSTANT10 
KEY0005RULE0014MUL  

And here is how to encode the second example:

IDOUTPUT_IDORDEROPCONSTDATA_SOURCE
KEY0200RULE0020CONSTANT1 
KEY0201RULE0021GET_DATA /foo/bar/x
KEY0202RULE0022DIV  
KEY0203RULE0023CONSTANT1 
KEY0204RULE0024GET_DATA /foo/bar/y
KEY0205RULE0025DIV  
KEY0206RULE0026PLUS  
KEY0207RULE0027CONSTANT2 
KEY0208RULE0028DIV  

Evaluating a sequence of operations is straightforward. Start with an empty stack (in Python, this is just a normal list). If the next operation is CONSTANT or GET_DATA, push the value onto the stack. Otherwise, an operation like PLUS will need two operands, so pop two things off the stack and then do that actual operation. As a bonus, we can render a normal mathematical expression as we go: instead of putting a floating point number onto the stack, put a string onto the stack.

Here is the entire evaluator:

def eval_rule(rule):
    s = []

    expr = []

    for (_, x) in rule.sort_values('ORDER').iterrows():
        op = x['OP']

        if op == 'GET_DATA':
            s.append(x['GET_DATA'])
            expr.append('(GET_DATA: ' + str(x['DATA_SOURCE']) + ')')

        elif op == 'CONSTANT':
            s.append(x['CONST'])
            expr.append(str(x['CONST']))

        elif op == 'MUL':
            b = s.pop()
            a = s.pop()
            s.append(a*b)

            b2 = expr.pop()
            a2 = expr.pop()
            expr.append('(' + a2 + '*' + b2 + ')')

        elif op == 'PLUS':
            b = s.pop()
            a = s.pop()
            s.append(a+b)

            b2 = expr.pop()
            a2 = expr.pop()
            expr.append('(' + a2 + '+' + b2 + ')')

        elif op == 'MINUS':
            b = s.pop()
            a = s.pop()
            s.append(a-b)

            b2 = expr.pop()
            a2 = expr.pop()
            expr.append('(' + a2 + '-' + b2 + ')')

        elif op == 'DIV':
            denominator = s.pop()
            numerator   = s.pop()
            s.append(numerator/denominator)

            denominator2 = expr.pop()
            numerator2   = expr.pop()
            expr.append('(' + numerator2 + '/' + denominator2 + ')')
        else:
            raise ValueError('Unknown operator: ' + op)

    if len(s) != 1:
        raise ValueError('Expected one item on the evaluation stack, but found: ' + str(s))

    if len(expr) != 1:
        raise ValueError('Expected one item on the expression stack, but found: ' + str(expr))

    return s[0], expr[0]

Just for fun, we will also evaluate the example from the Wikipedia page on Reverse Polish notation:

IDOUTPUT_IDORDEROPCONSTDATA_SOURCE
KEY0101RULE0030CONSTANT15 
KEY0102RULE0031CONSTANT7 
KEY0103RULE0032CONSTANT1 
KEY0104RULE0033CONSTANT1 
KEY0105RULE0034PLUS  
KEY0106RULE0035MINUS  
KEY0107RULE0036DIV  
KEY0108RULE0037CONSTANT3 
KEY0109RULE0038MUL  
KEY0110RULE0039CONSTANT2 
KEY0111RULE00310CONSTANT1 
KEY0112RULE00311CONSTANT1 
KEY0113RULE00312PLUS  
KEY0114RULE00313PLUS  
KEY0153RULE00314MINUS  

Here’s the output:

$ python3 rpn.py
RULE001
((1.0/(GET_DATA: /foo/bar/x))*10.0)
0.23809523809523808

RULE002
(((1.0/(GET_DATA: /foo/bar/x))+(1.0/(GET_DATA: /foo/bar/y)))/2.0)
0.02857142857142857

RULE003
(((15.0/(7.0-(1.0+1.0)))*3.0)-(2.0+(1.0+1.0)))
5.0

 

Reverse Polish Notation gives us a compact way to represent a sequence of operators with no ambiguity about the order of evaluation.

 

 

Leave a Reply