Many business problems boil down to reading data from somewhere, transforming it, and writing it somewhere else.

We could implement the transformations in code, but non-technical users might like to see and edit the rules without having to deploy a new build.

Here’s an example rule:

  1. Read the value at https://example.com/foo/bar/x, refer to it as val.
  2. Return 10*(1.0/val).

Or, a more complicated rule:

  1. Read the value at https://example.com/foo/bar/x, refer to it as val_x.
  2. Read the value at https://example.com/foo/bar/y, refer to it as val_y.
  3. Return the average of 1.0/val_x and 1.0/val_y.

The question is how to flatten this out into a format suitable for a CSV file, to allow users to easily update the rules using Excel.

One approach would be to implement an expression DSL, but this gets a bit painful when the input space is cells in a spreadsheet or CSV file. There are also questions about encoding the order of evaluation.

Reverse Polish notation is a simple way to encode arbitrary mathematical formulas in a flat sequence. Here’s how to write the first example:

CONSTANT 1
GET_DATA /foo/bar/x
DIV
CONSTANT 10
MUL

This sequence of operations will do the following:

  1. Push 1 onto the stack.
  2. Get data from the URL, push onto the stack.
  3. Perform the divide operation (1.0 divided by the value we got from the URL), and push that onto the stack.
  4. Push 10 onto the stack.
  5. Multiply the result from step 3 by 10.

We might use the following format in a CSV file. The ORDER column ensures the correct sequencing of operations.

IDOUTPUT_IDORDEROPCONSTDATA_SOURCE
KEY001RULE0010CONSTANT1
KEY002RULE0011GET_DATA/foo/bar/x
KEY003RULE0012DIV
KEY004RULE0013CONSTANT10
KEY005RULE0014MUL

And here is how to encode the second example:

IDOUTPUT_IDORDEROPCONSTDATA_SOURCE
KEY0200RULE0020CONSTANT1
KEY0201RULE0021GET_DATA/foo/bar/x
KEY0202RULE0022DIV
KEY0203RULE0023CONSTANT1
KEY0204RULE0024GET_DATA/foo/bar/y
KEY0205RULE0025DIV
KEY0206RULE0026PLUS
KEY0207RULE0027CONSTANT2
KEY0208RULE0028DIV

Evaluating a sequence of operations is straightforward. Start with an empty stack (in Python, this is just a normal list). If the next operation is CONSTANT or GET_DATA, push the value onto the stack. Otherwise, an operation like PLUS will need two operands, so pop two things off the stack and then do that actual operation. As a bonus, we can render a normal mathematical expression as we go: instead of putting a floating point number onto the stack, put a string onto the stack.

Here is the entire evaluator:

def eval_rule(rule):
    s = []

    expr = []

    for (_, x) in rule.sort_values('ORDER').iterrows():
        op = x['OP']

        if op == 'GET_DATA':
            s.append(x['GET_DATA'])
            expr.append('(GET_DATA: ' + str(x['DATA_SOURCE']) + ')')

        elif op == 'CONSTANT':
            s.append(x['CONST'])
            expr.append(str(x['CONST']))

        elif op == 'MUL':
            b = s.pop()
            a = s.pop()
            s.append(a*b)

            b2 = expr.pop()
            a2 = expr.pop()
            expr.append('(' + a2 + '*' + b2 + ')')

        elif op == 'PLUS':
            b = s.pop()
            a = s.pop()
            s.append(a+b)

            b2 = expr.pop()
            a2 = expr.pop()
            expr.append('(' + a2 + '+' + b2 + ')')

        elif op == 'MINUS':
            b = s.pop()
            a = s.pop()
            s.append(a-b)

            b2 = expr.pop()
            a2 = expr.pop()
            expr.append('(' + a2 + '-' + b2 + ')')

        elif op == 'DIV':
            denominator = s.pop()
            numerator   = s.pop()
            s.append(numerator/denominator)

            denominator2 = expr.pop()
            numerator2   = expr.pop()
            expr.append('(' + numerator2 + '/' + denominator2 + ')')
        else:
            raise ValueError('Unknown operator: ' + op)

    if len(s) != 1:
        raise ValueError('Expected one item on the evaluation stack, but found: ' + str(s))

    if len(expr) != 1:
        raise ValueError('Expected one item on the expression stack, but found: ' + str(expr))

    return s[0], expr[0]

Just for fun, we will also evaluate the example from the Wikipedia page on Reverse Polish notation:

IDOUTPUT_IDORDEROPCONSTDATA_SOURCE
KEY0101RULE0030CONSTANT15
KEY0102RULE0031CONSTANT7
KEY0103RULE0032CONSTANT1
KEY0104RULE0033CONSTANT1
KEY0105RULE0034PLUS
KEY0106RULE0035MINUS
KEY0107RULE0036DIV
KEY0108RULE0037CONSTANT3
KEY0109RULE0038MUL
KEY0110RULE0039CONSTANT2
KEY0111RULE00310CONSTANT1
KEY0112RULE00311CONSTANT1
KEY0113RULE00312PLUS
KEY0114RULE00313PLUS
KEY0153RULE00314MINUS

Here’s the output:

$ python3 rpn.py
RULE001
((1.0/(GET_DATA: /foo/bar/x))*10.0)
0.23809523809523808

RULE002
(((1.0/(GET_DATA: /foo/bar/x))+(1.0/(GET_DATA: /foo/bar/y)))/2.0)
0.02857142857142857

RULE003
(((15.0/(7.0-(1.0+1.0)))*3.0)-(2.0+(1.0+1.0)))
5.0

Reverse Polish Notation gives us a compact way to represent a sequence of operators with no ambiguity about the order of evaluation.