# An expression evaluator in CSV

Many business problems boil down to reading data from somewhere, transforming it, and writing it somewhere else.

We could implement the transformations in code, but non-technical users might like to see and edit the rules without having to deploy a new build.

Here’s an example rule:

- Read the value at
`<a href="http://example.com/foo/bar/x" rel="nofollow">http://example.com/foo/bar/x</a>`

, refer to it as`val`

. - Return
`10*(1.0/val)`

.

Or, a more complicated rule:

- Read the value at
`<a href="http://example.com/foo/bar/x" rel="nofollow">http://example.com/foo/bar/x</a>`

, refer to it as val_x. - Read the value at
`<a href="http://example.com/foo/bar/y" rel="nofollow">http://example.com/foo/bar/y</a>`

, refer to it as val_y. - Return the average of
`1.0/val_x`

and`1.0/val_y`

.

The question is how to flatten this out into a format suitable for a CSV file, to allow users to easily update the rules using Excel.

One approach would be to implement an expression DSL, but this gets a bit painful when the input space is cells in a spreadsheet or CSV file. There are also questions about encoding the order of evaluation.

Reverse Polish notation is a simple way to encode arbitrary mathematical formulas in a flat sequence. Here’s how to write the first example:

```
CONSTANT 1
GET_DATA /foo/bar/x
DIV
CONSTANT 10
MUL
```

This sequence of operations will do the following:

- Push 1 onto the stack.
- Get data from the URL, push onto the stack.
- Perform the divide operation (1.0 divided by the value we got from the URL), and push that onto the stack.
- Push 10 onto the stack.
- Multiply the result from step 3 by 10.

We might use the following format in a CSV file. The ORDER column ensures the correct sequencing of operations.

ID | OUTPUT_ID | ORDER | OP | CONST | DATA_SOURCE |

KEY0001 | RULE001 | CONSTANT | 1 | ||

KEY0002 | RULE001 | 1 | GET_DATA | /foo/bar/x | |

KEY0003 | RULE001 | 2 | DIV | ||

KEY0004 | RULE001 | 3 | CONSTANT | 10 | |

KEY0005 | RULE001 | 4 | MUL |

And here is how to encode the second example:

ID | OUTPUT_ID | ORDER | OP | CONST | DATA_SOURCE |

KEY0200 | RULE002 | CONSTANT | 1 | ||

KEY0201 | RULE002 | 1 | GET_DATA | /foo/bar/x | |

KEY0202 | RULE002 | 2 | DIV | ||

KEY0203 | RULE002 | 3 | CONSTANT | 1 | |

KEY0204 | RULE002 | 4 | GET_DATA | /foo/bar/y | |

KEY0205 | RULE002 | 5 | DIV | ||

KEY0206 | RULE002 | 6 | PLUS | ||

KEY0207 | RULE002 | 7 | CONSTANT | 2 | |

KEY0208 | RULE002 | 8 | DIV |

Evaluating a sequence of operations is straightforward. Start with an empty stack (in Python, this is just a normal list). If the next operation is CONSTANT or GET_DATA, push the value onto the stack. Otherwise, an operation like PLUS will need two operands, so pop two things off the stack and then do that actual operation. As a bonus, we can render a normal mathematical expression as we go: instead of putting a floating point number onto the stack, put a string onto the stack.

Here is the entire evaluator:

Just for fun, we will also evaluate the example from the Wikipedia page on Reverse Polish notation:

ID | OUTPUT_ID | ORDER | OP | CONST | DATA_SOURCE |

KEY0101 | RULE003 | CONSTANT | 15 | ||

KEY0102 | RULE003 | 1 | CONSTANT | 7 | |

KEY0103 | RULE003 | 2 | CONSTANT | 1 | |

KEY0104 | RULE003 | 3 | CONSTANT | 1 | |

KEY0105 | RULE003 | 4 | PLUS | ||

KEY0106 | RULE003 | 5 | MINUS | ||

KEY0107 | RULE003 | 6 | DIV | ||

KEY0108 | RULE003 | 7 | CONSTANT | 3 | |

KEY0109 | RULE003 | 8 | MUL | ||

KEY0110 | RULE003 | 9 | CONSTANT | 2 | |

KEY0111 | RULE003 | 10 | CONSTANT | 1 | |

KEY0112 | RULE003 | 11 | CONSTANT | 1 | |

KEY0113 | RULE003 | 12 | PLUS | ||

KEY0114 | RULE003 | 13 | PLUS | ||

KEY0153 | RULE003 | 14 | MINUS |

Here’s the output:

```
$ python3 rpn.py
RULE001
((1.0/(GET_DATA: /foo/bar/x))*10.0)
0.23809523809523808
RULE002
(((1.0/(GET_DATA: /foo/bar/x))+(1.0/(GET_DATA: /foo/bar/y)))/2.0)
0.02857142857142857
RULE003
(((15.0/(7.0-(1.0+1.0)))*3.0)-(2.0+(1.0+1.0)))
5.0
```

Reverse Polish Notation gives us a compact way to represent a sequence of operators with no ambiguity about the order of evaluation.