How to Build a Calculator
Calculator is a common tool for us to get the arithmetic expression’s result. In *NIX OS, once you type bc command, then you get into calculating environment. Feel free to input any legal arithmetic expressions.
Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.This is free software with ABSOLUTELY NO WARRANTY.For details type `warranty'.
Let’s dive into what’s going on when you type string(aka arithmetic expression) then it feeds back the result. If you’re familiar with Python, it provides a interactive console which you can type any Python’s statements, including the above arithmetic expression.
Python 2.7.15 (v2.7.15:ca079a3ea3, Apr 29 2018, 20:59:26)[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwinType "help", "copyright", "credits" or "license" for more information.>>> 1 * 2 + 3 - 4 / 23>>>
Now that they’re legal Python code, it must be underwent the full compile phases. Here are they:
Exactly right, they’re all stages for calculate the arithmetic expression. In this article we’ll complete above from scratch in C# language (also netstandard2.0). And you will learn from this article.
- Abstract syntax tree (aka: AST)
- Pratt parsing method
- How to evaluate system
After installing Visual Studio Code, you could install C# plugins to help you write the better code. Now you can run command dotnet new console CalculatorCLI to create a console application. And open the folder with visual studio code. At the beginning the project, we just complete the prototype program that we want.
It has nothing mysterious in above pieces of code. Read from the input then write it back. And what’s your job? The lines between #16 to #21 will filled by calculator.
Take simplicity into consideration, our calculator only supports integer number, excluding float or complex number. And supports add(+), minus(-), multiply(*), division(/) and power (^) operators. Parenthesizes to group expression is also supported.
Token is the basic concept of compile technology. We re-create the source codes into a sequence of tokens. The Lexer will take the responsibility for it. First of all, how many kinds of token are in our calculator?
Nothing strange, except EOF and ILLEGAL, they are just talked above section.
Token class is quite simple, including two properties. Literal including the token literal value in the source code.
The next step is to build our Lexer, which just only includes one feature: NextToken. Every time we call NextToken, it returns a Token instance to represent the current source code.
Before we move forward, we should expect what we’ll get. Here are unit tests for the Lexer implementation, just like Test Drive Development (TDD). Use dotnet new nunit CalculatorTest command to create NUnit test framework then dotnet add CalculatorTest.csproj reference CalculatorCLI.csproj。
It chooses table-driven unit test pattern. The Lexer takes in a string input and invokes NextToken once a time to generate a token. Compare this token with expect one.
The Lexer has only two fields: input and position. The position points to the current character in the input. According to the pointing character, it returns the corresponding Token. It is pretty straight, isn’t it?
AST(Abstract Syntax Tree) is primary concept in the compile, also equivalent in our calculator. As we all know, various operators have their precedence respectively. 1 + 2 * 3 equals 7 , rather than 9 , because of higher priority of multiply operator. Internally, how will our calculator represents this expression?
Above illustration tells what the AST looks like. 2 * 3 expression seems has lower position. So this expression should be calculated firstly, which is we expected. Beside the original priorities of operators, the parentheses could *leverage* their ones. For example, now 1 + 2 * 3 result is 9. Here is the AST.
Here is the question:
How can we build such a AST according to the given expression ?
Firstly, Expression is abstract entity in the AST. Every part of the AST, even the whole is considered as an Expression. Let take above illustration for example.
- Leaf with 1: it represents the an integer expression. Here, we call it IntegerLiteralExpression;
- Left subtree, including leaves 1 and 2 with operator +: It also can be considered as Expression and now it is InfixExpression.
- The whole tree: It is still an Expression, even an InfixExpression. But this time, its left branch is another InfixExpression and the right branch is IntegerLiteralExpression.
Beside above two kinds expression, there is additional expression called PrefixExpression. -1 is an example. We think prefix - as a prefix of the IntegerLiteralExpression.
Time to go, let’s define AST’s components.
Everything is ready, we’re going to PARSE our input expression into AST’s expression. So many kinds of parser in our toolbox and we choose Pratt method, which was first invented by Vaughan Pratt's paper Top Down Operator Precedence. However, this article doesn’t cover the topic about why this method works but just follow how it works.
As mentioned above, every operator has its own precedence.
Almost every token is related to one or more parse functions, which depends on the position of token. For example, if - exists in the head of expression, it is considered as a PrefixExpression. In any other cases, it is InfixExpression.
ParseExpression is the key method. It takes Precedences parameter. Firstly, it try to parse out a prefix expression. In most cases, it’s IntegerLiteralExpression or a minus prefix. Then it compares current precedence with peek token’s precedences. If less than, so it means the right infix operator has lower position in the AST. And the parsed expression should act as like left branch in this infix expression rather than previous expression component.
Thanks to well defined precedences and related parse methods by the token type, the while statement can go through until meets the an EOF.
Here is the unit test for parser.
Now that we use parser to get an expression to represent the AST. It quite nature to get the value of this AST and this is called evaluation phrase. There is only one kind of operand (integer) so we just need just one entity in the object system.
Evaluation progress is quite straightforward. It accepts the expression which is generated by the parser and uses corresponding methods to handle the expression. Because expression may be nested, so we call the eval method recursively.
Well done, we finally get what we want after a loooong journey. We now hold the Object.Integer object to represent the result of input arithmetic expression. Let’s go back to our Main method to finish left work.
In the Main method, we take in the user input, create a lexer, pass to Parser ‘s constructor, call parse method to get the AST expression, use Eval static method to get result, and print the result’s Inspect output.
We create a basic calculator from nothing but language basic data structure. Enjoy it.