Updated Application overview (markdown)

Bartłomiej Przemysław Pluta
2020-03-30 09:39:54 +02:00
parent 3ca8f7374e
commit bc1823e549

@@ -24,7 +24,7 @@ Apart from mentioned data, each token also includes some metadata, like location
You can check what tokens are produced for arbitrary input code using the `--tokens` flag, for example:
```
$ smnp --tokens --dry-run -c "[1, 2, 3] as i ^ println(\"Current: \" + i.toString());"
size: 21
current: 0 -> (open_square, »[«, 1:1)
all: [(open_square, »[«, 1:1), (integer, »1«, 1:2), (comma, »,«, 1:3), (integer, »2«, 1:5), (comma, »,«, 1:6), (integer, »3«, 1:8), (close_square, »]«, 1:9), (as, »as«, 1:11), (identifier, »i«, 1:14), (caret, »^«, 1:16), (identifier, »println«, 1:18), (open_paren, »(«, 1:25), (string, »"Current: "«, 1:26), (plus, »+«, 1:38), (identifier, »i«, 1:40), (dot, ».«, 1:41), (identifier, »toString«, 1:42), (open_paren, »(«, 1:50), (close_paren, »)«, 1:51), (close_paren, »)«, 1:52), (semicolon, »;«, 1:53)]
@@ -32,7 +32,7 @@ all: [(open_square, »[«, 1:1), (integer, »1«, 1:2), (comma, »,«, 1:3), (in
The tokenizer tries to match the input against all available patterns, following the first-match rule: if more than one pattern matches the input, only the first one is applied. This is why you can't, for example, name your variables or functions/methods after keywords. Take a look at the output of the following command:
```
$ smnp --tokens --dry-run -c "function = 14;"
size: 4
current: 0 -> (function, »function«, 1:1)
all: [(function, »function«, 1:1), (assign, »=«, 1:10), (integer, »14«, 1:12), (semicolon, »;«, 1:14)]
@@ -40,3 +40,107 @@ all: [(function, »function«, 1:1), (assign, »=«, 1:10), (integer, »14«, 1:
The first token has the type `function`, not `identifier`, which is what an assignment operation expects.
All tokenizer-related code is located in the `io.smnp.dsl.token` module.
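
The first-match rule can be illustrated with a minimal sketch. It is not taken from `io.smnp.dsl.token`: the names `SketchTokenType`, `SketchToken`, `sketchPatterns` and `sketchTokenize` are hypothetical, and the pattern list covers only the tokens needed for the `function = 14;` example. The point is just that the keyword pattern is listed before the identifier pattern, so the identifier pattern never gets a chance to claim the word `function`.

```kotlin
// Hypothetical names throughout; this is not SMNP's actual tokenizer code.
enum class SketchTokenType { FUNCTION, IDENTIFIER, ASSIGN, INTEGER, SEMICOLON }

data class SketchToken(val type: SketchTokenType, val text: String, val column: Int)

// Order matters: the keyword pattern comes before the identifier pattern.
val sketchPatterns: List<Pair<SketchTokenType, Regex>> = listOf(
    SketchTokenType.FUNCTION to Regex("function\\b"),
    SketchTokenType.IDENTIFIER to Regex("[A-Za-z_][A-Za-z0-9_]*"),
    SketchTokenType.ASSIGN to Regex("="),
    SketchTokenType.INTEGER to Regex("[0-9]+"),
    SketchTokenType.SEMICOLON to Regex(";")
)

fun sketchTokenize(input: String): List<SketchToken> {
    val tokens = mutableListOf<SketchToken>()
    var position = 0
    while (position < input.length) {
        // Skip whitespace between tokens.
        if (input[position].isWhitespace()) {
            position++
            continue
        }
        // First-match rule: stop at the first pattern that matches exactly here.
        var matched: SketchToken? = null
        for ((type, regex) in sketchPatterns) {
            val match = regex.find(input, position)
            if (match != null && match.range.first == position) {
                matched = SketchToken(type, match.value, position + 1)
                break
            }
        }
        val token = matched
            ?: throw IllegalArgumentException("Unknown character at column ${position + 1}")
        tokens += token
        position += token.text.length
    }
    return tokens
}
```

Under these assumptions, `sketchTokenize("function = 14;")` yields a `FUNCTION` token first, while an input such as `functions = 1;` starts with an `IDENTIFIER`, because the keyword pattern requires a word boundary after `function`.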
# Parser
The parser is the next stage of the code processing pipeline.
It takes the tokenizer's output and tries to compose a tree (called an
AST, which stands for **a**bstract **s**yntax **t**ree) based on known rules, which are called *productions*.
While the tokenizer defines the language's *alphabet*, i.e. the set
of available terminals, the parser defines the *grammar* of that language.
This means that the tokenizer can, for example, detect an unknown character
or sequence of characters, whereas the parser is able to detect unknown
constructions built from known tokens.
A good example is the last snippet from the [[Application overview#Tokenizer]] section:
```
$ smnp --tokens --dry-run -c "function = 14;"
size: 4
current: 0 -> (function, »function«, 1:1)
all: [(function, »function«, 1:1), (assign, »=«, 1:10), (integer, »14«, 1:12), (semicolon, »;«, 1:14)]
Syntax error
Source: <inline>
Position: line 1, column 10
Expected function/method name, got '='
```
You can see that the tokenizer has done its job successfully,
but the parser throws a syntax error saying that it does not know
any production that could (directly or indirectly) match the
`function assign integer semicolon` sequence.
You can check the AST produced for arbitrary input code
using the `--ast` flag, for example:
```
$ smnp --ast --dry-run -c "[1, 2, 3] as i ^ println(\"Current: \" + i.toString());"
RootNode 1:16
└─LoopNode 1:16
  ├─ListNode 1:1
  │ ├─IntegerLiteralNode 1:2
  │ │ └ (integer, »1«, 1:2)
  │ ├─IntegerLiteralNode 1:5
  │ │ └ (integer, »2«, 1:5)
  │ └─IntegerLiteralNode 1:8
  │   └ (integer, »3«, 1:8)
  ├─LoopParametersNode 1:14
  │ └─IdentifierNode 1:14
  │   └ (identifier, »i«, 1:14)
  ├─FunctionCallNode 1:18
  │ ├─IdentifierNode 1:18
  │ │ └ (identifier, »println«, 1:18)
  │ └─FunctionCallArgumentsNode 1:25
  │   └─SumOperatorNode 1:38
  │     ├─StringLiteralNode 1:26
  │     │ └ (string, »"Current: "«, 1:26)
  │     ├─TokenNode 1:38
  │     │ └ (plus, »+«, 1:38)
  │     └─AccessOperatorNode 1:41
  │       ├─IdentifierNode 1:40
  │       │ └ (identifier, »i«, 1:40)
  │       ├─TokenNode 1:41
  │       │ └ (dot, ».«, 1:41)
  │       └─FunctionCallNode 1:42
  │         ├─IdentifierNode 1:42
  │         │ └ (identifier, »toString«, 1:42)
  │         └─FunctionCallArgumentsNode 1:50
  └─NoneNode 0:0
```
Technically, SMNP implements an **LL(1)** parser.
The acronym means:
* the input is read from **L**eft to right,
* the parser produces a **L**eftmost derivation,
* the parser uses **1** token of lookahead.
Even though parsers of this kind are considered the least sophisticated, in most cases
they do the job and are sufficient even for more advanced use cases.
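
To make the "one lookahead token" idea concrete, here is a small sketch of a predictive parser. It assumes a hypothetical mini-grammar of nested integer lists (`value := INTEGER | list`, `list := "[" (value ("," value)*)? "]"`) and works on plain strings instead of SMNP tokens; `SketchValue`, `SketchInt`, `SketchList` and `SketchLl1Parser` are made-up names, not part of SMNP.

```kotlin
// Hypothetical mini-grammar, unrelated to SMNP's real productions.
sealed class SketchValue
data class SketchInt(val value: Int) : SketchValue()
data class SketchList(val items: List<SketchValue>) : SketchValue()

class SketchLl1Parser(private val tokens: List<String>) {
    private var position = 0

    // The single lookahead token: it is inspected here, never consumed.
    private fun peek(): String? = tokens.getOrNull(position)

    private fun expect(expected: String) {
        val actual = peek()
        require(actual == expected) { "Expected '$expected', got '$actual'" }
        position++
    }

    fun parseValue(): SketchValue {
        val lookahead = peek() ?: throw IllegalArgumentException("Unexpected end of input")
        // One token is enough to choose the production; no backtracking is needed.
        return if (lookahead == "[") parseList() else parseInt()
    }

    private fun parseInt(): SketchValue {
        val text = peek()
        val value = text?.toIntOrNull()
            ?: throw IllegalArgumentException("Expected integer, got '$text'")
        position++
        return SketchInt(value)
    }

    private fun parseList(): SketchValue {
        expect("[")
        val items = mutableListOf<SketchValue>()
        if (peek() != "]") {
            items += parseValue()
            while (peek() == ",") {
                expect(",")
                items += parseValue()
            }
        }
        expect("]")
        return SketchList(items)
    }
}
```

Every decision in `parseValue()` and `parseList()` depends only on the result of `peek()`, which is what the `1` in LL(1) refers to.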
The SMNP language parser has a few fundamental helper functions that serve
as building blocks for the implementations of the individual production
rules. The parser is actually a combination
of sub-parsers, each able to parse a subset of the language.
For example, `io.smnp.dsl.ast.parser.AtomParser` defines a parser for
atomic values such as literals (note also that an *expression* enclosed in parentheses
is treated as an atom):
```kotlin
class AtomParser : Parser() {
    override fun tryToParse(input: TokenList): ParserOutput {
        val parenthesesParser = allOf(
            terminal(TokenType.OPEN_PAREN),
            ExpressionParser(),
            terminal(TokenType.CLOSE_PAREN)
        ) { (_, expression) -> expression }

        val literalParser = oneOf(
            parenthesesParser,
            ComplexIdentifierParser(),
            StaffParser(),
            ListParser(),
            LiteralParser(),
            MapParser()
        )

        return literalParser.parse(input)
    }
}
```
In this example you can see both the `allOf()` and `oneOf()` helper methods. The first one returns success (and the parsed node) if and only if all of its subparsers return success as well. In contrast, `oneOf()` returns success with the parsed node when any of its subparsers returns success: it looks for the first subparser that succeeds, immediately returns that subparser's node, and does not execute the remaining subparsers. Because `oneOf()` is only a proxy for other parsers, it does not need to do anything with the returned nodes. The `allOf()` method, on the other hand, has to compose the nodes returned by its subparsers into a new node. In `AtomParser`, for instance, the lambda passed to `allOf()` keeps only the expression node and drops both parenthesis tokens. Thanks to that, we can obtain an AST instead of a CST (concrete syntax tree).
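
The behaviour of both helpers can be sketched in a few lines. This is only an illustration under simplified assumptions, not SMNP's implementation: a parser is modelled as a plain function over a list of strings, and `SketchNode`, `SketchLeaf`, `SketchParser`, `sketchTerminal`, `sketchAllOf`, `sketchOneOf`, `sketchNumber` and `sketchAtom` are hypothetical names that stand in for SMNP's real `Parser`, `TokenList` and `ParserOutput` types.

```kotlin
// Hypothetical, simplified types; SMNP's real parser classes differ.
sealed class SketchNode
data class SketchLeaf(val text: String) : SketchNode()

// A parser takes the tokens and a start position and returns the parsed node
// together with the next position, or null on failure.
typealias SketchParser = (List<String>, Int) -> Pair<SketchNode, Int>?

fun sketchTerminal(expected: String): SketchParser = { tokens, position ->
    if (tokens.getOrNull(position) == expected) SketchLeaf(expected) to position + 1 else null
}

// Succeeds only if every subparser succeeds in sequence. `compose` decides
// which child nodes survive, which is where concrete-syntax detail is dropped.
fun sketchAllOf(
    vararg parsers: SketchParser,
    compose: (List<SketchNode>) -> SketchNode
): SketchParser = fun(tokens: List<String>, start: Int): Pair<SketchNode, Int>? {
    val nodes = mutableListOf<SketchNode>()
    var position = start
    for (parser in parsers) {
        val (node, next) = parser(tokens, position) ?: return null
        nodes += node
        position = next
    }
    return compose(nodes) to position
}

// Returns the result of the first subparser that succeeds; later ones are not run.
fun sketchOneOf(vararg parsers: SketchParser): SketchParser =
    fun(tokens: List<String>, start: Int): Pair<SketchNode, Int>? {
        for (parser in parsers) {
            val result = parser(tokens, start)
            if (result != null) return result
        }
        return null
    }

// Accepts a single integer token.
val sketchNumber: SketchParser = { tokens, position ->
    tokens.getOrNull(position)
        ?.takeIf { it.toIntOrNull() != null }
        ?.let { SketchLeaf(it) to position + 1 }
}

// Mirrors the shape of AtomParser: a parenthesized number or a bare number.
val sketchAtom: SketchParser = sketchOneOf(
    sketchAllOf(sketchTerminal("("), sketchNumber, sketchTerminal(")")) { (_, inner) -> inner },
    sketchNumber
)
```

With these definitions, parsing `( 7 )` and parsing `7` produce the same `SketchLeaf("7")` node. The parentheses are consumed but leave no trace in the tree, which is the AST-versus-CST distinction described above.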