bartek/smnp-kt

Table of Contents

Architecture overview
Interpreter

Tokenizer

Parser

Parsers' cascade

Evaluator
Interpreter

Modules Management System

Architecture overview

The system is composed of the following components (technically, Gradle subprojects):

core - the SMNP language engine, consisting of the interpreter (actually a facade for the tokenizer, parser and evaluator) and the modules' management system
app - the command-line frontend for the core component
modules (smnp.lang, smnp.io, smnp.audio.synth etc.) - a set of external modules that extend the functionality of SMNP scripts
api - a component that provides the shared interfaces and abstract classes common to core and to every module component

Interpreter

The SMNP language interpreter is a facade over three parts composed into a pipeline:

tokenizer (or lexer)
parser
evaluator

All of these components take part in processing and executing the passed code; each one produces output that is consumed by the next component.
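
The composition described above can be sketched generically. This is only an illustration under stated assumptions: the Pipeline class, its type parameters and the toy stages below are hypothetical and are not part of SMNP.

```kotlin
// Hypothetical sketch: a generic three-stage pipeline, not the SMNP classes.
class Pipeline<T, A, R>(
    private val tokenize: (String) -> T,   // raw code -> tokens
    private val parse: (T) -> A,           // tokens -> tree (or any parsed form)
    private val evaluate: (A) -> R         // parsed form -> result
) {
    // The facade: callers pass raw code and never touch the stages directly.
    fun run(code: String): R = evaluate(parse(tokenize(code)))
}

// Toy stages that sum whitespace-separated integers.
val sumInterpreter = Pipeline(
    tokenize = { code: String -> code.trim().split(Regex("\\s+")) },
    parse = { tokens: List<String> -> tokens.map { it.toInt() } },
    evaluate = { numbers: List<Int> -> numbers.sum() }
)
```

Each stage only knows the output type of the previous one, which is what allows the facade to stay a single expression.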

Tokenizer

The tokenizer is the first component in the code processing pipeline. Input code is passed directly to the tokenizer, which splits it into pieces called tokens. Each token has two main properties, a value and a token type, for example:

the "Hello, world!" is a token with value Hello, world! and token type STRING
the abc123 is a token with value abc123 and token type IDENTIFIER

Apart from that, each token also carries metadata such as its location: column, line and source name (file name or module name).

You can check which tokens are produced for arbitrary input code using the --tokens flag, for example:

$ smnp --tokens --dry-run -c "[1, 2, 3] as i ^ println(\"Current: \" + i.toString());"
size: 21
current: 0 -> (open_square, »[«, 1:1)
all: [(open_square, »[«, 1:1), (integer, »1«, 1:2), (comma, »,«, 1:3), (integer, »2«, 1:5), (comma, »,«, 1:6), (integer, »3«, 1:8), (close_square, »]«, 1:9), (as, »as«, 1:11), (identifier, »i«, 1:14), (caret, »^«, 1:16), (identifier, »println«, 1:18), (open_paren, »(«, 1:25), (string, »"Current: "«, 1:26), (plus, »+«, 1:38), (identifier, »i«, 1:40), (dot, ».«, 1:41), (identifier, »toString«, 1:42), (open_paren, »(«, 1:50), (close_paren, »)«, 1:51), (close_paren, »)«, 1:52), (semicolon, »;«, 1:53)]

The tokenizer tries to match the input against all available patterns, following a first-match rule: if more than one pattern matches the input, only the first one is applied. This is why you can't, for example, use keywords as names of variables, functions or methods. Take a look at the output of the following command:

$ smnp --tokens --dry-run -c "function = 14;" 
size: 4
current: 0 -> (function, »function«, 1:1)
all: [(function, »function«, 1:1), (assign, »=«, 1:10), (integer, »14«, 1:12), (semicolon, »;«, 1:14)]

The first token has the type function, not identifier, which is what an assignment would require.
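
The first-match rule can be illustrated with a minimal sketch. Everything here is an assumption made for illustration (the reduced TokenType set, the Token class, the regular expressions and the tokenize function); the actual tokenizer code may be organized quite differently.

```kotlin
// Hypothetical sketch: these names are illustrative, not the SMNP tokenizer API.
enum class TokenType { FUNCTION, IDENTIFIER, INTEGER, ASSIGN, SEMICOLON }

data class Token(val type: TokenType, val value: String, val line: Int, val column: Int)

// Order matters: the keyword pattern precedes the identifier pattern,
// so the word "function" is never classified as an identifier.
val patterns: List<Pair<TokenType, Regex>> = listOf(
    TokenType.FUNCTION to Regex("function\\b"),
    TokenType.IDENTIFIER to Regex("[A-Za-z_][A-Za-z0-9_]*"),
    TokenType.INTEGER to Regex("[0-9]+"),
    TokenType.ASSIGN to Regex("="),
    TokenType.SEMICOLON to Regex(";")
)

fun tokenize(line: String, lineNumber: Int = 1): List<Token> {
    val tokens = mutableListOf<Token>()
    var index = 0
    while (index < line.length) {
        if (line[index].isWhitespace()) {
            index++
            continue
        }
        // First-match rule: take the first pattern that matches exactly at the current position.
        val match = patterns.firstNotNullOfOrNull { (type, regex) ->
            regex.find(line, index)?.takeIf { it.range.first == index }?.let { type to it.value }
        } ?: throw IllegalArgumentException("Unknown symbol '${line[index]}' at $lineNumber:${index + 1}")
        // Columns are 1-based, as in the --tokens output.
        tokens += Token(match.first, match.second, lineNumber, index + 1)
        index += match.second.length
    }
    return tokens
}
```

Calling tokenize("function = 14;") with these patterns yields the types FUNCTION, ASSIGN, INTEGER, SEMICOLON at columns 1, 10, 12 and 14. Moving the IDENTIFIER pattern to the front of the list would make the first token an identifier instead.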

All tokenizer-related code is located in the io.smnp.dsl.token package.

Parser

The parser is the next stage of the code processing pipeline. It takes the tokenizer's output and tries to compose a tree (called an AST, which stands for abstract syntax tree) based on known rules, which are called productions. While the tokenizer defines the language's alphabet, i.e. the set of available terminals, the parser defines the grammar of that language. This means that the tokenizer can, for example, detect an unknown character or sequence of characters, whereas the parser is able to detect unknown constructions built from known tokens.

A good example is the last snippet from the Tokenizer section:

$ smnp --tokens --dry-run -c "function = 14;"
size: 4
current: 0 -> (function, »function«, 1:1)
all: [(function, »function«, 1:1), (assign, »=«, 1:10), (integer, »14«, 1:12), (semicolon, »;«, 1:14)]
Syntax error
Source: <inline>
Position: line 1, column 10

Expected function/method name, got '='

You can see that the tokenizer has done its job successfully, but the parser throws a syntax error saying that it does not know any production that could (directly or indirectly) match the function assign integer semicolon sequence.

You can check the AST produced for arbitrary input code using the --ast flag, for example:

$ smnp --ast --dry-run -c "[1, 2, 3] as i ^ println(\"Current: \" + i.toString());"
RootNode 1:16
   └─LoopNode 1:16
      ├─ListNode 1:1
      │  ├─IntegerLiteralNode 1:2
      │  │  └ (integer, »1«, 1:2)
      │  ├─IntegerLiteralNode 1:5
      │  │  └ (integer, »2«, 1:5)
      │  └─IntegerLiteralNode 1:8
      │     └ (integer, »3«, 1:8)
      ├─LoopParametersNode 1:14
      │  └─IdentifierNode 1:14
      │     └ (identifier, »i«, 1:14)
      ├─FunctionCallNode 1:18
      │  ├─IdentifierNode 1:18
      │  │  └ (identifier, »println«, 1:18)
      │  └─FunctionCallArgumentsNode 1:25
      │     └─SumOperatorNode 1:38
      │        ├─StringLiteralNode 1:26
      │        │  └ (string, »"Current: "«, 1:26)
      │        ├─TokenNode 1:38
      │        │  └ (plus, »+«, 1:38)
      │        └─AccessOperatorNode 1:41
      │           ├─IdentifierNode 1:40
      │           │  └ (identifier, »i«, 1:40)
      │           ├─TokenNode 1:41
      │           │  └ (dot, ».«, 1:41)
      │           └─FunctionCallNode 1:42
      │              ├─IdentifierNode 1:42
      │              │  └ (identifier, »toString«, 1:42)
      │              └─FunctionCallArgumentsNode 1:50
      └─NoneNode 0:0

Technically, SMNP implements an LL(1) parser. The acronym means:

input is read from Left to right
parser produces a Leftmost derivation
parser uses 1 token of lookahead

Even though parsers of this kind are considered the least sophisticated, in most cases they do the job and are sufficient even for more advanced use cases.

The SMNP language parser has some fundamental helper functions that serve as building blocks for implementing production rules. The parser is in fact a combination of sub-parsers, each able to parse a subset of the language.

For example, io.smnp.dsl.ast.parser.AtomParser defines a parser for atomic values, like literals and so on (note that an expression enclosed in parentheses is also treated as an atom):

class AtomParser : Parser() {
    override fun tryToParse(input: TokenList): ParserOutput {
        val parenthesesParser = allOf(
            terminal(TokenType.OPEN_PAREN),
            ExpressionParser(),
            terminal(TokenType.CLOSE_PAREN)
        ) { (_, expression) -> expression }

        return oneOf(
            parenthesesParser,
            ComplexIdentifierParser(),
            StaffParser(),
            ListParser(),
            LiteralParser(),
            MapParser()
        ).parse(input)
    }
}

In this example you can see both the allOf() and oneOf() helper methods. The first one returns success (and a parsed node) if and only if all of its subparsers return success as well. In contrast, the oneOf() method returns success with a parsed node when any of its subparsers returns success. It looks for the first subparser that succeeds; when it finds one, it immediately returns success with the node produced by that subparser and does not execute the remaining ones. Because oneOf() is only a proxy for other parsers, it does not need to do anything with the returned nodes. The allOf() method, on the other hand, has to compose the nodes returned by its subparsers into a new node. Thanks to that, we can easily obtain an AST instead of a CST (concrete syntax tree).

This parser implementation can be expressed in the following notation:

parenthesesExpr ::= '(' expr ')' ;
atom ::= parenthesesExpr | identifier | staff | list | literal | map ;

Therefore, allOf(a, b, c) {...} is equivalent to x ::= a b c, whereas oneOf(a, b, c) is equivalent to x ::= a | b | c.
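
To make the two helpers more concrete, here is a minimal sketch of how such combinators could be written. It is not the SMNP implementation: the Node class, the MiniParser type alias and the use of plain strings as tokens are simplifying assumptions made for this example.

```kotlin
// Hypothetical miniature of the combinator idea; the real helpers work on
// TokenList and ParserOutput rather than on these simplified types.
data class Node(val label: String, val children: List<Node> = emptyList())

// A parser takes the tokens and a start position and returns the parsed node
// together with the next position, or null on failure.
typealias MiniParser = (tokens: List<String>, pos: Int) -> Pair<Node, Int>?

fun terminal(expected: String): MiniParser = { tokens, pos ->
    if (pos < tokens.size && tokens[pos] == expected) Node(expected) to pos + 1 else null
}

// x ::= a b c : every subparser must succeed in sequence; `build` composes the result node.
fun allOf(vararg parsers: MiniParser, build: (List<Node>) -> Node): MiniParser = { tokens, pos ->
    var current = pos
    val nodes = mutableListOf<Node>()
    var failed = false
    for (parser in parsers) {
        val result = parser(tokens, current)
        if (result == null) {
            failed = true
            break
        }
        nodes += result.first
        current = result.second
    }
    if (failed) null else build(nodes) to current
}

// x ::= a | b | c : the first subparser that succeeds wins; later ones are not tried.
fun oneOf(vararg parsers: MiniParser): MiniParser = { tokens, pos ->
    parsers.firstNotNullOfOrNull { it(tokens, pos) }
}

// atom ::= '(' 'x' ')' | 'x' ; the builder drops the parentheses, keeping only the inner node.
val parenthesized = allOf(terminal("("), terminal("x"), terminal(")")) { (_, inner) -> inner }
val atom = oneOf(parenthesized, terminal("x"))
```

With these definitions, atom(listOf("(", "x", ")"), 0) returns the inner x node together with position 3. The parentheses are consumed but leave no trace in the result, which is the AST-versus-CST point made above.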

Parsers' cascade

Parsers are composed in a cascade, which lets them turn a one-dimensional token stream into a tree structure. For example, the AtomParser mentioned above is used by UnitParser, which is responsible for parsing the minus operator and the dot operator. In turn, UnitParser is used by FactorParser, which is responsible for parsing the not operator and the power operator. FactorParser is used by TermParser, which is responsible for parsing the product operator. TermParser is used by SubexpressionParser, which provides production rules for logic operators, relation operators etc. SubexpressionParser is used by ExpressionParser, which technically is a oneOf-based wrapper for SubexpressionParser and LoopParser. ExpressionParser represents all constructions that can produce a value and is used by StatementParser, which is eventually used by RootParser.

The order of the parsers in the cascade determines the precedence of each operation and influences the shape of the AST. Take a look at the following example:

class SubexpressionParser : Parser() {
   override fun tryToParse(input: TokenList): ParserOutput {
      val expr1Parser = leftAssociativeOperator(
         TermParser(),
         listOf(TokenType.PLUS, TokenType.MINUS),
         assert(TermParser(), "expression")
      ) { lhs, operator, rhs ->
         SumOperatorNode(lhs, operator, rhs)
      }

      val expr2Parser = leftAssociativeOperator(
         expr1Parser,
         listOf(TokenType.RELATION, TokenType.OPEN_ANGLE, TokenType.CLOSE_ANGLE),
         assert(expr1Parser, "expression")
      ) { lhs, operator, rhs ->
         RelationOperatorNode(lhs, operator, rhs)
      }

      val expr3Parser = leftAssociativeOperator(
         expr2Parser,
         listOf(TokenType.AND),
         assert(expr2Parser, "expression")
      ) { lhs, operator, rhs ->
         LogicOperatorNode(lhs, operator, rhs)
      }

      val expr4Parser = leftAssociativeOperator(
         expr3Parser,
         listOf(TokenType.OR),
         assert(expr3Parser, "expression")
      ) { lhs, operator, rhs ->
         LogicOperatorNode(lhs, operator, rhs)
      }

      return expr4Parser.parse(input)
   }
}

This is the code of SubexpressionParser; it consists of 4 subparsers composed in a cascade. Because expr4Parser (responsible for the or operator) is defined in terms of expr3Parser (responsible for the and operator), the and operator binds more tightly, i.e. the or operator has lower precedence than the and operator (compare Operators#Operators precedence).

The following listing shows the composition of and and or operator nodes honoring their precedence:

$ smnp --ast --dry-run -c "true and false or not false and not false;"
RootNode 1:16
   └─LogicOperatorNode 1:16
      ├─LogicOperatorNode 1:6
      │  ├─BoolLiteralNode 1:1
      │  │  └ (bool, »true«, 1:1)
      │  ├─TokenNode 1:6
      │  │  └ (and, »and«, 1:6)
      │  └─BoolLiteralNode 1:10
      │     └ (bool, »false«, 1:10)
      ├─TokenNode 1:16
      │  └ (or, »or«, 1:16)
      └─LogicOperatorNode 1:29
         ├─NotOperatorNode 1:19
         │  ├─TokenNode 1:19
         │  │  └ (not, »not«, 1:19)
         │  └─BoolLiteralNode 1:23
         │     └ (bool, »false«, 1:23)
         ├─TokenNode 1:29
         │  └ (and, »and«, 1:29)
         └─NotOperatorNode 1:33
            ├─TokenNode 1:33
            │  └ (not, »not«, 1:33)
            └─BoolLiteralNode 1:37
               └ (bool, »false«, 1:37)
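
The way the cascade order produces this tree shape can be reproduced with a small sketch. The helper below is a simplified, hypothetical stand-in for leftAssociativeOperator; the Expr classes, TokenCursor and the string tokens are assumptions made for this example only.

```kotlin
// Hypothetical sketch of a left-associative operator helper; not the SMNP implementation.
sealed class Expr
data class Bool(val value: Boolean) : Expr()
data class Binary(val lhs: Expr, val op: String, val rhs: Expr) : Expr()

class TokenCursor(private val tokens: List<String>) {
    private var position = 0
    fun peek(): String? = tokens.getOrNull(position)   // one token of lookahead
    fun next(): String = tokens[position++]
}

// Parses `operand (operator operand)*` and folds the results to the left.
fun leftAssociative(cursor: TokenCursor, operators: Set<String>, operand: (TokenCursor) -> Expr): Expr {
    var lhs = operand(cursor)
    while (true) {
        val op = cursor.peek()
        if (op == null || op !in operators) break
        cursor.next()
        lhs = Binary(lhs, op, operand(cursor))
    }
    return lhs
}

fun parseLiteral(cursor: TokenCursor): Expr = Bool(cursor.next() == "true")

// `and` is built directly on literals, so it groups its operands first...
fun parseAnd(cursor: TokenCursor): Expr = leftAssociative(cursor, setOf("and"), ::parseLiteral)

// ...while `or` is built on top of `and`, so it ends up closer to the root of the tree.
fun parseOr(cursor: TokenCursor): Expr = leftAssociative(cursor, setOf("or"), ::parseAnd)
```

Parsing the tokens of "true and false or true and false" with parseOr produces a Binary node for or whose two children are the Binary nodes for and, which mirrors the LogicOperatorNode nesting in the listing.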

All parser-related code is located in the io.smnp.dsl.ast package.

Evaluator

The evaluator is the last stage of the SMNP language processing pipeline and the heart of the entire SMNP tool: it takes the AST as input and performs the programmed operations. Like the parser, the evaluator works recursively, because it processes a tree-like structure. Its architecture is also similar to the parser's: the evaluator consists of smaller evaluators, each able to evaluate a small set of AST node types. Like the parsers, the evaluators use helper methods (such as oneOf()) to improve readability and to reduce complexity and code repetition.

Because the evaluator introduces the notion of runtime, it also works on a special object called the environment. The environment object contains runtime information, like loaded modules (with their functions and methods), the call stack with its scopes and some meta information. This object is passed through all evaluators along with the AST and its subtrees.
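
The overall shape of this design can be sketched in a few lines. This is a simplified illustration under stated assumptions: AstNode and its subclasses, MiniEnvironment, MiniEvaluator and evaluateAny are hypothetical names, and results are plain integers rather than SMNP values.

```kotlin
import kotlin.reflect.KClass

// Hypothetical miniature of the evaluator design; all names are illustrative.
sealed class AstNode
data class IntNode(val value: Int) : AstNode()
data class VarNode(val name: String) : AstNode()
data class SumNode(val lhs: AstNode, val rhs: AstNode) : AstNode()

// Runtime state shared by all evaluators (here: just variables).
class MiniEnvironment(val variables: Map<String, Int> = emptyMap())

abstract class MiniEvaluator {
    abstract fun supportedNodes(): List<KClass<out AstNode>>
    protected abstract fun tryToEvaluate(node: AstNode, environment: MiniEnvironment): Int

    // Returns null instead of evaluating when the node type is not supported.
    fun evaluate(node: AstNode, environment: MiniEnvironment): Int? =
        if (supportedNodes().any { it.java.isInstance(node) }) tryToEvaluate(node, environment) else null
}

class AtomEvaluator : MiniEvaluator() {
    override fun supportedNodes(): List<KClass<out AstNode>> = listOf(IntNode::class, VarNode::class)
    override fun tryToEvaluate(node: AstNode, environment: MiniEnvironment): Int = when (node) {
        is IntNode -> node.value
        is VarNode -> environment.variables.getValue(node.name)
        else -> error("unsupported node")
    }
}

class SumEvaluator : MiniEvaluator() {
    override fun supportedNodes(): List<KClass<out AstNode>> = listOf(SumNode::class)
    override fun tryToEvaluate(node: AstNode, environment: MiniEnvironment): Int {
        val (lhs, rhs) = node as SumNode
        // Recursion: subtrees go back through the dispatcher with the same environment.
        return evaluateAny(lhs, environment) + evaluateAny(rhs, environment)
    }
}

// oneOf-style helper: the first evaluator that supports the node produces the result.
fun evaluateAny(node: AstNode, environment: MiniEnvironment): Int =
    listOf(AtomEvaluator(), SumEvaluator())
        .firstNotNullOfOrNull { it.evaluate(node, environment) }
        ?: error("No evaluator for ${node.javaClass.simpleName}")
```

The environment is created once and handed down unchanged through every recursive call, so each small evaluator sees the same runtime state.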

The following listing shows an example evaluator, the one for the if statement:

class ConditionEvaluator : Evaluator() {
   private val expressionEvaluator = ExpressionEvaluator()
   private val defaultEvaluator = DefaultEvaluator()

   override fun supportedNodes() = listOf(ConditionNode::class)

   override fun tryToEvaluate(node: Node, environment: Environment): EvaluatorOutput {
      val (conditionNode, trueBranchNode, falseBranchNode) = (node as ConditionNode)
      val condition = expressionEvaluator.evaluate(conditionNode, environment).value

      if (condition.type != DataType.BOOL) {
         throw contextEvaluationException(
            "Condition should be of bool type, found '${condition.value}'",
            conditionNode.position,
            environment
         )
      }

      if (condition.value as Boolean) {
         return defaultEvaluator.evaluate(trueBranchNode, environment)
      } else if (falseBranchNode !is NoneNode) {
         return defaultEvaluator.evaluate(falseBranchNode, environment)
      }

      return EvaluatorOutput.ok()
   }
}

The code above defines the list of supported nodes, which in this case contains a single element: ConditionNode. The ConditionNode is produced by ConditionParser, which handles if statements. The tryToEvaluate() method contains the actual evaluation logic, and in this case it:

evaluates the condition using ExpressionEvaluator (it always returns a value)
asserts that the value is of bool type - if it is of any other type, an exception is thrown
evaluates the trueBranchNode if the condition evaluates to true
if the condition evaluates to false, checks whether falseBranchNode (which comes from the else clause) is present. If so, it is evaluated.

All evaluator-related code is located in the io.smnp.evaluation package.

Interpreter

The interpreter isn't actually another language processing stage; rather, it is a facade that composes the stages into a single pipeline and accepts raw SMNP code as input. It also accepts additional parameters, like printTokens, printAst and dryRun.

So far, SMNP provides two types of interpreters:

LanguageModuleInterpreter (with its implementation DefaultLanguageModuleInterpreter), which is used only by Language Module Providers and Hybrid Module Providers.
DefaultInterpreter, which is the standard interpreter used for user input (both scripts and inline code snippets).

The difference between the two is that LanguageModuleInterpreter supports only function and method definitions at the top level of a script (technically, in RootNode), whereas DefaultInterpreter allows any available statement at the top level.

The following snippet shows the code of DefaultInterpreter:

class DefaultInterpreter  {
   private val tokenizer = DefaultTokenizer()
   private val parser = RootParser()
   private val evaluator = RootEvaluator()

   fun run(
      code: String,
      environment: Environment = DefaultEnvironment(),
      printTokens: Boolean = false,
      printAst: Boolean = false,
      dryRun: Boolean = false
   ): Environment {
      val lines = code.split("\n")
      return run(lines, "<inline>", environment, printTokens, printAst, dryRun)
   }

   private fun run(
      lines: List<String>,
      source: String,
      environment: Environment,
      printTokens: Boolean,
      printAst: Boolean,
      dryRun: Boolean
   ): Environment {
      environment.loadModule("smnp.lang")

      val tokens = tokenizer.tokenize(lines, source)
      // Print tokens before parsing, so they are shown even when parsing fails
      if (printTokens) println(tokens)

      val ast = parser.parse(tokens)
      if (printAst) ast.node.pretty()

      if (!dryRun) {
         evaluator.evaluate(ast.node, environment)
      }

      return environment
   }

   fun run(
      file: File,
      environment: Environment = DefaultEnvironment(),
      printTokens: Boolean = false,
      printAst: Boolean = false,
      dryRun: Boolean = false
   ): Environment {
      val lines = file.readLines()
      return run(lines, file.canonicalPath, environment, printTokens, printAst, dryRun)
   }
}

You can think of DefaultInterpreter as the entry point of the core module, ready to be used by the app module or any other application that wants to make use of SMNP.

Modules Management System

The modules' management system is built on top of the PF4J plugin management system and uses its features to meet the module system's requirements. In fact, each ModuleProvider implementation is annotated with the @ExtensionPoint annotation, which comes from the PF4J framework, and each module's jar file is actually a plugin in PF4J terminology.

The central component of the modules' management system is the ModuleRegistry, with DefaultModuleRegistry as its standard implementation. At SMNP startup, the ModuleRegistry loads and starts each available module (i.e. found in the default modules' directory or in the directory passed through the smnp.modulesDir JVM property) and builds a dictionary (registry) of these modules.

When an import statement is evaluated, the Evaluator calls the Environment's loadModule() method. The Environment requests the ModuleProvider assigned to the desired module from the ModuleRegistry and obtains the module by passing the DefaultLanguageModuleInterpreter, as well as itself, to the ModuleProvider, which constructs the Module object. This is the stage at which scripts are evaluated in LanguageModuleProvider-based module providers. Thanks to the tree-like structure of Module objects, the newly provided Module can simply be merged into the root module of the Environment. From then on, all functions and methods of the module are available. This is also the stage at which the ModuleProvider's onModuleLoad() lifecycle hook is invoked.
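
The registry lookup and the tree merge described above can be sketched as follows. This is a rough illustration only: the Module, Registry, moduleAt and loadModule definitions below are hypothetical simplifications (functions are reduced to names, providers to plain lambdas) and do not reflect the real SMNP or PF4J interfaces.

```kotlin
// Hypothetical sketch of a tree-like module and a name-to-provider registry.
class Module(
    val name: String,
    val functions: MutableList<String> = mutableListOf(),
    val children: MutableMap<String, Module> = mutableMapOf()
) {
    // Merges `other` (which must have the same name) into this module, recursing into children.
    fun merge(other: Module) {
        require(name == other.name) { "cannot merge '${other.name}' into '$name'" }
        functions += other.functions
        for ((childName, child) in other.children) {
            val existing = children[childName]
            if (existing == null) children[childName] = child else existing.merge(child)
        }
    }

    // Resolves a dotted path such as "smnp.io" relative to this module.
    fun find(path: String): Module? =
        path.split(".").fold<String, Module?>(this) { module, segment -> module?.children?.get(segment) }
}

// Builds the chain <root> -> smnp -> io for a canonical name such as "smnp.io".
fun moduleAt(canonicalName: String, functions: List<String>): Module {
    val segments = canonicalName.split(".")
    val leaf = Module(segments.last(), functions.toMutableList())
    val top = segments.dropLast(1).foldRight(leaf) { segment, child ->
        Module(segment, children = mutableMapOf(child.name to child))
    }
    return Module("<root>", children = mutableMapOf(top.name to top))
}

// The registry maps a module name to a provider that constructs the module on demand.
class Registry(private val providers: Map<String, () -> Module>) {
    fun provide(name: String): Module =
        providers[name]?.invoke() ?: throw NoSuchElementException("Module '$name' not found")
}

// Roughly what loading a module amounts to in this sketch: look up, construct, merge into the root.
fun loadModule(root: Module, registry: Registry, name: String) {
    root.merge(registry.provide(name))
}
```

Because every provided module is rooted at the same top-level node, loading smnp.lang and then smnp.io leaves a single smnp subtree with two children rather than two competing subtrees.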
