Architecture overview
The system is composed of the following components (which are technically Gradle subprojects):
- core - the SMNP language engine, consisting of the interpreter (which is actually a facade for the tokenizer, parser and evaluator) as well as the modules' management system
- app - the command-line frontend for the core component
- modules (smnp.lang, smnp.io, smnp.audio.synth etc.) - a set of external modules that extend the functionality of SMNP scripts
- api - a component that provides shared interfaces and abstract classes common to both core and each module component.
Interpreter
The SMNP language interpreter is a facade over three parts composed into a pipeline:
- tokenizer (or lexer)
- parser
- evaluator
All of these components take part in processing and executing the passed code, each producing output that is consumed by the next component.
Tokenizer
The tokenizer is the first component in the code processing pipeline. Input code is passed directly to the tokenizer, which splits it into pieces called tokens. Each token consists of a few main properties, such as a value and a token type, for example:
- "Hello, world!" is a token with value Hello, world! and token type STRING
- abc123 is a token with value abc123 and token type IDENTIFIER
Apart from this data, each token also includes some metadata, such as its location: column, line and source name (file name or module name).
You can check which tokens are produced for arbitrary input code using the --tokens flag, for example:
$ smnp --tokens --dry-run -c "[1, 2, 3] as i ^ println(\"Current: \" + i.toString());"
size: 21
current: 0 -> (open_square, »[«, 1:1)
all: [(open_square, »[«, 1:1), (integer, »1«, 1:2), (comma, »,«, 1:3), (integer, »2«, 1:5), (comma, »,«, 1:6), (integer, »3«, 1:8), (close_square, »]«, 1:9), (as, »as«, 1:11), (identifier, »i«, 1:14), (caret, »^«, 1:16), (identifier, »println«, 1:18), (open_paren, »(«, 1:25), (string, »"Current: "«, 1:26), (plus, »+«, 1:38), (identifier, »i«, 1:40), (dot, ».«, 1:41), (identifier, »toString«, 1:42), (open_paren, »(«, 1:50), (close_paren, »)«, 1:51), (close_paren, »)«, 1:52), (semicolon, »;«, 1:53)]
The tokenizer tries to match the input against all available patterns, following a first-match rule: if more than one pattern matches the input, only the first one is applied. This is why you can't, for example, use keywords as names of variables, functions or methods. Take a look at the output of the following command:
$ smnp --tokens --dry-run -c "function = 14;"
size: 4
current: 0 -> (function, »function«, 1:1)
all: [(function, »function«, 1:1), (assign, »=«, 1:10), (integer, »14«, 1:12), (semicolon, »;«, 1:14)]
The first token has the type function, not identifier, which is what an assignment operation expects.
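The first-match rule can be illustrated with a minimal sketch. It is not the SMNP implementation: `Token`, `TokenType`, `patterns` and `tokenize` below are hypothetical, simplified stand-ins, and the regular expressions are assumptions chosen only to reproduce the example above.

```kotlin
// Hypothetical, simplified model of a first-match tokenizer (not the real io.smnp.dsl.token code).
enum class TokenType { FUNCTION, IDENTIFIER, INTEGER, ASSIGN, SEMICOLON }

data class Token(val type: TokenType, val value: String, val line: Int, val column: Int)

// Order matters: the keyword pattern is listed before the identifier pattern.
val patterns: List<Pair<TokenType, Regex>> = listOf(
    TokenType.FUNCTION to Regex("function\\b"),
    TokenType.IDENTIFIER to Regex("[A-Za-z_][A-Za-z0-9_]*"),
    TokenType.INTEGER to Regex("[0-9]+"),
    TokenType.ASSIGN to Regex("="),
    TokenType.SEMICOLON to Regex(";")
)

fun tokenize(line: String, lineNumber: Int = 1): List<Token> {
    val tokens = mutableListOf<Token>()
    var index = 0
    while (index < line.length) {
        if (line[index].isWhitespace()) {
            index++
            continue
        }
        // First-match rule: stop at the first pattern that matches exactly at the current position.
        var matched: Token? = null
        for ((type, regex) in patterns) {
            val result = regex.find(line, index)?.takeIf { it.range.first == index } ?: continue
            matched = Token(type, result.value, lineNumber, index + 1)
            break
        }
        val token = matched ?: throw IllegalArgumentException("Unknown symbol at $lineNumber:${index + 1}")
        tokens += token
        index += token.value.length
    }
    return tokens
}

fun main() {
    // The keyword pattern wins over the identifier pattern for the first token.
    tokenize("function = 14;").forEach(::println)
}
```

Because the keyword pattern precedes the identifier pattern, `function` can never be produced as an identifier in this sketch, which mirrors the behaviour described above.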
All tokenizer-related code is located in the io.smnp.dsl.token package.
Parser
The parser is the next stage of the code processing pipeline. It takes the tokenizer's output and tries to compose a tree (called an AST, which stands for abstract syntax tree) based on known rules, called productions. While the tokenizer defines the language's alphabet, i.e. the set of available terminals, the parser defines the grammar of that language. This means that the tokenizer can, for example, detect an unknown character or sequence of characters, whereas the parser is able to detect unknown constructions built from known tokens.
A good example is the last snippet from the Tokenizer section above:
$ smnp --tokens --dry-run -c "function = 14;"
size: 4
current: 0 -> (function, »function«, 1:1)
all: [(function, »function«, 1:1), (assign, »=«, 1:10), (integer, »14«, 1:12), (semicolon, »;«, 1:14)]
Syntax error
Source: <inline>
Position: line 1, column 10
Expected function/method name, got '='
You can see that the tokenizer has done its job successfully, but the parser throws a syntax error, because it does not know any production that could (directly or indirectly) match the function assign integer semicolon sequence.
You can check the AST produced for arbitrary input code using the --ast flag, for example:
$ smnp --ast --dry-run -c "[1, 2, 3] as i ^ println(\"Current: \" + i.toString());"
RootNode 1:16
└─LoopNode 1:16
  ├─ListNode 1:1
  │ ├─IntegerLiteralNode 1:2
  │ │ └ (integer, »1«, 1:2)
  │ ├─IntegerLiteralNode 1:5
  │ │ └ (integer, »2«, 1:5)
  │ └─IntegerLiteralNode 1:8
  │   └ (integer, »3«, 1:8)
  ├─LoopParametersNode 1:14
  │ └─IdentifierNode 1:14
  │   └ (identifier, »i«, 1:14)
  ├─FunctionCallNode 1:18
  │ ├─IdentifierNode 1:18
  │ │ └ (identifier, »println«, 1:18)
  │ └─FunctionCallArgumentsNode 1:25
  │   └─SumOperatorNode 1:38
  │     ├─StringLiteralNode 1:26
  │     │ └ (string, »"Current: "«, 1:26)
  │     ├─TokenNode 1:38
  │     │ └ (plus, »+«, 1:38)
  │     └─AccessOperatorNode 1:41
  │       ├─IdentifierNode 1:40
  │       │ └ (identifier, »i«, 1:40)
  │       ├─TokenNode 1:41
  │       │ └ (dot, ».«, 1:41)
  │       └─FunctionCallNode 1:42
  │         ├─IdentifierNode 1:42
  │         │ └ (identifier, »toString«, 1:42)
  │         └─FunctionCallArgumentsNode 1:50
  └─NoneNode 0:0
Technically, SMNP implements an LL(1) parser. The acronym means:
- input is read from Left to right
- parser produces a Leftmost derivation
- parser uses one lookahead token.
Even though this kind of parser is considered one of the least sophisticated, in most cases it does the job and is sufficient even for more advanced use cases.
The SMNP language parser has some fundamental helper functions that serve as building blocks for the implementations of the actual production rules. The parser is in fact a combination of sub-parsers, each of which is able to parse a subset of the language.
For example, io.smnp.dsl.ast.parser.AtomParser defines a parser for atomic values, such as literals (note that an expression enclosed in parentheses is also treated as an atom):
class AtomParser : Parser() {
    override fun tryToParse(input: TokenList): ParserOutput {
        // '(' expression ')' - only the inner expression node is kept
        val parenthesesParser = allOf(
            terminal(TokenType.OPEN_PAREN),
            ExpressionParser(),
            terminal(TokenType.CLOSE_PAREN)
        ) { (_, expression) -> expression }

        // The first sub-parser that succeeds determines the result
        return oneOf(
            parenthesesParser,
            ComplexIdentifierParser(),
            StaffParser(),
            ListParser(),
            LiteralParser(),
            MapParser()
        ).parse(input)
    }
}
In this example you can see both the allOf() and oneOf() helper methods. The first one returns success (and a parsed node) if and only if all of its subparsers return success as well. In contrast, the oneOf() method returns success with a parsed node when any of its subparsers returns success. It looks for the first parser that returns success; when it finds one, it immediately returns success with the node returned by that subparser and does not execute the remaining subparsers. Because oneOf() is only a proxy for other parsers, it does not need to do anything with the returned nodes. The allOf() method, on the other hand, has to compose the nodes returned by its subparsers into a new node. Because that composition step can drop purely syntactic tokens (such as the parentheses above), we can easily obtain an AST instead of a CST (concrete syntax tree).
This parser implementation can be expressed using the following notation:
parenthesesExpr ::= '(' expr ')' ;
atom ::= parenthesesExpr | identifier | staff | list | literal | map ;
Therefore, allOf(a, b, c) {...} is equivalent to x ::= a b c, whereas oneOf(a, b, c) is equivalent to x ::= a | b | c.
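To make the semantics of the two helpers concrete, here is a minimal, self-contained sketch of sequence and choice combinators. It is only an illustration under simplifying assumptions: tokens are plain strings, and `Node`, the `Parser` type alias and the `terminal`, `allOf` and `oneOf` functions below are hypothetical stand-ins that share only their names with the SMNP helpers.

```kotlin
// Hypothetical, simplified stand-ins; the real helpers operate on TokenList and ParserOutput.
data class Node(val label: String, val children: List<Node> = emptyList())

// A parser takes the tokens and a position; on success it returns a node and the next position.
typealias Parser = (tokens: List<String>, position: Int) -> Pair<Node, Int>?

fun terminal(expected: String): Parser = { tokens, position ->
    if (tokens.getOrNull(position) == expected) (Node(expected) to position + 1) else null
}

// Sequence (x ::= a b c): succeeds only if every sub-parser succeeds in order.
// The compose function decides which of the collected nodes end up in the tree.
fun allOf(vararg parsers: Parser, compose: (List<Node>) -> Node): Parser = { tokens, position ->
    var current = position
    val nodes = mutableListOf<Node>()
    var failed = false
    for (parser in parsers) {
        val result = parser(tokens, current)
        if (result == null) {
            failed = true
            break
        }
        nodes += result.first
        current = result.second
    }
    if (failed) null else (compose(nodes) to current)
}

// Choice (x ::= a | b | c): returns the result of the first sub-parser that succeeds.
// The sequence is lazy, so the remaining sub-parsers are not executed.
fun oneOf(vararg parsers: Parser): Parser = { tokens, position ->
    parsers.asSequence().mapNotNull { it(tokens, position) }.firstOrNull()
}

val identifier: Parser = terminal("x")
val literal: Parser = terminal("14")

// '(' (identifier | literal) ')' keeps only the inner node, dropping the parentheses.
val parenthesized: Parser =
    allOf(terminal("("), oneOf(identifier, literal), terminal(")")) { (_, inner) -> inner }

val atom: Parser = oneOf(parenthesized, identifier, literal)

fun main() {
    println(atom(listOf("(", "x", ")"), 0))
    println(atom(listOf("14"), 0))
    println(atom(listOf("(", "x"), 0))
}
```

The compose lambda passed to `allOf` is where the concrete syntax is discarded: the parentheses are parsed but never become part of the resulting node.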
Parsers' cascade
Parsers are composed in a cascade and thanks to that they are able to turn a one-dimensional stream of tokens into a tree structure. For example, the AtomParser mentioned before is used by UnitParser, which is responsible for parsing the minus operator and the dot operator. In turn, the UnitParser is used by FactorParser, which is responsible for parsing the not operator and the power operator. The FactorParser is used by TermParser, which is responsible for parsing the product operator. The TermParser is used by SubexpressionParser, which provides production rules for logic operators, relation operators etc. The SubexpressionParser is used by ExpressionParser, which technically is a oneOf-based wrapper for SubexpressionParser and LoopParser. The ExpressionParser represents all constructions that can produce a value and is used by StatementParser, which is eventually used by RootParser.
The order of the parsers in the cascade determines the precedence of each operation and influences the shape of the AST. Take a look at the following example:
class SubexpressionParser : Parser() {
    override fun tryToParse(input: TokenList): ParserOutput {
        // Sum operators: built directly on terms
        val expr1Parser = leftAssociativeOperator(
            TermParser(),
            listOf(TokenType.PLUS, TokenType.MINUS),
            assert(TermParser(), "expression")
        ) { lhs, operator, rhs ->
            SumOperatorNode(lhs, operator, rhs)
        }

        // Relation operators: built on sums
        val expr2Parser = leftAssociativeOperator(
            expr1Parser,
            listOf(TokenType.RELATION, TokenType.OPEN_ANGLE, TokenType.CLOSE_ANGLE),
            assert(expr1Parser, "expression")
        ) { lhs, operator, rhs ->
            RelationOperatorNode(lhs, operator, rhs)
        }

        // Logic 'and': built on relations
        val expr3Parser = leftAssociativeOperator(
            expr2Parser,
            listOf(TokenType.AND),
            assert(expr2Parser, "expression")
        ) { lhs, operator, rhs ->
            LogicOperatorNode(lhs, operator, rhs)
        }

        // Logic 'or': built on 'and', so it is the outermost level
        val expr4Parser = leftAssociativeOperator(
            expr3Parser,
            listOf(TokenType.OR),
            assert(expr3Parser, "expression")
        ) { lhs, operator, rhs ->
            LogicOperatorNode(lhs, operator, rhs)
        }

        return expr4Parser.parse(input)
    }
}
This is the code of SubexpressionParser, which consists of 4 subparsers composed in a cascade. Because expr4Parser (responsible for the or operator) is defined in terms of expr3Parser (responsible for the and operator), the and operator binds more tightly, i.e. it has a higher precedence than the or operator (compare Operators#Operators precedence).
The following listing shows the composition of and and or operator nodes honoring their precedence:
$ smnp --ast --dry-run -c "true and false or not false and not false;"
RootNode 1:16
└─LogicOperatorNode 1:16
  ├─LogicOperatorNode 1:6
  │ ├─BoolLiteralNode 1:1
  │ │ └ (bool, »true«, 1:1)
  │ ├─TokenNode 1:6
  │ │ └ (and, »and«, 1:6)
  │ └─BoolLiteralNode 1:10
  │   └ (bool, »false«, 1:10)
  ├─TokenNode 1:16
  │ └ (or, »or«, 1:16)
  └─LogicOperatorNode 1:29
    ├─NotOperatorNode 1:19
    │ ├─TokenNode 1:19
    │ │ └ (not, »not«, 1:19)
    │ └─BoolLiteralNode 1:23
    │   └ (bool, »false«, 1:23)
    ├─TokenNode 1:29
    │ └ (and, »and«, 1:29)
    └─NotOperatorNode 1:33
      ├─TokenNode 1:33
      │ └ (not, »not«, 1:33)
      └─BoolLiteralNode 1:37
        └ (bool, »false«, 1:37)
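The way the cascade order translates into precedence can be reproduced with a small sketch. It is not the SMNP helper: `Expr`, `TokenStream` and `leftAssociative` below are hypothetical, and the sketch only assumes that a left-associative level parses `operand (operator operand)*` while peeking at a single lookahead token.

```kotlin
// Hypothetical, simplified model of a cascade of left-associative operator levels.
sealed class Expr
data class Literal(val value: Boolean) : Expr()
data class Binary(val lhs: Expr, val operator: String, val rhs: Expr) : Expr()

class TokenStream(private val tokens: List<String>) {
    private var position = 0
    fun peek(): String? = tokens.getOrNull(position) // single lookahead token
    fun next(): String = tokens[position++]
}

// Parses `operand (operator operand)*` and folds the operands to the left.
fun leftAssociative(
    stream: TokenStream,
    operators: Set<String>,
    operand: (TokenStream) -> Expr
): Expr {
    var lhs = operand(stream)
    while (true) {
        val operator = stream.peek()
        if (operator == null || operator !in operators) break
        stream.next()
        lhs = Binary(lhs, operator, operand(stream))
    }
    return lhs
}

fun literal(stream: TokenStream): Expr = Literal(stream.next() == "true")

// `and` is built directly on literals, so it binds more tightly...
fun andLevel(stream: TokenStream): Expr = leftAssociative(stream, setOf("and"), ::literal)

// ...than `or`, which is built on top of the `and` level and ends up at the root.
fun orLevel(stream: TokenStream): Expr = leftAssociative(stream, setOf("or"), ::andLevel)

fun parse(code: String): Expr = orLevel(TokenStream(code.split(" ")))

fun main() {
    println(parse("true and false or false and true"))
}
```

As in the listing above, the `or` node becomes the root of the tree and both `and` nodes become its children, because the outermost level of the cascade is the one that is applied last.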
All parser-related code is located in the io.smnp.dsl.ast package.
Evaluator
The evaluator is the last stage of the SMNP language processing pipeline and also the heart of the entire SMNP tool: it takes the AST as input and performs the programmed operations. Like the parser, the evaluator works recursively, because it processes a tree-like structure.
The evaluator's architecture is similar to the parser's. It consists of smaller evaluators, each of which is able to evaluate a small subset of the AST's node types. Like the parsers, the evaluators also use helper methods (such as oneOf()) to improve readability and to reduce complexity and code repetition.
Because the evaluator introduces the notion of runtime, it also works on a special object called the environment. The environment object contains runtime information, such as loaded modules (with their functions and methods), the call stack with its scopes, and some meta information. This object is passed through all evaluators along with the AST and its subtrees.
The following listing shows an example evaluator, the if statement evaluator:
class ConditionEvaluator : Evaluator() {
    private val expressionEvaluator = ExpressionEvaluator()
    private val defaultEvaluator = DefaultEvaluator()

    override fun supportedNodes() = listOf(ConditionNode::class)

    override fun tryToEvaluate(node: Node, environment: Environment): EvaluatorOutput {
        val (conditionNode, trueBranchNode, falseBranchNode) = (node as ConditionNode)
        val condition = expressionEvaluator.evaluate(conditionNode, environment).value

        if (condition.type != DataType.BOOL) {
            throw contextEvaluationException(
                "Condition should be of bool type, found '${condition.value}'",
                conditionNode.position,
                environment
            )
        }

        if (condition.value as Boolean) {
            return defaultEvaluator.evaluate(trueBranchNode, environment)
        } else if (falseBranchNode !is NoneNode) {
            return defaultEvaluator.evaluate(falseBranchNode, environment)
        }

        return EvaluatorOutput.ok()
    }
}
The code above defines the list of supported nodes, which in this case contains a single element: ConditionNode. The ConditionNode is the product of ConditionParser, which handles if statements.
The tryToEvaluate() method contains the actual evaluation logic, and in this case it:
- evaluates the condition using ExpressionEvaluator (it always returns a value)
- asserts that the value is of bool type - if it is of any other type, an exception is thrown
- evaluates the trueBranchNode if the condition evaluates to true
- if the condition evaluates to false, checks whether falseBranchNode (which comes from the else clause) is present. If so, it is evaluated.
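The interplay between supportedNodes(), tryToEvaluate() and a dispatching evaluator can be sketched in isolation. The model below is hypothetical and heavily simplified: values are plain Kotlin objects instead of SMNP values, the node classes are invented for the example, and `DefaultEvaluator` is written as a singleton only to keep the sketch short.

```kotlin
import kotlin.reflect.KClass

// Hypothetical, simplified model; not the real io.smnp.evaluation classes.
sealed class Node
data class BoolNode(val value: Boolean) : Node()
data class PrintNode(val text: String) : Node()
data class ConditionNode(val condition: Node, val trueBranch: Node, val falseBranch: Node?) : Node()

class Environment {
    val output = mutableListOf<String>()
}

abstract class Evaluator {
    abstract fun supportedNodes(): List<KClass<out Node>>
    protected abstract fun tryToEvaluate(node: Node, environment: Environment): Any?

    // Rejects nodes that this evaluator does not declare as supported.
    fun evaluate(node: Node, environment: Environment): Any? {
        require(node::class in supportedNodes()) { "Unsupported node: $node" }
        return tryToEvaluate(node, environment)
    }
}

class BoolEvaluator : Evaluator() {
    override fun supportedNodes(): List<KClass<out Node>> = listOf(BoolNode::class)
    override fun tryToEvaluate(node: Node, environment: Environment): Any? = (node as BoolNode).value
}

class PrintEvaluator : Evaluator() {
    override fun supportedNodes(): List<KClass<out Node>> = listOf(PrintNode::class)
    override fun tryToEvaluate(node: Node, environment: Environment): Any? {
        environment.output += (node as PrintNode).text
        return Unit
    }
}

class ConditionEvaluator : Evaluator() {
    override fun supportedNodes(): List<KClass<out Node>> = listOf(ConditionNode::class)
    override fun tryToEvaluate(node: Node, environment: Environment): Any? {
        val (conditionNode, trueBranch, falseBranch) = node as ConditionNode
        // The condition must evaluate to a boolean, otherwise evaluation fails.
        val condition = DefaultEvaluator.evaluate(conditionNode, environment) as? Boolean
            ?: throw IllegalStateException("Condition should be of bool type")
        return when {
            condition -> DefaultEvaluator.evaluate(trueBranch, environment)
            falseBranch != null -> DefaultEvaluator.evaluate(falseBranch, environment)
            else -> null
        }
    }
}

// Delegates to the first evaluator that supports the given node type.
object DefaultEvaluator : Evaluator() {
    private val evaluators = listOf(BoolEvaluator(), PrintEvaluator(), ConditionEvaluator())
    override fun supportedNodes(): List<KClass<out Node>> = evaluators.flatMap { it.supportedNodes() }
    override fun tryToEvaluate(node: Node, environment: Environment): Any? =
        evaluators.first { node::class in it.supportedNodes() }.evaluate(node, environment)
}

fun main() {
    val environment = Environment()
    val script = ConditionNode(BoolNode(false), PrintNode("then"), PrintNode("else"))
    DefaultEvaluator.evaluate(script, environment)
    println(environment.output)
}
```

The environment is threaded through every call, so nested evaluators share the same runtime state, which is the role the environment object plays in the description above.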
All evaluator-related code is located in the io.smnp.evaluation package.
Interpreter
The interpreter isn't actually another language processing stage; rather, it is a facade that composes the stages into a single pipeline, accepting raw SMNP code as input. It also accepts additional parameters, such as printTokens, printAst and dryRun.
So far, SMNP provides two types of interpreters:
- LanguageModuleInterpreter (with its implementation: DefaultLanguageModuleInterpreter), which is used only by Language Module Providers and Hybrid Module Providers
- DefaultInterpreter, which is the standard interpreter used for user input (in the form of both scripts and inline code snippets).
The difference between these two interpreters is that the LanguageModuleInterpreter supports only definitions of functions and methods at the top level of a script (technically, in RootNode), whereas the DefaultInterpreter allows any available statement at the top level of a script.
The following snippet shows the code of DefaultInterpreter:
class DefaultInterpreter {
    private val tokenizer = DefaultTokenizer()
    private val parser = RootParser()
    private val evaluator = RootEvaluator()

    // Runs inline code; the source name is reported as "<inline>"
    fun run(
        code: String,
        environment: Environment = DefaultEnvironment(),
        printTokens: Boolean = false,
        printAst: Boolean = false,
        dryRun: Boolean = false
    ): Environment {
        val lines = code.split("\n")
        return run(lines, "<inline>", environment, printTokens, printAst, dryRun)
    }

    // Common pipeline: tokenize, parse and (unless dryRun is set) evaluate
    private fun run(
        lines: List<String>,
        source: String,
        environment: Environment,
        printTokens: Boolean,
        printAst: Boolean,
        dryRun: Boolean
    ): Environment {
        environment.loadModule("smnp.lang")
        val tokens = tokenizer.tokenize(lines, source)
        val ast = parser.parse(tokens)

        if (!dryRun) {
            evaluator.evaluate(ast.node, environment)
        }

        if (printTokens) println(tokens)
        if (printAst) ast.node.pretty()

        return environment
    }

    // Runs a script file; the source name is the file's canonical path
    fun run(
        file: File,
        environment: Environment = DefaultEnvironment(),
        printTokens: Boolean = false,
        printAst: Boolean = false,
        dryRun: Boolean = false
    ): Environment {
        val lines = file.readLines()
        return run(lines, file.canonicalPath, environment, printTokens, printAst, dryRun)
    }
}
You can think of DefaultInterpreter as the entry point of the core component, ready to be used by the app component or by any other application willing to make use of SMNP.
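The facade idea, including the effect of the dryRun flag, can be shown on a toy language that only sums integers. Everything in this sketch is hypothetical (`ToyInterpreter`, its grammar and its return type); it only mirrors the shape of the pipeline, not the behaviour of the real DefaultInterpreter.

```kotlin
// Hypothetical toy facade: tokenizer -> parser -> evaluator for the grammar integer ('+' integer)*.
class ToyInterpreter {
    private fun tokenize(code: String): List<String> = code.trim().split(Regex("\\s+"))

    // The "AST" of this toy language is just the list of operands.
    private fun parse(tokens: List<String>): List<Int> {
        require(tokens.size % 2 == 1) { "Syntax error: dangling operator" }
        return tokens.mapIndexed { index, token ->
            if (index % 2 == 1) {
                require(token == "+") { "Syntax error: expected '+', got '$token'" }
                null
            } else {
                token.toIntOrNull()
                    ?: throw IllegalArgumentException("Syntax error: expected integer, got '$token'")
            }
        }.filterNotNull()
    }

    private fun evaluate(ast: List<Int>): Int = ast.sum()

    // With dryRun the code is still tokenized and parsed, but never evaluated.
    fun run(
        code: String,
        printTokens: Boolean = false,
        printAst: Boolean = false,
        dryRun: Boolean = false
    ): Int? {
        val tokens = tokenize(code)
        val ast = parse(tokens)
        if (printTokens) println(tokens)
        if (printAst) println(ast)
        return if (dryRun) null else evaluate(ast)
    }
}

fun main() {
    val interpreter = ToyInterpreter()
    println(interpreter.run("1 + 2 + 3"))
    println(interpreter.run("1 + 2 + 3", printTokens = true, dryRun = true))
}
```

A dry run still reports syntax errors, because only the last stage of the pipeline is skipped.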
Modules Management System
The modules' management system is built on top of the PF4J plugin framework and uses its features to meet the module system's requirements. In fact, each ModuleProvider implementation is registered as a PF4J extension (PF4J marks extensions with its @Extension annotation and extension points with the ExtensionPoint interface), and each module's jar file is a plugin in PF4J terminology.
The central component of the modules' management system is the ModuleRegistry with its standard DefaultModuleRegistry implementation. During SMNP startup, the ModuleRegistry loads and starts each available module (i.e. each module found in the default modules' directory or in the directory passed through the smnp.modulesDir JVM property) and builds a dictionary (registry) of these modules.
When an import statement is evaluated, the Evaluator calls the Environment's loadModule() method. The Environment requests the ModuleProvider assigned to the desired module from the ModuleRegistry and obtains the module by passing the DefaultLanguageModuleInterpreter, as well as itself, to the ModuleProvider, which constructs the Module object. This is the stage at which scripts are evaluated in LanguageModuleProvider-based module providers. Thanks to the tree-like structure of Module objects, the newly provided Module can simply be merged into the root module of the Environment. From now on, all functions and methods of the module are available. This is also the stage at which the ModuleProvider's onModuleLoad() lifecycle hook is invoked.
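The loading flow described above (registry lookup, module construction, merge into the root module, lifecycle hook) can be sketched as follows. The sketch does not use PF4J and does not reflect the real signatures: `moduleTree`, `StaticProvider`, `requestModuleProvider` and the shape of `Module` are assumptions made for illustration, and functions are reduced to their names.

```kotlin
// Hypothetical, simplified model of the module tree and the loading flow.
class Module(
    val name: String,
    val functions: MutableSet<String> = mutableSetOf(),
    val children: MutableMap<String, Module> = mutableMapOf()
) {
    // Merges another tree into this one, combining sub-modules that share a name.
    fun merge(other: Module) {
        functions += other.functions
        for (child in other.children.values) {
            children.getOrPut(child.name) { Module(child.name) }.merge(child)
        }
    }

    // Resolves a dotted path such as "smnp.io" relative to this module.
    fun find(path: String): Module? =
        path.split(".").fold<String, Module?>(this) { module, segment -> module?.children?.get(segment) }
}

// Wraps the functions of e.g. "smnp.io" into a tree: <root> -> smnp -> io.
fun moduleTree(path: String, functions: Set<String>): Module {
    val segments = path.split(".")
    val leaf = Module(segments.last(), functions.toMutableSet())
    val top = segments.dropLast(1).foldRight(leaf) { segment, child ->
        Module(segment).also { it.children[child.name] = child }
    }
    return Module("<root>").also { it.children[top.name] = top }
}

abstract class ModuleProvider(val path: String) {
    abstract fun provide(): Module
    open fun onModuleLoad() {}
}

class StaticProvider(path: String, private val functions: Set<String>) : ModuleProvider(path) {
    var loadCount = 0
    override fun provide(): Module = moduleTree(path, functions)
    override fun onModuleLoad() {
        loadCount++
    }
}

class ModuleRegistry(available: List<ModuleProvider>) {
    private val providers = available.associateBy { it.path }
    fun requestModuleProvider(path: String): ModuleProvider =
        providers[path] ?: throw IllegalArgumentException("Module '$path' not found")
}

class Environment(private val registry: ModuleRegistry) {
    val root = Module("<root>")

    fun loadModule(path: String) {
        val provider = registry.requestModuleProvider(path)
        root.merge(provider.provide()) // the new module's tree is merged into the root module
        provider.onModuleLoad()        // lifecycle hook invoked once the module is available
    }
}

fun main() {
    val registry = ModuleRegistry(
        listOf(StaticProvider("smnp.io", setOf("println")), StaticProvider("smnp.lang", setOf("typeOf")))
    )
    val environment = Environment(registry)
    environment.loadModule("smnp.io")
    environment.loadModule("smnp.lang")
    println(environment.root.find("smnp.io")?.functions)
    println(environment.root.find("smnp.lang")?.functions)
}
```

Because every provider returns a tree rooted at the same level as the environment's root module, modules that share a prefix (here `smnp`) end up under a single shared node after merging.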