Language Extensions?

He was kind of foreign, you know? Like the Hub people. Bald as a coot. I remember thinking, 'you look like a young man, mister, but you look like you've been a young man for a long, long time if I'm any judge.' Normally I wouldn't have any man there, but he sat and talked to her in his foreign lingo and sang her songs and little poems and soothed her, and back she came, out of thin air, and I was ready and it was one, two, done. And then she was gone. Except that she was still there, I think. In the air.

- Pratchett, Terry. (2001). Thief of Time. Harper Collins

A really useful and exciting component of Concurnas is its support for language extensions. These enable us to embed code defined in other programming languages directly within Concurnas. This provides us with the following advantages:

  • Convenience - By using language extensions we avoid the need to switch to other tools/toolchains, products etc. this greatly simplifies our working environment and build process.

  • Appropriateness - By using language extensions we're able to use the right language for the task at hand, without having to try and fit our solution around the language we're mainly programming in.

  • Allows for compile time checking of code - It's always better to know about errors ahead of runtime. Language extensions may be integrated such that they can report errors back to the core Concurnas compiler for highlighting at compile time.

  • Compiles down to bytecode - Language extensions are compiled to bytecode and are thus executed at the same speed as 'native' high performance Concurnas code.

  • Easy to integrate - With Concurnas language extensions we need only concern ourselves with the tokenization, parsing and some semantic analysis of our guest language. We can avoid bytecode or machine code generation, as we instead output Concurnas code. This generated Concurnas code is checked and treated as par normal Concurnas code in the subsequent analysis and compilation stages of the core Concurnas compiler. This removes the majority of the hard work from the process of supporting guest languages and allows us to focus on the more interesting part concerning language structure and semantics.

Using language extensions?

We can make use of pre defined language extensions by importing them in much the same way as conventional import statements. We simply need use the using keyword to import the appropriate language extension class. The following is valid:

using com.myorg.langs.mylang
using com.myorg.langs.mylang as MyLanguage

from com.myorg.langs using mylang
from com.myorg.langs using mylang as MyLanguage

from com.mycode using *
using com.mycode.*

In addition to the above, language extensions may be imported using the regular import syntax and used as regular classes.

Using imported language extensions?

The syntax for using an imported language extension within a language extension expression is as follows:

LangExtentionName '||' code '||'

For example, a lisp like language may be used within a language extension expression as follows:

using com.myorg.langs.mylisp

result = mylisp||(+ 1 2 3 )||

The code expressed within the pair of ||'s is to be defined in that of the guest language extension, in the above example, mylisp. At compilation time, this code is passed to an instance of the mylisp language extension which itself (after any appropriate error checking and other operations it performs) returns a String output of Concurnas code - this Concurnas code is then handled as normal by the Concurnas compiler.

Language extensions may be used at any point within a Concurnas program, provided that the guest language extension supports it. I.e. at top level, within a function, within an extension function/method, at field level of a class and within a method. Language extensions may only be referenced as names as par above - they may not include parameters since they are not treated as variables/classes etc.

As with Strings when defining code in a guest language the backslash may be used as an escape character, for instance to escape the pipe operator:

using com.myorg.langs.usesPipe

result = usesPipe||something \| something ||//escape the |

Language extensions are executed as par normal Concurnas code, i.e as isolates.

Creating language extensions?

Concurnas offers a powerful and concise API (exposed under: com.concurnas.lang.langExt) for implementing language extensions, of either existing programming languages or totally new ones! Let us first familiarise ourselves with the typical phases of language compilation and execution implemented in some manor by most language compilers/runtimes, from the perspective of Concurnas:

  1. Lexical analysis. Splitting the input code into parable segments. ( val a = 10 -> 'val', 'a', '=', '10' )

  2. Parsing. Turning our stream of tokens from the previous stage into a verifiably syntactically correct structure.

  3. Semantic analysis. Attributes meaning to our previously defined structure and verifies consistency of that meaning (e.g. variable types, scopes).

  4. Intermediate code generation (optional). Transcompiling our previous data structure to another format for easier working in downstream phases.

  5. Bytecode generation. The final stage of the Concurnas compiler outputs executable bytecode.

  6. Code optimization. The Concurnas runtime (operating on the JVM) verifies the integrity of the previously output bytecode whilst performing some runtime transformations and optimizations.

  7. Machine code generation. The JVM creates optimized machine code from the previous bytecode and executes this machine code upon the CPU and GPUs available.

Aside: We see above that there are a large number of phases to the compilation process, in fact, within the Concurnas compiler (and most other compilers) these steps are often themselves broken down into further sub phases. Execution of these phases takes a non epsilon amount of time. It is for this reason (amongst others) that, in the interests of the high performance computing which Concurnas offers, steps 1-5 of 7 are performed at compilation time, once only, and the steps 6 and 7 are performed only infrequently and in a highly optimized manner, at runtime. This approach of course gives a natural performance boost over a runtime based scripting language which if naively implemented would implement all 7 of these stages (and perhaps not even implementing the final stage if it were a fully interpreted language) at runtime.

The good news is that with Concurnas language extensions we need focus our effort upon only steps 1, 2 and step 3 to some extent (depending on the complexity of our language), before finally outputting a String of Concurnas code as our intermediate code for step 4. Concurnas will then handle steps 5 onwards (whilst also implicitly performing steps 1 - 3 on the provided output Concurnas code). The first 3 phases tend to be where the most interesting and value added work regarding programming languages reside and so this puts us in a good position.

Defining language extensions?

Language extension classes can be defined using idiomatic Concurnas code, affording us all the features of normal Concurnas code. Furthermore, Language extension classes are executed like normal Concurnas code, within an isolate - which greatly eases state management.

Language extension classes are required to output Concurnas code in the form of a String. Within that String of valid Concurnas code any aspect of Concurnas may be used. For instance, we can create new variables, classes, methods, functions etc as well as use those defined outside the definition of any language extension expressions.

Language extension classes must implement the com.concurnas.lang.langExt.LanguageExtension trait and expose a publicly accessible zero argument constructor. This trait stipulates the following two methods which must be implemented:

  • def initialize(line int, col int, loc SourceLocation, src String) Result

  • def iterate(ctx Context) IterationResult

When a language extension expression is encountered Concurnas will create one instance of the appropriate imported class (imported via the aforementioned using keyword) via its publicly accessible zero argument constructor. After this, the initialize method is called once. This provides the following parameters:

  • line int - The line number at the start of the || block.

  • col int - The column number at the start of the || block.

  • loc com.concurnas.lang.langExt.SourceLocation - The location of the language extension expression - either:

    • At top level

    • Within a function

    • Within an extension function/method

    • At field level of a class

    • Within a method

  • src String - The code of the language extension expression defined within the bounds of the || block.

A com.concurnas.lang.langExt.Result object must be returned from the initialize method. This includes a list of any errors, warnings which may have been encountered during initial compilation - for instance if the code is syntactically invalid (if analysed at this point) or defined at an unsupportable source location.

Next, Concurnas will call the iterate method. Concurnas is an iterative compilation compiler, as such multiple calls to the iterate method may be made in the process of compiling the source file in which the language extension expression is defined as more information regarding the context of the defined code becomes known. A language extension implementation may need to account for this.

A com.concurnas.lang.langExt.Context object is provided to the iterate method. This allows the language extension implementation to interrogate the host Concurnas compiler at the point of usage of the language extension via a language extension expression. The following methods are exposed on the Context object and can be of particular use in this regard:

  • def getVariable(name String) Variable? - Returns the variable (if any) in scope matching the provided name.

  • def getclass(name String) Class? - Returns the class (if any) in scope matching the provided name.

  • def getMethods(name String) List<Method> - Returns all callable methods in scope matching the provided name.

  • def getNesting() List<Location> - Returns a list of the nesting hierarchy in terms of methods and classes in which the language extension expression code resides.

Where relevant the aforementioned Variable and Method classes contain type information. This type information is expressed as a java bytecode field descriptor String, details of this can be found here:\\ JVM Spec.

The iterate method is required to output a String of Concurnas code 1 each time it is called. This String is then passed to the core Concurnas compiler for subsequent compilation.

The iterate method is expected to return a com.concurnas.lang.langExt.IterationResult object. This includes a list of any errors, warnings which may have been encountered during iterative compilation (for instance scope/type errors, usage context warnings etc). Additionally this object is to include a reference to the aforementioned output String of Concurnas code.

Errors and warnings?

One of the most important jobs of any language compiler is the verification and validation of code. This can take place in either the initialization or iterative compilation stages of a Concurnas language extension, i.e. within the the initialize and iterate methods respectfully.

Problems in the verification and validation of code may take the form of errors or warnings, multiple of which may be outputted by the compilation phases, the Result class (of which IterationResult inherits) contains two list fields for holding these errors and warnings.

These errors and warnings are then propagated back via the instances of the Result class returned from the initialize and iterate methods back to the main Concurnas compiler and prefixed with the name of the language extension referenced within the language extension expression.

If either the initialize or iterate methods throw an exception, this will be caught by the core Concurnas compiler and wrapped up as an error originating from the language extension (and prefixed as such). If the initialize throws an exception then subsequent compilation (i.e. invoking of the iterate) will not take place.

Practical tips?

There are some practical tips which can be considered when we're building language extensions:

Abstract syntax trees?

Abstract syntax trees or ATS's are an excellent intermediate data format representation. Many compilers make use of these. More information regarding this may be found here: AST

The visitor pattern?

The visitor pattern crops up time and time again as a really useful method for navigating the aforementioned ATS. More information regarding this may be found here: Visitor pattern

Iterative compilation?

Since Concurnas is an iterative compiler it is to be expected that multiple calls to our aforementioned iterate method will be made during the course of compilation. It is possible that a type of a dependant class, variable etc which was previously unknown will become known during this iterative compilation. Our language extension compiler may need to take this into account - i.e. by avoiding repeating work which it has already completed or by say caching a previous compilation result if it is agnostic to these iterative calls. It may also be appropriate to output a known to be incorrect variant of the expected output code (with errors flagged as appropriate) such that dependant code may be allowed to continue iterative compilation in some manner.

Focus of effort?

Given that our language extension will be outputting Concurnas code, and that this Concurnas code will be passed to the core Concurnas compiler for further compilation as par normal Concurnas code, it is recommended that attention be focused on generating syntactically valid Concurnas code over semantically correct. In effect the core Concurnas compiler acts like a safety net for any problems which may have been missed during processing of the language extension code. The language extension doesn't have to be perfect, the core Concurnas compiler will do this hard work for us.

Example language extension?

Here is a full example of a very simple lisp like language com.mycompany.myproduct.langExts.miniLisp.

In the interests of brevity this example focuses upon only a small subsection of syntax and takes a few shortcuts (imperfect parsing, operates upon numerical types only, no variable assignment/function definitions) etc, but it nevertheless serves nicely to illustrate the typical phases of compilation as well as the sorts of verification and validation we must perform within a language extension.

from com.concurnas.lang.LangExt import LanguageExtension, Context, SourceLocation
from com.concurnas.lang.LangExt import ErrorOrWarning, Result, IterationResult
from com.concurnas.lang.LangExt import Variable, Method, MethodParam
from java.util import Stack, Scanner, ArrayList, List

/*
* sample inputs:
*  (+ 1 2 3 )
*  (+ 1 2 n )
*  (+ 1 2 ( * 3 5) )
*  (+ 1 2 ( methodCall 3 5) )
*/

/////////////////   AST  /////////////////
enum MathOps(code String){PLUS("+"), MINUS("-"), MUL("*"), DIV("/"), POW("**");
  override toString() => code
}

open class ASTNode(-line int, -col int){
  nodes = new ArrayList<ASTNode>()
  def add(toAdd ASTNode){
    nodes.add(toAdd)
  }
	
  def accept(visitor Visitor) Object 
}

class LongNode(line int, col int, along Long) < ASTNode(line, col){
  def accept(vis Visitor) Object => vis.visit(this);
}

class MathNode(line int, col int, what MathOps) < ASTNode(line, col){
  def accept(vis Visitor) Object => vis.visit(this);
}

class MethodCallNode(line int, col int, methodName String) < ASTNode(line, col){
  def accept(vis Visitor) Object => vis.visit(this);
}

class NamedNode(line int, col int, name String) < ASTNode(line, col){
  def accept(vis Visitor) Object => vis.visit(this);
}


///////////////// Parser /////////////////

def parse(line int, col int, source String) (ASTNode, Result) {
  rootNode ASTNode?=null
	
  warnings = new ArrayList<ErrorOrWarning>()
  errors = new ArrayList<ErrorOrWarning>()
	
  sc = new Scanner(source)
  nodes = Stack<ASTNode>()
	
  while(sc.hasNext()){
    if(sc.hasNextInt()){
      llong = sc.nextLong()
			
      if(nodes.isEmpty()){
        errors.add(new ErrorOrWarning(line, col, "unexpected token: {llong}"))
      }
      else{
        nodes.peek().add(LongNode(line, col, llong))
      }
    }else{
      str = sc.next()
      match(str){
        ")" => {
					
          if(nodes.isEmpty()){
            errors.add(new ErrorOrWarning(line, col, "unexpected token: {str}"))
          }else{
            rootNode = nodes.pop()
            if(rootNode <> null and not nodes.isEmpty()){
              nodes.peek().add(rootNode)
            }
          }
        }
        else => match(str){
          "( +" => nodes.push(MathNode(line, col, MathOps.PLUS))
          "( -" => nodes.push(MathNode(line, col, MathOps.MINUS))
          "( *" => nodes.push(MathNode(line, col, MathOps.MUL))
          "( /" => nodes.push(MathNode(line, col, MathOps.DIV))
          "( **" => nodes.push(MathNode(line, col, MathOps.POW))
          else =>{
            if(str.startsWith("(")){
              str = str.substring(1, str.length());
              nodes.push(MethodCallNode(line, col, str))
            }else{
              if(nodes.isEmpty()){
                errors.add(new ErrorOrWarning(line, col, "unexpected token: {str}"))
              }else{
                nodes.peek().add(NamedNode(line, col, str))
              }
            }
          } 
        }
      }
    }
  }
		
  rootNode?:LongNode(0, 0, 0), new Result(errors, warnings)
}

/////////////////Visitors/////////////////

trait Visitor{
  def visit(longNode LongNode) Object
  def visit(mathNode MathNode) Object
  def visit(namedNode NamedNode) Object
  def visit(methodNode MethodCallNode) Object
}

class CodeGennerator ~ Visitor{
  def visit(longNode LongNode) Object{
    "{longNode.along}"
  }
	
  def visit(mathNode MathNode) Object{
    String.join("{mathNode.what}", ("" + x.accept(this)) for x in mathNode.nodes)
  }
	
  def visit(methodCall MethodCallNode) Object{
    "{methodCall.methodName}(" + String.join(", ", ("" + x.accept(this)) for x in methodCall.nodes ) + ")"
  }
	
  def visit(namedNode NamedNode) Object{
    "{namedNode.name}"
  }
}
cg = CodeGennerator()

private numericalTypes = new set<String>()
{
  numericalTypes.add("I")
  numericalTypes.add("J")
  numericalTypes.add("Ljava/lang/Long;")
  numericalTypes.add("Ljava/lang/Integer;")
}

class NodeChecker(ctx Context) ~ Visitor{
  errors = ArrayList<ErrorOrWarning>()
  warnings =  ArrayList<ErrorOrWarning>()
	
  def visit(longNode LongNode) Object => "I"
  def visit(mathNode MathNode) Object{
    for( x in mathNode.nodes){
      x.accept(this)
    }
		
    "I"
  }
	
  private def raiseError(line int, col int, str String){
    errors.add(new ErrorOrWarning(line, col, str))
  }
	
  def visit(namedNode NamedNode) Object{
    vari Variable? = ctx.getVariable(namedNode.name)
    type = vari?.type
		
    if(type){
      if(type not in numericalTypes){
        raiseError(namedNode.line, namedNode.col, "Variable: {namedNode.name} is expected to be of numerical type")
      }
    }else{
      raiseError(namedNode.line, namedNode.col, "Unknown variable: {namedNode.name}")
    }
		
    return type
  }
	
	
  def visit(methodCall MethodCallNode) Object{
    meths List<Method> = ctx.getMethods(methodCall.methodName)
		
    found = false
		
    argsWanted = methodCall.nodes
    wantedSize = argsWanted.size()
		
    for(method in meths){
      ret = method.returnType
      if(ret=="V" or ret in numericalTypes){
        margs ArrayList<String> = method.arguments
        if(wantedSize == margs.size()){
          found=true
        }else{
          //check to see if an arg is an array - then can attempt to vararg in
          found = margs.stream().anyMatch(a => a <> null and a.startsWith("["))				
        }
				
        if(found){
          break
        }
      }
    }
		
    if(not found){
      raiseError(methodCall.line, methodCall.col, "Cannot find method: {methodCall.methodName}")
    }
		
    for(arg in argsWanted; idx){
      atype = ""+arg.accept(this)
      if(atype not in numericalTypes){
        raiseError(methodCall.line, methodCall.col, "method argument {idx+1} is expected to be of numerical type not: {atype}")
      }
    }
		
    return "I"
  }
}


///////////////// Lang  /////////////////

class SimpleLisp ~ LanguageExtension{
  line int
  col int
  rootNode ASTNode?=null
	
  def initialize(line int, col int, location SourceLocation, source String) Result {
    this.line = line
    this.col = col
		
    //build abstract syntax tree...
    (this.rootNode, result) = parse(line, col, source)
		
    result
  }
	
  def iterate(ctx Context) IterationResult {
    rootn = rootNode
    if(rootn){
      nc = NodeChecker(ctx)
      rootn.accept(nc)
			
      code = ""
      if(nc.errors.isEmpty() and nc.warnings.isEmpty()){
        code = ""+rootn.accept(cg)
      }
			
      new IterationResult(nc.errors, nc.warnings, code)
			
    }else{
      errors = ArrayList<ErrorOrWarning>()
      warnings =  ArrayList<ErrorOrWarning>()
      new IterationResult(errors, warnings, "")
    }
  }
}

Using our example language extension?

We can now use or previously defined language extension:

from com.mycompany.myproduct.langExts.miniLisp using SimpleLisp

result = SimpleLisp||(+ 1 2 (* 3 3 ) )||
//result == 9

Footnotes

1this technically makes language extensions transpilers