Pegasm

I was unfamiliar with the Java 1.5 instrumentation interface when I joined New Relic. I studied it along with the ASM library, which supports the bytecode manipulation that the interface makes possible. Pegasm was a learning exercise.

Strategy

Chris Hansen explains ASM's capabilities in his PJUG talk from the fall of 2012 (Vimeo 57515597).

I adopted the methodology Chris suggested: write Java, read the bytecode, then hack the bytecode to handle your variation.

I transcribed the published PEG grammar (Ford 2004), producing a one- or two-line Java method for each production.

I included an abstract class that provided input buffering and parse position tracking utilities.

The hand-coded parser could read a text version of the published grammar. Its result was simply pass or fail.
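A minimal sketch of this style of parser, not the actual Pegasm source. The names `ParserBase`, `IdentParser`, `lit`, `range` and `reset` are assumptions made for illustration, and `pair` is a made-up rule that only shows how a sequence backtracks. The two identifier rules follow the shape of the corresponding rules in the published grammar.

```java
// Hypothetical base class: buffers the input and tracks the parse position.
abstract class ParserBase {
    protected final String input;
    protected int pos = 0;

    ParserBase(String input) { this.input = input; }

    // Match a literal at the current position, advancing only on success.
    protected boolean lit(String s) {
        if (input.startsWith(s, pos)) { pos += s.length(); return true; }
        return false;
    }

    // Match one character in the inclusive range [lo, hi].
    protected boolean range(char lo, char hi) {
        if (pos < input.length() && input.charAt(pos) >= lo && input.charAt(pos) <= hi) {
            pos++;
            return true;
        }
        return false;
    }

    // Backtrack to a saved position and report failure.
    protected boolean reset(int mark) { pos = mark; return false; }
}

// Each production becomes a one- or two-line method returning pass or fail.
class IdentParser extends ParserBase {
    IdentParser(String input) { super(input); }

    // IdentStart <- [a-zA-Z_]
    boolean identStart() { return range('a', 'z') || range('A', 'Z') || lit("_"); }

    // IdentCont <- IdentStart / [0-9]
    boolean identCont() { return identStart() || range('0', '9'); }

    // Hypothetical rule showing a sequence: Pair <- IdentStart IdentCont
    boolean pair() { int mark = pos; return (identStart() && identCont()) || reset(mark); }
}
```

Ordered choice maps onto `||` and sequence onto `&&`, which is what keeps each method so short. A failed sequence must restore the position it started from, which is the job of `reset` here.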

I wrote an instrumentation jar that amends each production method of the parser with trace calls, which report, step by step, the production rules that were accepted (source on GitHub).
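The instrumentation itself works on bytecode, but its effect can be pictured at the source level. The sketch below is only that picture, under assumptions: `Trace`, `report`, `TracedParser` and the message format are hypothetical, and the real agent's callback may look quite different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical trace sink that the injected calls would report to.
class Trace {
    static final List<String> accepted = new ArrayList<>();

    // Record the rule and the span it consumed when it succeeded, then pass the result through.
    static boolean report(String rule, int start, int end, boolean ok) {
        if (ok) accepted.add(rule + " [" + start + "," + end + ")");
        return ok;
    }
}

class TracedParser {
    private final String input;
    private int pos = 0;

    TracedParser(String input) { this.input = input; }

    // The production as written by hand: Digit <- [0-9]
    private boolean digitPlain() {
        if (pos < input.length() && input.charAt(pos) >= '0' && input.charAt(pos) <= '9') {
            pos++;
            return true;
        }
        return false;
    }

    // The same production as it would behave once amended with a trace call.
    boolean digit() {
        int start = pos;
        boolean ok = digitPlain();   // run the original body first so pos is final
        return Trace.report("digit", start, pos, ok);
    }
}
```

Parsing `"7x"` with two calls to `digit()` leaves a single entry, `digit [0,1)`, in `Trace.accepted`, since only the first call is accepted.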

Limitations

Chris's methodology suggested a next step: write instrumentation that could generate, from the grammar text file itself, the parser that I had written by hand. I haven't yet succeeded at this, for several reasons.

I chose the "sax" style (event-based visitor) ASM API, which seems best for annotating existing code. This worked for tracing but less well for translation, where the "dom" style (tree) ASM API would seem the better fit.

I found that I could not write Java code for some productions directly. I wanted to include a while loop as a term in an expression, which isn't possible in Java because a while loop is a statement, not an expression. Instead I replaced each while loop with a call to a method containing the loop. This workaround will not be required once I generate the loop as bytecode myself.
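The workaround can be sketched as follows. The class `LoopDemo` and the rule `Name <- Letter Digit* !.` are assumptions chosen for brevity, not rules taken from the parser.

```java
class LoopDemo {
    private final String input;
    private int pos = 0;

    LoopDemo(String input) { this.input = input; }

    // Match one character in the inclusive range [lo, hi].
    private boolean range(char lo, char hi) {
        if (pos < input.length() && input.charAt(pos) >= lo && input.charAt(pos) <= hi) {
            pos++;
            return true;
        }
        return false;
    }

    boolean letter() { return range('a', 'z'); }
    boolean digit()  { return range('0', '9'); }
    boolean atEnd()  { return pos == input.length(); }

    // Not legal Java, because while is a statement and cannot be an operand:
    //   return letter() && (while (digit()) {}) && atEnd();

    // Workaround: host the loop in its own method. Digit* always succeeds.
    boolean digits() {
        while (digit()) { }
        return true;
    }

    // Hypothetical rule: Name <- Letter Digit* !.
    boolean name() { return letter() && digits() && atEnd(); }
}
```

The helper method exists only because of Java's statement and expression split. In bytecode the loop is just labels and jumps, so generated code could place it inline between the other terms.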