Exploratory Parsing in Frames

A brief charter: over several hours, with some federated collaboration, we explore Ward's technique of Exploratory Parsing using the Frame Plugin and our esm.html script.

Ward's original was a web-based experiment manager that could prepare, monitor, and refine task-specific grammars, turned into parsing engines by a slightly enhanced version of Ian Piumarta's peg/leg.

We begin with our esm frame and Peggy, a JavaScript PEG parser generator (docs).

We will parse raw text: a comma-separated list of URLs.

Trouble: I wanted a newline as the separator but couldn't get that to parse.
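We didn't keep the error message, so this is a guess and not a diagnosis. Suppose the newline was written as "\n" inside the grammar's template literal. JavaScript turns that escape into a real line break before Peggy ever sees the grammar. As far as we know, Peggy, like JavaScript, does not accept a raw line break inside a quoted literal. The sketch below uses plain JavaScript with no Peggy involved. It compares the cooked template text with String.raw, and the rule name `sep` is hypothetical.

```javascript
// Inside a template literal, "\n" becomes a real newline character
// before Peggy receives the grammar text.
const cooked = `sep = "\n"`;

// String.raw keeps the backslash, so Peggy would receive the two
// characters \ and n, which its string-literal syntax understands.
const raw = String.raw`sep = "\n"`;

console.log(cooked.includes("\n")); // true: a real line break between the quotes
console.log(raw.includes("\\n"));   // true: a backslash followed by n
console.log(cooked.length, raw.length); // 9 10
```

Writing `\\n` in the template literal, or tagging the whole grammar with String.raw, should hand Peggy the escape it expects. Note also that `other = [^,]+` would swallow newlines. A newline-separated version would probably need `[^,\n]` in the grammar text as well.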

const rawtext = `https://cdn.jsdelivr.net/npm/peggy@4.0.2/esm,https://c2.com/ward/sys/find.cgi?search=explore&start=3&list=7`

Here's our grammar:

Trouble: the other rule works when called directly but doesn't take over when url fails to match.

const grammar = `
start = url ("," url)*
url = (protocol "//" domain path* query?) / other
protocol = "http" "s"? ":"
domain = word ("." word)*
path = "/" word suffix? version?
suffix = "." word
query = "?" param ("&" param)*
param = key ("=" value)?
key = word / word
value = word / num
version = "@" capture:(num "." num "." num) {return capture.flat().join("")}
word = capture:([a-z] [a-z0-9]*) {return capture.flat().join("")}
num = [0-9]+
other = [^,]+
`
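The second Trouble note fits how PEG ordered choice behaves. This assumes the failing input began with something the first alternative of url could match. Choice commits to the first alternative that succeeds, even when it matched only a prefix, so `other` is never tried. The parse then fails later because the rest of the item is left unconsumed. The sketch below uses tiny hand-rolled combinators to show this. They are a hypothetical stand-in and not Peggy's API: `lit`, `chars`, `seq`, `choice`, `ahead`, `end` and the sample text are ours. It demonstrates the effect and one remedy, a lookahead that requires the URL alternative to end at a comma or at end of input.

```javascript
// Hypothetical mini PEG combinators (not Peggy). Each parser takes
// (input, pos) and returns the new position, or -1 on failure.
const lit = s => (inp, pos) => (inp.startsWith(s, pos) ? pos + s.length : -1);

// One or more characters matching a single-character regex.
const chars = re => (inp, pos) => {
  let p = pos;
  while (p < inp.length && re.test(inp[p])) p++;
  return p > pos ? p : -1;
};

const seq = (...ps) => (inp, pos) => {
  for (const p of ps) {
    pos = p(inp, pos);
    if (pos < 0) return -1;
  }
  return pos;
};

// Ordered choice: return the FIRST alternative that succeeds, even if it
// consumed only a prefix of what we hoped it would match.
const choice = (...ps) => (inp, pos) => {
  for (const p of ps) {
    const r = p(inp, pos);
    if (r >= 0) return r;
  }
  return -1;
};

// Positive lookahead: succeeds without consuming input.
const ahead = p => (inp, pos) => (p(inp, pos) >= 0 ? pos : -1);
const end = (inp, pos) => (pos === inp.length ? pos : -1);

// Simplified stand-ins for the url and other rules above.
const word = chars(/[a-z0-9]/);
const strict = seq(lit("https://"), word, lit("."), word);
const other = chars(/[^,]/);

const urlNaive = choice(strict, other);
// Guarded: the strict alternative must be followed by "," or end of input.
const urlGuarded = choice(seq(strict, ahead(choice(lit(","), end))), other);

// Require the whole input to be consumed, as Peggy does by default.
const parseAll = (p, inp) => p(inp, 0) === inp.length;

const text = "https://example.com-extra";
console.log(parseAll(urlNaive, text));   // false: strict matched a prefix; other never tried
console.log(parseAll(urlGuarded, text)); // true: the guard fails at "-", so other takes over
```

In Peggy itself the same guard might read `url = (protocol "//" domain path* query? &("," / !.)) / other`. We have not tried that in the frame.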

We pretty-print the parse result, then show the DOT graph and the log generated by the trace.
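The pretty-print and DOT steps can be tried on hand-made data. Below is a minimal sketch in which `prettyTokens`, `tallyToDot`, `escapeHtml` and the sample values are our own hypothetical names. It adds two things the frame code does not do. It escapes HTML, since the `other` rule can capture arbitrary text. It also skips tally keys without "->".

```javascript
// Hypothetical helpers (our names, not Peggy's or the frame's).
const escapeHtml = s =>
  String(s).replace(/[&<>"]/g, c =>
    ({ "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;" })[c]);

// Peggy returns nested arrays for sequences and repetitions, and null for
// optional parts that did not match. Flatten fully, then drop the gaps.
function prettyTokens(result) {
  return [result]
    .flat(Infinity)
    .filter(t => t != null && t !== "")
    .map(t => `<code>${escapeHtml(t)}</code>`)
    .join(" ");
}

// Tally keys look like "parent->child". A key without "->" (the top-level
// rule) is skipped: in DOT, `start [label=1]` would relabel the start node.
function tallyToDot(tally) {
  const edges = Object.entries(tally)
    .filter(([edge]) => edge.includes("->"))
    .map(([edge, { count }]) => `  ${edge} [label=${count}]`);
  return `digraph {\n${edges.join("\n")}\n}`;
}

const sample = ["https:", "//", ["c2", [[".", "com"]]], [], null];
console.log(prettyTokens(sample));
// <code>https:</code> <code>//</code> <code>c2</code> <code>.</code> <code>com</code>
console.log(tallyToDot({ start: { count: 1 }, "url->protocol": { count: 2 } }));
```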

Readers can interact with the parse tree as follows:

First, open DOT Subscriber.

Second, click the graph button in the frame below to send the parse tree to be displayed in the DOT Subscriber.

Third, click on edges in that graph to see which portions of the data match that edge.

//wiki.dbbs.co/assets/pages/js-snippet-template/esm.html HEIGHT 400 SOURCE dot LINEUP dotLabelClick

Publish dot as sourceData.

function publish(dot) {
  // ask the parent wiki page to publish our DOT text as this frame's 'dot' source data
  window.parent.postMessage({
    action: 'publishSourceData',
    name: 'dot',
    sourceData: {dot}
  }, '*');
}

Import one Observable function.

import {observe} from "https://cdn.jsdelivr.net/npm/@observablehq/stdlib@5.8.6/src/generators/index.js";

Listen for dotLabelClicks.

const dotLabelClickStream = observe(change => {
  function filterForDotLabelClickStream(message) {
    console.log({message}); // debugging: show every message the frame receives
    if (message?.data?.action === "dotLabelClickStream"
        /* compare our ids with the message */) {
      const {label, title} = message.data;
      change({label, title});
    }
  }
  window.addEventListener('message', filterForDotLabelClickStream);
  // the returned dispose function removes the listener when iteration stops
  return () => window.removeEventListener('message', filterForDotLabelClickStream);
});
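The idea behind observe is to turn a callback-driven source (here, message events) into something `for await` can consume. Below is a minimal sketch of that idea, runnable outside the browser. It is not Observable's source: `observeAll`, `listeners`, `post` and `firstTwo` are hypothetical names. As far as we know, Observable's observe keeps only the most recent value. This version queues every value instead, so two quick clicks are not collapsed into one.

```javascript
// Hypothetical callback-to-async-iterator sketch (not Observable's code).
function observeAll(initialize) {
  const queue = [];   // values that arrived before anyone asked for them
  const waiting = []; // resolvers for readers waiting on the next value
  const change = value => {
    const resolve = waiting.shift();
    if (resolve) resolve({ value, done: false });
    else queue.push(value);
  };
  const dispose = initialize(change);
  return {
    [Symbol.asyncIterator]() { return this; },
    next() {
      if (queue.length) return Promise.resolve({ value: queue.shift(), done: false });
      return new Promise(resolve => waiting.push(resolve));
    },
    // Called by for await...of on break or return: run the dispose callback.
    return() {
      if (typeof dispose === "function") dispose();
      return Promise.resolve({ value: undefined, done: true });
    },
  };
}

// Stand-in for window's 'message' listeners, so this runs outside a browser.
const listeners = new Set();
const post = data => listeners.forEach(fn => fn({ data }));

const clicks = observeAll(change => {
  const onMessage = message => {
    if (message?.data?.action === "dotLabelClickStream") {
      const { label, title } = message.data;
      change({ label, title });
    }
  };
  listeners.add(onMessage);
  return () => listeners.delete(onMessage);
});

post({ action: "dotLabelClickStream", label: "2", title: "domain->word" });
post({ action: "somethingElse" });
post({ action: "dotLabelClickStream", label: "1", title: "url->protocol" });

async function firstTwo() {
  const titles = [];
  for await (const click of clicks) {
    titles.push(click.title);
    if (titles.length === 2) break; // break triggers return(), removing the listener
  }
  return titles;
}
firstTwo().then(titles => console.log(titles, listeners.size));
```

Running it should log the two matching titles followed by 0, because breaking out of the loop disposed of the listener.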

//...

First comes our trace handler, then the code that runs the parser and prints the results.

const stack = [];   // rules currently being tried, outermost first
const log = [];     // one line per matched rule: "rule path, result"
const tally = {};   // "parent->child" edge -> {count}

function trace({type, rule, result}) {
  const show = () => {
    log.push(`${stack.join("->")}, ${JSON.stringify(result)}`);
    const edge = stack.slice(-2).join("->");
    if (edge in tally) tally[edge].count++;
    else tally[edge] = {count: 1};
  };
  switch (type) {
    case 'rule.enter': stack.push(rule); break;
    case 'rule.fail': stack.pop(); break;
    case 'rule.match': show(); stack.pop(); break;
  }
}
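The sketch below shows what the trace handler records, without running Peggy, by feeding it hand-written events. `makeTracer` is a hypothetical wrapper that packages the same bookkeeping so it runs standalone. The events are invented for illustration and were not captured from a real trace. They start at `path` for brevity, where a real run would prefix each path with start->url. The final filter mirrors what happens when an edge is clicked in the DOT Subscriber.

```javascript
// Hypothetical wrapper with the same bookkeeping as the trace handler.
// Event objects mimic Peggy's {type, rule, result} shape.
function makeTracer() {
  const stack = [];
  const log = [];
  const tally = {};
  function trace({ type, rule, result }) {
    switch (type) {
      case "rule.enter":
        stack.push(rule);
        break;
      case "rule.fail":
        stack.pop();
        break;
      case "rule.match": {
        // Record the full rule path, then count the parent->child edge.
        log.push(`${stack.join("->")}, ${JSON.stringify(result)}`);
        const edge = stack.slice(-2).join("->");
        tally[edge] = { count: (tally[edge]?.count ?? 0) + 1 };
        stack.pop();
        break;
      }
    }
  }
  return { trace, log, tally };
}

const { trace, log, tally } = makeTracer();

// Invented events for matching "/ward" with path = "/" word suffix? version?
[
  ["rule.enter", "path"],
  ["rule.enter", "word"],
  ["rule.match", "word", "ward"],
  ["rule.enter", "suffix"],
  ["rule.fail", "suffix"],
  ["rule.enter", "version"],
  ["rule.fail", "version"],
  ["rule.match", "path", ["/", "ward", null, null]],
].forEach(([type, rule, result]) => trace({ type, rule, result }));

console.log(log);   // [ 'path->word, "ward"', 'path, ["/","ward",null,null]' ]
console.log(tally); // path->word and path, each with count 1

// Clicking the path->word edge selects the log lines ending at that edge.
const clicked = log.filter(line => line.includes("path->word,"));
console.log(clicked);
```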

import peggy from 'https://cdn.jsdelivr.net/npm/peggy@4.0.2/+esm';
import * as frame from 'https://wiki.dbbs.co/assets/v1/frame.js';

export async function emit(el) {
  // trace:true builds a parser that reports rule events to our tracer
  const option = {trace: true, tracer: {trace}};
  const parser = peggy.generate(grammar, option);
  const result = parser.parse(rawtext, option);

  // flatten the nested parse result into a row of tokens
  const style = `style="background-color:white"`;
  const token = t => `<code ${style}>${t}</code>`;
  const pretty = result
    .flat(9)
    .filter(t => t)
    .map(token)
    .join(" ");

  // one DOT statement per tallied parent->child pair, labeled with its count
  const dot = Object.entries(tally)
    .map(t => `${t[0]} [label=${t[1].count}]`)
    .join("\n");

  el.innerHTML = `
    <button>graph</button>
    <div>${pretty}</div>
    <pre id="sample"></pre>
    <pre>${dot}</pre>
    <pre>${log.join("\n")}</pre>`;

  el.querySelector('button').onclick = async () => {
    publish(`digraph { ${dot} }`);
    // for each edge clicked in the DOT Subscriber, show the log lines for that edge
    for await (const data of dotLabelClickStream) {
      const {label, title} = data;
      if (title.includes('->')) {
        el.querySelector('#sample').innerText = log
          .filter(line => line.includes(`${title},`))
          .join("\n");
      }
    }
  };
}