Weak Graph-RAG

Ward’s thread is basically three strands woven together: **search / visibility**, **C2 heuristics**, and **“weak Graph-RAG” via Solo / blankets**.

---

## 1. Search, visibility, and the “visible federation”

### Centralised search (and why it’s brittle)

* Today’s “Search” button in classic FedWiki talks to a **central indexer** that:
  * walks **sitemaps**,
  * indexes page titles and item IDs,
  * and serves “where else is this page / paragraph?” answers.
* Ward points to `search-index-logs.html` at ward.dojo as a window into:
  * **which sites were found**,
  * which site URLs have been *mentioned* but never successfully crawled.

The big problem: this indexer is **centralised, moderated, and fragile**.

* My “20,000 pages” site appears to **choke** the code.
* Ward used to **block** that site so the crawler wouldn’t get stuck; that block list vanished when Jon took over.
* Hypothesis: *re-instating* the block and **removing my very large site** might free the crawler to find more sites again.
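To make the “where else is this page?” lookup concrete, here is a minimal sketch of a title index built from already-fetched sitemaps. It is not the real indexer: the function names, the site names, and the assumption that each sitemap entry carries a `title` field are illustrative only.

```python
from collections import defaultdict

def build_index(sitemaps):
    """Map each page title to the set of sites whose sitemap lists it."""
    index = defaultdict(set)
    for site, entries in sitemaps.items():
        for entry in entries:
            index[entry["title"]].add(site)
    return index

def where_else(index, title, origin):
    """List the other sites that carry a page with this title."""
    return sorted(index.get(title, set()) - {origin})

# Hypothetical sitemaps; a site that was never crawled is simply missing here.
sitemaps = {
    "one.example": [{"title": "Weak Graph-RAG"}],
    "two.example": [{"title": "Weak Graph-RAG"}, {"title": "Welcome Visitors"}],
}
index = build_index(sitemaps)
print(where_else(index, "Weak Graph-RAG", "one.example"))
```

A site whose crawl failed contributes no entries at all, so an empty answer from an index like this says nothing about whether that site is reachable.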

So when suggest84 asks “can I use search as proof that my wiki isn’t visible?”, Ward explicitly **withdraws** that suggestion: this search is too broken and too partial to be evidence of anything.

### Visibility vs. access

Ward’s answers to suggest84 split two things:

1. **Access**:
   * Nobody can “magically” access your wiki; they need:
     * the **URL** or
     * a **JSON export**.
   * There is no stealth backdoor where random strangers suddenly see your pages.
2. **Visibility / discoverability**:
   * Once a site is **reachable** on the open web, it *may* be:
     * crawled by FedWiki’s search,
     * discovered via other people’s forks / links,
     * or appear in various experimental “visible federation” indexes.
   * Given the current brokenness of search, **absence from results ≠ privacy**; it just means “not found by this one brittle crawler”.

Ward’s own preferred answer: **distributed search**, like in his page “Most Visible Federation”: search performed by *many* small crawlers / heuristics, not a single moderated service.

---

## 2. C2 usernames and “who we were really listening to”

Ward’s C2 scrape gives us the **social background radiation** for the 1996 graph experiments:

* He collected all `-- NameName` signatures from the **frozen c2 wiki** into `allUserNames.txt`.
* He then counted and sorted the occurrences. The “big nine” show up at the tail of the sorted list:
  * JeffGrigg, SunirShah, RonJeffries, DaveHarris, JohnFletcher, DougMerritt, AnonymousDonor, AlistairCockburn, MichaelFeathers, plus Ward himself.
* These are the names that **dominate the link structure** when we include signatures.
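The counting step can be sketched in a few lines. This is not Ward’s script: the regular expression is an assumption about what a `-- NameName` signature looks like (a WikiWord of two or more capitalised parts), and the page names and signers below are made up.

```python
import re
from collections import Counter

# Assumed signature shape: "--" followed by a WikiWord such as AliceExample.
SIGNATURE = re.compile(r"--\s*((?:[A-Z][a-z]+){2,})")

def count_signatures(pages):
    """Count signatures across a {title: text} mapping."""
    counts = Counter()
    for text in pages.values():
        counts.update(SIGNATURE.findall(text))
    return counts

def sorted_ascending(counts):
    """Sort by count ascending, so the heaviest signers land at the tail."""
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))

# Hypothetical page texts, for illustration only.
pages = {
    "ExamplePageOne": "A claim. -- AliceExample\nA reply. -- BobSample",
    "ExamplePageTwo": "Another remark. -- AliceExample",
}
for name, count in sorted_ascending(count_signatures(pages)):
    print(count, name)
```

Sorting ascending mirrors the observation above that the dominant names appear at the tail of the list.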

My own comment that “simulo’s contribution pins down what mind we were listening to” is exactly this: the early C2 graph is **heavily biased** towards a handful of prolific signers. In graph terms, those are the **super-hubs**.

Why this matters for Solo aspects:

* When Ward samples 1996 pages and builds aspect graphs, signatures create a **fake connectivity**:
  * everything hooks through a tiny set of **social hubs**,
  * so the graph *looks* very connected, but it’s mostly “this person agreed here too”.
* With the username counts in hand, Ward can build a **block list / down-weight list**:
  * ignore or demote pages dominated by those high-frequency signatures,
  * and instead let **weaker, more structural links** drive the aspect graph.

That’s the bridge back to my earlier “Ward’s new experience” notes: once those hubs are suppressed, he must sample **4× more pages** to see links, but the links that *do* appear are **rarer and more meaningful**.

---

## 3. Blankets, diagrams, and “weak Graph-RAG”

Ward’s “blankets”:

* For each randomly selected node in `c2Wiki1996.graphml`, Ward builds a **blanket**:
  * a local neighbourhood graph (something like: node + its near neighbours, via certain link types).
* Once we allow connections through **signatures**, “lots of things connect even through signatures” and the graph becomes dense again.
* Ward parametrises the script but doesn’t yet have a knob for the **block list** of heavy signers.
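Here is a rough sketch of what a blanket with a block-list knob could look like. It assumes a blanket is simply everything within a fixed number of hops of the start node; Ward’s actual definition, link types, and parameters may differ, and the toy graph stands in for data that would really come from `c2Wiki1996.graphml`.

```python
from collections import deque

def undirected(pairs):
    """Build a symmetric adjacency mapping from (a, b) pairs."""
    adjacency = {}
    for a, b in pairs:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def blanket(adjacency, start, radius=2, blocked=frozenset()):
    """Collect nodes and edges within `radius` hops of `start`,
    never stepping onto a node in `blocked`."""
    distance = {start: 0}
    edges = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue  # do not expand beyond the blanket's edge
        for neighbour in adjacency.get(node, ()):
            if neighbour in blocked:
                continue
            edges.add(frozenset((node, neighbour)))
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return set(distance), edges

# Hypothetical toy graph: one page link and one heavy signer.
toy = undirected([
    ("PageA", "PageB"),
    ("PageA", "HeavySigner"),
    ("PageC", "HeavySigner"),
    ("PageD", "HeavySigner"),
])
dense_nodes, _ = blanket(toy, "PageA")
sparse_nodes, _ = blanket(toy, "PageA", blocked={"HeavySigner"})
print(sorted(dense_nodes))
print(sorted(sparse_nodes))
```

With the signer allowed, the blanket around `PageA` pulls in every page that person signed; with the signer blocked, only the structural link to `PageB` remains.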

Ward’s reaction:

* He posts multiple screenshots at `wiki-in-1996.html`: non-trivial graph shapes, some with “blankets” that look like **islands** or **constellations**.
* His own judgement is very sober:

> These look more interesting than they really are. But with a bit more tuning this could tell stories that were missed at the time.

Then he pushes the idea a notch further:

> Maybe we could give the algorithm some diagrams that are looking pretty interesting and then have the heuristics make it more interesting.

That’s the key: use **human-chosen “interesting graphs”** as *training signals* for our heuristics:

* Feed the algorithm examples of:
  * “this is a boring hub-and-spoke graph (all signatures)” vs.
  * “this is an intriguing mid-density pattern (cross-topic, weak ties, etc.)”
* Tune the Solo / blanket heuristics so that:
  * they **avoid** trivial big-hub structures,
  * they **favour** mid-scale, story-rich subgraphs (roughly Jiang’s “far more smalls than larges” and “living structure”, but applied to discourse graphs).

Finally he names it:

> We might be inventing a weak and small scale version of Graph-RAG.

In other words:

* Treat the C2 corpus as a **graph of pages and signatures**.
* Use Solo/blankets as a **retrieval layer**:
  * given a starting node or question, pull a small, **interesting** subgraph (via tuned heuristics).
* Then let *humans* do the “generation” part: reading, interpreting, telling the story of that subgraph.

It’s “Graph-RAG without the LLM step”: the graph is doing the *retrieval*, Solo does the *presentation*, and human readers do the *reasoning*.
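One possible starting heuristic for “avoid trivial big-hub structures” is to measure how much of a subgraph hangs off its single busiest node. The sketch below is an assumption of mine, not a heuristic from Ward’s scripts: `hub_share` and `rank_by_interest` are hypothetical names, and human-labelled examples would still be needed to decide where “boring” ends.

```python
from collections import Counter

def hub_share(edges):
    """Fraction of edges touching the busiest node (1.0 for a pure star)."""
    edges = [tuple(edge) for edge in edges]
    if not edges:
        return 0.0
    degree = Counter(node for edge in edges for node in edge)
    return max(degree.values()) / len(edges)

def rank_by_interest(candidates):
    """Order named edge lists so the least hub-dominated comes first."""
    return sorted(candidates, key=lambda name: hub_share(candidates[name]))

# Hypothetical candidate subgraphs: a hub-and-spoke star and a four-node ring.
candidates = {
    "star": [("Hub", "A"), ("Hub", "B"), ("Hub", "C")],
    "ring": [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")],
}
print(rank_by_interest(candidates))
```

A score like this only captures the “no single hub” half of the story; favouring mid-density, cross-topic patterns would need further signals, which is where human-chosen examples could guide the tuning.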

---

## 4. FedWiki-ready page draft

Here’s a page we can paste into a FedWiki JSON editor or rebuild manually as items; text only, no journal:

---

Title: Solo As Weak Graph-RAG

Story:

* paragraph
  Ward’s recent experiments with C2 1996 data, Solo aspects, and “blankets” suggest that we are building a weak, small-scale version of Graph-RAG inside Federated Wiki.
* paragraph
  Classic FedWiki search is a central indexer that crawls sitemaps and indexes titles and item IDs. It works well enough for small farms, but chokes on very large sites and needs hand-maintained block lists. Ward notes that this “visible federation” is fragile and moderated, and that we should not use the current search results as evidence that a site is invisible.
* paragraph
  In practice, access and visibility come apart. Nobody can see a wiki without its URL or exported JSON, but once a site is on the open web, it can be crawled and discovered by various search scripts and experiments. Ward prefers distributed search (many small crawlers and heuristics) over a single central index.
* paragraph
  For the C2 corpus, Ward scraped all `-- NameName` signatures from the frozen wiki and produced `allUserNames.txt`. A small set of high-frequency names (SunirShah, RonJeffries, JeffGrigg, MichaelFeathers, AlistairCockburn, WardCunningham, and others) dominates the counts. These signatures act as social hubs in the link graph.
* paragraph
  When Solo includes signatures in its aspect graphs, these hubs create an easy but misleading connectivity: many pages appear related simply because the same people signed them. My comment that this shows “what mind we were listening to” recognises that early C2 structure is strongly shaped by a handful of prolific participants.
* paragraph
  With the username counts in hand, Ward can treat these heavy signers as a block list or down-weight list. By ignoring or demoting signature-heavy pages, he forces Solo to look for weaker, more structural connections. This matches his new experience: he has to sample four times as many pages to see links, but the links that do appear are rarer and more interesting.
* paragraph
  Ward’s “blankets” build local neighbourhoods in `c2Wiki1996.graphml`. When signatures are allowed, everything connects again through the same social hubs. When signatures are filtered out, blankets produce smaller, more varied subgraphs that hint at stories and themes that were easy to miss in the live wiki.
* paragraph
  Ward observes that these first diagrams “look more interesting than they really are”, but suggests that we could use human judgement to tune the heuristics. We can show the algorithm graph shapes that humans find intriguing, then encourage the heuristics to produce more graphs of that kind and fewer trivial hub-and-spoke patterns.
* paragraph
  In this sense, Solo and the blanket scripts form a weak Graph-RAG layer for historic wiki discourse. The graph of pages, signatures, and links is the retrieval structure. Solo’s popup and lineups are the retrieval interface. Human readers provide the “generation” step by interpreting each retrieved subgraph as a story.
* paragraph
  This weak Graph-RAG does not try to replace human sense-making. Instead, it amplifies interesting neighbourhoods in a large legacy corpus, helping us see “stories that were missed at the time” and giving us a way to experiment with heuristics that favour surprise and living structure over simple popularity and hubs.
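For pasting into a JSON editor, the draft above can be turned into page JSON with a short script. This sketch assumes the usual Federated Wiki page shape (a `title`, a `story` of items with `type`, `id`, and `text`, and a `journal` left empty here); the helper name and the placeholder paragraph texts are illustrative.

```python
import json
import secrets

def make_page(title, paragraphs):
    """Build a page dict with one paragraph item per text and no journal."""
    story = [
        # Each item gets a random 16-character hex id.
        {"type": "paragraph", "id": secrets.token_hex(8), "text": text}
        for text in paragraphs
    ]
    return {"title": title, "story": story, "journal": []}

# Placeholder texts; substitute the paragraphs from the draft above.
page = make_page(
    "Solo As Weak Graph-RAG",
    ["First paragraph text goes here.", "Second paragraph text goes here."],
)
print(json.dumps(page, indent=2, ensure_ascii=False))
```

Generating a fresh id per item keeps paragraphs individually addressable, which is what lets other sites later ask where else a given item appears.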