We describe the responsibilities assigned to the files and scripts that work together, often for different purposes.
We distinguish indexing from the search queries against that index. We also separate generally useful status from the more obscure information intended for repair.
Note: The diagrams on this page are automatically drawn from annotations in the scripts. See Drawing Automation.
# Index
We poll sites for updates made visible in their sitemaps. New and revised pages are scraped for content of interest, including new sites to be scanned.
sites/
  fed.wiki.org/
    words.txt
    links.txt
    sites.txt
    items.txt
    pages/
      how-to-wiki/
        words.txt
        links.txt
        sites.txt
        items.txt
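
The update check described above can be sketched roughly as follows. This is a minimal illustration, not the indexing scripts themselves. It assumes each sitemap entry carries a `slug` and a `date` in epoch milliseconds, and the names `pages_to_scrape` and `last_indexed` are hypothetical.

```python
def pages_to_scrape(sitemap, last_indexed):
    """Return slugs that are new or revised since they were last indexed.

    sitemap      -- list of dicts, each assumed to carry 'slug' and 'date'
    last_indexed -- dict mapping slug to the date recorded at the last scrape
    """
    stale = []
    for entry in sitemap:
        slug = entry["slug"]
        date = entry.get("date", 0)
        # New pages have no record yet; revised pages carry a later date.
        if slug not in last_indexed or date > last_indexed[slug]:
            stale.append(slug)
    return stale


# Illustrative data only; these dates are made up for the example.
sitemap = [
    {"slug": "how-to-wiki", "date": 1700000000000},
    {"slug": "welcome-visitors", "date": 1600000000000},
    {"slug": "new-page", "date": 1650000000000},
]
last_indexed = {
    "how-to-wiki": 1690000000000,
    "welcome-visitors": 1600000000000,
}
print(pages_to_scrape(sitemap, last_indexed))
```

Only the pages returned here would need to be fetched and scraped again, which keeps each poll of a site proportional to what has changed there.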
# Query
We evaluate queries against the index. Queries are composed either in a stand-alone web interface, which returns html, or through the wiki Search plugin, which formats results returned as json.
ALL WORDS INPUT
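
An all-words query of this kind can be sketched as a set comparison. The example below is an assumption-laden illustration rather than the search service's code: it stands in for the per-page words.txt files with an in-memory dict of lowercase word sets, and the name `all_words` is hypothetical.

```python
def all_words(index, query):
    """Return the pages whose word set contains every word in the query.

    index -- dict mapping a page slug to a set of lowercase words,
             standing in here for the contents of each words.txt
    query -- whitespace-separated words typed by the user
    """
    wanted = {word.lower() for word in query.split()}
    if not wanted:
        return []  # an empty query matches nothing
    # A page matches when the wanted words are a subset of its words.
    return sorted(page for page, words in index.items() if wanted <= words)


# Illustrative index; the word lists are invented for the example.
index = {
    "how-to-wiki": {"wiki", "edit", "page"},
    "welcome-visitors": {"wiki", "welcome"},
}
print(all_words(index, "wiki page"))
```

The same matching result could then be rendered as html for the stand-alone interface or serialized as json for the Search plugin.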
# Status
We maintain a privileged view into activity throughout the federation. We share this by enumerating sites found to be active over the last week.
ROSTER search.fed.wiki.org:3030/recent-activity
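
Selecting the sites active over the last week amounts to a cutoff on the most recent activity seen for each site. The sketch below assumes that activity is available as a dict of timestamps. The names `active_sites` and `last_activity`, and the sample site `example.wiki`, are hypothetical.

```python
from datetime import datetime, timedelta


def active_sites(last_activity, now, window=timedelta(days=7)):
    """Return the sites whose latest activity falls within the window.

    last_activity -- dict mapping a site name to the datetime of its
                     most recent observed update (an assumed structure)
    now           -- the reference time for the report
    """
    cutoff = now - window
    return sorted(site for site, seen in last_activity.items() if seen >= cutoff)


# Illustrative timestamps only.
now = datetime(2024, 1, 15, 12, 0)
last_activity = {
    "fed.wiki.org": datetime(2024, 1, 14, 9, 30),
    "example.wiki": datetime(2023, 12, 1, 8, 0),
}
print(active_sites(last_activity, now))
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the report reproducible when it is regenerated or tested.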
# Debug
We report conditions that may deserve administrative attention with respect to the search machinery itself or with the sites it watches.
http://ward.dojo.fed.wiki/assets/pages/search-index-logs/sites-indexed.html HEIGHT 200