As a follow-up/spin-off to my work with Federated Wiki Feeds, I'd like to create a new free service.
At its core, the service would be a tool that intelligently crawls and mirrors the publicly available JSON and asset files from all federated wikis.
As an extension of this process, I would provide interesting and hopefully useful services for fedwiki users.
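The crawl itself could be sketched as a breadth-first walk over sites. This is a minimal sketch that makes a few assumptions. It assumes each fedwiki server lists its pages at `/system/sitemap.json` and serves each page at `/<slug>.json`. It also assumes new sites can be discovered from `fork` actions in a page's journal and from `reference` items in its story. `fetch_pages` is a hypothetical callback, not shown, that would download a site's sitemap and pages. It is passed in so the traversal logic can run without network access.

```python
from collections import deque
from collections.abc import Callable, Iterable


def sitemap_url(site: str) -> str:
    # Assumed location of a site's page index
    return f"http://{site}/system/sitemap.json"


def page_url(site: str, slug: str) -> str:
    # Assumed location of a single page's JSON
    return f"http://{site}/{slug}.json"


def linked_sites(page: dict) -> set[str]:
    """Return other sites mentioned in a page's journal or story."""
    sites = set()
    for action in page.get("journal", []):
        # A fork action records the site the page was copied from
        if action.get("type") == "fork" and action.get("site"):
            sites.add(action["site"])
    for item in page.get("story", []):
        # A reference item points at a page on (usually) another site
        if item.get("type") == "reference" and item.get("site"):
            sites.add(item["site"])
    return sites


def crawl_order(start: str, fetch_pages: Callable[[str], Iterable[dict]]) -> list[str]:
    """Visit sites breadth-first, queueing each newly discovered site once."""
    seen = {start}
    frontier = deque([start])
    order = []
    while frontier:
        site = frontier.popleft()
        order.append(site)
        for page in fetch_pages(site):
            # Sort so the visit order is deterministic
            for other in sorted(linked_sites(page) - seen):
                seen.add(other)
                frontier.append(other)
    return order
```

In a real crawler the frontier would live in a shared queue rather than a local `deque`, and `fetch_pages` would need to tolerate sites that are offline.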
## Backups
Backups would be one of these services. Since I'd be crawling and mirroring sites, a user could access the latest copy in several ways:
1. Download a tar/zip file of what I've copied, ideally in a format that can be dropped into the wiki folder of a new fedwiki installation.
2. Git/GitHub mirror. I haven't decided how this would work yet; Ward raised valid questions in Volunteer Backup to Github.
3. Live mirror. If possible, I'd like to provide mirrored wikis, either by default or on request. Most likely the domain name of the original site would become a subdomain of my archive domain, which would make it easy for people to recover individual pages.
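A downloadable backup could be assembled in memory with Python's standard `tarfile` module. This sketch assumes the wiki server stores each page as a JSON file named by its slug, with no extension, inside a `pages` directory. That layout should be checked against the current server before relying on it. `build_backup` is a hypothetical name, and assets are left out for brevity.

```python
import io
import json
import tarfile


def build_backup(pages: dict[str, dict]) -> bytes:
    """Pack mirrored pages (slug -> page JSON) into a .tar.gz laid out as pages/<slug>."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for slug, page in sorted(pages.items()):
            data = json.dumps(page, indent=2).encode("utf-8")
            # Assumed layout: one extensionless JSON file per page
            info = tarfile.TarInfo(name=f"pages/{slug}")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Unpacking the archive into a fresh installation's data folder would then restore the pages, assuming the layout above holds.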
## Notifications
Another service I think would be interesting would be notifications.
Examples of notifications you could sign up for:
1. Someone has forked a page from your wiki
2. Someone has updated a page that you've forked
3. Someone has referenced a page on your wiki
4. Someone has included your wiki in a roster
5. A wiki has gone offline/online
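Most of these notifications could come from comparing each newly crawled copy of a page with the previous copy. The sketch below makes the same assumptions about `fork` journal actions and `reference` story items as the crawler. It covers forks, references, and sites going offline or online. The `Notification` type and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Notification:
    kind: str        # "forked", "referenced", "offline" or "online"
    subscriber: str  # site whose owner should be told
    detail: str


def _forks(site: str, page: dict) -> set[tuple[str, int]]:
    # (source site, date) for each fork action that copied from another site
    return {(a["site"], a.get("date") or 0) for a in page.get("journal", [])
            if a.get("type") == "fork" and a.get("site") and a["site"] != site}


def _references(site: str, page: dict) -> set[tuple[str, str]]:
    # (target site, target slug) for each reference to another site
    return {(i["site"], i.get("slug") or "") for i in page.get("story", [])
            if i.get("type") == "reference" and i.get("site") and i["site"] != site}


def page_events(site: str, slug: str, old: Optional[dict], new: dict) -> list[Notification]:
    """Events for forks and references that appear in `new` but not in `old`."""
    old = old or {}
    here = f"{site}/{slug}"
    events = [Notification("forked", src, here)
              for src, _ in sorted(_forks(site, new) - _forks(site, old))]
    events += [Notification("referenced", target, f"{here} -> {target}/{tslug}")
               for target, tslug in sorted(_references(site, new) - _references(site, old))]
    return events


def availability_events(before: set[str], after: set[str]) -> list[Notification]:
    """Compare the sets of reachable sites from two crawls."""
    events = [Notification("offline", s, s) for s in sorted(before - after)]
    events += [Notification("online", s, s) for s in sorted(after - before)]
    return events
```

Updates to a page you've forked could follow the same pattern: compare the source page's journal between crawls and notify the sites whose copies record a fork from it.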
## Statistics
I would also hope to extract insights from the crawled data and surface interesting content, for example:
1. Most forked/referenced pages
2. Most forked/referenced wikis
3. Graph of fediverse connections
4. Most active pages
5. Most active wikis
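The forked and referenced counts could be tallied with `collections.Counter` over the crawled pages. This minimal sketch assumes `crawled` yields `(site, slug, page_json)` tuples. It also assumes a forked page keeps its original slug. Slugs are derived from titles, so a fork that was later retitled would be missed. `page_stats` and `by_wiki` are hypothetical names.

```python
from collections import Counter
from collections.abc import Iterable


def page_stats(crawled: Iterable[tuple[str, str, dict]]) -> tuple[Counter, Counter]:
    """Count forks and references per (site, slug) of the original page."""
    forked: Counter = Counter()
    referenced: Counter = Counter()
    for site, slug, page in crawled:
        # Count each source once per copy, even if the journal records several forks
        sources = {a["site"] for a in page.get("journal", [])
                   if a.get("type") == "fork" and a.get("site") and a["site"] != site}
        for src in sources:
            forked[(src, slug)] += 1
        for item in page.get("story", []):
            if item.get("type") == "reference" and item.get("site") and item.get("slug"):
                referenced[(item["site"], item["slug"])] += 1
    return forked, referenced


def by_wiki(counter: Counter) -> Counter:
    """Roll per-page counts up to per-site totals."""
    totals: Counter = Counter()
    for (site, _slug), count in counter.items():
        totals[site] += count
    return totals
```

`forked.most_common(10)` would then give the top ten forked pages, and `by_wiki(forked).most_common(10)` the top ten wikis.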
## API
Since I've already built an app, Federated Wiki Feeds, that crawls a subset of fedwiki files, it makes sense for the new service to provide an API.
That way, I could rebuild Federated Wiki Feeds to leverage the new service so it wouldn't have to crawl any wikis itself.
Instead, it could get notifications and content from my API, which will be engineered to handle the load.
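One endpoint Federated Wiki Feeds would need is a list of recently changed pages. The sketch below assumes the service keeps the latest `sitemap.json` for each site. It also assumes each entry carries `slug`, `title`, and a `date` in milliseconds since the epoch. A hypothetical `GET /changes?since=<ms>` endpoint could then be backed by functions like these. Both the endpoint and the response shape are illustrative, not a settled design.

```python
import json


def changes_since(sitemaps: dict[str, list[dict]], since_ms: int, limit: int = 100) -> list[dict]:
    """Newest-first list of pages whose sitemap date is later than since_ms."""
    changes = [
        {"site": site, "slug": entry["slug"], "title": entry.get("title", ""), "date": entry["date"]}
        for site, entries in sitemaps.items()
        for entry in entries
        if (entry.get("date") or 0) > since_ms
    ]
    changes.sort(key=lambda c: c["date"], reverse=True)
    return changes[:limit]


def changes_response(sitemaps: dict[str, list[dict]], since_ms: int) -> str:
    # JSON body the hypothetical /changes endpoint would return
    return json.dumps({"since": since_ms, "changes": changes_since(sitemaps, since_ms)})
```

A client would remember the newest `date` it has seen and pass it as `since` on its next poll, so it only ever downloads new changes.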
## Hosting
Most likely I'll build this to run on AWS. That way I can use the various services to make this tool scalable and performant.
1. Elastic Beanstalk for the primary app hosting, so I can spin up additional workers as needed.
2. Simple Queue Service (SQS) to coordinate work. Since I expect to grow to multiple workers, I'll need a way to coordinate the work that needs to be done.
3. S3 and CloudFront for static file hosting. Since much of what I'll be doing is hosting and mirroring data, I hope to use S3 and CloudFront to serve the static files in a way that is fast, inexpensive, and scalable.
4. DynamoDB for data storage shared across all workers.
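SQS standard queues deliver each message at least once, so a worker should treat every crawl job as idempotent. The sketch below uses Python's `queue.Queue` as a local stand-in for SQS and a plain dict as a stand-in for an S3 bucket, so the pattern can run without AWS credentials. The JSON message format and the `{site}/{slug}.json` key layout are assumptions. Deterministic keys mean a duplicate delivery simply overwrites the same object.

```python
import json
import queue
from collections.abc import Callable


def s3_key(site: str, slug: str) -> str:
    # Hypothetical bucket layout that mirrors the original URL path
    return f"{site}/{slug}.json"


def run_worker(jobs: "queue.Queue[str]", fetch: Callable[[str, str], dict], store: dict[str, str]) -> int:
    """Drain crawl jobs whose bodies are JSON {"site": ..., "slug": ...}; return jobs handled."""
    done = 0
    while True:
        try:
            body = jobs.get_nowait()
        except queue.Empty:
            return done
        job = json.loads(body)
        page = fetch(job["site"], job["slug"])
        # Same job -> same key, so duplicate deliveries are harmless
        store[s3_key(job["site"], job["slug"])] = json.dumps(page)
        # Stands in for deleting the SQS message once the page is stored
        jobs.task_done()
        done += 1
```

With boto3, the stand-ins would map roughly onto the SQS client's `receive_message` and `delete_message` calls and the S3 client's `put_object`. The message would be deleted only after the object is written, so a worker that crashes mid-job leaves the message to be redelivered.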