Wiki Spam Solutions

There have been a number of proposals for dealing with Wiki Spam. This page compares the various proposals. The requirements section has been distilled from arguments used against these solutions in the past.


Requirements

Here are some vague requirements to help guide us in finding a good solution. The order is arbitrary.

01 Should deal with first-time spammers

02 Should not require spammers to pay attention

03 Should not require spammers to understand English

04 Should deal with human spammers

05 Should deal with robot spammers

06 Should not annoy readers (leading to loss of readership)

07 Should not annoy authors (leading to loss of contributions)

08 Should deal with bulk spam

09 Should deal with Submarine Spam

10 Should not invalidate any current content

11 Should not be too different from what we have today

12 Should not be too hard to implement

13 Should be resistant to abuse (for example, in Edit Wars)

14 Should not require demanding, ongoing maintenance


Solution comparison

Dashes mean that a solution probably meets the requirement; X's mean that it probably doesn't. The column numbers correspond to the requirements listed above.

01 02 03 04 05 06 07 08 09 10 11 12 13 14

-- xx -- -- -- -- -- -- -- -- -- -- -- --  DelayedIndexing
-- xx -- -- -- -- -- -- -- -- -- -- -- --  NoFollow
-- -- -- -- -- -- -- -- xx -- -- -- -- --  WikiSpamBlocker
-- -- -- -- -- -- -- -- xx xx -- -- -- --  StatisticalFilter
xx -- -- -- -- -- -- -- -- -- -- -- -- xx  BannedContentBot
-- -- -- xx -- -- -- -- xx -- -- -- -- --  LanguageFilter
-- xx -- -- -- -- -- -- -- -- -- -- -- --  RedirectExternalLinks
-- xx -- -- -- -- -- xx -- -- -- -- -- --  ExternalLinkArea
-- xx xx -- xx -- -- -- -- -- -- -- -- --  WarningMessages
-- xx xx -- xx -- -- -- -- -- -- -- -- --  SpamHereOnly
-- -- -- xx -- -- xx -- xx xx -- -- -- --  ShotgunSpam
-- xx -- -- -- xx -- -- -- -- -- -- -- --  LetsInsulateOurselves
-- xx -- -- -- xx -- -- -- -- -- -- -- --  StopAutoLinking
-- -- -- -- xx -- xx -- xx -- -- -- -- --  VolumeLimitedEdits
xx -- -- -- -- -- -- -- xx -- -- -- -- xx  EditThrottling
-- -- -- xx -- -- xx -- -- -- -- -- -- --  HumanVerification
-- -- -- -- -- xx xx -- -- xx -- -- -- --  RejectEdits
xx -- -- -- -- -- xx -- -- -- -- xx -- --  EditsRequireKarma
-- -- -- xx -- -- xx -- -- -- -- -- -- xx  UserLogins
xx -- -- -- -- -- -- -- -- -- -- -- xx xx  SpamBlackList
-- -- -- -- -- -- -- -- -- -- -- xx xx --  PeerToPeerBanList
-- -- -- xx -- -- xx -- -- -- -- xx -- --  EditsRequireJavaScript


Proposed solutions

The numbered points under each solution are the requirements it probably fails to meet.

DelayedIndexing

02 Spammers will probably not notice
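The usual reading of DelayedIndexing is to keep freshly edited pages out of search-engine indexes until the content has survived for a while, so spam links are removed before they earn any ranking. A minimal sketch, assuming the wiki tracks a last-modified timestamp per page; the one-week window and the function name are illustrative, not any engine's actual behaviour:

  import time

  INDEX_DELAY_SECONDS = 7 * 24 * 3600  # illustrative: one week

  def robots_meta(last_modified, now=None):
      """Serve "noindex" for pages edited within the delay window."""
      now = time.time() if now is None else now
      if now - last_modified < INDEX_DELAY_SECONDS:
          return '<meta name="robots" content="noindex">'
      return '<meta name="robots" content="index,follow">'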

NoFollow

02 See No Follow on Meatball Wiki for a thorough analysis
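NoFollow works by marking external links with the rel="nofollow" attribute so that search engines do not pass ranking credit to the linked site, which removes the benefit the spammer is after. A minimal rendering sketch; the function names and the regular expression are illustrative assumptions, not a description of any particular wiki engine:

  import html
  import re

  URL_PATTERN = re.compile(r'https?://\S+')

  def render_external_link(url):
      """Render one external URL as an anchor that carries no ranking credit."""
      escaped = html.escape(url, quote=True)
      return '<a href="%s" rel="nofollow">%s</a>' % (escaped, escaped)

  def render_text(text):
      """Replace every bare external URL in wiki text with a nofollow anchor."""
      return URL_PATTERN.sub(lambda m: render_external_link(m.group(0)), text)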

WikiSpamBlocker

09 Doesn't address well-concealed submarine spam

StatisticalFilter

09 Intelligent spammers may blend in

10 Some current content may not pass the filter
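Both objections assume a filter that scores an edit by how statistically spam-like its content is, in the spirit of Bayesian mail filters: clever spammers can blend in, and legitimate existing text can score badly. A toy sketch with a hand-written weight table; a real filter would learn its weights from examples, and every name and number here is illustrative:

  import re

  TOKEN_WEIGHTS = {"casino": 3.0, "pills": 3.0, "http": 0.5, "refactoring": -1.0}
  SPAM_THRESHOLD = 4.0  # arbitrary cutoff for the sketch

  def spam_score(text):
      """Sum the weights of known tokens appearing in the edit text."""
      tokens = re.findall(r"[a-z]+", text.lower())
      return sum(TOKEN_WEIGHTS.get(token, 0.0) for token in tokens)

  def looks_like_spam(text):
      return spam_score(text) >= SPAM_THRESHOLD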

BannedContentBot

01 Banned content is likely to be spammer-specific

14 Requires ongoing maintenance
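Both objections follow from how a banned-content list operates: patterns are added after a spammer has been seen, so the first hit gets through and somebody has to keep the list current. A minimal sketch, assuming the banned patterns live in a plain text file with one regular expression per line and # for comments:

  import re

  def load_banned_patterns(path):
      """Read one regular expression per line, skipping blanks and comments."""
      patterns = []
      with open(path, encoding="utf-8") as handle:
          for line in handle:
              line = line.strip()
              if line and not line.startswith("#"):
                  patterns.append(re.compile(line, re.IGNORECASE))
      return patterns

  def contains_banned_content(text, patterns):
      return any(pattern.search(text) for pattern in patterns)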

LanguageFilter

04 Spammers can adapt to the filter

09 Doesn't address submarine spam

RedirectExternalLinks

02 Has no effect unless the spammers realize what is going on

ExternalLinkArea

02 Spammers may not notice

08 Depends on bulk spam containing many links to the same page

WarningMessages

02 Spammers may not notice the warning message

03 Spammers may not understand the warning message

05 Robots will not notice the warning message

SpamHereOnly

02 Spammers may not notice the directions

03 Spammers may not understand the directions

05 Robots will not notice the directions

ShotgunSpam

04 Spammers can use multiple IP addresses or pages

This is only a problem if you're basing spam detection on the editing of multiple pages; the simple approach of counting links during an edit submit isn't affected by where the spammer is coming from.

If the spammer uses multiple IPs, he can make repeated edits to the same page. So even if he can't add 1000 links at once, he could still add them 4 at a time.

07 May get in the way of some large refactorings

09 Does not deal with submarine spam

10 If existing pages exceed the new limit, authors are required to clean them up before posting minor edits in the future.

Though that depends on whether we limit the total number of external links or just the number of added links, and whether it's an absolute number or a links-to-text ratio. A sketch of the added-links variant follows.
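A sketch of the link-counting check discussed above: count external links before and after the edit, so only newly added links count toward the limit. The pattern and the limit of four are illustrative assumptions:

  import re

  URL_PATTERN = re.compile(r'https?://\S+')
  MAX_ADDED_LINKS = 4  # illustrative limit

  def count_external_links(text):
      return len(URL_PATTERN.findall(text))

  def edit_adds_too_many_links(old_text, new_text):
      """Reject edits that add more than MAX_ADDED_LINKS external links."""
      added = count_external_links(new_text) - count_external_links(old_text)
      return added > MAX_ADDED_LINKS

As noted above, a per-edit check like this does not care which IP address the edit comes from, but a spammer can still stay under the limit by spreading links over many edits.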

Lets Insulate Ourselves: disable Google indexing of the wiki

02 Spammers will probably not notice

06 Readers will no longer be able to search the wiki with Google

Stop Auto Linking: stop auto-linking external addresses

02 Spammers may not even notice

06 Readers have to copy and paste addresses

Google may still find the links, even if they aren't clickable

VolumeLimitedEdits

05 Robots can make many small edits

07 Slows down major refactorings

09 Does not address submarine spam

EditThrottling

01 First-time spam always gets through

09 Doesn't stop non-bulk spam

14 Manual edits still required, just at a slower pace
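A sketch of edit throttling as a per-source rate limit: each client, keyed here by IP address (an assumption), may save only a fixed number of edits per time window. The window and quota are illustrative:

  import time
  from collections import defaultdict, deque

  WINDOW_SECONDS = 600        # illustrative: ten minutes
  MAX_EDITS_PER_WINDOW = 5

  _recent_edits = defaultdict(deque)

  def allow_edit(client_ip, now=None):
      """Return True if this client is still under its edit quota."""
      now = time.time() if now is None else now
      edits = _recent_edits[client_ip]
      while edits and now - edits[0] > WINDOW_SECONDS:
          edits.popleft()                 # forget edits outside the window
      if len(edits) >= MAX_EDITS_PER_WINDOW:
          return False
      edits.append(now)
      return True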

HumanVerification

04 Spammers can answer quizzes too

07 Quiz would be time-consuming for authors

Reject Edits: reject edits containing discernible external addresses

06 Readers have to follow vague instructions to find external resources

07 Authors have to write detailed instructions leading to external resources

10 Invalidates all current external links
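A sketch of the rejection test itself, assuming "discernible external addresses" means plain URLs and www. shorthand; obfuscated addresses such as "example dot com" would need ever more patterns, which is part of what makes the approach costly:

  import re

  # Loose on purpose: plain URLs plus "www." shorthand.
  ADDRESS_PATTERN = re.compile(r'(https?://\S+|www\.\S+)', re.IGNORECASE)

  def reject_edit(new_text):
      """Return True if the edit should be refused for containing an address."""
      return ADDRESS_PATTERN.search(new_text) is not None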

EditsRequireKarma

01 One-time spammers may get their first spam through (if too much initial karma is granted)

07 One-time authors are discouraged from contributing (if not enough initial karma is granted)

12 May take a lot of work

UserLogins

04 Spammers can create accounts too

07 One-time authors would be discouraged

14 Spammers' logins would have to be banned somehow

SpamBlackList

01 By definition, doesn't deal with first-time spammers

13 Want to win an Edit War? Just add your opponent to the blacklist

14 Requires ongoing maintenance

PeerToPeerBanList

12 Would have to be integrated with a number of Wiki Engines before it becomes effective

13 Subject to same abuse as Spam Black List

Edits Require JavaScript: only accept edits from users who have configured their browser to run JavaScript

04 Human spammers use a normal browser

07 Some authors may prefer to disable JavaScript in their browser, or use a browser that cannot run it

12 The JavaScript code (and its generation) must be complicated enough that it really needs a JavaScript interpreter to pass the test (hashcash…)

(05 In the future, spam bots may be able to run JavaScript)
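One way to make the JavaScript requirement bite is the hashcash idea mentioned above: the browser must find a counter whose hash, together with a server-supplied nonce, has a given number of leading zero bits, which is cheap for one human edit but expensive for bulk spam. A server-side verification sketch (the client's search loop would run in JavaScript); the field layout and difficulty are assumptions:

  import hashlib

  DIFFICULTY_BITS = 20  # illustrative work factor

  def leading_zero_bits(digest):
      bits = 0
      for byte in digest:
          if byte == 0:
              bits += 8
              continue
          bits += 8 - byte.bit_length()
          break
      return bits

  def verify_stamp(page_name, server_nonce, counter):
      """Check that the client found a counter giving enough leading zero bits."""
      stamp = ("%s:%s:%s" % (page_name, server_nonce, counter)).encode("utf-8")
      return leading_zero_bits(hashlib.sha1(stamp).digest()) >= DIFFICULTY_BITS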


While no measure is perfect by itself, most of them help against some kinds of spammers. This suggests that combined measures will be more effective than any single solution. For example, on my wiki (although it definitely does not have the attention and Link Share of Wards Wiki) I use a combination of Link Throttling, Edit Throttling and Volume Limited Edits, together with a system that requires fetching at least part of a page before saving it (this is actually a side effect of the transparent Three Way Merging deployed in the wiki). This has prevented spamming thus far.
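A sketch of how such combined measures can fit together: each measure is an independent check over the proposed edit, and the save is refused if any of them objects. The two example checks and their limits are placeholders, not a description of the wiki mentioned above:

  def too_many_new_links(old, new):
      added = (new.count("http://") + new.count("https://")
               - old.count("http://") - old.count("https://"))
      return "adds too many external links" if added > 4 else None

  def edit_too_large(old, new):
      return "changes too much text at once" if abs(len(new) - len(old)) > 20000 else None

  def review_edit(old, new, checks):
      """Run every check; an empty result means the edit may be saved."""
      objections = []
      for check in checks:
          message = check(old, new)
          if message is not None:
              objections.append(message)
      return objections

  # Example: review_edit(old_text, new_text, [too_many_new_links, edit_too_large])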

At any rate, any spam detection system is IMNSHO better than the "code word" system used here. It has effectively made this wiki a closed medium; for example, it took me a long time before I could find the right moment to put this comment here. -- Panu Kalliokoski

The right moment?

See, there was this period of approximately a month (fall 2004 IIRC) when I never managed to come to the wiki at a time when the code word was there. So I concluded, "okay, this is not the place to participate anymore", and left for a long time. After many months, I checked again, and no, the code word wasn't there. So I didn't come back until now.

Not fall 2004, as the code mechanism was introduced about February 2005.


See original on c2.com