After more than a decade of successful growth, Wikipedia continues to defy easy characterization. It receives more than 400 million viewers per month. Close to four million articles grace its web pages in English alone. Volunteers built the entire corpus of text.
This experience suggests that Wikipedia has done something right, but it raises the question: Which actions mattered, and which ones were merely incidental? Answering that question is the key to finding general lessons for countless other web sites that aggregate user-generated content.
Many Wikipedians believe that Linus’ Law is an important ingredient in their sauce. Coined by Eric Raymond, this law is less legal precept than slogan—namely, “Given enough eyeballs, all bugs are shallow.”
Few people know that it is actually a pert and terse restatement of a quote from Linus Torvalds, who originally said, “Given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix obvious to someone.” Raymond’s restatement drops all the qualifiers, vesting the proposition with more certitude and making it more egalitarian by extending it to nonexperts.
Wikipedia’s experience suggests Raymond was onto something. Let’s consider when the Law works and why it sometimes fails at Wikipedia.
What crowds can do
Linus’ Law is infeasible without the web. That much is obvious. The diffusion of the web reduced the costs of assembling the attention of many reviewers, making it feasible to have a crowd focus on the same text.
That doesn’t imply it is feasible for every article to garner attention from a large crowd, however. Articles vary in the number of contributions per day, week, or month, so some have accumulated many contributions over time while others have not. Articles also vary in the type of contributors they attract, and in the outlook of those contributors.
Consider what happens when hordes of readers and monitors do not show up—an article can retain a mistake. Simple probability theory suggests why. If the percentage of knowledgeable readers is low and the number of readers is low, the probability of a knowledgeable reader passing through will be low.
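The probability argument above can be made concrete with a small sketch. It rests on assumptions that are not in the essay: each reader is independently knowledgeable with the same probability, and a single knowledgeable reader is enough to notice the mistake. The function name and the numbers in the loop are hypothetical, chosen only for illustration.

```python
def prob_error_noticed(num_readers: int, share_knowledgeable: float) -> float:
    """Chance that at least one of num_readers is knowledgeable.

    Assumes (for illustration only) that readers are independent and that
    each one is knowledgeable with probability share_knowledgeable.
    """
    if num_readers < 0:
        raise ValueError("num_readers must be non-negative")
    if not 0.0 <= share_knowledgeable <= 1.0:
        raise ValueError("share_knowledgeable must be between 0 and 1")
    # Probability that every reader is NOT knowledgeable, then the complement.
    prob_nobody_notices = (1.0 - share_knowledgeable) ** num_readers
    return 1.0 - prob_nobody_notices


if __name__ == "__main__":
    # Hypothetical share: one reader in a thousand can spot the mistake.
    for readers in (10, 1_000, 100_000):
        print(readers, prob_error_noticed(readers, 0.001))
```

Under these assumptions, the chance of catching a mistake stays small when both the readership and the share of knowledgeable readers are small, and it climbs toward certainty as the crowd grows.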
There have been many infamous examples of such failures, such as the entry for Margaret Thatcher. For an extended period it said she was a “fictional character.”
The Wikipedia community learned from these incidents, and, to its credit, invented some useful programming tools for those problems, such as watch lists. No longer must contributors check their favorite entries frequently, but, instead, they can check only when necessary—namely, after notification of a change. Moreover, the organization eventually adopted policies aimed at more openness, so entries on controversial issues got scrutiny.
Now, many more entries receive considerable attention and corrections in a timely way. If somebody tries to change any facts on high-profile pages (for whatever reason, whether aimless vandalism, malicious lying, or merely misplaced confusion), an army of readers and monitors can reverse the error quickly, leaving accurate text for the vast majority of readers.
What is the lesson? Given enough inexpensive programming, many more bugs can be addressed. It is not an elegant solution, but it hints at the general insight. Linus’ Law works with the right supplements.
Three questions
Now consider what happens when the eyeballs do show up. I think it takes a great deal more than many eyeballs to implement Linus’ Law with good results.
To appreciate why, consider the best case scenario. What does an article look like when it has objective and verifiable information about an uncontroversial topic? In brief, it accumulates a plethora of facts and commonly accepted knowledge, which editors make presentable.
That is why Wikipedia is just so good with technical information. That is why so many engineers and technologists update and use Wikipedia for aggregating uncontroversial information about designs for software and hardware. It is also why Wikipedia is so good at presenting the random facts from sophomore college history courses next to the detritus of popular culture. In short, that is why Wikipedia makes it possible to find the date Abraham Lincoln died (April 15, 1865), the nanometer node for the most recent CMOS process as standardized by the International Technology Roadmap for Semiconductors (22 nm), or the weight of a standard Reese’s Peanut Butter Cup (0.75 oz in a classic two-cup package).
Linus’ Law says all that ought to happen. Enough attention to such details elicits the relevant objective information and many viewers then correct errors as needed.
That also illustrates the open question. What happens when the information becomes subjective, remains unverified, and concerns controversial matters? As it turns out, supplements to Linus’ Law can help overcome some, but not all, of these challenges.
Lacking what?
First consider subjective information—that is, information that lends itself to many idiosyncratic perspectives. Wikipedia’s community, almost by definition, faces a challenge. Its editors must aggregate subjective perspectives from multiple sources.
Wikipedia’s community has engineered an approach to this situation, aspiring to bring a neutral point of view (NPOV) to every article, giving each significant view a fair representation. To be sure, in many specific instances the editors argue about how to implement NPOV, but so what? Many of those are the same situations where an expert would parse words to represent different perspectives. Wikipedia is no worse.
More to the point, aspiring toward NPOV makes Wikipedia better in some situations. A well-behaved crowd of editors will tend to enumerate most viewpoints. Minuscule storage and transmission costs reduce the cost of listing another view on a web page.
That is why so many of Wikipedia’s political articles are so useful. All sides get a fair hearing. No editor minds adding another 500 words if it defuses an issue by giving voice to dissent. Think of this result as an illustration of another supplement to Linus’ Law. Given enough cheap storage, all viewpoints can be represented.
Next consider verification. Here I believe the experience is more mixed. Many experienced contributors to Wikipedia have become good at manipulating their references to support their view, no matter what it is or how biased it is. It’s possible to manipulate references because it takes work to track down the original source, or correct dead links, or find the truth behind a meaningless citation.
In other words, while many eyeballs can help with poor references, it takes a lot of sophisticated eyeballs to spot a manipulated reference. It also takes work to fix a poor reference—namely, to replace it with something more authoritative or match it against an alternative reference.
Stated succinctly, when verification costs are high, it is too costly to generate enough expert eyeballs. No supplement to Linus’ Law has yet been invented to fix this issue.
Finally, consider controversial topics. Once again, the experience is mixed. An editor’s standard approach to controversy resembles the solution to subjective information. Most editors try to defuse issues with a fair representation of all viewpoints. That works as long as no party seeks to be the exclusive voice. Consider why: One viewpoint’s gain does not become another’s loss. That reduces most fights to debates about which syntax best represents an idea in an additional paragraph.
Controversy survives for numerous reasons, nonetheless. Some issues are simply too complex for one additional paragraph to resolve a dispute. Interpreting the science behind global warming can serve as an example. Anybody can verify the same objective data, but getting a consensus about what it all means takes considerable effort and expertise. That will be especially so in the presence of a loud doubter, such as a representative for a stakeholder, or in the presence of anyone loudly raising doubts about the legitimate crevices in the extant scientific debate.
Controversy also survives due to the presence of costly verification and subjective information. That is why disputes on the pages about the Israeli-Palestinian conflict can continue indefinitely, with each side reading historical facts with different interpretations. That is also why the pages on the Armenian genocide and Vietnam War remain similarly unsettled.
Those observations also suggest a line of causation. Controversy survives and thrives when other supplements to Linus’ Law fail—for example, when merely adding another line does not satisfy all participants.
Having said all that, some amount of controversy is not about the cost of information at all. No amount of review will ever settle some topics, such as government policies for abortion, what Jesus actually meant to say, or which of Mohammed’s grandchildren truly carries the faith today. Relatedly, there seems to be no hope for settling debates about Stalin’s and Hitler’s motives. So it goes.
Summing up, Linus’ Law depends on some economic conditions and quite a few supplements. Eyeballs may be cheap to assemble, but the Law works better at Wikipedia when objective information is inexpensive to find, or when additional subjective information is cheap to add. It also works better when the cost of verification is low.
It also works best in the presence of civility. Every supplement to Linus’ Law requires editors with the right attitude. Crucially, no side may claim exclusive rights to determine the answer. Even the briefest scan of the globe suggests that such attitudes rarely serve as foundations for large group efforts for long. In that light, Wikipedia’s community deserves kudos for keeping it together for over a decade.
Copyright held by IEEE.
I was under the impression that some of the original Wikipedia pages were taken from an out of copyright Encyclopedia Britannica so it wouldn’t be true that all content is user generated.
I am fairly convinced that Wikipedia’s growth could be faster and more non-linear if it was tied to a for-profit company like Google. Just compare it to Baike Baidu, currently at 4,443,107 articles and growing rapidly – although its IPR approach is much more relaxed and the range of contents more varied.
I am going to Wiki Linus’ Law.