Scholarly publishing

You knew it was coming. Google Scholar cites can be manipulated

Google Scholar works via algorithm. It examines papers that are hosted on certain domains (usually those of publishers and higher education institutions) and then constructs citations based on those papers. Because it is easily accessible and also includes citations from unpublished papers, Google Scholar is becoming increasingly popular as a key metric for academic performance.

A new paper by Emilio Delgado López-Cózar, Nicolás Robinson-García, and Daniel Torres-Salinas, posted to arXiv, demonstrates how easy it is to manipulate Google Scholar citations:

The launch of Google Scholar Citations and Google Scholar Metrics may provoke a revolution in the research evaluation field as it places within every researcher's reach tools that allow bibliometric measuring. In order to alert the research community over how easily one can manipulate the data and bibliometric indicators offered by Google's products we present an experiment in which we manipulate the Google Citations profiles of a research group through the creation of false documents that cite their documents, and consequently, the journals in which they have published modifying their H index. For this purpose we created six documents authored by a faked author and we uploaded them to a researcher's personal website under the University of Granada's domain. The result of the experiment meant an increase of 774 citations in 129 papers (six citations per paper) increasing the authors' and journals' H index. We analyse the malicious effect this type of practices can cause to Google Scholar Citations and Google Scholar Metrics. Finally, we conclude with several deliberations over the effects these malpractices may have and the lack of control tools these tools offer.
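To see why six fake documents matter, it helps to check the arithmetic. The Python sketch below is an illustration only: `h_index` is a hypothetical helper, and the `before` profile is invented for the example rather than taken from the paper. It assumes each of the six fake documents cites every one of the 129 papers exactly once, which is consistent with the reported 774 = 6 × 129.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


FAKE_DOCUMENTS = 6   # from the abstract
PAPERS_CITED = 129   # from the abstract

# Assumption: every fake document cites every paper once.
added_citations = FAKE_DOCUMENTS * PAPERS_CITED

# Hypothetical citation profile (made up for illustration, not real data).
before = [10, 8, 5, 4, 3, 2, 1, 0]
after = [count + FAKE_DOCUMENTS for count in before]

print("added citations:", added_citations)
print("h-index before:", h_index(before))
print("h-index after:", h_index(after))
```

Six extra citations per paper is a small number in absolute terms, but because the H index depends on how many papers clear a threshold, lifting every paper at once can move it noticeably even for a modest profile like this invented one.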

At Scholarly Kitchen, Phil Davis notes that:

Calling on Google to tightly regulate their citation index is a call to deaf ears. Google prefers algorithms over humans, and at this time, it is still very easy to trick an indexing software to think you’ve created an original scholarly document. Moreover, there is no reason why Google, unlike Thomson Reuters, would want to invest huge amount of human resources into fixing their citation indexing problem. Google is in the business of selling advertisements to companies, not metrics to scientific organizations.

It is an interesting question as to whether we should expect Google to be so hands-off. After all, they are in the information retrieval business, and if the information is being compromised, it would be in their interests to do something about it. One can also imagine that checking Google Scholar counts for correlation with citations in published sources might yield some results. In the meantime, faculty at tenure-review or promotion time should have their citations examined more closely.

7 thoughts on “You knew it was coming. Google Scholar cites can be manipulated”

  1. The problem is really one of weighting citations: are real citations from one's own students in undergraduate essays that much better than fake citations? The recursive weighting procedure used to rank journals could be applied to rank the importance of citations, either using journal weights or article weights. Isn't this similar to how Google currently weights the importance of web pages in producing its search results?
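The recursive weighting the commenter describes can be sketched roughly as follows. This is a minimal PageRank-style power iteration written for illustration; it is an assumption about what such a scheme might look like, not Google's or any journal ranker's actual method. The function `citation_weights`, the damping value of 0.85, and the tiny graph are all hypothetical.

```python
def citation_weights(cites, damping=0.85, iterations=50):
    """PageRank-style power iteration over a citation graph.

    `cites` maps each document to the list of documents it cites.
    Returns a dict of document -> weight; the weights sum to 1.
    """
    nodes = set(cites) | {doc for cited in cites.values() for doc in cited}
    n = len(nodes)
    weight = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Documents that cite nothing spread their weight evenly over all nodes.
        dangling = sum(weight[node] for node in nodes if not cites.get(node))
        new = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for citing, cited in cites.items():
            for target in cited:
                # A citation passes on a share of the citing document's weight.
                new[target] += damping * weight[citing] / len(cited)
        weight = new
    return weight


# Hypothetical graph: "real" and "gamed" each receive exactly one citation.
graph = {
    "p1": ["hub"], "p2": ["hub"], "p3": ["hub"],
    "hub": ["real"],    # a well-cited paper citing "real"
    "fake": ["gamed"],  # an uncited fake document citing "gamed"
}
weights = citation_weights(graph)
print("real outweighs gamed:", weights["real"] > weights["gamed"])
```

Under this kind of scheme, a citation from a document that nobody cites still counts for something, but for less than one from a well-cited paper, so raw counts and weighted scores can diverge.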
