How Wikipedia reading habits can successfully predict the spread of disease

by ELAHE IZADI


The ability to forecast the spread of infectious diseases weeks in advance can make a world of difference when it comes to public health responses. For decades, scientists have been trying to create models to predict how something like the flu will spread.

People’s Internet usage has opened a new door for predictive data. There are already some tools out there, such as Google Trends, which tries to “nowcast,” or show what’s happening right now with the spread of certain diseases in the world. There have been studies, too, on whether Twitter can accurately predict how a disease is spreading.

But getting access to Google Trends or Twitter data is not always easy — or cheap. So a team of mathematicians, biologists and computer scientists got together to see if they could use something that’s completely open and free: Wikipedia.

As it turns out, they could accurately forecast how influenza and dengue spread based purely on people’s reading habits of Wikipedia articles. Last week, they showed how their algorithm could predict flu season in the United States. The full results of their research are published in this week’s PLOS Computational Biology.

Researchers looked at seven diseases and 11 countries over a period of three years, starting in 2010, and compared page views on Wikipedia articles about those diseases to official data from health ministries. By looking at readers’ habits, they successfully predicted the spread of influenza in the United States, Poland, Thailand and Japan, and of dengue in Brazil and Thailand, at least 28 days in advance.
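
For readers curious what comparing page views with official case counts might look like in practice, here is a minimal sketch in Python. It is not the researchers' algorithm, which the article does not describe. It simply checks how strongly one weekly series tracks another when the second is shifted later in time. The function names (`pearson`, `lagged_correlation`, `best_lag`) are hypothetical, and the numbers are made up for illustration, constructed so that cases trail page views by four weeks (28 days).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(page_views, cases, lag):
    """Correlate page views in week t with official cases in week t + lag."""
    if lag == 0:
        return pearson(page_views, cases)
    return pearson(page_views[:-lag], cases[lag:])


def best_lag(page_views, cases, max_lag):
    """Return the lag (in weeks) at which page views track cases most closely."""
    return max(
        range(max_lag + 1),
        key=lambda lag: lagged_correlation(page_views, cases, lag),
    )


# Made-up weekly numbers, NOT real Wikipedia or health-ministry data.
page_views = [10, 12, 15, 30, 60, 90, 70, 40, 20, 12, 10, 11, 10, 12, 11, 10]
# Synthetic case counts: a scaled copy of the page views, delayed four weeks.
cases = [1, 1, 1, 1] + [v // 10 for v in page_views[:-4]]

print("Best lag in weeks:", best_lag(page_views, cases, max_lag=6))
```

A strong correlation at a positive lag would suggest that reading habits move ahead of reported cases, which is the kind of signal a forecast could build on. A real study would also need to test the model on data it had not already seen.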

Washington Post for more