Put a battery-operated talking unicorn in your online shopping cart, and you may get an alert suggesting some AA cells to juice up the conversation. Binge-watch a few action movies and you may see titles of martial arts cinema fill up your must-see list.
Recommendation engines try to discern habits, likes and other affinity traits to anticipate what you may need or want based on past actions. News consumption can fall into these patterns: We know, for instance, that people go to search engines to find out more about a story, be it more background or further developments. Certain types of news and feature stories lend themselves to typical user behaviors: An impending hurricane, for instance, triggers preparation research (if you’re in the path), offers to donate supplies or blood (if you’re nearby) and historical curiosity about past hurricanes. Even celebrity stories inspire certain common impulses: An engagement announcement will launch some to seek a peek at the ring, others to check out past (failed) relationships.
How might a news recommendation system offer more stories, yet not fall into the trap of filter bubbles and echo chambers? The first thing is, you need a high-quality benchmark dataset. That’s where MIND comes in: a mammoth collection of anonymized data from user behavior logs of about 1 million people. Few companies in the world attract those kinds of numbers, and Microsoft News is one of them.
When you work at the scale that Microsoft News does — in 140 countries around the world — the challenge is not to overload its half billion readers. In the not-so-distant past, newspaper print space and radio & TV time constrained how news was reported, displayed and ranked. When the Internet smashed conventions of information delivery, audiences had to assume the responsibility for their own news diet. That diet can be hard to maintain when thousands of news stories come at people every day — especially when you throw in social media.
The second thing is, issue a healthy competition to the science and news world: Go to the Microsoft Research program competition page and find out more about how to dive into this dataset and come up with ways to rank news articles that align with users’ interests. And just to make it interesting, #cashprizes: Grand prize is $10,000, with two second-place prizes at $3,000 and four third-place prizes at $1,000.
The dataset is free to download for research purposes on the MIND website, and baseline algorithms are available in the Microsoft Recommenders repository. There are two offerings of the dataset: the whole kit’n’caboodle and a smaller subset of 50,000 users called MIND-small. But the goals of the Microsoft data scientists aren’t small-minded at all.