My professional interest is in applying recommenders to travel planning and reservation, so it is natural that I keep up with books on both the technical implementation of recommendation systems and those that dissect their cultural influences.

After finishing the book Computing Taste, which charts the evolution of recommendation systems in the music and movie industries, I want to extend what it discusses to changes in the software development paradigm, as well as to shifting societal views on mistakes and on what counts as a finished product.

Previous generation of recommenders

The first generation of recommendation systems came to life when bandwidth was precious and scrolling or loading content took a long time, so cutting down on unnecessary choices was touted as user friendly. These recommenders aimed first to be accurate in predicting what a user would like and to push those items up front to avoid overloading users. The dominant KPI was minimizing the RMSE between predicted and actual ratings, computed offline on static rating data.
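As a rough illustration of that KPI (my own toy example, not from the book), here is a minimal sketch of an offline RMSE evaluation over a static table of ratings. The users, items, and numbers are hypothetical.

```python
import math

# (user, item) -> rating the user actually gave (static data, collected offline)
actual = {("alice", "film_1"): 4.0, ("alice", "film_2"): 2.0, ("bob", "film_1"): 5.0}

# (user, item) -> rating the model predicted for the same pairs
predicted = {("alice", "film_1"): 3.5, ("alice", "film_2"): 2.5, ("bob", "film_1"): 4.0}

def rmse(actual, predicted):
    """Root-mean-square error between predicted and actual ratings."""
    errors = [(predicted[k] - actual[k]) ** 2 for k in actual]
    return math.sqrt(sum(errors) / len(errors))

# The single number first-generation systems competed to minimize.
print(rmse(actual, predicted))
```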

Even when a user had not rated many items, the system could look at how similar that user was to others who had rated the same items similarly ("neighbors") and predict which other items the user would like from those neighbors' favorites. For that to happen, it had to rely on explicit ratings. This was very passive: you needed the user to rate something before you had any data to act upon. If the user kept watching but never expressed any sentiment (e.g., when content was delivered offline), no prediction could be made.
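A minimal sketch of that neighbor-based idea (again a toy of my own, not the book's method): compute user-user similarity over shared explicit ratings, then score an unseen item as a similarity-weighted average of the neighbors' ratings.

```python
import math

# Explicit ratings only: user -> {item: rating}
ratings = {
    "alice": {"film_1": 5, "film_2": 4},
    "bob":   {"film_1": 5, "film_2": 4, "film_3": 5},
    "carol": {"film_1": 1, "film_3": 2},
}

def cosine_sim(a, b):
    """Cosine similarity over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)

def predict(user, item):
    """Similarity-weighted average of neighbors' ratings for an item the user has not rated."""
    num, den = 0.0, 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_sim(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += sim
    return num / den if den else None  # no rated overlap -> no prediction possible

# Predict how alice would rate film_3, which she has never rated herself.
print(predict("alice", "film_3"))
```

Note how the whole pipeline stalls without explicit ratings: if `den` is zero, the function simply has nothing to say, which is exactly the passivity described above.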

We can see that in this era, error was avoided at all costs through extensive computation and optimization. Furthermore, the system acted like the curators and cultural intermediaries who had long behaved as tastemakers: record labels, radio stations, and DJs who imposed their good taste on users, telling them what to consume and what not to.

Current generation of recommenders

Current recommenders aim to provoke a reaction, then measure it and tweak their next action. Since bandwidth has become a commodity, the first generation's grand vision of saving expensive bandwidth by serving only a curated set of content is no longer valid. Rather, the overriding objective now is to entice the user to stay longer.

This generation relies on measuring implicit behavior, which is more active: you don't need to wait for an explicit rating, you can simply measure as the user takes any action. How long a user watches a film, where they skip forward and backward, what they search for and browse, what they do with the recommended items, how many times they play an item, which genres they spend the most time in, all of this means more than what they rate. In this sense, a user's actions are hailed as the ground truth, something more reliable than what they express. Since each user action is a piece of feedback and there can be many actions per second, the feedback cycle is much shorter. Compared with earlier recommenders, the current generation has a much larger dataset to analyze when making iterated guesses.
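A minimal sketch of how such implicit signals might be folded into data a recommender can use (my own illustration, loosely in the spirit of implicit-feedback models; the event types and weights are assumptions, not anything from the book):

```python
from collections import defaultdict

# Hypothetical behavioral event log: every action is a signal, no rating required.
events = [
    {"user": "alice", "item": "film_1", "type": "play", "watch_ratio": 0.95},
    {"user": "alice", "item": "film_1", "type": "play", "watch_ratio": 0.80},
    {"user": "alice", "item": "film_2", "type": "play", "watch_ratio": 0.10},
    {"user": "alice", "item": "film_3", "type": "search_click"},
]

# Assumed weights per event type; a real system would tune these from experiments.
WEIGHTS = {"play": 1.0, "search_click": 0.3}

def implicit_scores(events):
    """Fold raw behavioral events into (user, item) -> confidence scores."""
    scores = defaultdict(float)
    for e in events:
        weight = WEIGHTS.get(e["type"], 0.1)
        # A play counts in proportion to how much of the film was actually watched.
        weight *= e.get("watch_ratio", 1.0)
        scores[(e["user"], e["item"])] += weight
    return dict(scores)

print(implicit_scores(events))
# alice's repeated near-complete plays of film_1 outweigh the film_2 she abandoned,
# even though she never rated anything explicitly.
```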

The strong emphasis on minimizing error is also less relevant now. Rather, error is to be embraced: without feedback and the possibility of error, the system can never learn about a user. The shorter feedback cycle also makes iterated learning possible. The system can overcome the cold start problem (a new context, or a new user with no data) with pre-seeded content, observe how the user responds, and then make a measured guess for its next move. Each experiment and user reaction feeds into the next, refining the recommender until it becomes more precise.
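That experiment-then-observe loop is essentially an explore/exploit problem. Here is a minimal epsilon-greedy sketch (a toy of my own, not the book's): a new user is seeded with candidate items, and every click or skip updates the estimate used for the next recommendation.

```python
import random

# Pre-seeded candidates for a brand-new user (cold start): no history yet.
candidates = ["thriller_1", "comedy_1", "documentary_1"]
clicks = {item: 0 for item in candidates}  # observed positive feedback
shows = {item: 0 for item in candidates}   # how often each item was recommended

EPSILON = 0.2  # assumed exploration rate; a real system would tune this

def recommend():
    """Explore occasionally; otherwise exploit the current best guess."""
    if random.random() < EPSILON or all(n == 0 for n in shows.values()):
        return random.choice(candidates)
    return max(candidates, key=lambda i: clicks[i] / shows[i] if shows[i] else 0.0)

def record_feedback(item, clicked):
    """Every reaction, including a skip, is a data point for the next guess."""
    shows[item] += 1
    if clicked:
        clicks[item] += 1

# Simulated session: a user who tends to click comedies.
for _ in range(50):
    item = recommend()
    record_feedback(item, clicked=(item == "comedy_1" and random.random() < 0.7))

# Observed click-through rates after 50 interactions.
print({i: round(clicks[i] / max(shows[i], 1), 2) for i in candidates})
```

The point of the sketch is that the system starts out wrong on purpose, and each "error" (a skipped recommendation) is itself the information that sharpens the next guess.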

Underlying driving factors

The change in the ultimate goal of recommenders, from reducing overload to retaining users, also echoes a change in society. We are gradually moving away from a fear of mistakes in favor of viewing them as a way to probe and learn, highlighted by the growth mindset. We also observe engineering paradigms becoming household words: agile, PDCA, and lean are making inroads into other domains and becoming common practice.

This also happens due to business model changes. When Netflix transitioned from a DVD rental company that could only collect offline data to an online streaming service, the data it could collect became exponentially larger and more meaningful for measuring user behavior. Accurate prediction became less important than keeping users active online.

The software development cycle has also undergone a paradigm shift from the waterfall model to agile, reducing both the time spent on planning and the emphasis on being 100% accurate in it. Building a hypothesis and validating it, with the flexibility to quickly change course, has become more common. With more powerful and cheaper computation, modern recommenders can be updated more often, further enabled by the agile software development lifecycle.

Other parallels

On learning

It is not necessary to be accurate at first. Just do something; any reaction can be used to learn and tweak. In fact, with uncertainty on the rise, we will never have enough information to be "right" before making a move. We have to be humble enough to admit that we don't know everything at the beginning (the cold start problem) but still act. What matters is the signals and reactions you receive and how you experiment with them.

Error is essential to growth and is not something to be afraid of. Immediate feedback is powerful fuel for any forward move. If there is no feedback, we don't even know whether we are heading in the right direction. With any feedback, we can validate our hypothesis, learn from it, and decide on the next action. We can keep probing and guessing until we approach more accurate results.

We can see that this is the exact opposite of perfectionism. We never claim that we have the final answer, but rather keep iterating towards truth.

On products and services

We also observe similar patterns in products. Not too long ago, we bought something with the expectation that it shipped complete and would last a long time, with no updates needed. Now we often buy something expecting it to last only a few years, or expecting updates (via accessories or software) to enhance its functionality. Getting a half-baked product is sometimes the norm. For example, during the pandemic's supply chain issues, car manufacturers would let customers take delivery first and come back later for non-critical software or hardware updates once the parts became available.

In this sense, products and services are always in beta, a perpetual work in progress (WIP). They are tested against hypotheses, such as 7-11 Japan's daily hypotheses on stock replenishment. There is no need to wait and deliver everything in one package; you can deliver the core functionality first and add on later.

On people’s implicit vs explicit behaviors

The shift toward putting more trust in people's implicit behaviors than in their explicit ratings also parallels what we see in voting polls, where implicit political attitudes matter as much as, and sometimes more than, explicit ones.