What is the cold start problem?

The cold start problem occurs in recommender systems (RSs) when a new item, user or community does not have enough associated information for the RS to generate a plausible recommendation. Collen Burton gives a concise explanation of what an RS is and how it can be applied in business at the following link: https://www.youtube.com/watch?v=uuWjBmxbxs4

Why is it important to business?

Although it is becoming a cliché, customer experience remains critical for a business. A business can lose a significant number of new users through inaccurate recommendations in the first phase of user interaction, because the RS has not yet gathered enough data about these users to make accurate recommendations.

Handling the cold start problem poorly can therefore undermine a business model that relies on recommendations. Hence, there is a need to circumvent this problem by modelling kinship features, i.e. features that relate new users and items to existing ones, to improve the accuracy of recommendations for new users and items/products.

For instance, when a new user registers, the RS does not have enough information about them, so its recommendations are likely to be poor and the user's expectations of the system will not be met. Now consider a solution in which the system recommends only with a certain level of confidence, i.e. before recommending an item, it calculates its confidence that the item is worth recommending.

This would solve the problem of recommending invalid items to users. However, the system would eventually accumulate a pool of users who never receive recommendations. Because these users never interact with recommended items, the system gathers little new data about them, which prevents it from maturing in terms of prediction/recommendation accuracy. There is therefore a need to find better ways to tackle this problem.
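The confidence-gating idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the confidence heuristic `support / (support + shrinkage)`, the threshold of 0.6 and the toy data are all hypothetical. It shows how gating leaves an established user with a filtered list, but leaves a new user with no recommendations at all.

```python
# Hypothetical confidence heuristic: the more ratings back a prediction,
# the closer confidence gets to 1. `shrinkage` controls how fast it rises.
def confidence(support: int, shrinkage: float = 5.0) -> float:
    return support / (support + shrinkage)


def recommend_with_confidence(predictions: dict, threshold: float = 0.6, shrinkage: float = 5.0) -> list:
    """Return (item, score) pairs whose confidence clears `threshold`.

    predictions maps item -> (predicted_score, support), where support is the
    number of ratings from similar users behind that score.
    """
    confident = [
        (item, score)
        for item, (score, support) in predictions.items()
        if confidence(support, shrinkage) >= threshold
    ]
    # Highest predicted score first
    return sorted(confident, key=lambda pair: pair[1], reverse=True)


# An established user: most predictions are backed by many similar users' ratings.
established = {"item_a": (4.5, 20), "item_b": (3.9, 12), "item_c": (4.8, 2)}
# A new user: hardly any ratings back any prediction.
new_user = {"item_a": (4.1, 1), "item_b": (3.5, 0)}

print(recommend_with_confidence(established))  # [('item_a', 4.5), ('item_b', 3.9)]
print(recommend_with_confidence(new_user))     # []
```

Note that `item_c` has the highest predicted score but is dropped because only two ratings support it; the new user's list is empty, which is exactly the situation that stops the system from learning more about them.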

Candidate solutions for RSs

Collaborative filtering (CF)

CF RSs select items based on subjective information that other users associate with them. Users share information and views about particular items, assigning scores that serve as reference benchmarks for other users [1].
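A minimal user-based CF sketch in Python, assuming ratings on a 1-5 scale stored as nested dicts and cosine similarity over co-rated items (one common choice of similarity, not one prescribed by [1]). The user and item names are hypothetical. A user with no ratings gets no prediction at all, which is the cold start problem in miniature.

```python
import math


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings: dict, user: str, item: str):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no similar user has rated the item.
    """
    num, den = 0.0, 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    return num / den if den > 0 else None


ratings = {
    "alice": {"m1": 5, "m2": 3},
    "bob": {"m1": 4, "m2": 3, "m3": 4},
    "carol": {"m1": 1, "m3": 2},
    "dave": {},  # a brand-new user with no ratings yet
}
print(predict_rating(ratings, "alice", "m3"))  # roughly 3.0
print(predict_rating(ratings, "dave", "m3"))   # None: no overlap, no prediction
```

Alice and Carol share a single co-rated item, yet their cosine similarity is already 1.0. Similarities computed from very few co-ratings are unreliable, which is the weakness the similarity measures discussed later try to address.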

Content-based filtering (CBF)

CBF RSs analyse a set of documents and/or descriptions of items previously evaluated by the user to build a user model or profile. The RS subsequently uses this model/profile to recommend items, including newly added ones, to that user in future [1].
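A minimal CBF sketch in Python, assuming plain-text item descriptions, a bag-of-words representation and cosine similarity (real systems more often use TF-IDF weighting or learned embeddings). The item names and descriptions are hypothetical. A newly added item can be scored as soon as it has a description, but a user with no history has an empty profile, so every item scores zero for them.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Very simple item representation: lower-cased word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profile(descriptions: dict, liked_items: list) -> Counter:
    """User profile = combined word counts of the items the user liked."""
    profile = Counter()
    for item in liked_items:
        profile.update(bag_of_words(descriptions[item]))
    return profile


def recommend(descriptions: dict, liked_items: list, top_n: int = 2) -> list:
    """Rank unseen items by similarity between their description and the profile."""
    profile = build_profile(descriptions, liked_items)
    candidates = [item for item in descriptions if item not in liked_items]
    candidates.sort(key=lambda item: cosine(profile, bag_of_words(descriptions[item])), reverse=True)
    return candidates[:top_n]


descriptions = {
    "space_doc": "documentary about space and planets",
    "star_film": "science fiction film set in space",
    "cook_show": "cooking show with italian recipes",
    "bake_show": "baking competition with cake recipes",
}
print(recommend(descriptions, ["space_doc"], top_n=1))  # ['star_film']
```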

Demographic filtering (DF)

DF RSs use attributes such as age, gender, occupation and educational level to build a demographic profile of the user, and generate a different set of recommendations for each demographic niche [1].
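A minimal DF sketch in Python: users are grouped into niches by age band and occupation (two of the attributes listed above), and a new user is shown the items most popular within their niche. The attribute names, the 10-year bands and the data are hypothetical. Unlike CF, this needs no ratings from the new user, only their profile attributes.

```python
from collections import Counter, defaultdict


def demographic_key(user: dict, band_width: int = 10) -> tuple:
    """Map a user to a niche: (age band, occupation)."""
    lower = (user["age"] // band_width) * band_width
    return (f"{lower}-{lower + band_width - 1}", user["occupation"])


def build_niche_popularity(users: dict, interactions: list) -> dict:
    """Count how often each item was chosen within each demographic niche."""
    popularity = defaultdict(Counter)
    for user_id, item in interactions:
        popularity[demographic_key(users[user_id])][item] += 1
    return popularity


def recommend_for(new_user: dict, popularity: dict, top_n: int = 3) -> list:
    """Recommend the most popular items in the new user's niche."""
    counts = popularity.get(demographic_key(new_user), Counter())
    return [item for item, _ in counts.most_common(top_n)]


users = {
    "u1": {"age": 23, "occupation": "student"},
    "u2": {"age": 27, "occupation": "student"},
    "u3": {"age": 45, "occupation": "engineer"},
}
interactions = [("u1", "textbook"), ("u2", "textbook"), ("u2", "headphones"), ("u3", "drill")]
popularity = build_niche_popularity(users, interactions)

new_user = {"age": 21, "occupation": "student"}  # no ratings yet, only a profile
print(recommend_for(new_user, popularity))  # ['textbook', 'headphones']
```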

Of the three, CF tends to work best, although the common approach is to combine the candidate solutions in a hybrid to attain better precision and accuracy. A hybrid approach can be computationally heavy and may require more powerful hardware. That said, hybridising the above techniques only improves recommendation accuracy; it does not solve the cold start problem, because each of them still depends on having enough prior information about users or items.
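One common way to build such a hybrid is a weighted combination of each recommender's scores. This Python sketch assumes the scores have already been normalised to [0, 1]; the method names, weights and scores are hypothetical.

```python
def weighted_hybrid(scores_by_method: dict, weights: dict) -> list:
    """Rank items by a weighted sum of per-method scores (assumed to lie in [0, 1])."""
    total_weight = sum(weights.values())
    combined = {}
    for method, scores in scores_by_method.items():
        w = weights.get(method, 0.0) / total_weight
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + w * score
    # Highest combined score first
    return sorted(combined, key=combined.get, reverse=True)


weights = {"cf": 0.7, "df": 0.3}
cf_scores = {"a": 0.9, "b": 0.2}
df_scores = {"b": 0.8, "c": 0.5}
print(weighted_hybrid({"cf": cf_scores, "df": df_scores}, weights))  # ['a', 'b', 'c']
# For a brand-new user CF has nothing to offer, so only DF scores remain
print(weighted_hybrid({"cf": {}, "df": df_scores}, weights))  # ['b', 'c']
```

The second call shows the limitation described above: when one component has no data for a new user, the hybrid simply inherits whatever the remaining components can offer.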

Candidate solutions for the cold start problem in RSs

Pereira and Hruschka (2015) proposed a hybrid solution that combines CF and DF recommendations. It implements an iterative divide-and-conquer approach that interleaves clustering and learning tasks to construct prediction models, using an algorithm known as simultaneous co-clustering and learning (SCOAL). According to the authors, this approach predicts better than the traditional approaches, albeit at a higher computational cost [1].
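To make the interleaving of clustering and learning concrete, here is a heavily simplified, SCOAL-style sketch in Python with NumPy. It is not the algorithm from [1]: here each co-cluster's "model" is just its mean rating, whereas SCOAL learns a prediction model per co-cluster and [1] also brings in demographic information. The function names and the toy rating matrix are hypothetical.

```python
import numpy as np


def fit_block_models(R, mask, rows, cols, k_rows, k_cols, fallback):
    """Learning step: fit one model per co-cluster (here, just its mean rating)."""
    models = np.full((k_rows, k_cols), fallback, dtype=float)
    for r in range(k_rows):
        for c in range(k_cols):
            block = mask & (rows[:, None] == r) & (cols[None, :] == c)
            if block.any():
                models[r, c] = R[block].mean()
    return models


def coclustering_sketch(R, mask, k_rows=2, k_cols=2, n_iter=10, seed=0):
    """Alternate between fitting co-cluster models and reassigning users/items."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    rows = rng.integers(k_rows, size=n_users)
    cols = rng.integers(k_cols, size=n_items)
    fallback = R[mask].mean()  # used for co-clusters with no observed ratings
    for _ in range(n_iter):
        models = fit_block_models(R, mask, rows, cols, k_rows, k_cols, fallback)
        # Clustering step for users: pick the row cluster with the lowest squared error
        for u in range(n_users):
            obs = mask[u]
            errs = [((R[u, obs] - models[r, cols[obs]]) ** 2).sum() for r in range(k_rows)]
            rows[u] = int(np.argmin(errs))
        # Clustering step for items, using the updated user clusters
        for i in range(n_items):
            obs = mask[:, i]
            errs = [((R[obs, i] - models[rows[obs], c]) ** 2).sum() for c in range(k_cols)]
            cols[i] = int(np.argmin(errs))
    # Refit so the returned models match the final assignments
    models = fit_block_models(R, mask, rows, cols, k_rows, k_cols, fallback)
    return rows, cols, models


R = np.array([
    [5, 5, 1, 1],
    [5, 4, 1, 2],
    [1, 1, 5, 5],
    [2, 1, 4, 5],
], dtype=float)
mask = np.ones_like(R, dtype=bool)
mask[1, 1] = False  # one missing rating to predict
rows, cols, models = coclustering_sketch(R, mask)
print(models[rows[1], cols[1]])  # predicted rating for user 1, item 1
```

Each iteration refits every co-cluster and re-scores every user and item against every cluster, which gives a feel for where the extra computational cost of this family of methods comes from.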

Jesus Bobadilla et al. (2013) suggested establishing a better similarity measure to mitigate the new user cold start problem in RSs [2]. They use neural learning to optimize a similarity measure designed for CF-based RSs.

Hyung Jun Ahn (2007) put forward a new similarity measure, proximity impact popularity (PIP) [3]. Ahn argues that conventional similarity measures under-utilise the information available in ratings, a weakness that is most damaging under cold start conditions, when few ratings exist. PIP is a heuristic designed to address this by combining three factors, proximity, impact and popularity, into a single similarity score.
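The PIP measure can be sketched in Python as follows. This follows the definitions as we understand them from Ahn's paper [3]; the 1-5 rating scale, the function names and the example values are assumptions. `item_mean` is the average rating the item received from all users, and the similarity between two users is the sum of PIP scores over the items they have both rated.

```python
def pip_score(r1: float, r2: float, item_mean: float, r_min: float = 1, r_max: float = 5) -> float:
    """PIP score for two ratings of the same item (higher means more similar)."""
    r_med = (r_max + r_min) / 2  # midpoint of the rating scale
    # The ratings disagree if they fall on opposite sides of the midpoint
    agreement = not ((r1 > r_med and r2 < r_med) or (r1 < r_med and r2 > r_med))
    # Proximity: small rating distance is rewarded; disagreement doubles the distance
    d = abs(r1 - r2) if agreement else 2 * abs(r1 - r2)
    proximity = ((2 * (r_max - r_min) + 1) - d) ** 2
    # Impact: strong (far from neutral) agreeing ratings count more
    impact = (abs(r1 - r_med) + 1) * (abs(r2 - r_med) + 1)
    if not agreement:
        impact = 1 / impact
    # Popularity: shared deviation from the item's average rating counts more
    if (r1 > item_mean and r2 > item_mean) or (r1 < item_mean and r2 < item_mean):
        popularity = 1 + ((r1 + r2) / 2 - item_mean) ** 2
    else:
        popularity = 1
    return proximity * impact * popularity


def pip_similarity(ratings_a: dict, ratings_b: dict, item_means: dict, r_min=1, r_max=5) -> float:
    """Sum of PIP scores over the items both users have rated."""
    common = set(ratings_a) & set(ratings_b)
    return sum(pip_score(ratings_a[i], ratings_b[i], item_means[i], r_min, r_max) for i in common)


print(pip_score(5, 5, item_mean=3.0))  # 3645.0: strong, agreeing, above-average ratings
print(pip_score(5, 1, item_mean=3.0))  # about 0.111: the ratings disagree
```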

Conclusion

The candidate solutions for RSs (CF, CBF and DF) have been around for quite some time. On their own, they cannot solve the cold start problem, because they work well only when there is sufficient prior information on which to build recommendation models.

At present, the best way to tackle the cold start problem appears to be a hybrid approach. Ideally, it should combine at least one of the candidate solutions for RSs (CF, CBF or DF) with an underlying technique, such as an improved similarity measure, that mitigates the cold start problem. The better the underlying technique, the better the RS performs under cold start conditions.

References

[1] A. L. V. Pereira and E. R. Hruschka, “Simultaneous co-clustering and learning to address the cold-start problem in recommender systems,” Knowledge-Based Systems, 2015.

[2] J. Bobadilla, F. Ortega, A. Hernando and J. Bernal, “A collaborative filtering approach to mitigate the new user cold-start problem,” Knowledge-Based Systems, 2013.

[3] H. J. Ahn, “A new similarity measure for collaborative filtering to alleviate the new user cold-starting problem,” ScienceDirect, 2007.

[4] A. I. Schein, A. Popescul, L. H. Ungar and D. M. Pennock, “Methods and metrics for cold-start recommendations,” in Proceedings of SIGIR 2002, 2002.

by Mthokozisi Myeza
