Survey of multi­armed bandit algorithms applied to recommendation systems

Elena Gangan; Milos Kudus; Eugene Ilyushin

Survey of multiarmed bandit algorithms applied to recommendation systems

Elena Gangan, Milos Kudus, Eugene Ilyushin

Abstract

The main goal of this paper is to introduce the reader to the multiarmed bandit algorithms of different types and to observe how the industry leveraged them in advancing recommendation systems. We present the current state of the art in RecSys and then explain what multiarmed bandits can bring to the table. First, we present the formalization of the multiarmed bandit problem and show the most common bandit algorithms such as upper confidence bound, Thompson Sampling, epsilon greedy, EXP3. Expanding on that knowledge, we review some important contextual bandits and present their benefits and usage in different applications. In this setting, context means side information about the users or the items of the problem. We also survey various techniques in multiarmed bandits that make bandits fit to the task of recommendations better; namely we consider bandits with multiple plays, multiple objective optimizations, clustering and collaborative filtering approaches. We also assess bandit backed recommendation systems implemented in the industry at Spotify, Netflix, Yahoo and others. At the same time we discuss methods of bandit evaluation and present an empirical evaluation of some notorious algorithms. We conduct short experiments on 2 datasets to show how different policies compare to each other and observe the importance of parameter tuning. This paper is a survey of the multi armed bandit algorithms and their applications to recommendation systems.

Full Text:

PDF

References

J. Kawale and E. Chow, “A MultiArmed Bandit Framework For Recommendations at Netflix,” 2018, library Catalog: SlideShare.

J. McInerney, B. Lacker, S. Hansen, K. Higley, H. Bouchard, A. Gruson, and R. Mehrotra, “Explore, Exploit, and Explain: Personalizing Explainable recommendations with Bandits,” in RecSys ’18: Proceedings of the 12th ACM Conference on Recommender Systems, 2018, pp. 31–39.

R. Mehrotra, N. Xue, and M. Lalmas, “Bandit based Optimization of Multiple Objectives on a Music Streaming Platform,” in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. Virtual Event CA USA: ACM, Aug. 2020, pp. 3224–3233. [Online]. Available: https://dl.acm.org/doi/10.1145/3394486.3403374

L. Li, W. Chu, J. Langford, and R. E. Schapire, “A ContextualBandit Approach to Personalized News Article Recommendation,” Proceedings of the 19th international conference on World wide web WWW ’10, p. 661, 2010, arXiv: 1003.0146. [Online]. Available: http://arxiv.org/abs/1003.0146

Y. Mao, M. Chen, A. Wagle, M. Natkovich, J. Pan, and D. Matheson, “A Batched MultiArmed Bandit Approach to News Headline Testing,” in IEEE International Conference on Big Data (Big Data). Seattle, WA, USA, USA: IEEE, 2018.

Y. Koren, R. Bell, and C. Volinsky, “Matrix factorization techniques for recommender systems,” vol. 42, no. 8, p. 30–37, Aug. 2009. [Online]. Available: https://doi.org/10.1109/MC.2009.263

X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.S. Chua, “Neural collaborative filtering,” in Proceedings of the 26th International Conference on World Wide Web, ser. WWW ’17. Republic and Canton of Geneva, CHE: International World Wide Web Conferences Steering Committee, 2017, p. 173–182. [Online]. Available: https://doi.org/10.1145/3038912.3052569

W. Sun, X. Zhang, W. Liang, and Z. He, “High dimensional explicit feature biased matrix factorization recommendation,” in Trends and Applications in Knowledge Discovery and Data Mining, X.L. Li, T. Cao, E.P. Lim, Z.H. Zhou, T.B. Ho, and D. Cheung, Eds. Springer International Publishing, 2015, vol. 9441, pp. 66–77, series Title: Lecture Notes in Computer Science.

O. Alieva, E. Gangan, E. Ilyushin, and A. Kachalin, “Automatic evaluation of recommendation models,” International scientific journal «Modern Information Technologies and ITEducation», vol. 16, no. 2, 2020. [Online]. Available: http://sitito.cs.msu.ru/index. php/SITITO/article/view/656

S. Li, A. Karatzoglou, and C. Gentile, “Collaborative Filtering Bandits,” in SIGIR ’16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, Pisa, Italy, 2016, pp. 539–548.

S. Pandey, D. Chakrabarti, and D. Agarwal, “Multiarmed bandit problems with dependent arms,” in Proceedings of the 24th international conference on Machine learning ICML ’07. Corvalis, Oregon: ACM Press, 2007, pp. 721–728. [Online]. Available: http://portal.acm.org/citation.cfm?doid=1273496.1273587

S. Caron and S. Bhagat, “Mixing bandits: a recipe for improved coldstart recommendations in a social network,” 08 2013.

H. T. Nguyen and A. KofodPetersen, “Using Multiarmed Bandit to Solve ColdStart Problems in Recommender Systems at Telco,” in Mining Intelligence and Knowledge Exploration, R. Prasath, P. O’Reilly, and T. Kathirvalavakumar, Eds. Cham: Springer International Publishing, 2014, vol. 8891, pp. 21–30, series Title: Lecture Notes in Computer Science. [Online]. Available: http://link.springer.com/10.1007/97833191381763

C. Z. Felício, K. V. Paixão, C. A. Barcelos, and P. Preux, “A MultiArmed Bandit Model Selection for ColdStart User Recommendation,”

in Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization. Bratislava Slovakia: ACM, Jul. 2017, pp. 32–40. [Online]. Available: https://dl.acm.org/doi/10.1145/3079628.3079681

A. Slivkins, “Introduction to multiarmed bandits,” Foundations and Trends® in Machine Learning, vol. 12, no. 12, pp. 1–286, 2019. [Online]. Available: http://dx.doi.org/10.1561/2200000068

S. Bubeck and A. Slivkins, “The best of both worlds: Stochastic and adversarial bandits,” Journal of Machine Learning Research, vol. 23, 02 2012.

T. Lattimore and C. Szepesvári, Bandit Algorithms. Cambridge University Press, 2020.

J. M. White, Bandit Algorithms for Website Optimization. O’Reilly Media, Inc. [Online]. Available: https://www.oreilly.com/library/view/ banditalgorithmsfor/9781449341565/

P. Auer and N. CesaBianchi, “Finitetime Analysis of the Multiarmed Bandit Problem,” Machine Learning 47, p. 22, 2002.

O. Chapelle and L. Li, “An Empirical Evaluation of Thompson Sampling,” in Processing Advances in Neural Information Systems, vol. 24, 2011, pp. 2249–2257. [Online]. Available: http://papers.nips.cc/paper/4321anempiricalevaluationofthompsonsampling.pdf

A. Gopalan, S. Mannor, and Y. Mansour, “Thompson Sampling for Complex Online Problems,” in Proceedings of the 31 st International Conference on Machine Learning, Beijing, China, 2014.

S. Agrawal and N. Goyal, “Analysis of thompson sampling for the multiarmed bandit problem,” ser. Proceedings of Machine Learning Research, S. Mannor, N. Srebro, and R. C. Williamson, Eds., vol. 23. Edinburgh, Scotland: JMLR Workshop and Conference Proceedings, 25–27 Jun 2012, pp. 39.1–39.26. [Online]. Available: http://proceedings.mlr.press/v23/agrawal12.html

D. J. Russo, B. Van Roy, A. Kazerouni, I. Osband, and Z. Wen, “A tutorial on thompson sampling,” Found. Trends Mach. Learn., vol. 11, no. 1, p. 1–96, Jul. 2018. [Online]. Available: https://doi.org/10.1561/2200000070

V. Kuleshov and D. Precup, “Algorithms for multiarmed bandit problems,” Journal of Machine Learning Research, vol. 1, 02 2014.

P. Auer, N. CesaBianchi, Y. Freund, and R. E. Schapire, “The Nonstochastic Multiarmed Bandit Problem,” SIAM Journal on Computing, vol. 32, no. 1, pp. 48–77, Jan. 2002. [Online]. Available: http://epubs.siam.org/doi/10.1137/S0097539701398375

J. Langford and T. Zhang, “The epochgreedy algorithm for multiarmed bandits with side information,” in Advances in Neural Information Processing Systems 20, J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, Eds. Curran Associates, Inc., 2008, pp. 817–824. [Online]. Available: http://papers.nips.cc/paper/3178theepochgreedyalgorithmformultiarmedbanditswithsideinformation.pdf

D. Cortes, “Adapting multiarmed bandits policies to contextual bandits scenarios,” arXiv:1811.04383v2, 2019.

L. Zhou, “A survey on contextual multiarmed bandits,” ArXiv, vol. abs/1508.03326, 2015.

W. Chu, L. Li, L. Reyzin, and R. Schapire, “Contextual bandits with linear payoff functions,” ser. Proceedings of Machine Learning Research, G. Gordon, D. Dunson, and M. Dudík, Eds., vol. 15. Fort Lauderdale, FL, USA: JMLR Workshop and Conference Proceedings, 11–13 Apr 2011, pp. 208–214. [Online]. Available: http://proceedings.mlr.press/v15/chu11a.html

L. Zhou and E. Brunskill, “Latent Contextual Bandits and their Application to Personalized Recommendations for New User,” arXiv:1604.06743 [cs.LG], 2016.

S. LI, B. Wang, S. Zhang, and W. Chen, “Contextual Combinatorial cascading Bandits,” in Proceedings of The 33rd International Conference on Machine Learning, vol. 48, 2016.

X. Xu, F. Dong, Y. Li, S. He, and X. Li, “ContextualBandit Based Personalized Recommendation with TimeVarying User Interests,” arXiv:2003.00359v1 [cs.LG], 2020.

B. Hao, Y. Abbasi Yadkori, Z. Wen, and G. Cheng, “Bootstrapping upper confidence bound,” in Advances in Neural Information Processing Systems 32. Curran Associates, Inc., 2019. [Online]. Available: http:// papers.nips.cc/paper/9382bootstrappingupperconfidencebound.pdf

D. Zhou, L. Li, and Q. Gu, “Neural Contextual Bandits with UCBbased Exploration,” in Proceedings of the 37 th International Conference on Machine Learning, Vienna, Austria, 2020.

S. Agrawal and N. Goyal, “Thompson Sampling for Contextual Bandits with Linear Payoffs,” in Proceedings of the 30 th International Conference on Machine Learning, vol. 28, Atlanta, Georgia, 2013.

I. Osband and B. Roy, "Bootstrapped thompson sampling and deep exploration," 07 2015.

L. Tang, Y. Jiang, L. Li, C. Zeng, and T. Li, “Personalized Recommendation via ParameterFree Contextual Bandit,” in SIGIR ’15: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015, pp. 323–332.

L. Li, W. Chu, J. Langford, and X. Wang, “Unbiased Offline Evaluation of Contextualbanditbased news Article Recommendation Algorithms,” WSDM’11, 2011. [Online]. Available: https://arxiv.org/pdf/1003.5956.pdf

C. Riquelme, G. Tucker, and J. Snoek, “Deep bayesian bandits showdown,” in International Conference on Learning Representations, Vancouver Canada, 2018.

M. Dudík, D. J. Hsu, S. Kale, N. Karampatziakis, J. Langford, L. Reyzin, and T. Zhang, “Efficient optimal learning for contextual bandits,” ArXiv, vol. abs/1106.2369, 2011.

L. Reyzin, “New algorithms for contextual bandits,” 2012. [Online]. Available: http://www.levreyzin.com/presentations/BGU2010.pdf

A. Agarwal, D. Hsu, S. Kale, J. Langford, L. Li, and R. E. Schapire, “Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits,” in Proceedings of the 31 st International Conference on Machine Learning, Beijing, China, 2014.

T. Uchiya, A. Nakamura, and M. Kudo, “Algorithms for adversarial bandit problems with multiple plays,” in Algorithmic Learning Theory, M. Hutter, F. Stephan, V. Vovk, and T. Zeugmann, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, pp. 375–389.

J. Louedec, M. Chevalier, J. Mothe, A. Garivier, and S. Gerchinovitz, “A Multipleplay Bandit Algorithm Applied to Recommender Systems,” Association for the Advancement of Artificial Intelligence, 2015.

Y.H. Chang and H.T. Lin, “Pairwise Regression with Upper Confidence Bound for Contextual Bandit with Multiple Actions,” in Proceedings of the Conference on Technologies and Applications for Artificial Intelligence, 2013, pp. 19–24.

J. Komiyama, J. Honda, and H. Nakagawa, “Optimal regret analysis of thompson sampling in stochastic multiarmed bandit problem with multiple plays,” in Proceedings of the 32nd International Conference on International Conference on Machine Learning Volume 37, ser. ICML’15. JMLR.org, 2015, p. 1152–1161.

R. BusaFekete, B. Szoenyi, P. Weng, and S. Mannor, “Multiobjective Bandits: Optimizing the Generalized Gini Index,” arXiv:1706.04933v1, 2017.

J. A. Weymark, “Generalized Gini inequality indices,” Université catholique de Louvain, Center for Operations Research and Econometrics (CORE), CORE Discussion Papers RP 453, Jan. 1981. [Online]. Available: https://ideas.repec.org/p/cor/louvrp/453.html

S. Yahyaa, M. Drugan, and B. Manderick, “Thompson sampling in the adaptive linear scalarized multi objective multi armed bandit,” vol. 2, 01 2015.

C. Tekin and E. Turgay, “Multiobjective contextual bandits with a dominant objective,” in 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), 2017, pp. 1–6.

E. Turgay, D. Öner, and C. Tekin, “Multiobjective contextual bandit problem with similarity information,” 03 2018.

C. Gentile, S. Li, and G. Zappella, “Online Clustering of Bandits,” in Proceedings of the 31 st International Conference on Machine Learning, vol. 34, Beijing, China, 2017.

T. Nguyen, “Dynamic clustering of contextual multiarmed bandits. in proceedings of the 23rd acm international conference on conference on information and knowledge management (cikm ’14),” 11 2014.

H. Wang, Q. Wu, and H. Wang, “Factorization bandits for interactive recommendation,” in AAAI’17: Proceedings of the ThirtyFirst AAAI Conference on Artificial Intelligence, 2017, pp. 2695–2702.

J. Kawale, H. H. Bui, B. Kveton, L. TranThanh, and S. Chawla, “Efficient Thompson Sampling for Online MatrixFactorization Recommendation,” in Advances in Neural Information Processing Systems 28, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, Eds. Curran Associates, Inc., 2015, pp. 1297–1305. [Online]. Available: http://papers.nips.cc/paper/5985efficientthompsonsamplingforonlinematrixfactorizationrecommendation.pdf

D. Chakrabarti, R. Kumar, E. Upfal, and F. Radlinski, “Mortal MultiArmed Bandits,” in Advances in Neural Information Processing Systems 21, Proceedings of the TwentySecond Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, 2008.

C. Zeng, Q. Wang, S. Mokhtari, and T. Li, “Online ContextAware Recommendation with Time Varying MultiArmed Bandit,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco California USA: ACM, Aug. 2016, pp. 2025–2034. [Online]. Available: https://dl.acm.org/doi/10.1145/2939672.2939878

R. Polikar, “Ensemble based systems in decision making,” IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21–45, 2006.

B. Brodén, M. Hammar, D. Paraschakis, and B. J. Nilsson, “Ensemble Recommendations via Thompson Sampling: an Experimental Study within eCommerce,” in IUI ’18: 23rd International Conference on Intelligent User Interfaces, 2018, pp. 18–29.

D. Bouneffouf, R. Laroche, T. Urvoy, R. Feraud, and A. Robin, “Contextual bandit for active learning: Active thompson sampling,” 11 2014.

C.C. Hsieh, J. Neufeld, T. King, and J. Cho, “Efficient approximate thompson sampling for search query recommendation,” ser. SAC ’15. New York, NY, USA: Association for Computing Machinery, 2015, p. 740–746. [Online]. Available: https://doi.org/10.1145/2695664.2695748

M. Lopes, B. Clement, D. Roy, and P.Y. Oudeyer, “Multiarmed bandits for intelligent tutoring systems,” 10 2013.

J. Mary, P. Preux, and O. Nicol, “Improving offline evaluation of contextual bandit algorithms via bootstrapping techniques,” ArXiv, vol. abs/1405.3536, 2014.

A. Beygelzimer and J. Langford, “The offset tree for learning with partial labels,” ser. KDD ’09. New York, NY, USA: Association for Computing Machinery, 2009, p. 129–138. [Online]. Available: https://doi.org/10.1145/1557019.1557040

M. Dudík, D. Erhan, J. Langford, and L. Li, “Doubly robust policy evaluation and optimization,” Statist. Sci., vol. 29, no. 4, pp. 485–511, 11 2014. [Online]. Available: https://doi.org/10.1214/14STS500

S. A. M. M. Y. N. Saito, Yuta, “A largescale open dataset for bandit algorithms,” arXiv preprint arXiv:2008.07146, 2020.

D. N. Hill, H. Nassif, Y. Liu, A. Iyer, and S. Vishwanathan, “An Efficient Bandit Algorithm for Realtime Multivariate Optimization,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Halifax NS Canada: ACM, Aug. 2017, pp. 1813–1821. [Online]. Available: https://dl.acm.org/doi/10.1145/3097983.3098184

A. Chandrashekar, F. Amat, J. Basilico, and T. Jebara. Artwork personalization at netflix. [Online]. Available: https://netflixtechblog. com/artworkpersonalizationc589f074ad76

M. Hejazinia, K. Eastman, S. Ye, A. Amirabadi, and R. Divvela, “Accelerated learning from recommender systems using multiarmed

bandit,” arXiv:1908.06158 [cs], Aug. 2019, arXiv: 1908.06158. [Online]. Available: http://arxiv.org/abs/1908.06158

D. Precup, R. S. Sutton, and S. P. Singh, “Eligibility traces for offpolicy policy evaluation,” in Proceedings of the Seventeenth International Conference on Machine Learning, ser. ICML ’00. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2000, p.759–766.

W. Bendada, G. Salha, and T. Bontempelli, “Carousel personalization in music streaming apps with contextual bandits,” in Fourteenth ACM Conference on Recommender Systems, ser. RecSys ’20. New York, NY, USA: Association for Computing Machinery, 2020, p. 420–425. [Online]. Available: https://doi.org/10.1145/3383313.3412217

L. Tang, Y. Jiang, L. Li, C. Zeng, and T. Li, “Personalized recommendation via parameterfree contextual bandits,” in Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, ser. SIGIR ’15. New York, NY, USA: Association for Computing Machinery, 2015, p. 323–332. [Online]. Available: https://doi.org/10.1145/2766462.2767707

C. Zeng, Q. Wang, S. Mokhtari, and T. Li, “Online contextaware recommendation with time varying multiarmed bandit,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’16. New York, NY, USA: Association for Computing Machinery, 2016, p. 2025–2034. [Online]. Available: https://doi.org/10.1145/2939672.2939878

D. Abensur, I. Balashov, S. Bar, I. Orlov, R. Lempel, and N. Moscovici, “Productization Challenges of Contextual MultiArmed Bandits,” arXiv:1907.04884v1 [cs.IR], 2019.

Refbacks

There are currently no refbacks.

Abava Кибербезопасность Monetec 2026 СНЭ

ISSN: 2307-8162

International Journal of Open Information Technologies