ParClick: A Scalable Algorithm for EM-based Click Models

Abstract

Research on click models usually focuses on developing effective approaches to reduce biases in user clicks. However, one of the major drawbacks of existing click models is the lack of scalability. In this work, we tackle the scalability of Expectation-Maximization (EM)-based click models by introducing ParClick, a new parallel algorithm designed by following the Partitioning-Communication-Aggregation-Mapping (PCAM) method. To this end, we first provide a generic formulation of EM-based click models. Then, we design an efficient parallel version of this generic click model following the PCAM approach: we partition user click logs and model parameters into separate tasks, analyze communication among them, and aggregate these tasks to reduce communication overhead. Finally, we provide a scalable, parallel implementation of the proposed design, which maps well on a multi-core machine. Our experiments on the Yandex relevance prediction dataset show that ParClick scales well when increasing the amount of training data and computational resources. In particular, ParClick is 24.7 times faster to train with 40 million search sessions and 40 threads compared to the standard sequential version of the Click Chain Model (CCM) without any degradation in effectiveness.

Publication
WWW ‘22: Proceedings of the ACM Web Conference 2022