TY - GEN
T1 - An ensemble approach for multi-label classification of item click sequences
AU - Yağcı, A. Murat
AU - Aytekin, Tevfik
AU - Gürgen, Fikret S.
N1 - Publisher Copyright: © 2015 ACM.
PY - 2015/9/16
Y1 - 2015/9/16
N2 - In this paper, we describe our approach to RecSys 2015 challenge problem. Given a dataset of item click sessions, the problem is to predict whether a session results in a purchase and which items are purchased if the answer is yes. We define a simpler analogous problem where given an item and its session, we try to predict the probability of purchase for the given item. For each session, the predictions result in a set of purchased items or often an empty set. We apply monthly time windows over the dataset. For each item in a session, we engineer features regarding the session, the item properties, and the time window. Then, a balanced random forest classifier is trained to perform predictions on the test set. The dataset is particularly challenging due to privacy-preserving definition of a session, the class imbalance problem, and the volume of data. We report our findings with respect to feature engineering, the choice of sampling schemes, and classifier ensembles. Experimental results together with benefits and shortcomings of the proposed approach are discussed. The solution is efficient and practical in commodity computers.
AB - In this paper, we describe our approach to RecSys 2015 challenge problem. Given a dataset of item click sessions, the problem is to predict whether a session results in a purchase and which items are purchased if the answer is yes. We define a simpler analogous problem where given an item and its session, we try to predict the probability of purchase for the given item. For each session, the predictions result in a set of purchased items or often an empty set. We apply monthly time windows over the dataset. For each item in a session, we engineer features regarding the session, the item properties, and the time window. Then, a balanced random forest classifier is trained to perform predictions on the test set. The dataset is particularly challenging due to privacy-preserving definition of a session, the class imbalance problem, and the volume of data. We report our findings with respect to feature engineering, the choice of sampling schemes, and classifier ensembles. Experimental results together with benefits and shortcomings of the proposed approach are discussed. The solution is efficient and practical in commodity computers.
KW - Recommender systems
KW - Sequence classification
KW - Web mining
UR - http://www.scopus.com/inward/record.url?scp=84960924553&partnerID=8YFLogxK
U2 - 10.1145/2813448.2813516
DO - 10.1145/2813448.2813516
M3 - Conference contribution
AN - SCOPUS:84960924553
T3 - Proceedings of the International ACM Recommender Systems Challenge 2015
BT - Proceedings of the International ACM Recommender Systems Challenge 2015
PB - Association for Computing Machinery, Inc
T2 - International ACM Recommender Systems Challenge, RecSys 2015
Y2 - 16 September 2015
ER -