Aspect Based Recommendations: Recommending Items with the Most Valuable Aspects Based on User Reviews


Research Paper

Konstantin Bauman, Stern School of Business, New York University
Bing Liu, University of Illinois at Chicago (UIC)
Alexander Tuzhilin, Stern School of Business, New York University

ABSTRACT
In this paper, we propose a recommendation technique that can recommend not only items of interest to the user, as traditional recommendation systems do, but also specific aspects of consumption of the items to further enhance the user experience with those items. For example, it can recommend that the user go to a specific restaurant (item) and also order some specific foods there, e.g., seafood (an aspect of consumption). Our method is called the Sentiment Utility Logistic Model (SULM). As its name suggests, SULM uses sentiment analysis of user reviews. It first predicts the sentiment that the user may have about the item based on what he/she might express about the aspects of the item, and then identifies the most valuable aspects of the user's potential experience with that item. Furthermore, the method can recommend items together with those most important aspects over which the user has control and which the user can potentially select, such as the time to go to a restaurant, e.g., lunch vs. dinner, and what to order there, e.g., seafood. We tested the proposed method on three applications (restaurant, hotel, and beauty&spa) and experimentally showed that those users who followed our recommendations of the most valuable aspects while consuming the items had better experiences, as defined by the overall rating.

CCS CONCEPTS
• Information systems → Recommender systems; • Computing methodologies → Factorization methods; Sentiment analysis;

KEYWORDS
Recommender systems, user reviews, sentiment analysis, user experience, aspects of user experience, user-controlled aspects.

ACM Reference format:
Konstantin Bauman, Bing Liu, and Alexander Tuzhilin. 2017. Aspect Based Recommendations: Recommending Items with the Most Valuable Aspects Based on User Reviews. In Proceedings of KDD '17, Halifax, NS, Canada, August 13-17, 2017, 9 pages. DOI: 10.1145/3097983.3098170

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
KDD '17, Halifax, NS, Canada
© 2017 Copyright held by the owner/author(s). Publication rights licensed to ACM. 978-1-4503-4887-4/17/08...$15.00
DOI: 10.1145/3097983.3098170

1 INTRODUCTION
Over the last decade, there has been a great deal of interest in leveraging user reviews to provide personalized recommendations based on these reviews [6]. Much of the work in the area focuses on trying to improve estimations of user ratings of items based on the user reviews and other relevant information [6], and also on explaining why some particular recommendations are given to the user based on the review information [25].

These approaches aimed at predicting and explaining ratings in terms of the user and item characteristics without taking into consideration additional factors, such as circumstances and the user's personal choices of consuming the item. For example, consider a user choosing between ordering Tiramisu or Cannoli in a cafe. Depending on what the user chooses to taste during her visit, she can give different ratings to the establishment. Therefore, the user experience of a particular item can be further improved by recommending some additional aspects and personal user choices of consuming that item, such as ordering Tiramisu in that cafe.

Note that not all the aspects of the user experience can be selected by the user in order to improve her experience with the item. For example, in the case of a movie, such aspects as the plot of the movie or the actors are beyond user control, which is in contrast to selecting particular dishes in the aforementioned restaurant example.

In this paper, we focus on the latter case of the user-controlled aspects by recommending not only particular items but also the most important aspects of consumption controlled by the user, such as ordering Tiramisu or Cannoli in a cafe. Furthermore, we can recommend certain actions to the management of an establishment (item) that can personalize experiences of the user when consuming the item (e.g., visiting the establishment). For example, we may recommend that the management of a spa salon offer a complimentary drink to a user because our method estimated that the user would particularly appreciate that drink in that salon, which would enhance her experience there.

In this paper, we propose a method that identifies the most valuable aspects of possible user experiences of the items that the user has not tried yet, and recommends the items together with suggestions to consume those most valuable user-controlled aspects that we have identified to be beneficial to the user. In particular, we have developed the Sentiment Utility Logistic Model (SULM) that takes user reviews and ratings, extracts aspects, classifies sentiments on the aspects in the user reviews, and recommends items together with the most important aspects that may enhance user experience with the items. To achieve this, the model learns how to predict the unknown ratings, sentiments that the user would express about

various aspects of an item, and also identifies the impacts of these aspects on the overall rating of the item. Moreover, we use these estimated impacts to recommend the most valuable aspects to the users to enhance their experiences with the recommended items. SULM thus goes one step further and significantly enhances the functionality of the current recommender systems by adding all these capabilities to the traditional rating prediction and recommendation tasks.

We make the following contributions in this paper:
(1) Propose a novel approach to enhancing the functionality of the current recommender systems by recommending not only the item itself but also the specific aspects of consumption to further enhance the user experiences of the item.
(2) Develop a novel method (SULM) for identifying the most valuable aspects of future user experiences using fine-grained aspect-level sentiment analysis that automatically discovers aspects and corresponding sentiments specified by users in their reviews.
(3) Test the proposed approach on actual reviews across three real-life applications and show that our method performs well in these applications by providing recommendations of the most valuable aspects that improve user experiences. Moreover, we show that the proposed method also predicts unknown ratings of user reviews and the set of aspects that the user would mention in the reviews.

The rest of the paper is organized as follows. We discuss the related work in Section 2 and present the proposed method in Section 3. Experiments with the three real-life applications are described in Section 4, and the results are presented in Section 5. Section 6 summarizes our findings and concludes the paper.

2 LITERATURE REVIEW
Over the last few years, several papers tried to improve estimation of unknown ratings of items by extracting useful information from user reviews and leveraging this information to achieve improvements in estimation [6]. For example, the authors of [10] found six aspects in restaurant reviews and trained classifiers to identify them in the review to improve rating prediction quality. In [20], the authors calculated the sentiment of the whole review and incorporated this information into a Matrix Factorization technique.

In addition to these direct rating prediction approaches based on user reviews, there are also several proposed methods that predict user ratings relying on latent aspects inferred from user reviews. In particular, [26] presents a Latent Aspect Rating Analysis method to discover the relative importance of the topical aspects. [18] uses an LDA-based approach combined with Matrix Factorization for better prediction of unknown ratings. They obtained highly interpretable textual labels for latent rating dimensions, which helped them to "justify" particular rating values using texts of the reviews. More recently, [8], [16] and [27] went beyond [18] by using more complicated graphical models to predict unknown ratings based on collaborative filtering and topic modeling of user reviews. Their models are able to capture interpretable aspects and the sentiments on each aspect of a review. In [22], the authors presented the Aspect-based Latent Factor Model (ALFM) that combines ratings and review texts to improve rating predictions. A recent work [28] presents a framework that incorporates both user opinions and preferences of different aspects. In particular, they apply the Tensor Factorization technique to the terms clustered using the LDA approach. Further, [28] uses LDA topics of terms in order to build user profiles and filter the reviews to be shown to the user.

Another research stream focuses on exploiting sentiment analysis techniques to extract useful aspects from the reviews. In particular, [31] presents the Explicit Factor Model (EFM) to generate recommendations according to the specific product aspects, and [7] applies the Tensor Factorization technique to learn the ranking of the user preferences over various aspects of an item. Further, [12] applies a vertex ranking approach to the tri-partite graph of users-items-aspects to provide better recommendations of items using reviews. Finally, [29] developed an algorithm to infer the importance of aspects for the overall user opinion on the historical reviews. This method is not able to predict aspect importance for a new potential review as SULM does.

Moreover, our work is also related to Context-Aware Recommender Systems (CARS) [4]. Note that our aspects may also include contextual variables of user experiences, but they are not limited only to them. There has been much work done in CARS, including the papers dealing with user reviews [1, 11, 15]. Much of this work develops new methods for extracting contextual information from user-generated reviews and uses this information to estimate the unknown rating for an item. For example, in [1], the authors identified sentences in the review that contain contextual information based on classification and text mining techniques. They applied their method to a hotel application with the Objective-of-trip contextual variable. The authors of [15] proposed a method for extracting the Companion, Occasion, Time, and Location contextual variables in restaurant applications based on NLP techniques. Further, [11] uses a Labeled-LDA method to categorize hotel reviews by their Trip-type contextual variable. All these papers showed how to improve rating predictions of items using the extracted contextual variables. Finally, [32] proposes a "context suggestion" system, which is based on the collected data about contextual variables but does not work with user reviews as we do.

There is also an extensive literature on multi-criteria rating prediction [3], where such multi-criteria systems use a small number of ratings of predefined aspects (such as food quality, service quality, and ambiance in restaurant applications) to provide appropriate recommendations of items. In contrast to this type of research, we use a wide range of aspects automatically extracted from the reviews that change from one review to another and, therefore, we are not limited to a predefined fixed set of criteria. For example, review A may mention aspects $x_1, x_2, x_3$ and review B may mention aspects $x_3, x_4, x_5$, whereas the multi-criteria approach uses the same (usually small) fixed set of aspects across all the reviews.

In contrast to all the previous works, we not only predict unknown ratings of items based on user reviews, as done in the prior work reviewed above, but also estimate the sentiments that a user would express on various aspects in the review and determine the impacts of the aspects on the overall predicted rating of the review about an item. Moreover, we use these estimated impacts to recommend the most valuable aspects to the users to enhance their experiences with the recommended items. Finally, we not only provide recommendations to the users but also recommend

valuable aspects of user consumption to the managers that can help them to run their businesses better and provide better services to their users.

In the next section, we present our proposed method.

3 OVERVIEW OF THE METHOD
In this section, we present the proposed method that identifies the most valuable aspects of possible user experiences of the items that the user has not tried yet, and recommends the items together with those most important user-controlled aspects that we have identified for the user. In particular, our method consists of sentiment analysis of user reviews and subsequent training of our model, called the Sentiment Utility Logistic Model (SULM). The SULM model not only predicts the rating of a review but also identifies the impact of each aspect on the overall rating. More specifically, SULM builds user and item profiles that are used for estimating sentiment utilities and also the overall rating of the review. As a result, SULM can be used to recommend items together with suggestions to experience their most valuable aspects, i.e., those that the model considers beneficial to the user.

The proposed SULM model relies on the logistic function, which is used in Sections 3.2 and 3.3. Here we provide some background information about it before focusing on the model itself. The logistic function maps real numbers to the interval (0, 1) and is defined as

$$ g(t) = \frac{1}{1 + e^{-t}} \tag{1} $$

The logistic function can be applied to the classification problem, as we can estimate the probability of vector $x$ having one of the class labels $y \in \{0, 1\}$ as

$$ P(y = 1 \mid x; \theta) = g\big(f(x, \theta)\big), \qquad P(y = 0 \mid x; \theta) = 1 - g\big(f(x, \theta)\big) \tag{2} $$

where $f(x, \theta)$ is a function, and $\theta$ is a set of parameters of the model that we will estimate. The linear case $f(x, \theta) = a_0 \cdot x_0 + \cdots + a_n \cdot x_n$ constitutes logistic regression [5]. Assuming that the training examples were generated independently, we can write down the likelihood of the parameters $\theta$ as:

$$ L(\theta) = \prod_{x_j \in X} p(y_j \mid x_j; \theta) = \prod_{x_j \in X} \Big(g\big(f(x_j, \theta)\big)\Big)^{y_j} \Big(1 - g\big(f(x_j, \theta)\big)\Big)^{1 - y_j} \tag{3} $$

In order to find the $\theta$ that maximizes $L(\theta)$, we apply Stochastic Gradient Descent [30] to the log-likelihood function, where the gradient step is computed based on the partial derivatives:

$$ \frac{\partial}{\partial \theta_i} \log L(\theta) = \Big(y_j - g\big(f(x_j, \theta)\big)\Big) \cdot \frac{\partial}{\partial \theta_i} f(x_j, \theta) \tag{4} $$

We will use equation (4) below for training the SULM model.

In the rest of this section, we describe the specifics of the proposed method.

3.1 Extracting Aspect-Sentiment Pairs
In this step, we utilized the state-of-the-art "industrial-strength" sentiment analysis system Opinion Parser (OP) to extract aspect expressions from review text. OP is an unsupervised aspect-based sentiment analysis system. It performs two key functions: aspect extraction and aspect sentiment classification. Aspect extraction aims to extract sentiment targets on which some sentiments have been expressed. These targets are usually different aspects of entities (e.g., products or services), which are items in our context. Aspect sentiment classification classifies whether the sentiment expressed on an aspect is positive, neutral, or negative. For example, from the sentence "The food is great," "food" should be extracted as an aspect or target by the aspect extraction sub-system, and the opinion on "food" should be classified as positive by the aspect sentiment classification sub-system. The aspect extraction algorithm used in Opinion Parser is called Double Propagation (DP) [21]. It is based on the idea that a sentiment always has a target aspect, and their expressions in a sentence often have some syntactic relation. For example, the target aspect of the sentiment word "great" is "food." A dependency parser can be used to identify the relation for extracting "food" given the sentiment word "great." DP thus works as follows: given a set $S$ of seed sentiment words, a bootstrapping procedure is performed to extract aspects and also more sentiment words using $S$, and the resulting aspects and sentiment words are used iteratively to extract more. The details of the algorithm can be found in [21]. Aspect sentiment classification is based on a set of sentiment expressions (called a sentiment lexicon), grammar analysis, and context analysis to determine whether a sentence is positive or negative about an aspect. Further details can be found in [17]. We report its performance in Section 4.2.

In our study, we apply OP at the aspect level to the set of reviews $R$ for a given application (e.g., restaurant). OP builds a set of aspects $A$ occurring in $R$, and for each review $r \in R$, OP identifies a set of aspects $A_r$ occurring in $r$ and the corresponding sentiment opinions $o^k_{u,i} \in \{0, 1\}$, where 1 is positive (like) and 0 is negative (dislike), that user $u$ expressed about aspects $k \in A_r$ of item $i$.

We use the identified aspects and sentiments to train our model, as described in the rest of this section.

3.2 Aspect Sentiments
The Sentiment Utility Logistic Model (SULM) assumes that for each aspect $k$ of the consumed item $i$, user $u$ can give a sentiment utility value $s^k_{u,i} \in \mathbb{R}$ expressing the level of satisfaction with aspect $k$ of item $i$. These utility values are not observable from user reviews. Instead, we observe the output of OP, which identifies only the binary value of the expressed sentiment $o^k_{u,i} \in \{0, 1\}$. Therefore, we estimate the real sentiment utility values $s^k_{u,i}$ in such a way that, after applying the logistic function (1), they fit the binary sentiment values extracted from the reviews.

Further, SULM estimates the sentiment utility value for each of the aspects $k$ using the matrix factorization approach [13] as

$$ \hat{s}^k_{u,i}(\theta_s) = \mu^k + b^k_u + b^k_i + (p^k_u)^T \cdot q^k_i \tag{5} $$

where $\mu^k$ is a constant pertaining to aspect $k$, $b^k_u$ and $b^k_i$ are the biases of user $u$ and item $i$ for aspect $k$, and $p^k_u$ and $q^k_i$ are $m$-dimensional latent vectors corresponding to user $u$ and item $i$ for aspect $k$. We denote all these coefficients by $\theta_s = (\mu, B_u, B_i, P, Q)$.

Further, we estimate the parameters $\theta_s$ such that the estimated values of sentiments $\hat{o}^k_{u,i}(\theta_s) = g\big(\hat{s}^k_{u,i}(\theta_s)\big)$ fit the real binary sentiments extracted by OP, as described above.
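To make Eqs. (1), (4), and (5) concrete, here is a minimal Python sketch of the aspect-sentiment part of the model for a single (user, item, aspect) triple. It is an illustration under our own assumptions, not the authors' implementation. The function names, the toy parameter values, and the choice of $m = 2$ latent factors are hypothetical. The sketch also checks numerically that, because $\partial \hat{s}^k_{u,i} / \partial \mu^k = 1$, Eq. (4) reduces the gradient of the log-likelihood with respect to $\mu^k$ to $o^k_{u,i} - \hat{o}^k_{u,i}$.

```python
import math


def g(t: float) -> float:
    """Logistic function of Eq. (1); maps any real t into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def sentiment_utility(mu_k, b_uk, b_ik, p_uk, q_ik):
    """Eq. (5): s_hat = mu^k + b_u^k + b_i^k + (p_u^k)^T q_i^k."""
    return mu_k + b_uk + b_ik + sum(a * b for a, b in zip(p_uk, q_ik))


def sentiment_log_likelihood(o, o_hat):
    """Log-likelihood of one binary OP sentiment o in {0, 1} given o_hat = g(s_hat)."""
    return o * math.log(o_hat) + (1 - o) * math.log(1.0 - o_hat)


# Hypothetical toy parameters for one (user u, item i, aspect k), m = 2.
mu_k, b_uk, b_ik = 0.2, -0.1, 0.3
p_uk, q_ik = [0.5, -0.2], [0.4, 0.1]
o = 1  # OP extracted a positive sentiment on aspect k

s_hat = sentiment_utility(mu_k, b_uk, b_ik, p_uk, q_ik)
o_hat = g(s_hat)  # predicted probability of a positive sentiment


def ll_as_function_of_mu(m):
    """Log-likelihood viewed as a function of mu^k only (other parameters fixed)."""
    return sentiment_log_likelihood(o, g(sentiment_utility(m, b_uk, b_ik, p_uk, q_ik)))


# Eq. (4) with f = s_hat and d s_hat / d mu^k = 1 gives the gradient (o - o_hat);
# compare it with a central finite difference.
eps = 1e-6
numeric_grad = (ll_as_function_of_mu(mu_k + eps) - ll_as_function_of_mu(mu_k - eps)) / (2 * eps)
analytic_grad = o - o_hat
print(f"s_hat={s_hat:.4f}  o_hat={o_hat:.4f}  grad={analytic_grad:.6f}  fd={numeric_grad:.6f}")
```

The same pattern (prediction, logistic transform, residual times the derivative of $f$) underlies every update in the full model, which only adds more parameters and a second, rating-level likelihood.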

In particular, assuming that the training examples were generated independently, we search for the $\theta_s$ maximizing the log-likelihood

$$ l_s(S \mid \theta_s) = \sum_{u,i,k} \Big( o^k_{u,i} \log \hat{o}^k_{u,i}(\theta_s) + \big(1 - o^k_{u,i}\big) \log\big(1 - \hat{o}^k_{u,i}(\theta_s)\big) \Big) \tag{6} $$

where $S$ is the set of all sentiments expressed by users in the set of training reviews.

In this subsection, we described how to estimate the parameters $\theta_s$ of the model in order to fit the sentiments in user reviews. In the next subsection, we focus on the rating estimation problem. Finally, in Section 3.4, we combine the two models into the overall SULM model for estimating both components.

3.3 The Overall Satisfaction
As in the case of individual aspects, SULM assumes that user $u$ can define the overall level of satisfaction with consuming item $i$, which is measured by the utility value $d_{u,i} \in \mathbb{R}$. We estimate this utility as a linear combination of the individual sentiment utility values for all the aspects in a review:

$$ \hat{d}_{u,i}(\theta) = \sum_{k \in A_r} \hat{s}^k_{u,i}(\theta_s) \cdot \big(z^k + w^k_u + v^k_i\big) \tag{7} $$

where $z^k$ is the general coefficient expressing the relative importance of aspect $k$ in an application, such as restaurants. Moreover, each user $u$ may have personal preferences and specific values of importance of aspects for the overall level of satisfaction, and therefore the coefficient $w^k_u$ represents such an individual importance value of aspect $k$ for user $u$. Similarly, each item $i$ has its own specifics, and the coefficient $v^k_i$ determines the importance value of aspect $k$ for item $i$. We denote these new coefficients by $\theta_r = (Z, W, V)$ and the set of all coefficients in the model by $\theta = (\theta_s, \theta_r)$.

Further, in our model, instead of estimating the rating that a user would give to an item and minimizing the RMSE performance measure, we follow an alternative approach advocated in previous works (e.g., [2]) and classify the ratings into "like" and "dislike". In the traditional five-star rating settings, we map the ratings {4, 5} to "like" and the ratings {1, 2, 3} to "dislike". As a result, we transform the recommendation problem from regression into classification.

Finally, we estimate the parameters $\theta$ such that the logistic transformation (1) of the overall utility value $\hat{d}_{u,i}(\theta)$ fits the binary rating $r_{u,i} \in \{0, 1\}$ that user $u$ specified for item $i$:

$$ \hat{r}_{u,i}(\theta) = g\big(\hat{d}_{u,i}(\theta)\big) \tag{8} $$

In particular, assuming that the training examples were generated independently, we search for the $\theta$ that maximizes the log-likelihood function on the training set of reviews:

$$ l_r(R \mid \theta) = \sum_{u,i} \Big( r_{u,i} \log \hat{r}_{u,i}(\theta) + \big(1 - r_{u,i}\big) \log\big(1 - \hat{r}_{u,i}(\theta)\big) \Big) \tag{9} $$

In this subsection, we described how to estimate the parameters of the model in order to fit the binary ratings provided by the users. In the next subsection, we combine the two models (6) and (9) into the overall SULM.

3.4 The SULM Model
The SULM model consists of the two parts described in Sections 3.2 and 3.3. The main goal of SULM is to estimate the coefficients $\theta$ such that both parts of the model simultaneously fit the sentiments extracted from the reviews and the ratings provided by the users. More specifically, the SULM optimization criterion combines the criterion from the sentiment utility part of the model (Equation (6)) and the rating prediction part of the model (Equation (9)). Moreover, we also apply regularization to avoid over-fitting. Combining all these considerations, we search for the $\theta$ that minimizes:

$$ Q(\theta) = -\alpha \cdot l_r(R \mid \theta) - (1 - \alpha) \cdot l_s(S \mid \theta_s) + \frac{\lambda_s}{2} \|\theta_s\|^2 + \frac{\lambda_r}{2} \|\theta_r\|^2 \tag{10} $$

where $\alpha$ is the parameter of the model defining the relative importance of the rating and aspect parts of the optimization criterion, and $\lambda_s$ and $\lambda_r$ are the regularization parameters.

3.5 Fitting the SULM Model
We apply Stochastic Gradient Descent [30] to estimate the parameters $\theta$ minimizing criterion (10). In particular, we calculate the partial derivatives $\partial Q / \partial \theta_j$ in order to perform the gradient step.

First, denote the difference between the real and the predicted values of the rating as $\Delta^r_{u,i} = r_{u,i} - \hat{r}_{u,i}$, and the difference between the real and the predicted values of the sentiment as $\Delta^k_{u,i} = o^k_{u,i} - \hat{o}^k_{u,i}$. Further, we denote by $I^k_{u,i} \in \{0, 1\}$ the indicator function showing whether user $u$ expressed a sentiment about aspect $k$ of item $i$ in her review.

Based on (4), the partial derivative of $Q$ with respect to $\mu^k$ is

$$ \frac{\partial Q}{\partial \mu^k} = -\alpha \cdot \Delta^r_{u,i} \cdot \big(z^k + w^k_u + v^k_i\big) - (1 - \alpha) \cdot \Delta^k_{u,i} \cdot I^k_{u,i} $$

and we denote this expression by $-\delta^k_{u,i}$. Further, we calculate the partial derivatives of $Q$ for the rest of the parameters in $\theta$ and perform the gradient descent step as follows:

$$
\begin{aligned}
\mu^k &:= \mu^k + \gamma \cdot \delta^k_{u,i} \\
b^k_u &:= b^k_u + \gamma \cdot \big(\delta^k_{u,i} - \lambda_s \cdot b^k_u\big) \\
b^k_i &:= b^k_i + \gamma \cdot \big(\delta^k_{u,i} - \lambda_s \cdot b^k_i\big) \\
p^k_u &:= p^k_u + \gamma \cdot \big(\delta^k_{u,i} \cdot q^k_i - \lambda_s \cdot p^k_u\big) \\
q^k_i &:= q^k_i + \gamma \cdot \big(\delta^k_{u,i} \cdot p^k_u - \lambda_s \cdot q^k_i\big) \\
z^k &:= z^k + \gamma \cdot \big(\alpha \cdot \Delta^r_{u,i} \cdot \hat{s}^k_{u,i}(\theta_s) - \lambda_r \cdot z^k\big) \\
w^k_u &:= w^k_u + \gamma \cdot \big(\alpha \cdot \Delta^r_{u,i} \cdot \hat{s}^k_{u,i}(\theta_s) - \lambda_r \cdot w^k_u\big) \\
v^k_i &:= v^k_i + \gamma \cdot \big(\alpha \cdot \Delta^r_{u,i} \cdot \hat{s}^k_{u,i}(\theta_s) - \lambda_r \cdot v^k_i\big)
\end{aligned}
\tag{11}
$$

where $\gamma$ is the learning rate.

As in the case of Matrix Factorization [14], we optimize iteratively. We first optimize the parameters in $\theta_s$ pertaining to the user by fixing the rest of the parameters in $\theta_s$. Then we optimize the parameters in $\theta_s$ pertaining to the item by fixing the rest of the parameters. Finally, we optimize the parameters in $\theta_r$ by fixing the parameters $\theta_s$. We repeat this procedure until convergence. As a result, we estimate all the parameters of the SULM model.
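The update rules in Eq. (11) can be sketched in plain Python for a single training review. This is a hypothetical illustration, not the authors' code. The dictionary layout, the key names, and the default values of $\gamma$, $\lambda_s$, $\lambda_r$ are our assumptions.

The sketch updates all parameters at once, using residuals computed before the step. This is a plain stochastic gradient step, not the alternating user/item/$\theta_r$ schedule described above. Only aspects mentioned in the review are passed in, so $I^k_{u,i} = 1$ for each of them.

```python
import math


def g(t):
    """Logistic function, Eq. (1)."""
    return 1.0 / (1.0 + math.exp(-t))


def s_hat(P, u, i, k):
    """Sentiment utility of aspect k for user u and item i, Eq. (5)."""
    dot = sum(a * b for a, b in zip(P["p"][u, k], P["q"][i, k]))
    return P["mu"][k] + P["bu"][u, k] + P["bi"][i, k] + dot


def importance(P, u, i, k):
    """Aspect weight z^k + w_u^k + v_i^k used in Eq. (7)."""
    return P["z"][k] + P["w"][u, k] + P["v"][i, k]


def r_hat(P, u, i, aspects):
    """Predicted probability of 'like', Eqs. (7)-(8)."""
    return g(sum(s_hat(P, u, i, k) * importance(P, u, i, k) for k in aspects))


def sgd_step(P, u, i, r, sentiments, alpha=0.5, gamma=0.01, lam_s=0.01, lam_r=0.01):
    """Apply the updates of Eq. (11) for one review, in place.

    sentiments maps each aspect k mentioned in the review to o^k in {0, 1},
    so I^k = 1 for every aspect visited here. All residuals are computed
    from the parameters before any of them is changed.
    """
    aspects = list(sentiments)
    s = {k: s_hat(P, u, i, k) for k in aspects}
    d_r = r - r_hat(P, u, i, aspects)  # Delta^r_{u,i}
    for k in aspects:
        d_s = sentiments[k] - g(s[k])  # Delta^k_{u,i}
        delta = alpha * d_r * importance(P, u, i, k) + (1 - alpha) * d_s
        p_old, q_old = P["p"][u, k], P["q"][i, k]
        # theta_s updates (mu^k is not regularized, as in Eq. (11))
        P["mu"][k] += gamma * delta
        P["bu"][u, k] += gamma * (delta - lam_s * P["bu"][u, k])
        P["bi"][i, k] += gamma * (delta - lam_s * P["bi"][i, k])
        P["p"][u, k] = [pf + gamma * (delta * qf - lam_s * pf) for pf, qf in zip(p_old, q_old)]
        P["q"][i, k] = [qf + gamma * (delta * pf - lam_s * qf) for pf, qf in zip(p_old, q_old)]
        # theta_r updates use the pre-step sentiment utility s[k]
        grad_r = alpha * d_r * s[k]
        P["z"][k] += gamma * (grad_r - lam_r * P["z"][k])
        P["w"][u, k] += gamma * (grad_r - lam_r * P["w"][u, k])
        P["v"][i, k] += gamma * (grad_r - lam_r * P["v"][i, k])


# Hypothetical toy data: user "u1" liked restaurant "r1" (r = 1) and
# expressed positive sentiments on two aspects.
aspects = ["fish", "service"]
P = {
    "mu": {k: 0.0 for k in aspects},
    "z": {k: 0.5 for k in aspects},
    "bu": {("u1", k): 0.0 for k in aspects},
    "bi": {("r1", k): 0.0 for k in aspects},
    "w": {("u1", k): 0.0 for k in aspects},
    "v": {("r1", k): 0.0 for k in aspects},
    "p": {("u1", k): [0.1, 0.1] for k in aspects},
    "q": {("r1", k): [0.1, 0.1] for k in aspects},
}
before = r_hat(P, "u1", "r1", aspects)
sgd_step(P, "u1", "r1", r=1, sentiments={"fish": 1, "service": 1})
after = r_hat(P, "u1", "r1", aspects)
print(f"r_hat: {before:.4f} -> {after:.4f}")
```

In this toy run, one step raises the predicted probability of "like" toward the observed $r = 1$, and the aspect sentiments move toward the observed positive values. A full training loop would repeat such steps over all training reviews until convergence.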

3.6 Aspect Impact on Ratings
In this step, we apply the model trained in Section 3.5 to determine the most important aspects of the user's potential experiences with the item, as discussed at the beginning of Section 3. In particular, we measure the importance of an aspect by its weight in the regression model (7). This means that for a potential experience of user $u$ with item $i$, we first predict the sentiment utility values $\hat{s}^k_{u,i}$ for each aspect $k \in A$ in an application. After that, we compute the impact of each aspect in the potential user review on the overall predicted level of satisfaction of user $u$ with item $i$ as the corresponding summand of the linear model (7):

$$ \mathrm{impact}^k_{u,i} = \hat{s}^k_{u,i} \cdot \big(z^k + w^k_u + v^k_i\big) \tag{12} $$

In other words, the impact of aspect $k$ on the experience of user $u$ with item $i$ is calculated as the product of the predicted sentiment utility value $\hat{s}^k_{u,i}$ and the corresponding coefficient representing the importance of aspect $k$ of item $i$ for user $u$.

These calculated aspect impacts reflect the importance of each aspect of a user review for the overall predicted rating. Note that they can be positive or negative. We can therefore use them to recommend positive experiences and to suggest avoiding negative ones when users consume the recommended items, as explained in the next section.

3.7 Recommending Items and Aspects
Next, we manually identify two groups of aspects among all the aspects $A$ in the application: those over which (a) the user has control and (b) the management of the establishment has control. We call these groups of aspects user-controlled and management-controlled, respectively. For example, the aspect "gym" in a hotel application is under the user's control, because she can decide whether or not to use it during her stay in the hotel. Furthermore, within these groups, we identify the most valuable aspects that we want to recommend to the user together with the item, or to the management. These recommendations can be positive (a suggestion to experience an aspect) or negative (a suggestion to avoid an aspect). Finally, we recommend an item and the identified corresponding aspects to the user, or the most important aspects to the management.

For example, if our system identified the aspect "fish" as having a high positive impact on the rating, we will recommend this restaurant to the user and suggest ordering fish there. Similarly, if the aspect "dessert" has a strong negative impact on the rating, we may still recommend visiting that restaurant to the user with a suggestion not to order desserts there, if we expect the restaurant rating in this case to be high. Further, we can recommend to the management those aspects that are under their control and that they can influence. For example, we can recommend that the management of a beauty&spa salon provide a complimentary drink to the user (since it will improve her overall experience) and not chat with her too much while in session.

In summary, we proposed a method for predicting whether a user would like an item, for estimating the sentiments that the user might express about different aspects of the item, and for identifying and recommending the most valuable user-controlled aspects of the potential user experience of the item. This method consists of sentiment analysis of user reviews, training the Sentiment Utility Logistic Model (SULM), predicting sentiment utility values, and calculating the personal impact factors that each aspect contributes to the overall rating for the user. In Section 4, we show the experimental results of applying the proposed method to real data from three applications.

Table 1: Yelp Dataset Description.

              Beauty & Spas    Hotels    Restaurants
  Initial           104,199    96,384      1,344,405
  Filtered            5,669     5,065        602,112
  Users                 352       349         23,209

Table 2: Examples of words pertaining to some of the aspects in the restaurant application.

  Service     Meat    Money     Dessert      Fish      Decor
  bartender   beef    price     tiramisu     cod       design
  waiter      meat    dollars   cheesecake   salmon    ceiling
  service     bbq     cost      chocolate    catfish   decor
  hostess     ribs    budget    dessert      tuna      lounge
  manager     veal    charge    ice cream    shark     window
  staff       pork    check     macaroons    fish      space

4 EXPERIMENT
4.1 Dataset
To demonstrate how well our method works in practice, we tested it on the restaurant, hotel, and beauty&spa applications based on Yelp reviews¹ collected in several US cities over a period of 6 years. In this study, we selected only those users who have written at least 10 reviews. Table 1 presents, for each of the three applications, the number of reviews in the initial dataset (Initial), the number of users having more than 10 reviews (Users), and the number of ratings generated only by those users (Filtered).

¹ http:// challenge/dataset

Although Yelp uses a 5-star rating system, we transformed it into the binary "high" ({4, 5}) and "low" ({1, 2, 3}) classes, as explained in Section 3.3. Furthermore, we reformulated rating estimation as a classification problem where we estimate the probability that a user would "like" an item (by giving it a rating of 4 or 5).

4.2 Experiment Settings
We applied the method presented in Section 3 to the restaurant, hotel, and beauty&spa applications and extracted 69, 42, and 45 aspects for these applications, respectively, using Opinion Parser, as explained in Section 3.1. Table 2 presents several examples of aspects extracted from the reviews for the restaurant application, together with some examples of the words corresponding to those aspects. For each review, we also determine the set of aspects appearing in that review and their corresponding sentiments, as described in Section 3.1.

The aspect extraction part of OP was evaluated on 5 benchmark online review datasets [21] and showed an F-score of 0.86. The evaluation of the aspect sentiment classification part of OP on 8 online review datasets [9, 17] showed an average F-score of 0.90 for the positive and negative sentiment classes. We also tested the performance of the OP system on our dataset. In particular, we

selected a random sample of 3,000 sentences from the reviews in the restaurant application and manually evaluated the aspect extraction and the sentiment classification parts of OP. Our results are consistent with the previous studies, showing F-scores of 0.89 and 0.93 for the two parts of the system, respectively. All these evaluations show that OP performs well in general and specifically in our application. In this study, we focus on leveraging the output of OP² for providing recommendations of items and their most valuable aspects.

Further, for each application, the set of reviews $R$ is partitioned into training and testing sets in the ratio of 80% to 20%. We also use cross-validation on the training set to find the best parameters of the SULM model, including the parameters $\alpha$, $\lambda_s$, and $\lambda_r$, that maximize the prediction performance measures introduced in Section 4.3. In particular, we found that $\alpha = 0.5$ provides the best balance between the performance of predicting aspect sentiments and of predicting user ratings across the three applications. We trained SULM on a MacBook Air (1.4 GHz Intel Core i5). It took less than a minute to train the model for the hotel and beauty&spa applications (~4,000 reviews each) and about one hour to train SULM for the restaurant application (~480,000 reviews).

After training the model on the restaurant, hotel, and beauty&spa applications, we predict the unknown ratings and sentiments on the test data (reviews) and also determine the impacts of the aspects on the predicted ratings. Moreover, as explained in Section 3.7, among all aspects of the restaurant, hotel, and beauty&spa applications, we identified that the user has control over 49, 14, and 17 aspects, respectively, and the management of the establishments has control over 54, 29, and 31 aspects, respectively. SULM then recommends experiencing positive aspects, or avoiding negative aspects, of the item from these identified sets.

4.3 Evaluation Methodology
When running the model as described in Section 4.2 on the data from Section 4.1, we measure its performance in terms of how the recommendations of the most valuable aspects affect the overall rating, how well the model predicts whether the user would like (or dislike) the recommended items, and how well it predicts which of the aspects would appear in the potential user review.

Aspect Recommendations. The main point of SULM's performance is the effect it produces on the overall user experience with an item by providing recommendations of additional most valuable aspects. To estimate this effect, we measure how the recommendations of specific aspects (described in Section 3.7) affect the overall rating. In particular, we measure how much the average overall rating changes for those users who "follow" our positive recommendations.

[...] the average overall rating is changed for those items where the managers follow our recommendations by suggesting the aspect consumptions to the user. As before, we assume that the management followed our positive recommendation of an additional aspect if the user mentioned this aspect in the review (e.g., provided a complimentary drink which was mentioned in the review). Note that we calculate the described measure for users and for management separately. This means that if the consumed aspect was recommended both to the user and to the management, we count it as both the user and the management having followed our recommendations. This point does not reduce the power of the results, which show that the most valuable aspect should be consumed in order to enhance the overall user experience.

Aspect and Rating Predictions. In addition to measuring the effect of the recommendations of the most valuable aspects, we also evaluate how well the model predicts which of the aspects would appear in the user reviews. Although SULM does not focus on this type of prediction, we assume that users tend to discuss the most valuable aspects in their reviews. Therefore, we use the absolute values of the aspect impacts on the predicted rating (described in Section 3.6) to predict the list of aspects that the user would discuss in the review. In particular, for each potential user experience, we first rank all the aspects $A$ from the application (e.g., restaurants) according to these absolute values of aspect impacts. Then we select the top-n of the ranked aspects and examine how many of them appear in each review. In other words, we compute the precision of the TopN most important aspects appearing in a review. Finally, we also calculate the rating prediction performance of the SULM model using the standard performance measures [email protected] and AUC [23]. We computed all these measures on the test data and present our results and their comparison with the baselines in the next section.

5 RESULTS
In this section, we present the results of how well the proposed method performed on the restaurant, hotel, and beauty&spa applications across the three measures described in Section 4.3, each measure being described in a separate subsection (5.1-5.3). In particular, in Section 5.1 we compare the performance of SULM aspect recommendations with certain baselines. In Section 5.2 we compare the aspect ranking performance of our algorithm with another baseline. Finally, in Section 5.3 we compare our method with the baselines in terms of rating prediction performance.

Note that we compare the performance of our method with different baselines in the three subsections because our model, in addition to the standard rating prediction, provides a novel aspect recommendation functionality which is not supported by the base-
We assume that the user followed lines and it is thus not possible to compare our model uniformly recommendations of additional aspects if he/she mentioned this with them across all the aforementioned performance metrics. aspect in the review. We expect that positive recommendations of aspects would increase the average rating, while not-following 5.1 Recommendations of Aspects negative recommendations would decrease it. Note that in the case user- con- In this subsection, we evaluate the recommendations of avoid of negative recommendations we advise the user to expe- trolled aspects presented in Section 3.7 based on the performance riencing the speci€ed aspects. Similarly, we measure how much measure described in Section 4.3. In particular, we measure how much the ratings have changed for those users who “follow” our 2 Œe result of applying OP to the Yelp dataset is available online: hŠp:// aspect dataset. recommendations on the test set by mentioning the recommended

aspect in the review, as described in Section 4.3. In addition to the average rating on the test set, we compare our results with three strong baseline approaches. These baselines essentially capture the strengths and weaknesses of the establishments based on their user reviews:
• Popular Aspect: this approach recommends that users experience the most popular aspect of an item, i.e., the most frequently mentioned aspect in the historical reviews of the item.
• Most Positive Aspect: this approach identifies and recommends that the user experience the most positive aspect of an item, in the sense that it has the highest average sentiment rating taken over all the frequently occurring aspects of the item, i.e., those aspects that appear more than k times in the reviews of the item (we set k = 5 in our experiments).
• Most Negative Aspect: this approach identifies the most negative aspect of an item and recommends that the user avoid it. Similarly to the most positive case, this approach identifies the aspect of an item with the lowest average sentiment rating among the historical set of frequent aspects, i.e., those appearing in more than k reviews of the item.

The results of our experiments are presented in Table 3, which compares the performance of our method with the baselines across the three applications in terms of the average overall rating for different groups of reviews. Furthermore, Figures 1–2 graphically show the same comparison for the restaurant application. As Table 3 and Figures 1–2 demonstrate, our method significantly outperforms the baseline approaches. In the case of restaurants (Figure 1 (a)), users liked their experiences in 65.1% of the cases on average in the test set, whereas those users who followed recommendations of the most popular aspect of an item liked their experience in 67.1% of the cases. Moreover, those users who followed recommendations of the most positive aspects liked their experiences in 69.0% of the cases, which is significantly higher than the 67.1% of the most popular case. Finally, our SULM model achieved 72.3%, which is again significantly higher than the other baselines described above (based on a t-test with p-value < 0.05). These comparisons show that the aspect recommendations provided by our SULM model can help customers get better experiences with items than the baseline approaches.

Figure 1: Average ratings for (a) Users and (b) Managers who followed positive recommendations of additional aspects in the restaurant application.
Figure 2: Average ratings for (a) Users and (b) Managers who did not follow negative recommendations of additional aspects in the restaurant application.

Similarly, those users who did not follow our negative recommendations³ (and experienced the negative aspect of an item against our advice) gave lower ratings than both the average rating given by all users in the application and the users who did not follow the recommendations of the baseline approach. For example, in the restaurant application (Figure 2 (a)), the users who did not follow the negative recommendations of the most negative aspects liked their experiences with the items in 64.1% of the cases, while on average users liked the items in 65.1% of the cases. Furthermore, those users who did not follow the negative recommendations provided by our SULM model liked the items in only 62.9% of the cases, which is significantly lower (p-value < 0.05) than the result of the baseline approach. These comparisons demonstrate that the negative recommendations provided by our SULM model can help customers avoid more negative experiences with items than the baseline approach.

As described in Section 3.7, SULM provides similar recommendations of the most valuable aspects not only to users but also to managers (we call these management-controlled aspects). As for the users, Table 3 and Figures 1–2 also present the results of aspect recommendations to the management of the establishments. These results show that managers who “followed” our positive recommendations (as explained in Section 4.3) obtained higher ratings for the user experiences than managers who followed the recommendations of the baseline approaches. For example, in the restaurant application (Figure 1 (b)), when the managers followed SULM's positive recommendations of additional aspects, users liked their experiences in 71.6% of the cases, whereas when the managers followed the recommendations of the most popular or most positive baseline approaches, users liked their experiences in only 67.1% and 68.8% of the cases, respectively. These numbers are significantly lower than the result of the proposed SULM model (p-value < 0.05), which demonstrates that our method can help managers provide better experiences for the users than the baseline methods.

Furthermore, as Table 3 also shows, these results hold not only for restaurants but also for the hotel and beauty&spa domains.
In conclusion, our SULM method outperformed the baselines and helps users get (and managers provide) better experiences with recommended items.

³ “do not consume aspect k of item i”
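The three baselines and the "share of liked experiences" measure used in Section 5.1 can be sketched in a few lines. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the record layouts (historical reviews as `(item, {aspect: sentiment})` pairs, test reviews as `(item, mentioned_aspects, liked)` triples), the helper names, and the binary `liked` label are all hypothetical. Following Section 4.3, a review counts as having followed a positive recommendation (or ignored a negative one) when it mentions the recommended aspect.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical record layouts (assumptions, not from the paper):
#   historical review: (item_id, {aspect: sentiment})  with sentiment in [-1, 1]
#   test review:       (item_id, mentioned_aspects, liked)
HistoryReview = Tuple[str, Dict[str, float]]
TestReview = Tuple[str, Set[str], bool]


def popular_aspect(history: Iterable[HistoryReview]) -> Dict[str, str]:
    """Popular Aspect baseline: the most frequently mentioned aspect of each item."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for item, sentiments in history:
        counts[item].update(sentiments.keys())
    return {item: c.most_common(1)[0][0] for item, c in counts.items() if c}


def extreme_aspect(history: Iterable[HistoryReview], k: int = 5,
                   most_positive: bool = True) -> Dict[str, str]:
    """Most Positive / Most Negative Aspect baseline.

    Only aspects mentioned in more than k reviews of an item are considered;
    among them, pick the one with the highest (or lowest) average sentiment.
    """
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    freq: Dict[str, Counter] = defaultdict(Counter)
    for item, sentiments in history:
        for aspect, sentiment in sentiments.items():
            totals[item][aspect] += sentiment
            freq[item][aspect] += 1
    choose = max if most_positive else min
    result: Dict[str, str] = {}
    for item, counter in freq.items():
        means = {a: totals[item][a] / n for a, n in counter.items() if n > k}
        if means:
            result[item] = choose(means, key=means.get)
    return result


def average_liked(test: List[TestReview]) -> float:
    """Share of liked experiences over the whole test set (the 'Average' row)."""
    return sum(liked for _, _, liked in test) / len(test)


def liked_share_when_mentioned(test: Iterable[TestReview],
                               recs: Dict[str, str]) -> Optional[float]:
    """Share of liked experiences among reviews mentioning the item's recommended aspect.

    For a positive recommendation this is the 'followed' group; for a negative
    (avoid) recommendation it is the 'did not follow' group.
    """
    group = [liked for item, mentioned, liked in test
             if item in recs and recs[item] in mentioned]
    return sum(group) / len(group) if group else None
```

For example, `extreme_aspect(history, k=5)` applies the Most Positive Aspect rule with the k = 5 threshold used in the experiments, and passing `extreme_aspect(history, k=5, most_positive=False)` to `liked_share_when_mentioned` gives the liked share among users who ignored the Most Negative Aspect advice. The t-test used for the significance claims is not reproduced in this sketch.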

Table 3: Average fraction of liked items for the users who followed (or did not follow) our positive/negative recommendations of additional aspects.

                              Restaurants        Hotels             Beauty & Spas
                              users   managers   users   managers   users   managers
Average                       65.1%              58.0%              71.2%
Followed Popular              67.1%   67.1%      62.2%   62.8%      72.0%   71.9%
Followed Most Positive        69.0%   68.8%      65.7%   65.2%      72.4%   72.7%
Followed Positive SULM        72.3%   71.6%      68.0%   67.7%      76.1%   75.7%
Not followed Most Negative    64.1%   64.2%      57.9%   57.9%      70.5%   70.4%
Not followed Negative SULM    62.9%   63.3%      57.2%   57.6%      67.8%   67.5%

Table 4: Aspect Ranking Performance.

Application   [email protected]  [email protected]
              R      H      B&S        R      H      B&S
LRPPM         0.22   0.34   0.16       0.24   0.41   0.20
SULM          0.19   0.33   0.16       0.22   0.40   0.19

Table 5: Rating Prediction Performance.

Application   AUC                      [email protected]
              R      H      B&S        R      H      B&S
LRPPM         0.637  0.725  0.694      0.845  0.822  0.801
HFT           0.651  0.756  0.714      0.842  0.821  0.821
SULM          0.663  0.745  0.707      0.849  0.862  0.818

R = Restaurants, H = Hotels, B&S = Beauty & Spas.

5.2 Aspect Ranking Performance
In addition to measuring the effect of recommendations of the most valuable aspects, we also evaluate how well SULM predicts which of the aspects would appear in the user reviews, as described in Section 4.3. As a baseline, we use the following popular approach:
(1) Learning to Rank User Preferences Based on Phrase-Level Sentiment Analysis Across Multiple Categories (LRPPM) [7], a model trained on the results of sentiment mining of user reviews. LRPPM predicts user ratings and ranks the aspects of user reviews according to the probability that these aspects appear in possible future reviews. We downloaded the LRPPM system from the authors' website.
The results of this comparison are presented in Table 4. As Table 4 shows, our method is comparable to the LRPPM baseline, even though our model, unlike LRPPM, does not optimize for the [email protected] (N = 3 and N = 5) aspect ranking performance. We conjecture that users tend to discuss in their reviews those aspects that have the highest impact on the overall rating, and that this explains the strong performance of the SULM model. We plan to explore this conjecture further in future research. Note that LRPPM predicts only the importance of an item's aspects for the user and does not take user sentiments into account. This means that LRPPM cannot recommend experiencing the most valuable aspects; SULM therefore exceeds it in functionality by providing this additional capability, as discussed in the previous subsection.
Furthermore, as Table 4 shows, aspects are predicted significantly better for hotels than for restaurants and beauty & spas. This is because some aspects of hotels, such as “room” and “service,” are very popular in the reviews and are therefore easily predictable.

5.3 Rating Prediction Performance
As explained in Section 4.3, we measure how well we predict whether the user would like (or dislike) the recommended items. We compare the performance of our model with the following baseline approaches:
(1) the LRPPM model [7] described in Section 5.2;
(2) Hidden Factors as Topics (HFT) [18], a state-of-the-art approach that incorporates user reviews into a rating prediction model. In particular, HFT combines the Matrix Factorization and the Latent Dirichlet Allocation (LDA) models to train simultaneously on the ratings and the texts of the reviews.
We selected these two baseline models because they constitute the state of the art in combining rating predictions and user reviews. In particular, it is shown in [18] that HFT outperforms the classical Matrix Factorization (MF) approach [14]. We also selected the LRPPM model because, as shown in [7], it outperformed several previously proposed rating and aspect ranking models, such as Probabilistic Matrix Factorization (PMF) [24], the Explicit Factor Model (EFM) [31], and Rating-based Tensor Factorization (RTF) [7], and therefore also constitutes a strong state-of-the-art baseline. Comparing our approach with these two models is sufficient for our purposes because they are strong baselines that outperformed various other approaches discussed in Section 2, including EFM, PMF, and RTF. Furthermore, the other models discussed in Section 2, such as [11, 20, 22, 27], are not directly comparable with SULM because they focus only on the rating prediction problem.
The rating prediction results of SULM and the two baselines are presented in Table 5. As Table 5 shows, SULM outperformed the LRPPM model across all the applications and performance measures and performed comparably with the HFT model. In particular, its performance is better than that of HFT for the hotel and beauty&spa applications on the [email protected] measure and for the beauty&spa application on the AUC measure, and slightly worse in the other cases.
Although SULM and HFT perform comparably on the rating prediction problem, SULM offers more extensive functionality than HFT, which only predicts ratings based on reviews; SULM therefore dominates HFT in general, as explained below and summarized in Section 6.
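The TopN precision of Section 5.2 and the AUC of Section 5.3 can be sketched as follows. This is a minimal Python sketch, assuming a per-review dictionary of aspect impacts (signed contributions to the predicted rating) and a binary liked/disliked label; the function names are hypothetical. `top_n_precision` follows the procedure in Section 4.3 (rank aspects by absolute impact, keep the top n, count how many the review mentions), while `auc` uses the standard pairwise (Mann–Whitney) definition with ties counted as one half, which is not necessarily how the paper computed it.

```python
from typing import Dict, Iterable, Sequence, Set, Tuple


def top_n_precision(impacts: Dict[str, float], mentioned: Set[str], n: int) -> float:
    """Fraction of the n aspects with the largest |impact| that appear in the review."""
    ranked = sorted(impacts, key=lambda aspect: abs(impacts[aspect]), reverse=True)
    hits = sum(1 for aspect in ranked[:n] if aspect in mentioned)
    return hits / n


def mean_top_n_precision(cases: Iterable[Tuple[Dict[str, float], Set[str]]], n: int) -> float:
    """Average TopN precision over (impacts, mentioned_aspects) pairs, one per test review."""
    scores = [top_n_precision(impacts, mentioned, n) for impacts, mentioned in cases]
    return sum(scores) / len(scores)


def auc(scores: Sequence[float], liked: Sequence[bool]) -> float:
    """Probability that a random liked review is scored above a random disliked one."""
    pos = [s for s, y in zip(scores, liked) if y]
    neg = [s for s, y in zip(scores, liked) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one liked and one disliked review")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```

For instance, with impacts `{"food": 0.9, "service": -0.7, "parking": 0.1}` and a review that mentions food and parking, `top_n_precision(impacts, {"food", "parking"}, 2)` is 0.5, because only food is among the two highest-impact aspects. The quadratic loop in `auc` keeps the definition visible; a rank-based formula would be preferable for test sets with hundreds of thousands of reviews.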

In summary, the SULM model performs well in predicting unknown ratings (at the level of the state-of-the-art HFT model [18]), estimating the aspects that a user would specify in a review (at the level of the state-of-the-art LRPPM model [7]), and determining the impacts of various aspects on the overall rating of the review. However, the main advantage of the proposed SULM model is its new functionality of recommending not only items to users but also the most valuable aspects that may enhance user experiences with those items. Note that none of the baseline approaches supports all these capabilities in one system. We showed that users who followed our recommendations of important aspects rated their experiences significantly higher than users who followed recommendations from the baseline approaches. Furthermore, SULM provides recommendations not only to the users but also to the managers, which can help them provide better services to the users. All this demonstrates that SULM significantly enhances the functionality of current recommender systems by adding these capabilities to the traditional rating prediction and item recommendation tasks.

6 CONCLUSION
In this paper, we presented a method that identifies the most valuable user-controlled aspects of possible user experiences of the items and recommends the items together with suggestions to consume those most valuable aspects. The paper makes the following contributions. First, it proposes a novel approach to enhancing the functionality of recommender systems by recommending not only the item itself but also some positive aspects of the item to further enhance user experiences with it. Second, we developed the Sentiment Utility Logistic Model (SULM), a method for identifying the most valuable aspects of future user experiences that is based on sentiment analysis of user reviews. Third, we tested our method on actual reviews across three real-life applications and showed that it performed well on these applications in the following sense. First of all, recommendations of a set of valuable aspects worked well, as users who followed our recommendations rated their experiences significantly higher than those who followed the baseline recommendations. Our method also predicted the unknown ratings of the reviews at a level commensurate with the state-of-the-art HFT model [18]. In addition, it predicted the set of aspects that the user would mention in a possible future review of an item at the level of the state-of-the-art LRPPM [7]. Moreover, SULM provides recommendations not only to the users; it also recommends valuable aspects of user experiences to the managers of the establishments, which can help them provide better services to the users.
As shown in Section 2, most existing work has focused either on leveraging user reviews to improve rating prediction (e.g., the HFT model [18]), on predicting the set of aspects that the user might include in her review (e.g., LRPPM [7]), or on predicting the sentiment for each individual aspect of the user experience (e.g., [3]). SULM significantly enhances the functionality of these systems by recommending to users not only items of interest but also additional aspects that may enhance their experiences with those items.

REFERENCES
[1] Silvana Aciar. 2010. Mining context information from consumers reviews. In Proceedings of Workshop on Context-Aware Recommender System. ACM.
[2] G. Adomavicius and Y. Kwon. 2007. New Recommendation Techniques for Multicriteria Rating Systems. IEEE Intelligent Systems 22, 3 (May 2007), 48–55.
[3] Gediminas Adomavicius and YoungOk Kwon. 2015. Multi-Criteria Recommender Systems. Springer US, Boston, MA, 847–880.
[4] Gediminas Adomavicius and Alexander Tuzhilin. 2015. Context-Aware Recommender Systems. Springer US, Boston, MA, 191–226.
[5] Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc.
[6] Li Chen, Guanliang Chen, and Feng Wang. 2015. Recommender systems based on user reviews: the state of the art. In User Modeling and User-Adapted Interaction.
[7] Xu Chen, Zheng Qin, Yongfeng Zhang, and Tao Xu. 2016. Learning to Rank Features for Recommendation over Multiple Categories. In ACM SIGIR. 305–314.
[8] Qiming Diao, Minghui Qiu, Chao-Yuan Wu, Alexander J. Smola, Jing Jiang, and Chong Wang. 2014. Jointly Modeling Aspects, Ratings and Sentiments for Movie Recommendation (KDD).
[9] Xiaowen Ding, Bing Liu, and Philip S. Yu. 2008. A Holistic Lexicon-based Approach to Opinion Mining (WSDM ’08). ACM, 10.
[10] Gayatree Ganu, Yogesh Kakodkar, and Amélie Marian. 2013. Improving the Quality of Predictions Using Textual Information in Online User Reviews. Inf. Syst. 38, 1 (March 2013), 1–15.
[11] N. Hariri, B. Mobasher, R. Burke, and Y. Zheng. 2011. Context-Aware Recommendation Based On Review Mining (ITWP).
[12] Xiangnan He, Tao Chen, Min-Yen Kan, and Xiao Chen. 2015. TriRank: Review-aware Explainable Recommendation by Modeling Aspects. In ACM CIKM.
[13] Yehuda Koren and Robert Bell. 2015. Advances in Collaborative Filtering. In Recommender Systems Handbook, Francesco Ricci, Lior Rokach, and Bracha Shapira (Eds.). Springer US, Boston, MA, 77–118.
[14] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix Factorization Techniques for Recommender Systems. Computer 42, 8 (Aug 2009), 30–37.
[15] Yize Li, Jiazhong Nie, Yi Zhang, Bingqing Wang, Baoshi Yan, and Fuliang Weng. 2010. Contextual Recommendation Based on Text Mining. In COLING: Posters.
[16] Guang Ling, Michael R. Lyu, and Irwin King. 2014. Ratings Meet Reviews, a Combined Approach to Recommend (RecSys ’14). ACM, New York, NY, USA, 8.
[17] Bing Liu. 2015. Sentiment analysis: Mining opinions, sentiments, and emotions. Cambridge University Press. DOI: http://
[18] Julian McAuley and Jure Leskovec. 2013. Hidden Factors and Hidden Topics: Understanding Rating Dimensions with Review Text (RecSys ’13). ACM, 8.
[19] Julian McAuley, Jure Leskovec, and Dan Jurafsky. 2012. Learning Attitudes and Attributes from Multi-aspect Reviews (ICDM ’12). 1020–1025.
[20] Štefan Pero and Tomáš Horváth. 2013. Opinion-Driven Matrix Factorization for Rating Prediction. Springer Berlin Heidelberg, Berlin, Heidelberg, 1–13.
[21] Guang Qiu, Bing Liu, Jiajun Bu, and Chun Chen. 2011. Opinion Word Expansion and Target Extraction Through Double Propagation. Comput. Linguist. (2011).
[22] Lin Qiu, Sheng Gao, Wenlong Cheng, and Jun Guo. 2016. Aspect-based latent factor model by integrating ratings and reviews for recommender system. Knowledge-Based Systems 110 (2016), 233–243.
[23] Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor (Eds.). 2011. Recommender Systems Handbook. Springer US.
[24] Ruslan Salakhutdinov and Andriy Mnih. 2008. Bayesian Probabilistic Matrix Factorization Using Markov Chain Monte Carlo. In ICML. ACM, 880–887.
[25] Nava Tintarev and Judith Masthoff. 2015. Explaining Recommendations: Design and Evaluation. In Recommender Systems Handbook (2nd ed.).
[26] Hongning Wang, Yue Lu, and Chengxiang Zhai. 2010. Latent Aspect Rating Analysis on Review Text Data: A Rating Regression Approach. In ACM SIGKDD.
[27] Yinqing Xu, Wai Lam, and Tianyi Lin. 2014. Collaborative Filtering Incorporating Review Text and Co-clusters of Hidden User Communities and Item Groups. In ACM CIKM. 251–260.
[28] Chong Yang, Xiaohui Yu, Yang Liu, Yanping Nie, and Yuanhong Wang. 2016. Collaborative filtering with weighted opinion aspects. Neurocomputing 210 (2016), 185–196.
[29] Z. J. Zha, J. Yu, J. Tang, M. Wang, and T. S. Chua. 2014. Product Aspect Ranking and Its Applications. IEEE Transactions on Knowledge and Data Engineering 26, 5 (May 2014), 1211–1224.
[30] Tong Zhang. 2004. Solving Large Scale Linear Prediction Problems Using Stochastic Gradient Descent Algorithms. In ICML. 919–926.
[31] Yongfeng Zhang, Guokun Lai, Min Zhang, Yi Zhang, Yiqun Liu, and Shaoping Ma. 2014. Explicit Factor Models for Explainable Recommendation Based on Phrase-level Sentiment Analysis. In ACM SIGIR.
[32] Yong Zheng, Bamshad Mobasher, and Robin Burke. 2016. User-Oriented Context Suggestion. In Proceedings of the 2016 Conference on User Modeling Adaptation and Personalization (UMAP ’16). ACM, New York, NY, USA, 249–258.
