Low Quality Product Review Detection in Opinion Summarization

Transcript

1 in Opinion Summarization ion Detect Quality Product Review - Low Yunbo Cao Jingjing Liu Microsoft Research Asia Nankai University Beijing, China Tianjin, China v [email protected] [email protected] - Chin - Yalou Huang Ming Zhou Lin Yew Microsoft Research Asia Microsoft Research Asia Nankai University Tianjin, China Beijing, China Beijing, China [email protected] [email protected] [email protected] 1 Amazon the allow s users to vote for helpfulness Abstract s based on the and then rank of each review reviews accumulated votes the according to our However, . P roduct reviews posted at online shopping survey at Amazon have in Section 3, users’ votes vary greatly in quality sites . This paper a d- ) (1 : as follow es kinds of bias three ce vote imbalan s - the problem of detecting low dresses bias, (2) early bird bias. (3) winner circle bias, and product i- Three types of b reviews. quality V a- studies Existing (Kim et al, 2006; Zhang and es as in the existing evaluation standard of , 2006) radarajan votes for trai n- ’ users these used product review s are discovered. To assess assess ing ranking models to the quality of reviews , the qu alit y set of sp e- of product reviews, a are to these biases. subject therefore which for judging the quality of r cification s e- aforeme n- , In this paper we demonstrate the classification A first defined. is s view - define a standard specification to tioned biases and proposed to detect the based approach is the then measure We . quality of product reviews . he - quality reviews low We apply t o- pr manually real - annotate a set of ground truth with posed approach to enhance opinion su m- spec i- conforming to the world product review data . E marization in a two - x- framework stage fication . perimental results show that the proposed To automatically detect low product e- r quality - low s discriminate 1) ( effectively approach - based approach - views, we propose a classification and quality ones high quality reviews from - - ground annotated pr o- The truth. learned from the s of opinion summar t he task ( i- 2) enhance posed approach explores three aspects of product - zation by detecting and filtering low reviews, namely informativeness, readability, and quality reviews. subjectiveness . proposed approach to opinion We apply the duction Intro 1 summarization, a typical opinion mining task. The proposed approach enhances the existing work in a an increasing has been In the past few years, there , framework the low where e- quality r - two - stage product reviews from interest in mining opinions applied is summar before the vi i- ew detection right Liu, et al, 2004; Popescu and (Pang, et al, 2002; zation stage . . However, d ue to the lack of Etzioni, 2005) that the proposed a p- Experimental results show editorial and quality control, reviews on products proach can discriminate low - quality reviews from a vary greatly in quality . Thus, it is crucial to have high task quality ones - effectively. In addition, the capable of assessing the quality of mechanism d e- of opinion summarization can be enhanced by - quality / noisy reviews . reviews and detecting low tect ing and filtering low - quality reviews. function ome shopping sites a already provide S For example, quality of reviews. the of assessing 1 http://www.amazon.com 334 Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational c Natural Language Learning 2007 Association for Computational Linguistics © , pp. 334–342, Prague, June 2007.

2 The rest of the paper is organized as follows ments is presented to users so that they can have a n : - other how users on glance view experienced a - at In Section 3, introduces the related work. 2 Section in The major weakness . certain product a on d rate we define the quality of product reviews. In Se c- the all the reviews we present our approach to detecting low , 4 tion - existing studies is that , inclu d- , we empirically verify - 5 . In Section quality reviews ing low quality ones , taken into consideration are In . summary generating the equally treated and the effectiveness of the proposed approach and its for e enhance this paper, w application by detecting a- summ 6 use for opinion summarization. Section the filtering s u- our work in this paper and points out the f s rize and quality review . In order to - low first define what the quality of achieve that, we ture work . reviews is. Related Work 2 Reviews 3 Quality of Product 2.1 Evaluating Helpfulness of Reviews In this section, we will first show three biases of evaluating helpfulness of reviews The problem o f and then users’ votes observed on Amazon , present learning utility of , also known as (Kim et al, 2006) . our specification on the quality of product reviews (Zhang and is quite , , 2006) Varadarajan reviews assessing the quality of similar to our problem of Ground - truth 3.1 Amazon . reviews product e use the digital In our study, w on reviews In practice, researchers in this area consider ed cameras . as our data set crawled from Amazon The problem the d and solve it ranking as a problem reviews on 946 digital 141 data set consists of 23 , with regression models. In the process of model , users could vote the Amazon site At cameras. for use , they and testing training truth - the ground d ” with a “ helpful ” review or “ unhelpful a label. derived from users ’ votes of helpfulness provided a re two number s Thus, for each review there by Amazon. As we will show later in Section 3, the statistics of these two labels indicating , namely from three types of vo t- ffered these models all su number of the and that of votes ” helpful “ ing bias. “ the used (2006) Kim et al ones. ” unhelpful In our work, we avoid using users ’ votes by d e- percentage of votes as the measure of ” helpful “ veloping a on the quality of reviews specification in their ” evalua ting the “ quality of reviews spec i- a and building according to the truth - ground experiments. We call the ground - truth based on . fication this measure as “ Amazon ground - truth ” . Certainly, t he ground - truth has the advantage of Mining Opinions from Reviews 2.2 convenience. However, we identify three types of on opinion mining from One area of research that make the Amazon ground truth not a - es bias l- product reviews is to judge whether a review e- ways suitable for determining the quality of r negative opinion. a For positive or expresses a e describe these biases in details in the views. W ey example, Tur n ( 2006 ) present ed a simple rest of this section . unsupervised learning algorithm in judging Im 3.1.1 Bias balance Vote reviews as “ thumbs up ” (recommended) or “ thumbs down Pang et al (not recommended). ” 10000 (2002) consider ed the same problem and present ed 8000 a set of supervised machine learning approaches to 6000 ), . ( 2003 Dave et al it. For other work see also Pang 4000 # Reviews ) . and Lee 2004 , 2005 ( 2000 Another area of research on opinion mining is to 0 from pro d- extract and summarize users’ opinions 0.6 0.5 0.4 0.3 0.2 0.1 0.7 0.8 0 0.9 1 . , 2005; uct reviews (Hu and Liu, 2004; Liu et al Percentage of 'helpful' votes Popescu and Etzioni, 2005 ) . Typically, a sentence R eviews ’ percentage scores Figure 1 . or a text segment in the reviews is treated as the others ’ he polarity of users’ sentiments A t the Amazon site, users tend to basic unit value . on a T feature product positively rather than negatively. s opinion From in each unit is extracted. he Then t i- aggregation of the polarities of individual sent 141 , 23 half of the a Figure 1, we can see that 335

3 reviews influence among labelers should be avoided when (corresponding to the two bars on the right of the figure) have more than 90% “ the votes are ” votes helpful used . standard as evaluation the , including 9 , 100 ” helpful “ with reviews 100% 3.1.3 Early Bird Bias - votes From an in . depth these investigation on voted high ly - some did reviews, we observed that 60 not really have as good quality as the votes hint. 50 40 For example, in Figure 2, the review about Canon 30 helpful ” “ PowerShot S500 votes out of receives 40 20 40 votes although it only gives very brief 10 description on the product features in its second 0 gra ph. We call this type of bias “imbalance para # Votes held by reviews 34 31 28 25 22 19 16 13 10 7 4 1 49 46 43 40 37 vote” bias. Publication Date This is my second Canon digital elph camera. Both were great Dependency on publication date 4 . Figure to the S500. About 6 months later I get cameras. Recently upgraded the dreaded E18 error. I searched the Internet and found numerous can influence Publication date the accumulation of people having problems. When I determined the problem to be the n ’ th publication date users ’ votes. In Figure 4 , the lens not fully extending I decided to give it a tug. It clicked and the camera came on , ready to take pictures. Turning it off and on pr o- th ’ n represents the month after the product is duced the E18 again. While turning it on I gave it a nice little bump in the figure released. The number is averaged over on the side (where the USB connector is) and the lens popped out e can W . set all the digital cameras in the data on its own. No problems since. s a nice compact and light camera a ’ It nd takes great photos and clear observe a is trend that the earlier a review second videos. Only complaint (other than E18) is the limit of 30 - poste d, the more votes it will get. This is simply videos on 640x480 mode. I've got a 512MB compact flash card, I because reviews posted earlier are expose d to user s should be able to take as much footage as I have memory in one take. some high quality for a longer time. Therefore , Figure xample review An e . 2 get reviews may because of later fewer users ’ vote We call this “early bird” bias. publication . 3.1.2 Winner Circle Bias Quality 3.2 of Specification 300 250 Be sides these aforementioned biases, using the raw 200 from rating s directly also fail to provide a readers 150 clear guideline for what a good review consists of. 100 e provide such a guideline In this section, w , which 50 as we name . ) SPEC the specification ( 0 #Votes held by reviews of r four categories define , we SPEC In the e- 40 1 4 7 10 13 16 19 22 25 49 46 43 28 37 34 31 view quality which different value s of represent Ranking positions of reviews : r best “ e- to users’ purchase decision s review the Figure 3 . V otes of the top - 5 0 ranked reviews “ , ” review good “ , ” view fair review ” , and “ bad r e- effect of There also exists “ bootstrapping a ” hot A generic description of the SPEC is as fo l- . ” view Figure reviews at the Amazon site . 3 shows the lows: ” helpful the top 50 ranked reviews. The “ votes for must be and d complete a rather e- A best rev iew over 127 digital cameras number s are average d tailed a on comment product. It presents several which than 50 reviews. have As shown in no less aspects of a product and provides convincing op i- , the top two reviews hold more than this figure 250 enough evidence. Usually a review nions with best respectively on average and 140 votes ; while the users o n- could be taken as the main reference that numbers of votes held by lower - ranked reviews i- making their e dec purchas ly need to read before decrease the d calle - so . This is exponentially sion on in Fi The first review certain product. a g- bias : the more votes a review ” “ winner circle review. It best presents several product ure 5 is a appear to gains, the more default authority it would convincing opinions with provides and features readers the the influence will in turn , which sufficient evidence. It is also in a good format for objectivity of Also, the higher readers’ the votes . readers to easily understand . Note that we omit ranked reviews would attract more eyeballs and some words in the example to save space. the is h T ore gain more people’s votes. theref mutual 336

4 it comes with a rechargeable lithium battery. Many use AA batt e- A review is a relatively complete comment good , in about two batteries the digital camera consumes theses AA ries i- on a product, but not with as much supporting ev the unit is on. That can add continuous hours time expense to while necessary It could be the camera. It's also the best resolution on the market. 6.0 megapix dence as used . as a strong is out, though only a few. And the smallest that we found. Also the reference re c- and influential , but not as the only best price for a major brand. is 5 ommendation. The second review in Figure Fair Review : There is nothing wrong with the 2100 except for the very notic e- such an example . able delay between pics. The camera's digital processor takes review contains a very brief description A fair about 5 seconds after a photo is snapped to ready itself for the next a supply detailed evaluation not on product. It does one. Otherwise, the optics, the 3X optical zoom and the 2 megapixel resolution are fine for anything from Internet apps to 8" x 10" print on the product, but only some on comment s s- a enlarging. It is competent, not spectacular, but it gets the job done For example, t of the product . pects he third review at an agreeable price point. the delay between mainly talks about “ 5 in Figure Bad Review: point out that you should never buy a generic battery, I want to ” pictures but , less about other aspects of the ca m- like the person from San Diego who reviewed the S410 on May 15, . era 2004, was recommending. Yes you'd save money, but there have been many reports of generic batteries exploding when charged for review is usually an incorrect bad A description And don't think if your generic battery explodes you can too long. misleading . I t talk s with of a product information sue somebody and win millions. These batteries are made in swea t- but much about little product about a specific some shops in China, India and Korea, and I doubt you can find anybody to sue. So play it safe, both for your own sake and the camera's general topics (e.g. photography). For example, the sake . If you want a spare, get a real Canon one. e- 5 last review in Figure alks about the t topic of “ g Figure . Example reviews 5 neric battery ” , but does not specify any digital camera. A bad review is an “ unhelpful review ” that of Quality 3.3 Annotation can be ignored. defined above, the SPEC According to we built a Best Review: We . Amazon data set truth from the ground - I purchased this camera about six months ago after my Kodak Easyshare camera completely died on me. I did a little research 100 digital cameras randomly selected and 50 and read only good things about this Canon camera so I decided to reviews for each camera we have 4, Totally . 909 go with it because it was very reasonably priced (about $200). Not reviews since some digital cameras have fewer nly did the camera live up to my expectations, it surpassed them o by leaps and bounds! Here are the things I have loved about this than 50 unique reviews. Then we hired two camera: annotators to the reviews with the SPEC as label A s the result, we have two their guideline . BATTERY - this camera has the best battery of any digital ca m- ... era I have ever owned or used. independent 09 9 , 4 on copies of annotations ” , fair “ , ” good w reviews , “ ith the labels of “ best ” , EASY TO USE - I was ab le to ... . ” “ and bad confusion matrix shows the 1 Table PICTURE QUALITY all of the pictures I've taken and printed - value of The between the two copies of annotation. out have been great. ... statistic calculated from kappa the (Cohen, 1960) - I love the ability to quickly and easily ... FEATURES shows 0. the matrix is 8 14 2. This that the two highly achieved annotators by consistent results I was hoping - LCD SCREEN ... they worked although following the SPEC, I was also looking for a camera that used - SD MEMORY CARD independently . SD memory cards. Mo stly because ... Annotation 2 Annot a- I cannot stress how highly I recommend this camera. I will never tion 1 total good fair bad best buy another digital camera besides Canon again. And the A610 (as well as the A620 - the 7.0MP version) is the best digital camera I've 44 0 34 0 294 2 best ever used. 6 6 639 18 11 3 8 0 good Good Review: , 85 7 1 3 11 0 200 1 , 4 72 fair "P10" Digital Camera is the top pick for CSC. The Sony DSC Running against cameras like Olympus stylus, Canon Powereshot, 885 , 1 2 1 78 66 9 , 1 bad Sony V1, Nikon, Fuji, and More. The new release of 5.0 mega pi x- 09 9 , 4 998 , 1 65 6 , 1 885 1 36 total els has shot prices for digital cameras up to $1000+. This camera I purchased throug h a Private Dealer cost me $400.86. The Retail Table 1. Confusion matrix bet. the annotations Price is Running $499.00 to $599.00. Purchase this camera from a wholesale dealer for the best price $377.00. Great Photo Even in In order to examine the difference between our fit into dim light w/o a flash. The p10 is very compact. Can easily truth, we evaluate - annotations and Amazon ground any pocket. The camera can record 90 minutes of mpeg like a home the Amazon ground - truth against the annotations , lot of great digital cameras on the market that movie. There are a shoot good pictures and video. What makes the p10 the top pick is 337

5 low with the measure of ” pairs error rate of preference “ - quality reviews , since our goal is to identify et al, 1999) . ( Herbrich low quality reviews that should not be considered | 표 푒 푐 푡 푝 푟 푒 푓 푒 푟 푒 푛 푐 푒 푝 푎 푖 푟 푠 푟 푐 푟 푛 푖 | . creating product review summaries when 푅 푎 푡 푒 푟 퐸 푟 표 푟 = (1) 푎 푙 푙 푝 푟 푒 푓 푒 푟 푒 푛 푐 푒 푝 푎 푖 | 푠 | 푟 Classification of Product Review s 4 is defined a pair of as ” preference pair “ where the review reviews with a order. For example, a best employ e W a statistical machine learning approach and a review correspond to a preference pair good quality problem of detecting low to address the - with the order of good review preferring to best “ products . reviews . The ” all preference pairs “ ” are collected review 푛 , we Given a training data 푥 set , 푦 퐷 = 푖 푖 1 from one of the annotations (the annotation 1 or error in construct a model that can minimize the the annotation 2) by ignoring the pairs from the (generalization error). Here prediction of y given x “ same ca tegory. The ” incorrect preference pairs 푢 푥 ∈ 푋 and 푦 , = { 푕 푖 푔 푕 푞 푢 푎 푙 푖 } 푦 푡 푖 푙 푎 푡 푞 푤 표 푙 푦 푖 푖 preference pair s collected from the Am are the a- , represents a product review and a label zon ground truth but not with the same order as - respectively . When applied to a new instance x , the in the that . The order of the all preference pairs model predicts the corresponding y and outputs the collected from the Amazon preference pair iction. score of the pred r- - truth is evaluated o n the basis of the pe ground in Section 3.1. centage score as described 4.1 The Learning Model The error rate of preference pairs based on the In our study, we focus on differentiating low - based on the annotation 2 are that annotation 1 and - reviews from product high quality . quality ones 0.448 and 0.446, respectively averaged over 100 , Thus, we treat the task as a binary classification . digital cameras The reference high error rate of p problem. pairs - truth that the Amazon ground s demonstrate ) Support Vector Machines SVM ( employ We - (our ground diverges from the annotations truth) (Vapnik, 1995) . classification the model of as . significantly Given an instance ( product review ), SVM assigns x which discover To ground - truth is more kind of a score to it based on reasonable, we ask an additional annotator (the 푇 ) 2 ( 푥 푤 = 푥 + 푓 푏 third annotator) kinds of to compare these two denotes b denotes a vector of weights and w where ground - truth. More specifically, we randomly s e- is, the a f(x) intercept. The higher the value of n lected 100 preference pairs whose orders the two higher is. In the quality of the instance x kinds of r- (called inco on truth don’t agree - ground f(x) classification, the sign of is used. If it is rect preference pairs in the evaluation above). As is classified into the positive positive, then x for our ground - choose the Annotation 1 truth, we - quality reviews) , otherwise into the category (high n the new test . Then, the third annota tor is asked i . negative category quality reviews) - (low selected order for each to assign pair. a prefer e nce The construction of SVM needs labeled training our both Note that the third annotator is blind to - “ high data (in our case, the categories are quality specification . existing preference order and the low , the ” and “ reviews - quality reviews ” ). Briefly Last, we evaluate the two kinds of ground truth - learning algorithm creates the hyper plane ” in ( “ ), 2 h the new annotation. Among 100 pairs, our wit separates the positive and hyper plane such that the - ground truth agree s to the new annotation on 85 in th e training data with the negative instances pairs while the Amazon ground truth agree s to the - largest “ margin ” . new annotation on 15 pairs. To confirm the result, yet another annotator (the fourth annotator) is 4.2 Product Feature Resolution to repeat the same annotation independently called ” image quality “ , roduct features (e.g. P for digital . And we obtain the same statistical as the third one camera) in a review are good indicators of review result (85 vs. 15) although the fourth annotator quality. , different product features may However does agree with the third annotator on some not “ e.g. , refer to the same meaning ( and ” battery life pairs. ” ), which will bring redundancy in the “ power first three the reviews treat In practice, we in the In this paper, we formulize the problem as study. ) as high - categories ( “ best ” , “ good ” and “ fair ” the “ resolution of product feature s ” . Thus, the quality reviews and as category ” bad “ the those in 338

6 problem is reduced to h determine the equ i- ow to T he average length of sentences  v a lence of a product . feature in different forms  The number of sentences with product features In (Hu and Liu, 2004), the matching of different W ord level  (WL) product features is mentioned briefly and a d- he number of words T  review the in dressed by fuzzy matching. However, there exist product  T he number of FZ50 s ( e.g., DMC - , many cases where the method fails to match the in the review EX - Z1000 ) ” , , ” power “ and battery life “ multiple mentions e.g. , review  T he number of product s in the title of a In this because it only considers string similarity . he number of brand names (  Canon, Sony) e.g., T by lev e- paper we propose to resolve the problem in the review string ” : o ne is “ surface raging two kinds of evidence  T he number of brand names in the title of a , evidence t he other is “ contextual evidence ” . review m- 1985) Ukkonen, ( edit distance We use to co the similarity between the surface strings of pare  roduct feature level P (PFL) e- tual contex and , mentions two similarity to r use  view re the he number of product features in T flect the semantic similarity between two mentions . of product features in the  he total frequency T ual When similarity, we split all using context review mention of a the reviews into sentences. For each The average frequency of product feature  s in product fea , we take it as a query and search ture the review for all the relevant sentences. e construct a Then w  a of he number of product features in the title T , by taking each unique term mention vector for the review c- in the relevant sentences as a dimension of the ve frequency total he of product features in the  T between two vectors of tor. The cosine similarity title of a review mentions t is ual present to measure the context hen 4.3.2 Features on Readability . between similarity two mentions We make use of several features at paragraph level Feature Development for Learning 4.3 lying structure of the which indicate the under Th reviews. e features include, es quality reviews, our proposed To detect low - The number of paragraphs  review the in approach explores three aspects of product reviews,  The average length of paragraphs in the review namely informativeness, subjectiveness, and e-  The number of paragraph separators in the r We denote the features employed for readability. view learning as “ ” , discriminative from learning features “ the ” we discussed above . product features ” H Pros “ ere, we refer to the keywords, such as 4.3.1 Features on Informativeness vs. “ Cons ” as “ paragraph separators ” . The ke y- the resolution of product As for informativeness, words usually appear at the beginning of par a- the rate we gene features is employed when graphs for categorizing two contrasting aspects of as listed Pairs mapping to . below learning features product. a We extract the nouns and noun phrases the same product feature will be treated as the 4 from the 909 , at the beginning of each paragraph e calculat we same product feature, when the 30 pairs of y- most frequent the and use reviews ke We frequency and the number of product features. Table 2 provides as paragraph separators. word s h proposed in (Hu and Liu, 2004 ) the approac apply some examples of the extracted separators. extract product features to . s S eparator s S eparator We also use a list of product names and a list of egative N ositive P P egative N ositive to generat e the learning features. Both brand names The Bad The Good ons C Pros lists can be collected from the Amazon site b e- B Strength T Weakness ummer humb up within a time inte r- are relatively stable cause they PLUSES MINUSES Positive Negative val. Advantages Likes Dislikes Drawbacks T he learning features on the informativeness of BAD GOOD a review . are as follows Downsides upsides The THINGS THINGS  level (SL) entence S Table 2 . Examples of paragraph separators  in the review s number of sentence he T 339

7 4.3.3 Features on Subjectiveness although the contribution is much the accuracy also take the subjectiveness of reviews into The features on subjectiveness, however, . less W e previous work Unlike . consideration make no contribution . (Kim et al, shallow using , 2006) Varadarajan 2006; Zhang and Feature Category Annotation2 Annotation1 a sentiment directly, we use information syntactic 73.59% 72.81% SL (Hu and Liu, 2004) which aggregates analysis tool Informativ e- 79.15% 80.41% WL information. The tool is a syntactic a set of shallow ness 82.37% 83.30% PFL determining the sentiment classifier capable of 83.93% 82.91% Readability . W e create three learning polarity of each sentence 83.84% 82.96% Subjectiveness subjectiveness features regarding the reviews. of - . Table 3 L ow quality review s detection of positive sentences age The percent in the  We also conduct a on analysis more detailed review each individual feature. features of categories Two  The percent age of negative sentences in the have “ poor perfo r- and ” title “ on brand name ” review in lack of information the due to is mance , which (r s sentence subjective of e-  T he percent age s brand name low coverage of the the title in a and in the review of p ositive or negative) gardle ss review , respectively . 5 Experiments Summarizing Sentiments of Reviews 5.2 s In this section, we describe our experiment with - quality review One potential application of low - the proposed classification - based approach to low . of detection is the opinion summarization reviews , and its effectiveness on detection quality review of The process of opinion summarization e- r the task of opinion summarization. product consists a views with regards to a query of of the following steps (Liu et al, 2005) : quality Reviews - Detecting Low 5.1 text 1. From each of the reviews, identify every In our proposed approach, the problem of assessing in the review segment with e- and d , opinion binary class i- a formalized as quality of reviews is ties termine the polari of the opinion segment s . ta We conduct experiments by fication problem. k- , g 2. each product feature For enerate a positive the categories of , and ” good “ ing reviews in , “ best ” of op i- opinion set and a negative opinion set quality reviews and “ fair ” as high - the those in ( POS as segments, denote d nion ) 푓 “ ” category as low - quality reviews. bad . ) 푓 ( NOS and , we utilize the As for classification model feature product m- 3. aggregate the nu , For each ( Joachims randomly , 2004) . We SVM L ight toolkit 푓 , as ) 푓 ( NOS and ) ( POS bers of segments in 100 queries of digital cameras the into two divide opinion summarization on the product feature. sets, namely a training set of 50 queries and a test set of 50 queries. For the two copies of annot a- contribute the In th is process, all the reviews he trai tions, we use the same division. n- We use t different same. However, different reviews do hold to ” annotation 1 “ ing set from train the model and authorities. A positive/negative opinion from a apply the model to the test sets from both “ a- annot the same - quality review should not have high tion 1 ” and “ annotation 2 ” , respectively . T able 3 quality review. weight as that from a low - reports the accuracies of our approach to review W e use a two - stage approach to enhance the r e- classification . r- pe The accuracy is defined as the a . liabilit T hat is, we add y of summarization centage of correctly classified reviews. review process of low quality detection before the - take the approach that utilizes only the cat e- We , summarization process so that the summarization features e- bas the as (SL) level sentence on gory of result is obtained based on the high - quality reviews incrementally add other categories of fe a- a nd line , only We . are to demonstrate how much difference tures c- and subje readability on informativeness, - the proposed two stage approach can bring into the . We c an see that both the features on word tiveness opinion summarization. (PFL) product feature level on (WL) and those level e use classification model trained as best the W improve the performance of classification can described in Section 5.1 e- to filter low - quality r can still readability eatures on The f increase . much views , and do summarization on the high - quality 340

8 reviews associated to the 50 test queries. by e d e- the two - stage approach is substantially different W - note the proposed approach and the old approach from that achieved by the one stage approach. one stage and ” - Due , respectively. “ ” stage - two “ as (+) RatioOfChange a the limited space, we only give to visual compa r- %Query >0.25 >0.05 >0.10 >0.30 >0.15 >0.20 “ in ” image quality ison of the two approaches on 22% 10% 14% 4% 4% 2% IQ Figure 6 . The upper figure shows the summariz a- 30% 50% 38% 10% 18% 14% Battery opinions tion of posit ive and the lower figure 18% 12% 28% 24% 22% 20% LCD shows that of negative opinions . From the figures 26% 42% 10% 6% 16% 20% Flash - stage approach preserves we can see that the two 26% 8% 6% 18% 12% 8% MM fewer text segments as the result of filter ing out - ( RatioOfChange ) quality product reviews. - many low %Query - 0.30 < - 0.25 < - 0.20 < - 0.15 < 0.05 < - < - 0.10 44% 10% 6% 4% 18% 14% IQ 120 One - stage - stage Two 10% 4% 4% 2% 22% 14% Battery 90 4% 8% 4% 12% 22% 28% LCD 60 4% 6% 8% 16% 18% 28% Flash 10% 8% 42% 34% 18% 16% 30 MM . on five features RatioOfChange 4 Table 0 sentences (Positive) Number of supporting s no standard way i There quality to evaluate the 28 13 46 43 40 37 34 31 49 25 22 19 16 10 7 4 1 c- s rather a subje of opinion summarization as it i QueryID tive problem. In order to demonstrate the impact of 80 approach the two - stage , we turn to external autho r- One - stage Two - stage other than Amazon.com as the o b- it source s ative 60 reference. We observe evaluation jective that 2 40 CN provides a professional “ editor’s review ” ET the for many products, which gives a rating in 20 9 10 o n product features. digital ca m- ~ range of 1 0 sentences (Negative) Number of supporting the 50 test queries are found to have the out of eras 4 10 7 13 16 19 22 25 28 31 34 37 40 43 46 49 1 “ at CN ET ” image quality . We s rating on ’ editor QueryID s of our is use th the result to compare with rating ummarization S . 6 Figure “ on image quality ” the . W e rescale opinion summarization Rate scores in To show the comparison on more features a - the one both obtained by stage approach and the compressed space, we give the statistic ratio of two - stage approac h into the range of 1 - 10 in order As for the change between two approaches instead. to perform . the comparison evaluation measure, we define “ RatioOfChange ” provides the 7 visual comparison Figure . We as, ROC) on a feature ( f can see that the stage - achieved by the two result 푓 Rate Rate ( 푓 ) − stage − one − stage two has a much approach resemblance ) closer better ( to 푓 ROC = 3 ( ) Rate ( 푓 ) stage − one - rating than one CNET stage approach does . This indicates that our two - stage approach can achieve a where is defined as , Rate ( f ) * s- more consistent summarization result to the profe 푓 ) POS | ( | Although the sional evaluations by the editors. Rate 푓 ( = ) ) ( 4 ∗ | NOS ( 푓 ) | | ) 푓 ( + POS | CNET rating is not the absolute standard for pro d- uct evaluation, it provides a professional yet obje c- shows some statistic results on ROC on 4 Table tive x- evaluation of the products. Therefore, the e “ image quality ” five product features, namely (IQ), perimental results demonstrate that our proposed “ battery ” , “ LCD screen ” (LCD), “ flash ” and “ mo v- approach could achieve more reliable opinion in The values (MM). ” ie mode the cells are the l- summarization which is closer to the generic eva percentage is larger/smaller ROC of queries whose uation from authoritative sources. than the respective thresholds. We can see that a have large portion of queries big changes on the value s of RO C . This means that the result achieved 2 http://www.cnet.com 341

9 Minqing Hu and B . Mining and Summ a- ing Liu. 2004 a 9 KDD’04 rizing Customer Reviews. . 8 7 a- Minqing Hu and Bing Liu. 2004 b . Mining Opinion Fe 6 04 . ’ AAAI tures in Customer Reviews. 5 Kalervo J . a rvelin & Jaana Kek a l a inen 2000 . IR : e valu a- One - stage Rating Score 4 Two stage - tion methods for retrieving highly relevant doc u- CNET Ground - truth 3 men 00 . ts. SIGIR ’ 4 9 1 7 6 5 8 2 3 a- Nitin Jindal and Bing Liu. 2006. Identifying Compar QueryID . SIGIR’06 tive Sentences in Text Documents. . 7 Comparison with CNE T rating Figure ing Nitin Jindal and B Liu. 2006. Mining comparative . ’ . 06 AAAI sentences and relations 6 Conclusion . SVMlight achims a- Thorsten Jo Support Vector M -- ied the problem of detecting In this paper, we stud chine. http://svmlight.joachims.org/ , 2004. low quality product reviews. Our contribution can - Soo - Min Kim, Patrick Pantel, Tim Chklovski, Marco be summarized in two - fold: (1) we discovered e- Pennacchiotti. 2006. Automatically Assessing R e x- the ground - truth three types of biases in used view Helpfulness. EMNLP ’ 06 . tensively sp e- in the existing work , and proposed a 1998, Automatic retrieval and clustering of Lin. D ekang cification on the quality of product reviews. The 98 ’ - COLING ACL similar words. . three biases that we discovered are ote v imbalance , and winner circle bias bias (2) . early bird bias , B ing Liu, M inqing 2005. Cheng. unsheng Hu, and J oot ing R on the new ground - truth (conforming to Opinion observer: analyzing and comparing opinions on the web. WWW ’0 5 . w , the proposed specification) class i- a proposed e fication h to low - quality product based - approac a- A sentimental educ Bo Pang and Lillian Lee. 2004. which yields better performance review detection, i- tion: Sentiment analysis using subjectivity summar . opinion summarization of zation based on minimum cuts. ACL’04 . future work in several We hope to explore our Bo Pang and Lillian Lee . Seeing stars: Exploiting 2005. the new areas, such as further consolidating class relationships for sentiment categorization with truth from different points - ground of view and v e- . 5 ACL’0 respect to rating scales. rifying the effectiveness of low - quality review d e- , and S. Vaithyanathan Bo Pang and Lillian Lee . 2002 . tection with other . applications Thumbs up? sentiment classification using machine . 02 ’ EMNLP learning techniques. s Reference M Extracting aria na A - 2005. Popescu and O Etzioni. Jacob Cohen . 1960. o- A coefficient of agreement for n - LT H product features and opinions from reviews. minal scales, Educational and Psychological Me a- ’ EMNLP . 05 – 20: 37 surement . 46 Peter D. Turney. 2001. Thumbs up or thumbs down?: Kushal Dave, Steve Lawrence, and David M. Pennock. i- semantic orientation applied to unsupervised classif 2003. Mining the peanut gallery: opinion extraction 02 cation of reviews. ’ ACL and semantic classification of product reviews. Esko Ukkonen. 1985. Algorithms for approximate string 03 WWW ’ . Information and Control , pp. 100 – 118. matching. Harris Drucker, Chris J.C., Burges Linda Kaufman, . Vladimir N. Vapnik. 1995 The Nature of Statistical Alex Smola and Vladimir Vapnik. 1997. Support Le arning Theory. Springer. n- . vector regression machines Advances in Neural I formation Processing Systems. Zhu Zhang and Balaji Varadarajan. 2006. Utility Sco r- ing of Product Reviews. CIKM’06 Christiane Fellbaum . 1998. WordNet: an Electronic Lexical Database, MIT Press. Ralf Herbrich, Thore Graepel, and Klaus Obermayer. 1999 . Support Vector Learning for Ordinal Regre s- sion. In Proc. of the 9th International Conference on . Artificial Neural Networks 342

Related documents

Available Classes Fall

Available Classes Fall

Available Classes Printed on May 2, 2019 at 9:55 PM. During business hours, this report should automatically update every 30 minutes. Please register for classes by clicking on "Register" located at t...

More info »
admin let

admin let

Page: 1 State of Ohio - Department of Transportation Date: 05/03/19 Bid Results Time: 2:52 PM Project Federal Completion PID Letting State Number Number State Date Date Estimate E160(998) 192003 11/15...

More info »
Conjunction Data Message

Conjunction Data Message

CCSDS His torical Document is docu ment’s Histo ri cal Th tu s indi cate s that it is no lo nger curre nt. It sta has eit her been replaced by a newer issu e or with drawn becau se it wa s deem rr ent...

More info »
airship pilot manual

airship pilot manual

NAVY STATES UNITED · K-TYPE AIRSHIPS PILOT'S MANUAL RESTRICTED CONTRACTS NOs-78121 and NOa(s)-257 GOODYEAR AIRCRAFT CORPORATION AKRON, OHIO ., . First Sept ., 1942 Issue: 43 Issue: Sept. , 19 Revised

More info »