

Reply Expectation Prediction for Email Management

Mark Dredze, John Blitzer, and Fernando Pereira
Computer and Information Sciences Department
University of Pennsylvania
Philadelphia, PA 19104
{mdredze, blitzer, pereira}@cis.upenn.edu

Abstract

We reduce email overload by addressing the problem of waiting for a reply to one's email. We predict whether sent and received emails necessitate a reply, enabling the user both to better manage his inbox and to track mail sent to others. We discuss the features used to discriminate emails, show promising initial results with a logistic regression model, and outline future directions for this work.

1 Introduction

Email has evolved to encompass a plethora of work-related activity. Whittaker and Sidner [6] analyzed the use of email to perform task management, personal archiving, and asynchronous communication, and referred to the three as "email overload". They concluded: (1) Users perform a large variety of work-related tasks with email. (2) As a result, users are overwhelmed with the amount of information in their mailbox. A quotation from interviews conducted by [6] characterizes some frustrations:

"Waiting to hear back from another ... employee can mean delays in accomplishing a particular task, which can ... have significant impact on our overall operations. ... it can be critical or just frustrating."

"One of my pet-peeves is when someone does not get back to me, but I am one of the worst offenders. I get so many emails ... that I cannot keep up."

In this work, we address the issue of waiting to hear back from others by learning to predict whether emails need replies. Our system identifies incoming messages that require a reply, providing another means of prioritizing emails in a cluttered mailbox. Similarly, the system tracks outgoing messages to which it thinks the user expects a reply, so as to maintain a list of outstanding requests for follow-up.

2 System

Our system relies on the intuition that a user's previous patterns of communication are indicative of future behavior [5]. While reply prediction, like spam detection, is a binary classification problem, the two are quite different. Nearly all users agree on what is spam, so spam can be aggregated to obtain a large pool of (positive) training examples. By contrast, legitimate emails sent to a group may require only one person to reply. Additionally, keywords are less useful in reply prediction, while social network factors are very good predictors.

In addition to standard features such as word identity and message length, we designed a variety of features specifically tailored for reply prediction. (1) Dates and times: emails containing dates and times are time sensitive and might require a reply. (2) Salutations: "Dear John" or "Hi John" directly address the recipient and might require personal attention. (3) Questions: questions indicate requests. (4) Header fields: the sender (for received emails), as well as the TO and CC recipients, are important fields for reply prediction.

The system's classifier uses a logistic regression model with a base set of features, including those above, together with feature induction [4]. The feature extraction components are integrated with the IRIS application framework as part of the CALO project [1].

3 Evaluation

We evaluated our predictors on the spam-free inboxes and sent mail of two UPenn computer science graduate students. We detected replies by matching the References and In-Reply-To fields of a message with the Message-ID field of potential parents. User 1 received 1218 messages and replied to 449 of them. He sent 637 messages, and received replies to 215 of those. User 2 received 596 messages and replied to 129 of them. He sent 323 messages and received replies to 91 of those.
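The reply-detection step just described can be sketched as follows. This is a minimal illustration assuming already-parsed headers; the dictionary field names and function name are our own convention, not the system's actual code:

```python
def find_replied_pairs(messages):
    """Pair each message with the earlier message it replies to.

    `messages` is a list of dicts with 'message_id', 'in_reply_to'
    (a parent Message-ID or None), and 'references' (a list of
    ancestor Message-IDs). Returns (child_id, parent_id) pairs.
    A sketch of the matching described above; not the actual system.
    """
    by_id = {m["message_id"]: m for m in messages}
    pairs = []
    for m in messages:
        # Prefer the In-Reply-To header; fall back on the References
        # list, whose last entry conventionally names the direct parent.
        candidates = []
        if m.get("in_reply_to"):
            candidates.append(m["in_reply_to"])
        candidates.extend(reversed(m.get("references", [])))
        for parent_id in candidates:
            if parent_id in by_id and parent_id != m["message_id"]:
                pairs.append((m["message_id"], parent_id))
                break
    return pairs
```

Counting, per user, how many received messages appear as parents of that user's sent messages (and vice versa) yields reply statistics like those reported above.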

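For illustration, the four feature templates of Section 2 might be extracted along these lines. This is a hedged sketch: the regular expressions, feature names, and function name are our own assumptions, and the patterns actually used by the system are not published:

```python
import re

def reply_features(subject, body, sender, to, cc):
    """Extract features suggestive of a needed reply.

    A sketch of the feature templates of Section 2; the system's
    actual patterns are not published.
    """
    text = subject + " " + body
    feats = {}
    # (1) Dates and times: e.g. "3pm", "12:30", weekday names, "tomorrow"
    feats["has_datetime"] = bool(re.search(
        r"\b(\d{1,2}:\d{2}|\d{1,2}\s?(am|pm)|monday|tuesday|wednesday|"
        r"thursday|friday|saturday|sunday|tomorrow|today)\b", text, re.I))
    # (2) Salutations: "Dear John" or "Hi John" at the start of the body
    feats["has_salutation"] = bool(re.match(r"\s*(dear|hi|hello)\b", body, re.I))
    # (3) Questions: explicit question marks indicate requests
    feats["has_question"] = "?" in body
    # (4) Header fields: sender identity and recipient counts
    feats["sender=" + sender] = True
    feats["num_to"] = len(to)
    feats["num_cc"] = len(cc)
    # Standard feature: message length
    feats["length"] = len(body)
    return feats
```

Feature dictionaries of this shape can be fed directly to a logistic regression learner, with feature induction adding conjunctions as in [4].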
[Figure 1: ROC curves for reply prediction on the received and sent mail of two UPenn graduate students]

Figure 1 shows ROC curves for the two users on both of our prediction tasks. The curves are generated by weighting negative (unreplied) instances. The false positive rate is the percentage of emails the classifier marked as replied but that were not actually replied to. The true positive rate is the percentage of replied emails which were correctly identified as replied. By tuning the unreplied weight we can effectively trade off a low false positive rate for a high true positive rate. For example, we see that in order to correctly find 80% of user 1's replied emails, 50% of the emails that we mark will be incorrect. Each point on the curves is an average over 10 random 9-1 splits of the received and sent messages.

User 1's data performed better than user 2's for sent mail, perhaps because of less sent mail data from user 2. Additionally, user 2's data represented mostly personal communications, while user 1's data were mostly work related. Work mail may be easier to predict because it may be more structured and contain more explicit requests. More analysis is needed to explain the performance differentials.

4 Future Work

Reply prediction is a difficult task, and while the initial results are promising, there is room for improvement. The context of an email is critical to predicting whether or not it will be replied to, and while some of the features we introduce serve as proxies for context, we believe that important information is still missing. One goal is to incorporate social network analysis such as that of [2]. Another is to incorporate a notion of thread activity, under the assumption that active threads are likely to remain active. Additionally, [3] presents a survey of email users that yields features for reply prediction. Finally, an analysis of the features is needed to determine the most effective predictors.

We also plan to develop a GUI in conjunction with the IRIS platform. A GUI should integrate user feedback and perhaps use reply prediction as a proxy for message priority. We intend to investigate the possibility of treating message priority prediction as an instance ranking problem. Priority may indicate email reply time, specifically which replies must be sent first.

5 Acknowledgments

We thank Kuzman Ganchev for data collection, and Colin Evans, Girish Acharya, and Rich Giuli from SRI for guidance. Dredze is supported by an NDSEG fellowship. This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA), through the Department of the Interior, NBC, Acquisition Services Division, under Contract No. NBCHD030010.

References

[1] Cognitive agent that learns and organizes. http://ai.sri.com/project/CALO, 2003.

[2] A. Culotta, R. Bekkerman, and A. McCallum. Extracting social networks and contact information from email and the web. CEAS '04, Mountain View, CA, 2004.

[3] L. Dabbish, R. Kraut, S. Fussell, and S. Kiesler. Understanding email usage: Predicting action on a message. CHI '05, Portland, OR, 2005.

[4] A. McCallum. Efficiently inducing features of conditional random fields. UAI '03, pages 403-410, San Francisco, CA, 2003. Morgan Kaufmann Publishers.

[5] J. Tyler and J. Tang. When can I expect an email response? ECSCW '03, 2003.

[6] S. Whittaker and C. Sidner. Email overload: exploring personal information management of email. CHI '96, pages 276-283. ACM Press, 1996.
