

Design Challenges for Entity Linking

Xiao Ling    Sameer Singh    Daniel S. Weld
University of Washington, Seattle WA
{xiaoling,sameer,weld}@cs.washington.edu

Abstract

Recent research on entity linking (EL) has introduced a plethora of promising techniques, ranging from deep neural networks to joint inference. But despite numerous papers there is surprisingly little understanding of the state of the art in EL. We attack this confusion by analyzing differences between several versions of the EL problem and presenting a simple yet effective, modular, unsupervised system, called VINCULUM, for entity linking. We conduct an extensive evaluation on nine data sets, comparing VINCULUM with two state-of-the-art systems, and elucidate key aspects of the system that include mention extraction, candidate generation, entity type prediction, entity coreference, and coherence.

1 Introduction

Entity Linking (EL) is a central task in information extraction: given a textual passage, identify entity mentions (substrings corresponding to world entities) and link them to the corresponding entry in a given Knowledge Base (KB, e.g. Wikipedia or Freebase). For example:

    JetBlue begins direct service between Barnstable Airport and JFK International.

Here, "JetBlue" should be linked to the entity KB:JetBlue, "Barnstable Airport" to KB:Barnstable Municipal Airport, and "JFK International" to KB:John F. Kennedy International Airport.[1] The links not only provide semantic annotations to human readers but also a machine-consumable representation of the most basic semantic knowledge in the text. Many other NLP applications can benefit from such links, such as distantly supervised relation extraction (Craven and Kumlien, 1999; Riedel et al., 2010; Hoffmann et al., 2011; Koch et al., 2014), which uses EL to create training data, and some coreference systems that use EL for disambiguation (Hajishirzi et al., 2013; Zheng et al., 2013; Durrett and Klein, 2014). Unfortunately, in spite of numerous papers on the topic and several published data sets, there is surprisingly little understanding about state-of-the-art performance.

We argue that there are three reasons for this confusion. First, there is no standard definition of the problem. A few variants have been studied in the literature, such as Wikification (Milne and Witten, 2008; Ratinov et al., 2011; Cheng and Roth, 2013), which aims at linking noun phrases to Wikipedia entities, and Named Entity Linking (aka Named Entity Disambiguation) (McNamee and Dang, 2009; Hoffart et al., 2011), which targets only named entities. Here we use the term Entity Linking as a unified name for both problems, and Named Entity Linking (NEL) for the subproblem of linking only named entities. But names are just one part of the problem. For many variants there are no annotation guidelines for scoring links. What types of entities are valid targets? When multiple entities are plausible for annotating a mention, which one should be chosen? Are nested mentions allowed? Without agreement on these issues, a fair comparison is elusive.

Secondly, it is almost impossible to assess approaches, because systems are rarely compared using the same data sets. For instance, Hoffart et al. (2011)

[1] We use typewriter font, e.g., KB:Entity, to indicate an entity in a particular KB, and quotes, e.g., "Mention", to denote textual mentions.

Transactions of the Association for Computational Linguistics, vol. 3, pp. 315–328, 2015. Action Editor: Kristina Toutanova. Submission batch: 11/2014; Revision batch 3/2015; Published 6/2015. © 2015 Association for Computational Linguistics. Distributed under a CC-BY-NC-SA 4.0 license.

developed a new data set (AIDA) based on the CoNLL 2003 Named Entity Recognition data set but failed to evaluate their system on MSNBC, previously created by Cucerzan (2007); Wikifier (Cheng and Roth, 2013) compared to the authors' previous system (Ratinov et al., 2011) using the originally selected datasets but didn't evaluate using AIDA data.

Finally, when two end-to-end systems are compared, it is rarely clear which aspect of a system makes one better than the other. This is especially problematic when authors introduce complex mechanisms or nondeterministic methods that involve learning-based reranking or joint inference.

To address these problems, we analyze several significant inconsistencies among the data sets. To better understand the importance of various techniques, we develop a simple and modular, unsupervised EL system, VINCULUM. We compare VINCULUM to the two leading sophisticated EL systems on a comprehensive set of nine datasets. While our system does not consistently outperform the best EL system, it does come remarkably close and serves as a simple and competitive baseline for future research. Furthermore, we carry out an extensive ablation analysis, whose results illustrate 1) even a near-trivial model using CrossWikis (Spitkovsky and Chang, 2012) performs surprisingly well, and 2) incorporating a fine-grained set of entity types raises that level even higher. In summary, we make the following contributions:

• We analyze the differences among several versions of the entity linking problem, compare existing data sets and discuss annotation inconsistencies between them. (Sections 2 & 3)

• We present a simple yet effective, modular, unsupervised system, VINCULUM, for entity linking. We make the implementation open source and publicly available for future research.[2] (Section 4)

• We compare VINCULUM to 2 state-of-the-art systems on an extensive evaluation of 9 data sets. We also investigate several key aspects of the system including mention extraction, candidate generation, entity type prediction, entity coreference, and coherence between entities. (Section 5)

2 No Standard Benchmark

In this section, we describe some of the key differences amongst evaluations reported in existing literature, and propose a candidate benchmark for EL.

2.1 Data Sets

Nine data sets are in common use for EL evaluation; we partition them into three groups: the UIUC group (ACE and MSNBC datasets) (Ratinov et al., 2011), the AIDA group (with dev and test sets) (Hoffart et al., 2011), and the TAC-KBP group (with datasets ranging from the 2009 through 2012 competitions) (McNamee and Dang, 2009). Their statistics are summarized in Table 1.[3]

Our set of nine is not exhaustive, but most other datasets, e.g. CSAW (Kulkarni et al., 2009) and AQUAINT (Milne and Witten, 2008), annotate common concepts in addition to named entities. As we argue in Sec. 3.1, it is extremely difficult to define annotation guidelines for common concepts, and therefore they aren't suitable for evaluation. For clarity, this paper focuses on linking named entities. Similarly, we exclude datasets comprising Tweets and other short-length documents, since radically different techniques are needed for such specialized corpora.

Table 2 presents a list of recent EL publications showing the data sets that they use for evaluation. The sparsity of this table is striking: apparently no system has reported performance on data from all three of the major evaluation groups.

2.2 Knowledge Base

Existing benchmarks have also varied considerably in the knowledge base used for link targets. Wikipedia has been most commonly used (Milne and Witten, 2008; Ratinov et al., 2011; Cheng and Roth, 2013); however, datasets were annotated using different snapshots and subsets. Other KBs include Yago (Hoffart et al., 2011), Freebase (Sil and Yates, 2013), DBpedia (Mendes et al., 2011) and a subset of Wikipedia (Mayfield et al., 2012). Given that almost all KBs are descendants of Wikipedia, we use Wikipedia as the base KB in this work.[4]

[2] http://github.com/xiaoling/vinculum
[3] An online appendix containing details of the datasets is available at https://github.com/xiaoling/vinculum/raw/master/appendix.pdf.
[4] Since the knowledge bases for all the data sets were around 2011, we use Wikipedia dump 20110513.

Group     Data Set    # of Mentions  # of NILs  Entity Types         KB          Eval. Metric
UIUC      ACE         244            0          Any Wikipedia Topic  Wikipedia   BOC F1
          MSNBC       654            0          Any Wikipedia Topic  Wikipedia   BOC F1
AIDA      AIDA-dev    5917           1126       PER,ORG,LOC,MISC     Yago        Accuracy
          AIDA-test   5616           1131       PER,ORG,LOC,MISC     Yago        Accuracy
TAC KBP   TAC09       3904           2229       PER^T,ORG^T,GPE      TAC ⊂ Wiki  Accuracy
          TAC10       2250           1230       PER^T,ORG^T,GPE      TAC ⊂ Wiki  Accuracy
          TAC10T      1500           426        PER^T,ORG^T,GPE      TAC ⊂ Wiki  Accuracy
          TAC11       2250           1126       PER^T,ORG^T,GPE      TAC ⊂ Wiki  B^3+ F1
          TAC12       2226           1049       PER^T,ORG^T,GPE      TAC ⊂ Wiki  B^3+ F1

Table 1: Characteristics of the nine NEL data sets. Entity types: The AIDA data sets include named entities in four NER classes: Person (PER), Organization (ORG), Location (LOC) and Misc. In the TAC KBP data sets, both Person (PER^T) and Organization (ORG^T) entities are defined differently from their NER counterparts, and geo-political entities (GPE), different from LOC, exclude places like KB:Central California. KB (Sec. 2.2): The knowledge base used when each data set was being developed. Evaluation Metric (Sec. 2.3): Bag-of-Concept (BOC) F1 is used as the evaluation metric in (Ratinov et al., 2011; Cheng and Roth, 2013); B^3+ F1, used in TAC KBP, measures accuracy in terms of entity clusters, grouped by the mentions linked to the same entity.

Table 2: A sample of papers on entity linking with the data sets used in each paper (ordered chronologically): Cucerzan (2007), Milne and Witten (2008), Kulkarni et al. (2009), Ratinov et al. (2011), Hoffart et al. (2011), Han and Sun (2012), He et al. (2013a), He et al. (2013b), Cheng and Roth (2013), Sil and Yates (2013), Li et al. (2013), Cornolti et al. (2013), and the TAC-KBP participants; the data-set columns cover ACE, MSNBC, AQUAINT, CSAW, AIDA-test, and TAC09 through TAC12. [Matrix of per-paper 'x' marks omitted.] TAC-KBP proceedings comprise additional papers (McNamee and Dang, 2009; Ji et al., 2010; Mayfield et al., 2012). Our intention is not to exhaust related work but to illustrate how sparse evaluation impedes comparison.

NIL entities: In spite of Wikipedia's size, there are many real-world entities that are absent from the KB. When such a target is missing for a mention, the mention is said to link to a NIL entity (McNamee and Dang, 2009), aka an out-of-KB or unlinkable entity (Hoffart et al., 2014). In TAC KBP, in addition to determining whether a mention has no entity in the KB to link to, all the mentions that represent the same real-world entity must be clustered together. Since our focus is not on creating new entities for the KB, NIL clustering is beyond the scope of this paper; we only evaluate whether a mention with no suitable entity in the KB is predicted as NIL. The AIDA data sets similarly contain such NIL annotations, whereas ACE and MSNBC omit these mentions altogether.

2.3 Evaluation Metrics

While a variety of metrics have been used for evaluation, there is little agreement on which one to use. However, this detail is quite important, since the choice of metric strongly biases the results. We describe the most common metrics below.

Bag-of-Concept F1 (ACE, MSNBC): For each document, a gold bag of Wikipedia entities is evaluated against the bag of system output entities, requiring exact segmentation match. This metric may have its historical reason for comparison but is in fact flawed, since an annotation in which every mention is linked to the wrong entity can still obtain 100% F1, as long as the bag of predicted entities is the same as the gold bag.

Micro Accuracy (TAC09, TAC10, TAC10T): For a list of given mentions, the metric simply measures the percentage of correctly predicted links.
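The flaw in Bag-of-Concept F1 noted above is easy to demonstrate concretely. Below is a minimal sketch (function and variable names are ours, not from any evaluation toolkit) that scores per-document entity bags as sets; note that it awards a perfect score even when every individual mention is linked to the wrong entity, so long as the overall bag matches the gold bag.

```python
def bag_of_concept_f1(gold_bags, predicted_bags):
    """Bag-of-Concept F1: per document, compare the set of gold entities
    against the set of predicted entities, ignoring which mention
    produced which link."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_bags, predicted_bags):
        gold, pred = set(gold), set(pred)
        tp += len(gold & pred)   # entities in both bags
        fp += len(pred - gold)   # predicted but not gold
        fn += len(gold - pred)   # gold but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Pathological case: two mentions, each linked to the *other* one's
# gold entity. Every individual link is wrong, yet BOC F1 is 1.0.
gold = [["KB:JetBlue", "KB:Barnstable_Municipal_Airport"]]
pred = [["KB:Barnstable_Municipal_Airport", "KB:JetBlue"]]
print(bag_of_concept_f1(gold, pred))  # 1.0
```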

B^3+ F1 (TAC11, TAC12): The mentions that are predicted as NIL entities are required to be clustered according to their identities (NIL clustering). The overall data set is then evaluated using a cluster-based B^3+ F1.

NER-style F1 (AIDA): Similar to the official CoNLL NER F1 evaluation, a link is considered correct only if the mention matches the gold boundary and the linked entity is also correct. A wrong link with the correct boundary penalizes both precision and recall.

We note that Bag-of-Concept F1 is equivalent to the measure for the Concept-to-Wikipedia task proposed in (Cornolti et al., 2013), and NER-style F1 is the same as their strong annotation match. In the experiments, we use the official metrics for the TAC data sets and NER-style F1 for the rest.

3 No Annotation Guidelines

Not only do we lack a common data set for evaluation, but most prior researchers fail to even define the problem under study before developing algorithms. Often an overly general statement, such as annotating the mentions to "referent Wikipedia pages" or "corresponding entities", is used to describe which entity link is appropriate. This section shows that failure to establish a detailed annotation guideline causes a number of key inconsistencies between data sets. A few assumptions are subtly made in different papers, which makes direct comparisons unfair and hard to comprehend.

3.1 Entity Mentions: Common or Named?

Which entities deserve links? Some argue for restricting to named entities. Others argue that any phrase that can be linked to a Wikipedia entity adds value. Without a clear answer to this issue, any data set created will be problematic. It is not fair to penalize a NEL system for skipping a common noun phrase; nor would it be fair to lower the precision of a system that "incorrectly" links a common concept. However, we note that including mentions of common concepts is actually quite problematic, since the choice is highly subjective.

Example 1: In December 2008, Hoke was hired as the head football coach at San Diego State University. (Wikipedia)

At first glance, KB:American football seems the gold-standard link for "football". However, there is another entity, KB:College football, which is clearly also, if not more, appropriate. If one argues that KB:College football should be the right choice given the context, what if KB:College football did not exist in the KB? Should NIL be returned in this case? The question is unanswered.[5] For the rest of this paper, we focus on the (better defined) problem of solely linking named entities.[6] AQUAINT and CSAW are therefore not used for evaluation, due to a disproportionate number of common concept annotations.

3.2 How Specific Should Linked Entities Be?

It is important to resolve disagreement when more than one annotation is plausible. The TAC-KBP annotation guidelines (tac, 2012) specify that different iterations of the same organization (e.g. the KB:111th U.S. Congress and the KB:112th U.S. Congress) should not be considered as distinct entities. Unfortunately, this is not a common standard shared across the data sets, where often the most specific possible entity is preferred.

Example 2: Adams and Platt are both injured and will miss England's opening World Cup qualifier against Moldova on Sunday. (AIDA)

Here the mention "World Cup" is labeled as KB:1998 FIFA World Cup, a specific occurrence of the event KB:FIFA World Cup. It is indeed difficult to decide how specific the gold link should be. Given a static knowledge base, which is often incomplete, one cannot always find the most specific entity. For instance, there is no Wikipedia page for the KB:116th U.S. Congress, because that Congress has not been elected yet. On the other hand, using general concepts can cause trouble for machine reading. Consider president-of relation extraction on the following sentence.

Example 3: Joe Biden is the Senate President in the 113th United States Congress.

[5] Note that linking common noun phrases is closely related to Word Sense Disambiguation (Moro et al., 2014).
[6] We define named entity mention extensionally: any name uniquely referring to one entity of a predefined class, e.g. a specific person or location.

Failure to distinguish different Congress iterations would cause an information extraction system to falsely extract the fact that KB:Joe Biden is the Senate President of the KB:United States Congress at all times!

Figure 1: Entities divided by their types. For named entities, the solid squares represent the 4 CoNLL (AIDA) classes (Person, Location, Organization, Misc.); the red dashed squares display the 3 TAC classes (TAC Person, TAC Organization, TAC GPE); the shaded rectangle depicts common concepts (e.g. Brain_Tumor, Water, Desk).

3.3 Metonymy

Another situation in which more than one annotation is plausible is metonymy, a way of referring to an entity not by its own name but rather by the name of some other entity it is associated with. A common example is referring to a country's government by its capital city.

Example 4: Moscow's as yet undisclosed proposals on Chechnya's political future have, meanwhile, been sent back to do the rounds of various government departments. (AIDA)

The mention here, "Moscow", is labeled as KB:Government of Russia in AIDA. If this sentence were annotated in TAC-KBP, it would have been labeled as KB:Moscow (the city) instead. Even the country KB:Russia seems to be a valid label. However, neither the city nor the country can actually make a proposal. The real entity in play is KB:Government of Russia.

3.4 Named Entities, But of What Types?

Even in the data sets consisting solely of named entities, the types of the entities vary and therefore the data distributions differ. TAC-KBP has a clear definition of what types of entities require links, namely Person, Organization and Geo-political entities. AIDA, which adopted the NER data set from the CoNLL shared task, includes entities from 4 classes: Person, Organization, Location and Misc.[7] Compared to the AIDA entity types, TAC-KBP is obviously more restrictive, since it does not have Misc. entities (e.g. KB:FIFA World Cup). Moreover, TAC entities do not include fictional characters or organizations, such as KB:Sherlock Holmes. TAC GPEs include some geographical regions, such as KB:France, but exclude those without governments, such as KB:Central California, and locations such as KB:Murrayfield Stadium.[8] Figure 1 summarizes the substantial differences between the two type sets.

3.5 Can Mention Boundaries Overlap?

We often see one entity mention nested in another. For instance, a U.S. city is often followed by its state, such as "Portland, Oregon". One can split the whole mention into individual ones, "Portland" for the city and "Oregon" for the city's state. AIDA adopts this segmentation. However, annotations in an early TAC-KBP dataset (2009) select the whole span as the mention. We argue that all three mentions make sense. In fact, knowing the structure of the mention would facilitate disambiguation (i.e. the state name provides enough context to uniquely identify the city entity). Besides the mention segmentation, the links for the nested entities may also be ambiguous.

Example 5: Dorothy Byrne, a state coordinator for the Florida Green Party, said she had been inundated with angry phone calls and e-mails from Democrats, but has yet to receive one regretful note from a Nader voter.

The gold annotation from ACE is KB:Green Party of Florida, even though the mention doesn't contain "Florida" and can arguably be linked to KB:US Green Party.

4 A Simple & Modular Linking Method

In this section, we present VINCULUM, a simple, unsupervised EL system that performs comparably to the state of the art. As input, VINCULUM takes a plain-text document d and outputs a set of segmented mentions with their associated entities, A_d = {(m_i, l_i)}. VINCULUM begins with mention extraction. For each identified mention m, candidate entities C_m = {c_j} are generated for linking.

[7] http://www.cnts.ua.ac.be/conll2003/ner/annotation.txt
[8] http://nlp.cs.rpi.edu/kbp/2014/elquery.pdf
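The overall control flow of such a pipeline — extract mentions, generate candidates C_m, then score each candidate and keep the argmax — can be sketched as follows. This is only a skeleton under our own naming; the stage-specific scoring that VINCULUM actually uses is described in the following subsections, and the three callables here are stand-ins.

```python
def link_document(doc, extract_mentions, generate_candidates, score):
    """Return A_d = [(m_i, l_i)]: each mention paired with its best entity.

    extract_mentions(doc) -> list of mention strings
    generate_candidates(m) -> list of candidate entities C_m
    score(c, m, doc)       -> linking score s(c | m, d)
    """
    links = []
    for m in extract_mentions(doc):
        candidates = generate_candidates(m)
        if not candidates:
            links.append((m, "NIL"))  # no KB entity available for this mention
            continue
        best = max(candidates, key=lambda c: score(c, m, doc))
        links.append((m, best))
    return links

# Toy usage with hard-coded stand-ins for the three stages:
result = link_document(
    "JetBlue begins direct service ...",
    lambda d: ["JetBlue", "Foo"],
    lambda m: {"JetBlue": ["KB:JetBlue"]}.get(m, []),
    lambda c, m, d: 1.0,
)
print(result)  # [('JetBlue', 'KB:JetBlue'), ('Foo', 'NIL')]
```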

VINCULUM assigns each candidate a linking score s(c_j | m, d) based on the entity type compatibility, its coreferent mentions, and the other entity links around this mention. The candidate entity with the maximum score, i.e. l = arg max_{c ∈ C_m} s(c | m, d), is picked as the predicted link of m.

Figure 2: The process of finding the best entity for a mention. All possible entities are sifted through as VINCULUM proceeds at each stage with a widening range of context in consideration, from less to more context: candidate generation uses the mention phrase, entity typing the sentence, coreference the document, and coherence world knowledge, narrowing the candidates down to the one most likely entity.

Figure 2 illustrates the linking pipeline that follows mention extraction. For each mention, VINCULUM ranks the candidates at each stage based on an ever-widening context. For example, candidate generation (Section 4.2) merely uses the mention string, entity typing (Section 4.3) uses the sentence, while coreference (Section 4.4) and coherence (Section 4.5) use the full document and the Web, respectively. Our pipeline mimics the sieve structure introduced in (Lee et al., 2013), but instead of merging coreference clusters, we adjust the probabilities of candidate entities at each stage. The modularity of VINCULUM enables us to study the relative impact of its subcomponents.

4.1 Mention Extraction

The first step of EL extracts potential mentions from the document. Since VINCULUM restricts attention to named entities, we use a Named Entity Recognition (NER) system (Finkel et al., 2005). Alternatively, an NP chunker may be used to identify the mentions.

4.2 Dictionary-based Candidate Generation

While in theory a mention could link to any entity in the KB, in practice one sacrifices little by restricting attention to a subset (dozens) precompiled using a dictionary. A common way to build such a dictionary D is by crawling Web pages and aggregating anchor links that point to Wikipedia pages. The frequency with which a mention (anchor text), m, links to a particular entity (anchor link), c, allows one to estimate the conditional probability p(c | m). We adopt the CrossWikis dictionary, which was computed from a Google crawl of the Web (Spitkovsky and Chang, 2012). The dictionary contains more than 175 million unique strings with the entities they may represent. In the literature, the dictionary is often instead built from the anchor links within the Wikipedia website (e.g., (Ratinov et al., 2011; Hoffart et al., 2011)).

In addition, we employ two small but precise dictionaries, for U.S. state abbreviations and demonyms, used when the mention satisfies certain conditions. For U.S. state abbreviations, a comma before the mention is required. For demonyms, we ensure that the mention is either an adjective or a plural noun.

4.3 Incorporating Entity Types

For an ambiguous mention such as "Washington", knowing that the mention denotes a person allows an EL system to promote KB:George Washington while lowering the rank of the capital city in the candidate list. We incorporate this intuition by combining it probabilistically with the CrossWikis prior:

    p(c | m, s) = Σ_{t ∈ T} p(c, t | m, s) = Σ_{t ∈ T} p(t | m, s) · p(c | m, t, s),

where s denotes the sentence containing the mention m and T represents the set of all possible types. We assume the candidate c and the sentential context s are conditionally independent given both the mention m and its type t. In other words, p(c | m, t, s) = p(c | m, t), the right-hand side of which can be estimated by renormalizing p(c | m) w.r.t. type t:

    p(c | m, t) = p(c | m) / Σ_{c' ↦ t} p(c' | m),

where c ↦ t indicates that t is one of c's entity types.[9] The other part of the equation, p(t | m, s), can be estimated by any off-the-shelf Named Entity Recognition system, e.g. Finkel et al. (2005) or Ling and Weld (2012).

4.4 Coreference

It is common for entities to be mentioned more than once in a document. Since some mentions are less ambiguous than others, it makes sense to use the most representative mention for linking.

[9] We notice that an entity often has multiple appropriate types, e.g. a school can be either an organization or a location depending on the context. We use Freebase to provide the entity types and map them appropriately to the target type set.
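The type-conditioned renormalization of Section 4.3 can be made concrete with a short sketch. The priors, type assignments, and the NER posterior below are illustrative toy numbers (not real CrossWikis or NER outputs), and the function names are ours.

```python
def renormalize_by_type(prior, entity_types, t):
    """p(c | m, t): restrict the prior p(c | m) to candidates c with c -> t
    and renormalize so the restricted distribution sums to 1."""
    mass = sum(p for c, p in prior.items() if t in entity_types.get(c, ()))
    if mass == 0.0:
        return {}
    return {c: p / mass for c, p in prior.items()
            if t in entity_types.get(c, ())}

def type_aware_score(prior, entity_types, type_posterior):
    """p(c | m, s) = sum_t p(t | m, s) * p(c | m, t)."""
    score = {c: 0.0 for c in prior}
    for t, pt in type_posterior.items():
        for c, pct in renormalize_by_type(prior, entity_types, t).items():
            score[c] += pt * pct
    return score

# Illustrative numbers for the mention "Washington":
prior = {"KB:Washington,_D.C.": 0.6, "KB:George_Washington": 0.3,
         "KB:Washington_(state)": 0.1}
entity_types = {"KB:Washington,_D.C.": {"LOC"},
                "KB:George_Washington": {"PER"},
                "KB:Washington_(state)": {"LOC"}}
# If NER says the mention is almost surely a person, the person entity
# overtakes the higher-prior city in the combined score.
print(type_aware_score(prior, entity_types, {"PER": 0.9, "LOC": 0.1}))
```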

To this end, VINCULUM applies a coreference resolution system (e.g. Lee et al. (2013)) to cluster coreferent mentions; the representative mention of each cluster is chosen for linking. While there are more sophisticated ways to integrate EL and coreference (Hajishirzi et al., 2013), VINCULUM's pipeline is simple and modular.

4.5 Coherence

When KB:Barack Obama appears in a document, it is more likely that the mention "Washington" represents the capital KB:Washington, D.C., as the two entities are semantically related, and hence the joint assignment is coherent. A number of researchers have found the inclusion of some version of coherence beneficial for EL (Cucerzan, 2007; Milne and Witten, 2008; Ratinov et al., 2011; Hoffart et al., 2011; Cheng and Roth, 2013). To incorporate it in VINCULUM, we seek a document-wise assignment of entity links that maximizes the sum of the coherence scores between each pair of entity links predicted in the document d, i.e.

    Σ_{1 ≤ i < j ≤ |M_d|} φ(l_{m_i}, l_{m_j});

the final score of a candidate is the sum of its coherence with the other predicted links in the document and its type compatibility p(c | m, s).

Two coherence measures have been found to be useful: Normalized Google Distance (NGD) (Milne and Witten, 2008; Ratinov et al., 2011) and a relational score (Cheng and Roth, 2013). NGD-based coherence between two entities c_i and c_j is defined over the link structure between Wikipedia articles as follows:

    φ_NGD(c_i, c_j) = 1 − [log(max(|L_i|, |L_j|)) − log(|L_i ∩ L_j|)] / [log(W) − log(min(|L_i|, |L_j|))],

where L_i and L_j are the incoming (or outgoing) links in the Wikipedia articles for c_i and c_j respectively, and W is the total number of entities in Wikipedia. The relational score between two entities is a binary indicator of whether a relation exists between them. We use Freebase as the source of the relation triples F = {(sub, rel, obj)}. Relational coherence φ_REL is thus defined as

    φ_REL(e_i, e_j) = 1 if ∃r such that (e_i, r, e_j) ∈ F or (e_j, r, e_i) ∈ F, and 0 otherwise.
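The NGD-based coherence can be computed directly from inlink sets. A sketch (toy link sets; the handling of disjoint link sets as zero coherence is our assumption, since the formula's logarithm is undefined there):

```python
import math

def ngd_coherence(links_i, links_j, W):
    """phi_NGD(c_i, c_j) = 1 - (log max(|L_i|,|L_j|) - log |L_i ∩ L_j|)
                               / (log W - log min(|L_i|,|L_j|)),
    where links_i, links_j are the sets of pages linking to c_i and c_j,
    and W is the total number of entities in Wikipedia."""
    inter = len(links_i & links_j)
    if inter == 0:
        return 0.0  # assumption: disjoint link sets -> minimal coherence
    hi = max(len(links_i), len(links_j))
    lo = min(len(links_i), len(links_j))
    return 1.0 - (math.log(hi) - math.log(inter)) / (math.log(W) - math.log(lo))

# Toy example: two entities sharing half of their 100 inlinks, in a KB of
# four million entities, come out highly coherent.
a, b = set(range(100)), set(range(50, 150))
print(ngd_coherence(a, b, 4_000_000))
```

Entities with identical link sets score exactly 1.0, and coherence decays as the overlap shrinks relative to the link-set sizes.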

5 Experiments

5.1 Mention Extraction

NER alone is first used to detect mentions. Some of the missing mentions are noun phrases without capitalization, a well-known limitation of automated extractors. To recover them, we experiment with an NP chunker[12] (NP) and a deterministic noun phrase extractor based on parse trees (DP). Although we expect them to introduce spurious mentions, the purpose is to estimate an upper bound for mention recall.

Table 3: Performance (%; R: Recall, P: Precision) of the correct mentions using different mention extraction strategies (rows: NER, +NP, +DP, +NP+DP; columns: ACE, MSNBC, AIDA-dev, AIDA-test). ACE and MSNBC only annotate a subset of all the mentions, and therefore the absolute values of precision are largely underestimated. [Cell values omitted.]

The results confirm the intuition: both methods improve recall, but the effect on precision is prohibitive. Therefore, we only use NER in subsequent experiments. Note that the recall of mention extraction is an upper bound on the recall of end-to-end predictions.

5.2 Candidate Generation

In this section, we inspect the performance of candidate generation. We compare CrossWikis with an intra-Wikipedia dictionary[13] and the Freebase Search API[14]. Each candidate generation component takes a mention string as input and returns an ordered list of candidate entities representing the mention. The candidates produced by CrossWikis and the intra-Wikipedia dictionary are ordered by their conditional probabilities given the mention string. The Freebase API provides scores for the entities using a combination of text similarity and an in-house entity relevance score. We compute candidates for the union of all the non-NIL mentions from all 9 data sets and measure their efficacy by recall@k.

Figure 3: Recall@k on an aggregate of nine data sets, comparing three candidate generation methods (CrossWikis, an intra-Wikipedia dictionary, and Freebase Search).

From Figure 3, it is clear that CrossWikis outperforms both the intra-Wikipedia dictionary and the Freebase Search API for almost all k. The intra-Wikipedia dictionary is on a par with CrossWikis at k = 1 but in general has lower coverage of the gold candidates compared to CrossWikis.[15] The Freebase API offers better coverage than the intra-Wikipedia dictionary but is less efficient than CrossWikis; in other words, the Freebase API needs a larger cut-off value to include the gold entity in the candidate set.

Figure 4: Recall@k using CrossWikis for candidate generation, split by data set; k = 30 is chosen as the cut-off value in consideration of both efficiency and accuracy.

Using CrossWikis for candidate generation, we plot the recall@k curves per data set (Figure 4). To our surprise, in most data sets CrossWikis alone can achieve more than 70% recall@1. The only exceptions are TAC11 and TAC12, because the organizers intentionally selected mentions that are highly ambiguous, such as "ABC", and/or incomplete, such as "Brown". For efficiency, we set a cut-off threshold at 30 (>80% recall for all but one data set). Note that CrossWikis itself can be used as a context-insensitive EL system by looking up the mention string and predicting the entity with the highest conditional probability. The second row in Table 4 presents the results of this simple baseline: CrossWikis alone, using only the mention string, has fairly reasonable performance.

[12] OpenNLP NP Chunker: opennlp.apache.org
[13] Adopted from AIDA (Hoffart et al., 2011).
[14] https://www.googleapis.com/freebase/v1/search, restricted to no more than 220 candidates per query.
[15] We also compared to another intra-Wikipedia dictionary (Table 3 in (Ratinov et al., 2011)). A recall of 86.85% and 88.67% is reported for ACE and MSNBC, respectively, at a cut-off level of 20; CrossWikis has a recall of 90.1% and 93.3% at the same cut-off.
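The recall@k measure used throughout Section 5.2 is straightforward to compute. A minimal sketch (names and toy data are ours), where each mention contributes its ranked candidate list and its gold entity:

```python
def recall_at_k(candidate_lists, gold, k):
    """Fraction of mentions whose gold entity appears among the
    top-k ranked candidates."""
    hits = sum(1 for cands, g in zip(candidate_lists, gold) if g in cands[:k])
    return hits / len(gold)

# Toy example: 3 mentions; the gold entity is ranked 1st, 3rd, and absent.
cands = [["A", "B"], ["C", "D", "E"], ["F"]]
gold = ["A", "E", "Z"]
print(recall_at_k(cands, gold, 1))  # 0.333... (1/3)
print(recall_at_k(cands, gold, 3))  # 0.666... (2/3)
```

Sweeping k from 1 upward over such lists yields exactly the curves of Figures 3 and 4, and the chosen cut-off (k = 30) is the point where the curve for most data sets exceeds 80%.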

5.3 Incorporating Entity Types

Here we investigate the impact of entity types on linking performance. The most obvious choice is the traditional NER types (T_NER = {PER, ORG, LOC, MISC}). To predict the types of the mentions, we run Stanford NER (Finkel et al., 2005) and set the predicted type t_m of each mention m to have probability 1 (i.e. p(t_m | m, s) = 1). As to the types of the entities, we map their Freebase types to the four NER types. (Footnote 16: The Freebase types "/person/*" are mapped to PER, "/location/*" to LOC, "/organization/*" plus a few others like "/sports/sports_team" to ORG, and the rest to MISC.)

A more appropriate choice is the set of 112 fine-grained entity types introduced by Ling and Weld (2012) in FIGER, a publicly available package. (Footnote 17: http://github.com/xiaoling/figer) These fine-grained types are not disjoint, i.e. each mention is allowed to have more than one type. For each mention m, FIGER returns a set of types, each of which is accompanied by a score: t_FIGER(m) = {(t_j, g_j) : t_j ∈ T_FIGER}. A softmax function is used to probabilistically interpret the results as follows:

    p(t_j | m, s) = (1/Z) exp(g_j)  if (t_j, g_j) ∈ t_FIGER(m), and 0 otherwise,

where Z = Σ_{(t_k, g_k) ∈ t_FIGER(m)} exp(g_k).

We evaluate the utility of entity types in Table 4, which shows that using NER typically worsens the performance. This drop may be attributed to the rigid binary values for type incorporation; it is hard to output probabilities for the entity types of a mention given the chain model adopted in Stanford NER. We also notice that FIGER types consistently improve the results across the data sets, indicating that a finer-grained type set may be more suitable for the entity linking task.

To further confirm this assertion, we simulate the scenario where the gold types are provided for each mention (the oracle types of its gold entity). The performance is significantly boosted with the assistance of the gold types, which suggests that a better performing NER/FIGER system can further improve performance. Similarly, we notice that the results using FIGER types almost consistently outperform the ones using NER types. This observation endorses our earlier recommendation of using fine-grained types for EL tasks.

Approach         TAC09  TAC10  TAC10T  TAC11  TAC12  AIDA-dev  AIDA-test  ACE   MSNBC
CrossWikis only  80.4   85.6   86.9    78.5   62.4   62.6      60.4       87.7  70.3
+NER             79.2   85.1   83.3    76.6   61.1   66.4      66.2       77.0  71.8
+FIGER           81.0   86.1   86.9    78.8   63.5   66.7      64.6       87.7  75.4
+NER(GOLD)       87.4   85.7   88.0    80.1   66.7   72.6      72.0       89.3  83.3
+FIGER(GOLD)     89.0   87.4   88.8    81.6   66.1   76.2      76.5       91.8  84.1

Table 4: Performance (%) after incorporating entity types, comparing two sets of entity types (NER and FIGER). Using a set of fine-grained entity types (FIGER) generally achieves better results.

5.4 Coherence

Two coherence measures suggested in Section 4.5 are tested in isolation to better understand their effects on linking performance (Table 5). In general, the link-based NGD works slightly better than the relational facts in 6 out of 9 data sets (comparing row "+NGD" with row "+REL"). We hypothesize that the inferior results of REL may be due to the incompleteness of Freebase triples, which makes it less robust than NGD. We also combine the two by taking the average score, which in most data sets performs the best ("+BOTH"), indicating that the two measures provide complementary sources of information.

5.5 Overall Performance

To answer the last question of how well VINCULUM performs overall, we conduct an end-to-end comparison against two publicly available systems with leading performance. (Footnote 18: We are also aware of other systems, such as TagMe-2 (Ferragina and Scaiella, 2012), DBpedia Spotlight (Mendes et al., 2011), and WikipediaMiner (Milne and Witten, 2008). A trial test on the AIDA data set shows that both AIDA and WIKIFIER top the performance of the other systems reported in (Cornolti et al., 2013), and therefore it is sufficient to compare with these two systems in the evaluation.)

AIDA (Hoffart et al., 2011): We use the recommended GRAPH variant of the AIDA package (Version 2.0.4) and are able to replicate their results when gold-standard mentions are given.
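The softmax over FIGER scores in Section 5.3 can be sketched directly. A minimal illustration in Python; the type names and scores below are invented for the example, not actual FIGER output:

```python
import math

def type_probabilities(figer_output):
    """Convert FIGER's scored types [(type, score), ...] into p(t | m, s) via softmax.

    Types not returned by FIGER for this mention implicitly get probability 0.
    """
    z = sum(math.exp(g) for _, g in figer_output)  # partition function Z
    return {t: math.exp(g) / z for t, g in figer_output}

# Hypothetical FIGER scores for a mention like "JFK International".
scores = [("/transit/airport", 2.0), ("/location", 0.5), ("/person", -1.0)]
probs = type_probabilities(scores)
assert abs(sum(probs.values()) - 1.0) < 1e-9
assert probs["/transit/airport"] > probs["/location"] > probs["/person"]
```

Because the scores are exponentiated and renormalized, the highest-scoring type keeps most of the mass while lower-scoring types retain small nonzero probabilities, which is what lets the linker trade off multiple non-disjoint types per mention.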

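The "+BOTH" combination evaluated in Section 5.4 is just an average of the link-based (NGD) and relational (REL) coherence scores. A minimal sketch, assuming a Milne-Witten-style normalized-distance relatedness over entity inlink sets; the paper's exact NGD variant is defined in its Section 4.5 (not shown here) and may differ in detail:

```python
import math

def ngd_similarity(inlinks_a, inlinks_b, total_pages):
    """Link-based relatedness derived from a Normalized Google Distance
    over inlink sets (a common formulation; assumed here, not quoted
    from the paper)."""
    a, b = len(inlinks_a), len(inlinks_b)
    shared = len(inlinks_a & inlinks_b)
    if shared == 0:
        return 0.0
    ngd = ((math.log(max(a, b)) - math.log(shared)) /
           (math.log(total_pages) - math.log(min(a, b))))
    return max(0.0, 1.0 - ngd)

def combined_score(ngd_score, rel_score):
    """The "+BOTH" setting: average the link-based and relational scores."""
    return 0.5 * (ngd_score + rel_score)

# Toy inlink sets: entities sharing many inlinks score as more coherent.
jfk = {"p1", "p2", "p3", "p4"}
jetblue = {"p2", "p3", "p4", "p5"}
unrelated = {"p9"}
assert ngd_similarity(jfk, jetblue, 1000) > ngd_similarity(jfk, unrelated, 1000)
```

Averaging two bounded scores is the simplest way to let one measure compensate where the other is weak, which matches the observation that NGD and REL carry complementary information.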
WIKIFIER (Cheng and Roth, 2013): We are able to reproduce the reported results on ACE and MSNBC and obtain a close enough B³+ F1 number on TAC11 (82.4% vs 83.7%). Since WIKIFIER overgenerates mentions and produces links for common concepts, we restrict its output on the AIDA data to the mentions that Stanford NER predicts.

Approach  TAC09  TAC10  TAC10T  TAC11  TAC12  AIDA-dev  AIDA-test  ACE   MSNBC
no COH    80.9   86.2   87.7    78.6   59.9   68.9      66.3       87.0  86.6
+NGD      81.8   86.8   85.7    79.7   63.2   69.5      67.7       88.1  86.8
+REL      81.2   86.3   87.0    79.3   63.1   69.1      66.4       88.5  86.1
+BOTH     81.4   86.8   87.0    79.9   63.7   69.4      67.5       88.5  86.9

Table 5: Performance (%) after re-ranking candidates using coherence scores, comparing two coherence measures (NGD and REL). "no COH": no coherence-based re-ranking is used. "+BOTH": an average of the two scores is used for re-ranking. Coherence in general helps: a combination of both measures often achieves the best effect, and NGD has a slight advantage over REL.

Approach                TAC09  TAC10  TAC10T  TAC11  TAC12  AIDA-dev  AIDA-test  ACE   MSNBC  Overall
CrossWikis              80.4   85.6   86.9    78.5   62.4   62.6      60.4       87.7  70.3   75.0
+FIGER                  81.0   86.1   86.9    78.8   63.5   66.7      64.5       87.7  75.4   76.7
+Coref                  80.9   86.2   87.7    78.6   59.9   68.9      66.3       87.0  86.6   78.0
+Coherence (=VINCULUM)  81.4   86.8   87.0    79.9   63.7   69.4      67.5       88.5  86.9   79.0
AIDA                    73.2   78.6   77.5    68.4   52.0   71.9      74.8       77.8  75.4   72.2
WIKIFIER                72.1   86.2   86.3    82.4   64.7   79.7      69.8       85.1  90.1   79.6

Table 6: End-to-end performance (%): We compare VINCULUM in its different stages with two state-of-the-art systems, AIDA and WIKIFIER. The column "Overall" lists the average performance over the nine data sets for each approach. CrossWikis appears to be a strong baseline. VINCULUM is 0.6% shy of WIKIFIER, each winning in four data sets; AIDA tops both VINCULUM and WIKIFIER on AIDA-test.

Table 6 shows the performance of VINCULUM after each stage of candidate generation (CrossWikis), entity type prediction (+FIGER), coreference (+Coref), and coherence (+Coherence). The column "Overall" displays the average of the performance numbers over the nine data sets for each approach. WIKIFIER achieves the highest overall performance. VINCULUM performs quite comparably, only 0.6% shy of WIKIFIER, despite its simplicity and unsupervised nature. Looking at the performance per data set, VINCULUM and WIKIFIER are each superior in 4 out of 9 data sets, while AIDA tops the performance only on AIDA-test. The performance of all the systems on TAC12 is generally lower than on the other data sets, mainly because of a low recall in the candidate generation stage.

We notice that even using CrossWikis alone works pretty well, indicating a strong baseline for future comparisons. The entity type prediction provides the highest boost in performance among the subcomponents, an absolute 1.7% increase. The coreference stage and the coherence stage also give a reasonable lift.

In terms of running time, VINCULUM runs reasonably fast. For a document with 20-40 entity mentions on average, VINCULUM takes only a few seconds to finish the linking process on a single thread.

5.6 System Analysis

We outline the differences between the three system architectures in Table 7. For identifying mentions to link, both VINCULUM and AIDA rely solely on NER-detected mentions, while WIKIFIER additionally includes common noun phrases and trains a classifier to determine whether a mention should be linked. For candidate generation, CrossWikis provides better coverage of entity mentions. For example, in Figure 3, we observe a recall of 93.2% at a cut-off of 30 by CrossWikis, outperforming the 90.7% of AIDA's dictionary. Further, Hoffart et al. (2011) report a precision of 65.84% using gold mentions on AIDA-test, while CrossWikis achieves a higher precision at 69.24%. Both AIDA and WIKIFIER use coarse NER types as features, while VINCULUM incorporates fine-grained types that lead to dramatically improved performance, as shown in Section 5.3. The differences in Coreference and Coherence are not crucial to performance, as they each provide relatively small gains. Finally, VINCULUM is an unsupervised system, whereas AIDA and WIKIFIER are trained on labeled data. Reliance on labeled data can often hurt performance in the form of overfitting and/or inconsistent annotation guidelines; AIDA's lower performance on the TAC datasets, for instance, may be caused by the different data/label distribution of its training data from the other datasets (e.g. CoNLL-2003 contains many scoreboard reports without complete sentences, and more specific entities as annotations for metonymic mentions).

Component             VINCULUM                                 AIDA                           WIKIFIER
Mention Extraction    NER                                      NER                            NER, noun phrases
Candidate Generation  CrossWikis                               an intra-Wikipedia dictionary  an intra-Wikipedia dictionary
Entity Types          FIGER                                    NER                            NER
Coreference           find the representative mention          -                              re-rank the candidates
Coherence             link-based similarity, relation triples  link-based similarity          link-based similarity, relation triples
Learning              unsupervised                             trained on AIDA                trained on a Wikipedia sample

Table 7: Comparison of entity linking pipeline architectures. VINCULUM components are described in detail in Section 4 and correspond to Figure 2. Components found to be most useful for VINCULUM are highlighted.

We analyze the errors made by VINCULUM and categorize them into six classes (Table 8). "Metonymy" consists of the errors where the mention is metonymic but the prediction links to its literal name. The errors in "Wrong Entity Types" are mainly due to a failure to recognize the correct entity type of the mention; in Table 8's example, the link would have been right if FIGER had correctly predicted the airport type. Mistakes by the coreference system often propagate and lead to the errors under the "Coreference" category. The "Context" category indicates a failure of the linking system to take into account general contextual information other than the fore-mentioned categories. "Specific Labels" refers to the errors where the gold label is a specific instance of a general entity, including instances where the prediction is the parent company of the gold entity or where the gold label is the township whereas the prediction is the city that corresponds to the township. "Misc" accounts for the rest of the errors; in its example, the location name appearing in the byline of a news article is usually a city name, and VINCULUM, without knowledge of this convention, mistakenly links to a state with the same name.

The distribution of errors shown in Table 9 provides valuable insights into VINCULUM's varying performance across the nine datasets. First, we observe a notably high percentage of metonymy-related errors. Since many of these errors are caused by incorrect type prediction by FIGER, improvements in type prediction for metonymic mentions can provide substantial gains in the future. The especially high percentage of metonymic mentions in the AIDA datasets thus explains VINCULUM's lower performance there (see Table 6).

Second, we note that VINCULUM makes quite a number of "Context" errors on the TAC11 and TAC12 datasets. One possible reason is that highly ambiguous mentions have been intentionally selected, and link-based similarity and relational triples are insufficient for capturing the context. For example, in "... while returning from Freeport to Portland. (TAC)", the mention "Freeport" is unbounded by the state; one needs to know that it is more likely to have both "Freeport" and "Portland" in the same state (i.e. Maine) to make a correct prediction. (Footnote 19: e.g. Cucerzan (2012) uses geo-coordinates as features.) Another reason may be TAC's higher percentage of Web documents; since contextual information is more scattered in Web text than in newswire documents, this increases the difficulty of context modeling. We leave a more sophisticated context model for future work (Chisholm and Hachey, 2015; Singh et al., 2012).

Since "Specific Labels", "Metonymy", and "Wrong Entity Types" correspond to the annotation issues discussed in Sections 3.2, 3.3, and 3.4, the distribution of errors is also useful in studying annotation inconsistencies. The fact that the errors vary considerably across the datasets (for instance, VINCULUM makes many more "Specific Labels" mistakes in ACE and MSNBC) strongly suggests that annotation guidelines have a considerable impact on the final performance. We also observe that annotation inconsistencies can cause reasonable predictions to be treated as mistakes.
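The per-dataset frequencies reported in Table 9 are simple tallies over the manually categorized errors. A minimal sketch; the category labels are from the paper but the sample below is made up:

```python
from collections import Counter

def error_distribution(categorized_errors):
    """Map a list of error-category labels to percentage frequencies."""
    counts = Counter(categorized_errors)
    total = len(categorized_errors)
    return {cat: 100.0 * n / total for cat, n in counts.items()}

# Hypothetical sample of 20 examined errors for one dataset.
errors = (["Metonymy"] * 4 + ["Context"] * 3 +
          ["Specific Labels"] * 5 + ["Misc"] * 8)
dist = error_distribution(errors)
assert dist["Metonymy"] == 20.0 and dist["Specific Labels"] == 25.0
```

Since each examined error receives exactly one label, the percentages in every dataset column sum to 100.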

For example, for "..., Alabama offered the job to Rich Rodriguez, but he decided to stay at West Virginia. (MSNBC)", AIDA predicts KB:West Virginia Mountaineers football, but the gold label is KB:West Virginia University.

Category            Example                                                                  Prediction             Gold Label
Metonymy            "South Africa managed to avoid a fifth successive defeat in 1996         South Africa           South Africa national rugby union team
                    at the hands of the All Blacks ..."
Wrong Entity Types  "Instead of Los Angeles International, for example, consider flying      Burbank, California    Bob Hope Airport
                    into Burbank or John Wayne Airport ..."
Coreference         "It is about his mysterious father, Barack Hussein Obama, an             Barack Obama           Barack Obama Sr.
                    imperious if alluring voice gone distant and then missing."
Context             "Scott Walker removed himself from the race, but Green never really      Scott Walker (singer)  Scott Walker (politician)
                    stirred the passions of former Walker supporters, nor did he garner
                    outsized support 'outstate'."
Specific Labels     "What we like would be Seles, (Olympic champion Lindsay) Davenport       Olympic Games          1996 Summer Olympics
                    and Mary Joe Fernandez."
Misc                "NEW YORK 1996-12-07"                                                    New York               New York City

Table 8: We divide linking errors into six error categories and provide an example for each class.

Error Category        TAC09  TAC10  TAC10T  TAC11  TAC12  AIDA-dev  AIDA-test  ACE    MSNBC
Metonymy              16.7%  20.0%  3.3%    0.0%   0.0%   60.0%     60.0%      0.0%   5.3%
Wrong Entity Types    13.3%  5.0%   20.0%   6.7%   10.0%  10.0%     6.7%       23.3%  31.6%
Coreference           30.0%  20.0%  20.0%   3.3%   0.0%   6.7%      6.7%       0.0%   0.0%
Context               30.0%  15.0%  26.7%   70.0%  70.0%  13.3%     16.7%      26.7%  15.8%
Specific Labels       6.7%   25.0%  16.7%   10.0%  3.3%   3.3%      3.3%       36.7%  36.9%
Misc                  3.3%   15.0%  13.3%   10.0%  16.7%  6.7%      6.7%       13.3%  10.5%
# of examined errors  30     20     30      30     30     30        30         30     19

Table 9: Error analysis: We analyze a random sample of 250 of VINCULUM's errors, categorize the errors into six classes, and display the frequencies of each type across the nine datasets.

6 Related Work

Most related work has been discussed in the earlier sections; see Shen et al. (2014) for an EL survey. Two other papers deserve comparison. Cornolti et al. (2013) present a variety of evaluation measures and experimental results on five systems compared head-to-head. In a similar spirit, Hachey et al. (2014) provide an easy-to-use evaluation toolkit on the AIDA data set. In contrast, our analysis focuses on the problem definition and annotations, revealing the lack of a consistent evaluation and of a clear annotation guideline. We also show an extensive set of experimental results conducted on nine data sets, as well as a detailed ablation analysis to assess each subcomponent of a linking system.

7 Conclusion and Future Work

Despite recent progress in Entity Linking, the community has had little success in reaching an agreement on annotation guidelines or building a standard benchmark for evaluation. When complex EL systems are introduced, there are limited ablation studies for readers to interpret the results. In this paper, we examine 9 EL data sets and discuss the inconsistencies among them. To foster a better understanding of an EL system, we implement a simple yet effective, unsupervised system, VINCULUM, and conduct extensive ablation tests to measure the relative impact of each component. From the experimental results, we show that a strong candidate generation component (CrossWikis) leads to a surprisingly good result; that using fine-grained entity types helps filter out incorrect links; and, finally, that a simple unsupervised system like VINCULUM can achieve performance comparable to existing machine-learned linking systems and is therefore suitable as a strong baseline for future research.

There are several directions for future work. We hope to catalyze agreement on a more precise EL annotation guideline that resolves the issues discussed in Section 3. We would also like to use crowdsourcing (Bragg et al., 2014) to collect a large set of these annotations for subsequent evaluation. Finally, we hope to design a joint model that avoids cascading errors from the current pipeline (Wick et al., 2013; Durrett and Klein, 2014).

Acknowledgements

The authors thank Luke Zettlemoyer, Tony Fader, Kenton Lee, and Mark Yatskar for constructive suggestions on an early draft, and all members of the LoudLab group and the LIL group for helpful discussions. We also thank the action editor and the anonymous reviewers for valuable comments. This work is supported in part by the Air Force Research Laboratory (AFRL) under prime contract no. FA8750-13-2-0019, an ONR grant N00014-12-1-0211, a WRF/TJ Cable Professorship, a gift from Google, an ARO grant number W911NF-13-1-0246, and by TerraSwarm, one of six centers of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the view of DARPA, AFRL, or the US government.

References

Jonathan Bragg, Andrey Kolobov, and Daniel S. Weld. 2014. Parallel task routing for crowdsourcing. In Second AAAI Conference on Human Computation and Crowdsourcing.

Xiao Cheng and Dan Roth. 2013. Relational inference for wikification. In EMNLP.

Andrew Chisholm and Ben Hachey. 2015. Entity disambiguation with web links. Transactions of the Association for Computational Linguistics, 3:145-156.

Marco Cornolti, Paolo Ferragina, and Massimiliano Ciaramita. 2013. A framework for benchmarking entity-annotation systems. In Proceedings of the 22nd International Conference on World Wide Web, pages 249-260. International World Wide Web Conferences Steering Committee.

Mark Craven and Johan Kumlien. 1999. Constructing biological knowledge bases by extracting information from text sources. In Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology (ISMB-1999), pages 77-86.

S. Cucerzan. 2007. Large-scale named entity disambiguation based on Wikipedia data. In Proceedings of EMNLP-CoNLL, volume 2007, pages 708-716.

Silviu Cucerzan. 2012. The MSR system for entity linking at TAC 2012. In Text Analysis Conference 2012.

Greg Durrett and Dan Klein. 2014. A joint model for entity analysis: Coreference, typing, and linking. Transactions of the Association for Computational Linguistics, 2:477-490.

Paolo Ferragina and Ugo Scaiella. 2012. Fast and accurate annotation of short texts with Wikipedia pages. IEEE Software, 29(1):70-75.

J.R. Finkel, T. Grenager, and C. Manning. 2005. Incorporating non-local information into information extraction systems by Gibbs sampling. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pages 363-370. Association for Computational Linguistics.

Ben Hachey, Joel Nothman, and Will Radford. 2014. Cheap and easy entity evaluation. In ACL.

Hannaneh Hajishirzi, Leila Zilles, Daniel S. Weld, and Luke Zettlemoyer. 2013. Joint coreference resolution and named-entity linking with multi-pass sieves. In EMNLP.

Xianpei Han and Le Sun. 2012. An entity-topic model for entity linking. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 105-115. Association for Computational Linguistics.

Zhengyan He, Shujie Liu, Mu Li, Ming Zhou, Longkai Zhang, and Houfeng Wang. 2013a. Learning entity representation for entity disambiguation. In Proc. ACL 2013.

Zhengyan He, Shujie Liu, Yang Song, Mu Li, Ming Zhou, and Houfeng Wang. 2013b. Efficient collective entity linking with stacking. In EMNLP, pages 426-435.

Johannes Hoffart, Mohamed A. Yosef, Ilaria Bordino, Hagen Fürstenau, Manfred Pinkal, Marc Spaniol, Bilyana Taneva, Stefan Thater, and Gerhard Weikum. 2011. Robust disambiguation of named entities in text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 782-792. Association for Computational Linguistics.

Johannes Hoffart, Yasemin Altun, and Gerhard Weikum. 2014. Discovering emerging entities with ambiguous names. In Proceedings of the 23rd International Conference on World Wide Web, pages 385-396. International World Wide Web Conferences Steering Committee.

Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, and Daniel S. Weld. 2011. Knowledge-based weak supervision for information extraction of overlapping relations. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, volume 1, pages 541-550.

Heng Ji, Ralph Grishman, Hoa Trang Dang, Kira Griffitt, and Joe Ellis. 2010. Overview of the TAC 2010 knowledge base population track. In Text Analysis Conference (TAC 2010).

Mitchell Koch, John Gilmer, Stephen Soderland, and Daniel S. Weld. 2014. Type-aware distantly supervised relation extraction with linked arguments. In EMNLP.

Sayali Kulkarni, Amit Singh, Ganesh Ramakrishnan, and Soumen Chakrabarti. 2009. Collective annotation of Wikipedia entities in web text. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 457-466. ACM.

Heeyoung Lee, Angel Chang, Yves Peirsman, Nathanael Chambers, Mihai Surdeanu, and Dan Jurafsky. 2013. Deterministic coreference resolution based on entity-centric, precision-ranked rules. Computational Linguistics, pages 1-54.

Yang Li, Chi Wang, Fangqiu Han, Jiawei Han, Dan Roth, and Xifeng Yan. 2013. Mining evidences for named entity disambiguation. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1070-1078. ACM.

Xiao Ling and Daniel S. Weld. 2012. Fine-grained entity recognition. In AAAI.

James Mayfield, Javier Artiles, and Hoa Trang Dang. 2012. Overview of the TAC 2012 knowledge base population track. Text Analysis Conference (TAC 2012).

P. McNamee and H.T. Dang. 2009. Overview of the TAC 2009 knowledge base population track. Text Analysis Conference (TAC 2009).

Pablo N. Mendes, Max Jakob, Andrés García-Silva, and Christian Bizer. 2011. DBpedia Spotlight: shedding light on the web of documents. In Proceedings of the 7th International Conference on Semantic Systems, pages 1-8. ACM.

David Milne and Ian H. Witten. 2008. Learning to link with Wikipedia. In Proceedings of the 17th ACM Conference on Information and Knowledge Management, pages 509-518. ACM.

Andrea Moro, Alessandro Raganato, and Roberto Navigli. 2014. Entity linking meets word sense disambiguation: A unified approach. Transactions of the Association for Computational Linguistics, 2.

Lev-Arie Ratinov, Dan Roth, Doug Downey, and Mike Anderson. 2011. Local and global algorithms for disambiguation to Wikipedia. In ACL, volume 11, pages 1375-1384.

Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. Modeling relations and their mentions without labeled text. In ECML/PKDD (3), pages 148-163.

Wei Shen, Jianyong Wang, and Jiawei Han. 2014. Entity linking with a knowledge base: Issues, techniques, and solutions. TKDE.

Avirup Sil and Alexander Yates. 2013. Re-ranking for joint named-entity recognition and linking. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, pages 2369-2374. ACM.

Sameer Singh, Amarnag Subramanya, Fernando Pereira, and Andrew McCallum. 2012. Wikilinks: A large-scale cross-document coreference corpus labeled via links to Wikipedia. Technical report, University of Massachusetts Amherst, CMPSCI Technical Report, UM-CS-2012-015.

Valentin I. Spitkovsky and Angel X. Chang. 2012. A cross-lingual dictionary for English Wikipedia concepts. In LREC, pages 3168-3175.

2012. TAC KBP entity selection. http://www.nist.gov/tac/2012/KBP/task_guidelines/TAC_KBP_Entity_Selection_V1.1.pdf.

Michael Wick, Sameer Singh, Harshal Pandya, and Andrew McCallum. 2013. A joint model for discovering and linking entities. In CIKM Workshop on Automated Knowledge Base Construction (AKBC).

Jiaping Zheng, Luke Vilnis, Sameer Singh, Jinho D. Choi, and Andrew McCallum. 2013. Dynamic knowledge-base alignment for coreference resolution. In Conference on Computational Natural Language Learning (CoNLL).
