# A Multi World Approach to Question Answering about Real World Scenes based on Uncertain Input

## Transcript

| Group | Predicate | Definition |
|---|---|---|
| auxiliary relations | closeAbove(A, B) | above(A, B) and ($Y_{\min}(B) < Y_{\max}(A) + \epsilon$) |
| auxiliary relations | closeLeftOf(A, B) | leftOf(A, B) and ($X_{\min}(B) < X_{\max}(A) + \epsilon$) |
| auxiliary relations | closeInFrontOf(A, B) | inFrontOf(A, B) and ($Z_{\min}(B) < Z_{\max}(A) + \epsilon$) |
| auxiliary relations | $X_{aux}$(A, B) | $X_{mean}(A) < X_{\max}(B)$ and $X_{\min}(B) < X_{mean}(A)$ |
| auxiliary relations | $Z_{aux}$(A, B) | $Z_{mean}(A) < Z_{\max}(B)$ and $Z_{\min}(B) < Z_{mean}(A)$ |
| auxiliary relations | $h_{aux}$(A, B) | closeAbove(A, B) or closeBelow(A, B) |
| auxiliary relations | $v_{aux}$(A, B) | closeLeftOf(A, B) or closeRightOf(A, B) |
| auxiliary relations | $d_{aux}$(A, B) | closeInFrontOf(A, B) or closeBehind(A, B) |
| spatial | leftOf(A, B) | $X_{mean}(A) < X_{mean}(B)$ |
| spatial | above(A, B) | $Y_{mean}(A) < Y_{mean}(B)$ |
| spatial | inFrontOf(A, B) | $Z_{mean}(A) < Z_{mean}(B)$ |
| spatial | on(A, B) | closeAbove(A, B) and $Z_{aux}$(A, B) and $X_{aux}$(A, B) |
| spatial | close(A, B) | $h_{aux}$(A, B) or $v_{aux}$(A, B) or $d_{aux}$(A, B) |

Table 1: Predicates defining spatial relations between A and B. Auxiliary relations define actual spatial relations. The Y axis points downwards, functions $X_{\max}, X_{\min}, \ldots$ take appropriate values from the tuple *predicate*, and $\epsilon$ is a 'small' amount. Symmetrical relations such as rightOf, below, behind, etc. can readily be defined in terms of other relations (i.e. below(A, B) = above(B, A)).

color [16] (Figure 1 - middle part). Every object hypothesis is therefore represented as an n-tuple: `predicate(instance_id, image_id, color, spatial_loc)`, where `predicate` $\in$ {bag, bed, books, ...}, `instance_id` is the object's id, `image_id` is id of the image containing the object, `color` is estimated color of the object [16], and `spatial_loc` is the object's position in the image. Latter is represented as $(X_{\min}, X_{\max}, X_{mean}, Y_{\min}, Y_{\max}, Y_{mean}, Z_{\min}, Z_{\max}, Z_{mean})$ and defines minimal, maximal, and mean location of the object along $X, Y, Z$ axes. To obtain the coordinates we fit axis parallel cuboids to the cropped 3d objects based on the semantic segmentation.
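The Table 1 predicates can be evaluated directly on the `spatial_loc` tuple. Below is a minimal Python sketch of a few of them, not the authors' released code. The class `SpatialLoc`, the helper `make_loc`, the value of `EPS`, and the example coordinates are all hypothetical. The mean is approximated here by the cuboid midpoint, which is a simplification of the mean object location used in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpatialLoc:
    """Min, max and mean location along X, Y, Z (Y axis points downwards)."""
    x_min: float
    x_max: float
    x_mean: float
    y_min: float
    y_max: float
    y_mean: float
    z_min: float
    z_max: float
    z_mean: float


EPS = 0.05  # assumed value for the 'small' amount epsilon


def make_loc(x, y, z):
    """Hypothetical helper: build a SpatialLoc from (min, max) pairs.

    The mean is taken as the midpoint of each range (a simplification).
    """
    values = []
    for lo, hi in (x, y, z):
        values.extend([lo, hi, (lo + hi) / 2.0])
    return SpatialLoc(*values)


def above(a, b):
    # Y points downwards, so a smaller mean Y means "higher up".
    return a.y_mean < b.y_mean


def below(a, b):
    # Symmetrical relation defined through above, as in the Table 1 caption.
    return above(b, a)


def close_above(a, b, eps=EPS):
    return above(a, b) and b.y_min < a.y_max + eps


def x_aux(a, b):
    return a.x_mean < b.x_max and b.x_min < a.x_mean


def z_aux(a, b):
    return a.z_mean < b.z_max and b.z_min < a.z_mean


def on(a, b, eps=EPS):
    return close_above(a, b, eps) and z_aux(a, b) and x_aux(a, b)


# Made-up scene: a book resting on a table and a lamp hanging far above it.
book = make_loc(x=(0.4, 0.6), y=(0.9, 1.0), z=(2.0, 2.2))
table = make_loc(x=(0.0, 1.0), y=(1.0, 1.8), z=(1.8, 2.6))
lamp = make_loc(x=(0.4, 0.6), y=(0.0, 0.2), z=(2.0, 2.2))
```

With these values the book is `on` the table, while the lamp is `above` the table but not `close_above` it, because the vertical gap exceeds `EPS`.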
Note that the $X, Y, Z$ coordinate system is aligned with the direction of gravity [15]. As shown in Figure 2b, this is a more meaningful representation of the object's coordinates than simple image coordinates. The complete schema will be documented together with the code release.

We realize that the skilled use of spatial relations is a complex task and that grounding spatial relations is a research thread on its own (e.g. [17], [18] and [19]). For our purposes, we focus on the predefined relations shown in Table 1, while the association of these relations, as well as of the object classes, is still handled within the question answering architecture.

**Multi-worlds approach for combining uncertain visual perception and symbolic reasoning.** Up to now we have considered the output of the semantic segmentation as "hard facts", and hence ignored uncertainty in the class labeling. Every such labeling of the segments corresponds to a different interpretation of the scene, that is, a different perceived world. Drawing on ideas from probabilistic databases [14], we propose a multi-world approach (Figure 1 - lower part) that marginalizes over multiple possible worlds $\mathcal{W}$ (multiple interpretations of a visual scene) derived from the segmentation $S$. Therefore the posterior over the answer $A$ given question $Q$ and semantic segmentation $S$ of the image marginalizes over the latent worlds $\mathcal{W}$ and logical forms $\mathcal{T}$:

$$P(A \mid Q, S) = \sum_{\mathcal{W}} \sum_{\mathcal{T}} P(A \mid \mathcal{W}, \mathcal{T})\, P(\mathcal{W} \mid S)\, P(\mathcal{T} \mid Q) \tag{2}$$

The semantic segmentation of the image is a set of segments $s_i$ with the associated probabilities $p_{ij}$ over the $C$ object categories $c_j$. More precisely, $S = \{(s_1, L_1), (s_2, L_2), \ldots, (s_k, L_k)\}$, where $L_i = \{(c_j, p_{ij})\}_{j=1}^{C}$, $P(s_i = c_j) = p_{ij}$, and $k$ is the number of segments of the given image. Let $\hat{S}_f = \{(s_1, c_{f(1)}), (s_2, c_{f(2)}), \ldots, (s_k, c_{f(k)})\}$ be an assignment of the categories to the segments of the image according to the binding function $f \in \mathcal{F} = \{1, \ldots, C\}^{\{1, \ldots, k\}}$.
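To make the binding-function notation concrete, the following Python sketch enumerates every assignment $\hat{S}_f$ for a toy image. The segment names, category names, and the function `enumerate_assignments` are illustrative assumptions and do not come from the source. Since each of the $k$ segments can take any of the $C$ categories, there are $C^k$ such assignments.

```python
from itertools import product


def enumerate_assignments(segments, categories):
    """Yield every assignment ((s_1, c_f(1)), ..., (s_k, c_f(k))).

    Each element of the Cartesian product corresponds to one binding
    function f from segment indices to category indices.
    """
    for labels in product(categories, repeat=len(segments)):
        yield tuple(zip(segments, labels))


# Toy example (made-up names): k = 3 segments, C = 2 categories.
segments = ["s1", "s2", "s3"]
categories = ["chair", "table"]
assignments = list(enumerate_assignments(segments, categories))

# One assignment per binding function, so C ** k in total.
assert len(assignments) == len(categories) ** len(segments)
```

Exhaustive enumeration like this is only feasible for very small $k$ and $C$, since the count grows exponentially in the number of segments.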
With such notation, for a fixed binding function $f$, a world $\mathcal{W}$ is a set of tuples consistent with $\hat{S}_f$, and we define $P(\mathcal{W} \mid S) = \prod_i p_{(i, f(i))}$. Hence we have as many possible worlds as binding functions, that is $C^k$. Eq. 2 quickly becomes intractable for the $k$ and $C$ seen in practice, so we use a sampling strategy that draws a finite sample $\vec{\mathcal{W}} = (\mathcal{W}_1, \mathcal{W}_2, \ldots, \mathcal{W}_N)$ from $P(\cdot \mid S)$ under the assumption that for each segment $s_i$ every object's category $c_j$ is drawn independently according to $p_{ij}$. A few sampled perceived worlds are shown in Figure 2a.

Regarding computational efficiency, computing $\sum_{\mathcal{T}} P(A \mid \mathcal{W}_i, \mathcal{T})\, P(\mathcal{T} \mid Q)$ can be done independently for every $\mathcal{W}_i$, and therefore in parallel without any need for synchronization. Since for small $N$ the computational cost of summing up the computed probabilities is marginal, the overall cost is about the same as that of a single inference, modulo parallelism. The presented multi-world approach to question answering on real-world scenes is still an end-to-end architecture that is trained solely on the question-answer pairs.
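The sampling strategy can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the authors' implementation. The function `answer_given_world` is a hypothetical stand-in for the inner sum over logical forms, $\sum_{\mathcal{T}} P(A \mid \mathcal{W}_i, \mathcal{T})\, P(\mathcal{T} \mid Q)$. Averaging the per-world results over the $N$ samples is the standard Monte Carlo estimator, assumed here rather than taken from the source. The toy segmentation and the question "how many chairs?" are made up.

```python
import random
from collections import Counter


def sample_world(segmentation, rng):
    """Draw one world: each segment's category is sampled independently from p_ij."""
    world = []
    for seg_id, dist in segmentation:
        cats = list(dist)
        weights = [dist[c] for c in cats]
        world.append((seg_id, rng.choices(cats, weights=weights, k=1)[0]))
    return tuple(world)


def world_probability(world, segmentation):
    """P(W | S): product over segments of the probability of the assigned category."""
    dists = dict(segmentation)
    p = 1.0
    for seg_id, cat in world:
        p *= dists[seg_id][cat]
    return p


def answer_posterior(segmentation, answer_given_world, n_samples, seed=0):
    """Monte Carlo estimate of P(A | Q, S) as an average over sampled worlds."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_samples):
        world = sample_world(segmentation, rng)
        # Each world is processed independently, so this loop could run in parallel.
        for answer, p in answer_given_world(world).items():
            totals[answer] += p / n_samples
    return dict(totals)


def count_chairs(world):
    """Hypothetical stand-in for the sum over logical forms: 'how many chairs?'."""
    n = sum(1 for _, cat in world if cat == "chair")
    return {n: 1.0}


# Made-up segmentation with k = 2 segments; only non-zero p_ij are listed.
segmentation = [
    ("s1", {"chair": 0.7, "table": 0.3}),
    ("s2", {"chair": 0.4, "sofa": 0.6}),
]
posterior = answer_posterior(segmentation, count_chairs, n_samples=1000)
```

In this toy case the exact posterior can be computed by hand (for example, exactly one chair has probability $0.7 \cdot 0.6 + 0.3 \cdot 0.4 = 0.54$), which gives a simple check on the sampled estimate.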

### References

[1] Liang, P., Jordan, M.I., Klein, D.: Learning dependency-based compositional semantics. Computational Linguistics (2013)

[2] Kwiatkowski, T., Zettlemoyer, L., Goldwater, S., Steedman, M.: Inducing probabilistic CCG grammars from logical form with higher-order unification. In: EMNLP. (2010)

[3] Zettlemoyer, L.S., Collins, M.: Online learning of relaxed CCG grammars for parsing to logical form. In: EMNLP-CoNLL-2007. (2007)

[4] Matuszek, C., Fitzgerald, N., Zettlemoyer, L., Bo, L., Fox, D.: A joint model of language and perception for grounded attribute learning. In: ICML. (2012)

[5] Krishnamurthy, J., Kollar, T.: Jointly learning to parse and perceive: Connecting natural language to the physical world. TACL (2013)

[6] Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from RGBD images. In: ECCV. (2012)

[7] Kong, C., Lin, D., Bansal, M., Urtasun, R., Fidler, S.: What are you talking about? Text-to-image coreference. In: CVPR. (2014)

[8] Karpathy, A., Joulin, A., Fei-Fei, L.: Deep fragment embeddings for bidirectional image sentence mapping. In: NIPS. (2014)

[9] Matuszek, C., Herbst, E., Zettlemoyer, L., Fox, D.: Learning to parse natural language commands to a robot control system. In: Experimental Robotics. (2013)

[10] Levit, M., Roy, D.: Interpretation of spatial language in a map navigation task. Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on (2007)

[11] Vogel, A., Jurafsky, D.: Learning to follow navigational directions. In: ACL. (2010)

[12] Tellex, S., Kollar, T., Dickerson, S., Walter, M.R., Banerjee, A.G., Teller, S.J., Roy, N.: Understanding natural language commands for robotic navigation and mobile manipulation. In: AAAI. (2011)

[13] Kruijff, G.J.M., Zender, H., Jensfelt, P., Christensen, H.I.: Situated dialogue and spatial organization: What, where... and why. IJARS (2007)

[14] Wick, M., McCallum, A., Miklau, G.: Scalable probabilistic databases with factor graphs and MCMC. In: VLDB. (2010)

[15] Gupta, S., Arbelaez, P., Malik, J.: Perceptual organization and recognition of indoor scenes from RGB-D images. In: CVPR. (2013)

[16] Van De Weijer, J., Schmid, C., Verbeek, J.: Learning color names from real-world images. In: CVPR. (2007)

[17] Regier, T., Carlson, L.A.: Grounding spatial language in perception: an empirical and computational investigation. Journal of Experimental Psychology: General (2001)

[18] Lan, T., Yang, W., Wang, Y., Mori, G.: Image retrieval with structured object queries using latent ranking SVM. In: ECCV. (2012)

[19] Guadarrama, S., Riano, L., Golland, D., Göhring, D., Jia, Y., Klein, D., Abbeel, P., Darrell, T.: Grounding spatial relations for human-robot interaction. In: IROS. (2013)

[20] Manning, C.D., Raghavan, P., Schütze, H.: Introduction to information retrieval. Cambridge University Press, Cambridge (2008)

[21] Tukey, J.W.: Exploratory data analysis. (1977)

[22] Zadeh, L.A.: Fuzzy sets. Information and Control (1965)

[23] Wu, Z., Palmer, M.: Verbs semantics and lexical selection. In: ACL. (1994)

[24] Guadarrama, S., Krishnamoorthy, N., Malkarnenkar, G., Mooney, R., Darrell, T., Saenko, K.: YouTube2Text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition. In: ICCV. (2013)

[25] Miller, G.A.: WordNet: a lexical database for English. CACM (1995)

[26] Fellbaum, C.: WordNet. Wiley Online Library (1999)
