Is Learning the n-th Thing Any Easier Than Learning the First?

Transcript

Is Learning the n-th Thing Any Easier Than Learning the First?

Sebastian Thrun
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213-3891
World Wide Web: http://www.cs.cmu.edu/~thrun
(also affiliated with: Institut für Informatik III, Universität Bonn, Römerstr. 164, Germany)

Abstract

This paper investigates learning in a lifelong learning context. Lifelong learning addresses situations in which a learner faces a whole stream of learning tasks. Such scenarios provide the opportunity to transfer knowledge across multiple learning tasks, in order to generalize more accurately from less training data. In this paper, several different approaches to lifelong learning are described, and applied in an object recognition domain. It is shown that across the board, lifelong learning approaches generalize consistently more accurately from less training data, by their ability to transfer knowledge across learning tasks.

1 Introduction

Supervised learning is concerned with approximating an unknown function based on examples. Virtually all current approaches to supervised learning assume that one is given a set of input-output examples, denoted by X, which characterize an unknown function, denoted by f. The target function f is drawn from a class of functions, F, and the learner is given a space of hypotheses, denoted by H, and an order (preference/prior) with which it considers them during learning. For example, H might be the space of functions represented by an artificial neural network with different weight vectors.

While this formulation establishes a rigid framework for research in machine learning, it dismisses important aspects that are essential for human learning. Psychological studies have shown that humans often employ more than just the training data for generalization. They are often able to generalize correctly even from a single training example [2, 10]. One of the key aspects of the learning problem faced by humans, which differs from the vast majority of problems studied in the field of neural network learning, is the fact that humans encounter a whole stream of learning problems over their entire lifetime. When faced with a new thing to learn, humans can usually exploit an enormous amount of training data and

experiences that stem from other, related learning tasks. For example, when learning to drive a car, years of experience with basic motor skills, typical traffic patterns, logical reasoning, language, and much more precede and influence this learning task. The transfer of knowledge across learning tasks seems to play an essential role for generalizing accurately, particularly when training data is scarce.

A framework for the study of the transfer of knowledge is the lifelong learning framework. In this framework, it is assumed that a learner faces a whole collection of learning problems over its entire lifetime. Such a scenario opens the opportunity for synergy. When facing its n-th learning task, a learner can re-use knowledge gathered in its previous n-1 learning tasks to boost the generalization accuracy.

In this paper, we will be interested in the most simple version of the lifelong learning problem, in which the learner faces a family of concept learning tasks. More specifically, the functions to be learned over the lifetime of the learner, denoted by f_1, f_2, f_3, ... ∈ F, are all of the type f: I → {0, 1} and sampled from F. Each function f ∈ {f_1, f_2, f_3, ...} is an indicator function that defines a particular concept: a pattern x ∈ I is a member of this concept if and only if f(x) = 1. When learning the n-th indicator function f_n, the training set X contains examples of the type (x, f_n(x)) (which may be distorted by noise). In addition to the training set, the learner is also given n-1 sets of examples of other concept functions, denoted by X_k (k = 1, ..., n-1). Each X_k contains training examples that characterize f_k. Since this additional data is desired to support learning f_n, X_k is called a support set for the training set X.

An example of the above is the recognition of faces [5, 7]. When learning to recognize the face of the n-th person, say f_Bob, the learner is given a set of positive and negative example images of this person. In lifelong learning, it may also exploit training information stemming from other persons, such as f_Rich, f_Mike, and f_Dave. The support sets usually cannot be used directly as training patterns when learning a new concept, since they describe different concepts (hence have different class labels). However, certain features (like the shape of the eyes) are more important than others (like the facial expression, or the location of the face within the image). Once the invariances of the domain are learned, they can be transferred to new learning tasks (new people) and hence improve generalization.

To illustrate the potential importance of related learning tasks in lifelong learning, this paper does not present just one particular approach to the transfer of knowledge. Instead, it describes several, all of which extend conventional memory-based or artificial neural network learning algorithms. These approaches are compared with more traditional learning algorithms, i.e., those that do not transfer knowledge. The goal of this research is to demonstrate that, independent of the particular learning approach, more complex functions can be learned from less training data if learning is embedded into a lifelong context.
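To make the notation concrete before proceeding, the following is a minimal Python sketch (not from the paper; the names Example and LifelongProblem are hypothetical) of the data layout just described: one training set X for the current concept f_n, plus n-1 support sets X_1, ..., X_{n-1}:

```python
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np

# A labeled example (x, f(x)): an input pattern and a 0/1 concept label.
Example = Tuple[np.ndarray, int]


@dataclass
class LifelongProblem:
    """Data available when learning the n-th concept f_n.

    Hypothetical container mirroring the paper's notation: `training_set`
    is X (examples of f_n, possibly distorted by noise), and `support_sets`
    holds X_1, ..., X_{n-1}, each characterizing a previous concept f_k.
    """
    training_set: List[Example]          # X: examples (x, f_n(x))
    support_sets: List[List[Example]]    # X_k for k = 1, ..., n-1

    @property
    def n(self) -> int:
        # Index of the current learning task.
        return len(self.support_sets) + 1
```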
2 Memory-Based Learning Approaches

Memory-based learning algorithms explicitly memorize all training examples and interpolate them at query-time. We will first sketch two simple, well-known approaches to memory-based learning, then propose extensions that take the support sets into account.

2.1 Nearest Neighbor and Shepard's Method

Probably the most widely used memory-based learning algorithm is K-nearest neighbor (KNN) [15]. Suppose x is a query pattern, for which we would like to know the output y. KNN searches the set of training examples X for those K examples (x_i, y_i) ∈ X whose input patterns x_i are nearest to x (according to some distance metric, e.g., the Euclidean distance). It then returns the mean output value (1/K) Σ y_i of these nearest neighbors.
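The following is a minimal numpy sketch of plain KNN as just described; the function name knn_predict and the array-based interface are illustrative, not from the paper:

```python
import numpy as np


def knn_predict(X_train: np.ndarray, y_train: np.ndarray,
                x: np.ndarray, k: int = 3) -> float:
    """K-nearest neighbor: mean label of the k training patterns
    closest to the query x under Euclidean distance."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every x_i
    nearest = np.argsort(dists)[:k]               # indices of the k closest
    return float(np.mean(y_train[nearest]))       # mean output value
```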

Another commonly used method, which is due to Shepard [13], averages the output values of all training examples, but weights each example according to the inverse of its distance to the query x:

$$f(x) := \left( \sum_{(x_i, y_i) \in X} \frac{y_i}{\|x - x_i\| + \epsilon} \right) \cdot \left( \sum_{(x_i, y_i) \in X} \frac{1}{\|x - x_i\| + \epsilon} \right)^{-1} \qquad (1)$$

Here ε > 0 is a small constant that prevents division by zero. Plain memory-based learning exclusively uses the training set X for learning. There is no obvious way to incorporate the support sets, since they carry the wrong class labels.

2.2 Learning A New Representation

The first modification of memory-based learning proposed in this paper employs the support sets to learn a new representation of the data. More specifically, the support sets are employed to learn a function, denoted by g: I → I', which maps input patterns in I to a new space, I'. This new space I' forms the input space for a memory-based algorithm.

Obviously, a key property of good data representations is that multiple examples of a single concept should have a similar representation, whereas the representation of an example and a counterexample of a concept should be more different. This property can directly be transformed into an energy function for g:

$$E := \sum_{k=1}^{n-1} \; \sum_{(x,\, y=1) \in X_k} \left( \sum_{(x',\, y'=1) \in X_k} \|g(x) - g(x')\| \;-\; \sum_{(x',\, y'=0) \in X_k} \|g(x) - g(x')\| \right) \qquad (2)$$

Adjusting g to minimize E forces the distance between pairs of examples of the same concept to be small, and the distance between an example of a concept and a counterexample to be large. In our implementation, g is realized by a neural network and trained using the Back-Propagation algorithm [12]. Notice that the new representation, g, is obtained through the support sets. Assuming that the learned representation is appropriate for new learning tasks, standard memory-based learning can be applied using this new representation when learning the n-th concept.

2.3 Learning A Distance Function

An alternative way for exploiting support sets to improve memory-based learning is to learn a distance function [3, 9]. This approach learns a function d: I × I → [0, 1] which accepts two input patterns, say x and x', and outputs whether x and x' are members of the same concept, regardless of what the concept is. Training examples for d are derived from pairs of examples (x, y), (x', y') ∈ X_k taken from a single support set X_k (k = 1, ..., n-1):

((x, x'), 1)  if y = y' = 1
((x, x'), 0)  if (y = 1 ∧ y' = 0) or (y = 0 ∧ y' = 1)

In our implementation, d is an artificial neural network trained with Back-Propagation. Notice that the training examples for d lack information concerning the concept for which they were originally derived. Hence, all support sets can be used to train d. After training, d can be interpreted as the probability that two patterns x, x' ∈ I are examples of the same concept.

Once trained, d can be used as a generalized distance function for a memory-based approach. Suppose one is given a training set X and a query point x ∈ I. Then, for each positive example (x', y' = 1) ∈ X, d(x, x') can be interpreted as the probability that x is a member of the target concept. Votes from multiple positive examples (x_1, 1), (x_2, 1), ... ∈ X are combined using Bayes' rule, yielding

$$\mathrm{Prob}(f_n(x) = 1) \;:=\; 1 - \left( 1 + \prod_{(x',\, y'=1) \in X} \frac{d(x, x')}{1 - d(x, x')} \right)^{-1} \qquad (3)$$

Notice that d is not a distance metric. It generalizes the notion of a distance metric, because the triangle inequality needs not hold, and because an example x' of the target concept can provide evidence that x is not a member of that concept (if d(x, x') < 0.5).
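To make the query-time combination concrete, here is a minimal numpy sketch of Equation (3); the trained network d is abstracted as an arbitrary callable, and the helper name bayes_vote is hypothetical:

```python
import numpy as np
from typing import Callable, List, Tuple


def bayes_vote(d: Callable[[np.ndarray, np.ndarray], float],
               X: List[Tuple[np.ndarray, int]],
               x: np.ndarray,
               eps: float = 1e-6) -> float:
    """Combine votes d(x, x') from all positive examples in X via
    Equation (3): Prob = 1 - (1 + prod d/(1-d))^(-1)."""
    odds = 1.0
    for x_prime, y_prime in X:
        if y_prime == 1:                                  # only positives vote
            p = float(np.clip(d(x, x_prime), eps, 1.0 - eps))  # guard 0 and 1
            odds *= p / (1.0 - p)                         # accumulate odds
    return 1.0 - 1.0 / (1.0 + odds)
```

Since 1 - (1 + odds)^(-1) = odds/(1 + odds), a single confident vote d(x, x') < 0.5 shrinks the product and counts as evidence against membership, exactly the non-metric behavior noted above.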

3 Neural Network Approaches

To make our comparison more complete, we will now briefly describe approaches that rely exclusively on artificial neural networks for learning f_n.

3.1 Back-Propagation

Standard Back-Propagation can be used to learn the indicator function f_n, using X as the training set. This approach does not employ the support sets, hence it is unable to transfer knowledge across learning tasks.

3.2 Learning With Hints

Learning with hints [1, 4, 6, 16] constructs a neural network with n output units, one for each function f_k (k = 1, 2, ..., n). This network is then trained to simultaneously minimize the error on both the support sets {X_k} and the training set X. By doing so, the internal representation of this network is not only determined by X but is also shaped through the support sets {X_k}. If similar internal representations are required for all functions f_k (k = 1, 2, ..., n), the support sets provide additional training examples for the internal representation.

3.3 Explanation-Based Neural Network Learning

The last method described here uses the explanation-based neural network learning algorithm (EBNN), which was originally proposed in the context of reinforcement learning [8, 17]. EBNN trains an artificial neural network, denoted by h: I → [0, 1], just like Back-Propagation. However, in addition to the target values given by the training set X, EBNN estimates the slopes (tangents) of the target function f_n for each example in X. More specifically, training examples in EBNN are of the sort (x, f_n(x), ∇_x f_n(x)), which are fit using the Tangent-Prop algorithm [14]. The input x and target value f_n(x) are taken from the training set X. The third term, the slope ∇_x f_n(x), is estimated using the learned distance function d described above. Suppose (x', y' = 1) ∈ X is a (positive) training example. Then the function d_x'(z) := d(z, x') maps a single input pattern z ∈ I to [0, 1], and is an approximation to f_n. Since d is represented by a neural network and neural networks are differentiable, the gradient ∂d_x'/∂z is an estimate of the slope of f_n at z. Setting z := x yields the desired estimate of ∇_x f_n(x). As stated above, both the target value f_n(x) and the slope vector ∇_x f_n(x) are fit using the Tangent-Prop algorithm for each training example x ∈ X.

The slope ∇_x f_n(x) provides additional information about the target function f_n. Since d is learned using the support sets, the EBNN approach transfers knowledge from the support sets to the new learning task. However, EBNN relies on the assumption that d is accurate enough to yield helpful sensitivity information. Since EBNN fits both training patterns (values) and slopes, misleading slopes can be overridden by training examples. See [17] for a more detailed description of EBNN and further references.
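Returning to Section 3.2, the following is a minimal numpy sketch of the shared-representation idea behind learning with hints: one hidden layer shared by n sigmoid output units, trained on (pattern, task, label) triples pooled from X and all support sets. The class name HintsNet and the plain gradient-descent details are illustrative assumptions, not the paper's implementation:

```python
import numpy as np


class HintsNet:
    """Shared hidden layer with one sigmoid output unit per task.

    Training on triples (x, k, y) drawn from X and the support sets
    shapes the shared internal representation with data from all n tasks.
    """

    def __init__(self, d_in: int, n_hidden: int, n_tasks: int,
                 lr: float = 0.1, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (n_hidden, d_in))     # shared weights
        self.W2 = rng.normal(0, 0.1, (n_tasks, n_hidden))  # one row per task
        self.lr = lr

    @staticmethod
    def _sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def forward(self, x: np.ndarray):
        h = np.tanh(self.W1 @ x)              # shared internal representation
        return h, self._sigmoid(self.W2 @ h)  # one prediction per task

    def train_step(self, x: np.ndarray, task: int, y: float):
        """One gradient step on the squared error of output unit `task`."""
        h, out = self.forward(x)
        err = out[task] - y
        grad_a2 = err * out[task] * (1 - out[task])   # d loss / d preactivation
        grad_W2 = np.zeros_like(self.W2)
        grad_W2[task] = grad_a2 * h
        grad_h = grad_a2 * self.W2[task]              # backprop into shared layer
        grad_W1 = np.outer(grad_h * (1 - h**2), x)    # tanh' = 1 - h^2
        self.W2 -= self.lr * grad_W2
        self.W1 -= self.lr * grad_W1
```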

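And for Section 3.3, a sketch of EBNN's slope extraction and value-plus-slope objective. In the paper, the slopes are the analytic gradients of the trained distance network d; lacking that network here, the sketch approximates the gradient of z → d(z, x') by central finite differences, and both function names are hypothetical:

```python
import numpy as np
from typing import Callable


def slope_estimate(d: Callable[[np.ndarray, np.ndarray], float],
                   x: np.ndarray, x_prime: np.ndarray,
                   h: float = 1e-4) -> np.ndarray:
    """Estimate grad_x f_n(x) as the gradient of z -> d(z, x') at z = x,
    here via central finite differences (a stand-in for the analytic
    network gradient used by EBNN)."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[i] = h
        grad[i] = (d(x + e, x_prime) - d(x - e, x_prime)) / (2 * h)
    return grad


def tangent_prop_loss(value_pred: float, value_target: float,
                      slope_pred: np.ndarray, slope_target: np.ndarray,
                      mu: float = 0.1) -> float:
    """Tangent-Prop-style objective [14]: squared error on the target
    value plus a weighted squared error on the slope vector."""
    return (value_pred - value_target) ** 2 \
        + mu * float(np.sum((slope_pred - slope_target) ** 2))
```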
4 Experimental Results

All approaches were tested using a database of color camera images of different objects (see Fig. 1). Each object in the database has a distinct color or size.

Figure 1: The support sets were compiled out of a hundred images of a bottle, a coke can, a hammer, a hat, and a book. The n-th learning task involves distinguishing the shoe from the sunglasses. Images were subsampled to a 100x100 pixel matrix (each pixel has a color, saturation, and brightness value), as shown on the right side. [Figure image not reproduced in this transcript.]

The learning task was the recognition of one of these objects, namely the shoe. The previous n-1 learning tasks correspond to the recognition of the five other objects, namely the bottle, the coke can, the hammer, the hat, and the book. To ensure that the latter images could not simply be used as additional counterexamples for training f_n, the training data contained only images of the shoe and the seventh object, the sunglasses. Hence, the training set for f_n contained images of the shoe and the sunglasses, and the support sets contained images of the other five objects.

The object recognition domain is a good testbed for the transfer of knowledge in lifelong learning. This is because finding a good approximation to f_n involves recognizing the target object invariant of rotation, translation, scaling in size, change of lighting, and so on. Since these invariances are common to all object recognition tasks, images showing other objects can provide additional information and boost the generalization accuracy.

Transfer of knowledge is most important when training data is scarce. Hence, in an initial experiment we tested all methods using only a single image of the shoe and of the sunglasses. Those methods that are able to transfer knowledge were also provided 100 images of each of the five other objects. The results are intriguing. The generalization accuracies

KNN:                        60.4% (±8.3%)
Shepard's method:           60.4% (±8.3%)
Shepard with repr. g (*):   74.4% (±9.0%)
distance function d (*):    75.2% (±10.2%)
EBNN (*):                   74.8% (±11.1%)
Back-Prop:                  62.1% (±18.9%)
hints (*):                  59.7% (±18.5%)

illustrate that the approaches that transfer knowledge (marked with * above) generalize significantly better than those that do not. With the exception of the hint learning technique, the approaches can be grouped into two categories: those which classify approximately 60% of the testing set correctly, and those which achieve approximately 75% generalization accuracy. The former group contains the standard supervised learning algorithms, and the latter contains the "new" algorithms proposed here, which are capable of transferring knowledge. The differences within each group are statistically not significant, while the differences between the groups are (at the 95% level). Notice that random guessing classifies 50% of the testing examples correctly.

These results suggest that the generalization accuracy does not merely depend on the particular choice of the learning algorithm (memory-based vs. neural networks). Instead, the main factor determining the generalization accuracy is whether or not knowledge is transferred from past learning tasks.

[Figure: generalization accuracy (vertical axis, approximately 75% to 95%) as a function of training set size, comparing the learned distance function d and Shepard's method with representation g. The figure image and the surrounding page text are not preserved in this transcript.]

... to perform various tasks for various users. Pattern recognition, speech recognition, time series prediction, and data mining might be other potential application domains for the techniques presented here.

References

[1] Y. S. Abu-Mostafa. Learning from hints in neural networks. Journal of Complexity, 6:192-198, 1990.
[2] W.-K. Ahn and W. F. Brewer. Psychological studies of explanation-based learning. In G. DeJong, editor, Investigating Explanation-Based Learning. Kluwer Academic Publishers, Boston/Dordrecht/London, 1993.
[3] C. A. Atkeson. Using locally weighted regression for robot learning. In Proceedings of the 1991 IEEE International Conference on Robotics and Automation, pages 958-962, Sacramento, CA, April 1991.
[4] J. Baxter. Learning internal representations. In Proceedings of the Conference on Computational Learning Theory, 1995.
[5] D. Beymer and T. Poggio. Face recognition from one model view. In Proceedings of the International Conference on Computer Vision, 1995.
[6] R. Caruana. Multitask learning: A knowledge-based source of inductive bias. In P. E. Utgoff, editor, Proceedings of the Tenth International Conference on Machine Learning, pages 41-48, San Mateo, CA, 1993. Morgan Kaufmann.
[7] M. Lando and S. Edelman. Generalizing from a single view in face recognition. Technical Report CS-TR 95-02, Department of Applied Mathematics and Computer Science, The Weizmann Institute of Science, Rehovot 76100, Israel, January 1995.
[8] T. M. Mitchell and S. Thrun. Explanation-based neural network learning for robot control. In S. J. Hanson, J. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems 5, pages 287-294, San Mateo, CA, 1993. Morgan Kaufmann.
[9] A. W. Moore, D. J. Hill, and M. P. Johnson. An Empirical Investigation of Brute Force to Choose Features, Smoothers and Function Approximators. In S. Hanson, S. Judd, and T. Petsche, editors, Computational Learning Theory and Natural Learning Systems, Volume 3. MIT Press, 1992.
[10] Y. Moses, S. Ullman, and S. Edelman. Generalization across changes in illumination and viewing position in upright and inverted faces. Technical Report CS-TR 93-14, Department of Applied Mathematics and Computer Science, The Weizmann Institute of Science, Rehovot 76100, Israel, 1993.
[11] J. O'Sullivan, T. M. Mitchell, and S. Thrun. Explanation-based neural network learning from mobile robot perception. In K. Ikeuchi and M. Veloso, editors, Symbolic Visual Learning. Oxford University Press, 1995.
[12] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal representations by error propagation. In D. E. Rumelhart and J. L. McClelland, editors, Parallel Distributed Processing, Vol. I + II. MIT Press, 1986.
[13] D. Shepard. A two-dimensional interpolation function for irregularly spaced data. In 23rd National Conference ACM, pages 517-523, 1968.
[14] P. Simard, B. Victorri, Y. LeCun, and J. Denker. Tangent prop - a formalism for specifying selected invariances in an adaptive network. In J. E. Moody, S. J. Hanson, and R. P. Lippmann, editors, Advances in Neural Information Processing Systems 4, pages 895-903, San Mateo, CA, 1992. Morgan Kaufmann.
[15] C. Stanfill and D. Waltz. Towards memory-based reasoning. Communications of the ACM, 29(12):1213-1228, December 1986.
[16] S. C. Suddarth and A. Holden. Symbolic neural systems and the use of hints for developing complex systems. International Journal of Machine Studies, 35, 1991.
[17] S. Thrun. Explanation-Based Neural Network Learning: A Lifelong Learning Approach. Kluwer Academic Publishers, Boston, MA, 1996. To appear.
[18] S. Thrun and J. O'Sullivan. Clustering learning tasks and the selective cross-task transfer of knowledge. Technical Report CMU-CS-95-209, Carnegie Mellon University, School of Computer Science, Pittsburgh, PA 15213, November 1995.
