Neural Networks, Vol. 6, pp. 861-867, 1993
0893-6080/93 $6.00 + .00
Printed in the USA. All rights reserved.
Copyright © 1993 Pergamon Press Ltd.

ORIGINAL CONTRIBUTION

Multilayer Feedforward Networks With a Nonpolynomial Activation Function Can Approximate Any Function

MOSHE LESHNO (1), VLADIMIR YA. LIN (2), ALLAN PINKUS (2), AND SHIMON SCHOCKEN (3)

(1) The Hebrew University, Israel; (2) Technion, Israel; and (3) New York University

(Received 9 February 1993; revised and accepted 15 March 1993)

Acknowledgement: The proof of Step 4 as given herein is due to Y. Benyamini, to whom we are most appreciative. Requests for reprints should be sent to Moshe Leshno, School of Business Administration, The Hebrew University, Mount Scopus, Jerusalem 91905, Israel.

Abstract. Several researchers characterized the activation function under which multilayer feedforward networks can act as universal approximators. We show that most of the characterizations that were reported thus far in the literature are special cases of the following general result: A standard multilayer feedforward network with a locally bounded piecewise continuous activation function can approximate any continuous function to any degree of accuracy if and only if the network's activation function is not a polynomial. We also emphasize the important role of the threshold, asserting that without it the last theorem does not hold.

Keywords: Multilayer feedforward networks, Activation functions, Role of threshold, Universal approximation capabilities, $L^p(\mu)$ approximation.

1. BACKGROUND

The basic building block of a neural network is a processing-unit that is linked to $n$ input-units through a set of $n$ directed connections. The single unit model is characterized by (1) a threshold value, denoted $\theta$; (2) a univariate activation function, denoted $\sigma: \mathbb{R} \to \mathbb{R}$; and (3) a vector of "weights," denoted $w = (w_1, \ldots, w_n)$. When an input-vector $x = (x_1, \ldots, x_n)$ is fed into the network through the input-units, the processing-unit computes the function $\sigma(w \cdot x - \theta)$, $w \cdot x$ being the standard inner-product in $\mathbb{R}^n$. The value of this function is then taken to be the network's output.

A network consisting of a layer of $n$ input-units and a layer of $m$ processing-units can be "trained" to approximate a limited class of functions $f: \mathbb{R}^n \to \mathbb{R}^m$. When the network is fed with new examples of vectors $x \in \mathbb{R}^n$ and their correct mappings $f(x)$, a "learning algorithm" is applied to adjust the weights and the thresholds in a direction that minimizes the difference between $f(x)$ and the network's output. Similar back propagation learning algorithms exist for multilayer feedforward networks, and the reader is referred to Hinton (1989) for an excellent survey on the subject. This paper, however, does not concern learning. Rather, we focus on the following fundamental question: If we are free to choose any $w$, $\theta$, and $\sigma$ that we desire, which "real life" functions $f: \mathbb{R}^n \to \mathbb{R}^m$ can multilayer feedforward networks emulate?

During the last decade, multilayer feedforward networks have been shown to be quite effective in many different applications, with most papers reporting that they perform at least as well as their traditional competitors (e.g., linear discrimination models and Bayesian classifiers). This success has recently led several researchers to undertake a rigorous analysis of the mathematical properties that enable feedforward networks to perform well in the field. The motivation for this line of research was eloquently described by Hornik and his colleagues (Hornik, Stinchcombe, & White, 1989), as follows: "The apparent ability of sufficiently elaborate feedforward networks to approximate quite well nearly any function encountered in applications leads one to wonder about the ultimate capabilities of such networks. Are the successes observed to date reflective of some deep and fundamental approximation capabilities, or are they merely flukes, resulting from selective reporting and a fortuitous choice of problems?"

Previous research on the approximation capabilities of feedforward networks can be found in le Cun (1987), Cybenko (1989), Funahashi (1989), Gallant and White (1988), Hecht-Nielsen (1989), Hornik et al. (1989), Irie and Miyake (1988), Lapedes and Farber (1988), Stinchcombe and White (1990), and Chui and Li (1992). These studies show that if the network's
activation functions obey an explicit set of assumptions (which vary from one paper to another), then the network can indeed be shown to be a universal approximator. For example, Gallant and White (1988) proved that a network with "cosine squasher" activation functions possesses all the approximation properties of Fourier series representations. Hornik et al. (1989) extended this result and proved that a network with arbitrary squashing activation functions is capable of approximating any function of interest. Most recently, Hornik (1991) has proven two general results, as follows:

HORNIK THEOREM 1. Whenever the activation function is bounded and nonconstant, then, for any finite measure $\mu$, standard multilayer feedforward networks can approximate any function in $L^p(\mu)$ (the space of all functions on $\mathbb{R}^n$ such that $\int_{\mathbb{R}^n} |f(x)|^p \, d\mu(x) < \infty$) arbitrarily well, provided that sufficiently many hidden units are available.

HORNIK THEOREM 2. Whenever the activation function is continuous, bounded and nonconstant, then, for arbitrary compact subsets $X \subseteq \mathbb{R}^n$, standard multilayer feedforward networks can approximate any continuous function on $X$ arbitrarily well with respect to uniform distance, provided that sufficiently many hidden units are available.

In this paper we generalize in particular Hornik's Theorem 2 by establishing necessary and sufficient conditions for universal approximation. In particular, we show that a standard multilayer feedforward network can approximate any continuous function to any degree of accuracy if and only if the network's activation function is not a polynomial. In addition, we emphasize and illustrate the role of the threshold value (a parameter of the activation function), without which the theorem does not hold. The theorem is intriguing because (1) the conditions that it imposes on the activation function are almost minimal; and (2) it embeds, as special cases, almost all the activation functions that were reported thus far in the literature.

2. MULTILAYER FEEDFORWARD NETWORKS

The general architecture of a multilayer feedforward network consists of an input layer with $n$ input-units, an output layer with $m$ output-units, and one or more hidden layers consisting of intermediate processing-units. Because a mapping $f: \mathbb{R}^n \to \mathbb{R}^m$ can be computed by $m$ mappings $f_j: \mathbb{R}^n \to \mathbb{R}$, it is (theoretically) sufficient to focus on networks with one output-unit only. In addition, since our findings require only a single hidden layer, we will assume hereafter that the network consists of only three layers: input, hidden, and output. One such network is depicted in Figure 1.

FIGURE 1. Single hidden layer feedforward neural net.

In the figure, the weights-vector and the threshold value associated with the $j$th processing-unit are denoted $w_j$ and $\theta_j$, respectively. The weights-vector associated with the single output-unit is denoted $\beta$, and the input-vector is denoted $x$. With this notation, we see that the function that a multilayer feedforward network computes is:

$$f(x) = \sum_{j=1}^{k} \beta_j \, \sigma(w_j \cdot x - \theta_j), \qquad (1)$$

$k$ being the number of processing-units in the hidden layer. Hence, the family of functions that can be computed by multilayer feedforward networks is characterized by four parameters, as follows:

1. The number of processing-units, denoted $k$;
2. The set of weights $\{w_{ij}\}$, one for each pair of connected units;
3. The set of threshold values $\{\theta_j\}$, one for each processing-unit;
4. An activation function $\sigma: \mathbb{R} \to \mathbb{R}$, the same for each processing-unit.

In what follows, we denote the space of these parameters $A = (k, \{w_{ij}\}, \{\theta_j\}, \sigma)$, and a particular quadruple of parameters is denoted $\omega \in A$. The network that is characterized by $\omega$ with $n$ input-units is denoted $N_\omega(n)$, but for brevity we will drop the $n$ and use the notation $N_\omega$. Finally, the function that $N_\omega$ computes is denoted $f_\omega: \mathbb{R}^n \to \mathbb{R}$, and the family of all such functions is denoted $F = \{f_\omega \mid \omega \in A\}$. Our objective is thus to find all the functions that may be approximated by multilayer feedforward networks of the form $N_\omega$. In order to do so, we will characterize the closure $\bar{F} = \operatorname{closure}\{f_\omega \mid \omega \in A\}$. This closure is based on some metric defined over the set of functions $\mathbb{R}^n \to \mathbb{R}$, described in the next section.

3. DEFINITIONS

DEFINITION 1. A metric on a set $S$ is a function $d: S \times S \to \mathbb{R}$ such that:
1. $d(s, t) \ge 0$;
2. $d(s, t) = 0$ if and only if $s = t$;
3. $d(s, t) = d(t, s)$;
4. $d(s, u) \le d(s, t) + d(t, u)$.

If we take $S$ to be a set of functions, the metric $d(f, g)$ will enable us to measure the distance between functions $f, g \in S$.

DEFINITION 2. The closure of a set $S$ of a metric space $(X, d)$ is defined as follows:
$$\operatorname{closure}(S) = \bar{S} = \{t \mid \forall \varepsilon > 0,\ \exists s \in S,\ d(s, t) < \varepsilon\}.$$

DEFINITION 3. A function $u$ defined almost everywhere with respect to Lebesgue measure on a measurable set $\Omega$ in $\mathbb{R}^n$ is said to be essentially bounded on $\Omega$ ($u \in L^\infty(\Omega)$) if $|u(x)|$ is bounded almost everywhere (a.e.) on $\Omega$. We equip $L^\infty(\Omega)$ with the norm
$$\|u\|_{L^\infty(\Omega)} = \inf\{\lambda : |u(x)| \le \lambda \text{ for almost every } x \in \Omega\} = \operatorname*{ess\,sup}_{x \in \Omega} |u(x)|.$$

DEFINITION 4. A function $u$ defined almost everywhere with respect to Lebesgue measure on a domain $\Omega$ (a domain is an open set in $\mathbb{R}^n$) is said to be essentially locally bounded on $\Omega$ ($u \in L^\infty_{\mathrm{loc}}(\Omega)$) if for every compact set $K \subset \Omega$, $u \in L^\infty(K)$.

DEFINITION 5. We say that a set $F$ of functions in $L^\infty_{\mathrm{loc}}(\mathbb{R}^n)$ is dense in $C(\mathbb{R}^n)$ if for every function $g \in C(\mathbb{R}^n)$ and for every compact set $K \subset \mathbb{R}^n$ there exists a sequence of functions $f_j \in F$ such that
$$\lim_{j \to \infty} \|g - f_j\|_{L^\infty(K)} = 0.$$

Hence, if we can show that a given set of functions $F$ is dense in $C(\mathbb{R}^n)$, we can conclude that for every continuous function $g \in C(\mathbb{R}^n)$ and each compact set $K \subset \mathbb{R}^n$ there is a function $f \in F$ such that $f$ is a good approximation to $g$ on $K$. In this paper we take $F$ to be the family of functions implied by the network's architecture, namely the family [eqn (1)] when $\omega$ runs over all its possible values. The key question is this: Under which necessary and sufficient conditions on $\sigma$ will the family of networks $N_\omega$ be capable of approximating to any desired accuracy any given continuous function?

4. RESULTS

Let $M$ denote the set of functions which are in $L^\infty_{\mathrm{loc}}(\mathbb{R})$ and have the following property: the closure of the set of points of discontinuity of any function in $M$ is of zero Lebesgue measure. This implies that for any $\sigma \in M$, interval $[a, b]$, and $\delta > 0$, there exists a finite number of open intervals, the union of which we denote by $U$, of measure $\delta$, such that $\sigma$ is uniformly continuous on $[a, b] \setminus U$. We will use this fact. Note that we do not demand the existence of one-sided limits at points of discontinuity.

We then have the following result:

THEOREM 1. Let $\sigma \in M$. Set
$$\Sigma_n = \operatorname{span}\{\sigma(w \cdot x + \theta) : w \in \mathbb{R}^n,\ \theta \in \mathbb{R}\}.$$
Then $\Sigma_n$ is dense in $C(\mathbb{R}^n)$ if and only if $\sigma$ is not an algebraic polynomial (a.e.).

PROPOSITION 1. Assume $\mu$ is a non-negative finite measure on $\mathbb{R}^n$ with compact support, absolutely continuous with respect to Lebesgue measure. Then $\Sigma_n$ is dense in $L^p(\mu)$, $1 \le p < \infty$, if and only if $\sigma$ is not a polynomial (a.e.).

We recall that $L^p(\mu)$ is the set of all measurable functions $f$ such that
$$\|f\|_{L^p(\mu)} = \left( \int_{\mathbb{R}^n} |f(x)|^p \, d\mu(x) \right)^{1/p} < \infty.$$

The following proposition is worth stating as it is a simple consequence of Theorem 1 and some known results.

PROPOSITION 2. If $\sigma \in M$ is not a polynomial (a.e.), then
$$\Sigma_n(\mathcal{A}) = \operatorname{span}\{\sigma(\lambda\, w \cdot x + \theta) : \lambda, \theta \in \mathbb{R},\ w \in \mathcal{A}\}$$
is dense in $C(\mathbb{R}^n)$ for some $\mathcal{A} \subseteq \mathbb{R}^n$ if and only if there does not exist a nontrivial homogeneous polynomial vanishing on $\mathcal{A}$.

5. CONCLUSION AND DISCUSSION

First, we wish to illustrate why the threshold element is essential in the above theorems. Consider the activation function $\sigma(x) = \sin(x)$ (without threshold). This function is not a polynomial; in addition, it is continuous, bounded, and nonconstant. Now, the set $\{\sin(w \cdot x) \mid w \in \mathbb{R}\}$ consists of only odd functions ($\sigma(x) = -\sigma(-x)$), and so does its linear span. Thus, an even function like $\cos(x)$ cannot be approximated in $[-1, 1]$ using this family, implying that $\{\sin(w \cdot x) \mid w \in \mathbb{R}\}$ is not dense in $C([-1, 1])$. This could be corrected by adding to the family $\sin(\cdot)$ functions with a threshold (offset) element (e.g., $\sin(x + \pi/2) = \cos(x)$). Moreover, if $\sigma$ is an entire function, there exist necessary and sufficient conditions on $\sigma$ under which Theorem 1 will hold without a threshold (see Dahmen & Micchelli, 1987, for a more general discussion). On the other hand, the threshold may have absolutely no effect. Take for example the function $\sigma(x) = e^x$: since $e^{wx + \theta} = e^{\theta} e^{wx}$, the threshold contributes only a constant factor, which the span already absorbs.

The role of the threshold in our analysis is essential, which is interesting in light of the biological backdrop of artificial
neural networks. Because most types of biological neurons are known to fire only when their processed inputs exceed a certain threshold value, it is intriguing to note that the same mechanism must be present in their artificial counterparts as well.

In a similar vein, our finding that activation functions need not be continuous or smooth also has an important biological interpretation, because the activation functions of real neurons may well be discontinuous, or even nonelementary. These restrictions on the activation functions have no bearing on our results, which merely require "nonpolynomiality." As Hornik (1991) pointed out, "Whether or not the continuity assumption can entirely be dropped is still an open and quite challenging problem." We hope that our results solve this problem in a satisfactory way.

6. PROOFS

We use the following definition to prove our main results:

DEFINITION 6. For a function $u$ we denote by $\operatorname{supp}(u)$ the set $\operatorname{supp}(u) = \overline{\{x \mid u(x) \neq 0\}}$.

Proof of Theorem 1. We divide the proof into a series of steps.

Step 1. If $\sigma$ is a polynomial, then $\Sigma_n$ is not dense in $C(\mathbb{R}^n)$.

If $\sigma$ is a polynomial of degree $k$, then $\sigma(w \cdot x + \theta)$ is a polynomial of degree at most $k$ for every $w$ and $\theta$, and in fact $\Sigma_n$ is exactly the set of algebraic polynomials of degree at most $k$. Thus $\Sigma_n$ cannot be dense in $C(\mathbb{R}^n)$. ■

In what follows we always assume that $\sigma$ is not a polynomial.

Step 2. If $\Sigma_1$ is dense in $C(\mathbb{R})$, then $\Sigma_n$ is dense in $C(\mathbb{R}^n)$.

The space
$$V = \operatorname{span}\{f(a \cdot x) \mid a \in \mathbb{R}^n,\ f \in C(\mathbb{R})\}$$
is dense in $C(\mathbb{R}^n)$. This follows in various ways [e.g., Vostrecov & Kreines (1961), Dahmen & Micchelli (1987), Chui & Li (1992), Lin & Pinkus (in press)]. Now, let $g \in C(\mathbb{R}^n)$ and let $K$ be any compact subset of $\mathbb{R}^n$. Since $V$ is dense in $C(K)$, given $\varepsilon > 0$ there exist $f_i \in C(\mathbb{R})$ and $a^i \in \mathbb{R}^n$, $i = 1, \ldots, k$, such that
$$\Big| g(x) - \sum_{i=1}^{k} f_i(a^i \cdot x) \Big| < \frac{\varepsilon}{2} \quad \text{for all } x \in K.$$
Now $\{a^i \cdot x \mid x \in K\} \subseteq [\alpha_i, \beta_i]$ for some finite interval $[\alpha_i, \beta_i]$, $i = 1, \ldots, k$. Because $\Sigma_1$ is dense in $C([\alpha_i, \beta_i])$, $i = 1, \ldots, k$, there exist constants $c_{ij}$, $w_{ij}$ and $\theta_{ij}$, $j = 1, \ldots, m_i$, $i = 1, \ldots, k$, such that
$$\Big| f_i(y) - \sum_{j=1}^{m_i} c_{ij}\, \sigma(w_{ij}\, y + \theta_{ij}) \Big| < \frac{\varepsilon}{2k} \quad \text{for all } y \in [\alpha_i, \beta_i].$$
Thus
$$\Big| g(x) - \sum_{i=1}^{k} \sum_{j=1}^{m_i} c_{ij}\, \sigma\big(w_{ij}(a^i \cdot x) + \theta_{ij}\big) \Big| < \varepsilon \quad \text{for all } x \in K.$$
Thus $\Sigma_n$ is dense in $C(\mathbb{R}^n)$. ■

Step 3. If $\sigma \in C^\infty(\mathbb{R})$ (the set of all functions which have derivatives of all orders), then $\Sigma_1$ is dense in $C(\mathbb{R})$.

If $\sigma \in C^\infty(\mathbb{R})$, then because
$$\frac{\sigma((w + h)x + \theta) - \sigma(wx + \theta)}{h} \in \Sigma_1$$
for every $w, \theta \in \mathbb{R}$ and $h \neq 0$, it follows that $\frac{d}{dw}\sigma(wx + \theta) \in \bar{\Sigma}_1$. By the same argument $\frac{d^k}{dw^k}\sigma(wx + \theta) \in \bar{\Sigma}_1$ for all $k \in \mathbb{N}$ (and all $w, \theta \in \mathbb{R}$). Now
$$\frac{d^k}{dw^k}\sigma(wx + \theta) = x^k \sigma^{(k)}(wx + \theta),$$
where $\sigma^{(k)}$ denotes the $k$th derivative of $\sigma$. Since $\sigma$ is not a polynomial, there exists $\theta_k \in \mathbb{R}$ such that $\sigma^{(k)}(\theta_k) \neq 0$. Thus
$$x^k \sigma^{(k)}(\theta_k) = \frac{d^k}{dw^k}\sigma(wx + \theta)\Big|_{w = 0,\ \theta = \theta_k} \in \bar{\Sigma}_1.$$
This implies that $\bar{\Sigma}_1$ contains all polynomials. By Weierstrass's Theorem it follows that $\bar{\Sigma}_1$ contains $C(K)$ for each compact $K \subset \mathbb{R}$. That is, $\Sigma_1$ is dense in $C(\mathbb{R})$. ■

Step 4. For each $\varphi \in C_0^\infty$ ($C^\infty$ with compact support), $\sigma * \varphi \in \bar{\Sigma}_1$.

We first recall that
$$(\sigma * \varphi)(x) = \int \sigma(x - y)\, \varphi(y)\, dy$$
is the convolution of $\sigma$ and $\varphi$, and is well-defined. We prove Step 4 constructively. (If $\sigma$ were continuous, this could easily be proven using a soft analysis approach.)

Without loss of generality, assume that $\operatorname{supp}\varphi \subseteq [-\alpha, \alpha]$, and that we wish to prove that we can uniformly approximate $\sigma * \varphi$ from $\Sigma_1$ on $[-\alpha, \alpha]$. Let
$$y_i = -\alpha + \frac{2i\alpha}{m}, \quad i = 1, \ldots, m, \qquad \Delta y_i = \frac{2\alpha}{m}.$$
We will prove that
$$\sum_{i=1}^{m} \sigma(x - y_i)\, \varphi(y_i)\, \Delta y_i$$
converges uniformly to $\sigma * \varphi$ on $[-\alpha, \alpha]$. Given $\varepsilon > 0$, we first choose $\delta > 0$ so that
$$10\, \alpha\, \|\sigma\|_{L^\infty[-2\alpha, 2\alpha]}\, \|\varphi\|_{L^\infty}\, \delta < \varepsilon. \qquad (2)$$
For this given $\delta > 0$, we know that there exists a finite
number $r(\delta)$ of intervals, the measure of whose union $U$ is $\delta$, such that $\sigma$ is uniformly continuous on $[-2\alpha, 2\alpha] \setminus U$. We now choose $m$ sufficiently large so that $m\delta > r(\delta)$, and:

If $|s - t| \le 2\alpha/m$, then
$$|\varphi(s) - \varphi(t)| \le \frac{\varepsilon}{2\alpha\, \|\sigma\|_{L^\infty[-2\alpha, 2\alpha]}}. \qquad (3)$$
If $s, t \in [-2\alpha, 2\alpha] \setminus U$ and $|s - t| \le 2\alpha/m$, then
$$|\sigma(s) - \sigma(t)| \le \frac{\varepsilon}{\|\varphi\|_{L^1}}. \qquad (4)$$

These conditions can all be satisfied. Equation (3) follows from the uniform continuity of $\varphi$. By assumption $\sigma$ is uniformly continuous on $[-2\alpha, 2\alpha] \setminus U$, and thus eqn (4) holds.

Set $\Delta_i = [y_{i-1}, y_i]$ ($y_0 = -\alpha$). Fix $x \in [-\alpha, \alpha]$. Now
$$\Big| \int \sigma(x - y)\varphi(y)\, dy - \sum_{i=1}^{m} \int_{\Delta_i} \sigma(x - y_i)\varphi(y)\, dy \Big| \le \sum_{i=1}^{m} \int_{\Delta_i} |\sigma(x - y) - \sigma(x - y_i)|\, |\varphi(y)|\, dy.$$
If $x - \Delta_i$ does not intersect $U$, then from eqn (4),
$$\int_{\Delta_i} |\sigma(x - y) - \sigma(x - y_i)|\, |\varphi(y)|\, dy \le \frac{\varepsilon}{\|\varphi\|_{L^1}} \int_{\Delta_i} |\varphi(y)|\, dy.$$
Thus if we sum over those $\Delta_i$ for which this holds we get an error of at most $\varepsilon$.

Let us now consider those intervals $\Delta_i$ for which $(x - \Delta_i) \cap U \neq \emptyset$. Because $U$ has measure $\delta$ and is composed of $r(\delta)$ intervals, the total length of these intervals $\Delta_i$ is at most $(4\alpha/m)\, r(\delta) + \delta$. By our choice of $m$, $(4\alpha/m)\, r(\delta) < 4\alpha\delta$, so this total length is less than $5\alpha\delta$ (taking, without loss of generality, $\alpha \ge 1$). Thus, from eqn (2), we have that
$$\sum_{(x - \Delta_i) \cap U \neq \emptyset} \int_{\Delta_i} |\sigma(x - y) - \sigma(x - y_i)|\, |\varphi(y)|\, dy \le 2\, \|\sigma\|_{L^\infty[-2\alpha, 2\alpha]}\, \|\varphi\|_{L^\infty}\, 5\alpha\delta < \varepsilon.$$
Finally, by eqn (3),
$$\Big| \sum_{i=1}^{m} \int_{\Delta_i} \sigma(x - y_i)\varphi(y)\, dy - \sum_{i=1}^{m} \sigma(x - y_i)\varphi(y_i)\, \Delta y_i \Big| = \Big| \sum_{i=1}^{m} \int_{\Delta_i} \sigma(x - y_i)\, [\varphi(y) - \varphi(y_i)]\, dy \Big| \le 2\alpha\, \|\sigma\|_{L^\infty[-2\alpha, 2\alpha]} \cdot \frac{\varepsilon}{2\alpha\, \|\sigma\|_{L^\infty[-2\alpha, 2\alpha]}} = \varepsilon.$$
Thus, we obtain
$$\Big| \int \sigma(x - y)\varphi(y)\, dy - \sum_{i=1}^{m} \sigma(x - y_i)\varphi(y_i)\, \Delta y_i \Big| \le 3\varepsilon \quad \text{for all } x \in [-\alpha, \alpha].$$
Since each of these sums belongs to $\Sigma_1$, it follows that $\sigma * \varphi \in \bar{\Sigma}_1$. ■

Step 5. If for some $\varphi \in C_0^\infty$, $\sigma * \varphi$ is not a polynomial, then $\Sigma_1$ is dense in $C(\mathbb{R})$.

From Step 4, $\sigma * \varphi \in \bar{\Sigma}_1$. It thus follows that $(\sigma * \varphi)(wx + \theta)$ is also in $\bar{\Sigma}_1$ for each $w, \theta \in \mathbb{R}$. Now $\sigma * \varphi \in C^\infty$ for any $\varphi \in C_0^\infty$ [see Adams (1975, pp. 29-31)]. Thus from Step 3, if $\sigma * \varphi$ is not a polynomial, then $\Sigma_1$ is dense in $C(\mathbb{R})$. ■

We therefore now assume that $\sigma * \varphi$ is a polynomial for all $\varphi \in C_0^\infty$. We will conclude from this fact that $\sigma$ is itself a polynomial (a.e.).

Step 6. If $\sigma * \varphi$ is a polynomial for all $\varphi \in C_0^\infty$, then there exists an $m \in \mathbb{N}$ such that $\sigma * \varphi$ is a polynomial of degree at most $m$ for all $\varphi \in C_0^\infty$.

For any $a < b$, define $C_0^\infty[a, b]$ to be the set of all $C^\infty$ functions with support in $[a, b]$. We first prove the claim in the case of $C_0^\infty[a, b]$. We define a metric on $C_0^\infty[a, b]$ by
$$\rho(\varphi_1, \varphi_2) = \sum_{n=0}^{\infty} 2^{-n}\, \frac{\|\varphi_1 - \varphi_2\|_n}{1 + \|\varphi_1 - \varphi_2\|_n},$$
where $\|\varphi\|_n = \sup_{x \in [a, b]} |\varphi^{(n)}(x)|$. With this metric, $C_0^\infty[a, b]$ is a complete metric vector space (a Fréchet space). Define
$$V_k = \{\varphi \in C_0^\infty[a, b] \mid \deg(\sigma * \varphi) \le k\}.$$
We have that $V_k$ is a closed subspace, $V_k \subseteq V_{k+1}$, and
$$\bigcup_{k=0}^{\infty} V_k = C_0^\infty[a, b].$$
As $C_0^\infty[a, b]$ is a complete metric space, by Baire's Category Theorem [Bachman and Narici (1972, p. 77)] there exists an integer $m$ such that $V_m$ is of the second category and therefore contains a non-void open set. Because $V_m$ is a vector space, $V_m = C_0^\infty[a, b]$. This completes the proof for the case $C_0^\infty[a, b]$. For the general case we note that the number $m$ does not depend on the interval $[a, b]$. This can be seen by translation, as $m$ depends at most on the length of the interval. Let $[A, B]$ be any interval. For $\varphi \in C_0^\infty[A, B]$ we can find $\varphi_i \in C_0^\infty[a_i, b_i]$, $i = 1, \ldots, k$, such that $[A, B] \subseteq \bigcup_{i=1}^{k} [a_i, b_i]$, $b_i - a_i = b - a$
and $\varphi = \sum_{i=1}^{k} \varphi_i$. Thus $\sigma * \varphi = \sum_{i=1}^{k} \sigma * \varphi_i$, and for every $i = 1, \ldots, k$, $\sigma * \varphi_i$ is a polynomial of degree less than or equal to $m$. Therefore $\deg(\sigma * \varphi) \le m$. ■

Step 7. If $\sigma * \varphi$ is a polynomial of degree at most $m$ for all $\varphi \in C_0^\infty$, then $\sigma$ is a polynomial of degree at most $m$ (a.e.).

From Step 6,
$$0 = (\sigma * \varphi)^{(m+1)}(x) = \int \sigma(x - y)\, \varphi^{(m+1)}(y)\, dy$$
for all $\varphi \in C_0^\infty$. From standard results in Distribution Theory [e.g., Friedman (1963, pp. 57-59)], $\sigma$ is itself a polynomial of degree at most $m$ (a.e.). ■

Proof of Proposition 1. If $\sigma$ is a polynomial of degree $m$, then $\Sigma_n$ is contained in the set of polynomials of total degree $\le m$, and thus cannot be dense in $L^p(\mu)$. Now assume $\sigma$ is not a polynomial, let $g \in L^p(\mu)$ and $\varepsilon > 0$, and let $K = \operatorname{supp}\mu$, which is compact. Since continuous functions are dense in $L^p(\mu)$, there exists $\tilde{g} \in C(K)$ such that $\|g - \tilde{g}\|_{L^p(\mu)} \le \varepsilon/2$, and for this given $\tilde{g} \in C(K)$ there exists, by Theorem 1, an $h \in \Sigma_n$ such that
$$\|\tilde{g} - h\|_{L^\infty(K)} \le \frac{\varepsilon}{2c}, \qquad c = \mu(K)^{1/p}.$$
Thus $\|\tilde{g} - h\|_{L^p(\mu)} \le \varepsilon/2$, and
$$\|g - h\|_{L^p(\mu)} \le \|g - \tilde{g}\|_{L^p(\mu)} + \|\tilde{g} - h\|_{L^p(\mu)} \le \varepsilon. \quad ■$$

Proof of Proposition 2. In Vostrecov and Kreines (1961) [see also Lin and Pinkus (in press)] it is shown that, given $\mathcal{A} \subseteq \mathbb{R}^n$,
$$M(\mathcal{A}) = \operatorname{span}\{f(w \cdot x) \mid f \in C(\mathbb{R}),\ w \in \mathcal{A}\}$$
is dense in $C(\mathbb{R}^n)$ if and only if there does not exist a nontrivial homogeneous polynomial vanishing on $\mathcal{A}$. Now, for every $w \in \mathcal{A}$, each function $\sigma(\lambda\, w \cdot x + \theta)$ is of the form $f(w \cdot x)$, with $f(t) = \sigma(\lambda t + \theta)$. This proves the necessity. To prove the sufficiency, assume $M(\mathcal{A})$ is dense in $C(\mathbb{R}^n)$ and use the argument given in Step 2 of the proof of Theorem 1. That is, given any $f \in C(\mathbb{R})$ and any compact $K \subset \mathbb{R}$, it is possible to approximate $f$ on $K$ from $\Sigma_1$ (by Theorem 1), and thus $\Sigma_n(\mathcal{A})$ is dense in $C(\mathbb{R}^n)$. ■

REMARK 2. A reading of the proof of Theorem 1 shows that the problem of approximating a function $g$ on some compact $K$ of $\mathbb{R}^n$ can almost be divided into two parts. One part is the approximation of $g(x)$ by functions of the form $\sum_i f_i(a^i \cdot x)$, where the $f_i$ are functions in $C(\mathbb{R})$. The other is the approximation of $f$ from the appropriate set in $C(\mathbb{R})$. Because $C(\mathbb{R})$ is separable, one can choose $\sigma \in C(\mathbb{R})$ so that for each and every $f \in C(\mathbb{R})$ and any interval $[a, b]$,
$$\inf_{c, w, \theta}\ \max_{x \in [a, b]} |f(x) - c\, \sigma(wx + \theta)| = 0.$$
That is, only one "processing unit" is needed. However, there remains the problem of approximating $g(x)$ by functions of the form $\sum_i f_i(a^i \cdot x)$ (these are called ridge functions or plane waves), which seems to be the more difficult problem [see Lin and Pinkus (in press)].

REMARK 3. If $\sigma$ has a jump discontinuity, say at 0, and $\sigma$ is continuous in $[-\eta, 0)$ and $(0, \eta]$ for some $\eta > 0$, with $\lim_{x \to 0^-}\sigma(x)$ and $\lim_{x \to 0^+}\sigma(x)$ existing and unequal, then one can obtain Theorem 1 almost directly (from Step 2). Constants are in $\Sigma_1$ (take $w = 0$), and thus by choosing $w \in \{-1, 1\}$, adding a constant, and multiplying by a constant we can assume that
$$\lim_{x \to 0^-}\sigma(x) = 0, \qquad \lim_{x \to 0^+}\sigma(x) = 1.$$
Letting $w \to \infty$ in $\sigma(wx)$, we can then prove that the function $\chi \in \bar{\Sigma}_1$, where $\chi(x) = 0$ for $x < 0$ and $\chi(x) = 1$ for $x > 0$. It is now easy to see how linear combinations of $\chi$ and its translates can uniformly approximate any continuous function on any finite interval (and thus on any compact subset of $\mathbb{R}$).

REMARK 4. There is another method of proof of Theorem 1 for continuous functions, which is simple but based on a deep result. It comes from the theory of mean-periodic functions introduced by Schwartz (1947). One consequence of that theory is that if $\sigma \in C(\mathbb{R})$ is not a polynomial, then the closure of $\operatorname{span}\{\sigma(x + \theta) : \theta \in \mathbb{R}\}$ contains a function of the form $e^{\lambda x}\cos(\nu x)$ for some real $(\lambda, \nu) \neq (0, 0)$. Because any such function is in $C^\infty$, Theorem 1 then follows without the necessity of proving Steps 4-7.

REFERENCES

Adams, R. A. (1975). Sobolev spaces. New York: Academic Press.
Bachman, G., & Narici, L. (1972). Functional analysis (5th ed.). New York: Academic Press.
Chui, C. K., & Li, Xin (1992). Approximation by ridge functions and neural networks with one hidden layer. Journal of Approximation Theory, 70, 131-141.
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2, 303-314.
Dahmen, W., & Micchelli, C. A. (1987). Some remarks on ridge functions. Approximation Theory and its Applications, 3, 2-3.
Friedman, A. (1963). Generalized functions and partial differential equations. Englewood Cliffs, NJ: Prentice-Hall.
Funahashi, K. (1989). On the approximate realization of continuous mappings by neural networks. Neural Networks, 2, 183-192.
Gallant, A. R., & White, H. (1988). There exists a neural network that does not make avoidable mistakes. Proceedings of the IEEE Second International Conference on Neural Networks (I, 657-664). San Diego: SOS Printing.
Hecht-Nielsen, R. (1989). Theory of the back propagation neural network. Proceedings of the International Joint Conference on Neural Networks (I, 593-606). San Diego: SOS Printing.
Hinton, G. E. (1989). Connectionist learning procedures. Artificial Intelligence, 40, 185-234.
Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks, 4, 251-257.
Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2, 359-366.
Irie, B., & Miyake, S. (1988). Capabilities of three layer perceptrons. Proceedings of the IEEE Second International Conference on Neural Networks (I, 641-648). San Diego: SOS Printing.
Lapedes, A., & Farber, R. (1988). How neural networks work (Tech. Rep. LA-UR-88-418). Los Alamos, NM: Los Alamos National Laboratory.
le Cun, Y. (1987). Modèles connexionnistes de l'apprentissage. Master's thesis, Université Pierre et Marie Curie, Paris.
Lin, V. Ya., & Pinkus, A. (in press). Fundamentality of ridge functions. Journal of Approximation Theory.
Schwartz, L. (1947). Théorie générale des fonctions moyenne-périodiques. Annals of Mathematics, 48, 857-929.
Stinchcombe, M., & White, H. (1990). Approximating and learning unknown mappings using multilayer feedforward networks with bounded weights (Tech. Rep.). San Diego: University of California, Dept. of Economics.
Vostrecov, B. A., & Kreines, M. A. (1961). Approximation of continuous functions by superposition of plane waves. Doklady Akademii Nauk SSSR, 140 (Soviet Math. Dokl., 2).
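The threshold discussion in Section 5 can be checked numerically with a minimal sketch built directly on eqn (1). This is not the paper's method; it is an illustration under stated assumptions: the hidden weights are fixed by hand (the hypothetical choice $w = 1, \ldots, 5$), only the output weights $\beta_j$ are fitted by least squares on a grid in $[-1, 1]$, and $\sigma = \sin$ is compared with and without the threshold $\theta = -\pi/2$. The helper names `network` and `best_fit_error` are illustrative only.

```python
import numpy as np


def network(X, W, theta, beta, sigma):
    """Eqn (1): f(x) = sum_j beta_j * sigma(w_j . x - theta_j), for each row x of X."""
    return sigma(X @ W.T - theta) @ beta


def best_fit_error(target, X, W, theta, sigma):
    """Fit only the output weights beta by least squares, with the hidden weights W
    and thresholds theta held fixed; return the max abs error at the sample points."""
    H = sigma(X @ W.T - theta)                  # hidden-unit outputs, shape (N, k)
    beta, *_ = np.linalg.lstsq(H, target, rcond=None)
    return np.max(np.abs(network(X, W, theta, beta, sigma) - target))


# n = 1 input unit; symmetric sample grid on [-1, 1]; even target cos(x).
X = np.linspace(-1.0, 1.0, 201).reshape(-1, 1)
y = np.cos(X[:, 0])
ws = np.arange(1.0, 6.0)                        # hypothetical hidden weights w = 1, ..., 5

# No threshold: every hidden unit computes the odd function sin(w x).
W_odd = ws.reshape(-1, 1)
err_no_threshold = best_fit_error(y, X, W_odd, np.zeros(ws.size), np.sin)

# With threshold: add units with theta = -pi/2, i.e. sin(w x + pi/2) = cos(w x).
W_both = np.concatenate([ws, ws]).reshape(-1, 1)
theta_both = np.concatenate([np.zeros(ws.size), np.full(ws.size, -np.pi / 2)])
err_threshold = best_fit_error(y, X, W_both, theta_both, np.sin)

print(f"max error without threshold: {err_no_threshold:.3f}")
print(f"max error with threshold:    {err_threshold:.2e}")
```

Without a threshold every hidden unit vanishes at $x = 0$, so no choice of $\beta$ can reproduce $\cos(0) = 1$ and the error stays near 1; with the threshold, $\sin(x + \pi/2) = \cos(x)$ is itself one of the hidden units and the fit is exact up to rounding.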

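Step 4 of the proof of Theorem 1 approximates $\sigma * \varphi$ by the Riemann sums $\sum_i \sigma(x - y_i)\varphi(y_i)\Delta y_i$, each of which is itself an element of $\Sigma_1$ (weight $w = 1$, threshold term $-y_i$). The sketch below checks this convergence numerically under assumptions of our own choosing, not taken from the paper: $\sigma$ is the Heaviside step (discontinuous, but in $M$), $\varphi$ is the standard $C_0^\infty$ bump $\exp(-1/(1 - y^2))$ supported in $[-1, 1]$, and $\alpha = 1$. For this $\sigma$ the exact convolution is $\int_{-1}^{x} \varphi(y)\, dy$, which the code computes with a fine cumulative trapezoidal rule as a reference.

```python
import numpy as np

ALPHA = 1.0  # assumption: supp(phi) lies in [-ALPHA, ALPHA]


def sigma(t):
    """Heaviside step: discontinuous only at 0, so it belongs to the class M."""
    return np.heaviside(t, 1.0)


def phi(y):
    """Standard C_0^infinity bump exp(-1/(1 - y^2)) on (-1, 1), zero elsewhere."""
    y = np.asarray(y, dtype=float)
    out = np.zeros_like(y)
    inside = np.abs(y) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - y[inside] ** 2))
    return out


def riemann_sum(x, m):
    """sum_i sigma(x - y_i) phi(y_i) dy with y_i = -ALPHA + 2 i ALPHA / m, i = 1..m.
    Each term is one hidden unit sigma(1 * x - y_i) scaled by phi(y_i) dy."""
    y = -ALPHA + 2.0 * ALPHA * np.arange(1, m + 1) / m
    dy = 2.0 * ALPHA / m
    return (sigma(x[:, None] - y[None, :]) * phi(y)[None, :]).sum(axis=1) * dy


def exact_convolution(x, n_fine=200_001):
    """(sigma * phi)(x) = integral of phi over [-1, x] for the Heaviside sigma,
    via a cumulative trapezoidal rule on a fine grid plus linear interpolation."""
    t = np.linspace(-ALPHA, ALPHA, n_fine)
    v = phi(t)
    cumulative = np.concatenate([[0.0], np.cumsum(0.5 * (v[1:] + v[:-1]) * np.diff(t))])
    return np.interp(x, t, cumulative)


x = np.linspace(-ALPHA, ALPHA, 401)
reference = exact_convolution(x)
errors = {m: np.max(np.abs(riemann_sum(x, m) - reference)) for m in (10, 100, 1000)}
for m, err in errors.items():
    print(f"m = {m:5d}: sup error on [-1, 1] = {err:.2e}")
```

The sup error shrinks roughly in proportion to $\Delta y_i = 2\alpha/m$, even though $\sigma$ itself jumps at 0, which is the behavior the $3\varepsilon$ bound in Step 4 guarantees.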