2016 cc BBS


BEHAVIORAL AND BRAIN SCIENCES (2016), Page 1 of 72, e62. doi:10.1017/S0140525X1500031X
© Cambridge University Press 2016 0140-525X/16

The Now-or-Never bottleneck: A fundamental constraint on language

Morten H. Christiansen
Department of Psychology, Cornell University, Ithaca, NY 14853; The Interacting Minds Centre, Aarhus University, 8000 Aarhus C, Denmark; Haskins Laboratories, New Haven, CT 06511
[email protected]

Nick Chater
Behavioural Science Group, Warwick Business School, University of Warwick, Coventry, CV4 7AL, United Kingdom
[email protected]

Abstract: Memory is fleeting. New material rapidly obliterates previous material. How, then, can the brain deal successfully with the continual deluge of linguistic input? We argue that, to deal with this "Now-or-Never" bottleneck, the brain must compress and recode linguistic input as rapidly as possible. This observation has strong implications for the nature of language processing: (1) the language system must "eagerly" recode and compress linguistic input; (2) as the bottleneck recurs at each new representational level, the language system must build a multilevel linguistic representation; and (3) the language system must deploy all available information predictively to ensure that local linguistic ambiguities are dealt with "Right-First-Time"; once the original input is lost, there is no way for the language system to recover. This is "Chunk-and-Pass" processing. Similarly, language learning must also occur in the here and now, which implies that language acquisition is learning to process, rather than inducing, a grammar. Moreover, this perspective provides a cognitive foundation for grammaticalization and other aspects of language change. Chunk-and-Pass processing also helps explain a variety of core properties of language, including its multilevel representational structure and duality of patterning. This approach promises to create a direct relationship between psycholinguistics and linguistic theory. More generally, we outline a framework within which to integrate often disconnected inquiries into language processing, language acquisition, and language change and evolution.

Keywords: chunking; grammaticalization; incremental interpretation; language acquisition; language evolution; language processing; online learning; prediction; processing bottleneck; psycholinguistics

1. Introduction

Language is fleeting. As we hear a sentence unfold, we rapidly lose our memory for preceding material. Speakers, too, soon lose track of the details of what they have just said. Language processing is therefore "Now-or-Never": If linguistic information is not processed rapidly, that information is lost for good. Importantly, though, while fundamentally shaping language, the Now-or-Never bottleneck is not specific to language but instead arises from general principles of perceptuo-motor processing and memory.

The existence of a Now-or-Never bottleneck is relatively uncontroversial, although its precise character may be debated. However, in this article we argue that the consequences of this constraint for language are remarkably far-reaching, touching on the following issues:

1. The multilevel organization of language into sound-based units, lexical and phrasal units, and beyond;
2. The prevalence of local linguistic relations (e.g., in phonology and syntax);
3. The incrementality of language processing;
4. The use of prediction in language interpretation and production;
5. The nature of what is learned during language acquisition;
6. The degree to which language acquisition involves item-based generalization;
7. The degree to which language change proceeds item-by-item;
8. The connection between grammar and lexical knowledge;
9. The relationships between syntax, semantics, and pragmatics.

Thus, we argue that the Now-or-Never bottleneck has fundamental implications for key questions in the language sciences. The consequences of this constraint are, moreover, incompatible with many theoretical positions in linguistic, psycholinguistic, and language acquisition research.

Note, however, that arguing that a phenomenon arises from the Now-or-Never bottleneck does not necessarily undermine alternative explanations of that phenomenon (although it may). Many phenomena in language may simply be overdetermined. For example, we argue that incrementality (point 3, above) follows from the Now-or-Never bottleneck. But it is also possible that, irrespective of memory constraints, language understanding would still be incremental on functional grounds, to extract the linguistic message as rapidly as possible. Such counterfactuals are, of course, difficult to evaluate. By contrast, the properties of the Now-or-Never bottleneck arise from basic information processing limitations that are directly testable by experiment. Moreover, the Now-or-Never bottleneck should, we suggest, have methodological priority to the extent that it provides an integrated framework for explaining many aspects of language structure, acquisition, processing, and evolution that have previously been treated separately.

In Figure 1, we illustrate the overall structure of the argument in this article. We begin, in the next section, by briefly making the case for the Now-or-Never bottleneck as a general constraint on perception and action. We then discuss the implications of this constraint for language processing, arguing that both comprehension and production involve what we call "Chunk-and-Pass" processing: incrementally building chunks at all levels of linguistic structure as rapidly as possible, using all available information predictively to process current input before new information arrives (sect. 3). From this perspective, language acquisition involves learning to process: that is, learning rapidly to create and use chunks appropriately for the language being learned (sect. 4). Consequently, short-term language change and longer-term processes of language evolution arise through variation in the system of chunks and their composition, suggesting an item-based theory of language change (sect. 5). This approach points to a processing-based interpretation of construction grammar, in which constructions correspond to chunks, and where grammatical structure is fundamentally the history of language processing operations within the individual speaker/hearer (sect. 6). We conclude by briefly summarizing the main points of our argument.

MORTEN H. CHRISTIANSEN is Professor of Psychology and Co-Director of the Cognitive Science Program at Cornell University as well as Senior Scientist at the Haskins Labs and Professor of Child Language at the Interacting Minds Centre at Aarhus University. He is the author of more than 170 scientific papers and has written or edited five books. His research focuses on the interaction of biological and environmental constraints in the processing, acquisition, and evolution of language, using a combination of computational, behavioral, and cognitive neuroscience methods. He is a Fellow of the Association for Psychological Science, and he delivered the 2009 Nijmegen Lectures.

NICK CHATER is Professor of Behavioural Science at Warwick Business School, United Kingdom. He is the author of more than 250 scientific publications in psychology, philosophy, linguistics, and cognitive science, and he has written or edited ten books. He has served as Associate Editor for Cognitive Science, Psychological Review, Psychological Science, and Management Science. His research explores the cognitive and social foundations of human rationality, focusing on formal models of inference, choice, and language. He is a Fellow of the Cognitive Science Society, the Association for Psychological Science, and the British Academy.

2. The Now-or-Never bottleneck

Language input is highly transient. Speech sounds, like other auditory signals, are short-lived. Classic speech perception studies have shown that very little of the auditory trace remains after 100 ms (Elliott 1962), with more recent studies indicating that much acoustic information is already lost after just 50 ms (Remez et al. 2010). Similarly, and of relevance for the perception of sign language, studies of visual change detection suggest that the ability to maintain visual information beyond 60–70 ms is very limited (Pashler 1988). Thus, sensory memory for language input is quickly overwritten, or interfered with, by new incoming information, unless the perceiver in some way processes what is heard or seen.

The problem of the rapid loss of the speech or sign signal is further exacerbated by the sheer speed of the incoming linguistic input. At a normal speech rate, speakers produce about 10–15 phonemes per second, corresponding to roughly 5–6 syllables every second or 150 words per minute (Studdert-Kennedy 1986). However, the resolution of the human auditory system for discrete auditory events is only about 10 sounds per second, beyond which the sounds fuse into a continuous buzz (Miller & Taylor 1948). Consequently, even at normal rates of speech, the language system needs to work beyond the limits of auditory temporal resolution for nonspeech stimuli. Remarkably, listeners can learn to process speech in their native language at up to twice the normal rate without much decrement in comprehension (Orr et al. 1965). Although the production of signs appears to be slower than the production of speech (at least when comparing the production of ASL signs and spoken English; Bellugi & Fischer 1972), signed words are still very brief visual events, with the duration of an ASL syllable being about a quarter of a second (Wilbur & Nolen 1986).

Making matters even worse, our memory for sequences of auditory input is also very limited. For example, it has been known for more than four decades that naïve listeners are unable to correctly recall the temporal order of just four distinct sounds – for example, hisses, buzzes, and tones – even when they are perfectly able to recognize and label each individual sound in isolation (Warren et al. 1969). Our ability to recall well-known auditory stimuli is not substantially better, ranging from 7 ± 2 (Miller 1956) to 4 ± 1 (Cowan 2000). A similar limitation applies to visual memory for sign language (Wilson & Emmorey 2006). The poor memory for auditory and visual information, combined with the fast and fleeting nature of linguistic input, imposes a fundamental constraint on the language system: the Now-or-Never bottleneck. If the input is not processed immediately, new information will quickly overwrite it.

Importantly, the Now-or-Never bottleneck is not unique to language but applies to other aspects of perception and action as well. Sensory memory is rich in detail but decays rapidly unless it is further processed (e.g., Cherry 1953; Coltheart 1980; Sperling 1960). Likewise, short-term memory for auditory, visual, and haptic information is also limited and subject to interference from new input (e.g., Gallace et al. 2006; Haber 1983; Pavani & Turatto 2008). Moreover, our cognitive ability to respond to sensory input is further constrained in a serial (Sigman & Dehaene 2005) or near-serial (Navon & Miller 2002) manner, severely restricting our capacity for processing multiple inputs arriving in quick succession. Similar limitations apply to the production of behavior: The cognitive system cannot plan detailed sequences of movements – a long sequence of commands planned far in advance would lead to severe interference and be forgotten before it could be carried out (Cooper & Shallice 2006; Miller et al. 1960). However, the cognitive system adopts several processing strategies to ameliorate the effects of the Now-or-Never bottleneck on perception and action.

Figure 1. The structure of our argument, in which implicational relations between claims are denoted by arrows. The Now-or-Never bottleneck provides a fundamental constraint on perception and action that is independent of its application to the language system (and hence outside the diamond in the figure). Specific implications for language (indicated inside the diamond) stem from the Now-or-Never bottleneck's necessitating of Chunk-and-Pass language processing, with key consequences for language acquisition. The impact of the Now-or-Never bottleneck on both processing and acquisition together further shapes language change. All three of these interlinked claims concerning Chunk-and-Pass processing, acquisition as processing, and item-based language change (grouped together in the shaded upper triangle) combine to shape the structure of language itself.

First, the cognitive system engages in eager processing: It must recode the rich perceptual input as it arrives to capture the key elements of the sensory information as economically, and as distinctively, as possible (e.g., Brown et al. 2007; Crowder & Neath 1991); and it must do so rapidly, before new input overwrites or interferes with the sensory information. This notion is a traditional one, dating back to early work on attention and sensory memory (e.g., Broadbent 1958; Coltheart 1980; Haber 1983; Sperling 1960; Treisman 1964). The resulting compressed representations are lossy: They provide only an abstract summary of the input, from which the rich sensory input cannot be recovered (e.g., Pani 2000). Evidence from the phenomena of change and inattentional blindness suggests that these compressed representations can be very selective (see Jensen et al. 2011 for a review), as exemplified by a study in which half of the participants failed to notice that someone to whom they were giving directions, face-to-face, was surreptitiously exchanged for a completely different person (Simons & Levin 1998). Information not encoded in the short amount of time during which the sensory information is available will be lost.

Second, because memory limitations also apply to recoded representations, the cognitive system further chunks the compressed encodings into multiple levels of representation of increasing abstraction in perception, and decreasing levels of abstraction in action. Consider, for example, memory for serially ordered symbolic information, such as sequences of digits. Typically, people are quickly overloaded and can recall accurately only the last three or four items in a sequence (e.g., Murdock 1968). But it is possible to learn to rapidly encode, and recall, long random sequences of digits, by successively chunking such sequences into larger units, chunking those chunks into still larger units, and so on. Indeed, an extended study of a single individual, SF (Ericsson et al. 1980), showed that repeated chunking in this manner makes it possible to recall with high accuracy sequences containing as many as 79 digits. But, crucially, this strategy requires learning to encode the input into multiple, successive, and distinct levels of representation: each sequence of chunks at one level must be shifted as a single chunk to a higher level before more chunks interfere with or overwrite the initial chunks. Indeed, SF chunked sequences of three or four digits, the natural chunk size in human memory (Cowan 2000), into a single unit (corresponding to running times, dates, or human ages), and then grouped sequences of three to four of those chunks into larger chunks. Interestingly, SF also verbally produced items in overtly discernible chunks, interleaved with pauses, indicating how action also follows the reverse process (e.g., Lashley 1951; Miller 1956). The case of SF further demonstrates that low-level information is far better recalled when organized into higher-level structures than merely coded as an unorganized stream. Note, though, that lower-level information is typically forgotten; it seems unlikely that even SF could recall the specific visual details of the digits with which he was presented. More generally, the notion that perception and action involve representational recoding at a succession of distinct representational levels also fits with a long tradition of theoretical and computational models in cognitive science and computer vision (e.g., Bregman 1990; Marr 1982; Miller et al. 1960; Zhu et al. 2010; see Gobet et al. 2001 for a review). Our perspective on repeated multilevel compression is also consistent with data from functional magnetic resonance imaging (fMRI) and intracranial recordings, suggesting cortical hierarchies across vision and audition – from low-level sensory to high-level perceptual and cognitive areas – integrating information at progressively longer temporal windows (Hasson et al. 2008; Honey et al. 2012; Lerner et al. 2011).
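To make the idea of repeated chunking concrete, the following minimal Python sketch of our own recodes a digit sequence into small groups and then groups those groups again, loosely in the spirit of SF's strategy; the particular group sizes, the 24-digit example, and the chunk labels are assumptions for illustration only, not data from Ericsson et al. (1980).

```python
# Minimal illustration of repeated chunking (loosely inspired by the SF digit-span case).
# Group sizes of 3-4 echo the "natural chunk size" of three or four items; everything
# else here is an invented example rather than a reported result.

def chunk(sequence, size):
    """Recode a flat sequence into tuples of `size` items (the last may be shorter)."""
    return [tuple(sequence[i:i + size]) for i in range(0, len(sequence), size)]

digits = [7, 3, 1, 4, 9, 2, 6, 5, 8, 0, 2, 7, 1, 3, 5, 9, 4, 8, 6, 2, 0, 1, 7, 5]

level1 = chunk(digits, 3)   # e.g., (7, 3, 1): a "running time", "date", or "age"
level2 = chunk(level1, 4)   # chunks of chunks: each now spans 12 digits
level3 = chunk(level2, 2)   # a single top-level chunk spans all 24 digits

print(len(digits), "digits ->", len(level1), "chunks ->",
      len(level2), "super-chunks ->", len(level3), "top-level chunk(s)")
```

The point is only that, at any moment, each level needs to hold just a handful of units, even though the top level ultimately spans the entire sequence.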

Third, to facilitate speedy chunking and hierarchical compression, the cognitive system employs anticipation, using prior information to constrain the recoding of current perceptual input (for reviews see Bar 2007; Clark 2013). For example, people see the exact same collection of pixels either as a hair dryer (when viewed as part of a bathroom scene) or as a drill (when embedded in a picture of a workbench) (Bar 2004). Therefore, using prior information to predict future input is likely to be essential to successfully encoding that future input (as well as helping us to react faster to such input). Anticipation allows faster, and hence more effective, recoding when oncoming information creates considerable time urgency. Such predictive processing will be most effective to the extent that the greatest possible amount of available information (across different types and levels of abstraction) is integrated as fast as possible. Similarly, anticipation is important for action as well. For example, manipulating an object requires anticipating the grip force required to deal with the loads generated by the accelerations of the object. Grip force is adjusted too rapidly during the manipulation of an object to rely on sensory feedback (Flanagan & Wing 1997). Indeed, the rapid prediction of the sensory consequences of actions (e.g., Poulet & Hedwig 2006) suggests the existence of so-called forward models, which allow the brain to predict the consequence of its actions in real time. Many have argued (e.g., Wolpert et al. 2011; see also Clark 2013; Pickering & Garrod 2013a) that forward models are a ubiquitous feature of the computational machinery of motor control and more broadly of cognition.

The three processing strategies we mention here – eager processing, computing multiple representational levels, and anticipation – provide the cognitive system with important means to cope with the Now-or-Never bottleneck. Next, we argue that the language system implements similar strategies for dealing with the here-and-now nature of linguistic input and output, with wide-reaching and fundamental implications for language processing, acquisition, and change as well as for the structure of language itself. Specifically, we propose that our ability to deal with sequences of linguistic information is the result of what we call "Chunk-and-Pass" processing, by which the language system can ameliorate the effects of the Now-or-Never bottleneck. More generally, our perspective offers a framework within which to approach language comprehension and production. Table 1 summarizes the impact of the Now-or-Never bottleneck on perception/action and language.

Table 1. Summary of the Now-or-Never bottleneck's implications for perception/action and language

Strategy: Eager processing
  Mechanism: Lossy chunking
  Perception and action: Chunking in memory and action (Lashley 1951; Miller 1956); lossy descriptions (Pani 2000)
  Language: Incremental interpretation (Bever 1970) and production (Meyer 1996); multiple constraint satisfaction (MacDonald et al. 1994)

Strategy: Multiple levels of representation
  Mechanism: Hierarchical compression
  Perception and action: Hierarchical memory (Ericsson et al. 1980), action (Miller et al. 1960), problem solving (Gobet et al. 2001)
  Language: Multiple levels of linguistic structure (e.g., sound-based, lexical, phrasal, discourse); local dependencies (Hawkins 2004)

Strategy: Predictive processing
  Mechanism: Anticipation
  Perception and action: Fast, top-down visual processing (Bar 2004); forward models in motor control (Wolpert et al. 2011); predictive coding (Clark 2013)
  Language: Syntactic prediction (Jurafsky 1996); multiple-cue integration (Farmer et al. 2006); visual world (Altmann & Kamide 1999)

The style of explanation outlined here, focusing on processing limitations, contrasts with a widespread interest in rational, rather than processing-based, explanations in cognitive science (e.g., Anderson 1990; Chater et al. 2006; Griffiths & Tenenbaum 2009; Oaksford & Chater 1998; 2007; Tenenbaum et al. 2011), including language processing (Gibson et al. 2013; Hale 2001; 2006; Piantadosi et al. 2011). Given the fundamental nature of the Now-or-Never bottleneck, we suggest that such explanations will be relevant only for explaining language use insofar as they incorporate processing constraints. For example, in the spirit of rational analysis (Anderson 1990) and bounded rationality (Simon 1982), it is natural to view aspects of language processing and structure, as described below, as "optimal" responses to specific processing limitations, such as the Now-or-Never bottleneck (for this style of approach, see, e.g., Chater et al. 1998; Levy 2008). Here, though, our focus is primarily on mechanism rather than rationality.
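Returning to the anticipation strategy discussed above, the hair-dryer/drill example can be given a toy quantitative gloss. The sketch below is ours and uses invented probabilities; it simply treats recoding as Bayesian cue combination, in which the same ambiguous sensory evidence is resolved differently depending on which scene-based prior is in play.

```python
# Toy Bayesian illustration of prior context constraining recoding.
# All probabilities are invented for illustration; they are not from the target article.

def posterior(prior, likelihood):
    """Combine a prior over interpretations with a likelihood of the ambiguous input."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

# The ambiguous blob is equally consistent with either object (same likelihoods).
likelihood = {"hair dryer": 0.5, "drill": 0.5}

bathroom_prior  = {"hair dryer": 0.9, "drill": 0.1}   # scene context: bathroom
workbench_prior = {"hair dryer": 0.1, "drill": 0.9}   # scene context: workbench

print(posterior(bathroom_prior, likelihood))   # hair dryer wins
print(posterior(workbench_prior, likelihood))  # drill wins
```

Nothing hinges on the Bayesian formulation itself; the point is that prior context can settle the recoding of an ambiguous input before further evidence arrives.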

3. Chunk-and-Pass language processing

The fleeting nature of linguistic input, in combination with the impressive speed with which words and signs are produced, imposes a severe constraint on the language system: the Now-or-Never bottleneck. Each new incoming word or sign will quickly interfere with previously heard and seen input, providing a naturalistic version of the masking used in psychophysical experiments. How, then, is language comprehension possible? Why doesn't interference between successive sounds (or signs) obliterate linguistic input before it can be understood? The answer, we suggest, is that our language system rapidly recodes this input into chunks, which are immediately passed to a higher level of linguistic representation. The chunks at this higher level are then themselves subject to the same Chunk-and-Pass procedure, resulting in progressively larger chunks of increasing linguistic abstraction. Crucially, given that the chunks recode increasingly larger stretches of input from lower levels of representation, the chunking process enables input to be maintained over ever-larger temporal windows. It is this repeated chunking of lower-level information that makes it possible for the language system to deal with the continuous deluge of input that, if not recoded, is rapidly lost. This chunking process is also what allows us to perceive speech at a much faster rate than nonspeech sounds (Warren et al. 1969): We have learned to chunk the speech stream. Indeed, we can easily understand (and sometimes even repeat back) sentences consisting of many tens of phonemes, despite our severe memory limitations for sequences of nonspeech sounds.

What we are proposing is that during comprehension – similar to SF – the language system must keep on chunking the incoming information into increasingly abstract levels of representation to avoid being overwhelmed by the input. That is, the language system engages in eager processing when creating chunks. Chunks must be built right away, or memory for the input will be obliterated by interference from subsequent material. If a phoneme or syllable is recognized, then it is recoded as a chunk and passed to a higher level of linguistic abstraction. And once recoded, the information is no longer subject to interference from further auditory input. A general principle of perception and memory is that interference arises primarily between overlapping representations (Crowder & Neath 1991; Treisman & Schmidt 1982); crucially, recoding avoids such overlap. For example, phonemes interfere with each other, but phonemes interfere very little with words. At each level of chunking, information from the previous level(s) is compressed and passed up as chunks to the next level of linguistic representation, from sound-based chunks up to complex discourse elements. As a consequence, the rich detail of the original input can no longer be recovered from the chunks, although some key information remains (e.g., certain speaker characteristics; Nygaard et al. 1994; Remez et al. 1997).

In production, the process is reversed: Discourse-level chunks are recursively broken down into subchunks of decreasing linguistic abstraction until the system arrives at chunks with sufficient information to drive the articulators (either the vocal apparatus or the hands). As in comprehension, memory is limited within a given level of representation, resulting in potential interference between the items to be produced (e.g., Dell et al. 1997). Thus, higher-level chunks tend to be passed down immediately to the level below as soon as they are "ready," leading to a bias toward producing easy-to-retrieve utterance components before harder-to-retrieve ones (e.g., Bock 1982; MacDonald 2013). For example, if there is a competition between two possible words to describe an object, the word that is retrieved more fluently will immediately be passed on to lower-level articulatory processes. To further facilitate production, speakers often reuse chunks from the ongoing conversation, and those will be particularly rapidly available from memory. This phenomenon is reflected by the evidence for lexical (e.g., Meyer & Schvaneveldt 1971) and structural priming (e.g., Bock 1986; Bock & Loebell 1990; Pickering & Branigan 1998; Potter & Lombardi 1998) within individuals as well as alignment across conversational partners (Branigan et al. 2000; Pickering & Garrod 2004); priming is also extensively observed in text corpora (Hoey 2005). As noted by MacDonald (2013), these memory-related factors provide key constraints on the production of language and contribute to cross-linguistic patterns of language use.

A useful analogy for language production is the notion of "just-in-time" stock control, in which stock inventories are kept to a bare minimum during the manufacturing process (Ohno & Mito 1988). Similarly, the Now-or-Never bottleneck requires that, for example, low-level phonetic or articulatory decisions not be made and stored far in advance and then reeled off during speech production, because any buffer in which such decisions can safely be stored would quickly be subject to interference from subsequent material. So the Now-or-Never bottleneck requires that once detailed production information has been assembled, it be executed straightaway, before it can be obliterated by the oncoming stream of later low-level decisions, similar to what has been suggested for motor planning (Norman & Shallice 1986; see also MacDonald 2013). We call this proposal Just-in-Time language production.
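As a rough computational caricature of Chunk-and-Pass comprehension (our sketch, not an implementation proposed in the article), the code below chains together two levels, each with a small buffer: recognized sequences are eagerly recoded into a chunk, the chunk is passed up immediately, and the lower-level detail is discarded. The chunk inventories, buffer capacity, and level names are invented for illustration; production would run an analogous cascade in the opposite direction.

```python
# A caricature of Chunk-and-Pass comprehension: bounded buffers at each level,
# eager recoding, and lossy passing of chunks upward. Inventories are invented.
from collections import deque

class Level:
    def __init__(self, name, inventory, capacity=4, next_level=None):
        self.name = name
        self.inventory = inventory            # known chunks: tuple of units -> chunk label
        self.buffer = deque(maxlen=capacity)  # overflow = unchunked material is lost
        self.next_level = next_level
        self.passed_up = []                   # chunks passed to the level above

    def receive(self, unit):
        self.buffer.append(unit)
        # Eagerly try to recode the most recent units as a known chunk (largest first).
        for n in range(len(self.buffer), 0, -1):
            candidate = tuple(list(self.buffer)[-n:])
            if candidate in self.inventory:
                chunk = self.inventory[candidate]
                for _ in range(n):            # lossy: drop the lower-level detail
                    self.buffer.pop()
                self.passed_up.append(chunk)
                if self.next_level is not None:
                    self.next_level.receive(chunk)   # pass the chunk up right away
                break

utterances = Level("utterance", {("the", "dog", "barks"): "[S the dog barks]"})
words = Level("word",
              {("th", "e"): "the", ("d", "o", "g"): "dog", ("b", "ar", "k", "s"): "barks"},
              next_level=utterances)

for phone in ["th", "e", "d", "o", "g", "b", "ar", "k", "s"]:
    words.receive(phone)

print(words.passed_up)       # ['the', 'dog', 'barks']
print(utterances.passed_up)  # ['[S the dog barks]']
```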
3.1. Implications of Strategy 1: Incremental processing

Chunk-and-Pass processing has important implications for comprehension and production: It requires that both take place incrementally. In incremental processing, representations are built up as rapidly as possible as the input is encountered. By contrast, one might, for example, imagine a parser that waits until the end of a sentence before beginning syntactic analysis, or that meaning is computed only once syntax has been established. However, such processing would require storing a stream of information at a single level of representation, and processing it later; but given the Now-or-Never bottleneck, this is not possible because of severe interference between such representations. Therefore, incremental interpretation and production follow directly from the Now-or-Never constraint on language.

To get a sense of the implications of Chunk-and-Pass processing, it is interesting to relate this perspective to specific computational principles and models. How, for example, do classic models of parsing fit within this framework? A wide range of psychologically inspired models involves some degree of incrementality of syntactic analysis, which can potentially support incremental interpretation (e.g., Phillips 1996; 2003; Winograd 1972). For example, the sausage machine parsing model (Frazier & Fodor 1978) proposes that a preliminary syntactic analysis is carried out phrase-by-phrase, but in complete isolation from semantic or pragmatic factors. But for a right-branching language such as English, chunks cannot be built left-to-right, because the leftmost chunks are incomplete until later material has been encountered. Frameworks from Kimball (1973) onward imply "stacking up" incomplete constituents that may then all be resolved at the end of the clause. This approach runs counter to the memory constraints imposed by the Now-or-Never bottleneck. Reconciling right-branching with incremental chunking and processing is one motivation for the flexible constituency of combinatory categorial grammar (e.g., Steedman 1987; 2000; see also Johnson-Laird 1983).

With respect to comprehension, considerable evidence supports incremental interpretation, going back more than four decades (e.g., Bever 1970; Marslen-Wilson 1975). The language system uses all available information to rapidly integrate incoming information as quickly as possible to update the current interpretation of what has been said so far. This process includes not only sentence-internal information about lexical and structural biases (e.g., Farmer et al. 2006; MacDonald 1993; Trueswell et al. 1994), but also extra-sentential cues from the referential and pragmatic context (e.g., Altmann & Steedman 1988; Thornton et al. 1999) as well as the visual environment and world knowledge (e.g., Altmann & Kamide 1999; Tanenhaus et al. 1995). As the incoming acoustic information is chunked, it is rapidly integrated with contextual information to recognize words, consistent with a variety of data on spoken word recognition (e.g., Marslen-Wilson 1975; van den Brink et al. 2001). These words are then, in turn, chunked into larger multiword units, as evidenced by recent studies showing sensitivity to multiword sequences in online processing (e.g., Arnon & Snider 2010; Reali & Christiansen 2007b; Siyanova-Chanturia et al. 2011; Tremblay & Baayen 2010; Tremblay et al. 2011), and subsequently further integrated with pragmatic context into discourse-level structures.

Turning to production, we start by noting the powerful intuition that we speak "into the void" – that is, that we plan only a short distance ahead. Indeed, experimental studies suggest that, for example, when producing an utterance involving several noun phrases, people plan just one (Smith & Wheeldon 1999), or perhaps two, noun phrases ahead (Konopka 2012), and they can modify a message during production in the light of new perceptual input (Brown-Schmidt & Konopka 2015). Moreover, speech-error data (e.g., Cutler 1982) reveal that, across representational levels, errors tend to be highly local: Phonological, morphemic, and syntactic errors apply to neighboring chunks within each level (where material may be moved, swapped, or deleted). Consequently, speech planning appears to involve just a small number of chunks – the number of which may be similar across linguistic levels – but which covers different amounts of time depending on the linguistic level in question. For example, planning involving chunks at the level of intonational bursts stretches over considerably longer periods of time than planning at the syllabic level. Similarly, processes of reduction to facilitate production (e.g., modifying the speech signal to make it easier to produce, such as reducing a vowel to a schwa, or shortening or eliminating phonemes) can be observed across different levels of linguistic representation, from individual words (e.g., Gahl & Garnsey 2004; Jurafsky et al. 2001) to frequent multiword sequences (e.g., Arnon & Cohen Priva 2013; Bybee & Scheibman 1999).

Some may object that the Chunk-and-Pass perspective's strict notion of incremental interpretation and production leaves the language system vulnerable to the rather substantial ambiguity that exists across many levels of linguistic representation (e.g., lexical, syntactic, pragmatic). So-called garden path sentences such as the famous "The horse raced past the barn fell" (Bever 1970) show that people are vulnerable to at least some local ambiguities: They invite comprehenders to take the wrong interpretive path by treating raced as the main verb, which leads them to a dead end. Only when the final word, fell, is encountered does it become clear that something is wrong: raced should be interpreted as a past participle that begins a reduced relative clause (i.e., the horse [that was] raced past the barn fell). The difficulty of recovery in such garden path sentences indicates how strongly the language system is geared toward incremental interpretation.

Viewed as a processing problem, garden paths occur when the language system resolves an ambiguity incorrectly. But in many cases, it is possible for an underspecified representation to be constructed online, and for the ambiguity to be resolved later when further linguistic input arrives. This type of case is consistent with Marr's (1976) proposal of the "principle of least commitment," that the perceptual system resolves ambiguous perceptual input only when it has sufficient data to make it unlikely that such decisions will subsequently have to be reversed. Given the ubiquity of local ambiguity in language, such underspecification may be used very widely in language processing. Note, however, that because of the severe constraints the Now-or-Never bottleneck imposes, the language system cannot adopt broad parallelism to further minimize the effect of ambiguity (as in many current probabilistic theories of parsing, e.g., Hale 2006; Jurafsky 1996; Levy 2008). Rather, within the Chunk-and-Pass account, the sole role for parallelism in the processing system is in deciding how the input should be chunked; only when conflicts concerning chunking are resolved can the input be passed on to a higher-level representation. In particular, we suggest that competing higher-level codes cannot be activated in parallel. This picture is analogous to Marr's principle of least commitment of vision: Although there might be temporary parallelism to resolve conflicts about, say, correspondence between dots in a random-dot stereogram, it is not possible to create two conflicting three-dimensional surfaces in parallel, and whereas there may be parallelism over the interpretation of lines and dots in an image, it is not possible to see something as both a duck and a rabbit simultaneously. More broadly, higher-level representations are constructed only when sufficient evidence has accrued that they are unlikely later to need to be replaced (for stimuli outside the psychological laboratory, at least).
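The contrast between broad parallelism and underspecification can be made concrete with a toy sketch of our own (it is not a parser, and the miniature lexicon and disambiguation rule are invented): instead of carrying two full analyses of "raced" in parallel, the fragment below stores a single chunk whose role is left open and commits only when local right-context arrives.

```python
# Toy illustration of underspecify-then-resolve for a reduced-relative ambiguity.
# The tiny lexicon and the disambiguation rule are invented for illustration only.

AMBIGUOUS_VERBS = {"raced"}            # main verb or past participle reading
DISAMBIGUATORS = {"fell", "stumbled"}  # a later finite verb settles the ambiguity

def interpret(words):
    analysis = []
    pending = None                     # index of an underspecified verb chunk, if any
    for word in words:
        if word in AMBIGUOUS_VERBS and pending is None:
            analysis.append([word, "verb: role unresolved"])   # least commitment
            pending = len(analysis) - 1
        elif pending is not None and word in DISAMBIGUATORS:
            # Local right-context: a second finite verb forces the reduced-relative reading.
            analysis[pending][1] = "past participle (reduced relative)"
            analysis.append([word, "main verb"])
            pending = None
        else:
            analysis.append([word, "other"])
    if pending is not None:            # no disambiguating input ever arrived
        analysis[pending][1] = "main verb (default)"
    return analysis

print(interpret("the horse raced past the barn fell".split()))
print(interpret("the horse raced past the barn".split()))
```

The point is only that a single underspecified chunk, resolved by nearby right-context, can stand in for maintaining competing higher-level analyses in parallel.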

Maintaining, and later resolving, an underspecified representation will create local memory and processing demands that may slow down processing, as is observed, for example, by increased reading times (e.g., Trueswell et al. 1994) and distinctive patterns of brain activity (as measured by ERPs; Swaab et al. 2003). Accordingly, when the input is ambiguous, the language system may require later input to recognize previous elements of the speech stream successfully. The Now-or-Never bottleneck requires that such online "right-context effects" be highly local because raw perceptual input will be lost if it is not rapidly identified (e.g., Dahan 2010). Right-context effects may arise where the language system can delay resolution of ambiguity or use underspecified representations that do not require resolving the ambiguity right away. Similarly, cataphora, in which, for example, a referential pronoun occurs before its referent (e.g., "He is a nice guy, that John"), require the creation of an underspecified entity (male, animate) when he is encountered, which is resolved to be coreferential with John only later in the sentence (e.g., van Gompel & Liversedge 2003). Overall, the Now-or-Never bottleneck implies that the processing system will build the most abstract and complete representation that is justified, given the linguistic input.

Of course, outside of experimental studies, background knowledge, visual context, and prior discourse will provide powerful cues to help resolve ambiguities in the signal, allowing the system rapidly to resolve many apparent ambiguities without incurring a substantial danger of "garden-pathing." Indeed, although syntactic and lexical ambiguities have been much studied in psycholinguistics, increasing evidence indicates that garden paths are not a major source of processing difficulty in practice (e.g., Ferreira 2008; Jaeger 2010; Wasow & Arnold 2003). For example, Roland et al. (2006) reported corpus analyses showing that, in naturally occurring language, there is generally sufficient information in the sentential context before the occurrence of an ambiguous verb to specify the correct interpretation of that verb. Moreover, eye-tracking studies have demonstrated that dialogue partners exploit both conversational context and task demands to constrain interpretations to the appropriate referents, thereby side-stepping the effects of phonological and referential competitors (Brown-Schmidt & Konopka 2011) that have otherwise been shown to impede language processing (e.g., Allopenna et al. 1998). These dialogue-based constraints also mitigate syntactic ambiguities that might otherwise disrupt processing (Brown-Schmidt & Tanenhaus 2008). This information may be further combined with other probabilistic sources of information such as prosody (e.g., Kraljic & Brennan 2005; Snedeker & Trueswell 2003) to resolve potential ambiguities within a minimal temporal window. Finally, it is not clear that undetected garden path errors are costly in normal language use, because if communication appears to break down, the listener can repair the communication by requesting clarification from the dialogue partner.

3.2. Implications of Strategy 2: Multiple levels of linguistic structure

The Now-or-Never bottleneck forces the language system to compress input into increasingly abstract chunks that cover progressively longer temporal intervals. As an example, consider the chunking of the input illustrated in Figure 2. The acoustic signal is first chunked into higher-level sound units at the phonological level. To avoid interference between local sound-based units, such as phonemes or syllables, these units are further recoded as rapidly as possible into higher-level units such as morphemes or words. The same phenomenon occurs at the next level up: Local groups of words must be chunked into larger units, possibly phrases or other forms of multiword sequences. Subsequent chunking then recodes these representations into higher-level discourse structures (that may themselves be chunked further into even more abstract representational structures beyond that). Similarly, production requires running the process in reverse, starting with the intended message and gradually decoding it into increasingly more specific chunks, eventually resulting in the motor programs necessary for producing the relevant speech or sign output. As we discuss in section 3.3, the production process may further serve as the basis for prediction during comprehension (allowing higher-level information to influence the processing of current input).

Figure 2. Chunk-and-Pass processing across a variety of linguistic levels in spoken language. As input is chunked and passed up to increasingly abstract levels of linguistic representations in comprehension, from acoustics to discourse, the temporal window over which information can be maintained increases, as indicated by the shaded portion of the bars associated with each linguistic level. This process is reversed in production planning, in which chunks are broken down into sequences of increasingly short and concrete units, from a discourse-level message to the motor commands for producing a specific articulatory output. More-abstract representations correspond to longer chunks of linguistic material, with greater look-ahead in production at higher levels of abstraction. Production processes may further serve as the basis for predictions to facilitate comprehension and thus provide top-down information in comprehension. (Note that the names and number of levels are for illustrative purposes only.)

More generally, our account is agnostic with respect to the specific characterization of the various levels of linguistic representation (e.g., whether sound-based chunks take the form of phonemes, syllables, etc.). What is central for the Chunk-and-Pass account is some form of sound-based level of chunking (or visual-based in the case of sign language), and a sequence of increasingly abstract levels of chunked representations into which the input is continually recoded.

A key theoretical implication of Chunk-and-Pass processing is that the multiple levels of linguistic representation, typically assumed in the language sciences, are a necessary by-product of the Now-or-Never bottleneck. Only by compressing the input into chunks and passing them to increasingly abstract levels of linguistic representation can the language system deal with the rapid onslaught of incoming information. Crucially, though, our perspective also suggests that the different levels of linguistic representations do not have a true part–whole relationship with one another. Unlike in the case of SF, who learned strategies to perfectly unpack chunks from within chunks to reproduce the original string of digits, language comprehension typically employs lossy compression to chunk the input. That is, higher-level chunks will not in general contain complete copies of lower-level chunks. Indeed, as speech input is encoded into ever more abstract chunks, increasing amounts of low-level information will typically be lost. Instead, as in perception (e.g., Haber 1983), there is greater representational underspecification with higher levels of representation because of the repeated process of lossy compression. Thus, we would expect a growing involvement of extralinguistic information, such as perceptual input and world knowledge, in processing higher levels of linguistic representation (see, e.g., Altmann & Kamide 2009).

Whereas our account proposes a lossy hierarchy across levels of linguistic representation, only a very small number of chunks are represented within a level: otherwise, information is rapidly lost due to interference. This has the crucial implication that chunks within a given level can interact only locally. For example, acoustic information must rapidly be coded in a non-acoustic form, say, in terms of phonemes; but this is only possible if phonemes correspond to local chunks of acoustic input. The processing bottleneck therefore enforces a strong pressure toward local dependencies within a given linguistic level. Importantly, though, this does not imply that linguistic relations are restricted only to adjacent elements but, instead, that they may be formed between any of the small number of elements maintained at a given level of representation. Such representational locality is exemplified across different linguistic levels by the local nature of phonological processes from reduction, assimilation, and fronting, including more elaborate phenomena such as vowel harmony (e.g., Nevins 2010), speech errors (e.g., Cutler 1982), the immediate proximity of inflectional morphemes and the verbs to which they apply, and the vast literature on the processing difficulties associated with non-local dependencies in sentence comprehension (e.g., Gibson 1998; Hawkins 2004). As noted earlier, the higher the level of linguistic representation, the longer the limited time window within which information can be chunked. Whereas dealing with just two center-embeddings at the sentential level is prohibitively difficult (e.g., de Vries et al. 2011; Karlsson 2007), we are able to deal with up to four to six embeddings at the multi-utterance discourse level (Levinson 2013). This is because chunking takes place at a much longer time course at the discourse level compared with the sentence level, providing more time to resolve the relevant dependency relations before they are subject to interference.

Finally, as indicated by Figure 2, processing within each level of linguistic representation takes place in parallel – but with a clear temporal component as chunks are passed between levels. Note that, in the Chunk-and-Pass framework, it is entirely possible that linguistic input can simultaneously, and perhaps redundantly, be chunked in more than one way. For example, syntactic chunks and intonational contours may be somewhat independent (Jackendoff 2007). Moreover, we should expect further chunking across different "channels" of communication, including visual input such as gesture and facial expressions.

The Chunk-and-Pass perspective is compatible with a number of recent theoretical models of sentence comprehension, including constraint-based approaches (e.g., MacDonald et al. 1994; Trueswell & Tanenhaus 1994) and certain generative accounts (e.g., Jackendoff's [2007] parallel architecture). Intriguingly, fMRI data from adults (Dehaene-Lambertz et al. 2006a) and infants (Dehaene-Lambertz et al. 2006b) indicate that activation responses to a single sentence systematically slow down when moving away from the primary auditory cortex, either back toward Wernicke's area or forward toward Broca's area, consistent with increasing temporal windows for chunking when moving from phonemes to words to phrases. Indeed, the cortical circuits processing auditory input, from lower (sensory) to higher (cognitive) areas, follow different temporal windows, sensitive to more and more abstract levels of linguistic information, from phonemes and words to sentences and discourse (Lerner et al. 2011; Stephens et al. 2013). Similarly, the reverse process, going from a discourse-level representation of the intended message to the production of speech (or sign) across parallel linguistic levels, is compatible with several current models of language production (e.g., Chang et al. 2006; Dell et al. 1997; Levelt 2001). Data from intracranial recordings during language production are consistent with different temporal windows for chunk decoding at the word, morphemic, and phonological levels, separated by just over a tenth of a second (Sahin et al. 2009). These results are compatible with our proposal that incremental processing in comprehension and production takes place in parallel across multiple levels of linguistic representation, each with a characteristic temporal window.

3.3. Implications of Strategy 3: Predictive language processing

We have already noted that, to be able to chunk incoming information as fast and as accurately as possible, the language system exploits multiple constraints in parallel across the different levels of linguistic representation. Such cues may be used not only to help disambiguate previous input, but also to generate expectations for what may come next, potentially further speeding up Chunk-and-Pass processing. Computational considerations indicate that simple statistical information gleaned from sentences provides powerful predictive constraints on language comprehension and can explain many human processing results (e.g., Christiansen & Chater 1999; Christiansen & MacDonald 2009; Elman 1990; Hale 2006; Jurafsky 1996; Levy 2008; Padó et al. 2009). Similarly, eye-tracking data suggest that comprehenders routinely use a variety of sources of probabilistic information – from phonological cues to syntactic context and real-world knowledge – to anticipate the processing of upcoming words (e.g., Altmann & Kamide 1999; Farmer et al. 2011; Staub & Clifton 2006). Results from event-related potential experiments indicate that rather specific predictions are made for upcoming input, including its lexical category (Hinojosa et al. 2005), grammatical gender (Van Berkum et al. 2005; Wicha et al. 2004), and even its onset phoneme (DeLong et al. 2005) and visual form (Dikker et al. 2010). Accordingly, there is a growing body of evidence for a substantial role of prediction in language processing (for reviews, see, e.g., Federmeier 2007; Hagoort 2009; Kamide 2008; Kutas et al. 2014; Pickering & Garrod 2007) and evidence that such language prediction occurs in children as young as 2 years of age (Mani & Huettig 2012). Importantly, as well as exploiting statistical relations within a representational level, predictive processing allows top-down information from higher levels of linguistic representation to rapidly constrain the processing of the input at lower levels.
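As a toy illustration of how even very simple sequential statistics can support prediction (our sketch, with a made-up three-sentence corpus; it is not one of the cited models), the following code estimates bigram probabilities and ranks candidate next words, the kind of information that could in principle pre-activate likely chunks before the input arrives.

```python
# Toy bigram predictor: simple distributional statistics support prediction of
# upcoming input. The miniature corpus and smoothing-free estimates are invented
# for illustration and are not a model from the cited literature.
from collections import defaultdict, Counter

corpus = [
    "the dog chased the cat",
    "the dog barked",
    "the cat chased the mouse",
]

bigrams = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for w1, w2 in zip(words, words[1:]):
        bigrams[w1][w2] += 1

def predict_next(word, k=2):
    """Rank likely continuations of `word` by conditional relative frequency."""
    counts = bigrams[word]
    total = sum(counts.values())
    return [(w, c / total) for w, c in counts.most_common(k)]

print(predict_next("the"))    # e.g., [('dog', 0.4), ('cat', 0.4)]
print(predict_next("dog"))    # [('chased', 0.5), ('barked', 0.5)]
```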
From the viewpoint of the Now-or-Never bottleneck, prediction provides an opportunity to begin Chunk-and-Pass processing as early as possible: to constrain representations of new linguistic material as it is encountered, and even incrementally to begin recoding predictable linguistic input before it arrives. This viewpoint is consistent with recent suggestions that the production system may be pressed into service to anticipate upcoming input (e.g., Pickering & Garrod 2007; 2013a). Chunk-and-Pass processing implies that there is practically no possibility for going back once a chunk is created because such backtracking tends to derail processing (e.g., as in the classic garden path phenomena mentioned above). This imposes a Right-First-Time pressure on the language system in the face of linguistic input that is highly locally ambiguous. The contribution of predictive modeling to comprehension is that it facilitates local ambiguity resolution while the stimulus is still available. Only by recruiting multiple cues and integrating these with predictive modeling is it possible to resolve local ambiguities quickly and correctly.

Right-First-Time parsing fits with proposals such as that by Marcus (1980), where local ambiguity resolution is delayed until later disambiguating information arrives, and models in which aspects of syntactic structure may be underspecified, therefore not requiring the ambiguity to be resolved (e.g., Gorrell 1995; Sturt & Crocker 1996). It also parallels Marr's (1976) principle of least commitment, as we mentioned earlier, according to which the perceptual system should, as far as possible, only resolve perceptual ambiguities when sufficiently confident that they will not need to be undone. Moreover, it is compatible with the fine-grained weakly parallel interactive model (Altmann & Steedman 1988) in which possible chunks are proposed, word-by-word, by an autonomous parser and one is rapidly chosen using top-down information.

To facilitate chunking across multiple levels of representation, prediction takes place in parallel across the different levels but at varying timescales. Predictions for higher-level chunks may run ahead of those for lower-level chunks. For example, most people simply answer "two" in response to the question "How many animals of each kind did Moses take on the Ark?" – failing to notice the semantic anomaly (i.e., it was Noah's Ark, not Moses' Ark) even in the absence of time pressure and when made aware that the sentence may be anomalous (Erickson & Mattson 1981). That is, anticipatory pragmatic and communicative considerations relating to the required response appear to trump lexical semantics. More generally, the time course of normal conversation may lead to an emphasis on more temporally extended higher-level predictions over lower-level ones. This may facilitate the rapid turn-taking that has been observed cross-culturally (Stivers et al. 2009) and which seems to require that listeners make quite specific predictions about when the speaker's current turn will finish (Magyari & De Ruiter 2012), as well as being able to quickly adapt their expectations to specific linguistic environments (Fine et al. 2013).

We view the anticipation of turn-taking as one instance of the broader alignment that takes place between dialogue partners across all levels of linguistic representation (for a review, see Pickering & Garrod 2004). This dovetails with fMRI analyses indicating that although there are some comprehension- and production-specific brain areas, spatiotemporal patterns of brain activity are in general closely coupled between speakers and listeners (e.g., Silbert et al. 2014). In particular, Stephens et al. (2010) observed close synchrony between neural activations in speakers and listeners in early auditory areas. Speaker activations preceded those of listeners in posterior brain regions (including parts of Wernicke's area), whereas listener activations preceded those of speakers in the striatum and anterior frontal areas. In the Chunk-and-Pass framework, the listener lag primarily derives from delays caused by the chunking process across the various levels of linguistic representation, whereas the speaker lag predominantly reflects the listener's anticipation of upcoming input, especially at the higher levels of representation (e.g., pragmatics and discourse). Strikingly, the extent of the listener's anticipatory brain responses was strongly correlated with successful comprehension, further underscoring the importance of prediction-based alignment for language processing. Indeed, analyses of real-time interactions show that alignment increases when the communicative task becomes more difficult (Louwerse et al. 2012). By decreasing the impact of potential ambiguities, alignment thus makes processing as well as production easier in the face of the Now-or-Never bottleneck.

We have suggested that only an incremental, predictive language system, continually building and passing on new chunks of linguistic material, encoded at increasingly abstract levels of representation, can deal with the onslaught of linguistic input in the face of the severe memory constraints of the Now-or-Never bottleneck. We suggest that a productive line of future work is to consider the extent to which existing models of language are compatible with these constraints, and to use these properties to guide the creation of new theories of language processing.

10 Christiansen & Chater: The Now-or-Never bottleneck: A fundamental constraint on language 1986 ). Whatever the appropriate computational frame- abstract levels of representation, can deal with the on- work, the Now-or-Never bottleneck requires that language slaught of linguistic input in the face of the severe acquisition be viewed as a type of skill learning, such as memory constraints of the Now-or-Never bottleneck. We learning to drive, juggle, play the violin, or play chess. suggest that a productive line of future work is to consider the practicing Such skills appear to be learned through the extent to which existing models of language are compat- skill, using online feedback during the practice itself, al- ible with these constraints, and to use these properties to though the consolidation of learning occurs subsequently guide the creation of new theories of language processing. 2004 ). The challenge of language ac- (Schmidt & Wrisberg quisition is to learn a dazzling sequence of rapid processing operations, rather than conjecturing a correct “ linguistic 4. Acquisition is learning to process theory. ” If speaking and understanding language involves Chunk- and-Pass processing, then acquiring a language requires 4.1. Implications of Strategy 1: Online learning learning how to create and integrate the right chunks rapidly, before current information is overwritten by new The Now-or-Never bottleneck implies that learning can input. Indeed, the ability to quickly process linguistic depend only on material currently being processed. As which has been proposed as an indicator of chunk- input – we have seen, this implication requires a processing strat- is a strong predictor of language – ) ing ability (Jones 2012 egy according to which modi cation to current representa- fi acquisition outcomes from infancy to middle childhood tions (in this context, learning) occurs right away; in 2008 (Marchman & Fernald ). The importance of this If learn- online. machine-learning terminology, learning is process is also introspectively evident to anyone acquiring ing does not occur at the time of processing, the represen- a second language: Initially, even segmenting the speech tation of linguistic material will be obliterated, and the stream into recognizable sounds can be challenging, opportunity for learning will be gone forever. To facilitate let alone parsing it into words or processing morphology such online learning, the child must learn to use all avail- and grammatical relations rapidly enough to build a seman- able information to help constrain processing. The integra- tic interpretation. The ability to acquire and rapidly deploy is a fundamental – cues or – tion of multiple constraints a hierarchy of chunks at different linguistic scales is parallel component of many current theories of language acquisi- to the ability to chunk sequences of motor movements, tion (see, e.g., contributions in Golinkoff et al. ; 2000 , built up by contin- skill numbers, or chess positions: It is a Morgan & Demuth ; Weissenborn & Höhle 1996 ; 2001 ual practice. ). For 2008 for a review, see Monaghan & Christiansen Viewing language acquisition as continuous with other ’ initial guesses about whether a example, second-graders types of skill learning is very different from the standard novel word refers to an object or an action are affected formulation of the problem of language acquisition in lin- by that word s phonological properties (Fitneva et al. ’ guistics. 
4.1. Implications of Strategy 1: Online learning

The Now-or-Never bottleneck implies that learning can depend only on material currently being processed. As we have seen, this implication requires a processing strategy according to which modification to current representations (in this context, learning) occurs right away; in machine-learning terminology, learning is online. If learning does not occur at the time of processing, the representation of linguistic material will be obliterated, and the opportunity for learning will be gone forever. To facilitate such online learning, the child must learn to use all available information to help constrain processing. The integration of multiple constraints – or cues – is a fundamental component of many current theories of language acquisition (see, e.g., contributions in Golinkoff et al. 2000; Morgan & Demuth 1996; Weissenborn & Höhle 2001; for a review, see Monaghan & Christiansen 2008). For example, second-graders' initial guesses about whether a novel word refers to an object or an action are affected by that word's phonological properties (Fitneva et al. 2009); 7-year-olds use visual context to constrain online sentence interpretation (Trueswell et al. 1999); and preschoolers' language production and comprehension is constrained by pragmatic factors (Nadig & Sedivy 2002). Thus, children learn rapidly to apply the multiple constraints used in incremental adult processing (Borovsky et al. 2012).

Nonetheless, online learning contrasts with traditional approaches in which the structure of the language is learned offline by the cognitive system acquiring a corpus of past linguistic inputs and choosing the grammar or other model of the language that best fits with those inputs. For example, in both mathematical and theoretical analysis (e.g., Gold 1967; Hsu et al. 2011; Pinker 1984) and in grammar-induction algorithms in machine learning and cognitive science, it is typically assumed that a corpus of language can be held in memory, and that the candidate grammar is successively adjusted to fit the corpus as well as possible (e.g., Manning & Schütze 1999; Pereira & Schabes 1992; Redington et al. 1998). However, this approach involves learning linguistic regularities (at, say, the morphological level) by storing and later surveying relevant linguistic input at a lower level of analysis (e.g., involving strings of phonemes), and then attempting to determine which higher-level regularities best fit the database of lower-level examples. There are a number of difficulties with this type of proposal – for example, that only a very rich lower-level representation (perhaps combined with annotations concerning relevant syntactic and semantic context) is likely to be a useful basis for later analysis.
But more fundamentally, the Now-or-Never bottleneck requires that information be retained only if it is recoded at processing time: Phonological information that is not chunked at the morphological level and beyond will be obliterated by oncoming phonological material.12
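To make the contrast between online and batch learning concrete, the sketch below is our own minimal illustration; the toy corpus, variable names, and counting scheme are all hypothetical and are not drawn from the models cited above. It compares a batch learner, which keeps a corpus in memory and compiles statistics over it afterwards, with an online learner, which updates its counts as each utterance is processed and then discards the input.

```python
from collections import Counter

corpus = ["the dog chased the cat", "the cat saw the dog", "the dog barked"]

# Batch learning: the whole corpus is held in memory and surveyed afterwards.
stored_corpus = list(corpus)                     # assumes unlimited, reviewable memory
batch_bigrams = Counter()
for utterance in stored_corpus:
    words = utterance.split()
    batch_bigrams.update(zip(words, words[1:]))  # statistics compiled over stored input

# Online learning: each utterance updates the model immediately and is then lost.
online_bigrams = Counter()
for utterance in corpus:                         # input is available only fleetingly
    words = utterance.split()
    online_bigrams.update(zip(words, words[1:])) # modify the model right away
    del words, utterance                         # nothing is retained for later review

print(batch_bigrams == online_bigrams)           # True for simple count statistics
```

For simple co-occurrence counts the two routes converge on the same statistics; the difference is that only the online learner could, even in principle, operate under the Now-or-Never bottleneck, because it never needs to consult earlier input again.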

So, if learning is shaped by the Now-or-Never bottleneck, then linguistic input must, when it is encountered, be recoded successively at increasingly abstract linguistic levels if it is to be retained at all – a constraint imposed, we argue, by basic principles of memory. Crucially, such information is not, therefore, in a suitably "neutral" format to allow for the discovery of previously unsuspected linguistic regularities. In a nutshell, the lossy compression of the linguistic input is achieved by applying the learner's current model of the language. But information that would point toward a better model of the language (if examined in retrospect) will typically be lost (or, at best, badly obscured) by this compression, precisely because those regularities are not captured by the current model of the language.

Suppose, for example, that we create a lossy encoding of a language using a simple, context-free phrase structure grammar that cannot handle, say, noun-verb agreement. The lossy encoding of the linguistic input produced using this grammar will provide a poor basis for learning a more sophisticated grammar that includes agreement – precisely because agreement information will have been thrown away. So the Now-or-Never bottleneck rules out the possibility that the learner can survey a neutral database of linguistic material, to optimize its model of the language.
The emphasis on online learning does not, of course, rule out the possibility that any linguistic material that is remembered may subsequently be used to inform learning. But according to the present viewpoint, any further learning requires reprocessing that material. So if a child comes to learn a poem, song, or story verbatim, the child might extract more structure from that material by mental rehearsal (or, indeed, by saying it aloud). The online learning constraint is that material is learned only when it is being processed – ruling out any putative learning processes that involve carrying out linguistic analyses or compiling statistics over a stored corpus of linguistic material.

If this general picture of acquisition as learning-to-process is correct, then we should expect the exploitation of memory to require "replaying" learned material, so that it can be re-processed. Thus, the application of memory itself requires passing through the Now-or-Never bottleneck – there is no way of directly interrogating an internal database of past experience; indeed, this viewpoint fits with our subjective sense that we need to "bring to mind" past experiences or rehearse verbal material to process it further. Interestingly, there is now also substantial neuroscientific evidence that replay does occur (e.g., in rat spatial learning, Carr et al. 2011). Moreover, it has long been suggested that dreaming may have a related function (here using "fictional" input to "reverse learn," eliminating spurious relationships identified by the brain, Crick & Mitchison 1983; see Hinton & Sejnowski 1986, for a closely related computational model). Deficits in the ability to replay material would, in this view, lead to consequent deficits in memory and inference; consistent with this viewpoint, Martin and colleagues have argued that rehearsal deficits for phonological pattern and semantic information may lead to difficulties in the long-term acquisition and retention of word forms and word meanings, respectively, and their use in language processing (e.g., Martin & He 2004; Martin et al. 1994). In summary, then, language acquisition involves learning to process, and generalizations can only be made over past processing episodes.

4.2. Implications of Strategy 2: Local learning

Online learning faces a particularly acute version of a general learning problem: the stability-plasticity dilemma (e.g., Mermillod et al. 2013). How can new information be acquired without interfering with prior information? The problem is especially challenging because reviewing prior information is typically difficult (because recalling earlier information interferes with new input) or impossible (where prior input has been forgotten). Thus, to a good approximation, the learner can only update its model of the language in a way that responds to current linguistic input, without being able to review whether any updates are inconsistent with prior input. Specifically, if the learner has a global model of the entire language (e.g., a traditional grammar), the learner runs the risk of overfitting that model to capture regularities in the momentary linguistic input at the expense of damaging the match with past linguistic input.

Avoiding this problem, we suggest, requires that learning be highly local, consisting of learning about specific relationships between particular linguistic representations. New items can be acquired, with implications for later processing of similar items; but learning current items does not thereby create changes to the entire model of the language, thus potentially interfering with what was learned from past input. One way to learn in a local fashion is to store individual examples (this requires, in our framework, that those examples have been abstractly recoded by successive Chunk-and-Pass operations, of course), and then to generalize, piecemeal, from these examples. This standpoint is consistent with the idea that the "priority of the specific," as observed in other areas of cognition (e.g., Jacoby et al. 1989), also applies to language acquisition. For example, children seem to be highly sensitive to multiword chunks (Arnon & Clark 2011; Bannard & Matthews 2008; see Arnon & Christiansen, submitted, for a review13). More generally, learning based on past traces of processing will typically be sensitive to details of that processing, as is observed across phonetics, phonology, lexical access, syntax, and semantics (e.g., Bybee 2006; Goldinger 1998; Pierrehumbert 2002; Tomasello 1992).
That learning is local provides a powerful constraint, incompatible with typical computational models of how the child might infer the grammar of the language – because these models typically do not operate incrementally but range across the input corpus, evaluating alternative grammatical hypotheses (so-called batch learning). But, given the Now-or-Never bottleneck, the "unprocessed" corpus, so readily available to the linguistic theorist, or to a computer model, is lost to the human learner almost as soon as it is encountered. Where such information has been memorized (as in the case of SF's encoding of streams of digits), recall and processing is slow and effortful. Moreover, because information is encoded in terms of the current encoding, it becomes difficult to neutrally review that input to create a better encoding, and cross-check past data to test wide-ranging grammatical hypotheses.14 So, as we have already noted, the Now-or-Never bottleneck seems incompatible with the view of a child as a mini-linguist.
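The following sketch, again purely illustrative and with invented utterances and parameter choices, shows one way local learning of this kind might look: the learner stores recoded chunk exemplars with usage counts and generalizes piecemeal, by promoting frequently co-occurring chunk pairs to larger candidate chunks, without ever revising a global grammar.

```python
from collections import Counter

# Inventory of recoded chunk exemplars (not raw input), each with a usage count.
chunk_counts = Counter()

def process_utterance(words, inventory, max_len=3):
    """Greedily recode an utterance into known chunks (longest first);
    unknown material becomes new single-word chunks. Learning is local:
    only the chunks actually used in this utterance are updated."""
    i, used = 0, []
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = tuple(words[i:i + n])
            if n == 1 or inventory[candidate] > 0:
                used.append(candidate)
                i += n
                break
    for chunk in used:
        inventory[chunk] += 1                    # local update; no global model is revised
    for a, b in zip(used, used[1:]):
        inventory[a + b] += 1                    # piecemeal generalization to a larger chunk
    return used

for sentence in ["i want milk", "i want candy", "i want juice"]:
    process_utterance(sentence.split(), chunk_counts)

print(chunk_counts[("i", "want")])               # the frequent chunk "i want" becomes entrenched
```

Only the chunks actually used in the current utterance are touched, so learning a new item cannot disturb what was learned from earlier, unrelated items.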

12 Christiansen & Chater: The Now-or-Never bottleneck: A fundamental constraint on language which embodies these principles is the simple recurrent By contrast, the principle of local learning is respected by 1999 network (Altmann 2002 ; Christiansen & Chater ; other approaches. For example, item-based (Tomasello Elman 1990 ), which learns to map from the current input ), connectionist (e.g., Chang et al. 1999; Elman 2003 15 on to the next element in a continuous sequence of linguis- 2002 ; MacDonald & Christiansen ), 1990 exemplar- tic (or other) input; and which learns, online, by adjusting 2009 ), and other usage-based (e.g., based (e.g., Bod of the network) to reduce ” weights “ its parameters (the Arnon & Snider ) accounts of language 2006 ; Bybee 2010 the observed prediction error, using the back-propagation acquisition tie learning and processing together – and learning algorithm. Using a very different framework, in assume that language is acquired piecemeal, in the 2001 ; Gold- the spirit of construction grammar (e.g., Croft Bauplan absence of an underlying . Such accounts, based 2006 berg ) recently de- 2011 ), McCauley and Christiansen ( on local learning, provide a possible explanation of the fre- veloped a psychologically based, online chunking model of quency effects that are found at all levels of language pro- incremental language acquisition and processing , incorpo- 2007 ; Bybee & Hopper cessing and acquisition (e.g., Bybee rating prediction to generalize to new chunk combinations. 2001 ; Ellis 2002 ; Tomasello 2003 ), analogous to exemplar- Exemplar-based analogical models of language acquisition based theories of how performance speeds up with practice and processing may also be constructed, which build and ). (Logan 1988 predict language structure online, by incrementally creat- The local nature of learning need not, though, imply that ing a database of possible structures, and dynamically language has no integrated structure. Just as in perception using online computation of similarity to recruit these and action, local chunks can be de fi ned at many different structures to process and predict new linguistic input. levels of abstraction, including highly abstract patterns, Importantly, prediction allows for top-down information for example, governing subject, verb, and object; and gen- uence current processing across different levels of fl to in eralizations from past processing to present processing will linguistic representation, from phonology to discourse, operate across all of these levels. Therefore, in generating and at different temporal windows (as indicated by or understanding a new sentence, the language user will Fig. 2 ). We see the ability to use such top-down informa- fl be in uenced by the interaction of multiple constraints tion as emerging gradually across development, building from innumerable traces of past processing, across differ- on bottom-up information. That is, children gradually ent linguistic levels. This view of language processing in- learn to apply top-down knowledge to facilitate processing volving the parallel interaction of multiple local via prediction, as higher-level information becomes more uential approach- fl constraints is embodied in a variety of in entrenched and allows for anticipatory generalizations to 1997 es to language (e.g., Jackendoff ). 2007 ; Seidenberg be made. In this section, we have argued that the child should not 4.3. 
4.3. Implications of Strategy 3: Learning to predict

If language processing involves prediction – to make the encoding of new linguistic material sufficiently rapid – then a critical aspect of language acquisition is learning to make such predictions successfully (Altmann & Mirkovic 2009). Perhaps the most natural approach to predictive learning is to compare predictions with subsequent reality, thus creating an "error signal," and then to modify the predictive model to systematically reduce this error. Throughout many areas of cognition, such error-driven learning has been widely explored in a range of computational frameworks (e.g., from connectionist networks, to reinforcement learning, to support vector machines) and has considerable behavioral (e.g., Kamin 1969) and neurobiological support (e.g., Schultz et al. 1997).

Predictive learning can, in principle, take a number of forms: For example, predictive errors can be accumulated over many samples, and then modifications made to the predictive model to minimize the overall error over those samples (i.e., batch learning). But this is ruled out by the Now-or-Never bottleneck: Linguistic input, and the predictions concerning it, is present only fleetingly. But error-driven learning can also be "online" – each prediction error leads to an immediate, though typically small, modification of the predictive model; and the accumulation of these small modifications gradually reduces prediction errors on future input.

A number of computational models adhere to these principles: Learning involves creating a predictive model of the language, using online error-driven learning. Such models, limited though they are, may provide a starting point for creating an increasingly realistic account of language acquisition and processing. For example, a connectionist model which embodies these principles is the simple recurrent network (Altmann 2002; Christiansen & Chater 1999; Elman 1990), which learns to map from the current input on to the next element in a continuous sequence of linguistic (or other) input; and which learns, online, by adjusting its parameters (the "weights" of the network) to reduce the observed prediction error, using the back-propagation learning algorithm. Using a very different framework, in the spirit of construction grammar (e.g., Croft 2001; Goldberg 2006), McCauley and Christiansen (2011) recently developed a psychologically based, online chunking model of incremental language acquisition and processing, incorporating prediction to generalize to new chunk combinations. Exemplar-based analogical models of language acquisition and processing may also be constructed, which build and predict language structure online, by incrementally creating a database of possible structures, and dynamically using online computation of similarity to recruit these structures to process and predict new linguistic input.
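As a concrete, much-simplified stand-in for such models, the sketch below implements a tiny Elman-style recurrent predictor with online, error-driven weight updates (one-step truncated back-propagation after every element). The toy word stream, network size, and learning rate are arbitrary assumptions made only for illustration; real simple recurrent network simulations (e.g., Elman 1990) and the chunk-based model of McCauley and Christiansen differ in many details.

```python
import numpy as np

rng = np.random.default_rng(0)
stream = ("the dog chased the cat . the cat saw the dog . "
          "the dog barked . the cat slept . " * 50).split()
vocab = sorted(set(stream))
V, H, lr = len(vocab), 12, 0.1
idx = {w: i for i, w in enumerate(vocab)}

# Simple recurrent (Elman-style) next-word predictor with online updates.
W_xh = rng.normal(0, 0.1, (H, V))   # input -> hidden
W_hh = rng.normal(0, 0.1, (H, H))   # hidden -> hidden (context)
W_hy = rng.normal(0, 0.1, (V, H))   # hidden -> output
b_h, b_y = np.zeros(H), np.zeros(V)

h = np.zeros(H)
losses = []
for current, nxt in zip(stream, stream[1:]):
    x = np.zeros(V); x[idx[current]] = 1.0
    h_prev = h
    h = np.tanh(W_xh @ x + W_hh @ h_prev + b_h)      # encode the current element
    p = np.exp(W_hy @ h + b_y); p /= p.sum()         # predict the next element
    losses.append(-np.log(p[idx[nxt]]))              # prediction error (cross-entropy)

    # Online, error-driven update right away (1-step truncated back-propagation).
    d_o = p.copy(); d_o[idx[nxt]] -= 1.0
    d_h = (W_hy.T @ d_o) * (1.0 - h ** 2)
    W_hy -= lr * np.outer(d_o, h);  b_y -= lr * d_o
    W_xh -= lr * np.outer(d_h, x);  b_h -= lr * d_h
    W_hh -= lr * np.outer(d_h, h_prev)

print(round(float(np.mean(losses[:200])), 2), "->",
      round(float(np.mean(losses[-200:])), 2))        # error shrinks with exposure
```

The point of the sketch is only that each prediction error triggers an immediate small adjustment, and that the accumulation of such adjustments reduces prediction error over continued exposure, with no corpus ever being stored.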
Importantly, prediction allows for top-down information to influence current processing across different levels of linguistic representation, from phonology to discourse, and at different temporal windows (as indicated by Fig. 2). We see the ability to use such top-down information as emerging gradually across development, building on bottom-up information. That is, children gradually learn to apply top-down knowledge to facilitate processing via prediction, as higher-level information becomes more entrenched and allows for anticipatory generalizations to be made.

In this section, we have argued that the child should not be viewed as a mini-linguist, attempting to infer the abstract structure of grammar, but as learning to process: that is, learning to alleviate the severe constraints imposed by the Now-or-Never bottleneck. Next, we discuss how chunk-based language acquisition and processing have shaped linguistic change and, ultimately, the evolution of language.

5. Language change is item-based

Like language, human culture constantly changes. We continually tinker with all aspects of culture, from social conventions and rituals to technology and everyday artifacts (see contributions in Richerson & Christiansen 2013). Perhaps language, too, is a result of cultural evolution – a product of piecemeal tinkering – with the long-term evolution of language resulting from the compounding of myriad local short-term processes of language change. This hypothesis figures prominently in many recent theories of language evolution (e.g., Arbib 2005; Beckner et al. 2009; Christiansen & Chater 2008; Hurford 1999; Smith & Kirby 2003; Tomasello 2008; for a review of these theories, see Dediu et al. 2013). Language is construed as a complex evolving system in its own right; linguistic forms that are easier to use and learn, or are more communicatively efficient, will tend to proliferate, whereas those that are not will be prone to die out. Over time, processes of cultural evolution involving repeated cycles of learning and use are hypothesized to have shaped the languages we observe today.

If aspects of language survive only when they are easy to produce and understand, then moment-by-moment processing will shape not only the structure of language (see also Hawkins 2004; O'Grady 2005), but also the learning problem that the child faces.

Thus, from the perspective of language as an evolving system, language processing at the timescale of seconds has implications for the longer timescales of language acquisition and evolution. Figure 3 illustrates how the effects of the Now-or-Never bottleneck flow from the timescale of processing to those of acquisition and evolution.

Chunk-and-Pass processing carves the input (or output) into chunks at different levels of linguistic representation at the timescale of the utterance (seconds). These chunks constitute the comprehension and production events from which children and adults learn and update their ability to process their native language over the timescale of the individual (tens of years). Each learner, in turn, is part of a population of language users that shape the cultural evolution of language across a historical timescale (hundreds or thousands of years): Language will be shaped by the linguistic patterns learners find easiest to acquire and process. And the learners will, of course, be strongly constrained by the basic cognitive limitation that is the Now-or-Never bottleneck – and, hence, through cultural evolution, linguistic patterns which can be processed through that bottleneck will be strongly selected. Moreover, if acquiring a language is learning to process and processing involves incremental Chunk-and-Pass operations, then language change will operate through changes driven by Chunk-and-Pass processing, both within and between individuals. But this, in turn, implies that processes of language change should be item-based, driven by processing/acquisition mechanisms defined over Chunk-and-Pass representations (rather than, for example, being defined over abstract linguistic parameters, with diverse structural consequences across the entire language).

We noted earlier that a consequence of Chunk-and-Pass processing for production is a tendency toward reduction, especially of more frequently used forms, and this constitutes one of several pressures on language change (see also MacDonald 2013). Because reduction minimizes articulatory processing effort for the speaker but may increase processing effort for the hearer and learner, this pressure can in extreme cases lead to a communicative collapse. This is exemplified by a lab-based analogue of the game of "telephone," in which participants were exposed to a miniature artificial language consisting of simple form-meaning mappings (Kirby et al. 2008). The initial language contained random mappings between syllable strings and pictures of moving geometric figures in different colors. After exposure, participants were asked to produce linguistic forms corresponding to specific pictures. Importantly, the participants saw only a subset of the language but nonetheless had to generalize to the full language. The productions of the initial learner were then used as the input language for the next learner, and so on for a total of 10 "generations." In the absence of other communicative pressures (such as the avoidance of ambiguity; Grice 1967), the language collapsed into just a few different forms that allowed for systematic, albeit semantically underspecified, generalization to unseen items. In natural language, however, the pressure toward reduction is normally kept in balance by the need to maintain effective communication.
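The transmission-chain logic behind such experiments can be conveyed with a deliberately crude simulation, given below; the meaning space, the learning rule, and the subset size are invented for illustration and do not reproduce Kirby et al.'s design. Each simulated learner sees only some form-meaning pairs, copies those, and reuses seen forms for unseen meanings; with no pressure against ambiguity, the number of distinct forms collapses over generations.

```python
import random

random.seed(1)
shapes, colors = ["circle", "square", "triangle"], ["red", "blue", "green", "black"]
meanings = [(s, c) for s in shapes for c in colors]          # 12 meanings

def random_form():
    return "".join(random.choice("aeioukst") for _ in range(4))

language = {m: random_form() for m in meanings}              # generation 0: random, unsystematic

def learn(teacher_language, n_seen=6):
    """One 'generation': learn from a subset, then generalize to every meaning
    by reusing the form of a seen meaning that shares a feature (no pressure
    against ambiguity is modelled here)."""
    seen = dict(random.sample(list(teacher_language.items()), n_seen))
    learner = {}
    for meaning in meanings:
        if meaning in seen:
            learner[meaning] = seen[meaning]
        else:
            similar = [f for m, f in seen.items()
                       if m[0] == meaning[0] or m[1] == meaning[1]]
            learner[meaning] = random.choice(similar or list(seen.values()))
    return learner

for generation in range(8):
    print(generation, "distinct forms:", len(set(language.values())))
    language = learn(language)
```

The article notes that when ambiguity is also penalized, structured languages emerge across generations instead; the sketch omits that pressure on purpose, so the form inventory simply collapses.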
Expanding on the notion of reduction and erosion, we suggest that constraints from Chunk-and-Pass processing can provide a cognitive foundation for grammaticalization (Hopper & Traugott 1993). Specifically, chunks at different levels of linguistic structure – discourse, syntax, morphology, and phonology – are potentially subject to reduction. Consequently, we can distinguish between different types of grammaticalization, from discourse syntacticization and semantic bleaching to morphological reduction and phonetic erosion.

Figure 3. Illustration of how Chunk-and-Pass processing at the utterance level (with the Ci referring to chunks) constrains the acquisition of language by the individual, which, in turn, influences how language evolves through learning and use by groups of individuals on a historical timescale.

14 Christiansen & Chater: The Now-or-Never bottleneck: A fundamental constraint on language et al. ) game-of-telephone experiments, showing that 2008 s( ’ semantic bleaching to morphological reduction and pho- when ambiguity is avoided, a highly structured linguistic netic erosion. Repeated chunking of loose discourse struc- system emerges across generations of learners, with tures may result in their reduction into more rigid syntactic morpheme-like substrings indicating different semantic s( 1979 ) hypothesis that ecting Givón constructions, re ’ fl 16 properties (color, shape, and movement). Another similar, ’ today s discourse. ’ s syntax is yesterday For example, the lab-based cultural evolution experiment showed that this resultative construction He pulled the window open might process of regularization does not result in the elimination derive from syntacticization of a loose discourse sequence of variability but, rather, in increased predictability through He pulled the window and it opened (Tomasello such as ). Whereas 2010 lexicalized patterns (Smith & Wonnacott 2003 ). As a further by-product of chunking, some words the initial language contained unpredictable pairings of that occur frequently in certain kinds of construction may nouns with plural markers, each noun became chunked bleached ” of meaning and ultimately gradually become “ with a speci c marker in the fi nal languages. fi signal only general syntactic properties. Consider, as an These examples illustrate how Chunk-and-Pass process- be going to example, the construction , which was originally obligatori ing over time may lead to so-called , cation fi ’ m I used exclusively to indicate movement in space (e.g., exible or optional fl whereby a pattern that was initially ) but which is now also used as an intention going to Ithaca becomes obligatory (e.g., Heine & Kuteva ). This 2007 m going or future marker when followed by a verb (as in I ’ process is one of the ways in which new chunks may be ). Additionally, a chunked 1994 ; Bybee et al. to eat at seven created. So, although chunks at each linguistic level can linguistic expression may further be subject to morpholog- lose information through grammaticalization, and although ical reduction, resulting in further loss of morphological (or they cannot regain it, a countervailing process exists by syntactic) elements. For instance, the demonstrative in that ” “ which complex chunks are constructed by gluing together English (e.g., that window ) lost the grammatical category 17 existing chunks. that of number ( s( ’ That is, in Bybee those vs. ) when it came to be used items “ ) phrase, 2002 plur sing ” For example, auxilia- the window/windows that is/ as a complementizer, as in that are used together fuse together. ). Finally, as noted 1993 (Hopper & Traugott are dirty ry verbs (e.g., to have, to go) can become fused with main verbs to create new morphological patterns, as in many earlier, frequently chunked elements are likely to become phonologically reduced, leading to the emergence of new Romance languages, in which the future tense is signaled fi x to the in fi by an auxiliary tacked on as a suf shortened grammaticalized forms, such as the phonetic nitive. In , , ). Thus, 1994 (Bybee et al. 
Beyond grammaticalization, we suggest that language change, more broadly, will be local at the level of individual chunks. At the level of sound change, our perspective is consistent with lexical diffusion theory (e.g., Wang 1969; 1977; Wang & Cheng 1977), suggesting that sound change originates with a small set of words and then gradually spreads to other words with a similar phonological make-up. The extent and speed of such sound change is affected by a number of factors, including frequency, word class, and phonological environment (e.g., Bybee 2002; Phillips 2006). Similarly, morpho-syntactic change is also predicted to be local in nature: what we might call "constructional diffusion." Accordingly, we interpret the cross-linguistic evidence indicating the effects of processing constraints on grammatical structure (e.g., Hawkins 2004; Kempson et al. 2001; O'Grady 2005; see Jaeger & Tily 2011, for a review) as a process of gradual change over individual constructions, instead of wholesale changes to grammatical rules. Note, though, that because chunks are not independent of one another but form a system within a given level of linguistic representation, a change to a highly productive chunk may have cascading effects on other chunks at that level (and similarly for representations at other levels of abstraction). For example, if a frequently used construction changes, then constructional diffusion could in principle lead to rapid, and far-reaching, change throughout the language.

On this account, another ubiquitous process of language change, regularization, whereby representations at a particular linguistic level become more patterned, should also be a piecemeal process. This is exemplified by another of Kirby et al.'s (2008) game-of-telephone experiments, showing that when ambiguity is avoided, a highly structured linguistic system emerges across generations of learners, with morpheme-like substrings indicating different semantic properties (color, shape, and movement). Another similar, lab-based cultural evolution experiment showed that this process of regularization does not result in the elimination of variability but, rather, in increased predictability through lexicalized patterns (Smith & Wonnacott 2010). Whereas the initial language contained unpredictable pairings of nouns with plural markers, each noun became chunked with a specific marker in the final languages.

These examples illustrate how Chunk-and-Pass processing over time may lead to so-called obligatorification, whereby a pattern that was initially flexible or optional becomes obligatory (e.g., Heine & Kuteva 2007). This process is one of the ways in which new chunks may be created. So, although chunks at each linguistic level can lose information through grammaticalization, and although they cannot regain it, a countervailing process exists by which complex chunks are constructed by gluing together existing chunks.17 That is, in Bybee's (2002) phrase, items "that are used together fuse together." For example, auxiliary verbs (e.g., to have, to go) can become fused with main verbs to create new morphological patterns, as in many Romance languages, in which the future tense is signaled by an auxiliary tacked on as a suffix to the infinitive. In Spanish, the future tense endings -é, -ás, -á, -emos, -éis, -án derive from the present tense of the auxiliary haber, namely, he, has, ha, hemos, habéis, han; and in French, the corresponding endings -ai, -as, -a, -ons, -ez, -ont derive from the present tense of the auxiliary avoir, namely, ai, as, a, avons, avez, ont (Fleischman 1982). Such complex new chunks are then subject to erosion (e.g., as is implicit in the example above, the Spanish for you [informal, plural] will eat is comeréis, rather than *comerhabéis; the first syllable of the auxiliary has been stripped away).
Importantly, the present viewpoint is neutral regarding the extent to which children are the primary source of innovation (e.g., Bickerton 1984) or regularization (e.g., Hudson et al. 2005) of linguistic material, although constraints from child language acquisition likely play some role (e.g., in the emergence of regular subject-object-verb word order in the Al-Sayyid Bedouin Sign Language; Sandler et al. 2005). In general, we would expect that multiple forces influence language change in parallel (for reviews, see Dediu et al. 2013; Hruschka et al. 2009), including sociolinguistic factors (e.g., Trudgill 2011), language contact (e.g., Mufwene 2008), and use of language as an ethnic marker (e.g., Boyd & Richerson 1987).

Because language change, like processing and acquisition, is driven by multiple competing factors, which are amplified by cultural evolution, linguistic diversity will be the norm. Accordingly, we would expect few, if any, true language "universals" to exist in the sense of constraints that can be explained only in purely linguistic terms (Christiansen & Chater 2008). Nonetheless, domain-general processing constraints are likely to significantly constrain the set of possible languages (see, e.g., Cann & Kempson 2008). This picture is consistent with linguistic arguments suggesting that there may be no strict language universals (Bybee 2009; Evans & Levinson 2009). For example, computational phylogenetic analyses indicate that word order correlations are lineage-specific (Dunn et al. 2011), shaped by particular histories of cultural evolution rather than following universal patterns as would be expected if they were the result of innate linguistic constraints (e.g., Baker 2001) or language-specific performance limitations (e.g., Hawkins 2009).

15 Christiansen & Chater: The Now-or-Never bottleneck: A fundamental constraint on language Similarly, agglutinating languages, such as Turkish, of cultural evolution rather than following universal patterns chunk complex multimorphemic words using local group- as would be expected if they were the result of innate linguistic constraints (e.g., Baker c 2001 fi ing mechanisms that include formulaic morpheme expres- ) or language-speci ). Likewise, at higher levels of linguistic 2013 sions (Durrant ). Thus, the performance limitations (e.g., Hawkins 2009 process of piecemeal tinkering that drives item-based representation, verbs normally have only two or three argu- language change is subject to constraints deriving not only ments at most. Across linguistic theories of different per- from Chunk-and-Pass processing and multiple-cue integra- suasions, syntactic phrases typically consist of only a few constituents. Thus, the Now-or-Never bottleneck provides c trajectory of cultural evolu- fi tion but also from the speci tion that a language follows. More generally, in this a strong bias toward bounded linguistic units across various levels of linguistic representations. perspective, there is no sharp distinction between language evolution and language change: Language evolution is just the result of language change over a long timescale (see 6.1.2. The local nature of linguistic dependencies . Just as also Heine & Kuteva ), obviating the need for separate 2007 we have argued that Chunk-and-Pass processing leads to theories of language evolution and change (e.g., Berwick simple linguistic units with only a small number of compo- 18 1994 ). et al. ; Pinker 2002 ; Hauser et al. 2013 nents, so it produces a powerful tendency toward local de- pendencies. Dependencies between linguistic elements will primarily be adjacent or separated by only a few other elements. For example, at the phonological level, 6. Structure as processing fl processes are highly local, as re ected by data on coarticu- The Now-or-Never bottleneck implies, we have argued, lation, assimilation, and phonotactic constraints (e.g., Clark that language comprehension involves incrementally et al. 2007 ). Similarly, we expect word formation processes chunking linguistic material and immediately passing the to be highly local in nature, which is in line with a variety of result for further processing, and production involves a different linguistic perspectives on the prominence of adja- similar cascade of Just-in-Time processing operations in cency in morphological composition (e.g., Carstairs-Mc- the opposite direction. And language will be shaped ). Strikingly, ; Siegel 1978 Carthy 1992 ; Hay 2000 through cultural evolution to be easy to learn and process adjacency even appears to be a key characteristic of multi- by generations of speakers/hearers, who are forced to morphemic formulaic units in an agglutinating language chunk and pass the oncoming stream of linguistic material. ). 2013 such as Turkish (Durrant What are the resulting implications for the structure of lan- At the syntactic level, there is also a strong bias toward guage and its mental representation? In this section, we local dependencies. For example, when processing the sen- rst show that certain key properties of language follow nat- fi tence comprehenders ex- ” ... The key to the cabinets was “ urally from this framework; we then reconceptualize perience local interference from the plural cabinets , certain important notions in the language sciences. 
6.1.2. The local nature of linguistic dependencies. Just as we have argued that Chunk-and-Pass processing leads to simple linguistic units with only a small number of components, so it produces a powerful tendency toward local dependencies. Dependencies between linguistic elements will primarily be adjacent or separated by only a few other elements. For example, at the phonological level, processes are highly local, as reflected by data on coarticulation, assimilation, and phonotactic constraints (e.g., Clark et al. 2007). Similarly, we expect word formation processes to be highly local in nature, which is in line with a variety of different linguistic perspectives on the prominence of adjacency in morphological composition (e.g., Carstairs-McCarthy 1992; Hay 2000; Siegel 1978). Strikingly, adjacency even appears to be a key characteristic of multimorphemic formulaic units in an agglutinating language such as Turkish (Durrant 2013).

At the syntactic level, there is also a strong bias toward local dependencies. For example, when processing the sentence "The key to the cabinets was ...," comprehenders experience local interference from the plural cabinets, although the verb was needs to agree with the singular key (Nicol et al. 1997; Pearlmutter et al. 1999). Indeed, individuals who are good at picking up adjacent dependencies among sequence elements in a serial-reaction time task also experience greater local interference effects in sentence processing (Misyak & Christiansen 2010). Moreover, similar local interference effects have been observed in production when people are asked to continue the above sentence after cabinets (Bock & Miller 1991).

More generally, analyses of Romanian and Czech (Ferrer-i-Cancho 2004) as well as Catalan, Basque, and Spanish (Ferrer-i-Cancho & Liu 2014) point to a pressure toward minimization of the distance between syntactically related words. This tendency toward local dependencies seems to be particularly strongly expressed in strict-word-order languages such as English, but somewhat less so for more flexible languages such as German (Gildea & Temperley 2010). However, the use of case marking in German may provide a cue to overcome this by indicating who does what to whom, as suggested by simulations of the learnability of different word orders with or without case markings (e.g., Lupyan & Christiansen 2002; Van Everbroeck 1999). This highlights the importance not only of distributional information (e.g., regarding word order) but also of other types of cues (e.g., involving phonological, semantic, or pragmatic information), as discussed previously.

We want to stress, however, that we are not denying the existence of long-distance syntactic dependencies; rather, we are suggesting that our ability to process such dependencies will be bounded by the number of chunks that can be kept in memory at a given level of linguistic representation.
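The notion of dependency-length minimization can be made concrete with a toy calculation; the sentence and its head-dependent pairs below are invented solely for illustration and are not a linguistic analysis taken from the cited studies.

```python
def total_dependency_length(words, dependencies):
    """Sum of the linear distances between syntactically related words."""
    position = {word: i for i, word in enumerate(words)}
    return sum(abs(position[head] - position[dep]) for head, dep in dependencies)

# Hypothetical head-dependent pairs for a toy sentence.
deps = [("dog", "the_1"), ("dog", "big"), ("chased", "dog"),
        ("chased", "cat"), ("cat", "the_2")]

local_order = ["the_1", "big", "dog", "chased", "the_2", "cat"]
displaced_order = ["the_1", "dog", "chased", "the_2", "cat", "big"]

print(total_dependency_length(local_order, deps))      # related words kept close
print(total_dependency_length(displaced_order, deps))  # adjective separated from its noun
```

Orders that keep syntactically related words adjacent yield smaller totals, which is the quantity the cited corpus analyses find to be under pressure.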

16 Christiansen & Chater: The Now-or-Never bottleneck: A fundamental constraint on language and decoded so that the symbolic informa- incrementally representation. In many cases, chunking may help to min- tion to be transmitted maps, fairly locally, to portions of imize the distance over which a dependency has to remain the acoustic signal. Thus, to an approximation, whereas in- in memory. For example, the use of personal pronouns dividual phonemes acoustically exhibit enormous contextu- can facilitate the processing of otherwise dif fi cult object al variation, diphones (pairs of phonemes) are a fairly stable relative clauses because they are more easily chunked acoustic signal, as evident by their use in tolerably good (e.g., ; Reali & Christiansen People [you know] are more fun speech synthesis and recognition (e.g., Jurafsky et al. ). Similarly, the processing of long-distance depen- 2007a ). Overall, then, each successive segment of the 2000 dencies is eased when they are separated by highly frequent analog acoustic input must correspond to a part of the sym- word combinations that can be readily chunked (e.g., Reali bolic code being transmitted. This is not because of consid- 2007b ). More generally, the Chunk-and- & Christiansen fi ciency but because of the erations of informational ef Pass account is in line with other approaches that assign ’ s processing limitations in encoding and decoding: brain processing limitations and complexity as primary con- cally, by the Now-or-Never bottleneck. fi speci straints on long-distance dependencies, thus potentially The need rapidly to encode and decode implies that providing explanations for linguistic phenomena, such as spoken language will consist of a sequence of short 1984 subjacency (e.g., Berwick & Weinberg ; Kluender & sound-based units (the precise nature of these units may Kutas 1993 ), island constraints (e.g., Hofmeister & Sag be controversial, and may even differ between languages, 2010 ), referential binding (e.g., Culicover 2013 ), and but units could include diphones, phonemes, mora, sylla- ’ 2013 scope effects (e.g., O Grady ). Crucially, though, as bles, etc.). Similarly, in speech production, the Now-or- we argued earlier, the impact of these processing con- Never bottleneck rules out planning and executing a long straints may be lessened to some degree by the integration articulatory sequence (as in a block-code used in communi- of multiple sources of information (e.g., from pragmatics, cation technology); rather, speech must be planned incre- discourse context, and world knowledge) to support the mentally, in the Just-in-Time fashion, requiring that the ongoing interpretation of the input (e.g., Altmann & Steed- speech signal corresponds to sequences of discrete ; Tanenhaus et al. 1995 2014 ; Heider et al. 1988 ). man sound-based units. 6.1.3. Multiple levels of linguistic representation . Speech allows us to transmit a digital, symbolic code over a serial, analog channel using time variation in sound pressure (or Our perspective has yet . 6.1.4. Duality of patterning using analog movements, in sign language). How might further intriguing implications. Because the Now-or- we expect this digital-analog-digital conversion to be Never bottleneck requires that symbolic information tuned, to optimize the amount of information transmitted? 
A block-based code requires decoding a stored memory trace for the entire analog signal (for language, typically, acoustic) – that is, the whole block. This is straightforward for artificial computing systems, where memory interference is no obstacle. But this type of block coding is, of course, precisely what the Now-or-Never bottleneck rules out. The human perceptual system must turn the acoustic input into a (lossy) compressed form right away, or else the acoustic signal is lost forever. Similarly, the speech production system cannot decide to send a single, lengthy analog signal, and then successfully reel off the lengthy corresponding sequence of articulatory instructions, because this will vastly exceed our memory capacity for sequences of actions. Instead, the acoustic signal must be generated incrementally and decoded so that the symbolic information to be transmitted maps, fairly locally, to portions of the acoustic signal. Thus, to an approximation, whereas individual phonemes acoustically exhibit enormous contextual variation, diphones (pairs of phonemes) are a fairly stable acoustic signal, as evident by their use in tolerably good speech synthesis and recognition (e.g., Jurafsky et al. 2000). Overall, then, each successive segment of the analog acoustic input must correspond to a part of the symbolic code being transmitted. This is not because of considerations of informational efficiency but because of the brain's processing limitations in encoding and decoding: specifically, by the Now-or-Never bottleneck.

The need rapidly to encode and decode implies that spoken language will consist of a sequence of short sound-based units (the precise nature of these units may be controversial, and may even differ between languages, but units could include diphones, phonemes, mora, syllables, etc.). Similarly, in speech production, the Now-or-Never bottleneck rules out planning and executing a long articulatory sequence (as in a block-code used in communication technology); rather, speech must be planned incrementally, in the Just-in-Time fashion, requiring that the speech signal corresponds to sequences of discrete sound-based units.

6.1.4. Duality of patterning. Our perspective has yet further intriguing implications. Because the Now-or-Never bottleneck requires that symbolic information must rapidly be read off the analog signal, the number of such symbols will be severely limited – and, in particular, may be much smaller than the vocabulary of a typical speaker (many thousands or tens of thousands of items). This implies that the short symbolic sequences into which the acoustic signal is initially recoded cannot, in general, be bearers of meaning; instead, the primary bearers of meaning, lexical items and morphemes, will be composed out of these smaller units.

Thus, the Now-or-Never bottleneck provides a potential explanation for a puzzling but ubiquitous feature of human languages, including signed languages. This is duality of patterning: the existence of (one or more) level(s) of symbolically encoded sound structure (whether described in terms of phonemes, mora, or syllables) from which the level of words and morphemes (over which meanings are defined) is composed. Such patterning arises, in the present analysis, as a consequence of rapid online multilevel chunking in both speech production and perception. In the absence of duality of patterning, the acoustic signal corresponding, say, to a single noun could not be recoded incrementally as it is received (Warren & Marslen-Wilson 1987) but would have to be processed as a whole, thus dramatically overloading sensory memory.
We may duction system cannot decide to send a single, lengthy conjecture that Chunk-and-Pass processing operates for analog signal, and then successfully reel off the lengthy cor- music as well as language, thus helping to explain why responding sequence of articulatory instructions, because our ability to process musical input spectacularly exceeds this will vastly exceed our memory capacity for sequences our ability to process arbitrary sequential acoustic material of actions. Instead, the acoustic signal must be generated (Clément et al. 1999 ). 16 BEHAVIORAL AND BRAIN SCIENCES, 39 (2016) http://journals.cambridge.org Downloaded: 03 Jun 2016 IP address: 92.14.75.14

17 Christiansen & Chater: The Now-or-Never bottleneck: A fundamental constraint on language We . 6.1.5. The quasi-regularity of linguistic structure interpretation (e.g., Ford et al. 1982 ; Kempson et al. have argued that the Now-or-Never bottleneck implies ). For example, in describing his incre- 2001 2010 ; Morrill that language processing involves applying highly local ) noted, Syntactic mental parser-interpreter, Pulman ( “ 1985 Chunk-and-Pass operations across a range of representa- information is used to build up the interpretation and to tional levels; and that language acquisition involves learning guide the parse, but does not result in the construction of to perform such operations. But, as in the acquisition of ” (p. 132). Steedman an independent level of representation other skills, learning from such speci fi c instances does not ( 2000 ) adopted a closely related perspective when intro- operate by rote but leads to generalization and hence mod- ducing his combinatory categorial grammar, which aims cation from one instance to another (Goldberg i fi 2006 ). to map surface structure directly onto logic-based semantic Indeed, such processes of local generalization are ubiqui- interpretations, given rich lexical representations of words tous in language change, as we have noted above. From that include information about phonological structure, syn- this standpoint, we should expect the rule-like patterns in tactic category, and meaning: “... syntactic structure is language to emerge from generalizations across speci fi c in- merely the characterization of the process of constructing , for an example of 2000 stances (see, e.g., Hahn & Nakisa a logical form, rather than a representational level of struc- ectional morphology in German); this approach to in fl (p. xi). Thus, in these ...” ture that actually needs to be built once entrenched, such rule-like patterns can, of course, accounts, the syntactic structure of a sentence is not explic- be applied quite broadly to newly encountered cases. itly represented by the language system but plays the role Thus, patterns of regularity in language will emerge of a processing trace of the operations used to create or ” “ locally and bottom-up, from generalizations across individ- Grady ’ interpret the sentence (see also O ). 2005 ual instances, through processes of language use, acquisi- To take an analogy from constructing objects, rather than tion, and change. sentences, the process by which components of an IKEA- We should therefore expect language to be quasi-regular history ” fl style “ at-pack cabinet are combined provides a across phonology, morphology, syntax, and semantics to – (combine a board, handle, and screws to construct the be an amalgam of overlapping and partially incompatible doors; combine frame and shelf to construct the body; patterns, involving generalizations from the variety of lin- fi combine doors, body, and legs to create the nished guistic forms from which successive language learners gen- cabinet). The history by which the cabinet was constructed eralize. For example, English past tense morphology has, fi nished item, may thus reveal the intricate structure of the ed – famously, the regular ending, a range of subregularities but this structure need not be explicitly represented during → sprang , but fl ing → → ( sing → sang , ring rang , spring read off ” the syn- the construction process. 
Thus, we can “ → ung ; with brought fl bring ; and even wrung → wring , tactic structure of a sentence from its processing history, re- some verbs having the same present and past tense vealing the syntactic relations between various constituents → split , hit → hit , cost → cost forms, e.g., ; whereas split (likely with a “ fl at ” structure; Frank et al. 2012 ). Syntactic am others differ wildly, e.g., ; → was ; see, e.g., go went → representations are neither computed during comprehen- 1982 Bybee & Slobin ; Pinker & Prince 1988 ; Rumelhart sion nor in production; instead, there is just a history of pro- & McClelland ). This quasi-regular structure (Seiden- 1986 cessing operations. That is, we view linguistic structure as ) does indeed seem to be wide- 1989 berg & McClelland processing history. Importantly, this means that syntax is spread throughout many aspects of language (e.g., not privileged but is only one part of the system and it – 2006 ). Culicover 1999 ; Goldberg ; Pierrehumbert 2002 is not independent of the other parts (see also ). Fig. 2 From a traditional, generative perspective on language, In this view, a rather minimal notion of grammar speci- such quasi-regularities are puzzling: Natural language is as- es how the chunks from which a sentence is built can be fi similated, somewhat by force, to the structure of a formal composed. There may be several, or indeed many, orders – language with a precisely de ned syntax and semantics fi in which such combinations can occur, just as operations the ubiquitous departures from such regularities are for furniture assembly may be carried out somewhat exi- fl mysterious. From the present standpoint, by contrast, the bly (but not completely without constraints it might turn – quasi-regular structure of language arises in rather the out that the body must be screwed together before a same way that a partially regular pattern of tracks were shelf can be attached). In the context of producing and un- laid down across a forest, through the overlaid traces of derstanding language, the process of construction is likely nding the path of local an endless number of agents fi “ is ” component to be much more constrained: Each new least resistance; and where each language processing presented in turn, and it must be used immediately or it episode tends to facilitate future, similar, processing epi- will be lost. Moreover, viewing Chunk-and-Pass processing ’ s choice of path facilitates the use sodes, just as an animal as an aspect of skill acquisition, we might expect that the of that path for animals that follow. precise nature of chunks may change with expertise: Highly overlearned material might, for example, gradually come to be treated as a single chunk (see Arnon & Christi- 6.2. What is linguistic structure? ansen, submitted, for a review). Crucially, as with other skills, the cognitive system will Chunk-and-Pass processing can be viewed as having an in- ), gener- 1984 cognitive miser tend to be a (Fiske & Taylor teresting connection with traditional linguistic notions. In ally following a principle of least effort (Zipf 1949 ). As pro- both production and comprehension, the language system cessing proceeds, there is an intricate interplay of top-down creates a sequence of chunking operations, which link dif- and bottom-up processing to alight on the message as ferent linguistic units together across multiple levels of rapidly as possible. The language system need only con- structure. 
To take an analogy from constructing objects, rather than sentences, the process by which components of an IKEA-style flat-pack cabinet are combined provides a "history" (combine a board, handle, and screws to construct the doors; combine frame and shelf to construct the body; combine doors, body, and legs to create the finished cabinet). The history by which the cabinet was constructed may thus reveal the intricate structure of the finished item, but this structure need not be explicitly represented during the construction process. Thus, we can "read off" the syntactic structure of a sentence from its processing history, revealing the syntactic relations between various constituents (likely with a "flat" structure; Frank et al. 2012). Syntactic representations are neither computed during comprehension nor in production; instead, there is just a history of processing operations. That is, we view linguistic structure as processing history. Importantly, this means that syntax is not privileged but is only one part of the system – and it is not independent of the other parts (see also Fig. 2).

In this view, a rather minimal notion of grammar specifies how the chunks from which a sentence is built can be composed. There may be several, or indeed many, orders in which such combinations can occur, just as operations for furniture assembly may be carried out somewhat flexibly (but not completely without constraints – it might turn out that the body must be screwed together before a shelf can be attached). In the context of producing and understanding language, the process of construction is likely to be much more constrained: Each new "component" is presented in turn, and it must be used immediately or it will be lost. Moreover, viewing Chunk-and-Pass processing as an aspect of skill acquisition, we might expect that the precise nature of chunks may change with expertise: Highly overlearned material might, for example, gradually come to be treated as a single chunk (see Arnon & Christiansen, submitted, for a review).

Crucially, as with other skills, the cognitive system will tend to be a cognitive miser (Fiske & Taylor 1984), generally following a principle of least effort (Zipf 1949). As processing proceeds, there is an intricate interplay of top-down and bottom-up processing to alight on the message as rapidly as possible. The language system need only construct enough chunk structure so that, when combined with prior discourse and background knowledge, the intended message can be inferred incrementally.
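As an illustration of reading structure off a processing history, the sketch below logs each chunking operation performed by a greedy Chunk-and-Pass pass over a toy sentence; the chunk inventory and the labels are invented for illustration. No parse tree is ever built or stored, yet the log of operations recapitulates the constituent relations.

```python
def chunk_and_pass(words, known_chunks):
    """Greedily combine adjacent units into known chunks, logging each operation.
    The log of operations is the only record of structure that is kept."""
    units, history = list(words), []
    changed = True
    while changed:
        changed = False
        for i in range(len(units) - 1):
            pair = (units[i], units[i + 1])
            if pair in known_chunks:
                label = known_chunks[pair]
                history.append((label, pair))      # processing history, not a parse tree
                units[i:i + 2] = [label]
                changed = True
                break
    return units, history

known = {("the", "dog"): "[the dog]", ("the", "cat"): "[the cat]",
         ("chased", "[the cat]"): "[chased [the cat]]",
         ("[the dog]", "[chased [the cat]]"): "[[the dog] [chased [the cat]]]"}

units, history = chunk_and_pass("the dog chased the cat".split(), known)
for step, (label, parts) in enumerate(history, 1):
    print(step, parts, "->", label)
print("final:", units)
```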

This observation relates to some interesting contemporary linguistic proposals. For example, from a generative perspective, Culicover (2013) highlighted the importance of incremental processing, arguing that the interpretation of a pronoun depends on which discourse elements are available when it is encountered. This implies that the linear order of words in a sentence (rather than hierarchical structure) plays an important role in many apparently grammatical phenomena, including weak cross-over effects in referential binding. From an emergentist perspective, O'Grady (2015) similarly emphasized the importance of real-time processing constraints for explaining differences in the interpretation of reflexive pronouns (himself, themselves) and plain pronouns (him, them). The former are resolved locally, and thus almost instantly, whereas the antecedents for the latter are searched for within a broader domain (causing problems in acquisition because of a bias toward local information).

More generally, our view of linguistic structure as processing history offers a way to integrate the formal linguistic contributions of construction grammar (e.g., Croft 2001; Goldberg 2006) with the psychological insights from usage-based approaches to language acquisition and processing (e.g., Bybee & McClelland 2005; Tomasello 2003). Specifically, we propose to view constructions as computational procedures – specifying how to process and produce a particular chunk – where we take a broad view of constructions as involving chunks at different levels of linguistic representation, from morphemes to multiword sequences.19
A procedure may integrate several different aspects of language processing or production, including chunking acoustic input into sound-based units (phonemes, syllables), mapping a chunk onto meaning (or vice versa), incorporating pragmatic or discourse information, and associating a chunk with specific arguments (see also O'Grady 2005; 2013). As with other skills (e.g., Heathcote et al. 2000; Newell & Rosenbloom 1981), there will be practice effects, where the repeated use of a given chunk results in faster processing and reduced demands on cognitive resources, with sufficient use leading to a high degree of automaticity (e.g., Logan 1988; see Bybee & McClelland 2005, for a linguistic perspective).

In terms of our previous forest track analogy, the more a particular chunk is comprehended or produced, the more entrenched it becomes, resulting in easier access and faster processing; tracks become more established with use. With sufficiently frequent use, adjacent tracks may blend together, creating somewhat wider paths. For example, the frequent processing of simple transitive sentences, processed individually as multiword chunks, such as "I want milk" and "I want candy," might first lead to a wider track involving the item-based template "I want X." Repeated use of this template along with others (e.g., "I like X," "I see X") might eventually give rise to a more abstract transitive generalization along the lines of NVN (a highway in terms of our track analogy). Similar proposals for the emergence of basic word order patterns have been made both within emergentist (e.g., O'Grady 2005; 2013; Tomasello 2003) and generative perspectives (e.g., Townsend & Bever 2001). Importantly, however, just as with generalizations in perception and motor skills, the grammatical abstractions are not explicitly represented but result from the merging of item-based procedures for chunking. Consequently, there is no representation of grammatical structure separate from processing. Learning to process is learning the grammar.
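The idea of constructions as computational procedures with practice effects can likewise be sketched in a few lines; the template notation, the cost function, and the example construction below are hypothetical choices made only for illustration.

```python
class Construction:
    """A construction as a computational procedure: it recognizes a chunk and
    returns an interpretation; repeated use lowers its notional processing cost."""
    def __init__(self, pattern, build_meaning):
        self.pattern = pattern                # e.g., ("i", "want", "X")
        self.build_meaning = build_meaning
        self.uses = 0

    def cost(self):
        return 1.0 / (1.0 + self.uses)        # practice effect: cheaper with use

    def try_process(self, words):
        if len(words) != len(self.pattern):
            return None
        slots = {}
        for slot, word in zip(self.pattern, words):
            if slot.isupper():
                slots[slot] = word            # open slot in an item-based template
            elif slot != word:
                return None
        self.uses += 1
        return self.build_meaning(slots), self.cost()

i_want_x = Construction(("i", "want", "X"), lambda s: ("REQUEST", s["X"]))

for utterance in ["i want milk", "i want candy", "i want juice"]:
    meaning, cost = i_want_x.try_process(utterance.split())
    print(utterance, "->", meaning, f"(cost {cost:.2f})")
```

Here the item-based template "I want X" both handles novel fillers and becomes cheaper with repeated use, a toy analogue of entrenchment.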
So, the grammatical abstractions are not explicitly represented might be directly interpreted could you pass the salt for example, but result from the merging of item-based procedures for as a request, bypassing any putative initial representation as a yes/ chunking. Consequently, there is no representation of may no question. Similarly, an idiom such as kick the bucket BEHAVIORAL AND BRAIN SCIENCES, 39 (2016) 18 http://journals.cambridge.org Downloaded: 03 Jun 2016 IP address: 92.14.75.14

The same appears to be true for non-idiomatic compositional chunks such as to the edge (Jolsvai et al. 2013). This viewpoint is compatible with a variety of perspectives in linguistics that treat multiword chunks in the same way as traditional lexical items (e.g., Croft 2001; Goldberg 2006).
4. Our framework is neutral about competing proposals concerning how precisely production and comprehension processes are entwined (e.g., Cann et al. 2012; Dell & Chang 2014; Pickering & Garrod 2013a) – but see Chater et al. (2016).
5. The phrase "just-in-time" has been used in the engineering field of speech synthesis in a similar way (Baumann & Schlangen 2012).
6. It is likely that some more detailed information is also maintained and accessible to the language system. For example, Levy et al. (2009) found more eye-movement regressions when people read The coach smiled at the player tossed the Frisbee compared with The coach smiled toward the player tossed the Frisbee. They argued that this is because at has contextually plausible neighbors (as, and) whereas toward does not. The regression suggests that, on encountering processing difficulty, the language system "checks back" to see whether it recognized an earlier word correctly – but does so only when other plausible alternative interpretations may be possible. This pattern requires that higher-level representations do not throw away lower-level information entirely. Some information about the visual (or perhaps phonological) form of at and toward must be maintained, to determine whether or not there are contextually plausible neighbors that might be the correct interpretation. This pattern is compatible with the present account: Indeed, the example of SF in section 2 indicates how high-level organization may be critical to retaining lower-level information (e.g., interpreting random digits as running times makes them more memorable).
7. It is conceivable that the presence of ambiguities may be a necessary component of an efficient communication system, in which easy-to-produce chunks are reused – thus becoming ambiguous – in a trade-off between ease of production and difficulty of comprehension (Piantadosi et al. 2012).
8. Although our account is consistent with the standard linguistic levels, from phonology through syntax to pragmatics, we envisage that a complete model may include finer-grained levels, distinguishing, for example, multiple levels of discourse representation. One interesting proposal along these lines, developed from the work of Austin (1962) and Clark (1996), is outlined in Enfield (2013).
9. It remains, of course, of great interest to understand the biological evolutionary history that led to the cognitive prerequisites for the cultural evolution of language. Candidate mechanisms include joint attention, large long-term memory, sequence processing ability, appropriate articulatory machinery, auditory processing systems, and so on. But this is the study not of language evolution but of the evolution of the biological precursors of language (Christiansen & Chater 2008; for an opposing perspective, see Pinker & Bloom 1990).
10. It is often difficult empirically to distinguish bottom-up and top-down effects. Bottom-up statistics across large corpora of low-level representations can mimic the operation of high-level representations in many cases; indeed, the power of such statistics is central to the success of much statistical natural language processing, including speech recognition and machine translation (e.g., Manning & Schütze 1999). However, rapid sensitivity to background knowledge and nonlinguistic context suggests that there is also an important top-down flow of information in human language processing (e.g., Altmann 2004; Altmann & Kamide 1999; Marslen-Wilson et al. 1993) as well as in cognition, more generally (e.g., Bar 2004).
11. Strictly speaking, "good-enough first time" may be a more appropriate description. As may be true across cognition (e.g., Simon 1956), the language system may be a satisficer rather than a maximizer (Ferreira et al. 2002).
12. Some classes of learning algorithm can be converted from "batch-learning" to "incremental" or "online" form, including connectionist learning (Saad 1998) and the widely used expectation-maximization (EM) algorithm (Neal & Hinton 1998), typically with diminished learning performance. How far it is possible to create viable online versions of existing language acquisition algorithms is an important question for future research. (An illustrative sketch of one such conversion follows these notes.)
13. Nonetheless, as would be expected from a statistical model of learning, some early productivity is observed at the word level, where words are fairly independent and may not form reliable chunks (e.g., children's determiner-noun combinations, Valian et al. 2009; Yang 2013; though see McCauley & Christiansen 2014b; Pine et al. 2013, for evidence that such productivity is not driven by syntactic categories).
14. Interestingly, though, the notion of "triggers" in the principles and parameters model (Chomsky 1981) potentially fits with the online learning framework outlined here (Berwick 1985; Fodor 1998; Lightfoot 1989): Parameters are presumed to be set when crucial "triggering" information is observed in the child's input (for discussion, see Gibson & Wexler 1994; Niyogi & Berwick 1996; Yang 2002). However, this model is very difficult to reconcile with incremental processing and, moreover, it does not provide a good fit with empirical linguistic data (Boeckx & Leivada 2013).
15. Note that the stability–plasticity dilemma arises in connectionist modelling: models that globally modify their weights, in response to new items, often learn only very slowly, to avoid "catastrophic interference" with prior items (e.g., French 1999; McCloskey & Cohen 1989; Ratcliff 1990). Notably, though, catastrophic interference may occur only if the old input rarely reappears later in learning.
16. Although Givón (1979) discussed how syntactic constructions might derive from previous pragmatic discourse structure, he did not coin the phrase "today's syntax is yesterday's discourse." Instead, it has been ascribed to him through paraphrasings of his maxim that "today's morphology is yesterday's syntax" from Givón (1971), an idea he attributed to the Chinese philosopher Lao Tse.
17. Apparent counterexamples to the general unidirectionality of grammaticalization – such as the compound verb to up the ante (e.g., Campbell 2000) – are entirely compatible with the present approach: They correspond to the creation of new idiomatic chunks, from other pre-existing chunks, and thus do not violate our principle that chunks generally decay.
18. Note that, in particular, the present viewpoint does not rule out the possibility that some detailed information is retained in processing (e.g., Goldinger 1998; Gurevich et al. 2010; Pierrehumbert 2002). But such detailed information can be retained only because the original input has been chunked successfully, rather than being stored in raw form.
19. The term "computational procedure" is also used by Sagarra and Herschensohn (2010), but they viewed such procedures as "developing in tandem with the growing grammatical competence" (p. 2022). Likewise, Townsend and Bever (2001) discussed frequency-based perceptual templates that assign the initial "meaning" (p. 6). However, they argued that this only results in "pseudosyntactic" structure, which is later checked against a complete derivational structure. In contrast, we argue that these computational procedures are all there is to grammar, a proposal that dovetails with O'Grady's (2005; 2013) notion of "computational routines," but with a focus on chunking in our case.
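The batch-versus-online contrast drawn in Note 12 can be made concrete with a stepwise variant of EM, one standard recipe for converting a batch estimator into an incremental one (in the general spirit of Neal & Hinton 1998, though the sufficient-statistic interpolation below is a common textbook scheme rather than any specific published language-acquisition model). The two-component mixture, step-size schedule, and initialisation are illustrative assumptions.

import math, random

# Online ("stepwise") EM for a two-component 1-D Gaussian mixture with
# fixed unit variance: each observation is seen once, in order.
random.seed(0)
data = [random.gauss(-2, 1) for _ in range(500)] + \
       [random.gauss(3, 1) for _ in range(500)]
random.shuffle(data)

weights = [0.5, 0.5]          # mixing proportions
means = [-1.0, 1.0]           # component means
stat_n = [0.5, 0.5]           # running responsibility mass per component
stat_sum = [m * n for m, n in zip(means, stat_n)]  # responsibility-weighted sums

def responsibilities(x):
    dens = [w * math.exp(-0.5 * (x - m) ** 2) for w, m in zip(weights, means)]
    total = sum(dens)
    return [d / total for d in dens]

for t, x in enumerate(data, start=1):
    r = responsibilities(x)                 # E-step for a single observation
    eta = (t + 2) ** -0.6                   # decaying step size
    for k in range(2):                      # interpolate sufficient statistics
        stat_n[k] = (1 - eta) * stat_n[k] + eta * r[k]
        stat_sum[k] = (1 - eta) * stat_sum[k] + eta * r[k] * x
    weights = [n / sum(stat_n) for n in stat_n]        # M-step from statistics
    means = [s / n for s, n in zip(stat_sum, stat_n)]

print(weights, means)   # weights near [0.5, 0.5], means roughly -2 and 3

Each observation is processed once in the order encountered, the property an online learner needs; the usual cost, as the note says, is some loss of estimation quality relative to repeated batch passes over stored data.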

Open Peer Commentary

The ideomotor recycling theory for language

doi:10.1017/S0140525X15000680, e63

Arnaud Badets
CNRS, Institut de Neurosciences Cognitives et Intégratives d'Aquitaine (UMR 5287), Université de Bordeaux, France.
[email protected]
http://www.incia.u-bordeaux1.fr/spip.php?article255

Abstract: For language acquisition and processing, the ideomotor theory predicts that the comprehension and the production of language are functionally based on their expected perceptual effects (i.e., linguistic events). This anticipative mechanism is central for action–perception behaviors in human and nonhuman animals, but a recent ideomotor recycling theory has emphasized a language account throughout an evolutionary perspective.

The Now-or-Never bottleneck, according to Christiansen & Chater (C&C), is, in a broad-spectrum view, a convincing constraint in language acquisition and processing. From general action–perception principles, this bottleneck deals with a myriad of linguistic inputs to recode them by chunks as rapidly as possible. Accordingly, language processing involves a prediction (or anticipation) mechanism that encodes new linguistic features very rapidly. I agree with this general position, but the described predictive mechanism in charge of such anticipation does not seem theoretically conclusive in regard to a recent ideomotor recycling theory (Badets et al. 2016).

Sensorimotor and predictive mechanisms have been clearly theorized in the last 40 years (Adams 1971; Shin et al. 2010; Wolpert et al. 2011). For example, as suggested in the Now-or-Never bottleneck framework, the computational modeling approaches of motor control assume that two kinds of internal models are in charge of producing goal-directed behaviors (e.g., Wolpert et al. 2001). The first is the forward model, which predicts the expected sensory consequences as a function of the motor command. The second is the inverse model, a mechanism that transforms the expected sensory consequences into motor commands. Basically, the inverse model is related to a motor plan to reach the expected goal, and the forward model is in charge of monitoring an action by comparing the expected sensory consequences to the actual sensory consequences. Differences can cause an adaptation to the motor mechanism in order to attain the goal. For efficient regulation of goal-directed actions, the forward and inverse models are equally central. However, this theoretical framework assumes an equivalent weight for the representation underlying the expected perceptual effects and the representation of the behavior to achieve these effects. In contrast, the ideomotor theory does not deny the involvement of a movement system but assumes a primary role for expected perceptual events, which could be central in language production and comprehension (Badets et al. 2016; see also Kashima et al. 2013).
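The division of labour between forward and inverse models described in the preceding paragraph can be illustrated with a minimal numerical example. The one-dimensional "plant", the single shared gain parameter, and the learning rate below are invented for illustration and are not drawn from Wolpert and colleagues or from Badets et al.

# A toy forward/inverse model loop for a one-dimensional action whose
# true gain is unknown to the agent. All numbers are illustrative.
true_gain = 1.3          # actual mapping from motor command to sensory outcome
model_gain = 1.0         # the agent's internal estimate, shared by both models
learning_rate = 0.5

def inverse_model(goal):
    # Choose the motor command expected to achieve the goal.
    return goal / model_gain

def forward_model(command):
    # Predict the sensory consequence of the command.
    return model_gain * command

goal = 10.0
for trial in range(8):
    command = inverse_model(goal)
    predicted = forward_model(command)      # expected sensory consequence
    actual = true_gain * command            # what the environment returns
    error = actual - predicted              # sensory prediction error
    model_gain += learning_rate * error / command   # adapt the internal model
    print(f"trial {trial}: actual={actual:.2f}, error={error:.2f}")
# The prediction error shrinks across trials, so later commands land near the goal.

The inverse model picks the command expected to reach the goal, the forward model predicts its sensory consequence, and the mismatch with the actual consequence drives adaptation, which is the monitoring loop the commentary summarises.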
Ideomotor theory predicts that behaviors are functionally linked to their sensory consequences (Greenwald 1970; Hommel et al. 2001). The core mechanism is that actions are represented mainly by the expected perceptual consequences (or effects) they aim to produce in the environment. From an evolutionary account, it is obvious that such an action–perception mechanism dedicated to situated interaction has been present for millions of years, since ancestral animals (Cisek & Kalaska 2001). Moreover, Badets et al. (2016) have recently suggested that "we can easily assume that there is a reuse of cognitive function from mechanisms of simple motor control to more elaborated communication and language processing" (p. 11). In this theory based on the concept of exaptation (Gould & Vrba 1982), the ideomotor mechanism is recycled (i.e., exapted) during evolution or normal development in order to manage new world interaction like the human language (see Anderson 2010 for a neuronal reuse perspective).

According to the ideomotor recycling theory, the expected consequences of abstract meanings are simulated in an anticipative way in order to retrieve the appropriate and concrete words and sentences during the production (≈ action) and the comprehension (≈ perception) of language. Importantly, and as suggested by Greenwald (1972), "it ought to be possible to select a response very directly, perhaps totally bypassing any limited-capacity process, by presenting a stimulus that closely resembles the response's sensory feedback" (p. 52). Consequently, we can easily clarify the close alignment of linguistic meanings during a dialogue between two persons (see also Pickering & Garrod 2013a). In this context, an utterance is represented by the expected consequences of abstract meanings for the speaker, which can be processed (≈ stimulus processing) very rapidly, as expected meanings for the subsequent utterance in the listener (≈ sensory feedback). For the ideomotor recycling theory, there are common representational formats between shared abstract meanings during a dialogue.

Finally, there is another piece, but indirect, of evolutionary evidence for an ideomotor account in language processing. Indeed, for Gärdenfors (2004) there has been a co-evolution of "cooperation about future goals and symbolic communications" (p. 243). Corballis (2009) suggested the same mutual mechanism between the capacity to envision the far future and language processing. Consequently, if the ideomotor recycling theory can explain some parts of human language (Badets et al. 2016), it could be argued that the same recycled mechanism can also be in charge for the representation of the future (see also Badets & Rensonnet 2015). In this view, Badets and Osiurak (2015) have recently suggested that such an anticipative mechanism could be central for the representation of future scenarios. From different paradigms and domains like tool use, action memory, prospective memory, or motor skill learning, compelling evidence highlights that the ideomotor mechanism can predict far-future-situated events to adapt different and efficient behaviors. For Corballis (2009), language has the capacity to improve the representation of such future scenarios. For instance, in telling a person what will happen next week (associated with predicted storm weather) if he or she practices sailing, it is possible to form a coherent accident representation that can be avoided in the future. This mutual ideomotor mechanism between language and the capacity to envision the future gives an evident evolutionary advantage for humans.

To conclude, it seems that, from an evolutionary perspective, the ideomotor mechanism has been recycled in order to spread its influence on human behavior beyond simple motor acts. The ideomotor recycling theory can apply to language processing and other higher cognitive functions such as foresight. For language, common representational formats between shared and expected abstract meanings during a dialogue can explain very rapid and efficient language skills.

Language processing is not a race against time

doi:10.1017/S0140525X15000692, e64

Giosuè Baggio (a) and Carmelo M. Vicario (b)
(a) Language Acquisition and Language Processing Lab, Department of Language and Literature, Norwegian University of Science and Technology, 7491 Trondheim, Norway; (b) School of Psychology, Bangor University, Bangor, Gwynedd LL57 2AS, United Kingdom.
[email protected] [email protected]
http://www.ntnu.edu/employees/giosue.baggio
http://www.bangor.ac.uk/psychology/people/profiles/carmelo_vicario.php.en

Abstract: We agree with Christiansen & Chater (C&C) that language processing and acquisition are tightly constrained by the limits of sensory and memory systems. However, the human brain supports a range of cognitive functions that mitigate the effects of information processing bottlenecks. The language system is partly organised around these moderating factors, not just around restrictions on storage and computation.
Christiansen & Chater's (C&C's) theory builds upon the notion that linguistic structures and processes are specific responses to the limitations of sensory and memory systems. Language relies on three main strategies – incrementality, hierarchical representation, and prediction – for coping with those limitations. We think this list is incomplete, and that it should also include inference, the ability to read and write, pragmatic devices for coordinating speaker and hearer, and the mutual tuning of speech comprehension and production systems in the brain. We aim to show that this is more than merely adding items to a list: Our argument points to a different balance between restrictions on storage and computation, and the full range of cognitive functions that have a mitigating effect on them. Indeed, C&C's concise inventory satisfies all constraints, but no more: Language processing remains a race against time. We argue instead that the moderating factors widely offset the constraints, suggesting a different picture of language than the one envisaged by C&C.

Hearing is the main input channel for language. C&C discuss constraints on auditory analysis, but not the mechanisms by which the brain recovers lost information. Sensory systems rely heavily on perceptual inference. A classic example is phonemic restoration (PhR) (Warren 1970): Deleting auditory segments from speech reduces comprehension, but if the deleted segment is replaced with noise, comprehension is restored. PhR is not the creation of an illusory percept but the reorganisation of input: Because PhR arises in auditory cortices, it requires that energy be present at the relevant frequencies (Petkov et al. 2007). Short segments of unprocessed speech are not necessarily lost but may often be reconstructed. Probabilistic and logical inference to richer structures based on sparse data is available at all levels of representation in language (Graesser et al. 1994; Swinney & Osterhout 1990).

Vision is next in line in importance. In C&C's theory, vision has largely a supporting role: It may provide cues that disambiguate speech, but it is itself subject to constraints like auditory processing. However, the human brain can translate information across modalities, such that constraints that apply to one modality are weaker or absent in another. This applies to some innovations in recent human evolutionary history, including reading and writing. By the nature of texts as static visual objects, the effects of temporal constraints on information intake may be reduced or abolished. This is not to say there are no temporal constraints on reading: Processing the fine temporal structure of speech is crucial for reading acquisition (Dehaene 2009; Goswami 2015). Written information, though, is often freely accessible in ways auditory information is not. We acquire a portion of our vocabulary and grammar through written language, and we massively use text to communicate. Therefore, it seems that C&C's premise that "language is fleeting" applies to spoken language only, and not to language in general.

But even auditory information can often be freely re-accessed. Misperception and the loss of information in conversation pose coordination problems. These can be solved by deploying a number of pragmatic devices that allow speaker and hearer to realign: echo questions are one example (A: "I just returned from Kyrgyzstan." B: "You just returned from where?"). Information is re-accessed by manipulating the source of the input, with (implicit) requests to slow down production, or to deliver a new token of the same type. Language use relies on a two-track system (Clark 1996): (1) communication about "stuff" and (2) communication about communication. Track 2 allows us to recover from failure to process information on Track 1, and to focus attention on what matters from Track 1. Signals from both tracks are subject to bottlenecks and constraints; nonetheless, Track 2 alleviates the effects of restrictions on Track 1 processing. Interestingly, infants are able to engage in repair of failed messages (Golinkoff 1986). This capacity develops early in childhood, in parallel with language growth (Brinton et al. 1986a; Saxton et al. 2005; Tomasello et al. 1990; Yonata 1999).

Finally, C&C claim that different levels of linguistic representation are mapped onto a hierarchy of cortical circuits. Each circuit chunks and passes elements at increasingly larger timescales. But research indicates the picture is rather more complicated. Most brain regions can work at multiple timescales. Frontal and temporal language cortices can represent and manipulate information delivered at different rates, and over intervals of different duration (Fuster 1995; Ding et al. 2016; Pallier et al. 2011). Furthermore, the left parietal lobe is a critical region for both temporal processing (e.g., Vicario et al. 2013; see Wiener et al. 2010 for a review) and amodal (spoken and written) word comprehension (Mesulam 1998). The left inferior parietal cortex is a core area for speech comprehension and production because of its connections with wide portions of Wernicke's (superior temporal cortex [STC]) and Broca's (left inferior frontal gyrus) areas (Catani et al. 2005). The temporal cortex processes speech at different scales: at shorter windows (25–50 ms) in the left STC, and at longer windows (150–250 ms) in the right STC (Boemio et al. 2005; Giraud et al. 2007; Giraud & Poeppel 2012). This asymmetry might result from mutual tuning of primary auditory and motor cortices in the left hemisphere (Morillon et al. 2010). If speech production and perception indeed share some of the constraints described by C&C, then neither system should be expected to lag behind the speed or the resolution of the other.

A more comprehensive theory of language processing would arise from taking into account constraints of the kind discussed by C&C, plus a wider array of cognitive mechanisms mitigating the effect of these constraints, including (but not limited to) Chunk-and-Pass processing and its corollaries. The human brain's ubiquitous capacity to infer, recover, and re-access unprocessed, lost, or degraded information is as much part of the "design" of the language system as incrementality, hierarchical representation, and prediction. The joint effect of these strategies is to make language processing much less prone to information loss and much less subject to time pressures than C&C seem to imply.

Pro and con: Internal speech and the evolution of complex language

doi:10.1017/S0140525X15000709, e65

Christina Behme
Department of Philosophy, Dalhousie University, Halifax, NS B3H 4R2, Canada.
[email protected]

Abstract: The target article by Christiansen & Chater (C&C) offers an integrated framework for the study of language acquisition and, possibly, a novel role for internal speech in language acquisition. However, the Now-or-Never bottleneck raises a paradox for language evolution. It seems to imply that language complexity has been either reduced over time or has remained the same. How, then, could languages as complex as ours have evolved in prelinguistic ancestors? Linguistic Platonism could offer a solution to this paradox.
Christiansen & Chater (C&C) promise to provide "an integrated framework for explaining many aspects of language structure, acquisition, processing, and evolution that have previously been treated separately" (sect. 1, para. 5). This integration results in a plausible language acquisition model. Citing a wealth of compelling empirical evidence, C&C propose that language is learned like other skills and that linguistic abilities are not isolated biological traits, as suggested by Hauser et al. (2014), but continuous with other motor and cognitive skills. Rejecting the Chomskyan dogma that language learning is effortless and (virtually) instantaneous (Chomsky 1975; 1980; 1986; 2012), C&C propose that "the Now-or-Never bottleneck requires that language acquisition be viewed as a type of skill learning, such as learning to drive, juggle, play the violin, or play chess. Such skills appear to be learned through practicing the skill, using online feedback during the practice itself ..." (sect. 4, para. 4). This view integrates language naturally within cognition and does not require the postulation of domain-specific cognitive modules. Additionally, C&C's account casts doubt on Chomsky's claim that the fact that we frequently talk silently to ourselves supports his view that the function of language is not communication (e.g., Chomsky 2000; 2002; 2012). A more parsimonious explanation would assume that frequent internal monologues arose from the habitual fine-tuning ("practice" by [silently] doing) of language learning. C&C argue that "we should expect the exploitation of memory to require 'replaying' learned material, so that it can be reprocessed" (sect. 4.1, para. 5). They cite substantial neuroscientific evidence that such replay occurs and propose that dreaming may have a related function. Given that especially the integration of available information across different types and levels of abstraction and the anticipation of responses might require more practice than the motor-execution of (audible) speech, silent self-conversation might initially provide an additional medium for language learning. Later in life, such internal monologue could be recruited to the function Chomsky envisioned. Future research could uncover at what age children begin using internal monologue, to what degree second-language acquisition is assisted by learners switching their internal monologue from L1 to L2, and whether the lack of internal monologue (e.g., Grandin 2005) has negative effects on fluency in production.

Although C&C's account offers an attractive language acquisition model, it seems to create a paradox for language evolution. C&C argue that there are strong pressures toward simplification and reduction. For example, when a very simple artificial toy language was simulated, it "collapsed into just a few different forms that allowed for systematic, albeit semantically underspecified, generalization ... In natural language, however, the pressure toward reduction is normally kept in balance by the need to maintain effective communication" (sect. 5, para. 4). This observation raises the following problem: For an existing, fairly complex system, simplification may indeed lead to the kinds of changes C&C discuss (e.g., that "chunks at each level of linguistic structure – in discourse, syntax, morphology, and phonology – are potentially subject to reduction" [sect. 5, para. 5]). But in this view there is a strong pressure toward simplification and virtually no possibility of increasing complexity. Yet it is not clear why the language of our distant ancestors would have been more complex than or at least as complex as modern languages. It has been argued convincingly that the complexity of grammar actually needed to support most daily activities of humans living in complex contemporary societies is substantially less than that exhibited by any contemporary human language (Gil 2009, p. 19), and it seems implausible that existing language complexity is functionally motivated.

If the Now-or-Never bottleneck has the power C&C attribute to it, it must have constrained language learning and use for our distant ancestors in the same way as it does for us. Presumably these ancestors had cognitive capacities that were not superior to ours, and their culture would have imposed even fewer demands for linguistic complexity than contemporary culture. So how could they have evolved a highly complex language system that in turn could be reduced to provide the cognitive foundation for grammaticalization? C&C suggest analogies between language and other cognitive processes (e.g., vision). This is problematic because the visual system evolved to perceive objects that exist independently of this system. On purely naturalist accounts, languages have no existence independent of human brains or human culture. Therefore, both the cognitive abilities underwriting linguistic competence and the language that is learned must have evolved. Decades ago it was suggested that many of the problems that bedevil Chomskyan linguistics can be eliminated if one adopts linguistic Platonism and draws a distinction between the knowledge speakers have of their language and the languages that speakers have knowledge of. Platonism considers as distinct (1) the study of semantic properties and relations like ambiguity, synonymy, meaningfulness, and analyticity, and (2) the study of the neuropsychological brain-states of a person who has acquired knowledge about these semantic properties (e.g., Behme 2014a; Katz 1984; 1996; 1998; Katz & Postal 1991; Neef 2014; Postal 2003; 2009; 2012). In such a view, languages and brains that have acquired knowledge of languages are two distinct ontological systems.

In addition to eliminating many problems for contemporary linguistics, such a view also might resolve the language evolution paradox because languages have an independent existence, and only human cognitive capacity evolves. It might be argued that the epistemology of linguistic Platonism is hopeless. Although this is not the place to defend linguistic Platonism, one should remember that in mathematics it is widely accepted that the number systems exist independently of human brains and human culture, and are discovered, just as are other objects of scientific discovery. It has been argued that if one accepts the possibility of mathematical realism, there is no a priori reason to reject the possibility of linguistic realism (e.g., Behme 2014b; Katz 1998). Before rejecting linguistic Platonism out of hand, one ought to remember that

For psychology, AI, and the related cognitive sciences, the question of what a grammar is a theory of is important because its answer can resolve troublesome issues about where the linguist's work ends and the cognitive scientist's begins. A Platonist answer to this question would clearly divide linguistics and cognitive sciences so that the wasteful and unnecessary quarrels of the past can be put behind us. (Katz 1984, p. 28)

ACKNOWLEDGMENTS
I thank Vyvyan Evans, David Johnson, and Paul Postal for comments on earlier drafts and accept full responsibility for remaining errors.

Socio-demographic influences on language structure and change: Not all learners are the same

doi:10.1017/S0140525X15000710, e66

Till Bergmann (a), Rick Dale (a), and Gary Lupyan (b)
(a) Cognitive and Information Sciences, University of California, Merced, Merced, CA 95343; (b) Department of Psychology, University of Wisconsin–Madison, Madison, WI 53706.
[email protected] www.tillbergmann.com
[email protected] http://cognaction.org/rick/
[email protected] http://sapir.psych.wisc.edu/

Abstract: The Now-or-Never bottleneck has important consequences for understanding why languages have the structures they do. However, what is not addressed by C&C is that the bottleneck may interact with who is doing the learning: While some languages are mostly learned by infants, others have a large share of adult learners. We argue that such socio-demographic differences extend and qualify C&C's thesis.

We wholeheartedly agree with Christiansen & Chater (C&C) that "acquiring a language is learning to process" (sect. 5, para. 3) and that "there is no representation of grammatical structure separate from processing" (sect. 6.2, para. 6). We also agree with C&C's more general thesis that the structure of language cannot be understood without taking into account the constraints and biases of the language learners and users.
Although the Now-or-Never cognitive bottleneck is an unavoidable constraint on language comprehension and production, fully understanding its consequences requires taking into account socio-demographic realities, namely who is doing the language learning.

C&C write that "Language will be shaped by the linguistic patterns learners find easiest to acquire and process" (sect. 5, para. 3), but what is easiest may importantly depend on who is doing the learning. Some languages are learned exclusively by infants and used in small, culturally homogeneous communities. For example, half of all languages have fewer than 7,000 speakers. Other languages have substantial populations of non-native speakers and are used in large, culturally and linguistically heterogeneous communities. For example, at present, about 70% of English speakers are non-native speakers (Gordon 2005).

The socio-demographic niche in which a language is learned and used can influence its grammar insofar as different kinds of learners have different learning biases. Languages with many adult learners may adapt to their socio-demographic niche by eschewing features difficult for adults to process. Indeed, as Lupyan and Dale (2010) have shown in an analysis of more than 2,000 languages, languages spoken in larger and more diverse communities (those that tend to have more non-native learners) have simpler morphologies and fewer obligatory markings (see also Bentz & Winter 2013). In contrast, languages used in a socio-demographic niche containing predominantly infant learners tend to have many more obligatory markings – for example, they are more likely to encode tense, aspect, evidentiality, and modality as part of the grammar, and to have more complex forms of agreement (Dale & Lupyan 2012; see also Dahl 2004; McWhorter 2001; Trudgill 2011).

Such influences of the socio-demographic environment on language structure are important caveats to C&C's thesis because the Now-or-Never bottleneck, although present in all learners, depends on the knowledge that a learner brings to the language-learning task.

On C&C's account, successful language processing depends on recoding the input into progressively higher-level (more abstract) chunks. As an analogy, C&C give the example of how remembering strings of numbers is aided by chunking (re-representing) numbers as running times or dates (sect. 2, para. 7). But, of course, this recoding is only possible if the learner knows about reasonable running times and the format of dates. The ability to remember the numbers depends on the ability to chunk them, and the ability to chunk them depends on prior knowledge.
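The point about prior knowledge can be put in the form of a toy sketch: the same digit string is recoded into fewer, richer chunks only if the learner has a schema for what a plausible "running time" looks like. The schema and its thresholds below are invented for illustration and are not drawn from the commentary or the target article.

def plausible_running_time(chunk):
    # Hypothetical schema: three digits read as minutes:seconds
    # (e.g., '935' -> 9:35). The bounds are arbitrary illustrative choices.
    if len(chunk) != 3:
        return False
    minutes, seconds = int(chunk[:-2]), int(chunk[-2:])
    return 2 <= minutes <= 59 and seconds < 60

def chunk_digits(digits, has_schema):
    chunks, i = [], 0
    while i < len(digits):
        if has_schema and plausible_running_time(digits[i:i + 3]):
            chunks.append(digits[i:i + 3])    # one meaningful unit
            i += 3
        else:
            chunks.append(digits[i])          # fall back to single digits
            i += 1
    return chunks

digits = "935421738"
print(chunk_digits(digits, has_schema=False))  # nine one-digit chunks
print(chunk_digits(digits, has_schema=True))   # ['935', '421', '738']

With the schema the nine digits compress into three chunks; without it, nothing compresses. The sketch is only meant to make vivid that the bottleneck's effective severity depends on what the learner already knows.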
In the case of language learning, recoding of linguistic input is achieved by applying the learner's "current model of the language" (sect. 4.1, para. 3) and further constrained by memory and other domain-general processes. But both the learner's language model and domain-general constraints vary depending on who the learner is.

Infants come to the language-learning task with a less developed memory and ability to use pragmatic and other extralinguistic cues to figure out the meaning of an utterance. As a consequence, the Now-or-Never bottleneck is strongly in place. The language adapts through increased grammaticalization that binds units of meaning more tightly, thereby increasing redundancy. For example, grammatical gender of the sort found in many Indo-European languages increases redundancy by enforcing agreement of nouns, adjectives, and pronouns, making one more predictable from the other – and arguably reducing the memory load required for processing the utterances.

Adults come to the language-learning task with more developed memories and ability for pragmatic inference, but at the same time they are biased by pre-existing chunks that may interfere with chunks that most efficiently convey the meaning in the new language. The greater memory capacities and ability to use contextual and other pragmatic cues to infer meanings may relax the Now-or-Never bottleneck, nudging grammars toward morphological simplification with its accompanying decrease in obligatory markings (i.e., decrease in redundancy) and increase in compositionality (Lupyan & Dale 2015).

This reasoning helps explain how the Now-or-Never bottleneck can create "obligatorification" (sect. 5, para. 8) and also why some languages have more obligatory markings than other languages. In summary, although we agree with C&C that multiple forces "influence language change in parallel" (sect. 5, para. 9), we emphasize the force constituted by the learning community. Languages adapt to the specific (cognitive) learning constraints and communicative needs of the learners and speakers.

Now or ... later: Perceptual data are not immediately forgotten during language processing

doi:10.1017/S0140525X15000734, e67

Klinton Bicknell (a), T. Florian Jaeger (b,c,d), and Michael K. Tanenhaus (b,d)
(a) Department of Linguistics, Northwestern University, Evanston, IL 60208-0854; (b) Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY 14627-0268; (c) Department of Computer Science, University of Rochester, Rochester, NY 14627-0226; (d) Department of Linguistics, University of Rochester, Rochester, NY 14627-0096.
[email protected] [email protected] [email protected]
http://faculty.wcas.northwestern.edu/kbicknell/
http://www.bcs.rochester.edu/people/fjaeger/
https://www.bcs.rochester.edu/people/mtan/mtan.html

Abstract: Christiansen & Chater (C&C) propose that language comprehenders must immediately compress perceptual data by "chunking" them into higher-level categories. Effective language understanding, however, requires maintaining perceptual information long enough to integrate it with downstream cues. Indeed, recent results suggest comprehenders do this. Although cognitive systems are undoubtedly limited, frameworks that do not take into account the tasks that these systems evolved to solve risk missing important insights.

Christiansen & Chater (C&C) propose that memory limitations force language comprehenders to compress perceptual data immediately, forgetting lower-level information and maintaining only higher-level categories ("chunks"). Recent data from speech perception and sentence processing, however, demonstrate that comprehenders can maintain fine-grained, lower-level perceptual information for substantial durations. These results directly contradict the central idea behind the Now-or-Never bottleneck. To the extent that the framework allows them, it risks becoming so flexible that it fails to make substantive claims. On the other hand, these results are predicted by existing frameworks, such as bounded rationality, which are thus more productive frameworks for future research. We illustrate this argument with recent developments in our understanding of a classic result in speech perception: categorical perception.

Initial results in speech perception suggested that listeners are insensitive to fine-grained within-category differences in voice onset time (VOT, the most important cue distinguishing voiced and voiceless stop consonants, e.g., "b" versus "p" in "bill" versus "pill"), encoding only whether a sound is "voiced" or "voiceless" (Liberman et al. 1957). Subsequent work demonstrated sensitivity to within-category differences (Carney et al. 1977; Pisoni & Tash 1974), with some findings interpreted as evidence this sensitivity rapidly decays (e.g., Pisoni & Lazarus 1974). Such a picture is very similar to the idea behind Chunk-and-Pass: Listeners rapidly chunk phonetic detail into a phoneme, forgetting the subcategorical information in the process.
Although it may perhaps be intuitive, given early evidence that perceptual memory is limited (Sperling 1960), such discarding of subcategorical information would be surprising from the perspective of bounded rationality: Information critical to the successful recognition of phonetic categories often occurs downstream in the speech signal (Bard et al. 1988; Grosjean 1985). Effective language understanding thus requires maintaining and integrating graded support for different phonetic categories provided by a sound's acoustics (its subcategorical information) with information present in the downstream signal. Indeed, more recent work suggests that comprehenders do this. For example, within-category differences in VOT are not immediately forgotten but are still available downstream at the end of a multisyllabic word (McMurray et al. 2009; see Dahan 2010, for further discussion of right-context effects).

Of particular relevance is a line of work initiated by Connine et al. (1991, Expt. 1). They manipulated VOT in the initial segment of target words (dent/tent) and embedded these words in utterances with downstream information about the word's identity (e.g., "The dent/tent in the fender" or "... forest"). They found that listeners can maintain subcategorical phonetic detail and integrate it with downstream information even beyond word boundaries. Chunk-and-Pass does not predict these results. Recognizing this, C&C allow violations of Now-or-Never, as long as such online right-context effects "[are] highly local, because raw perceptual input will be lost if it is not rapidly identified" (sect. 3.1, para. 7). This substantially weakens the predictive power of their proposal.

On the other hand, Connine et al.'s results do seem to support this qualification. They reported that subcategorical phonetic detail (a) was maintained only 3 syllables downstream, but not 6–8, and (b) was maintained only for maximally ambiguous tokens. Recent work, however, points to methodological issues that call both of these limitations into question (Bicknell et al. 2015). Regarding (a), Connine et al. allowed listeners to respond at any point in the sentence: On 84% of trials in the 6–8 syllable condition, listeners categorized the target word prior to hearing the relevant right-context (e.g., fender or forest). Therefore, these responses could not probe access to subcategorical information. In a replication that avoided this problem, we found that subcategorical detail decays more slowly than Connine et al.'s analysis would suggest: Subcategorical detail was maintained for at least 6–8 syllables (the longest range investigated). Regarding (b), Connine et al.'s analysis was based on proportions, rather than log-odds. Rational integration of downstream information with subcategorical information should lead to additive effects in log-odds space (which, in proportional space, then are largest around the maximally ambiguous tokens; Bicknell et al. 2015). This is indeed what we found: The effect of downstream information on the log-odds of hearing dent (or tent) was constant across the entire VOT range. In short, subcategorical information is maintained longer than previous studies suggested, not immediately discarded by chunking (see also Szostak & Pitt 2013). Moreover, maintenance is not limited to special cases; it is the default (Brown et al. 2014).
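The additivity claim can be spelled out with Bayes' rule. Assuming the VOT cue V and the downstream context D are conditionally independent given the intended word c, the posterior log-odds decompose additively (the notation is ours, not Bicknell et al.'s):

\[
\log \frac{P(\text{dent} \mid V, D)}{P(\text{tent} \mid V, D)}
  \;=\; \log \frac{P(\text{dent})}{P(\text{tent})}
  \;+\; \log \frac{P(V \mid \text{dent})}{P(V \mid \text{tent})}
  \;+\; \log \frac{P(D \mid \text{dent})}{P(D \mid \text{tent})}
\]

A given piece of downstream evidence therefore shifts the log-odds by a constant amount across the whole VOT continuum, which in proportion space shows up as the largest visible change near maximally ambiguous tokens (where probabilities are near 0.5), exactly the pattern described above.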
Clearly, language processing is subject to cognitive limitations; many – if not most – theories of language processing acknowledge this. In its general form, the Now-or-Never bottleneck thus embodies an idea as old as the cognitive sciences: that observable behavior and the cognitive representations and mechanisms underlying this behavior are primarily driven by a priori (static/fixed) cognitive limitations. This contrasts with another view: Cognitive and neural systems have evolved efficient solutions to the computational tasks agents face (Anderson 1990). Both views have been productive, providing explanations for perception, motor control, and cognition, including language (and C&C have contributed to both views). A number of proposals have tied together these insights. This includes the idea of bounded rationality, that is, rational use of limited resources given task constraints (Howes et al. 2009; Neumann et al. 2014; Simon 1982; for language: e.g., Bicknell & Levy 2010; Feldman et al. 2009; Kleinschmidt & Jaeger 2015; Kuperberg & Jaeger 2016; Lewis et al. 2013). Chunk-and-Pass is a step backward because it blurs the connection between these two principled dimensions of model development. Consequently, it fails to predict systematic maintenance of subcategorical information, whereas bounded rationality predicts this property of language processing and offers an explanation for it.

The Now-or-Never bottleneck makes novel, testable predictions only insofar as it makes strong claims about comprehenders' (in)ability to maintain lower-level information beyond the "now." The studies we summarized above are inconsistent with this claim. Similarly inconsistent is evidence from research on reading suggesting that lower-level information survives long enough to influence incremental parsing (Levy 2011; Levy et al. 2009). Moreover, the history of research on categorical perception provides a word of caution: Rather than focusing too much on cognitive limitations, it is essential for researchers to equally consider the computational problems of language processing and how comprehender goals can be effectively achieved.

Linguistic representations and memory architectures: The devil is in the details

doi:10.1017/S0140525X15000746, e68

Dustin Alfonso Chacón, Shota Momma, and Colin Phillips
Department of Linguistics, University of Maryland, College Park, MD 20742.
[email protected] [email protected]mail.com [email protected]
ling.umd.edu/colin

Abstract: Attempts to explain linguistic phenomena as consequences of memory constraints require detailed specification of linguistic representations and memory architectures alike. We discuss examples of supposed locality biases in language comprehension and production, and their link to memory constraints. Findings do not generally favor Christiansen & Chater's (C&C's) approach. We discuss connections to debates that stretch back to the nineteenth century.

It is important to understand how language is shaped by cognitive constraints, and limits on memory are natural culprits. In this regard, Christiansen & Chater (C&C) join a tradition in language research that has a long pedigree (Frazier & Fodor 1978; Wundt 1904) and to which we are sympathetic. C&C's model aims to integrate an impressive range of phenomena, but the authors play fast and loose with the details; they mischaracterize a number of phenomena; and key predictions depend on auxiliary assumptions that are independent of their model. An approach that takes the details of linguistic representations and memory architectures more seriously will ultimately be more fruitful. We illustrate using examples from comprehension and production.

C&C propose that comprehenders can maintain only a few low-level percepts at once and must therefore quickly encode higher-order, abstract representations. They argue that this explains the pervasive bias for shorter dependencies. However, memory representations are more than simple strings of words that quickly vanish. Sentences are encoded as richly articulated, connected representations that persist in memory, perhaps without explicit encoding of order, and memory access is similarly articulated (Lewis et al. 2006). As evidence of their model, C&C cite agreement attraction in sentences like The key to the cabinets are on the table. These errors are common in production and often go unnoticed in comprehension, and it is tempting to describe them in terms of "proximity concord" (Quirk et al. 1972). But this is inaccurate. Agreement attraction is widespread in cases where the distractor is further from the verb than the true subject, as in The musicians who the reviewer praise so highly will win (Bock & Miller 1991). Attraction is asymmetrical, yielding "illusions of grammaticality" but not "illusions of ungrammaticality" (Wagers et al. 2009), and depends on whether the distractor is syntactically "active" (Franck et al. 2010). These facts are surprising if attraction reflects simple recency, but they can be captured in a model that combines articulated linguistic representations with a content-addressable memory architecture (Dillon et al. 2013; McElree et al. 2003).
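One way to make the combination of articulated representations and content-addressable memory concrete is a toy cue-based retrieval scheme in the general spirit of the models cited. The feature inventory, cue weights, and noise level below are invented for illustration and are not taken from Dillon et al. (2013) or McElree et al. (2003).

import random

# Toy content-addressable retrieval at the verb in
# "The key to the cabinets ...": the retrieval cues seek a plural subject.
# The true subject matches the (weighted) subject cue; the distractor
# matches only the plural cue, so it wins on a minority of noisy trials.
random.seed(1)
cue_weights = {"subject": 1.0, "plural": 0.5}   # illustrative weights only

chunks = [
    {"word": "key",      "subject": True,  "plural": False},  # true subject
    {"word": "cabinets", "subject": False, "plural": True},   # distractor
]

def retrieve(noise_sd=0.3):
    scores = []
    for chunk in chunks:
        match = sum(w for cue, w in cue_weights.items() if chunk[cue])
        scores.append(match + random.gauss(0, noise_sd))     # noisy match score
    return chunks[scores.index(max(scores))]["word"]

trials = [retrieve() for _ in range(10000)]
print("distractor retrieved on about",
      round(trials.count("cabinets") / len(trials), 2), "of trials")

Because the distractor matches only a subset of the retrieval cues, it is retrieved on a minority of noisy trials; this is one way partial feature match, rather than simple linear recency, can give rise to attraction.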

Hence, agreement attraction fits C&C's broadest objective, deriving attraction from memory constraints, but only if suitably detailed commitments are made.

C&C also endorse the appealing view that locality constraints in syntax ("island effects": Ross 1967) can be reduced to memory-driven locality biases in the processing of filler-gap dependencies (Kluender & Kutas 1993). Details matter here, too, and they suggest a different conclusion. When linear and structural locality diverge, as in head-final languages such as Japanese, it becomes clear that the bias for shorter filler-gap dependencies in processing is linear, whereas grammatical locality constraints are structural (Aoshima et al. 2004; Chacón et al., submitted; Omaki et al. 2014).

The moral that we draw from these examples is that each reductionist claim about language must be evaluated on its own merits (Phillips 2013).

Turning to production, C&C argue that incrementality and locality biases reflect severe memory constraints, suggesting that we speak "into the void." This amounts to what is sometimes called radical incrementality (Ferreira & Swets 2002). It implies that sentence production involves word-by-word planning that is tightly synchronized with articulation – for example, planning is just-in-time, leading to a bias for local dependencies between words. However, this view of production does not reflect memory constraints alone, and it is empirically unwarranted.

Radical incrementality carries a strong representational assumption whose problems were pointed out in the late nineteenth century. The philologist Hermann Paul, an opponent of Wilhelm Wundt, argued that a sentence is essentially an associative sum of clearly segmentable concepts, each of which can trigger articulation in isolation. Radical incrementality requires this assumption, as it presupposes the isolability of each word or phrase in a sentence at all levels of representation. Memory constraints alone do not require this assumption, and so there is a gap in C&C's argument that memory constraints entail radical incrementality. Indeed, Wundt was already aware of memory limitations, and yet he adopted the contrasting view that sentence planning involves a successive scanning (apperception) of a sentence that is simultaneously present in the background of consciousness during speech (Wundt 1904). The historical debate illustrates that radical incrementality turns on representational assumptions rather than directly following from memory limitations.

Empirically, radical incrementality has had limited success in accounting for production data. Three bodies of data that C&C cite turn out to not support their view. First, the scope of planning at higher levels (e.g., conceptual) can span a clause (Meyer 1996; Smith & Wheeldon 1999). Also, recent evidence suggests that linguistic dependencies can modulate the scope of planning (Lee et al. 2013; Momma et al. 2015, in press). Second, since Wundt's time, availability effects on word order have not led researchers to assume radical incrementality (see Levelt 2012 for an accessible introduction to Wundt's views). Bock (1987) emphasized that availability effects on order result from the tendency for accessible words to be assigned a higher grammatical function (e.g., subject). In languages where word order and the grammatical functional hierarchy dissociate, availability effects support the grammatical function explanation rather than radical incrementality (Christianson & Ferreira 2005). Third, contrary to C&C's claim, early observations about speech errors indicated that exchange errors readily cross phrasal and clausal boundaries (Garrett 1980).

C&C could argue that their view is compatible with many of these data; memory capacity at higher levels of representation is left as a free parameter. But this is precisely the limitation of their model: Specific predictions depend on specific commitments. Radical incrementality is certainly possible in some circumstances, but it is not required, and this is unexpected under C&C's view that speaking reduces to a chain of word productions that are constrained by severe memory limitations.

To conclude, we affirm the need to closely link language processes and cognitive constraints, and we suspect the rest of the field does too. However, the specifics of the memory system and linguistic representations are essential for an empirically informative theory, and they are often validated by the counterintuitive facts that they explain.

Gestalt-like representations hijack Chunk-and-Pass processing

doi:10.1017/S0140525X15000758, e69

Magda L. Dumitru
Department of Cognitive Science, Macquarie University, Sydney NSW 2109, Australia.
[email protected]

Abstract: Christiansen & Chater (C&C) make two related and somewhat contradictory claims, namely that the ever more abstract language representations built during Chunk-and-Pass processing allow for ever greater interference from extra-linguistic information, and that it is nevertheless the language system that re-codes incoming information into abstract representations. I analyse these claims and discuss evidence suggesting that Gestalt-like representations hijack Chunk-and-Pass processing.

Christiansen & Chater (C&C) argue that higher-level chunks preserve information from lower-level chunks albeit in a much impoverished form. However, they also suggest that there is no obligatory relationship between low-level chunks and high-level chunks. To support their claim, they cite the case of SF (cf. Ericsson et al. 1980), who could accurately recall as many as 79 digits after grouping them in locally meaningful units (e.g., historical dates and human ages). Moreover, they argue that the Now-or-Never bottleneck forbids broad parallelism in language at the expense of avoiding ambiguities (e.g., "garden path" sentences). In brief, C&C propose that chunks are only locally coherent and that their gist, however contradictory, is being safely kept track of at higher levels. Unfortunately, the authors remain silent about the mechanisms underlying higher-level representation formation.

C&C also declare themselves agnostic about the nature of chunks. Indeed, although there is ample psychological evidence for the existence of chunks in various types of experimental data, from pause durations in reading to naive sentence diagramming, chunks remain notoriously difficult to define. However, we have reasons to reject the possibility, which follows naturally from the Chunk-and-Pass framework, that chunks are arbitrary and may depend exclusively on memory limitations. To wit, chunks correspond most closely to intonational phrases (IPs) (cf. Gee & Grosjean 1983), which, in turn, are hard to capture by grammatical rules. For example, the sentence "This is the cat / that chased the rat / that ate the cheese" contains three IPs (separated by slashes) that fail to correspond to syntactic constituents (noun phrases or verb phrases). Yet IPs are not entirely free of structure, as they must begin at the edge of a syntactic constituent and end before or at the point where a syntactic constituent ends (cf. Jackendoff 2007). Moreover, although a given utterance can be carved up in several ways (hence, contain a variable number of IPs), carvings are not arbitrary and license only certain IP combinations and not others. We may therefore conclude that IPs and corresponding chunks must be globally coherent (i.e., fit well with each other) and depend on the meaning conveyed.

Furthermore, I believe that chunking is driven not by memory limitations nor by language structures, but by an overall need for coherence or meaningfulness (cf. Dumitru 2014). Indeed, evidence from memory enhancement techniques suggests that chunking must rely on global coherence. So, for example, memory contest champions who use the so-called mind palace technique (e.g., Yates 1966) often achieve impressive results.
26 Commentary/ Christiansen & Chater: The Now-or-Never bottleneck: A fundamental constraint on language Signed and spoken languages emerge, change, are acquired, Abstract: vivid image associated with each item to be remembered (e.g., and are processed under distinct perceptual, motor, and memory faces, digits, and lists of words) as well as a familiar walk constraints. Therefore, the Now-or-Never bottleneck has different through the palace where these images are stored at precise loca- fi rami cations for these languages, which are highlighted in this tions. Whenever necessary, contestants can retrace the walk commentary. The extent to which typological differences in linguistic through the palace (i.e., rely on global coherence) to recall a structure can be traced to processing differences provides unique huge number of unrelated facts. evidence for the claim that structure is processing. I also claim that coherence is grounded in a model of reality that people instantly build when recalling items or understanding lan- Christiansen & Chater (C&C) make it clear that the consequences guage based on their experience with frequent patterns of percep- of the Now-or-Never bottleneck for language are not speech-spe- 2008 ). Indeed, as shown in Altmann tion and action (Barsalou c. This commentary highlights how and why signed and spoken fi ci 2002 ), for instance, people ( 2009 ) and in Altmann and Kamide ( languages respond differently to the limitations imposed by the use available lexical information at the earliest stages of processing bottleneck. C&C argue that the Now-or-Never bottleneck arises to anticipate upcoming words. Furthermore, as reported in from general principles of perceptuo-motor processing and Kamide et al. ( 2003 ), people target a larger sentential context memory, and both have different properties for visual-manual during online processing, hence would look more readily toward and aural-oral languages, which lead to adaptations and preferenc- The man will taste the “ ...” but a glass of beer when hearing c to each language type. fi es that are speci , The girl will taste the ... for in- ” towards candy when hearing “ The vocal articulators (lips, tongue, larynx) are smaller and stance, although the verb itself combines equally well with quicker than the hands and arms, and the auditory system is gen- candy. ” “ Subsequently, Dumitru and Taylor ” and with “ beer erally more adept at temporal processing than the visual system, cue knowledge or ) reported that disjunction words like 2014 ” ( “ which is better at spatial processing. These perceptual and about expected argument structure and sense depending on motoric differences exert distinct pressures and affordances grounded sentential context. More important, language process- when solving the problems presented by the Now-or-Never bot- fl ect knowledge of the world that goes beyond ing may re tleneck. As a consequence, signed and spoken languages prefer ’ s awareness and beyond language structures (cf. people different chunking strategies for structuring linguistic informa- ). In particular, when understanding conjunc- Dumitru et al. 2013 tion. At the phonological level, spoken languages prefer what tion and disjunction expressions, people rapidly establish ground- chunking, whereas signed languages serial could be called ed connections between the two items mentioned (i.e., the chunking. 
For example, for spoken languages, spatial prefer concepts evoked by the nouns linked by conjunction or by disjunc- single-segment words are rare and multisegment words are tion) in the form of Gestalts. common, but the reverse pattern holds for sign languages (Bren- Accordingly, people shifted their gaze faster from the picture of ). Oversimplifying here, consonants and vowels consti- tari 1998 an ant to the picture of a cloud when hearing “ Nancy examined an tute segment types for speech, while locations and movements ant and a cloud ” Nancy examined an ant or a “ than when hearing ). Single- constitute segment types for sign (e.g., Sandler 1986 cloud ; they could instantly evoke a single Gestalt in conjunction ” segment spoken words are rare because they are extremely situations (where they usually select both items mentioned) and short and generally limited to the number of vowels in the lan- two Gestalts in disjunction situations (where they usually select guage. Single consonants violate the Possible-Word Constraint one of the items). As expected, their attention shifted faster ), which also applies to sign language (Orfanidou 1997 (Norris et al. between two object parts (in this case, two representations be- et al. ). Single-segment signs are not problematic because 2010 longing to the same Gestalt) than between two objects (two rep- – fi other phonological information gura- for example, hand con resentations belonging to different Gestalts). Subsequent work tion can be produced (and perceived) simultaneously with a – 2014 ) con fi by Dumitru ( rmed that Gestalts generate perceptual large number of possible single location or movement segments. compatibility effects such that visual groupings of a particular Multisegment (> three) and multisyllabic signs are rare in part set of stimuli (e.g., two different-coloured lines) had complemen- because the hands are relatively large and slow articulators, and tary effects on validation scores for conjunction descriptions as this limits the number of serial segments that can be quickly opposed to disjunction descriptions. Importantly, language users chunked and passed on to the lexical level of representation. need not be aware of the dynamics of the concept-grounding Distinct preferences for serial versus spatial chunking are also process (i.e., why they shift their gaze between stimuli at a found at the morphological level. Spoken languages show a certain speed), and there are no language-related constraints fi xation (speci general preference for linear af fi xation) fi cally, suf that might explain these differences in behaviour. over nonconcatenative processes such as reduplication or tem- To summarise, I have questioned the proposal by C&C that 1985 platic morphology (Cutler ). In contrast, linear af fi xation Chunk-and-Pass processing is exclusively driven by memory con- (particularly for in ectional morphology) is rare across sign lan- fl straints and obeys the rules of the language system. Instead, I dis- guages, and simultaneous, nonconcatenative morphology is the cussed recent evidence suggesting that chunking is driven by a 2005 norm. Aronoff et al. ( ) attributed the paucity of linear mor- need for global coherence manifested as Gestalt-like structures, phology to the youth of sign languages but acknowledged that which in turn underlie memory organisation and mirror real- processing constraints imposed by modality also shape this prefer- world phenomena. 
I further suggested that Gestalts hijack cally, the ability of the visual system fi ). Speci 1995 ence (Emmorey Chunk-and-Pass processing when they generate online effects to process spatially distributed information in parallel, the slow ar- that are not language-related. ticulation rate of the hands, and limits on working memory all con- spire to induce sign languages to favor simultaneous over sequential morphological processes. In fact, when the linear mor- phology of a spoken language is implemented in the visual-manual modality, as in Manually Coded English (MCE), deaf children Consequences of the Now-or-Never bottleneck who have no exposure to a natural sign language spontaneously for signed versus spoken languages create simultaneous morphology to mark verb arguments xes in MCE are ). In addition, linear manual suf fi (Supalla 1991 doi:10.1017/S0140525X1500076X, e70 often incorrectly analyzed by children as separate signs because prosodically and perceptually they do not pattern like bound mor- Karen Emmorey phemes (Supalla & McKee 2002 ). School of Speech, Language, and Hearing Sciences, San Diego State Although the architecture of the memory system is parallel for University, San Diego, CA 92120. ), 1998 1997 ; signed and spoken languages (Wilson & Emmorey [email protected] immediate memory for sequences of items has consistently http://emmoreylab.sdsu.edu/ BEHAVIORAL AND BRAIN SCIENCES, 39 (2016) 26 http://journals.cambridge.org Downloaded: 03 Jun 2016 IP address: 92.14.75.14

27 Christiansen & Chater: The Now-or-Never bottleneck: A fundamental constraint on language Commentary/ , inter 1975 been found to be superior for speech (Bellugi et al. immediately. Hence, it is still an open empirical question alia). Hall and Bavelier ( 2009 ) demonstrated that the serial span whether poor performance in explicit recall tasks provides discrepancy between speech and sign arises during perception severe constraints on processing and learning. and encoding, but not during recall, where sign actually shows We note, in passing, that even if relevant forms of memory were an advantage (possibly because visual feedback during signing short-lived, this would not necessarily be a bottleneck. Mecha- does not interfere with the memory store, unlike auditory feed- nisms to make representations last longer – such as self-sustained ). The source of back during speaking; Emmorey et al. 2009 activity – are well documented in many brain regions (Major & these differences is still unclear, but the short-term memory ca- Tank ), and one might assume that memories can be 2004 – 5 items) is typical of a variety of types of pacity for sign (4 longer-lived when this is adaptive. Short-lived memories might memory (Cowan ), and thus what needs to be explained is 2000 thus be an adaptation rather than a bottleneck (e.g., serving to why the memory capacity for speech is unusually high. reduce information load for various computations). Problematic “ implications. ” C&C use the NNB to advance the Because sign languages emerge, change, are acquired, and are cally, the skill of parsing fi following view: Language is a skill (speci processed under distinct memory and perceptuo-motor con- predictively); this skill is what children acquire (rather than some s con- ’ straints, they provide an important testing ground for C&C theory-like knowledge); and there are few if any restrictions on troversial proposals that learning to process is learning the linguistic diversity. C&C ’ s conclusions do not follow from the grammar and that linguistic structure is processing history. Typo- NNB and are highly problematic. Below, we discuss some of logical differences between the structure of signed and spoken the problematic inferences regarding processing, learning, and languages may be particularly revealing. Can such structural dif- evolution. ferences be explained by distinct processing adaptations to the Regarding processing, C&C claim that the NNB implies that Now-or-Never bottleneck? For example, given the bottleneck knowledge of language is the skill of parsing predictively. There pressures, one might expect duality of patterning to emerge is indeed ample evidence for a central role for prediction in quickly in a signed language, but recent evidence suggests that ), but this is not a consequence of the parsing (e.g., Levy 2008 it may not (Sandler et al. ). Could this be because the 2011 NNB: The advantages of predictive processing are orthogonal to in different “ visual-manual and auditory-oral systems are lossy ” the NNB, and, even assuming the NNB, processing might still ways or because chunking processes differ between modalities? occur element by element without predictions. C&C also claim s claim that Given C&C ’ there is no representation of grammati- “ that the NNB implies a processor with no explicit representation cal structure separate from processing (sect. 6.2, para. 
6), it is ” of syntax (other than what can be read off the parsing process as a critical to determine whether the differences and the common- – trace). It is unclear what they actually mean with this claim, alities – between signed and spoken languages can be traced to though. First, if C&C mean that the parser does not construct features of processing. full syntactic trees but rather produces a minimum that allows semantics and phonology to operate, they just echo a view dis- cussed by Pulman ( ) and others. Although this view is an 1986 open possibility, we do not see how it follows from the NNB. Linguistics, cognitive psychology, and the Second, if C&C mean that the NNB implies that parsing does Now-or-Never bottleneck not use explicit syntactic knowledge, this view is incorrect: Many s algorithm, incremental parsing algorithms (e.g., LR, Earley ’ doi:10.1017/S0140525X15000953, e71 CKY) respect the NNB by being incremental and not needing b a to refer back to raw data (they can all refer to the result of Ansgar D. Endress and Roni Katzir a earlier processing instead) and yet make reference to explicit Department of Psychology, City University London, London EC1V 0HB, b syntax. Finally, we note that prediction-based, parser-only Department of Linguistics and Sagol School of United Kingdom; models in the literature that do not incorporate explicit represen- Neuroscience, Tel Aviv University, Ramat Aviv 69978, Israel. tations of syntactic structure (e.g., Elman 1990 ; McCauley & [email protected] [email protected] 2011 Christiansen ) fail to explain why we can recognize unpredict- Christiansen & Chater (C&C) if linguistic “ s key premise is that ’ Abstract: able sentences as grammatical (e.g., Evil unicorns devour ” information is not processed rapidly, that information is lost for good xylophones ). (NNB), C&C ” Now-or-Never bottleneck “ (sect. 1, para. 1). From this Regarding learning, C&C claim that the NNB is incompatible wide-reaching and fundamental implications for language “ derive with approaches to learning that involve elaborate linguistic processing, acquisition and change as well as for the structure of knowledge. This, however, is incorrect: The only implication of language itself (sect. 2, para. 10). We question both the premise and ” eeting, any fl the NNB for learning is that if memory is indeed the consequentiality of its purported implications. learning mechanism must be online rather than batch, relying only on current information. But online learning does not rule Christiansen & Chater (C&C) base the Problematic premises. out theory-based models of language in any way (e.g., Börschinger Now-or-Never bottleneck (NNB) on the observation that ). In fact, some have argued that online variants of 2011 & Johnson sensory memory disappears quickly in explicit memory tasks. theory-based models provide particularly good approximations to rst, that not all forms of explicit memory are short- fi We note, ). 2010 empirically observed patterns of learning (e.g., Frank et al. lived. For example, children remember words encountered Regarding the evolution of language (which they con fl ate with ; Markson & Bloom 1978 once after a month (Carey & Bartlett the biological evolution of language), C&C claim that it is item- 1997 ). More important, it is by no means clear that explicit based and gradual, and that linguistic diversity is the norm, with memory is the (only) relevant form of memory for language pro- few if any true universals. 
However, how these claims might cessing and acquisition, nor how quickly other forms of memory follow from the NNB is unclear, and C&C are inconsistent with decay. For example, the perceptual learning literature suggests the relevant literature. For example, language change has been that learning can occur even in the absence of awareness of the argued to be abrupt and nonlinear (see Niyogi & Berwick 2003 stimuli (Seitz & Watanabe ; Watanabe et al. 2001 ) and some- 2009 ), often involving what look like changes in abstract principles 1985 ). Similarly, visual times has long-lasting effects (Schwab et al. rather than concrete lexical items. As for linguistic diversity, C&C memories that start decreasing over a few seconds can be stabi- repeat claims from Christiansen and Chater ( 2008 ) and Evans and lized by presenting items another time (Endress & Potter 2014). 2009 ), but those works ignore the strongest typological Levinson ( At a minimum, then, such memory traces are long-lasting patterns revealed by generative linguistics. For example, no enough for repeated exposure to have cumulative learning effects. known language allows for a single conjunct to be displaced in a Information that is not even perceived is thus used for learning question (Ross 1967 ): We might know that Kim ate peas and and processing, and some forms of memory do not disappear BEHAVIORAL AND BRAIN SCIENCES, 39 (2016) 27 http://journals.cambridge.org Downloaded: 03 Jun 2016 IP address: 92.14.75.14

28 Commentary/ Christiansen & Chater: The Now-or-Never bottleneck: A fundamental constraint on language something is, but in no yesterday and wonder what that something discussed our work demonstrating that ambiguous relative language can we use a question of the form * What did Kim eat fi nitively attached into the matrix structure clauses are often not de Why did peas and yesterday? to inquire about it. Likewise, in if a failure to attach has no interpretive consequences (Swets et al. John wonder who Bill hit? , one can only ask about the cause of 2008 ; cf. Payne et al. 2014 ). Very much in line with C&C, Swets 1982 ; Rizzi the wondering, not of the hitting (see Huang 1990 ). et al. observed that people who are asked detailed comprehension Typological data thus reveal signi fi cant restrictions on linguistic questions probing their interpretation of the ambiguous relative diversity. nitive attachments, but those asked only fi clause make de Language is complex. Our efforts to comprehend Conclusion. fi cial features of the sentence shallow questions about super it are served better by detailed analysis of the cognitive mecha- seem to leave the relative clause unattached – that is, they under- nisms at our disposal than by grand theoretical proposals that specify. This fi nding fi right “ s discussion of ’ ts neatly with C&C ignore the relevant psychological, linguistic, and computational “ can be broadly con- ” where here ” right context context effects, distinctions. strued to mean the follow-on comprehension question that in fl uences the interpretation constructed online. An important ACKNOWLEDGMENTS difference, however, emerges as well, and here we believe the We thank Leon Bergen, Bob Berwick, Tova Friedman, and Tim GE framework has some advantages over Now-or-Never as a ’ Donnell. O broad model of comprehension: Our framework predicts that the language user ’ s task will have a strong effect on the composi- “ tion of chunks ” and the interpretation created from them (cf. 2015 2011 Christianson & Luke ). We have ; Lim & Christianson reported these results in production as well, demonstrating that Is Now-or-Never language processing good the extent to which speaking is incremental depends on the pro- enough? cessing demands of the speaking task (Ferreira & Swets 2002 ). Given the importance of task effects in a range of cognitive doi:10.1017/S0140525X15000771, e72 domains, any complete model of language processing must b a include mechanisms for explaining how they arise. Fernanda Ferreira and Kiel Christianson Moreover, the idea that language processing proceeds chunk- a Department of Psychology, University of California, Davis, Davis, CA 95616; by-chunk is not novel. C&C consider some antecedents of their b College of Education, University of Illinois, Urbana – Champaign, Champaign, proposal, but several are overlooked. For example, they argue IL 61820. that memory places major constraints on language processing, es- [email protected] sentially obligating the system to chunk and interpret as rapidly as http://psychology.ucdavis.edu/people/fferreir eager processing ). 
This was a key mo- possible (what they term ” “ [email protected] ’ s original garden-path model (Frazier & tivation for Lyn Frazier http://epl.beckman.illinois.edu/ ) and the parsing strategies known as minimal attach- Rayner 1982 ’ ment and late closure: The parser s goal is to build an interpreta- s) Now-or-Never bottleneck ’ s (C&C ’ Christiansen & Chater Abstract: framework is similar to the Good-Enough Language Processing model rst rather than fi tion quickly and pursue the one that emerges (Ferreira et al. 2002 ), particularly in its emphasis on sparse waiting for and considering multiple alternatives. This, too, is representations. We discuss areas of overlap and review experimental part of C&C that the parser cannot construct multiple s proposal ’ – s arguments, including evidence for fi ndings that reinforce some of C&C ’ – representations at the same level in parallel but the connections underspeci fi cation and for parsing in “ chunks. ” In contrast to Good- to the early garden-path model are not mentioned, and the incom- Enough, however, Now-or-Never does not appear to capture patibility of this idea with parallel models of parsing is also not misinterpretations or task effects, both of which are important aspects of given adequate attention. Another example is work by Tyler and comprehension performance. 1987 ), who showed that listeners form unlinked local Warren ( phrasal chunks during spoken language processing and who con- Christiansen & Chater (C&C) offer an intriguing proposal con- nd no evidence for the formation of a fi clude that they could cerning the nature of language, intended to explain fundamental global sentence representation. Thus, several of these ideas aspects of language comprehension, production, learning, and have been part of the literature for many years, and evidence evolution. We agree with the basic framework, and indeed we for them can be found in research motivated from a broad have offered our own theoretical approach, Good-Enough (GE) range of theoretical perspectives. Language Processing, to capture many of the phenomena dis- s Perhaps the most critical aspect of comprehension that C&C ’ cussed in the target article, particularly those relating to both approach does not capture is meaning and interpretation: C&C ine comprehension. In this commentary, we hope fl online and of describe an architecture that can account for some aspects of pro- to expand the discussion by pointing to some of these connections cessing, but their model seems silent on the matter of the content and highlighting additional phenomena that C&C did not discuss of people ’ s interpretations. This is a serious shortcoming given but that reinforce some of their points. In addition, however, we the considerable evidence for systematic misinterpretation (e.g., believe the GE model is better able to explain important aspects 2001 Christianson et al. ; Patson et al. 2006 ; van Gompel ; 2009 of language comprehension that C&C consider, as well as several 2006 et al. ). In our work, we demonstrated that people who they leave out. Of course, no single article could be comprehen- While Mary bathed the baby played in read sentences such as sive when it comes to a fi eld as broad and active as this one, but the crib often derive the interpretation that Mary bathed the we believe a complete theory of language must ultimately have baby, and they also misinterpret simple passives such as The dog something to say about these important phenomena, and particu- (Ferreira 2003 ). 
These are not small ten- was bitten by the man ’ s interpretations. larly the content of people dencies; the effects are large, and they have been replicated in We begin, then, with a brief review of the GE approach (Fer- numerous studies across many different labs. For C&C, these 2002 ). The fundamental assumption is that interpreta- reira et al. omissions are a lost opportunity because these results are consis- tions are often shallow and sometimes inaccurate. This idea that tent with their proposed architecture. For example, misinterpre- fi interpretations are shallow and underspeci ed is similar to tations of garden-path sentences arise in part because the parser C&C ’ s suggestion that the comprehension system creates processes sentences in thematic chunks and fails to reconcile chunks that might not be combined into a single, global represen- the various meanings constructed online. Recently, we demon- tation. In their model, this tendency arises from memory con- strated that the misinterpretations are attributable to a failure to straints that lead the system to build chunks at increasingly “ clean up ” the interpretive consequences of creating these abstract levels of representation. As evidence for this assumption ’ s nding compatible with C&C chunks (Slattery et al. 2013 ), a fi fi regarding underspeci ed representations, C&C might have BEHAVIORAL AND BRAIN SCIENCES, 39 (2016) 28 http://journals.cambridge.org Downloaded: 03 Jun 2016 IP address: 92.14.75.14

idea that chunks are quickly recoded into more abstract levels of representation and that it is difficult to re-access the less abstract representations.

C&C's framework is exciting, and we believe it will inspire significant research. Their creative synthesis is a major achievement, and we hope we have contributed constructively to the project by pointing to areas of connection and convergence as well as by highlighting important gaps.

Reservoir computing and the Sooner-is-Better bottleneck

doi:10.1017/S0140525X15000783, e73

Stefan L. Frank (a) and Hartmut Fitz (b)
(a) Centre for Language Studies, Radboud University Nijmegen, 6500 HD Nijmegen, The Netherlands; (b) Max Planck Institute for Psycholinguistics, 6500 AH Nijmegen, The Netherlands.
[email protected] www.stefanfrank.info
[email protected] www.mpi.nl/people/fitz-hartmut

Abstract: Prior language input is not lost but integrated with the current input. This principle is demonstrated by "reservoir computing": Untrained recurrent neural networks project input sequences onto a random point in high-dimensional state space. Earlier inputs can be retrieved from this projection, albeit less reliably so as more input is received. The bottleneck is therefore not "Now-or-Never" but "Sooner-is-Better."

Christiansen & Chater (C&C) argue that the "Now-or-Never" bottleneck arises because input that is not immediately processed is forever lost when it is overwritten by new input entering the same neural substrate. However, the brain, like any recurrent network, is a state-dependent processor whose current state is a function of both the previous state and the latest input (Buonomano & Maass 2009). The incoming signal therefore does not wipe out previous input. Rather, the two are integrated into a new state that, in turn, will be integrated with the next input. In this way, an input stream "lives on" in processing memory. Because prior input is implicitly present in the system's current state, it can be faithfully recovered from the state, even after some time. Hence, there is no need to immediately "chunk" the latest input to protect it from interference. This does not mean that no part of the input is ever lost. As the integrated input stream grows in length, it becomes increasingly difficult to reliably make use of the earliest input. Therefore, the sooner the input can be used for further processing, the more successful this will be: There is a "Sooner-is-Better" rather than a "Now-or-Never" bottleneck.

So-called reservoir computing models (Lukoševičius & Jaeger 2009; Maass et al. 2002) exemplify this perspective on language processing. Reservoir computing applies untrained recurrent networks to project a temporal input stream into a random point in a very high-dimensional state space. A "read-out" network is then calibrated, either online through gradient descent or offline by linear regression, to transform this random mapping into a desired output, such as a prediction of the incoming input, a reconstruction of (part of) the previous input stream, or a semantic representation of the processed language. Crucially, the recurrent network itself is not trained, so the ability to retrieve earlier input from the random projection cannot be the result of learned chunking or other processes that have been acquired from language exposure. Indeed, Christiansen and Chater (1999) found that even before training, the random, initial representations in a simple recurrent network's hidden layer allow for better-than-chance classification of earlier inputs. Reservoir computing has been applied to simulations of human language learning and comprehension, and such models accounted for experimental findings from both behavioural (Fitz 2011; Frank & Bod 2011) and neurophysiological studies (Dominey et al. 2003; Hinaut & Dominey 2013). Moreover, it has been argued that reservoir computing shares important processing characteristics with cortical networks (Rabinovich et al. 2008; Singer 2013; Rigotti et al. 2013), making this framework particularly suitable to the computational study of cognitive functions.

To demonstrate the ability of reservoir models to memorize linguistic input over time, we exposed an echo-state network (Jaeger & Haas 2004) to a word sequence consisting of the first 1,000 words (roughly the length of this commentary) of the Scholarpedia entry on echo-state networks. Ten networks were randomly generated with 1,000 units and static, recurrent, sparse connectivity (20% inhibition). The read-outs were adapted such that the network had to recall the input sequence 10 and 100 words back. The 358 different words in the corpus were represented orthogonally, and the word corresponding to the most active output unit was taken as the recalled word. For a 10-word delay, the correct word was recalled with an average accuracy of 96% (SD = 0.6%). After 100 words, accuracy remained at 96%, suggesting that the network had memorized the entire input sequence. This indicates that there was sufficient information in the system's state-space trajectory to reliably recover previous perceptual input even after very long delays. Sparseness and inhibition, two pervasive features of the neocortex and hippocampus, were critical: Without inhibition, average recall after a 10-word delay dropped to 51%, whereas fully connected networks correctly recalled only 9%, which equals the frequency of the most common word in the model's input. In short, the more brain-like the network, the better its capacity to memorize past input.
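The kind of delayed-recall simulation described here can be sketched in a few lines of code. The following is a scaled-down illustration, not the authors' actual simulation: the corpus is a random toy sequence, the reservoir has 200 rather than 1,000 units, and the connection density, spectral radius, and ridge penalty are assumptions chosen only for the example. The logic is the same, however: an untrained, sparse recurrent network with some inhibitory units is driven by one-hot word inputs, and only a linear read-out is calibrated (offline, by ridge regression) to recover the word presented a fixed number of steps earlier from the current reservoir state.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus: a random word sequence over a small vocabulary, standing in
# for the 1,000-word text used in the simulation described above.
vocab_size, seq_len, delay = 50, 1000, 10
words = rng.integers(0, vocab_size, size=seq_len)
inputs = np.eye(vocab_size)[words]            # orthogonal (one-hot) word codes

# Untrained reservoir: sparse recurrent weights, a fraction of them inhibitory.
n_units, density, frac_inhib = 200, 0.3, 0.2
W = rng.normal(0.0, 1.0, (n_units, n_units))
W *= rng.random((n_units, n_units)) < density           # sparse connectivity
inhib = rng.random(n_units) < frac_inhib
W[:, inhib] = -np.abs(W[:, inhib])                      # inhibitory units
W *= 0.95 / np.max(np.abs(np.linalg.eigvals(W)))        # keep dynamics stable
W_in = rng.normal(0.0, 0.5, (n_units, vocab_size))

# Drive the reservoir and record its state trajectory (the reservoir itself
# is never trained).
states = np.zeros((seq_len, n_units))
x = np.zeros(n_units)
for t in range(seq_len):
    x = np.tanh(W @ x + W_in @ inputs[t])
    states[t] = x

# Calibrate a linear read-out offline (ridge regression) to reproduce the
# word presented `delay` steps earlier from the current reservoir state.
X, Y = states[delay:], inputs[:-delay]
W_out = np.linalg.solve(X.T @ X + 1e-2 * np.eye(n_units), X.T @ Y)

# Recall = index of the most active output unit; compare with the true word.
recalled = np.argmax(X @ W_out, axis=1)
print(f"recall accuracy at a {delay}-word delay:",
      np.mean(recalled == words[:-delay]))
```

Varying the delay, density, or amount of inhibition in this sketch gives a rough feel for how recall from the state trajectory degrades as information has to be carried further, which is the "Sooner-is-Better" point.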
The modelling results should not be mistaken for a claim that people are able to perfectly remember words after 100 items of intervening input. To steer the language system towards an interpretation, earlier input need not be available to explicit recall and verbalization. Thus, it is also irrelevant to our echo-state network simulation whether or not such specialized read-outs exist in the human language system. The simulation merely serves to illustrate the concept of state-dependent processing, where past perceptual input is implicitly represented in the current state of the network. A more realistic demonstration would take phonetic, or perhaps even auditory, features as input, rather than presegmented words. Because the dynamics in cortical networks is vastly more diverse than in our model, there is no principled reason such networks should not be able to cope with richer information sources. Downstream networks can then access this information when interpreting incoming utterances, without explicitly recalling previous words. Prior input encoded in the current state can be used for any context-sensitive operation the language system might be carrying out – for example, to predict the next phoneme or word in the unfolding utterance, to assign a thematic role to the current word, or to semantically integrate the current word with a partial interpretation that has already been constructed.

Because language is structured at different levels of granularity (ranging from phonetic features to discourse relations), the language system requires neuronal and synaptic mechanisms that operate at different timescales (from milliseconds to minutes) in order to retain relevant information in the system's state. Precisely how these memory mechanisms are implemented in biological networks of spiking neurons is currently not well understood; proposals include a role for diverse, fast-changing neuronal dynamics (Gerstner et al. 2014) coupled with short-term synaptic plasticity (Mongillo et al. 2008) and more long-term adaptation through spike-timing dependent plasticity (Bi & Poo 2001). The nature of processing memory will be crucial in any neurobiologically viable theory of language processing (Petersson & Hagoort 2012), and we should therefore not lock ourselves into architectural commitments based on stipulated bottlenecks.

ACKNOWLEDGMENTS
We would like to thank Karl-Magnus Petersson for helpful discussions on these issues. SLF is funded by the European Union Seventh Framework Programme under grant no. 334028.

Natural language processing and the Now-or-Never bottleneck

doi:10.1017/S0140525X15000795, e74

Carlos Gómez-Rodríguez
LyS (Language and Information Society) Research Group, Departamento de Computación, Universidade da Coruña, Campus de Elviña, 15071, A Coruña, Spain.
[email protected]
http://www.grupolys.org/~cgomezr

Abstract: Researchers, motivated by the need to improve the efficiency of natural language processing tools to handle web-scale data, have recently arrived at models that remarkably match the expected features of human language processing under the Now-or-Never bottleneck framework. This provides additional support for said framework and highlights the research potential in the interaction between applied computational linguistics and cognitive science.

Christiansen & Chater (C&C) describe how the brain's limitations in retaining language input (the Now-or-Never bottleneck) constrain and shape human language processing and acquisition. Interestingly, there is a very strong coincidence between the characteristics of processing and learning under the Now-or-Never bottleneck and recent computational models used in the field of natural language processing (NLP), especially in syntactic parsing. C&C provide some comparison with classic cognitively inspired models of parsing, noting that they are in contradiction with the constraints of the Now-or-Never bottleneck. However, a close look at the recent NLP and computational linguistics literature (rather than the cognitive science literature) shows a clear trend toward systems and models that fit remarkably well with C&C's framework.

It is worth noting that most NLP research is driven by purely pragmatic, engineering-oriented requirements: The primary goal is not to find models that provide plausible explanations of the properties of language and its processing by humans, but rather to design systems that can parse text and utterances as accurately and efficiently as possible for practical applications like opinion mining, machine translation, or information extraction, among others.

In recent years, the need to develop faster parsers that can work on web-scale data has led to much research interest in incremental, data-driven parsers, mainly under the so-called transition-based (or shift-reduce) framework (Nivre 2008). This family of parsers has been implemented in systems such as MaltParser (Nivre et al. 2007), ZPar (Zhang & Clark 2011), ClearParser (Choi & McCallum 2013), or Stanford CoreNLP (Chen & Manning 2014), and it is increasingly popular because such parsers are easy to train from annotated data and provide a very good trade-off between speed and accuracy.

Strikingly, these parsing models present practically all of the characteristics of processing and acquisition that C&C describe as originating from the Now-or-Never bottleneck in human processing:

Incremental processing (sect. 3.1): A defining feature of transition-based parsers is that they build syntactic analyses incrementally as they receive the input, from left to right. These systems can build analyses even under severe working memory constraints: Although the issue of "stacking up" with right-branching languages mentioned by C&C exists for so-called arc-standard parsers (Nivre 2004), parsers based on the arc-eager model (e.g., Gómez-Rodríguez & Nivre 2013; Nivre 2003) do not accumulate right-branching structures in their stack, as they build dependency links as soon as possible. In these parsers, we only need to keep a word in the stack while we wait for its head or its direct dependents, so the time that linguistic units need to be retained in memory is kept to the bare minimum.
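The arc-eager strategy just described (keep a word on the stack only while it still awaits its head or its direct dependents) can be made concrete with a small sketch. The code below is an illustrative toy driven by a static oracle over a hand-coded gold tree; it is not code from MaltParser, ZPar, or any other system cited here, and the example sentence and tree are invented for the demonstration.

```python
from collections import namedtuple

Arc = namedtuple("Arc", "head dep")

def arc_eager_parse(n_words, gold_head):
    """Parse words 1..n_words left to right, choosing arc-eager transitions
    with a static oracle over a gold tree (0 stands for the artificial ROOT)."""
    stack, buf, arcs = [0], list(range(1, n_words + 1)), []
    has_head = {0: True}            # ROOT never receives a head
    trace = []
    while buf:
        s, b = stack[-1], buf[0]
        if not has_head.get(s, False) and gold_head[s] == b:
            arcs.append(Arc(b, s)); has_head[s] = True
            stack.pop(); action = "LEFT-ARC"
        elif gold_head[b] == s:
            arcs.append(Arc(s, b)); has_head[b] = True
            stack.append(buf.pop(0)); action = "RIGHT-ARC"
        elif has_head.get(s, False) and not any(gold_head[w] == s for w in buf):
            stack.pop(); action = "REDUCE"   # s is finished: free its memory
        else:
            stack.append(buf.pop(0)); action = "SHIFT"
        trace.append((action, list(stack)))
    return arcs, trace

words = ["ROOT", "the", "cat", "chased", "a", "rat"]
gold = {1: 2, 2: 3, 3: 0, 4: 5, 5: 3}        # hand-coded gold dependency tree
arcs, trace = arc_eager_parse(5, gold)
for action, stack in trace:
    print(f"{action:9s} stack = {[words[i] for i in stack]}")
print("arcs:", [(words[h], words[d]) for h, d in arcs])
```

For this toy sentence, the printed trace shows the stack staying shallow throughout: dependency links are built as soon as possible, and finished words are reduced away, which is the memory point at issue.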
Multiple levels of linguistic structure (sect. 3.2): As C&C mention, the organization of linguistic representation in multiple levels is typically assumed in the language sciences; this includes computational linguistics and transition-based parsing models. Traditionally, each of these levels was processed sequentially in a pipeline, contrasting with the parallelism of the Chunk-and-Pass framework. However, the appearance of general incremental processing frameworks spanning various levels, from segmentation to parsing (Zhang & Clark 2011), has led to recent research on joint processing where the processing of several levels takes place simultaneously and in parallel, passing information between levels (Bohnet & Nivre 2012; Hatori et al. 2012). These models, which improve accuracy over pipeline models, are very close to the Chunk-and-Pass framework.

Predictive language processing (sect. 3.3): The joint processing models just mentioned are hypothesized to provide accuracy improvements precisely because they allow for a degree of predictive processing. Contrary to pipeline approaches where information only flows in a bottom-up way, these systems allow top-down information from higher levels to constrain the processing of the input at lower levels, just as C&C describe.

Acquisition as learning to process (sect. 4): Transition-based parsers learn a sequence of processing actions (transitions), rather than a grammar (Gómez-Rodríguez et al. 2014; Nivre 2008), making the learning process simple and flexible.

Local learning (sect. 4.2): This is also a general characteristic of all transition-based parsers. Because they do not learn grammar rules but processing actions to take in specific situations, adding a new example to the training data will create only local changes to the inherent language model. At the implementation level, this typically corresponds to small weight changes in the underlying machine learning model – be it a support vector machine (SVM) classifier (Nivre et al. 2007), perceptron (Zhang & Clark 2011), or neural network (Chen & Manning 2014), among other possibilities.

Online learning and learning to predict (sects. 4.1 and 4.3): Evaluation of NLP systems usually takes place on standard, fixed corpora, and so the recent NLP literature has not placed much emphasis on online learning. However, some systems and frameworks do use online learning models with error-driven learning, like the perceptron (Zhang & Clark 2011). The recent surge of interest in parsing with neural networks (e.g., Chen & Manning 2014; Dyer et al. 2015) also seems to point future research in this direction.

Putting it all together, we can see that researchers whose motivating goal was not psycholinguistic modeling, but only raw computational efficiency, have nevertheless arrived at models that conform to the description in the target article. This fact provides further support for the views C&C express.

A natural question arises about the extent to which this coincidence is attributable to similarities between the efficiency requirements of human and automated processing – or rather to the fact that because evolution shapes natural languages to be easy to process by humans (constrained by the Now-or-Never bottleneck), computational models that mirror human processing will naturally work well on them. Relevant differences between the brain and computers, such as in short-term memory capacity, seem to suggest the latter. Either way, there is clearly much to be gained from cross-fertilization between cognitive science and computational linguistics: For example, computational linguists can find inspiration in cognitive models for designing NLP tools that work efficiently with limited resources, and cognitive scientists can use computational tools as models to test their hypotheses. Bridging the gap between these areas of research is essential to further our understanding of language.

ACKNOWLEDGMENTS
This work was funded by the Spanish Ministry of Economy and Competitiveness/ERDF (grant FFI2014-51978-C2-2-R) and Xunta de Galicia (grant R2014/034). I thank Ramon Ferrer i Cancho for helpful comments on an early version of this commentary.

31 Christiansen & Chater: The Now-or-Never bottleneck: A fundamental constraint on language Commentary/ Realizing the Now-or-Never bottleneck and Chunk-and-Pass processing with Item-Order- fi Rank working memories and masking eld chunking networks doi:10.1017/S0140525X15000801, e75 Stephen Grossberg Center for Adaptive Systems, Boston University, Boston, MA 02215. [email protected] http://cns.bu.edu/~steve s (C&C ’ s) key goals for a language Abstract: Christiansen & Chater ’ system have been realized by neural models for short-term storage of linguistic items in an Item-Order-Rank working memory, which inputs to Masking Fields that rapidly learn to categorize, or chunk, variable- length linguistic sequences, and choose the contextually most predictive list chunks while linguistic inputs are stored in the working memory. Key goals that Christiansen & Chater (C&C) propose for language processing have already been realized by real-time neural models of speech and language learning and performance, notably: 1. C&C write in their abstract about a Now-or-Never bottle- neck whereby the brain compresses and recodes linguistic input Figure 1 (Grossberg). Speech hierarchy. Each processing level as rapidly as possible; a multilevel linguistic representation; and in this hierarchy is an Item-Order-Rank (IOR) working memory a predictive system, which ensures that local linguistic ambiguities that can store sequences with repeated items in short-term are dealt with Right-First-Time using Chunk-and-Pass processing. memory. The second and third IOR working memories are, in 2. At the beginning of paragraph 2 of section 3.3, C&C note addition, multiple-scale Masking Fields (MF) that can chunk that predictions for higher-level chunks may of ” run ahead “ input sequences of variable length, and choose the sequence, or “ in re- two ” those for lower-level chunks, as when listeners answer sequences, for storage that receive the most evidence from its “ How many animals of each kind did sponse to the question inputs. Each level receives its bottom-up inputs from an ” Moses take on the Ark? adaptive lter and reads-out top-down expectations that focus fi attention on the feature patterns in their learned prototypes at the previous level. The fi rst level stores sequences of item Neural models of speech and language embody design princi- chunks. The second level stores sequences of list chunks. The ples and mechanisms that automatically satisfy such properties. individual list chunks of the third level thus represent sequences Introduced in Grossberg ( 1978b ), they have progressively ; 1978a of list chunks at the second level, including sequences with developed to the present time. Two key contributions are as repeated words, like “ DOG EATS DOG. ” During rehearsal, follows: (a) a model for short-term storage of sequences of lan- each chunk at a higher level can read out its learned sequence guage items that can include repeated items, called an Item- through its top-down expectation with the support of a volitional Order-Rank (IOR) working memory. The working memory signal that converts the top-down modulatory signals into signals fi lter to (b) a model for learned unitization, inputs via an adaptive that are capable of fully activating their target cells. categorization, or chunking of variable-length sequences of items that are stored in working memory, called a Masking Field (MF). 
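The storage-and-recall scheme behind these working memory models, described in the paragraphs that follow, can be caricatured in a few lines: items are held as a primacy gradient of activities, recall repeatedly reads out the most active node, and each recalled node self-inhibits (inhibition of return) so that it is not produced again. The sketch below is an illustrative abstraction of that Item-Order-Rank idea, not Grossberg's shunting network equations; the decay value and the example sequence are arbitrary choices for the demonstration.

```python
def store_with_primacy_gradient(items, decay=0.8):
    """Store a sequence as (item, rank, activity) triples: earlier items get
    higher activity, and the rank index lets a repeated item occupy more than
    one list position."""
    return [(item, rank, decay ** rank) for rank, item in enumerate(items)]

def rehearse(working_memory):
    """Read out items in order of current activity; after each read-out the
    winning node self-inhibits (inhibition of return), so the stored temporal
    order is recovered."""
    active = {(item, rank): activity for item, rank, activity in working_memory}
    recalled = []
    while active:
        (item, rank), _ = max(active.items(), key=lambda kv: kv[1])
        recalled.append(item)
        del active[(item, rank)]             # suppress the recalled node
    return recalled

sequence = ["DOG", "EATS", "DOG"]            # repeated item, distinct ranks
wm = store_with_primacy_gradient(sequence)
print("stored:  ", wm)
print("recalled:", rehearse(wm))             # -> ['DOG', 'EATS', 'DOG']
```

The rank index is what lets the same item be stored in two list positions, the point of the "DOG EATS DOG" example used later in this commentary.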
An MF clari fi es how the brain rapidly learns to categorize, or An IOR working memory (WM) stores a temporal stream of chunk, variable-length linguistic sequences, and uses recurrent inputs through time as an evolving spatial pattern of content-ad- competitive interactions to choose the most predictive sequence dressable item representations. This WM model is called an chunk, or list chunk, as linguistic inputs are processed in real IOR model because its nodes, or cell populations, represent list time by the working memory. The MF, in turn, sends predictive items , the temporal order in which the items are presented is top-down matching signals to the working memory to attentively across nodes, and the same item stored by an activity gradient select item sequences that the winning chunks represent. can be repeated in different list positions, or ranks primacy .A Both IOR and MF networks are realized by recurrent on-center, gradient stores items in WM in the correct temporal order, with off-surround networks whose cells obey the membrane equations rst item having the highest activity. Recall occurs when a fi the 1973 ). of neurophysiology; that is, shunting dynamics (Grossberg basal ganglia rehearsal wave activates WM read-out. The node These working memory and chunking networks have been used with the highest activity reads out fastest and self-inhibits its to explain and simulate many challenging properties of variable- prevents persever- WM representation. Such inhibition-of-return rate variable-speaker speech and language data: for example, ation of performance. Both psychophysical and neurophysiologi- ), Cohen and Ames and Grossberg ( ), Boardman et al. ( 2008 1999 cal data support this coding scheme; see Grossberg ( 2013 ), Grossberg ( 1986 1986 ), Grossberg ( ; 2003 ), Grossberg et al. ). 2008 ), and Silver et al. ( 2011 Grossberg and Pearson ( ), and Grossberg and Pearson ( 1997 ), Grossberg and Myers ( 2000 These circuits were derived by analyzing how a WM could be de- 2008 ( ). Most recently, such working memories and chunking net- signed to enable list chunks of variable length to be rapidly learned works have been incorporated into the cARTWORD hierarchical and stably remembered through time, leading to postulates which laminar cortical model of speech learning and recognition (Gross- – – working memories all imply that linguistic, motor, and spatial 2014 ). berg & Kazerounian ; Kazerounian & Grossberg 2011 have a similar design, with similar data patterns across modalities, The Item-Order-Rank working memory clari fi es data such as in and that there exists an intimate link between list chunking and Goal 2 cited above because suf fi cient subsets of working memory ) reviews supportive data. 2013 WM storage. Grossberg ( items can choose a predictive chunk, even if there are incongruent “ stored in an MF items ” An MF is a specialized IOR WM. The sequence elements. A Masking Field clari fi es data such as in Goal are list chunks that are selectively activated, via a bottom-up adap- 1, above, because the most predictive list chunks are chosen in tive lter, by subsequences of items that are stored in an item fi real time as linguistic data are processed in working memory. BEHAVIORAL AND BRAIN SCIENCES, 39 (2016) 31 http://journals.cambridge.org Downloaded: 03 Jun 2016 IP address: 92.14.75.14

32 Commentary/ Christiansen & Chater: The Now-or-Never bottleneck: A fundamental constraint on language fi lter activates WM. As items are stored in item WM, an adaptive comprehension misses some of the key challenges posed by the the learned list chunks that represent the most predictive item cation and repair in conversation. fi processes of clari groupings at any time, while suppressing less-predictive chunks. ’ The thrust of C&C s approach is that language processing is a In order for an MF list chunk to represent lists (e.g., syllables or process that involves rapid, local, lossy chunking “ Now-or-Never ” words) of multiple lengths, its cells interact within and between of linguistic representations and facilitates a form of autonomous multiple spatial scales, with the cells of larger scales capable of se- prediction of both our own and each other ’ s utterances. This leads lectively representing item sequences of greater length, and of in- to the proposal that “ Chunk-and-Pass processing implies that hibiting other MF cells that represent item sequences of lesser there is practically no possibility for going back once a chunk is ” self-similarity ). length ( “ (sect. 3.3, para. 2). ” created MFs solve the , which asks how a temporal chunking problem The phenomena of clari fi cation and repair seem to present an for chunk for an unfamiliar list of familiar speech units – important counterexample to this picture of language use. example, a novel word composed of familiar subwords – can be Dynamic revisions to utterances, or , are ubiquitous in di- repairs learned under unsupervised learning conditions. What mecha- alogue. In natural conversations it is rare for even a single utter- nisms prevent familiarity of subwords (e.g., MY, ELF, and ance to be produced without some form of online revision, with SELF), which have already learned to activate their own list these occurring approximately once every 25 words in conversa- chunks, from forcing the novel longer list (e.g., MYSELF) to 2013 ), with the rate of repairs ad- tional speech (Hough & Purver always be processed as a sequence of these smaller familiar 2011 justed to task demands (Colman & Healey ) and to individual chunks, rather than as a newly learned, unitized whole? How ; Lake 2012 differences such as clinical conditions (Howes et al. does a not-yet-established word representation overcome the et al. 2011 ). Repair contagion, whereby the probability of salience of already well-established phoneme, syllable, or word another repair occurring increases after an initial one, is also representations to enable learning of the novel word to occur? ). 2013 common (Hough & Purver This solution implies the properties of Goal 1, as well as psycho- Although many of these repairs are syntactically or lexically local physical data like the Magical Number Seven and word superiority for example, words or word fragments that are in C&C – ’ s sense effects. restarted – some involve more-substantial revisions, and some Lists with repeated words can be coded by a three-level occur after a turn is apparently complete (Schegloff et al. 1977 ). Fig. 1 ): The fi rst level contains item chunks that input network ( Conversation analysts claim that the (minimum) space in which lter. 
This MF inputs to fi to a Masking Field through an adaptive direct repairs or revisions to a speaker ’ s utterance can be made ” items “ fi a second MF through an adaptive lter that compresses is the four subsequent turns in the conversation (Schegloff “ are, however, ” items rst MF into list chunks. These fi from the fi cant, nonlocal mech- 1995). This highlights the operation of signi ’ list chunks. Thus, the second MF s list chunks represent sequenc- anisms that can make use of prior phonetic, lexical, syntactic, and es of list chunks. Because it is also an IOR working memory, it can semantic information over relatively long intervals. store sequences with repeated list chunks, including sequences Even self-repairs, the most common and most local form of DOG EATS DOG ”– for example, – with repeated words “ backtracking in conversation, are often nonlocal in a different thereby providing the kind of multilevel, multiscale, predictive hi- sense, as they are produced in response to concurrent feedback erarchy that the authors seek. from an interlocutor, which works against the idea of encapsulated local processing (e.g., Bavelas & Gerwing ). 2007 1979 ; Goodwin The more strongly people are committed to the predictions of their own language processor, the less able they must be to deal poten- – with these real-time adjustments or reversals of decisions Better late than Now-or-Never: The case of tially of phonetic, lexical, syntactic, or semantic information – in interactive repair phenomena response to feedback from others. However, it seems that in con- versation such revisions are the norm, not the exception. People doi:10.1017/S0140525X15000813, e76 can take advantage of each other s repair behavior, too: In a ’ a b c visual world paradigm, when experimental subjects hear repaired Patrick G. T. Healey, Julian Hough, Christine Howes, and referring expressions compared to fl uent ones, participants can a Matthew Purver use repaired material to speed up reference resolution a Cognitive Science Research Group, School of Electronic Engineering and ). Additionally, experiments in interrup- 2001 (Brennan & Schober Computer Science, Queen Mary University of London, London E1 4NS, United ) show that participants often 2011 cation (Healey et al. fi tive clari b Department of Philosophy, Linguistics and Theory of Science, Kingdom; c cation fi restart the interrupted turn after responding to a clari Fak. LiLi, Universität University of Gothenburg, 405 30 Gothenburg, Sweden; request, again showing that people must, at least in some cases, Bielefeld, D-33501 Bielefeld, Germany. have access to the previously produced material. [email protected] [email protected] Ambiguities can emerge late in a dialogue, and people routinely [email protected] [email protected] deal with them. Although C&C do acknowledge the availability of Abstract: Empirical evidence from dialogue, both corpus and fi ca- mechanisms to “ repair the communication by requesting clari experimental, highlights the importance of interaction in language ” tion from the dialogue partner (sect. 3.1, para. 8), they do not ’ s (C&C s) ’ – use and this raises some questions for Christiansen & Chater discuss how and whether these repair phenomena are consistent proposals. We endorse C&C ’ s call for an integrated framework but with the Chunk-and-Pass model. 
Similarly, C&C argue that argue that their emphasis on local, individual production and early commitment to predictions about what is coming next fi cult to accommodate the ubiquitous, comprehension makes it dif ’ s should lead to frequent reuse of our own and each other fi cation and repair in interactive, and defeasible processes of clari lexical and syntactic representations; however, the evidence for conversation. this in natural conversation is controversial. We have found that syntactic reuse is actually less common than would be expected fi Language is rst encountered and deployed in interaction. A char- by chance (Healey et al. ). The need to respond constructive- 2014 acteristic feature of natural interaction is that people often need to ly to a conversational partner seems to overwhelm some of the address problems with mutual understanding in which elements processes observed in individual language processing. of what has been said need to be reworked or redone in some These observations reinforce C&C ’ s emphasis on the highly way. These processes raise some important questions for Christi- time-critical and piecemeal, incremental nature of language pro- s approach ’ s) proposals. We support C&C ’ ’ ansen & Chater s (C&C cessing, but they also suggest that the demands of engaging and agree that an integrated framework for the language sciences with a live conversational partner requires more fl exible, defeasi- is a desirable goal for language researchers. However, we argue ble, and interactive mechanisms. Their proposal currently ’ that C&C s emphasis on local, individual production and BEHAVIORAL AND BRAIN SCIENCES, 39 (2016) 32 http://journals.cambridge.org Downloaded: 03 Jun 2016 IP address: 92.14.75.14

However, repair phenomena entail other kinds of incrementality as desiderata for a psychological model: namely, recoverability and repairability of increments from the interactive context.

One existing formal and computational model capable of capturing the different facets of incrementality needed for repair mechanisms is Dynamic Syntax (DS, Purver et al. 2006; 2011). DS models language as a set of mechanisms for incrementally building up interpretations in context, and is therefore broadly commensurate with the C&C program; these mechanisms can also be induced (acquired) from the data available to a child learner (Eshghi et al. 2013), with the learning process being piecemeal, incremental, and process-driven as required by C&C. However, DS can also account for repair phenomena by using explicit recoverability mechanisms through backtracking over stored graphs of incrementally constructed semantic content (Eshghi et al. 2015; Hough & Purver 2012). We take this approach to be complementary to the C&C model, showing that many of their insights can be practically implemented, while also addressing the significant challenges posed by interactive repair phenomena in dialogue. In sum, we propose a model that is compatible with the "Now" aspect of their approach, but not with the "Never."

How long is now? The multiple timescales of language processing

doi:10.1017/S0140525X15000825, e77

Janice Chen, Kathrin Müsch, Christopher J. Honey, and Uri Hasson
Department of Psychology, Collaborative Program in Neuroscience, University of Toronto, Toronto, ON M5S 3G3, Canada; Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ 08540.
[email protected] [email protected] [email protected] [email protected]
http://www.honeylab.org
http://hlab.princeton.edu

Abstract: Christiansen & Chater (C&C) envision language function as a hierarchical chain of transformations, enabling rapid, continuous processing of input. Their notion of a "Now-or-Never bottleneck" may be elaborated by recognizing that timescales become longer at successive levels of the sensory processing hierarchy – that is, the window of "Now" expands. We propose that a hierarchical "process memory" is intrinsic to language processing.

Meaningful interactions between linguistic units occur on many timescales. After listening to 10 minutes of a typical English narrative, a listener will have heard ∼1,000 words composing ∼100 sentences grouped into ∼25 paragraphs. When the 1,001st word in the narrative arrives, it enters a rich syntactic and semantic context that spans multiple timescales and levels of abstraction. Christiansen & Chater (C&C) rightfully emphasize the constraints imposed by the rapidity of language input. Here we highlight the importance of a related class of constraints: those imposed by the need to integrate incoming information with prior information over multiple timescales.

C&C motivate the "Now-or-Never bottleneck" with the observations that "memory is fleeting" and "new material rapidly obliterates previous material" (abstract). These statements tend to hold true in low-level auditory masking (Elliott 1962) and in short-term memory experiments involving unrelated auditory items (Warren et al. 1969). However, in real-life language processing, memory cannot all be fleeting. This is because new stimuli have a prior relationship to, and must be actively integrated with, the stimuli that were just encountered. Thus, in real-life contexts, previous material exerts a powerful influence on the processing of new material.

Consider the difference between hearing the sequence of words "friend-mulch-key" and hearing the sequence of words "friend-ship-pact." In the first sequence, the representation of the word friend is degraded by interference with mulch and key. In the second sequence, by contrast, the word friend interacts meaningfully with ship and pact. This simple example reflects a general and ubiquitous phenomenon in real-life language: New material does not necessarily obliterate previous material. Instead, past and present information interact to produce understanding, and the memory of past events continually shapes the present (Nieuwland & Van Berkum 2006).

It seems the processing bottleneck that C&C describe applies best to early processing areas (e.g., primary sensory cortex), where sensory traces may have a very short lifetime (<200 ms). At higher levels of the language hierarchy, however, neural circuits must retain a longer history of past input to enable the integration of information over time. Temporal integration is necessary for higher-order regions to support the understanding of a new word in relation to a prior sentence or a new sentence in relation to the larger discourse. We have found that temporal integration occurs over longer timescales in higher-order regions (Hasson et al. 2008; Lerner et al. 2011), and that the intrinsic neural dynamics become slower across consecutive stages of the cortical hierarchy (Honey et al. 2012; Stephens et al. 2013). Thus, the temporal bottleneck appears to gradually widen across the consecutive stages of the language processing hierarchy, as increasingly abstract linguistic structures are processed over longer timescales.

Influenced by the ideas of MacDonald and Christiansen (1996), as well as Fuster (1997), concerning the memory that is intrinsic to ongoing information processing, and supported by recent single-unit, electrocorticography, and functional imaging data, we have developed a brain-based framework for such a functional organization (Hasson et al. 2015). In this framework, (a) virtually all cortical circuits can accumulate information over time, and (b) the timescale of accumulation varies hierarchically, from early sensory areas with short processing timescales (tens to hundreds of milliseconds) to higher-order areas with long processing timescales (many seconds to minutes). In this hierarchical systems perspective, memory is not restricted to a few localized stores and it is not transient; instead memory is intrinsic to information processing that unfolds throughout the brain on timescales from milliseconds to minutes. We have suggested the term "process memory" to refer to active traces of past information that are used by a local neural circuit to process incoming information in the present moment; this is in distinction to the more traditional notion of "working memory," which is a more functionally encapsulated memory store.

Process memory may support the Chunk-and-Pass mechanism that C&C propose for organizing inter-regional information flow. As they note: "incremental processing in comprehension and production takes place in parallel across multiple levels of linguistic representation, each with a characteristic temporal window" (sect. 3.2, para. 5). In our view, the Now-or-Never bottleneck can be made compatible with contextual language processing by allowing the "Now" (i.e., the local circuit memory of prior events) to have a variable duration. For example, the "Now" could be understood to have a short (e.g., milliseconds) timescale in sensory areas, where representations are fleeting, and then to gradually expand in duration in higher-order areas, where chunking is required over longer (e.g., seconds) and longer (e.g., minutes) windows of input. Thus, the "Now" may be understood as a time window around the present moment, in which information can be integrated, and the duration of the "Now" may lengthen as one moves from sensory areas toward higher-order language circuits.
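The expanding "Now" described above can be illustrated with a toy computation (not the authors' model; the time constants and input are arbitrary assumptions): a hierarchy of leaky integrators in which each level accumulates the signal passed up from the level below with a longer time constant, so a brief input fades within a few steps at the lowest level but lingers far longer at higher levels.

import math

def leaky_hierarchy(inputs, time_constants):
    """Toy hierarchy of leaky integrators: each level accumulates the
    signal passed up from the level below (the raw input at the lowest
    level) with its own time constant, so higher levels hold information
    longer, giving an expanding window of "Now"."""
    states = [0.0] * len(time_constants)
    history = []
    for x in inputs:
        drive = x
        for k, tau in enumerate(time_constants):
            alpha = math.exp(-1.0 / tau)            # per-step decay factor
            states[k] = alpha * states[k] + (1 - alpha) * drive
            drive = states[k]                        # feed the next level up
        history.append(list(states))
    return history

# A single brief "stimulus" at t = 0, then silence: the trace vanishes
# within a few steps at the fast level but persists at the slower levels.
trace = leaky_hierarchy([1.0] + [0.0] * 30, time_constants=[1, 5, 25])
for t in (0, 5, 15, 30):
    print(t, [round(v, 4) for v in trace[t]])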

In summary, we share the vision of C&C in which language function arises from a chain of transformations across a hierarchy of circuits, and that language learning is a kind of "learning to process." At the same time, we suggest that this hierarchical processing framework could be refined to account for the process memory that is intrinsic to language processing and is needed for comprehending incoming input within multiple timescales of prior context.

Neural constraints and flexibility in language processing

doi:10.1017/S0140525X15000837, e78

Christian R. Huyck
Department of Computer Science, Middlesex University, London NW4 4BT, United Kingdom.
[email protected]
http://www.cwa.mdx.ac.uk/chris/

Abstract: Humans process language with their neurons. Memory in neurons is supported by neural firing and by short- and long-term synaptic weight change; the emergent behaviour of neurons, synchronous firing, and cell assembly dynamics is also a form of memory. As the language signal moves to later stages, it is processed with different mechanisms that are slower but more persistent.

The Now-or-Never bottleneck in language processing that Christiansen & Chater (C&C) propose has a great deal of evidence to support it. Like all cognitive processes, language processing must be implemented in neurons, and the bottleneck is a neural one. Signals from the environment must be processed by neurons, and those neurons must keep a memory trace of those signals or they will be lost. Moreover, any processing mechanism must not only be implemented by the behaviour of neurons, but in the case of language, the process must be learned by those neurons.

Neural memory comes in several forms. Neurons spike, propagating signals across their synapses to post-synaptic neurons, taking tens of milliseconds. Neurons can be wired into cell assemblies (Hebb 1949) that can persistently fire for seconds. Synaptic weights can be modified for seconds to minutes by means of short-term potentiation (STP), or for days, months, or longer, through long-term potentiation (LTP). The formation of a cell assembly, by potentiation, can form a circuit that can last indefinitely. When that long-term memory is activated by a cascade of neural firing in the cell assembly, the long-term memory is also an active short-term memory.

When a sentence is parsed, either in speech or in text, the parsing is generally done in one pass. This single pass can be seen in eye-tracking evidence, especially when repairs are needed (Just & Carpenter 1980). One-pass parsing is typically simulated with a stack, but a memory-based mechanism (Lewis & Vasishth 2005) can eliminate the need for a stack. A memory-based parsing mechanism has been implemented in a neural parsing model (Huyck 2009), with the persistence of the cell assembly showing the strength and duration of the memory. I am unaware of any existing simulated neural mechanism for backtracking in parsing.

One important aspect of eliminating the stack in parsing is that it reduces the need for binding. Binding is another type of neural memory mechanism that, although needed in standard computational models, is typically overlooked. In a standard program, if a variable is assigned a value, the two are bound. Binding is usually a primitive operation so it is ignored. Binding in a neural system is more difficult because it is not primitive. There are various binding mechanisms; synchronous firing is most widely used in the literature (Fuster & Alexander 1971). Two bound assemblies fire in roughly the same firing pattern, while another pair (or more) can be bound in a different pattern. Synchronous binding requires the neurons to continue firing. Moreover, only a small number of patterns can be supported simultaneously, so there are a limited number of bindings; the bound neurons do not all fire at the exact same time, so separate patterns must be quite distinct. Another option is to bind with STP. This method has neither of these limits, with a much larger number of bindings supported and the duration being up to minutes; it does, however, take longer to form. Binding can also be done with LTP, but this shades into permanent associative memory.

When people or computer systems process language, it is faster and safer to avoid binding. When binding is necessary, lower-level processing is likely to use synchrony. Higher-level processing is likely to use STP. So the speech signal uses synchrony; neurons representing the prime formants fire synchronously in the auditory cortex (Eggermont 2001). The simulated neural parser (Huyck 2009) uses STP for binding the slots in the neural implementation of verb frames associated with sentences. These can be used immediately after sentence processing to retrieve the meaning of the sentence, but they are gradually erased by the STP fading. The neurons that support the binding are reused later for processing other sentences.

Finite-state automata (FSA) do not require binding. The evidence from text engineering to support the bottleneck is that the Message Understanding Competitions for Text Extraction (Appelt et al. 1993) converged on an FSA cascade to solve the problem of processing text. One automaton separated words, a second categorised them lexically, a third did simple phrase parsing, and a fourth combined phrases. These could be run in a cascade, and perhaps a cascade is the basic mechanism that the brain uses.

As C&C note, the bottleneck also has ramifications for learning. First, the whole language cascade (whatever that may be) is being learned simultaneously. Initially, low-level phenomena, such as morphemes, are learned. Later, larger systems such as simple phrase grammars begin to be learned, but the lower-level systems are still being developed. We do not know how these biological neural systems work, much less how they are learned. One mechanism may be that things are being learned and cell assemblies (CAs) are formed; CAs can be connected to form FSA. Binding may be involved initially, and the synapse can then be modified to combine CAs into FSA; STP can support reverberation, which can then lead to LTP. Although one finite-state automaton in the cascade is being learned, both FSA above and below it can be learned so that the whole system continues to improve.

At the highest level, dialogue and above, the bottleneck begins to disappear. Rich cognitive maps support this kind of processing, and memory is formed mostly through LTP and CA circuit dynamics. Since these CAs can persistently fire, and the circuits can be reactivated using associative memory, it is possible to remember large amounts of things. (For example, I can still remember some of the dialogue from the movie I saw this weekend, and the plot.)

There is solid support for the Now-or-Never bottleneck in language processing, although the bottleneck's duration is reduced as the signal passes through stages of language processing. The distributed nature of neural processing supports multiple stages in processing, and the simultaneous learning of these stages. Processing and learning is implemented in neurons, although CA dynamics and binding issues are often not considered by researchers. By expanding understanding and modelling at the neural level, we can better understand language processing, and we can construct more robust language processing systems.

ACKNOWLEDGMENT
This work was supported by the Human Brain Project Grant 604102, Neuromorphic Embodied Agents that Learn.
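As a concrete illustration of the finite-state cascade Huyck describes (one automaton separating words, a second categorising them lexically, a third chunking simple phrases, a fourth combining phrases), here is a minimal sketch; the toy lexicon, patterns, and clause template are illustrative assumptions, not the mechanisms of the MUC systems or of the neural parser cited above.

import re

LEXICON = {"the": "DET", "a": "DET", "dog": "N", "cat": "N",
           "bites": "V", "sees": "V", "old": "ADJ"}

def tokenize(text):
    """Stage 1: separate words (a trivial automaton over characters)."""
    return re.findall(r"[A-Za-z]+", text.lower())

def tag(tokens):
    """Stage 2: categorise each word lexically."""
    return [(tok, LEXICON.get(tok, "X")) for tok in tokens]

def chunk_phrases(tagged):
    """Stage 3: simple phrase parsing; group DET (ADJ)* N into an NP."""
    chunks, i = [], 0
    while i < len(tagged):
        if tagged[i][1] == "DET":
            j = i + 1
            while j < len(tagged) and tagged[j][1] == "ADJ":
                j += 1
            if j < len(tagged) and tagged[j][1] == "N":
                chunks.append(("NP", tagged[i:j + 1]))
                i = j + 1
                continue
        chunks.append((tagged[i][1], [tagged[i]]))
        i += 1
    return chunks

def combine(chunks):
    """Stage 4: combine phrases; an NP V NP sequence fills a clause template."""
    if [c[0] for c in chunks] == ["NP", "V", "NP"]:
        return {"subj": chunks[0][1], "verb": chunks[1][1], "obj": chunks[2][1]}
    return {"unparsed": chunks}

print(combine(chunk_phrases(tag(tokenize("The old dog bites the cat")))))

Each stage consumes only the output of the previous one, so the pipeline can run strictly left to right over the growing input, which is the property that makes the cascade idea attractive as a bottleneck-friendly processing scheme.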

Mechanisms for interaction: Syntax as procedures for online interactive meaning building

doi:10.1017/S0140525X15000849, e79

Ruth Kempson, Stergios Chatzikyriakidis, and Ronnie Cann
Philosophy Department, King's College London, Strand, London WC2R 2LS, United Kingdom; Department of Philosophy, Linguistics and Theory of Science, University of Gothenburg, 41124 Gothenburg, Sweden; Linguistics and English Language, School of Philosophy, Psychology and Language Sciences, University of Edinburgh, Edinburgh EH8 9AD, Scotland.
[email protected] [email protected] [email protected]
http://www.kcl.ac.uk/artshums/depts/philosophy/people/staff/associates/emeritus/kempson/index.aspx
http://www.stergioschatzikyriakidis.com/contact.html
http://www.lel.ed.ac.uk/~ronnie/

Abstract: We argue that to reflect participant interactivity in conversational dialogue, the Christiansen & Chater (C&C) perspective needs a formal grammar framework capturing word-by-word incrementality, as in Dynamic Syntax, in which syntax is the incremental building of semantic representations reflecting real-time parsing dynamics. We demonstrate that, with such formulation, syntactic, semantic, and morpho-syntactic dependencies are all analysable as grounded in their potential for interaction.

Following their observation of a Now-or-Never bottleneck on cognitive processing and a Chunk-and-Pass constraint to overcome this hurdle, Christiansen & Chater (C&C) set the challenge that existing grammars be evaluated in terms of commensurability with their claim that language itself should be seen in processing terms. Directly in line with their perspective is Dynamic Syntax (DS), in which syntax is a set of mechanisms for online building of semantic representations used in both production and perception (Cann et al. 2005; 2007; Kempson et al. 2001; 2011). These mechanisms involve anticipatory specifications of structure relative to some other structure as context, with the need for subsequent update, thus achieving the desired tightly time-constrained interpretation process. As co-developers of DS, we suggest three points of comparison between DS and the construction-grammar (CoG) perspective which C&C envisage: (1) incrementality; (2) the parsing-production interface; (3) lack of structural universals specific to language.

Though C&C stress the importance of incrementality of both parsing and production, given that CoG defines syntax as stored construction-types, somehow learned as wholes, it is not clear what basis this provides for the word-by-word incrementality displayed in conversation. In informal dialogue, participants can interrupt one another at any point, effortlessly switching roles. These switches can split any syntactic and semantic dependencies, distributing them across more than one participant: In the following examples, number 1 involves a syntactic split between preposition and noun, and between infinitive and controlling subject; and number 2 involves a morpho-syntactic dependency split (have plus past participle) and a syntactic/semantic dependency split (reflexive and local antecedent).

(1) A: We're going to –
    B: Burbage to see Granny.
(2) A (seeing B emerging from a smoke-filled kitchen): Are you OK? Have you –
    B (interrupting): burnt myself? No fortunately not.

Such data, despite being widespread in conversation, pose severe challenges to conventional syntactic assumptions, including CoG, because the fragments are characteristically not independently licensed by the grammar and even the sequence may not be well-formed, as in example number 2. Furthermore, it is hard to see how C&C's account of such interactions, given a Levelt-like characterisation of production as the inverse of parsing, can match the required level of granularity.

In contrast, such data follow as an immediate consequence of the DS view of syntax. Speakers and hearers both use the defined tree-growth mechanisms to construct a representation of what is being said, taking the immediate context as input: The only difference between them is the additional requirement on speakers that the construction process has to be commensurate with some more richly annotated (possibly incomplete) structure corresponding to what they have in mind. This dynamic predicts that switching from parsing to production, and the converse, will be seamless, yielding the effect of in-tandem construction without needing to invoke higher levels of inference (Poesio & Rieser 2011) or superimposed duplication of the one type of activity upon the other (Pickering & Garrod 2013b). Each individual will simply be constructing the emergent structure relative to the context he or she has just constructed in his or her other capacity (Gregoromichelaki et al. 2011; 2013). Despite the DS commitment to word-by-word incrementality, interpretation can be built up with apparent delays, because language input invariably encodes no more than partial content specifications, allowing subsequent enrichment.

The result is, as C&C say, that there will be no encapsulated linguistic universals specific to the language faculty as universals will be grounded in general constraints on online cognitive processing. Yet this should not be taken to deny the existence of such universals: to the contrary, robust structural universals are predicted as dictated by limits imposed by logical and processing constraints in combination.

Consider the syntactic puzzle precluding multiple long-distance dependencies. Within DS, semantic representations as trees are defined as sets of nodes each of which is uniquely identified in terms of its position relative to other nodes in the tree (Blackburn & Meyer-Viol 1994). This definition restricts emergent tree growth to transitions which meet this characterisation. The effect is to freely license multiply building any one node, while ensuring that no such multiple actions give rise to distinguishable output. In the case of left-periphery effects, where on the DS account, nodes can be constructed as not yet fixed ("unfixed") within the current domain, nothing precludes such an action being repeated. However, such multiple applications of this strategy will invariably give rise to one and the same node, yielding a well-formed result as long as attendant attributes are compatible: hence, the restriction precluding multiple long-distance dependency. Verb-final languages, with their as-yet unfixed arguments, might seem apparent counterexamples; but here, the Chunk-and-Pass constraint provides an answer: Case specifications on an unfixed node are taken to induce an immediate update of that node to a locally fixed relation, allowing another construction of an unfixed node again with potential from its case specifications for update in anticipation of the following verb. The supposed counterexample of NP NP NP V sequences in verb-final languages thus merely demonstrates the interaction of logic-based and processing-based constraints, in turn accounting for typological observations such that verb-final languages are typically case-marking (Kempson & Kiaer 2010).

This constraint extends to language change, further bolstering the overall perspective (Bouzouita & Chatzikyriakidis 2009). As C&C observe, language change commonly involves prosodic reduction of adjacent items leading to composite grammaticalised forms. On the DS view, such novel creations would reflect what had earlier been discretely triggered sequences of update actions, now with the novel composite form triggering this sequence of update actions as a single macro induced by that form. Accordingly, we expect such grammaticalised forms to reflect whatever general limits are imposed by intersections of logic and processing constraints (see Chatzikyriakidis & Kempson [2011] for arguments that weak [clitic] pronoun clusters in Greek constitute such a case). In short, DS buttresses C&C's claims about language as a mechanism for progressive construction of information-bearing units. Despite much variation across languages, synchronic and diachronic, the C&C program promises to enable formally characterisable perspectives on language directly matching the dynamics of language behaviour in interaction.

On the generalizability of the Chunk-and-Pass processing approach: Perspectives from language acquisition and music

doi:10.1017/S0140525X15000850, e80

Usha Lakshmanan and Robert E. Graham
Department of Psychology, Southern Illinois University Carbondale, Carbondale, IL 62901.
[email protected] [email protected]

Abstract: Christiansen & Chater (C&C) offer the Chunk-and-Pass strategy as a language processing approach allowing humans to make sense of incoming language in the face of cognitive and perceptual constraints. We propose that the Chunk-and-Pass strategy is not adequate to extend universally across languages (accounting for typologically diverse languages), nor is it sufficient to generalize to other auditory modalities such as music.

Christiansen & Chater (C&C) claim universality and primacy for their Chunk-and-Pass processing approach in language acquisition and suggest that music provides an example of another complex acoustic signal with multilayered structuring, to which one could apply the Chunk-and-Pass strategy as well. However, fundamental issues that C&C leave unaddressed suggest that this strategy may not be generalizable to typologically diverse languages and to domains beyond language. We discuss two such issues: (1) cross-linguistic differences (e.g., morphology and word-order) and (2) domain-specific differences (e.g., language versus music).

It is unclear how the Chunk-and-Pass strategy would work in the acquisition of synthetic languages, with complex inflectional morphology (e.g., Tamil, Turkish, Navajo, Quechua, Cree, Swahili). Because there is extensive suffixation (through agglutination or fusion), the morpheme-to-word ratio in such languages is high, resulting in lengthy words. A single multimorphemic word expresses meanings that in a language with limited or no inflection would require a multiword clause or sentence to express. Although C&C suggest that chunking of complex multimorphemic words, by means of local mechanisms (e.g., formulaicity), also applies to agglutinative languages, they mainly consider evidence based on English, whose impoverished inflection and low morpheme-to-word ratio (particularly in its verb forms) facilitates chunking using the word (as opposed to its subparts) as a basic unit of analysis.

In C&C's framework, frequency and perceptual salience (rather than innate grammatical mechanisms) drive the chunking process. Existing studies on the acquisition of morphologically complex languages indicate that mechanisms proposed for English do not readily extend to synthetic language types (Kelly et al. 2014). Crucially, lexicon-building does not take place through storage of frequently encountered (and frequently used) exemplars in memory; instead, the chunking strategy may be only a first step in the process, in preparation for the next stage, namely, grammatical decomposition of stored units and the acquisition of the combinatorial principles determining their subparts (see Rose & Brittain 2011 for evidence from Northern East Cree). However, even a minor role for the chunking strategy in relation to morphologically complex languages may be problematic. A single verb root/stem, for example, is manifested through numerous surface realizations rendering the frequency factor unreliable. Additionally, evidence from children acquiring Quechua and Navajo, two morphologically rich languages, indicates that regardless of the perceptual salience of the verb root/stem (i.e., word initial in Quechua and word final in Navajo), the children's earliest verb forms were root/bare stems, not permitted in the adult grammar; however, they never produced isolated affixes, contrary to what would be predicted if they were using a simple chunking procedure (Courtney & Saville-Troike 2002). Interestingly, Tamil children use bare stems in imperative contexts, similar to adults. In contrast, their earliest indicative (nonimperative) verb forms are non-adult-like and consist predominantly of verbal participles (derived or inflected nonfinite stems) with the auxiliary, tense, and agreement suffixes stripped away (Lakshmanan 2006). The mismatch between the children's early verbs and the adult input (consisting of complex multimorphemic words) emphasizes the role of innate knowledge of fundamental grammatical concepts (e.g., verb root/stem, inflected stem, and affixes).

A Chunk-and-Pass strategy alone (without independent grammatical mechanisms) cannot explain children's success with "free word order" found in many morphologically complex languages. In Tamil, an SOV (Subject-Object-Verb) language, sentential constituents (NPs, PPs, and CPs) may appear in noncanonical sentential positions through rightward and leftward scrambling. Tamil is a null argument language, and sentences with overt realization of all arguments are rare. Tamil children between the ages of 17 months and 42 months exhibit sensitivity to Case restrictions and movement constraints on scrambling and successfully use adult-like word order permutations to signal interpretive differences (Focus versus Topic) (Sarma 2003).

A Chunk-and-Pass strategy would predict that shorter sentences are easier for children to process and produce than longer sentences. However, this cannot explain scenarios where the reverse situation holds. For example, Tamil children (below age 5) produce significantly fewer participial relatives than older children. They also strongly prefer tag relatives to the participial relative, although the former are longer and less frequent than the latter. Crucially, the participial relative, though shorter (and more frequent), is structurally more complex because it involves movement (Lakshmanan 2000).

Let us now examine the generalizability of the Chunk-and-Pass approach to other complex acoustic input, as in the case of music. Some argue that music contains some semantic information, such as meaning that emerges from sound patterns resembling qualities of objects and suggesting emotional content, or sometimes as a result of symbolic connections with related but external material (Koelsch 2005). For example, Wagner was known to have short musical melodies (leitmotif) that represented characters in his operas, such that interactions between characters could be inferred or interpreted from musical composition. However, these more concrete occurrences are outliers among musical works, and other interpretations of musical semantics remain much weaker than in the context of language. Thus, although it is possible there is structural chunking, music lacks the semantic information to inform something like an "interactionist approach" (McClelland 1987) to parsing.

Another way in which music differs from language is in the context of anticipation. C&C discuss anticipation as a predictive perceptual strategy that helps streamline the process of organizing incoming speech signals. Although music perception involves anticipation, music provides clues of a different nature regarding what will follow in a phrase. Anticipation based on hierarchical phrase structure might be similar across language and music, but listeners also use rhythm, meter, and phrase symmetry to predict how a musical phrase will end. C&C also discuss anticipation in discourse; however, anticipation works differently in music. In ensemble performances, musicians often simultaneously produce and perceive (their own and others') music, which is different from linguistic turn-taking.

In sum, it is unclear why the "mini-linguist" theory of language acquisition and processing theory need to be mutually exclusive – why can't the child be acquiring grammar as a framework for processing and chunking? What also needs explanation is the question: Why would having only domain-general mechanisms for processing different types of complex acoustic input be advantageous?

How many buffers? "Process and perish" or multiple buffers with push-down stacks?

doi:10.1017/S0140525X15000862, e81

Stephen C. Levinson
Max Planck Institute for Psycholinguistics, Nijmegen & Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6500 AH Nijmegen, The Netherlands.
[email protected]

Abstract: This commentary raises two issues: (1) Language processing is hastened not only by internal pressures but also externally by turn-taking in language use; (2) the theory requires nested levels of processing, but linguistic levels do not fully nest; further, it would seem to require multiple memory buffers, otherwise there's no obvious treatment for discontinuous structures, or for verbatim recall.

Christiansen & Chater (C&C) have tried to convert a truism of psycholinguistics (essentially, Miller's 1956 short-term memory limitation) into a general theory of everything in language, in which representations are mere traces of processing, while hierarchy, patterns of change and the design features of language all follow from processing limitations. But like most general theories, this one seems underspecified, and it is hard to know exactly what would falsify it.

In this commentary I make two points. First, I suggest that the pressure for speed of processing comes not only from the effect of an evanescent signal on internal processing constraints, but also from outside, from facts about how language is used. Second, I would like to gently question the truism of the "process and perish" theory of linguistic signals.

Language comes, for the most part, as an acoustic signal that is delivered remarkably fast – as C&C note, faster than comparable nonlinguistic signals can be decoded. But why? One might try, speculatively, to relate this to the natural processing tempo of the auditory cortex (Hickok & Poeppel 2000) or to some general drive to efficiency. In fact, there are more obvious reasons for haste – namely, the turn-taking system of language use (Sacks et al. 1974). The turn-taking system operates with short units (usually a clause with prosodic closure), and after one speaker's such unit, any other speaker may respond, or the first speaker regain rights to that turn – thus ensuring communication proceeds apace. Turn transitions on average have a gap of only c. 200 ms, or the duration of a single syllable. Speakers are hastened on by the fact that delayed responses carry unwelcome semiotics (Kendrick & Torreira 2015). Now, the consequences of this system for language processing are severe: It takes c. 600 ms for preparation to speak a single word (Indefrey & Levelt 2004) and c. 1,500 ms to plan a single clause (Griffin & Bock 2000), so to achieve a gap of 200 ms requires that midway during an incoming turn, a responder is predicting the rest of it and planning his or her response well in advance of the end. To guard against prediction error, comprehension of the incoming turn must proceed even during preparation of the response – so guaranteeing overlap of comprehension and production processes (Levinson & Torreira 2015). This system pushes processing to the limit.
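The timing argument can be made explicit with a small back-of-the-envelope calculation using the figures cited above (a 200 ms average gap, c. 1,500 ms to plan a clause); the assumed duration of the incoming turn is an illustrative value, not a figure from the commentary.

# Back-of-the-envelope turn-taking arithmetic; the incoming-turn
# duration is an assumed illustrative value.
GAP_MS = 200               # average gap between turns
PLAN_CLAUSE_MS = 1500      # time needed to plan a clause-length response
INCOMING_TURN_MS = 2000    # assumed length of the incoming turn

lead_time = PLAN_CLAUSE_MS - GAP_MS         # planning must start 1300 ms before turn end
start_point = INCOMING_TURN_MS - lead_time  # i.e., about 700 ms into the turn

print(f"Planning must begin {lead_time} ms before the incoming turn ends, "
      f"roughly {start_point} ms into a {INCOMING_TURN_MS} ms turn, "
      f"while comprehension of the remainder is still under way.")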
Let's now turn to the psycholinguistic "truism," namely that, given the short-term memory bottleneck and the problems of competition for lexical access, processing for both comprehension and production must proceed in "chunks" – the "increments" of incremental processing. Miller's (1956) short-term memory bottleneck is often married to Baddeley's (1987) auditory loop with a capacity of c. 2 seconds unless refreshed, rapidly overwritten by incoming stimuli. On these or similar foundations the current theory is built.

Assuming Miller's bottleneck, and chunking as a way of mitigating it, I see at least two points in the current theory that are either problematic or need further explication:

1. Chunking involves recoding longer lower-level strings into shorter, higher-level strings with "lossy" compression of the lower level. In Miller's theory, the higher-level chunks replace the lower ones, using that same short-term memory buffer. But in C&C's theory, the higher-level chunks will need to be retained in another buffer, as the next low-level increment is processed – otherwise, for example, discontinuous syntactic elements will get overwritten by new acoustic detail. Because there is a whole hierarchy of levels (acoustic, phonetic, phonological, morphological, syntactic, discourse, etc.), the "passing the buck upward" strategy will only allow calculation of coherence if there are just as many memory buffers as there are levels.

2. Mismatching chunks across levels. C&C's theory seems to presume nesting of chunks as one proceeds upward in comprehension from acoustics to meaning. A longstanding linguistic observation is that the levels do not in fact coincide. A well-known example is the mismatch between phonological and syntactic words (Dixon & Aikhenvald 2002): Consider resyllabification, as in the pronunciation of my bike is small as mai.bai.kismall (Vroomen & de Gelder 1999) – here, the lower-level units don't match the higher ones. Similarly, syntactic structure and semantic structure do not match: All men looks like Tall men in surface structure, but has a quite different underlying semantics. Jackendoff's (2002) theory of grammar, with interface rules handling the mismatch between levels, is an attempt to handle this lack of nesting across levels.

Another fly in the ointment is that, despite the hand-waving in sect. 6.1.2, nonlocal dependencies are not exceptional. Particle verbs, conditionals, parentheticals, wh-movement, center-embedding, topicalization, extraposition, and so forth, have been central to linguistic theorizing, and together such discontinuous constructions are frequent. Now, it is true that English – despite these constructions – generally likes to keep together the bits that belong together. But other languages (like the Australian ones) are much freer in word order – like classical Latin with c. 12% of NPs discontinuous, as in the three-way split (parts in bold, Pinkster 2005; Snijders 2012) in Figure 1.

Figure 1 (Levinson). A discontinuous noun phrase (NP) in Latin wrapped around verb and adverb. [Latin example not reproduced.]

Likewise, the preference for strictly local chunking runs into difficulties at other linguistic levels. Consider the phonological rule that, according to the grammar books, requires the French possessive pronoun ma to become mon before a vowel (mon épouse vs. ma femme, "my wife"); in fact, mon is governed by the properties of the head noun from which it may be separated, as in Marie sera soit mon soit ton épouse ("Marie will become either my or your wife"; Schlenker 2010). Morphology isn't necessarily well behaved either, some languages even randomizing affixes (Bickel et al. 2007). So we need to know how the local-processing preference fails to outlaw all of the discontinuous structures in language, and where our push-down stack capacities actually reside.

Finally, C&C's Now-or-Never bottleneck theory suggests that details of an utterance cannot be retained in memory when following material overwrites it – only the gist of what was said may persist. But the practice of "other-initiated repair" suggests otherwise – in the following excerpt Sig repeats verbatim what he earlier said, just with extra stress on shoot, even though three conversational turns intervene (Schegloff 2007, p. 109):

[Conversational excerpt (Schegloff 2007, p. 109) not reproduced.]

The fact that we can rerun the phonetics (? = rising intonation, underlining = stress) of utterances shows the existence of other buffers that escape the proposed bottleneck.

Linguistic structure emerges through the interaction of memory constraints and communicative pressures

doi:10.1017/S0140525X15000874, e82

Molly L. Lewis and Michael C. Frank
Department of Psychology, Stanford University, Stanford, CA 94305.
[email protected] [email protected]
http://web.stanford.edu/~mll/
http://web.stanford.edu/~mcfrank/

Abstract: If memory constraints were the only limitation on language processing, the best possible language would be one with only one word. But to explain the rich structure of language, we need to posit a second constraint: the pressure to communicate informatively. Many aspects of linguistic structure can be accounted for by appealing to equilibria that result from these two pressures.

Christiansen & Chater (C&C) claim that memory limitations force the cognitive system to process the transient linguistic signal by compressing it. They suggest that this processing pressure influences the ultimate structure of language over the course of language evolution. Taken at face value, this proposal would lead to a degenerate linguistic structure, however. If memory constraints were the only pressure on language, languages would evolve to compress meaning into the simplest possible form – a single word (Horn 1984). But, as the authors point out, natural languages are not of this sort; they are richly structured into lexical and phrasal units of varying length. To account for this variability, we highlight the need to consider the communicative function of language. Communication serves as an important counter-pressure against compression in language processing, not just as a caveat.

Interlocutors use language with the goal of communicating information, but they also aim to minimize energetic cost (Zipf 1949). For the speaker, this goal implies minimizing production cost, and for the listener it implies minimizing comprehension cost. Importantly, these processing constraints have opposing cost functions (Horn 1984; Zipf 1949). For a producer, processing is minimized when a form is easy to say, and thus highly compressible. For the comprehender, however, processing is minimized when a form is minimally ambiguous and thus verbose. Compressing information is a useful strategy for a speaker who faces memory constraints, but it is useful only to the extent that the listener can still recover the intended meaning. This view of language use as rational action – minimizing costs while maximizing information transfer – is supported by a rich body of theoretical and empirical work (Clark 1996; Frank & Goodman 2012; Goodman & Stuhlmüller 2013; Grice 1975).

Although C&C argue that compression is the key factor in the emergence of structure, evidence at both the acquisition and evolution timescales suggests language is the product of the interaction between both compression and informativity. At the timescale of acquisition, experimental work suggests the resolution of reference in word learning is the product of communicative inferences (e.g., Baldwin 1991; 1993; Frank et al. 2009; Frank & Goodman 2014). And at the timescale of language evolution, a growing body of work suggests that the forms of words are also equilibria between these two pressures (Lewis & Frank 2012; Mahowald et al. 2014; Piantadosi et al. 2011; Zipf 1936). For example, Piantadosi et al. (2011) found that words that are less predictable in their linguistic context are longer, suggesting that speakers may lengthen words that are surprising in order to increase time for the listener to process.

In addition to linguistic form, these pressures influence the mapping between form and meaning. An equilibrium in the structure of form-meaning mappings is one in which the listener is able to recover the intended meaning, but the speaker does not exert additional effort over-describing. A range of semantic domains reflect this equilibrium (Baddeley & Attewell 2009; Kemp & Regier 2012; Regier et al. 2007), and ambiguity, more generally, has been argued to reflect this communicative tradeoff (Piantadosi et al. 2012). Ambiguity is an equilibrium in cases where the listener can recover the intended meaning from the communicative context. One example is the word "some," which has a literal meaning of "at least one and possibly all" but can be strengthened pragmatically to mean "at least one but not all" (Horn 1972). Because its meaning is determined through communicative context, its literal semantics can overlap those of its competitor, "all."
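The pragmatic strengthening of "some" can be illustrated with a minimal rational-speech-act style sketch in the spirit of Frank and Goodman (2012), which the commentary cites; the two-word lexicon, uniform priors, and single recursion step are toy assumptions rather than the published model.

LEXICON = {                            # literal semantics: word -> compatible states
    "some": {"some-not-all", "all"},   # "at least one, possibly all"
    "all": {"all"},
}
STATES = ["some-not-all", "all"]

def literal_listener(word):
    """P(state | word) under the literal semantics, with a uniform prior."""
    compatible = LEXICON[word]
    return {s: (1.0 / len(compatible) if s in compatible else 0.0) for s in STATES}

def speaker(state):
    """P(word | state): the speaker prefers words a literal listener would
    interpret correctly."""
    scores = {w: literal_listener(w)[state] for w in LEXICON}
    total = sum(scores.values())
    return {w: scores[w] / total for w in LEXICON}

def pragmatic_listener(word):
    """P(state | word), obtained by reasoning about the speaker."""
    scores = {s: speaker(s)[word] for s in STATES}   # uniform state prior
    total = sum(scores.values())
    return {s: scores[s] / total for s in STATES}

print(pragmatic_listener("some"))
# {'some-not-all': 0.75, 'all': 0.25}: the short, ambiguous form is
# strengthened because "all" was available for describing the "all" state.

The short form can remain in the language because the communicative context (here, the availability of the alternative "all") lets the listener recover the intended meaning, which is the kind of equilibrium the commentary describes.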
The key challenge associated with this broader proposal – that processing pressures influence linguistic structure – is providing direct evidence for a causal link between these two timescales. This problem is difficult to study in the laboratory because the proposed mechanism takes place over a long timescale and over multiple individual speakers. Furthermore, the presence of a causal link does not entail that phenomena in processing are directly reflected in linguistic structure – rather, entirely new properties may emerge at higher levels of abstraction from the interactions of more fundamental phenomena (Anderson 1972). It may, therefore, not be possible to directly extrapolate from brief communicative interactions observed in the laboratory to properties of linguistic structure.

Several recent pieces of experimental data begin to address this challenge, however. In one study, Fedzechkina et al. (2012) asked speakers to learn an artificial language that arbitrarily distinguished nouns through case-marking. Over learning sessions, speakers developed a system for marking in contexts where meanings were least predictable – a pattern reflected in the case-marking systems of natural language. Other work has used a similar paradigm to reveal the emergence of typologically prevalent patterns in the domains of word order (Culbertson et al. 2012; Culbertson & Newport 2015) and phonology (Wilson 2008).

A particularly promising approach for exploring this causal link is through transmission chains (Kirby et al. 2008; Reali & Griffiths 2009). In a transmission chain, a participant learns and recalls a language, and then the recalled language becomes the learning input for a new learner. By iterating over learners, we can observe how languages change across transmission of learners over the course of language evolution. Kirby et al. (2015) have compared the emergence of linguistic structure in a regime that iterates over different partners of learners versus a regime where the same two partners repeatedly interact with each other. They find that linguistic structure emerges only by iterating over different partners, demonstrating the unique contribution of cross-generational learning to the emergence of structure. Others have begun to use this paradigm to link the interaction of processing pressures to the emergence of communicative regularities in semantic structure (Carstensen et al. 2015; Lewis & Frank 2015).

In sum, the consequences of memory constraints are likely a critical factor in shaping language structure. But an additional important constraint is the pressure to communicate informatively, and this constraint should not be overlooked in accounting for linguistic structure.

The bottleneck may be the solution, not the problem

doi:10.1017/S0140525X15000886, e83

Arnon Lotem, Oren Kolodny, Joseph Y. Halpern, Luca Onnis, and Shimon Edelman
Department of Zoology, Tel Aviv University, Tel Aviv 6997801, Israel; Department of Biology, Stanford University, Stanford, CA 94305; Department of Computer Science, Cornell University, Ithaca, NY 14853; Division of Linguistics and Multilingual Studies, Nanyang Technological University, Singapore 637332; Department of Psychology, Cornell University, Ithaca, NY 14853.
[email protected] [email protected] [email protected] [email protected] [email protected]

Abstract: As a highly consequential biological trait, a memory "bottleneck" cannot escape selection pressures. It must therefore co-evolve with other cognitive mechanisms rather than act as an independent constraint. Recent theory and an implemented model of language acquisition suggest that a limit on working memory may evolve to help learning. Furthermore, it need not hamper the use of language for communication.

The target article by Christiansen & Chater (C&C) makes many useful and valid observations about language that we happily endorse. Indeed, several of C&C's major points appear in our own papers, including the following: (a) the inability of non-chunked, "analog" approaches to language to compete with "digital" combinatorics over chunks (Edelman 2008b); (b) the centrality of chunking to modeling incremental, memory-constrained language acquisition and generation (Goldstein et al. 2010; Kolodny et al. 2015b) and the possible evolutionary roots of these features of language (Kolodny et al. 2014; 2015a; Lotem & Halpern 2012); (c) the realization that language experience has the form of a graph (Solan et al. 2005; cf. Edelman 2008a, p. 274), corresponding to C&C's "forest tracks" analogy; and (d) a proposed set of general principles for language acquisition and processing (Goldstein et al. 2010), one of which is essentially identical to C&C's "Now-or-Never bottleneck." However, our theory is critically different in its causality structure. Rather than assuming that the memory limit is a fixed constraint to which all other traits must adapt, we view it as an adaptation that evolved to cope with computational challenges. Doing so brings theory in line with standard practice in evolutionary biology, is more consistent with research findings, and raises numerous important research issues. We expand on these points in the following paragraphs.

No biological trait can be simply assumed as a "constraint." Viewing the Now-or-Never bottleneck as an evolutionary constraint to which language adapts – C&C's central idea – is unwarranted. In evolutionary theory, biological constraints – as opposed to constraints imposed by physics and chemistry, which are not subject to biological evolution – cannot simply be assumed; they must be understood in terms of trade-offs among selective pressures. Clearly, birds' wings evolved under aerodynamic constraints rather than vice versa. However, biological traits such as memory are not exempt from evolving. In proposing a bottleneck to which everything else in the system must adapt (Fig. 1 in the target article), while the bottleneck itself remains fixed and independent, C&C implicitly assume that it cannot evolve.

To justify this assumption, C&C should have offered evidence of stabilizing selection pressures that act against genetic variants coding for a broader or narrower bottleneck, and thereby affecting cognition and, ultimately, fitness. Alternatively, they might have assumed that the biological mechanisms underlying the memory bottleneck cannot be genetically variable – an odd assumption, which runs counter to substantial evidence in humans of (a) a range of verbal memory decay rates (Mueller & Krawitz 2009), including in particular the longer verbal working memory span in individuals with Asperger's (Cui et al. 2010); (b) heritable variation in language and in word memory (Stromswold 2001; van Soelen et al. 2011) and in working memory (Blokland et al. 2011; Vogler et al. 2014); and (c) variation in perceptual memory across species (Lind et al. 2015; Mery et al. 2007). Given that heritable variation in a trait means that it can respond to selection (e.g., Falconer 1981), it is likely that the bottleneck can evolve, and that it is what it is because individuals with longer or shorter verbal working memory had lower biological fitness.1

If language is supported by domain-general mechanisms, verbal memory is even less immune to evolution. If the emergence of language constitutes a recent and radical departure from other cognitive phenomena, it is in principle possible that working memory evolved and stabilized prior to and separately from "the increasingly abstract levels of linguistic representation" (sect. 3.2, para. 2) posited by C&C. However, there are good arguments in support of a domain-general view of language (e.g., Chater & Christiansen 2010). In particular, linguistic representations and processes are hardly as modular as C&C assume (Onnis & Spivey 2012). Furthermore, theories of neural reuse (Anderson 2010) point to the massive redeployment of existing mechanisms for new functions, resulting in brain regions coming to be involved in diverse cognitive functions. If circuits that support language continue contributing to nonlinguistic functions (including working memory), a memory bottleneck is not a prior and independent constraint on language, but rather a trait that continues to evolve under multiple selective pressures, which include language.

The bottleneck may be the solution, not the problem. As we have suggested (Goldstein et al. 2010; Lotem & Halpern 2008; 2012; Onnis et al. 2008), a limited working memory may be an adaptation for coping with the computational challenges involved in segmentation and network construction. (Importantly, regardless of whether this specific hypothesis is correct, entertaining such hypotheses is the only way of distinguishing a function from a constraint; cf. Stephens & Krebs 1986, Ch. 10.) A recently implemented model that includes this hypothesis has been tested on tasks involving language, birdsong, and foraging (Kolodny et al. 2014; 2015a; 2015b; Menyhart et al. 2015). The model includes a time window during which natural and meaningful patterns are likely to recur and thus to pass a test for statistical significance, while spurious patterns decay and are forgotten. We stress that rather than acting as a constraint, the duration of the window must co-evolve with the mechanisms influencing the distribution of data so as to increase the effectiveness of memory representations (Lotem & Halpern 2012).
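A toy sketch (not the authors' implemented model) of the memory-window idea just described: candidate bigram chunks are kept only if they recur within a limited window, with a simple recurrence threshold standing in for the statistical-significance test, while candidates that do not recur decay and are forgotten.

from collections import defaultdict

def window_chunker(stream, window=10, threshold=3, decay=1):
    """Keep a candidate bigram only if it recurs within `window` steps;
    a recurrence threshold stands in for a significance test, and stale
    candidates decay away and are forgotten."""
    counts = defaultdict(int)
    last_seen = {}
    chunks = set()
    for t in range(len(stream) - 1):
        bigram = (stream[t], stream[t + 1])
        # forget candidates that have not recurred within the window
        for bg in list(counts):
            if t - last_seen[bg] > window:
                counts[bg] -= decay
                if counts[bg] <= 0:
                    del counts[bg]
                    del last_seen[bg]
        counts[bigram] += 1
        last_seen[bigram] = t
        if counts[bigram] >= threshold:
            chunks.add(bigram)
    return chunks

# "ba" recurs every three symbols and is chunked; the one-off bigrams
# created by the changing filler symbols decay away and are forgotten.
stream = list("".join("ba" + chr(ord("c") + i) for i in range(20)))
print(window_chunker(stream))   # {('b', 'a')}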
We do agree with C&C regarding some of the consequences of the memory bottleneck, such as the need for online incremental construction of hierarchical representation. Indeed, our model effectively implements what C&C call "Chunk-and-Pass" (Kolodny et al. 2015b).2 We believe, however, that the ultimate constraint on learning structure (such as that of language) in time and space is not the memory bottleneck in itself, but rather the computational challenges of chunking the data and of building hierarchies.

Biological communication is about affecting behavior, not pumping bits. Our final point focuses on the communicative function of language. Viewing a memory window as a communication "bottleneck" suggests that massive amounts of information must flow through the channel in question. However, the real function of a message is to influence the rich network of connotations and interconnections already present in the listener's brain (cf. Edelman 2015, sect. 2.3). Communication is about generating adaptive behavioral changes (Burghardt 1970; Green & Marler 1979) – the listener gleans from it cues relevant to decision-making. For this, a signal must be informative and reliable in the given context (Leger 1993); the amount of information is not the main issue (except as a signal of quality, as in complex courtship songs; Lachmann et al. 2001). This implies that evolutionary selection in language is for how messages fit into the information already represented by their recipient; a bottleneck may not impose significant constraints here.

NOTES
1. If verbal memory indeed evolves, language is the niche in which it does so. The target article seems to gloss over the intimate connection between cultural evolution and niche construction (Odling-Smee et al. 2003). In focusing on how linguistic patterns "which can be processed through that bottleneck, will be strongly selected" (sect. 5, para. 3), C&C ignore the possibility of there being also selection for individuals who can better process linguistic patterns.
2. As C&C note, correctly, regarding Chunk-and-Pass, "it is entirely possible that linguistic input can simultaneously, and perhaps redundantly, be chunked in more than one way" (sect. 3.2, para. 4). This point suggests that chunking on its own, especially when carried out recursively/hierarchically, is likely to severely exacerbate the combinatorial problem faced by the learner, rather than resolve the bottleneck issue.

Memory limitations and chunking are variable and cannot explain language structure

doi:10.1017/S0140525X15000898, e84

Maryellen C. MacDonald
Department of Psychology, University of Wisconsin-Madison, Madison, WI 53706.
[email protected]
http://lcnl.wisc.edu/people/mcm/

Abstract: Both the Now-or-Never bottleneck and the chunking mechanisms hypothesized to cope with it are more variable than Christiansen & Chater (C&C) suggest. These constructs are, therefore, too weak to support C&C's claims for the nature of language. Key aspects of the hierarchical nature of language instead arise from the nature of sequencing of subgoals during utterance planning in language production.

Christiansen & Chater (C&C) overstate both the limitations of the Now-or-Never bottleneck and the lossy character of chunking, and they are overly optimistic that memory limitations can explain the nature of language. C&C correctly note that human memory limitations during planning for language production promote incremental planning (where planning of the utterance and its execution are interleaved), but the memory limitations are not as strict as they suggest. Whereas "radical incrementality" – very minimal advance planning owing to a severe memory bottleneck – once had its proponents in language production, recent studies argue for looser constraints, with more tolerance for higher memory loads and more extensive advance planning (Ferreira & Swets 2002). The extent of advance planning may even be under some degree of implicit strategic control (Ferreira & Swets 2002; Wagner et al. 2010), suggesting that, rather than the memory bottleneck controlling us, we instead can exert some control over our own memory loads during language production. The bottleneck also isn't always so severe in comprehension, and chunking isn't as uniformly eager as C&C portray. Downstream linguistic input affects interpretation of earlier material (MacDonald 1994; Warren & Sherman 1974), which shouldn't occur if chunking greedily passes off the early information to the next level. Variability in the tolerance of memory loads suggests that the Now-or-Never bottleneck is really more of a wide-mouth jar, or perhaps more of an adjustable drawstring closure, and the consequences for the nature of language will therefore need adjustment as well.

Similarly, C&C view the lossy nature of Chunk-and-Pass processing as essential to explaining the nature of language processing, but chunking is neither as lossy nor as bottom-up as they suggest. C&C argue that in speech perception, sounds are rapidly chunked into words, leaving the sounds behind, so that the just-perceived sounds do not interfere with upcoming ones. These claims create several puzzles: First, this very bottom-up characterization of chunking is inconsistent with evidence for top-down influences in perception. C&C's focus on using context only for predicting the future is misplaced, because top-down processes also allow higher-level information to elaborate earlier percepts. Examples include the word superiority effect (Cattell 1886) and the phoneme restoration effect (Warren 1970), in which word representations affect perception of their parts (letters, phonemes). If chunking is so eager and lossy, it is not clear how higher-level word information could refine the lower-level percepts that should have already been discarded by lossy chunking. Second, if the memory bottleneck is so narrow, how is there room for interference, which by definition depends on several elements being in memory at the same time? There are numerous examples of semantic and sound overlap creating memory interference over fairly long distances during both comprehension (Acheson & MacDonald 2011; Van Dyke & Johns 2012) and production (Hsiao et al. 2014; Smith & Wheeldon 2004), again suggesting that the bottleneck can't be as strict as C&C describe. Third, if lossy chunking is the solution to memory interference, why is it so easy to find interference effects? The existence of memory interference suggests that chunking may not always be so lossy after all. In at least some circumstances, there appears to be real value in non-lossy processing, such as the Levy et al. (2009) example that C&C note, as well as use of prosodic information over long distances (Morrill et al. 2014). These and other examples call into question the essence of lossy, greedy, bottom-up chunking as a design feature for language.

C&C note some variability in memory limits and chunking, but they do not discuss the consequences of variability for their account. They illustrate their ideas with an individual identified as SF, who can recall vast strings of meaningless digits by chunking them into meaningful units such as dates, and using the chunks to guide production. The analogy to language is unfortunate, because SF's chunking strategies are both conscious and idiosyncratic, inviting the inference that language users' chunking units are similarly variable. In sum, if memory limitations and the lossy and eager characteristics of chunking have notable exceptions and are subject to individual differences, then it is difficult to make them the foundation of claims for the nature of language.

More seriously, no matter how we conceive the memory bottleneck, it can explain neither the existence of a hierarchy in language representations, nor why the hierarchy has certain levels of representation across individuals and not others. Consider a nonlinguistic analogy: the visual processes necessary to recognize a cup. Let's assume that these processes, also constrained by memory bottlenecks, have multiple stages of chunking and passing from low-level visual processing up to object recognition. From these perceptual stages, however, we would not want to conclude that the percept itself, the cup, has a hierarchical structure. Similarly, the memory-constrained chunking and passing for language perception, even if it works exactly as C&C describe, does not give the percept – language – its hierarchical structure.
Rather than trying to wring structure out of memory limitations, I suggest that key aspects of hierarchical structure emerge from how goals are realized in action (MacDonald 2013). Like all actions, language production must unfold over time, meaning that the various subgoals of the action must be planned and ordered in some way (Lashley 1951). For both nonlinguistic and linguistic actions, the nature of the hierarchy is constrained by the need to make decisions for some subgoals in order to plan others. To reach for a cup, the choice of which hand to use determines and must precede planning the reach. Similarly, a speaker must choose words (cup or mug?) before programming their articulation, naturally creating a hierarchy of lexical and sublexical plans. Although language and nonlinguistic action are not identical, important aspects of the hierarchical nature of language emerge from the staging of language production planning processes over time. Furthermore, although action plans are held in memory and are affected by the nature of that memory, memory limitations themselves cannot bear the explanatory burden that C&C ascribe to them.

Exploring some edges: Chunk-and-Pass processing at the very beginning, across representations, and on to action

doi:10.1017/S0140525X15000904, e85

Rose Maier and Dare Baldwin
Department of Psychology, University of Oregon, Eugene, OR 97403.
[email protected] [email protected]

Abstract: We identify three "working edges" for fruitful elaboration of the Chunk-and-Pass proposal: (a) accounting for the earliest phases of language acquisition, (b) explaining diversity in the stability and plasticity of different representational types, and (c) propelling investigation of action processing.

Experience is dynamic and ephemeral, yet humans routinely generate abstract representations of their individualized experience that simultaneously achieve enough stability, plasticity, and interindividual parity to radically facilitate social and cognitive functioning. Christiansen & Chater's (C&C's) ambitious Chunk-and-Pass processing (CPP) proposal offers hope of a comprehensive and elegant account of how this can be. CPP has impressive explanatory breadth, neatly tying language acquisition to language change and language evolution, while also offering promise of a unified account of perception and cognition more generally. By C&C's own acknowledgment, however, many facets of the CPP account cry out for elaboration. In our view, three "working edges" will be (a) accounting for the earliest inception of language acquisition, (b) explaining stability and plasticity differences in learning profiles across knowledge systems (within language as well as across domains), and (c) elaborating CPP on the action processing front.

Regarding the first issue, C&C provide a workable framework for describing language acquisition once basic acoustic units have been discovered (e.g., phonemes, syllables), but do not describe how utter novices initially break into the system. Of course, there is a sizable literature investigating how infants initiate analysis of streaming speech (e.g., Vouloumanos & Werker 2007; Werker et al. 2012). One litmus test of the viability of CPP will be its ability to account for the phenomena documented in this literature within a unified Chunk-and-Pass framework. Among the complexities to be confronted here are findings indicating that infants' identification/construction of basic acoustic units may still be taking place at the same time that they are beginning to chunk longer strings of sounds together into words or morphemes. For example, infants remain quite sensitive to phonetic distributions until well into the first year; at 6 to 8 months, just 2–3 minutes of focused exposure to new distributions may be enough to temporarily rearrange infants' phonetic categories (Maye et al. 2002). And yet, by this same age, infants typically recognize at least a handful of words, including "mommy" and "daddy" (Tincoff & Jusczyk 1999), their own name (Bortfeld et al. 2005; Mandel et al. 1995), and several body part terms, such as "hand" and "feet" (Tincoff & Jusczyk 2012). Does CPP somehow build linguistic structure even without clear basic units over which to operate (in contradiction to hypotheses C&C articulate on this matter; e.g., sect. 3.2, para. 1)? Alternatively, does CPP operate on units only as they reach some criterion of availability, so that words composed of early-identified phonemes would potentially be available for chunking, whereas words with more difficult-to-identify phonemes are not? Or do processes other than Chunk-and-Pass need to be brought in to account for the earliest phases of language acquisition?

The second working edge we identify relates to stability and plasticity of representations. C&C note that stability and plasticity trade off: Learning depends on representations being updated to incorporate new content, but at the same time, some degree of stability is needed to avoid new information overwhelming previously acquired information. They argue that stability is a natural product of the compression that occurs during Chunk-and-Pass processing. The processing of linguistic content is "lossy" – the only features retained are those that are captured by a learner's current model of the language, making it difficult to dramatically alter that model since the features necessary to do so are likely the very ones lost in compression. This seems persuasive on the face of it, but leaves unclear how CPP can account for a different stability/plasticity issue: namely, the observation that representations of different types display distinct stability/plasticity profiles. In language, acquired representations of some kinds (e.g., phonetic and syntactic representations) display a strong propensity to stabilize and become markedly resistant to change (e.g., Johnson & Newport 1989; Kuhl 2004; Lenneberg 1964; Yoshida et al. 2010), whereas a variety of evidence suggests that other representational types (e.g., open-class lexical items) seem to display considerably more plasticity (e.g., Curtiss 1977; Newport 1990; Talmy 2000; Weber-Fox & Neville 1996). In question is whether these different plasticity profiles across representational types arise naturally from CPP. Are there differences in the information to be encoded across various types of representations such that the model would predict an emphasis on stability in some cases versus ongoing plasticity in others? Alternatively, will it be necessary to look to mechanisms beyond CPP to account for such differences, such as diverse neural commitment timetables?

Our third "working edge" focuses on action processing as a particularly fruitful target for broadening the scope of CPP-related investigation. Intuitively, language and action processing seem closely linked. Language can be regarded as one form of action, after all, and both language and action are subject to the Now-or-Never bottleneck, making them amenable to a CPP account, as C&C themselves note. Strikingly, however, investigation regarding action processing lags considerably behind language. One glaring example is the lack of a generally accepted inventory of basic actions, comparable to inventories of phonemes or syllables in language (cf. interesting but small-scale efforts along these lines, such as therbligs, Gilbreth & Gilbreth 1919). Another example concerns hierarchical structure, which seems to be a fundamental organizing principle of both action and linguistic representations. To illustrate in the action context, observers typically note that an action such as getting a cup of coffee comprises embedded subgoals, such as getting a mug from a cupboard, placing it on a counter, pouring coffee into the mug, and so on. At the same time, relevant levels of that hierarchy seem not to be as crisp or well-defined as they are in language. A "learning to process" account may provide welcome guidance for continuing attempts to gain purchase on the representation of structure in action, and perhaps also will ultimately help to explain cross-domain differences in representational structure. All in all, as an explicitly domain-general approach, CPP holds promise for accelerating understanding in the action domain in a way that promotes interdisciplinary convergence with theorizing about language.
Many important language universals are not reducible to processing or cognition

doi:10.1017/S0140525X15000722, e86

David P. Medeiros, Massimo Piattelli-Palmarini, and Thomas G. Bever
Department of Linguistics, University of Arizona, Tucson, AZ 85721-0028.
[email protected] [email protected] [email protected]
http://dingo.sbs.arizona.edu/~massimo/ http://dingo.sbs.arizona.edu/~tgb/

Abstract: Christiansen & Chater (C&C) ignore the many linguistic universals that cannot be reduced to processing or cognitive constraints, some of which we present. Their claim that grammar is merely acquired language processing skill cannot account for such universals. Their claim that all other universal properties are historically and culturally based is a nonsequitur about language evolution, lacking data.

In this latest attempt to reduce language to other mental systems, Christiansen & Chater (C&C) present two main points, each with two subpoints: (1a) Working memory constraints account for many features of sentence processing during comprehension; (1b) these features in turn can account for a variety of universal properties of language. (2a) Thus, learning a language is actually learning a set of rapidly deployable recoding templates and processes; (2b) what appear to be other kinds of psychologically or biologically determined structures of language are actually culturally and historically determined. Such attempts have a long history, with a considerable modern literature on the issue started in the 1970s (e.g., Bates & MacWhinney 1982; Hawkins 1983; Rumelhart & McClelland 1988; notable recent examples include Arbib 2012; Bybee 2007; Christiansen & Chater 2008; Perfors et al. 2011; Reali & Christiansen 2005; Rizzolatti & Arbib 1998; Tomasello 2003; 2006). All of these attempts have been quickly and persuasively countered (Berwick et al. 2013; Crain et al. 2009; Gualmini & Crain 2005; Kam & Fodor 2013; Piattelli-Palmarini et al. 2002; Wexler 2008; Pietroski 2008).

Irreducible language universals. Many linguistic systems are irreducible to processing or cognitive explanations. We highlight several that seem particularly challenging to C&C's views.

(a) The Verb+Object Constraint (VOC) (Baker 2008; 2013). In our conceptualization of the world, actions are more intimately connected with their agent than with the object, but not syntactically so. Verb+Complement forms a syntactic constituent (a chunk) but Subject+Verb does not. This abstract structural relationship explains the fact that in all languages of the world idioms are formed by a verb and its object (in English, for example, kick the bucket, sell the farm, hits the fan, etc.). This fact is particularly surprising for VSO languages, on the "Chunk-and-Pass" perspective: Surface adjacency ought to lead to V+S idioms being more readily chunked and learned in such languages, while V...O idioms are, in simple clauses, discontinuous.

(b) There is a universal hierarchy of syntactic and semantic dominance relations (Belletti 2004; Cinque 1999; 2013): for example, evidential (allegedly) > epistemic (probably) > necessity (necessarily) > obligation (obligatorily) > continuative (still) > durative (briefly) > completive (partially). (The > indicates dominance in the ordering of modal modifications of a sentence, a transitive relation.) For example, in English we have:

(1) Jim is allegedly probably unable to frequently deliver assignments on time.
(2) *Jim is frequently unable to probably deliver allegedly his assignments on time.

There is a large literature on many languages suggesting that this ordering is universal. Explanations based on statistical regularity, general cognition, pure logic, or social conventions appear utterly implausible.

(c) Conceptually possible but linguistically impossible word ordering. "[M]any potential orders are never found ... which poses a puzzle for any culturally based account" (Cinque 2013, p. 17). Consider, for example, the relative ordering of the categories demonstrative, numeral, adjective, and noun, the topic of Greenberg's Universal 20 (Greenberg 1963; see also Hawkins 1983; Dryer 1992; 2009; Cinque 1996; 2005; 2013). All descriptions agree that some orders are never found: Whereas (3) and (4) are common orders, no language is reported to have as a basic noun phrase order (5) *Num Adj Dem N or (6) *Adj Dem N Num.

(3) These three blind mice — Dem Num Adj N
(4) Mice blind three these — N Adj Num Dem
(5) *Three blind these mice — *Num Adj Dem N
(6) *Blind these mice three — *Adj Dem N Num

The observed restrictions on nominal ordering are particularly interesting in light of experimental work by Culbertson et al. (e.g., Culbertson & Adger 2014; Culbertson et al. 2012). Briefly, they find their adult subjects, in a series of artificial grammar learning experiments, to reproduce typological word ordering patterns, apparently drawing on innate cognitive biases. This is a strong piece of evidence that the distribution of word order patterns is not historical bricolage; subjects discriminate novel typologically favored patterns from disfavored patterns, with no obvious basis in their native language.

Grammar learning is merely "process and pattern" learning. C&C argue that in learning to comprehend (and, we presume, talk), the child perforce must be learning a range of statistically valid local patterns so that the system can proceed rapidly. The heart of the idea is that learning patterns from repeated stimulus similarities is endemic to many aspects of maturation, hence not specific to language. In this, they agree with a variety of learned pattern accounts (e.g., Bever 1970; Townsend & Bever 2001). However, there are severe empirical problems. Their account says nothing, for instance, about which chunks may relate to each other; as far as C&C are concerned, anything goes. But there is considerable evidence for richly nuanced, universal principles governing many kinds of grammatical relations (subjacency, case, theta relations, etc.). It also makes long-distance dependencies mysterious. If learners look first for local associations in blindly segmenting their language, subject to a crippling limit on short-term memory, it is unclear how long-distance dependencies could be stable in any lineage, much less universal.

The "rest" of apparent linguistic structures (i.e., those that are not explained by immediate processing or by cognitive or statistical facts) are culturally and historically determined. We do not belabor a response to this point because it is irrelevant to the major substantive claims by C&C, and they offer very little independent or new evidence for it. It is a claim about how the structures we see in today's languages evolved that cannot be immediately accounted for in their interpretation of processing and cognitive constraints.

To us it seems like a very far-fetched claim about how things worked in the forest primeval. We do know from contemporary facts that (most) languages live in families suggesting some historical devolution; and there are clusters of shared properties among neighboring languages that do not share families, also suggesting historical influences. But these facts presuppose the existence of fully fledged languages, ready to differentiate and to be influenced by neighbors.
Processing cost and its consequences

doi:10.1017/S0140525X15000916, e87

William O'Grady
Department of Linguistics, University of Hawaii at Manoa, Honolulu, HI 96822.
[email protected]

Abstract: I focus on two challenges that processing-based theories of language must confront: the need to explain why language has the particular properties that it does, and the need to explain why processing pressures are manifested in the particular way that they are. I discuss these matters with reference to two illustrative phenomena: proximity effects in word order and a constraint on contraction.

Christiansen & Chater's (C&C's) proposal has much to recommend it: Processing resources are severely limited, and Chunk-and-Pass is a promising strategy for accommodating those limitations. The hope and promise of this type of work is that, in addition to shedding light on the nature of incremental processing, it can help explain specific properties of linguistic systems. C&C focus their attention on very general features of language, such as duality of patterning, the bounded nature of linguistic units, and the existence of multiple levels of representation. But many properties at a finer level of granularity also call for attention. Why, for example, do we find certain systems of agreement and case marking, but not others? Why are some languages ergative? Why are filler-gap dependencies subject to certain types of locality constraints? Traditionally, the answers to such questions invoke principles of grammar, not processing. However, a wave of recent research by C&C and others (e.g., Hawkins 2004; 2014; O'Grady 2005; 2013; 2015a) proposes a very different approach: Languages are the way they are because of their need to adapt to processing pressures.

At least two challenges immediately arise. On the one hand, it is necessary to demonstrate that processing pressures can help resolve the baffling puzzles that spring up everywhere in the phonology, morphology, and syntax of natural languages. On the other hand, it is necessary to develop a theory to explain why the effects of the processing bottleneck are felt when and where they are. Two examples help illustrate this point.

As C&C note (sect. 6.1.2), items that enter into a relationship with each other should occur in close proximity, for obvious processing reasons. But how close? In Thai, not even a determiner can intervene between a verb and the head of its direct object (one says "I read book that"). But the picture is complicated by data from other languages.

(1) a. A determiner intervenes: (English, French, Mandarin)
read [that book]
b. A possessor NP intervenes: (English, Mandarin)
read [a good friend's book]
c. A relative clause intervenes: (Mandarin)
read [that I just bought books]
(compare English: read books [that I just bought])

Hawkins (2004, p. 123ff) offers a key insight: All other things being equal, if a language permits a more costly implementation of a particular relationship, it will also permit a less costly implementation. For example, Mandarin allows a relative clause to appear between the verb and the head of its direct object, as in (1c) – a costly option in terms of working memory; as predicted, however, Mandarin also allows a less complex possessor phrase and a simple determiner to occur in that position. English sets the bar lower, allowing only possessor phrases and determiners to intervene – as in (1a,b) – but not a relative clause. The cut-off point for French is still lower: A determiner can intervene, as in (1a), but not a possessor or a relative clause. Most restrictive of all is Thai, in which even determiners cannot intervene. The processing bottleneck, it seems, is not absolute; it is manifested in different ways in different languages.

Another example of systematic variation in processing effects involves the notorious constraint on want to contraction illustrated below.

(2) a. Contraction allowed:
Ask whether they want to stay there. (cf. They want to stay there.)
wanna
b. Contraction prohibited:
Ask who they want to stay there. (cf. They want Mary to stay there.)
*wanna

Jaeggli (1980) proposed that contraction is blocked in (2b) by the presence of an invisible Case-marked trace between want and to – a classic example of grammatical analysis. In contrast, O'Grady (2005) outlined a processing-based alternative that turns on the interplay between two pressures: (a) for reasons related to working memory, filler-gap dependencies are best resolved at the first opportunity; (b) for articulatory reasons, contraction is most natural when want and to combine with each other without delay. Matters are straightforward in (2a), where the articulatory system moves seamlessly from want to to, producing a contracted pronunciation.

(3) Ask whether they want-to stay there.
↓
wanna

The situation is very different in (3) than in (2b), in which the transition from want to to is interrupted by the need to promptly resolve the filler-gap dependency by associating the wh word with want, which is transitive here (cf. They want her to stay there). The resulting delay, often accompanied by prosodic reflexes such as lengthening of want (Warren et al. 2003), compromises the naturalness of contraction.

(4) Ask who they want # to stay there.

Here too, though, there is evidently room for variation. Ito (2005) reported that 5 of the 41 English speakers who she studied allowed wanna in patterns like (2b). Crucially, however, they also permitted contraction in the less-demanding (2a). The reverse is, of course, not true: Many speakers permit contraction in the easy pattern but not the difficult one.

In sum, case studies such as these help confirm that processing pressures (C&C's Now-or-Never bottleneck) shape the way language works, creating an explanatory narrative that is fundamentally different from traditional grammar-based accounts. At the same time, we gain insight into the nature of processing itself, for which an intriguing story is beginning to emerge. Because processing cost can never be reduced to zero, there is no perfect language and no single way to manage processing costs. What we find instead is systematic variation in what languages (and speakers) tolerate, with a preference for less-costly options over more-demanding alternatives. The end result is an array of effects in phenomena ranging from typological variation to developmental order (O'Grady 2013; 2015b).

Processing cost offers an important idea on which to build. The next step requires further close-range attention to the details of how languages work, how they differ from each other, and how they are acquired. Here, in the traditional data fields of linguistics, lie the clues needed to settle the disputes that define the contemporary study of language.
Conceptual short-term memory (CSTM) supports core claims of Christiansen and Chater

doi:10.1017/S0140525X15000928, e88

Mary C. Potter
Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139.
[email protected]
http://mollylab-1.mit.edu/lab/

Abstract: Rapid serial visual presentation (RSVP) of words or pictured scenes provides evidence for a large-capacity conceptual short-term memory (CSTM) that momentarily provides rich associated material from long-term memory, permitting rapid chunking (Potter 1993; 2009; 2012). In perception of scenes as well as language comprehension, we make use of knowledge that briefly exceeds the supposed limits of working memory.

Christiansen & Chater (C&C) focus on cognitive limitations in language understanding and production that force immediate decisions at multiple levels. Our experiments using rapid serial visual presentation (RSVP) of written words and of pictured scenes show that a large-capacity but short-lasting conceptual short-term memory (CSTM), consisting of associations from long-term memory, is retrieved in response to currently active stimuli and thoughts (Potter 1993; 2012). We "understand" when some structural connections are found between the current stimuli and CSTM. In visual perception of scenes and objects, as well as in language comprehension, we make quick use of knowledge that briefly exceeds the supposed limits of short-term memory. Consistent with C&C's core ideas, rich but unselective associations arise quickly but last only long enough for selective pattern recognition – chunking, in C&C's terms. Irrelevant associations never become conscious (or are immediately forgotten).

Three interrelated characteristics of CSTM support key ideas in C&C's target article. Demos of some of these effects can be seen on Scholarpedia (Potter 2009).

1. There is rapid access to conceptual (semantic) information about a stimulus and its associations. Conceptual information about a word or a picture is available within 100–300 ms, as shown by experiments using semantic priming (Neely 1991), including masked priming (Forster & Davis 1984); eye tracking when reading (Rayner 1983) or looking at pictures (Loftus 1992); measurement of event-related potentials during reading (Kutas & Hillyard 1980; Luck et al. 1996); and target detection in RSVP with letters and digits (Chun & Potter 1995; Sperling et al. 1971), with pictures (Intraub 1981; Meng & Potter 2008; Potter 1976; Potter et al. 2010), or with words (Davenport & Potter 2005; Lawrence 1971b; Meng & Potter 2011; Potter et al. 2002). Conceptually defined targets can be detected in a stream of nontargets presented at rates of 8–10 items per second or faster (Potter et al. 2014), showing that categorical information about a written word or picture is activated and then selected extremely rapidly. The converging evidence shows that semantic or conceptual characteristics of a stimulus have an effect on performance as early as 100 ms after its onset. This time course is too rapid for slower cognitive processes, such as intentional encoding, deliberation, or serial comparison in working memory.

2. New structures can be discovered or built out of the momentarily activated conceptual information, influenced by the observer's task or goal. Evidence for this claim comes from comparing responses to RSVP sentences, scrambled sentences, and lists of unrelated words. It is possible to process the syntactic and conceptual structure in a sentence and, hence, subsequently to recall it, when reading at a rate such as 12 words per second (Forster 1970; Potter 1984; 1993; Potter et al. 1980; 1986). In contrast, when short lists of unrelated words are presented at that rate, only two or three words can be recalled (see also Lawrence 1971a). For sentences, the meaning and plausibility of the sentence, as well as the syntactic structure, are recovered as the sentence is processed. Words that do not fit the syntax or meaning are systematically misperceived (Potter et al. 1993). Syntactic and semantic choices are made online (Potter et al. 1998). Memory for the sentence may be reconstructed from meaning, rather than recalled word for word (Lombardi & Potter 1992; Potter & Lombardi 1990; 1998). Because almost all of the sentences one normally encounters (and all of the experimental sentences) include new combinations of ideas, structure-building is not simply a matter of locating a previously encountered pattern in long-term memory: It involves the creation of a new relationship among existing concepts.

As with words, so with a new pictured scene: Not only must critical objects and the setting be identified, but also the relations among them – the gist of the picture (e.g., Davenport & Potter 2004). Associated long-term memory of visual scenes must be activated to recognize that one is looking at a picnic, or a bride and groom, or a ball game. As C&C suggest, structure-building presumably takes advantage of as much old structure as possible, using any preexisting associations and chunks of information to bind elements.

3. There is rapid forgetting of information that is not structured or that is not selected for further processing. Conceptual information is activated rapidly, but the initial activation is highly unstable and will be deactivated and forgotten within a few hundred milliseconds if it is not incorporated into a structure, consistent with C&C's proposal. As a structure is built – for example, as a sentence is being parsed and interpreted – the resulting interpretation can be held in memory and ultimately stabilized or consolidated in working or long-term memory as a unit, whereas only a small part of an unstructured sequence such as a string of unrelated words or an incoherent picture can be consolidated in the same time period.

Because similar principles seem to apply to language comprehension and to nonlinguistic visual understanding, I have proposed that understanding in both cases is abstractly conceptual rather than fundamentally language-based. For example, pictured objects and their names give equivalent and equally rapid information about meaning (Potter & Faulconer 1975; Potter et al. 1977). Other perceptual senses such as audition and touch also have rapid access to the same conceptual level.

If the CSTM hypothesis is correct, then the Now-or-Never bottleneck occurs after a rich set of associations from long-term memory has enabled conceptual chunking of incoming linguistic or visual information. At that point, the information can be passed through the bottleneck to a more abstract level of discourse or scene understanding. Moreover, the severe limitations of working memory seen for arbitrary lists of letters, numbers, or geometric figures are largely overcome when proactive interference from reuse of a small set of stimuli is eliminated (Endress & Potter 2014a). The desperate speed of processing noted by C&C is not due solely to the limitations of short-term memory, but more generally reflects the pressure to think, see, understand, and act as fast as possible, in order to survive in a predatory world.

Language acquisition is model-based rather than model-free

doi:10.1017/S0140525X1500093X, e89

Felix Hao Wang and Toben H. Mintz
Department of Psychology, University of Southern California, 3620 McClintock Ave, Los Angeles, CA 90089-1061.
[email protected] [email protected]
http://dornsife.usc.edu/tobenmintz

Abstract: Christiansen & Chater (C&C) propose that learning language is learning to process language. However, we believe that the general-purpose prediction mechanism they propose is insufficient to account for many phenomena in language acquisition. We argue from theoretical considerations and empirical evidence that many acquisition tasks are model-based, and that different acquisition tasks require different, specialized models.

Given the Chunk-and-Pass processing necessitated by the Now-or-Never bottleneck, Christiansen & Chater (C&C) propose that learning language is learning to process language. In C&C's conceptualization, the learning and prediction processes are general (henceforth, model-free), and knowledge used in prediction arises gradually. In discussing the consequences of this scenario, C&C impose a dichotomy between these prediction-based models that are the outcome of learning to process, and learning based on more specialized constraints on how linguistic information is processed (the "child as linguist" approach, henceforth, model-based). In this commentary, we leave aside discussion of the Now-or-Never bottleneck per se and focus on C&C's claims about its theoretical consequences for language acquisition.

C&C's perspective provides an interesting framework for guiding research and developing theories. However, we argue that it does not provide significant constraints on the broader theoretical debates with which the field is engaged: in particular, debates about the nature of constraints on learning. Our argument is based on theoretical necessity and empirical evidence. Theoretically, the model-free approach is destined to be misled by surface-level information. Specifically, the general-purpose learning procedure is underspecified with respect to the level of analysis given different problems: Information for particular problems may exist at different levels, and using the wrong level may lead the learner astray. Empirically, when the model-based and model-free approaches are computationally equivalent, the model-free approach simply may not coincide with human performance. To support these claims we cite two cases: one from syntax, and another from word learning.

Many arguments for model-based learning come from phenomena that require a specific level of analysis. An oft-cited example is the constraint on structure-dependence, which specifies that grammatical operations apply to abstract phrasal structures, not linear sequences. It accounts for the fact that the yes/no question in 1(b), following, is the correct form that is related to the declarative 1(a), but the question in 1(c) is not.

1. a. The girl who is smiling is happy.
b. Is the girl who is smiling happy?
c. *Is the girl who smiling is happy?

The distinction hinges superficially on which is is moved to the beginning of the sentence in the question. The grammatical principle that governs this operation is subject-auxiliary inversion; in 1(a), the subject is the complex noun phrase [the girl who is smiling], so the entire structure inverts with is. The model-based argument is that young children's input lacks positive examples of complex embedded questions as in 1(b), but rather consists of simpler utterances such as 2(a) and 2(b); without the notion that syntactic operations operate over phrasal structures, why would a learner not conclude from 2(a) and 2(b) to simply front the first is?

(2) a. The girl is happy.
b. Is the girl happy?

Reali and Christiansen's (2005) model-free approach addresses this question. They demonstrated that a model-free learner who is sensitive to local bigram patterns could make the correct predictions about the structure of yes/no questions with complex noun phrases. This demonstration showed how attending to local sequential patterns could achieve the appropriate behavior despite not representing linguistic material at the level of syntactic hierarchies, as called for by model-based accounts. However, it turned out that the success of the model-free mechanism was an artifact of idiosyncrasies in English that had nothing to do with the syntactic structures in question (Kam et al. 2008). This does not rule out the possibility that a different model-free mechanism would succeed at learning the right generalizations, but adopting the view that learning language is learning to process language does not get around the fundamental challenges.

We now turn to an example from our own work in cross-situational word-learning, where model-based and model-free versions of learning mechanisms can both work in principle (Yu et al. 2007). Cross-situational word learning refers to naturalistic situations where learners encounter words under referential ambiguity, and learn the correct word-to-referent mappings via the accumulation of cross-situational statistics (Yu & Smith 2007, among others). The associative learning account of how cross-situational statistics are used proposes that learning is model-free, in that passive accumulation of the co-occurrence statistics between words and their possible referents suffices for learning word-referent mappings. In contrast, model-based word-learning accounts posit that, like a mini-linguist, learners have the overarching assumption that words are referential, and learners actively evaluate possible word-referent mappings (e.g., Trueswell et al. 2013; Waxman & Gelman 2009). Although, computationally, both accounts are plausible (Yu et al. 2007), we recently carried out an experiment showing the importance of learners' knowledge that words are referential – a model-based, top-down constraint (Wang & Mintz, under revision). We created a cross-situational learning experiment in which there was referential ambiguity within trials, but reliable cross-situational statistical information as to the word-referent mappings. In two different conditions, we held word and referent co-occurrence statistics constant but gave each group of participants different instructions. Both groups were instructed to perform a distractor task, and only one group was also told to learn word meanings. Only the latter group successfully learned the mappings, even though both groups were exposed to the same word-to-referent co-occurrence patterns. Thus, although a model-free learner could succeed in the task, human learners required the notion that words refer for word learning. We take this as evidence that model-based hypothesis testing is required for word learning empirically, even though the model-free version could have worked in principle.
In sum, although the Now-or-Never bottleneck presents interesting challenges for theories of language acquisition, the perspective C&C espouse does not solve problems that model-based approaches do, and empirically, model-free mechanisms do not apply to certain learning situations. Thus, casting acquisition as learning to process across levels of linguistic abstraction does not avoid the theoretical controversies and debates that inhabit the field. It simply shifts the debate from the nature of the constraints on linguistic knowledge acquisition to the nature of the constraints on "learning to process." We do not believe that this shift has substantial theoretical consequences for understanding the nature of the constraints on language learning.

What gets "passed" in "Chunk-and-Pass" processing? A predictive processing solution to the Now-or-Never bottleneck

doi:10.1017/S0140525X15000941, e90

Sam Wilkinson
Department of Philosophy, Durham University, Durham DH1 3HN, United Kingdom.
[email protected]

Abstract: I agree with the existence, and importance, of the "Now-or-Never" bottleneck. However, there is a far simpler and more parsimonious solution to it. This solution is predictive processing, and the failure to view the solution that this provides fundamentally boils down to viewing prediction as one aspect of cognition, rather than as its central principle.

The "Now-or-Never" bottleneck presents a real challenge. Rather than the solution presented by Christiansen and Chater (C&C), however, an alternative – one that is both simpler and more economical – is possible. They do allude to the solution I want to present, but they apply it locally rather than globally. The solution in question is prediction. One explanation for why this globally applied solution is not presented is that C&C adopt a traditional view of cognition, according to which inputs come in, get processed, and get passed on. Adherence to this view is evidenced in talk of "Chunk-and-Pass" processing. Inputs "come in" and get "chunked" and "passed." Within a predictive processing framework, on the other hand, the direction, if anything, is reversed. "Processing" constitutes inputs having been successfully predicted from the top down. What gets "passed" is prediction error, not some honed incoming product. What does get honed, in light of incoming prediction error, is predictions. Indeed,

an expected event does not need to be explicitly represented or communicated to higher cortical areas which have processed all of its relevant features prior to its occurrence. (Bubic et al. 2010, p. 10, quoted in Clark 2013)

If we adopt a wholesale predictive processing approach, according to which prediction is not an aid to processing as traditionally construed but is rather its fundamental principle, then we overcome the Now-or-Never bottleneck in an evolutionarily, biologically, and computationally plausible way, and end up with all of the same consequences for how to understand language that the authors are at pains to point out.

Firstly, all of the presented solutions to the Now-or-Never bottleneck need not be seen, as C&C present them, as separate, but rather may be viewed as different facets of predictive processing. In other words, (a) "eager recoding and compression" of input, and (b) hierarchical levels of "representation" become consequences of, and not additions to, a need to (c) "deploy all available information predictively." Let me explain why.

Predictive processing is a concrete implementation of a Bayesian strategy. Incoming signals are noisy and ambiguous, and so the brain uses Bayesian inference (it takes into account not only the "fit" of the hypothesis with the input, but also its "prior probability") to settle on one hypothesis rather than another. Thus, a hypothesis can have a really good fit but such a low prior probability that it isn't selected (or it can have a poor fit, but such a high prior probability that it is selected).

This Bayesian strategy gets implemented in the brain as follows. The selection of a hypothesis determines a set of predictions about subsequent inputs, namely, inputs that are compatible with the hypothesis. If the hypothesis does a bad job of predicting inputs, it will be tweaked or abandoned altogether in favour of another hypothesis. These hypotheses are hierarchically arranged, with the hypotheses of one level providing the inputs (prediction error) for the next. "Higher" parts of the hierarchy are, roughly, those parts that are further away from the sensory stimulus. These tend to operate at longer timescales, and at higher levels of abstraction. "Lower" parts of the hierarchy are closer to the sensory stimulus. These tend to be at shorter timescales, and at low levels of abstraction. These, for example, correspond to early stages of visual processing: your brain's early statistically driven attempts to make sense of (predict) noisy inputs.

Predictive processing is time and energy efficient, and it involves compression (more or less "lossy" depending on the occasion). You save on bandwidth by passing on only what is newsworthy. What counts as "newsworthy" is simply what the receiver of the message hasn't already predicted, namely, prediction error.
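The trade-off between prior probability and fit, and the idea that only the unpredicted residue need be passed on, can be made concrete with a small sketch. This is our illustration under simplified assumptions, not an implementation from Wilkinson, Clark (2013), or C&C; the competing hypotheses and all probabilities are invented.

```python
# Minimal sketch of Bayesian hypothesis selection in a predictive scheme.
# Posterior is proportional to prior * likelihood; the unexplained residue
# serves here as a crude stand-in for the prediction error passed upward.

def posterior(priors, likelihood):
    """priors: hypothesis -> prior probability; likelihood: hypothesis -> P(input | h)."""
    unnormalised = {h: p * likelihood[h] for h, p in priors.items()}
    z = sum(unnormalised.values())
    return {h: v / z for h, v in unnormalised.items()}

priors     = {"the news is on": 0.90, "the gnus is on": 0.10}   # expectations (top-down)
likelihood = {"the news is on": 0.60, "the gnus is on": 0.95}   # acoustic fit (bottom-up)

post = posterior(priors, likelihood)
selected = max(post, key=post.get)
prediction_error = 1.0 - post[selected]   # residue left unexplained by the winning hypothesis
print(selected, round(post[selected], 2), round(prediction_error, 2))
# the news is on 0.85 0.15
```

The better-fitting but implausible hypothesis loses to the expected one, and only the small residual error would need to travel up the hierarchy.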
To sum up, then, predictive processing in the brain always involves (1) compression and (2) hierarchical arrangement of hypotheses (which, to use C&C's terminology, can be thought of as "representations").

Now I'd like to gesture towards some of the consequences that predictive processing has for how we think about language. It has all of the same nine consequences that C&C enumerate, but, again, they are (at least for the most part) facets of each other rather than separate consequences. Let me illustrate this with two seemingly distant consequences: the multilevel organization of language (Consequence 1) and the nature of what is learned during language acquisition (Consequence 5).

The hierarchical arrangement of hypotheses in predictive processing clearly suggests that language processing has a multilevel organization. Of course, our processing of other, nonlinguistic, stimuli has a similar organization, but that structure is not so clearly delineated since nonlinguistic worldly items themselves lack that structure. As theorists, we tend to use rough-and-ready descriptions in natural language (e.g., "light comes from above" or "This is a face") when talking about neurally encoded hypotheses, but there is nothing intrinsically linguistic about the hypotheses themselves. The same applies when that which is being processed is linguistic (e.g., a communicative utterance or a written sentence). Very schematically put, from the "top" to the "bottom" it goes like this: One's brain can initially be "uncertain" about the shapes seen, or the sounds heard. Having resolved that, it can be uncertain about the letters or phonemes, and then the words used, and then what they mean, and then what is meant by them, or by the whole utterance, and so on.

Within this picture, the way in which hypotheses are hierarchically arranged, and priors are updated and can become entrenched, developing a sensitivity to deep, below-the-surface structured statistical regularities in the world, suggests (in line with C&C's suggestion) that acquisition is indeed learning to process. Gone is the need for innate linguistic knowledge (although some of our priors – or our propensity to form them – may be, in some sense, innate). However, learning to process is learning to predict, where this involves being attuned to the dynamic statistical structure of the world, of which language, and language users, are an important part.

In conclusion, although I am sympathetic to the spirit of what C&C present, a more wholesale predictive processing account yields very similar consequences but casts things in a different (and arguably more plausible) light.

ACKNOWLEDGMENT
This research was supported by a Wellcome Trust Strategic Award (WT098455MA).

Authors' Response

Squeezing through the Now-or-Never bottleneck: Reconnecting language processing, acquisition, change, and structure

doi:10.1017/S0140525X15001235, e91

Nick Chater(a) and Morten H. Christiansen(b,c,d)
(a) Behavioural Science Group, Warwick Business School, University of Warwick, Coventry, CV4 7AL, United Kingdom; (b) Department of Psychology, Cornell University, Ithaca, NY 14853; (c) The Interacting Minds Centre, Aarhus University, 8000 Aarhus C, Denmark; (d) Haskins Laboratories, New Haven, CT 06511.
[email protected]
[email protected]

Abstract: If human language must be squeezed through a narrow cognitive bottleneck, what are the implications for language processing, acquisition, change, and structure? In our target article, we suggested that the implications are far-reaching and form the basis of an integrated account of many apparently unconnected aspects of language and language processing, as well as suggesting revision of many existing theoretical accounts. With some exceptions, commentators were generally supportive both of the existence of the bottleneck and of its potential implications. Many commentators suggested additional theoretical and linguistic nuances and extensions, links with prior work, and relevant computational and neuroscientific considerations; some argued for related but distinct viewpoints; a few, though, felt traditional perspectives were being abandoned too readily. Our response attempts to build on the many suggestions raised by the commentators and to engage constructively with challenges to our approach.

R1. Introduction

In our target article, we argued that a powerful and general cognitive constraint, the Now-or-Never bottleneck, has far-reaching consequences for both language comprehension and production. This perspective implies that language acquisition and language change proceed construction-by-construction, rather than involving more abrupt, system-wide shifts. We argued, moreover, that the picture that arises from the Now-or-Never bottleneck has implications for the structure of language itself: Syntactic structure is viewed as processing history, thus enforcing a tight link between the psychology of language processing and linguistic theory.

The Now-or-Never bottleneck is a general cognitive constraint that, we suggest, applies to perception, motor control, reasoning, and memory: Unless information is recoded and/or used rapidly, it is subject to severe interference from an onslaught of further information. Our article explores possible implications of the Now-or-Never bottleneck for language: how it is processed and acquired, how languages change, and the structure of language itself. The argument is that the Now-or-Never bottleneck has profound implications in each of these domains: For example, it requires that processing is incremental and predictive, using a Chunk-and-Pass mechanism; that acquisition is item-based; that languages change construction-by-construction; and that there may be an intimate relationship between language structure and processing.

The commentators on our article have provided a rich variety of perspectives and challenges with which to evaluate and potentially to further develop this account. We have grouped our responses to commentators according to key themes that emerged.

The first set of issues, discussed in section R2, concerns the evidence for, and nature of, the bottleneck. Key questions include: Does the psychological and linguistic evidence support (Ferreira & Christianson; Kempson, Chatzikyriakidis, & Cann [Kempson et al.]; Potter) or contradict (Baggio & Vicario; Chacón, Momma, & Phillips [Chacón et al.]; Endress & Katzir) the existence of the bottleneck? Have we overstated its scope (Levinson; MacDonald)? What is its neural basis (Frank & Fitz; Grossberg; Honey, Chen, Müsch, & Hasson [Honey et al.]; Huyck)? How can the hypothesis be elaborated (Dumitru; Potter)? And, if we accept the existence of the Now-or-Never bottleneck, should it be treated as basic, or as arising from more fundamental principles (e.g., Badets; Bicknell, Jaeger, & Tanenhaus [Bicknell et al.]; Lotem, Kolodny, Halpern, Onnis, & Edelman [Lotem et al.]; Wilkinson)?

A second set of issues, which we discuss in section R3, focuses on the empirical and computational viability of the framework for language processing that we derive from the Now-or-Never bottleneck. According to the Chunk-and-Pass framework, language comprehension requires a succession of increasingly abstract chunking operations, and, at each level, chunking must occur as rapidly as possible and the resulting chunks must be immediately passed to higher levels. The reverse process, where the speaker converts an abstract message into articulatory instructions, is proposed to involve what we term Just-in-Time language production. Key questions include the following: How does the Chunk-and-Pass framework relate to existing theories of language processing, both in psycholinguistics (Bicknell et al.; Chacón et al.; Ferreira & Christianson; MacDonald; O'Grady) and computational linguistics (Gómez-Rodríguez; Huyck)? How do these proposals relate to experimental data (Baggio & Vicario; Healey, Howes, Hough, & Purver [Healey et al.]), including effects of top-down processing (Dumitru; Healey et al.; MacDonald; Potter)? Can our account meet the challenges of interactive dialogue (Badets; Baggio & Vicario; Healey et al.; Kempson et al.; Levinson)? How far does the Chunk-and-Pass approach apply to sign language (Emmorey), and to nonlinguistic domains such as music and action (Lakshmanan & Graham; Maier & Baldwin)?

A third set of issues, which we address in section R4, concerns the implications of the Now-or-Never bottleneck and Chunk-and-Pass processing for language acquisition, evolution, and structure. In our target article, we argued that the bottleneck has far-reaching implications for language across multiple timescales, ranging from the duality of patterning observed across languages (roughly, having distinct phonological and lexical levels) and the locality of most linguistic regularities, to what we take to be the instance-based nature of language acquisition and language change. Key questions include whether our account provides sufficient constraints to explain language acquisition (Endress & Katzir; Lakshmanan & Graham; Wang & Mintz) and how it may be developed further (Lewis & Frank; Maier & Baldwin); and how far the account can explain language change and evolution (Behme; Bergmann, Dale, & Lupyan [Bergmann et al.]; Endress & Katzir; Lewis & Frank; Lotem et al.). Some commentators explore how this approach can be a productive framework for understanding regularities within and across languages (Kempson et al.; O'Grady); others believe that further constraints are required (Chacón et al.; Endress & Katzir; Medeiros, Piattelli-Palmarini, & Bever [Medeiros et al.]; Wang & Mintz).

In the remainder of this response to commentators, we will discuss these three sets of issues in turn before drawing general conclusions and considering directions for future work.

R2. The nature of the Now-or-Never bottleneck

The fleeting nature of memory is illustrated by the finding that our ability to recall arbitrary sequences of sounds is extraordinarily limited (Warren et al. 1969). Yet we are able to process highly complex, nonarbitrary sequences of linguistic input (and, similarly, musical input and action sequences). We proposed that these observations imply that sensory and linguistic information must be used or recoded into higher-level representations right away, to avoid being lost forever.

What is the origin of the Now-or-Never bottleneck? In our target article, we stressed the importance of interference: new input interferes with existing input, particularly between elements that overlap phonologically or semantically. Such interference has been observed at a wide variety of representational levels in studies of memory for serial order (Brown et al. 2007). Likewise, as noted by MacDonald, sentences containing words with overlapping phonological forms and meaning create processing problems (e.g., "The baker that the banker sought bought the house" vs. "The runner that the banker feared bought the house"; Acheson & MacDonald 2011; see also Van Dyke & Johns 2012 for a review).

Another possible origin of the bottleneck stems not from interference but from one or more capacity-limited buffers (discussed by Levinson). So, for example, Miller (1956) famously suggested a capacity of 7±2 chunks in short-term memory, an approach enriched and updated by Baddeley and colleagues (e.g., Baddeley 1992; Baddeley & Hitch 1974). More recently, Cowan (2000) argued for a capacity limit of 4±1 items. Our reading of the recent memory literature is that many, and perhaps all, aspects of memory limitations may best be understood in terms of interference rather than capacity-limited buffers, because the same patterns of forgetting and memory errors are observed over many timescales (e.g., Brown et al. 2007). From this perspective, apparent capacity limitations are a side effect of interference, rather than stemming from, for example, a fixed number of "slots" in memory (see also Van Dyke & Johns 2012).

From the point of view we expressed in the target article, the key issue is the limited nature of the bottleneck, whether it stems primarily from interference, capacity limitations, or a combination of the two. Note, in particular, that memory performance depends on the number of chunks involved, and what counts as a chunk depends on prior experience with relevant material. Hence, the same sequence of phonemes may, over experience, be chunked into a series of syllables or words, or into a single multiword chunk (Jones 2012). We stress, too, that interference effects will operate between chunks – that is, chunks are not merely encapsulated units – so that some of the internal structure of chunks will be retained. This is evident, for example, in phonological interference effects in memory for serial order (Burgess & Hitch 1999). Thus, although some commentators (e.g., Bicknell et al.; MacDonald) seem to have taken our notion of "lossy compression" as indicating a near-total loss of information, we use the term in the standard computer-science sense, as indicating that not all information is retained. More generally, we are able to outline the consequences of the Now-or-Never bottleneck without taking a stand on the exact nature of the underlying memory representations, although, of course, within the general framework developed here, more detailed memory models will allow for more fine-grained predictions about language processing.

Indeed, we suggest that one fruitful direction for research is to explore cognitive models in which processing and memory are not distinct mechanisms. As Honey et al. point out, it may be appropriate to see memory as arising from ongoing neural processing activity, rather than as located in distinct stores (see, e.g., Crowder 1993; Kolers & Roediger 1984). From this viewpoint, processing and memory operations should be located in the same brain regions (Hasson et al. 2015). This perspective has also been applied to accounts of individual differences in language processing, modeled using simple recurrent networks (Elman 1990), in which the same connections and weights encode and process linguistic input (MacDonald & Christiansen 2002). This type of model captures the relationship between language processing and short-term memory performance without any functionally distinct working memory (by contrast with, for example, production-system models such as Just and Carpenter's [1992] CC-READER). As we shall discuss further, in this integrated perspective on memory and processing it is not possible to modify memory capacity independently of processing operations (Christiansen & Chater 2015; 2016; MacDonald & Christiansen 2002). Thus, memory capacity is not a free parameter that can be independently selected for by natural selection (see our discussion of Lotem et al.).

Honey et al. underscore our claim that the Now-or-Never bottleneck implies longer integration timescales for more abstract levels of representation. They substantiate this view with evidence from functional magnetic resonance imaging (fMRI) and intracranial recordings, countering Baggio & Vicario's concern that our multilevel representational approach lacks neural foundations. According to Honey et al., incoming information is continually integrated with prior information – yet once integration has occurred, the resulting interpretation and knowledge updating becomes entrenched and difficult to revise (Ferreira & Christianson). Consistent with such interpretative entrenchment, Tylén et al. (2015) found that when a narrative had a coherent storyline, incidental facts tended to be forgotten if they were not central to the plot. However, when the storyline was jumbled, there was greater recall of incidental semantic facts, presumably because integration was not possible. Importantly, an fMRI version of the same experiment yielded activation of the same cortical hierarchies, from lower-level sensory circuits to higher-level cognitive areas, as noted by Honey et al. (and discussed in the target article).

R2.1. Challenges to the Now-or-Never bottleneck

Several commentators question the severity of the Now-or-Never bottleneck. Some of these concerns, however, focus on consequences that do not follow from the bottleneck. For example, as illustrated by SF's spectacular memory for sequences of numbers chunked by running times, chunking low-level material facilitates memory for that material. More broadly, low-level information is remembered only to the extent that it has been processed. So the Now-or-Never bottleneck does not imply complete amnesia for past low-level sensory or linguistic information: people can, after all, remember tunes and poems by heart. What they cannot do is recall unprocessed sequences of noises or letters, which they are unable to chunk in light of prior experience.

So, although we can remember new words in our own language, recalling a complex sound pattern from a foreign language (e.g., for speakers of English, a word or phrase in Khoisan) will be very difficult. Hence, Endress & Katzir's claim that children can learn a word from a single encounter does not challenge the Now-or-Never bottleneck (the notion of fast mapping has, though, been questioned in some recent studies, e.g., Horst & Samuelson 2008; McMurray et al. 2012).

We stress also (pace Endress & Katzir) that the bottleneck applies equally to explicit and so-called implicit memory (i.e., with or without awareness), if indeed such a distinction can be defended (e.g., Shanks & St. John 1994). Our claim is that memory is dependent on processing, and this remains true irrespective of whether memory is assessed through explicit or implicit measures. For example, many psychology undergraduates will have been exposed to the hard-to-see "Dalmatian" image (see, e.g., Gregory 2005). Famously, once one can see the pattern as a Dalmatian, the Dalmatian interpretation is typically available many years later (e.g., to help segment the image, an implicit measure of memory), and the image will immediately be recognized as familiar and as a Dalmatian (explicit measures). But, of course, people who have not successfully "found the Dalmatian" gestalt will not remember that they have seen this specific pattern of black-and-white marks on a piece of paper or a computer screen many years before. In short, an image is memorable only to the extent that it has been successfully processed. This explains why prior exposure to an image will assist the processing of later copies of the same image: such exposure helps create a "gist" that can be reused, allowing for cumulative learning effects over multiple exposures (see, for example, Endress & Potter 2014a).

Similarly, Bicknell et al. stress that perceptual data are not necessarily immediately forgotten, and we agree. The Now-or-Never bottleneck implies that perceptual or linguistic data that cannot be successfully processed into higher-level representations will suffer severe interference from subsequent material. But where those data can be recoded successfully, more low-level details may be retained, because they are embedded within a richer memory structure that counters interference from subsequent material to some extent. Nonetheless, we would anticipate that recalling such low-level details is likely to be cognitively effortful, although some details may be retained when crucial to the task at hand.

The influence of task constraints is illustrated by a study that Bicknell et al. describe, by Connine et al. (1991), employing a phoneme labeling task. Participants indicate which of two sounds they heard at the beginning of the third word in a sentence, and are instructed to use any available information from the sentence to make their response. The stimuli were ambiguous between a voiced and an unvoiced initial consonant, yielding a blend of dent and tent, followed by a disambiguating context: "When the __ in the fender/forest ..." Therefore, while encoding the word, participants are explicitly instructed to pay attention to the details of the first phoneme. Accordingly, some low-level information is likely to be retained over a short period. Bicknell et al. report their own study indicating slightly longer periods of retention of phonemic information, over six syllables, when participants refrain from responding until the end of the sentence. But this hardly changes the broad message that the "raw" sensory input is rapidly lost, presumably through interference, although some limited information can, as we would predict, be retained through being encoded in larger units (e.g., through retaining a memory of the degree of "ambiguousness" of the word dent or tent).

Note that Connine et al. (1991) highlighted task-specific effects as a possible driver of their results: "One major issue left unresolved by the present research is the degree to which delayed commitment is subject to strategic factors introduced by task specific demands" (p. 246). With this in mind, we can only agree with Bicknell et al. (and also Ferreira & Christianson) that memory (including memory for low-level information encoded into higher-level units) can be used strategically in the service of task goals (e.g., Anderson & Milson 1989; Anderson & Schooler 1991). Indeed, as noted by Potter, our framework seems naturally compatible with allowing newly built structures to be "influenced by the observer's task or goal" (para. 4). Moreover, it is possible that such strategic task-related effects may appropriately be modeled by bounded rational analysis, as Bicknell et al. suggest. Similarly, we suggest that this approach to modeling task-specific effects is compatible with the "good enough" processing model described by Ferreira & Christianson (Ferreira & Swets 2002). We see the Now-or-Never viewpoint as providing a framework within which "good enough" and "boundedly rational" models may fruitfully be integrated.

Whereas Endress & Katzir and Bicknell et al. stress, and we agree, that not all low-level information is lost immediately (though it will be lost if it cannot be processed into higher-level units), Baggio & Vicario argue that the processing of sequential material such as language should not be viewed as a race against time at all. They do not deny the existence of the Now-or-Never bottleneck, but suggest that the brain has a number of mechanisms through which the effects of the bottleneck can be countered, including inference, pragmatics, and skills associated with literacy.

Yet we are not sure that Baggio & Vicario's suggestions change the picture substantially. Focusing for now on reading: even though we can always refixate a word that we have missed or misread, becoming a fluent reader requires overcoming a reading-based analogue of the Now-or-Never bottleneck, for three reasons: (1) memory for visual information is short-lived (60–70 ms; Pashler 1998); (2) visual input is taken in at a fast rate during normal reading (about 200 words per minute; Legge et al. 1985); and (3) memory for visual sequences is limited (to about four items; Luck & Vogel 1997). Because memory for what has just been read is short-lived and subject to rapid interference, we suggest that readers must perform chunking operations on text input as quickly as possible in order to read fluently. Indeed, individual differences in chunking ability predict self-paced reading performance (McCauley & Christiansen 2015b).

R2.2. Is the Now-or-Never bottleneck a side effect of a deeper constraint?

In our target article, we argued that the Now-or-Never bottleneck provides a powerful motivation for online prediction in language processing, and in cognition more broadly. Given the underspecified nature of the sensory and linguistic input, predictive information is required to analyze new input as rapidly as possible, before it is obliterated by the onslaught of further material. Similarly, prediction is required for online learning, in which the disparity between predictions and sensory data can immediately be used to drive learning. According to the Now-or-Never bottleneck, unless the disparity between predictions and input is computed and exploited right away, the sensory information will be lost, and with it, the opportunity for learning.

By contrast, Badets and Wilkinson argue, from different perspectives, that online prediction should not be seen as helping to deal with the Now-or-Never bottleneck, but as the central engine of cognition. There might not be substantial disagreement here, however. A cognitive theory based on prediction still has to specify at which point the error between prediction and sensory or linguistic input is assessed, to guide action and shape learning. The Now-or-Never bottleneck requires that prediction error is calculated and used to drive learning online: If the disparity between prediction and sensory input is not calculated right away, then the sensory input will be lost. Notice that, by contrast, many prediction-based learning methods do not learn online. For example, the parameters in connectionist networks or Bayesian models are often adapted to provide the best fit to the whole "batch" of available data, which typically involves storing and resampling these data throughout learning. Indeed, the requirement for learning to be online is very strong: Online learning algorithms face the danger of so-called "catastrophic interference," where learning new items damages memories of old items (e.g., French 1999). Such catastrophic interference can, as we note, be avoided by using item-based learning models, so that learning from experience involves not refitting the parameters of a model (e.g., a stochastic phrase-structure grammar, or the like), but continually adding to, and then generalizing from, a database of stored exemplars (e.g., an inventory of constructions). Needless to say, sensory experience must be encoded in an abstract form (rather than purely as "raw" acoustic or visual input) to reduce interference with other stored items. In our target article, we argued that item-based learning is a plausible model for language acquisition (Tomasello 2003), and the need for online predictive learning, imposed by the Now-or-Never bottleneck, may favor item-based learning throughout perception and cognition more broadly (e.g., Kolodner 1993; Poggio & Edelman 1990).
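To make the contrast between online and batch learning concrete, the following minimal Python sketch is offered as our own illustration, not as a model from the target article or from any commentary. The online learner updates simple next-word expectations from each prediction error as the input arrives and then discards the input; the batch learner must retain the whole corpus for later refitting. The function names, learning rate, and toy word stream are all hypothetical.

```python
# Illustration only: online, prediction-driven updating vs. batch estimation.
from collections import defaultdict

def online_bigram_learner(word_stream, rate=0.1):
    """Update expectations about the next word immediately, word by word."""
    expectations = defaultdict(lambda: defaultdict(float))
    prev = None
    for word in word_stream:                 # each word is handled in the here-and-now
        if prev is not None:
            predicted = expectations[prev]
            error = 1.0 - predicted[word]    # prediction error for the word actually heard
            predicted[word] += rate * error  # ...used to drive learning right away
            for other in predicted:
                if other != word:
                    predicted[other] *= (1 - rate)
        prev = word                          # only a minimal trace of the input is kept
    return expectations

def batch_bigram_learner(corpus):
    """Batch alternative: the whole corpus must be stored before estimation."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:                  # corpus retained verbatim
        for w1, w2 in zip(sentence, sentence[1:]):
            counts[w1][w2] += 1
    return counts

if __name__ == "__main__":
    stream = "the dog chased the cat and the dog barked".split()
    online = online_bigram_learner(stream)
    print(sorted(online["the"].items(), key=lambda kv: -kv[1])[:2])
```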
From a different theoretical viewpoint, Lotem et al. raise the possibility that the Now-or-Never bottleneck should not necessarily be viewed as a fixed constraint on cognitive machinery, but may instead itself be an adaptation of our learning mechanisms, driven by natural selection (see also Endress & Katzir's discussion of Major & Tank 2004).

The argument of our target article focused on the nature and implications of the Now-or-Never bottleneck, but the question of the origins of the bottleneck is, of course, of great interest. Lotem et al. argue that the bottleneck has been adapted through natural selection to optimize the brain's ability to learn. They note that a wide variety of evidence shows that memory performance varies between individuals and is to some extent heritable. They interpret this variation to suggest that the size of the bottleneck is itself variable, and that this size can potentially be selected for. This viewpoint would, for example, be compatible with theories of memory, mentioned earlier, in which memory consists of one or more capacity-limited buffers (e.g., Baddeley 1992), and hence where the capacity limit can be adjusted (as is appropriate, for example, in thinking about computer RAM or hard disk capacity).

We suggest, by contrast, that human memory and processing are fundamentally integrated and that the Now-or-Never bottleneck arises from interference effects that are unavoidable, given that the same neural and computational machinery is used for successive, potentially strongly overlapping, and hence interfering, inputs (e.g., Brown et al. 2007; Hintzman 1988; Murdock 1983).

From this standpoint, the Now-or-Never bottleneck is not usefully characterized as having a variable size that is subject to independent variation and selection. Rather, the bottleneck emerges from the computational architecture of the brain, and variation in memory performance depends on the effectiveness of Chunk-and-Pass mechanisms in mitigating its impact. So SF's ability to encode streams of digits as running times indicates not a particularly wide "bottleneck" but rather a particularly efficient recoding strategy (Ericsson et al. 1980). Expert chess players are able to recall positions from real chess games by encoding them using a rich set of "chunks" from prior games (yet even top chess players have no memory advantage for "nonsense" chess positions, and neither do they have significantly above-average general visuospatial abilities; Simon & Chase 1973; Waters et al. 2002). Similarly, we suggest that individual differences in the efficacy of language processing operations will depend on being able to draw on a rich set of prior linguistic experiences to efficiently recode linguistic input (Christiansen & Chater 2016; Jones 2012; MacDonald & Christiansen 2002).

From this standpoint, it is not appropriate to see the size of the Now-or-Never bottleneck as a free parameter that can be optimized through selection and variation, as embodied in Lotem et al.'s variable "time-window" in their computer simulations (e.g., Kolodny et al. 2014; 2015a; 2015b). Note, too, that the "window" in this model is typically large (e.g., 50–300 items) compared with the buffers postulated in the study of human memory (Baddeley 2003), so its psychological status is not clear either.

In any case, to the extent that Lotem et al. see the Now-or-Never bottleneck for language as shaped specifically by the linguistic environment, their approach appears to depend on the structure of language being exogenously fixed, to provide a stable target for adaptation of the Now-or-Never bottleneck. But language is not given exogenously; it is shaped by generations of rapid cultural evolution to fit with, among other things, the learning and processing biases of the brain, including the Now-or-Never bottleneck. We have suggested elsewhere that language is shaped by the brain, rather than the brain being shaped by language (Christiansen & Chater 2008). So linguistic regularities will arise from, among other things, the Now-or-Never bottleneck; and hence the Now-or-Never bottleneck is prior to, rather than an adaptation for, the structure of language.

R2.3. Neural plausibility?

How might the Now-or-Never bottleneck be implemented neurally? Grossberg argues that many key aspects of our approach are already embodied in existing computational models of neural function created by his research team, in particular in the notion of Item-Order-Rank (IOR) working memory and in a learning and chunking mechanism called the Masking Field (MF) (for a less detailed discussion along somewhat similar lines, see Huyck). We are sympathetic to the proposal that Chunk-and-Pass processing, and, more broadly, the serial character of high-level thought (e.g., Pashler 1998), derive from the basic operating principles of the brain, understood as carrying out a sequence of parallel constraint satisfaction processes. The data outlined by Honey et al. suggest that each computational step (e.g., chunking and recoding linguistic input) may work in parallel across large areas of the brain, so that multiple processes at the same representational level cannot be carried out simultaneously; hence language processing, and high-level thought more generally, is sequential (e.g., Rumelhart et al. 1986b). If this is right, then the Now-or-Never bottleneck may be a side effect of the basic principles of neural computation, rather than a free parameter that can be readily modified by natural selection (contra Lotem et al.).

Frank & Fitz offer a very different perspective on brain function, inspired by the processing properties of the cerebellum (Fitz 2011). They question the severity of the bottleneck in light of computational results from what they term "reservoir computing," in which an untrained neural network projects a temporal input stream into a high-dimensional space; a second network is trained to read off information from the "reservoir." They report simulations that they take to show that the network can reliably recover complex sequential input after long delays. Interesting as these results are, they seem to provide a poor fit with the large literatures on both human memory limitations and restrictions on language processing. It is thus unclear whether such networks would predict the aspects of language processing discussed in our target article and by other commentators (e.g., Ferreira & Christianson; Grossberg; Kempson et al.).
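For readers unfamiliar with the term, the sketch below shows what a generic echo-state-style "reservoir" setup involves. It is our own toy illustration under simplifying assumptions, not Frank & Fitz's cerebellum-inspired model; the network sizes, scaling, readout method, and delay are arbitrary choices made for demonstration.

```python
# Generic echo-state-network sketch of "reservoir computing" (illustration only).
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 3, 200

W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))     # fixed, untrained input weights
W = rng.uniform(-0.5, 0.5, (n_res, n_res))       # fixed, untrained recurrent weights
W *= 0.9 / max(abs(np.linalg.eigvals(W)))        # keep the spectral radius below 1

def run_reservoir(inputs):
    """Collect reservoir states for a sequence of input vectors."""
    x = np.zeros(n_res)
    states = []
    for u in inputs:
        x = np.tanh(W_in @ u + W @ x)            # the reservoir itself is never trained
        states.append(x.copy())
    return np.array(states)

# Train only a linear readout (ridge regression) to recover the input from k steps back.
T, k = 500, 5
U = rng.integers(0, 2, (T, n_in)).astype(float)  # toy binary input stream
X = run_reservoir(U)
targets = U[:-k]                                 # the input presented k steps earlier
X_delayed = X[k:]
ridge = 1e-3
W_out = np.linalg.solve(X_delayed.T @ X_delayed + ridge * np.eye(n_res),
                        X_delayed.T @ targets)

recalled = X_delayed @ W_out
print("mean abs. error recovering input after", k, "steps:",
      float(np.abs(recalled - targets).mean()))
```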
R3. The case for Chunk-and-Pass language processing

The Now-or-Never bottleneck is a fundamental constraint on memory that the language system deals with through Chunk-and-Pass comprehension and Just-in-Time production. The very phrase Chunk-and-Pass has, to some commentators, suggested a link with the Sausage Machine parsing model of Frazier and Fodor (1978). This has led some commentators to level concerns at the Chunk-and-Pass approach that are more appropriately directed at the Sausage Machine (Bicknell et al.; Chacón et al.; Ferreira & Christianson; Healey et al.; MacDonald). According to the Sausage Machine model, a preliminary syntactic analysis is created within a window of about six words and then shunted off as a packet (like successive sausages coming out of a real sausage machine) to a second stage that completes the syntactic parsing. But although the Sausage Machine has a packet-by-packet character, it differs fundamentally from the Chunk-and-Pass model along at least three key dimensions. First, the Chunk-and-Pass account operates at a variety of representational levels, using units that have been acquired by item-based learning, so Chunk-and-Pass processing is not restricted to the syntactic units used in parsing. Second, while the operation of the Sausage Machine is informationally encapsulated from semantic and pragmatic factors, the Chunk-and-Pass model assumes that all sources of information, from low-level sensory input to pragmatics and world knowledge, are brought to bear online to create and recode chunks at all levels of analysis. Thus, we stress that the Chunk-and-Pass view includes top-down influences (see Dumitru), rather than operating purely bottom-up in a modular fashion (a concern raised by Healey et al., Lotem et al., and MacDonald).

The third difference is that, unlike the Sausage Machine, which postulates cognitively decisive breakpoints at the boundaries between "sausages" (i.e., the phrase structure created by the parser), the Chunk-and-Pass viewpoint allows links (and interference) between items that are not grouped within the same chunk (e.g., words that are not in the same phrase or clause). But the strength of such links will reduce rapidly, in a graded fashion, as the "distance" between items increases, as would be predicted by the memory interference processes that we take to underlie the Now-or-Never bottleneck. Chunk-and-Pass processing implies a strong bias toward local structure in language, but is entirely compatible with the existence of some nonlocal dependencies (see Healey et al.; Levinson; Medeiros et al.). We emphasize that the Now-or-Never bottleneck explains the remarkably, though not completely, local structure of language (as noted by Kempson et al.), with its hierarchy of levels of representation largely corresponding to local sequences of linguistic material. As we outlined in our target article, this contrasts with the batch-coded communication signals used in engineering and computer science, which are optimal within an information-theory framework (Cover & Thomas 2006).

Turning to production, we argued that the Now-or-Never bottleneck implies that once detailed low-level production instructions have been assembled, they must be executed right away, or they will be obliterated by interference from the oncoming stream of later instructions: This is Just-in-Time production. Some commentators (Chacón et al.; Ferreira & Christianson; MacDonald) have taken Just-in-Time production to imply so-called radical incrementality, in which phonological words are articulated immediately, in the absence of any planning ahead. They have rightly noted that such radical incrementality is inconsistent with evidence of task-related effects on production. For example, Ferreira and Swets (2002) showed that participants plan ahead when producing utterances involving the results of arithmetic calculations. Indeed, speakers appear to plan beyond the immediate phonological word, but likely no more than a clause in advance (e.g., Bock & Cutting 1992).

We want to stress, though, that just as comprehension at the discourse level takes place over a relatively long timescale, so does planning at the discourse or conceptual level in production. This is because chunks at the discourse level have a longer duration than articulatory chunks (see Honey et al.). Whereas planning at the level of the phonological word may be quite short in temporal scope, planning will extend further ahead at the level of multi-word combinations (what might traditionally be called the "grammatical level"), and even longer at the conceptual/discourse level (e.g., Smith & Wheeldon 2004). Thus, the evidence that Chacón et al. discuss in this regard (e.g., Lee et al. 2013; Smith & Wheeldon 1999) is not inconsistent with Just-in-Time production.

Nonetheless, it is important to note that people do interleave planning and articulation processes when producing utterances under time pressure (Ferreira & Swets 2002). Given the speed of turn-taking (e.g., as noted by Levinson), such time pressures may be the norm in everyday conversation, limiting the amount of advance planning possible. This is reflected in the patterns of disfluencies observed in production, which are indicative of brief planning ahead at the clausal level (e.g., Ferreira 1993; Holmes 1988). We see this limited planning ahead as compatible with Just-in-Time production, whereby production is limited to just a few chunks ahead for a given level of representation. Crucially, as noted in the target article, such chunks may involve multi-word sequences, which are articulated as units rather than as a chain of individual words (Arnon & Cohen Priva 2013; Bybee & Scheibman 1999). This allows speakers to plan ahead to some degree when required by task demands, though our account suggests that such planning would be limited to a few chunks within a given level of linguistic representation.¹ Future work is needed to develop this perspective on production in more detail.

The Now-or-Never bottleneck, and the processing consequence that follows from it, applies across modalities. Just-in-Time mechanisms of motor planning will be used whether the language output is speech or sign. Similarly, Chunk-and-Pass processing will be required to deal with the onslaught of linguistic material, whether that material is spoken or signed. However, as Emmorey points out, the detailed implications of the Now-or-Never bottleneck may differ between modalities. She notes that the speed of the speech articulators, in contrast to manual gestures, contributes to a rapid serial information-transmission strategy being adopted for speech, whereas greater parallelism is used in signed communication. So, for example, she points out that while spoken words consist of a sequence of phonemes, signed words typically correspond to multiple sign elements (spatial locations and temporally defined movements). Similarly, Emmorey notes that spoken languages deploy affixes temporally before or after the modified item, whereas morphology is usually signaled simultaneously in signed languages. We suggest that differences in sequential learning abilities in the auditory and visual domains may also be important: The perceptual system readily finds sequential structure in auditory material in comparison with visual material (Conway & Christiansen 2005; 2009; Frost et al. 2015); conversely, the visual modality readily creates visual gestalts to encode simultaneously presented movements in one or more effectors (compare Bregman 1990; Wagemans et al. 2012; see also Dumitru).

We have presented Chunk-and-Pass processing as a general solution to the constraints imposed by the Now-or-Never bottleneck. We appreciate the call for proposals concerning how such a framework might be elaborated, for example, with respect to the nature of discourse representations (Chacón et al.), developmental underpinnings (Maier & Baldwin), and the nature of the processing and representational levels used in Chunk-and-Pass processing (Levinson). In this regard, we are encouraged by the detailed examples provided by O'Grady, illustrating how an account of this kind can be elaborated to deal with linguistically complex phenomena such as the wanna contraction, and by his more detailed processing-based explanations of central linguistic phenomena, including binding and quantification across languages (O'Grady 2013; 2015a).

R3.1. Chunk-and-Pass processing and semantic interpretation

Several commentators (e.g., Chacón et al.; Ferreira & Christianson; Frank & Fitz; Honey et al.) rightly stressed that a Chunk-and-Pass model of comprehension must integrate current input with past input to produce a semantic interpretation that can interface with general knowledge. The final stages of such interpretation therefore have to do more than merely chunk linguistic input: Inferential processes will be required to resolve anaphora and other referring expressions (Garnham & Oakhill 1985), first, to bridge between current input and prior linguistic and nonlinguistic context (Clark 1975) and, second, to update beliefs about the speaker's intentions (e.g., Levinson 2000) and about the environment (Gärdenfors & Rott 1995). We argue, though, that the Now-or-Never bottleneck implies that processes of semantic and pragmatic interpretation and belief revision must occur right away, or the opportunity for such interpretation is lost; that is, belief updating, as well as semantic interpretation narrowly construed, is incremental.

The phenomenon of rapid semantic analysis and belief updating is exemplified, for example, in the celebrated demonstration that so-called "close shadowers" (i.e., people able to repeat speech input at a latency of 250–300 ms or even less) are sensitive not only to syntactic structure, but also to semantic interpretation (Marslen-Wilson 1987). Or consider a very different paradigm, in which a potentially baffling paragraph of text is read either with or without an explanatory title or context (Bransford & Johnson 1972). In the absence of the explanatory context, memory for the passage is poor. This means that, even if the clarifying context is provided later, the cognitive system is unable to make much sense of the passage in retrospect. Unless it is understood at the time, the details will be too poorly remembered to be reinterpreted successfully. Potter offers a possible framework for such interpretations in terms of what she calls conceptual short-term memory (CSTM): activations of long-term memory associated with active stimuli and thoughts (Potter 2012). Importantly, she notes that "rich but unselective associations arise quickly but last only long enough for selective pattern recognition" – chunking, in C&C's terms. Thus, CSTM may allow the rapid integration of conceptual information, influenced by task demands and goals, which will facilitate incremental interpretation through Chunk-and-Pass processing. It also enables the building of the kinds of online semantic and discourse-related representations called for by Chacón et al. CSTM may further provide a nonsyntactic basis for the successful processing of nonlocal dependencies (an issue raised by Healey et al.; Medeiros et al.; Levinson).

As noted by Ferreira & Christianson, however, the resulting interpretations may often be rather shallow and underspecified (e.g., Ferreira et al. 2002), with the depth and focus of such "good-enough" representations being affected by task demands (Swets et al. 2008).

This can lead to systematic misinterpretations, such as when participants in a study by Christianson et al. (2001) tended to derive the incorrect interpretation that Mary bathed the baby from the temporally ambiguous sentence "While Mary bathed the baby played in the crib." The difficulty of backtracking appears to be a key contributing factor in such misinterpretations, because the language system has limited opportunity to go back and correctly reinterpret previous input (Slattery et al. 2013).

Our formulation of Chunk-and-Pass processing emphasized the importance of both bottom-up and top-down processes. Indeed, we stressed that the pressure to chunk locally ambiguous speech input rapidly provides a powerful reason to harness the full range of relevant informational sources as quickly as possible. Integrating these sources of information will best predict what input is likely, so that it can be chunked and passed to higher levels of representation as quickly as possible. Parallel models of word recognition (e.g., Marslen-Wilson 1987; McClelland & Elman 1986) nicely exemplify this viewpoint: Acoustic, lexical, semantic, and pragmatic information is brought to bear in real time in order to identify words rapidly; indeed, the "recognition point" for a word is thereby often reached well before the end of the word. We are therefore highly sympathetic to the call from some commentators to highlight the importance of top-down processing (Dumitru; Healey et al.; MacDonald; Potter). Note, though, that top-down expectations from prior context or world knowledge may in some cases also produce misinterpretations, as when study participants misinterpret the sentence "The man bit the dog" as if it was the dog that did the biting (Ferreira 2003; see also Potter). In such cases, higher-level expectations can run ahead of the linguistic input (as emphasized by Dumitru), potentially leading unanticipated linguistic input to be misinterpreted.
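As a toy illustration of the "recognition point" idea (our own sketch, using letters in place of phonemes and a made-up mini-lexicon, not an implementation of Marslen-Wilson's or McClelland and Elman's actual models): as segments arrive, the set of lexical candidates consistent with the input shrinks, and a word is often uniquely identified before its final segment.

```python
# Toy cohort-style sketch of incremental word recognition (illustration only).
LEXICON = ["elephant", "elegant", "eleven", "elbow", "total", "totally"]

def recognition_point(word, lexicon):
    """Return the 1-based position at which 'word' becomes the only candidate."""
    candidates = set(lexicon)
    for i, segment in enumerate(word, start=1):
        # Incrementally discard candidates inconsistent with the input so far.
        candidates = {w for w in candidates if len(w) >= i and w[i - 1] == segment}
        if candidates == {word}:
            return i        # uniquely identified before the word is complete
    return len(word)        # only distinguished at (or after) its final segment

for w in ["elephant", "eleven", "total"]:
    print(w, "->", recognition_point(w, LEXICON), "of", len(w), "segments")
```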
R3.2. The importance of dialogue

Since the turn of the millennium, researchers have become increasingly aware of how viewing language processing in the context of dialogue, rather than considering the isolated production and comprehension of utterances, can have profound implications for the psychology of language (e.g., Pickering & Garrod 2004). We therefore agree with the various commentators who emphasize the centrality of dialogue in assessing the implications of Chunk-and-Pass processing (Badets; Baggio & Vicario; Healey et al.; Kempson et al.; Levinson).

Kempson et al. and Levinson note the theoretical challenges arising from real-world dialogue, in which there is often rapid turn-taking and in which partners may, for example, complete each other's sentences. This possibility seems compatible with the idea that production and comprehension processes are closely intertwined (see Pickering & Garrod 2007; 2013a for reviews). For example, if comprehension involves an "analysis-by-synthesis" reconstruction of the process by which the utterance was produced, then the comprehension process itself creates a representation that can be used to continue the sentence. This is particularly natural within the present framework: The same inventory of chunks can be deployed by both comprehension and production processes. Indeed, a single-system model for processing and producing language in the spirit of the Chunk-and-Pass framework is exemplified in a recent computational model (Chater et al. 2016; McCauley & Christiansen 2013).

The remarkable speed and rapid turn-taking of interactive dialogue (Levinson) presents a considerable challenge to the cognitive system, although the fact that we are able to understand time-compressed speech, which is several times faster than the normal speech rate, strongly suggests that the limiting factor is the rate of articulation rather than comprehension (Pallier et al. 1998). As Levinson points out, the ability of participants to turn-take with latencies of a fraction of a second implies that significant speech planning has occurred before the other speaker has finished; and prediction is, of course, required to plan an appropriate response before a partner's utterance is completed. We suggest, though, that online prediction and incremental interpretation are required even when such constraints are relaxed: Unless the speech signal is recoded right away, it will be obliterated by interference from later material. Thus, the ability to anticipate later material at higher levels of representation (e.g., at a discourse level) requires rapid online analysis at lower levels of representation, so that the output of such analysis can be fed into predictive mechanisms.

Levinson points out that dialogue can include repetitions that span fairly long stretches of material, for example, repeating a query after some intervening comments. Note that this does not present a problem for the present approach, as long as that material has been recoded into a more abstract form. The Now-or-Never bottleneck implies that maintaining a representation of an acoustic stream, a string of arbitrary acoustic instructions, or a string of phonemes will be impossible, but such information can be recalled, at least with much greater accuracy, when encoded into a hierarchy of larger units. Hence, this type of example is entirely compatible with the Now-or-Never bottleneck.

Rapid interactive dialogue often goes wrong: We continually self-correct, or correct each other (Healey et al.). The ability to do this provides further evidence for an incremental Chunk-and-Pass model of language comprehension – the incrementally created chunked representation can then be revised and reformulated, as appropriate, by making fairly local modifications to the chunk structure. We agree with Healey et al., Baggio & Vicario, and Kempson et al. that the ability to switch and repair is a source of strong constraints on theories of processing and, to the extent that the structure of processing matches linguistic structure (Pulman 1985), by extension on syntactic theory, arguably favoring approaches such as dynamic syntax (Healey et al.; Kempson et al.) and construction grammar (O'Grady).

R3.3. Meeting the computational challenges of natural dialogue

Gómez-Rodríguez and Huyck highlight the parallel between our framework and the computational solutions used by engineers to implement real-time natural language processing (NLP). Of particular interest is the observation that such artificial systems are not subject to whatever hardware limitations the human brain may be working under, but nonetheless end up employing the same solution.

One possibility is that the limitations of the brain are actually shaped by the problem, as Lotem et al. suggest. Another possibility is that NLP systems are dealing with human language, which is adapted to the Now-or-Never bottleneck (as discussed further subsequently) and therefore has a very local structure. Artificial NLP systems must process language that embodies these human constraints, and, to replicate natural human conversational interaction successfully, they may need to embody those very same constraints. Importantly, Gómez-Rodríguez argues in favor of the latter, because computers are not limited by memory to the same degree as humans. But these systems face the same problems as humans when interacting with another person: Language needs to be processed in the here-and-now so that responses can be made within a reasonably short amount of time (e.g., there are about 200 ms between turns in human conversation; Stivers et al. 2009). For example, so-called chatbots (e.g., Wallace 2005) receive human language input (in text or voice) and produce language output (in text or synthesized voice) in real time. Because no one is willing to wait even a few seconds for a response, and because we expect responses even to our half-formed, fragmentary utterances, these chatbots need to process language in the here-and-now, just like people. The strategies they employ to do this are revealing. As Gómez-Rodríguez notes, these artificial systems essentially implement the same Chunk-and-Pass processing solutions that we discussed in our target article: incremental processing, multiple levels of linguistic structure, predictive language processing, acquisition as learning to process, local learning, and online learning to predict. We see this convergence as further evidence in favor of the feasibility of Chunk-and-Pass processing as a solution to the pressures from the Now-or-Never bottleneck.

R3.4. Chunk-and-Pass in nonlinguistic domains?

If, as we have argued, the Now-or-Never bottleneck is a domain-general constraint on memory, then we should expect Chunk-and-Pass processing to apply not just to language comprehension, but also to a wide range of perceptual domains. Similarly, it seems likely that the principles of Just-in-Time production may be extended beyond speech production to action planning and motor control in general (MacDonald; Maier & Baldwin). Indeed, as we noted in our target article, planning one's own actions and perceiving the actions of others appear to involve the creation of multilevel representational hierarchies, and we conjecture that Chunk-and-Pass and Just-in-Time processes will operate in these domains (Botvinick 2008; MacKay 1987).

In the target article, we speculated that music might be a domain in which Chunk-and-Pass and Just-in-Time mechanisms might be required to process a highly complex and hierarchically structured auditory sequence of comparable complexity to human language. Lakshmanan & Graham appear skeptical, apparently on the grounds that music and language differ in a number of regards (e.g., music does not have a semantics; music does not involve turn-taking – although improvised styles of music, including jazz and Indian classical music, do frequently involve rapid turn-taking between players). But these concerns seem beside the point when considering the key question at issue: Music and language appear to share a hierarchical organization, and both can be processed highly effectively despite the severe pressure of the Now-or-Never bottleneck, and far better than humans can process unstructured sequences of sounds (Warren et al. 1969). We therefore believe that the Chunk-and-Pass framework might fruitfully be applied in future studies of music and other aspects of perception and action.

R4. Implications for language acquisition, evolution, and structure

The Now-or-Never bottleneck and its processing consequences (Chunk-and-Pass comprehension and Just-in-Time production) have, we argue, implications for how language is acquired, how it changes and evolves over time, and for how we should think about the structure of language. The commentators have raised important issues in each of these domains.

R4.1. Implications for language acquisition

In our target article, we argued that the Now-or-Never bottleneck implies that language learning is online: Learning must occur as processing unfolds, or the linguistic material will be obliterated by later input and learning will not be possible. For parameter-based models of language, this can be difficult – learning seems to require surveying a large corpus of linguistic input to "check" the appropriateness of parameter settings. But if learning must occur online, without the ability to retain and review a large verbatim corpus, then parameter setting is difficult (witness the difficulties of making "trigger" models in the principles-and-parameters tradition [Gibson & Wexler 1994] learn successfully). An item-based model of language acquisition provides an alternative conception of online learning: new constructions can be added to the learner's model of the language one by one. Such item-based models also fit well with empirical evidence on child language acquisition (e.g., Tomasello 2003), as well as with item-based models of linguistic structure, such as construction grammar (e.g., Goldberg 2006).

Wang & Mintz characterize this view as a "model-free" approach to language, contrasting it with a "model-based" perspective incorporating linguistic constraints. They suggest that because our approach involves domain-general learning, it is unable to capture many of the constraints on linguistic structure (such as the apparent sensitivity to structure, rather than linear order, in question formation²). This suggestion incorrectly presupposes that domain-general learning necessarily has to be constraint-free. All too often it is implicitly assumed either that language acquisition is guided by (presumably innate) linguistic constraints or that there can be no constraints at all. But this is, of course, a false dichotomy. Indeed, we have argued elsewhere (e.g., Christiansen & Chater 2008; 2016) that there are substantial constraints on language, deriving from a wide variety of perceptual, communicative, and cognitive factors (we discuss this point subsequently).

Of these constraints, the Now-or-Never bottleneck is of particular importance, but it is not the only one, and so we agree with Wang & Mintz that many additional constraints will shape both language itself and our ability to acquire it.

The stronger the confluence of multiple cognitive and other biases that shape language, the easier language will be to learn, because each generation of learners simply has to "follow in the footsteps" of past learners. Language has been shaped by many generations of cultural evolution to fit with our learning and processing biases as well as possible. Thus, considering language as culturally evolved to be easy to learn and process helps explain why language is learned so readily (e.g., Chater & Christiansen 2010). This viewpoint fits nicely with the iterated learning studies (Kirby et al. 2008; Reali & Griffiths 2009) described by Lewis & Frank, and with their emphasis on language as emerging from the interaction of cognitive and communicative pressures.

Compatible with this viewpoint, and in contrast to Lakshmanan & Graham's suggestion of acquisition guided by (unspecified) innate grammatical mechanisms, Kelly et al. (2014), in their survey of the acquisition of polysynthetic languages, highlighted the importance of several properties of the input in explaining children's patterns of acquisition. For example, children learning Quiché Mayan and Mohawk initially produce the most perceptually prominent units of speech, and such perceptual salience also appears to play a role in the acquisition of Navajo, Inuktitut, Quechua, and Tzeltal (Lakshmanan & Graham's stipulations notwithstanding). Another property of the input – frequency – has been shown by Xanthos et al. (2012) to be key to the acquisition of complex morphology across a typologically diverse set of languages: French, Dutch, and German (weakly inflecting languages); Russian, Croatian, and Greek (strongly inflecting languages); and Turkish, Finnish, and Yucatec Maya (agglutinating languages). Using corpus analyses, Xanthos et al. (2012) found that the frequency of different morphological patterns predicted the speed of acquisition of morphology, consistent with usage-based suggestions regarding the importance of variation in the input for learning complex patterns in language (e.g., Brodsky et al. 2007) as well as for distributional learning more generally (e.g., Gómez 2002).

Lakshmanan & Graham suggest that without "independent grammatical mechanisms" Chunk-and-Pass processing cannot explain children's acquisition of "free word order" languages such as Tamil. However, a recent computational model by McCauley and Christiansen (2014; 2015c) casts doubt on this claim. This chunk-based learner (CBL) implements Chunk-and-Pass processing at the word level, using simple statistical computations to build up an inventory of chunks consisting of one or more words, when exposed to child-directed speech from a typologically broad set of 29 Old World languages, including Tamil. Importantly, the model works entirely incrementally, using online learning, as required by the Now-or-Never bottleneck. Following our idea, expressed in the target article, that acquisition involves learning how to process language, CBL gradually learns simplified versions of both comprehension and production. "Comprehension" consists of the chunking of natural child-directed speech, presented to the model word by word (essentially, a variation of "shallow parsing," in line with evidence for the relatively underspecified nature of child and adult language comprehension; e.g., Frank & Bod 2011; Gertner & Fisher 2012; Sanford & Sturt 2002 – see also Ferreira & Christianson). In "production," the task of CBL is to recreate the child utterances encountered in the corpus, given the inventory of chunks learned thus far in the acquisition process. When exposed to a corpus of Tamil child-directed speech, the model was able to use its inventory of chunks to successfully produce a large proportion of the child utterances in that corpus in the absence of "independent grammatical mechanisms." Indeed, CBL performed as well on Tamil as it did on Mandarin and English. Although not a definitive proof, the CBL simulations do suggest that Chunk-and-Pass processing may be more powerful than a priori speculations might suggest.³ This underscores the importance of implementing theoretical accounts computationally – whether these accounts are usage-based or rely on innate grammatical mechanisms – in order to determine the degree to which they account for actual linguistic behavior.

Maier & Baldwin raise important questions about how item-based acquisition gets off the ground: For example, what principles can the learner use to establish the basic units from which structures can be built? One possible answer is that information-theoretic properties of the sequence (e.g., points of unusually low predictability) may provide clues to chunk boundaries. A simplified version of this approach is employed by the CBL model (McCauley & Christiansen 2014; 2015c), which uses dips in backward transitional probabilities (which infants track; cf. Pelucchi et al. 2009) to chunk words together. Another approach might discover chunks by way of undersegmentation, essentially treating intonational units as preliminary chunks. The PUDDLE (Phonotactics from Utterances Determine Distributional Lexical Elements) model of word segmentation (Monaghan & Christiansen 2010) adopts this method and is able to build a vocabulary by using shorter chunks to split up larger chunks. For example, the model is able to use the frequent occurrence of a child's name in isolation to segment larger utterances in which that name also appears, mirroring the kind of developmental data (e.g., Bortfeld et al. 2005) that Maier & Baldwin mention. As discussed in McCauley et al. (2015), the two ways of discovering chunks in CBL and PUDDLE likely occur side by side in development, possibly alongside other mechanisms. Future research is needed to fully understand the interplay between these different mechanisms and their specific characteristics. Fortunately, as Maier & Baldwin point out, there is considerable empirical evidence that can potentially help constrain models of initial chunk formation (for reviews, see, e.g., Arnon & Christiansen, submitted; Werker et al. 2012).
R4.2. Implications for language change and language evolution

Item-based models of language processing and acquisition imply an item-based model of language change. So, assuming that items can be identified with constructions at various levels of abstraction (e.g., from individual lexical items all the way to constructions determining, for example, canonical word order), then the structure of the language, both within a person and across individuals, will change construction-by-construction, rather than through the flipping of an abstract parameter, which may have diverse and widespread implications (e.g., Lightfoot 1991). Note, though, that more abstract constructions may be relevant to a large number of sentences of the language. So the fact that the language changes one construction at a time does not imply, for example, that it changes one sentence at a time.

We see language change within a given language community as an accumulation of changes within the set of constructions acquired by the members of that community. And we view the evolution of language as nothing more than language change writ large. In particular, this implies that we see the evolution of language as a result of processes of cultural evolution over long periods of human history, constrained by communicative goals as well as our cognitive and neural machinery, rather than resulting from the biological evolution of a language faculty through processes of natural selection or some other mechanism. In short, language is shaped by the brain, rather than the brain being shaped by language (Chater & Christiansen 2010; Christiansen & Chater 2008; Chater et al. 2009). In the target article, we aimed to expand on this perspective by exploring how specific properties of language, such as its highly local structure, the existence of duality of patterning, and so on, might arise given the powerful constraint imposed by the Now-or-Never bottleneck.

In this light, Endress & Katzir's concern that we may be conflating the cultural and biological evolution of language can be set aside; we explicitly reject the idea that there is any substantive biological evolution of language (any more than there has been substantive biological evolution of any other cultural form, whether writing, mathematics, music, or chess), although, of course, there will be an interesting biological evolutionary story to tell about the cognitive and neural precursors upon which language has been built. Similarly, Lotem et al.'s worry that we have forgotten about biological evolution is also misplaced. The "fit" between language and language users arises because language is a cultural product that is shaped around us (and our memory limitations), rather than a fixed and exogenously given system to which the brain must adapt.

Indeed, our perspective aligns with Charles Darwin's suggestion that the cultural evolution of language can be viewed as analogous to biological evolution through natural selection. As early as in The Descent of Man, Darwin discussed the cultural evolution of linguistic forms in light of biological adaptation: "The formation of different languages and of distinct species, and the proofs that both have been developed through a gradual process, are curiously the same" (Darwin 1871, p. 59). One of the great challenges of evolution by natural selection is to explain how biological organisms can increase in complexity. Darwin's answer was that such complexity may be favored if it increases the number of offspring at the next generation – that is, if it improves "fitness." A parallel challenge arises for explaining the presumed increase in complexity of human languages, from, we may assume, initially limited systems of signed or vocal communication, to the huge richness in phonology, morphology, vocabulary, and syntax of contemporary natural languages – an issue raised by Behme. Indeed, gradual increases in complexity can happen relatively quickly, as indicated by the fact that children can "outperform" the adults from whom they learn language (Singleton & Newport 2004), and by the incremental incorporation of new linguistic structures into emergent languages such as the Nicaraguan Sign Language (Senghas et al. 2004) or the Al-Sayyid Bedouin Sign Language (Sandler 2012). The pressure for such increases in complexity seems clear: the drive to communicate. While some theorists have argued that language is not primarily "designed" for communication, but rather for thought (e.g., Chomsky 2010), we suggest that the social importance of communication underlies the continual generation of new linguistic items and the recombination of existing items in creative new ways. Of course, such forms are then subject to the forces of simplification and erosion when they are transmitted across generations of speakers – the forces described by theories of grammaticalization. The picture of language as attempting to maximize communication richness in the face of memory constraints is elegantly outlined by Lewis & Frank.

Bergmann et al. note that language change can be affected by the nature of the language community. For example, the presence of a large number of second-language speakers (and the properties of their first language) will affect how the new language is processed and transmitted. After all, the Chunk-and-Pass machinery built for a first language will typically be recruited to process a second language, resulting in non-native patterns of chunking. Preliminary support for this perspective comes from analyses of the productions of first (L1) and second (L2) language learners using the earlier mentioned CBL model (McCauley & Christiansen 2014a; 2015c). McCauley and Christiansen (2015a) used the CBL model to compare the "chunkedness" of productions by native Italian speakers learning English or German with that of either child or adult native speakers of English and German. The results showed that, compared to those of the L2 speakers, the productions of the native speakers – whether children or adults – were considerably more "chunked," as measured by repeated multiword sequences. The inability of L2 speakers to chunk incoming input in a native-like way is likely to negatively influence their mastery of fundamental regularities such as morphology and case (Arnon & Christiansen, submitted). In languages with a preponderance of non-native speakers, the L2 learners may exert a greater pressure to regularize and otherwise simplify the language, as Bergmann et al. point out. Thus, the impact of the Now-or-Never bottleneck and the specifics of Chunk-and-Pass processing will vary to some degree based on individual experiences with particular languages.
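As a rough illustration of the kind of "chunkedness" comparison just described, the sketch below scores a set of productions by the proportion of their multiword sequences (here, 2- to 4-grams) that recur; the n-gram range, the normalization, and the toy inputs are assumptions made for illustration and are not the actual measure used by McCauley and Christiansen.

    from collections import Counter

    def chunkedness(utterances, n_min=2, n_max=4):
        """Illustrative score: the proportion of multiword sequences (n_min..n_max words)
        in a set of utterances that are repeated, i.e., occur more than once."""
        counts = Counter()
        for utt in utterances:
            words = utt.split()
            for n in range(n_min, n_max + 1):
                for i in range(len(words) - n + 1):
                    counts[tuple(words[i:i + n])] += 1
        total = sum(counts.values())
        repeated = sum(c for c in counts.values() if c > 1)
        return repeated / total if total else 0.0

    # Productions that reuse the same multiword sequences score as more "chunked"
    # than productions that recombine words more freely.
    native = ["i want the ball", "i want the car", "i want the dog"]
    learner = ["want i ball the", "the car want i", "dog the i want"]
    print(chunkedness(native), chunkedness(learner))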
Viewing language change as operating construction-by-construction does not necessarily rule out the possibility of abrupt change, as we noted earlier – modifying a single abstract construction (e.g., a ditransitive construction, Subj V Obj1 Obj2) may have far-reaching consequences. Hence, we can disregard Endress & Katzir's contention that our approach is inconsistent with a part of the literature, which they suggest reports that language change is abrupt and substantial. The question of whether language change provides evidence for modifications of "deep" linguistic principles or operates construction-by-construction is by no means settled in the literature, as is the question of whether macroscopic linguistic change in a community over historical time is actually abrupt at the level of individual speakers (e.g., Hopper & Traugott 1993; Lightfoot 1991; Wang 1977) – an issue parallel to the gradualist vs. punctuated equilibrium controversy in biological evolutionary theory (e.g., Dawkins 1986; Eldredge & Gould 1972).

If compelling evidence could be found suggesting that language change involves modification of highly abstract linguistic principles not embedded in a single construction, then this would contradict the item-based model that we see as following from the Now-or-Never bottleneck. But we do not believe that the literature provides such evidence.

R4.3. Implications for language structure

Commentators provide a wide range of viewpoints concerning the relationship between the present account and language structure. A particular concern is how far the Now-or-Never bottleneck is able to capture so-called language universals. We stress that we see broad patterns across languages, whether exception-less or merely statistical universals, as arising from the interaction of a multitude of constraints, including perceptual and cognitive factors, communicative pressures, the structure of thought, and so on (Christiansen & Chater 2008). Moreover, the trajectory of change observed for a particular language will also be determined by a range of cultural and historical forces, including sociolinguistic factors, language contact, and so on. In view of the interaction of this broad range of factors, it may be unlikely that many aspects of language are strictly universal, and indeed, human languages do seem to exhibit spectacular variety, including on such basic matters as the nature and number of syntactic categories. Yet even if strict language universals are, to some degree at least, a myth (Evans & Levinson 2009), we should nonetheless expect that language will be shaped, in part, by cognitive constraints, such as the Now-or-Never bottleneck.

In this light, concerns that the Now-or-Never bottleneck does not provide an account of all putatively universal features of language (Chacón et al.; Medeiros et al.; Endress & Katzir) can be set aside. Explaining the cross-linguistic patterns they mention using the aforementioned multiple constraints is likely to be a valuable direction for future research. Indeed, we would argue that the Now-or-Never bottleneck is a specific and concrete example of the type of cognitive constraint that Medeiros et al. believe to underlie universal or near-universal features of language.

Kempson et al. argue that the Now-or-Never bottleneck and its implications have interesting links with formal theories of grammar, such as Dynamic Syntax, in which there is a close relationship between grammatical structure and processing operations. Similarly, O'Grady suggests that the processing bottleneck is manifested differently in different languages. We agree insofar as memory limitations arise from the interaction of the cognitive system with the statistical structure of the language being learned. O'Grady's specific proposals here and elsewhere (2013; 2015a) provide a promising direction for the development of a detailed – and cross-linguistically valid – analysis, linking structure and processing in a way that is consistent with the Now-or-Never bottleneck.

One property mentioned by several commentators as being a widespread (Chacón et al.; Levinson; MacDonald) if not universal property of language (Medeiros et al.) is the existence of nonlocal dependencies. We have provided a broad account of complex recursive structures incorporating long-distance dependencies elsewhere (Christiansen & Chater 2015; 2016). Here, we briefly discuss an often-cited example of long-distance dependencies in the form of center-embedding, as exemplified in (1) and (2), where the subscripts indicate subject-noun/verb relationships:

1. The chef1 who the waiter2 appreciated2 admired1 the musicians.
2. The chef1 who the waiter2 who the busboy3 offended3 appreciated2 admired1 the musicians.

Whereas (1) is easy to comprehend, (2) creates problems for most people (e.g., Blaubergs & Braine 1974; Hakes et al. 1976; Hamilton & Deese 1971; Wang 1970). This problem with multiple long-distance dependencies is not unique to English but has also been observed for center-embedded constructions in French (Peterfalvi & Locatelli 1971), German (Bach et al. 1986), Spanish (Hoover 1992), Hebrew (Schlesinger 1975), Japanese (Uehara & Bradley 1996), and Korean (Hagstrom & Rhee 1997). Indeed, corpus analyses of Danish, English, Finnish, French, German, Latin, and Swedish (Karlsson 2007) indicate that doubly center-embedded sentences such as (2) are practically absent from spoken language. Evidence from sequence learning suggests that the problems with multiple center-embeddings do not derive from semantic or referential complications but rather are due to basic memory limitations for sequential information (de Vries et al. 2012), as discussed in the target article. These memory limitations may even result in the kind of "illusion of grammaticality" noted by Chacón et al., as when the second verb in (2) is removed to yield the sentence in (3), which to many people seems quite acceptable and even comprehensible (e.g., Christiansen & MacDonald 2009; Gibson & Thomas 1999; Vasishth et al. 2010):

3. The chef1 who the waiter2 who the busboy3 offended3 admired1 the musicians.
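One way to see why (2) strains memory while (3) can slip through is to track how many subject nouns are still waiting for their verbs at each point in the sentence. The short sketch below does this over hand-annotated noun/verb indices; the annotation scheme and the simple counter are illustrative assumptions, not a model of Chunk-and-Pass processing itself.

    def open_dependencies(tagged_words):
        """Count, after each word, how many subject nouns still await their verb.
        Tags: 'N<i>' marks subject noun i, 'V<i>' marks the verb that discharges it."""
        waiting, trace = set(), []
        for word, tag in tagged_words:
            if tag.startswith("N"):
                waiting.add(tag[1:])
            elif tag.startswith("V"):
                waiting.discard(tag[1:])
            trace.append((word, len(waiting)))
        return trace

    # Sentence (2): the load climbs to three unresolved subject nouns mid-sentence.
    sentence2 = [("chef", "N1"), ("waiter", "N2"), ("busboy", "N3"),
                 ("offended", "V3"), ("appreciated", "V2"), ("admired", "V1"), ("musicians", "-")]
    # Sentence (3): "appreciated" (V2) is missing, so noun 2 is never discharged,
    # yet the word-by-word profile looks similar, which is one way to think about
    # the "illusion of grammaticality" discussed above.
    sentence3 = [("chef", "N1"), ("waiter", "N2"), ("busboy", "N3"),
                 ("offended", "V3"), ("admired", "V1"), ("musicians", "-")]
    for sentence in (sentence2, sentence3):
        print(open_dependencies(sentence))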
However, these memory limitations interact with the statistics of the language being used (as discussed previously), such that the above "missing verb" effect can be observed in French (Gimenes et al. 2009) but not in German (Vasishth et al. 2010) or Dutch (Frank et al. 2016). Because verb-final constructions are common in German and Dutch, requiring the listener to track dependency relations over a relatively long distance, substantial prior experience with these constructions likely has resulted in language-specific processing improvements (see also Engelmann & Vasishth 2009; Frank et al. 2016, for similar perspectives). Nonetheless, in some cases the missing verb effect may appear even in German, under conditions of high processing load (Trotzke et al. 2013). We would expect that other nonlocal dependencies (e.g., as noted by Chacón et al., Levinson, Medeiros et al., and MacDonald) would be amenable to similar types of explanation within the framework of Chunk-and-Pass processing (as also noted by Kempson et al. and O'Grady).

R5. Conclusions and future directions

Our target article highlights a fundamental constraint imposed by memory interference on the processing and production of sequential material, and in particular, on language. Dealing with this Now-or-Never bottleneck requires, we argue, the chunking and recoding of incoming material as rapidly as possible, across a hierarchy of representational levels (this is Chunk-and-Pass processing). Similarly, it requires specifying the representations involved in producing language just before they are used (this is Just-in-Time production). These proposals themselves have, we suggest, a variety of implications for language structure (e.g., that such structure is typically highly local), for acquisition, and for language change and evolution (e.g., that language changes construction-by-construction both within individuals during learning, and over generations within entire language communities).

The commentaries on our article have raised important issues of clarification (e.g., differentiating the present proposals from bottom-up, syntax-driven models such as the Sausage Machine, Frazier & Fodor 1978); have clarified important links with prior models and empirical results (e.g., the link with "good enough" parsing, Ferreira & Christianson); and have outlined supporting evidence (e.g., from the time-course of neural activity involved in language processing, e.g., Honey et al.) and pointed out ways in which the approach can be deepened and made more linguistically concrete (O'Grady). One commentator fears that our proposals may be unfalsifiable (Levinson); others suspect that our approach may actually be falsified by known features of language structure (Medeiros et al.), processing (MacDonald), acquisition (Wang & Mintz), or language change (Endress & Katzir). We hope that our target article will persuade readers that memory constraints have substantial implications for understanding many aspects of language, and that our response to commentators makes the case that the many claims flowing from the Now-or-Never bottleneck are compatible with what is known about language (although not always with what is presumed to be the case by prior theories). Most important, we encourage interested readers to continue the work of the many commentators who provide constructive directions to further explore the nature of the Now-or-Never bottleneck, further elaborate and test the Chunk-and-Pass and Just-in-Time perspectives on language processing, and help integrate the study of these performance constraints into our understanding of key aspects of language structure, acquisition, and evolution (for some steps in this direction, see Christiansen & Chater 2016).

NOTES
1. Chacón et al. contend that "early observations about speech errors indicated that exchange errors readily cross phrasal and clausal boundaries (Garrett 1980)" (para. 7). A careful reading of Garrett, however, shows that most exchange errors tend to occur within phrases, as would be expected from our perspective.
2. Wang & Mintz seem to have misunderstood the aim of the modeling by Reali and Christiansen (2005). Their point was not to provide a full-fledged model of so-called auxiliary fronting in complex yes/no questions (such as Is the dog that is on the chair black?) but rather to demonstrate that the input to young children provided sufficient statistical information for them to distinguish between grammatical and ungrammatical forms of such sentences. Kam et al. (2008) noted some limitations of the simplest bigram model used by Reali and Christiansen, but failed to address the fact that not only did the model fit the results from the classic study by Crain and Nakayama (1987) but also correctly predicted that children should make fewer errors involving high-frequency word chunks compared to low-frequency chunks in a subsequent question elicitation study (Ambridge et al. 2008; see Reali & Christiansen 2009). For example, higher rates of auxiliary-doubling errors occur for questions where such errors involved high-frequency word category combinations (e.g., more errors such as *Is the boy who is washing the elephant is tired? than *Are the boys who are washing the elephant are tired?). Most important for current purposes is the fact that Reali and Christiansen – in line with our account of Chunk-and-Pass processing – do not assume that distributional information is all there is to language acquisition: "Young learners are likely to rely on many additional sources of information (e.g., semantic, phonological, prosodic) to be able to infer different aspects of the structure of the target language" (Reali & Christiansen 2009, p. 1024).
3. Endress & Katzir (see also Wang & Mintz) raise a common concern relating to usage-based models: that the sparseness of the input will prevent them from being able to process novel word sequences that are grammatical but not predictable (such as Evil unicorns devour xylophones). Reali et al. (2005) addressed this challenge head-on, showing in a statistical learning experiment that human participants become sufficiently sensitive to the regularities of training examples to recognize novel sequences whose bigram transitions are absent in training. They subsequently showed that a simple recurrent network (Elman 1990) could correctly process sequences that contain null-probability bigram information by relying on distributional regularities in the training corpus. Thus, in contrast to the claims of Endress & Katzir, distributional learning appears to be sufficiently powerful to deal with unpredictable but grammatical sequences such as Chomsky's (1957) famous sentence Colorless green ideas sleep furiously (see also Allen & Seidenberg 1999).

References
[The letters "a" and "r" before authors' initials stand for target article and response references, respectively]

Acheson, D. J. & MacDonald, M. C. (2011) The rhymes that the reader perused confused the meaning: Phonological effects on on-line sentence comprehension. Journal of Memory and Language 65:193–207. [MCM, rNC]
Adams, J. A. (1971) A closed-loop theory of motor learning. Journal of Motor Behavior 3:111–50. [AB]
Allen, J. & Seidenberg, M. S. (1999) The emergence of grammaticality in connectionist networks. In: The emergence of language, ed. B. MacWhinney, pp. 115–51. Erlbaum. [rNC]
Allopenna, P. D., Magnuson, J. S. & Tanenhaus, M. K. (1998) Tracking the time course of spoken word recognition using eye movements: Evidence for continuous mapping models. Journal of Memory and Language 38:419–39. [aMHC]
Altmann, G. T. M. (2002) Learning and development in neural networks: The importance of prior experience. Cognition 85:43–50. [aMHC, MLD]
Altmann, G. T. M. (2004) Language-mediated eye movements in the absence of a visual world: The "blank screen paradigm." Cognition 93:79–87. [aMHC]
Altmann, G. T. M. & Kamide, Y. (1999) Incremental interpretation at verbs: Restricting the domain of subsequent reference. Cognition 73:247–64. [aMHC]
Altmann, G. T. M. & Kamide, Y. (2009) Discourse-mediation of the mapping between language and the visual world: Eye movements and mental representation. Cognition 111:55–71. [MLD, aMHC]
Altmann, G. T. M. & Mirkovic, J. (2009) Incrementality and prediction in human sentence processing. Cognitive Science 33:583–609. [aMHC]
Altmann, G. T. M. & Steedman, M. J. (1988) Interaction with context during human sentence processing. Cognition 30:191–238. [aMHC]
Ambridge, B., Rowland, C. & Pine, J. (2008) Is structure dependence an innate constraint? New experimental evidence from children's complex question production. Cognitive Science 32:222–55. [rNC]
Ames, H. & Grossberg, S. (2008) Speaker normalization using cortical strip maps: A neural model for steady state vowel categorization. Journal of the Acoustical Society of America 124:3918–36. [SG]
Anderson, J. R. (1990) The adaptive character of thought. Erlbaum. [aMHC, KB]
Anderson, J. R. & Milson, R. (1989) Human memory: An adaptive perspective. Psychological Review 96:703. [rNC]
Anderson, J. R. & Schooler, L. J. (1991) Reflections of the environment in memory. Psychological Science 2:396–408. [rNC]
Anderson, M. L. (2010) Neural reuse: A fundamental organizational principle of the brain. Behavioral and Brain Sciences 34:245–66. [AB, AL]
Anderson, P. W. (1972) More is different. Science 177:393–96. [MLL]
Aoshima, S., Phillips, C. & Weinberg, A. (2004) Processing filler-gap dependencies in a head-final language. Journal of Memory and Language 51:23–54. [DAC]
Appelt, D., Hobbs, J., Bear, J., Israel, D. & Tyson, M. (1993) FASTUS: A finite-state processor for information extraction from real-world text. In: International Joint Conference on Artificial Intelligence, Chambery, France, August 1993, ed. A. Ralescu, pp. 1172–78. [CRH]
Arbib, M. A. (2005) From monkey-like action recognition to human language: An evolutionary framework for neurolinguistics. Behavioral and Brain Sciences 28:105–24. [aMHC]

59 References/ Christiansen & Chater: The Now-or-Never bottleneck: A fundamental constraint on language . Arbib, M. A. (2012) How the brain got language: The mirror system hypothesis Bentz, C. & Winter, B. (2013) Languages with more second language Oxford University Press. [DPM] learners tend to lose nominal case. Language Dynamics and Change Arnon, I. & Christiansen, M. H. (submitted) The role of multiword building blocks in 27. [TB] – 3:1 Berwick, R. C. (1985) The acquisition of syntactic knowledge . MIT Press. [aMHC] explaining L1-L2 differences [aMHC, rNC] Berwick, R. C., Chomsky, N. & Piattelli-Palmarini, M. (2013) Poverty of the stimulus – Arnon, I. & Clark, E. V. (2011) Why brush your teeth is better than teeth Child- Rich languages from poor inputs stands: Why recent challenges fail. In: , ed. ’ ren s word production is facilitated in familiar sentence-frames. Language – M. Piattelli-Palmarini & R. C. Berwick, pp. 19 42. Oxford University Press. 29. [aMHC] – Learning and Development 7:107 [DPM] Arnon, I. & Cohen Priva, U. (2013) More than words: The effect of multi-word Berwick, R. C., Friederici, A. D., Chomsky, N. & Bolhuis, J. J. (2013) Evolution, Language and Speech frequency and constituency on phonetic duration. – Trends in Cognitive Sciences brain, and the nature of language. 100. 17:91 73. [aMHC, rNC] 56:349 – [aMHC] Arnon, I. & Snider, N. (2010) More than words: Frequency effects for multi-word Berwick, R. C. & Weinberg, A. S. (1984) The grammatical basis of linguistic per- – Journal of Memory and Language phrases. 62:67 82. [aMHC] formance . MIT Press. [aMHC] Aronoff, M., Meir, I. & Sandler, W. (2005) The paradox of sign language morphol- Cognition and the Bever, T. (1970) The cognitive basis for linguistic structures. In: 44. [KE] – 81(2):301 Language ogy. 362. Wiley. [aMHC, – , ed. J. R. Hayes, pp. 279 development of language Austin, J. L. (1962) How to do things with words . Harvard University Press. [aMHC] DPM] Bach, E., Brown, C. & Marslen-Wilson, W. (1986) Crossed and nested dependencies Bever, T. G. (1975) Cerebral asymmetries in humans are due to the differentiation of Language and Cognitive in German and Dutch: A psycholinguistic study. Annals of the New York two incompatible processes: Holistic and analytic. Processes – 62. [rNC] 1:249 62. [DPM] 263(1):251 – Academy of Sciences Baddeley, A. (1987) Working memory . Oxford University Press. [SCL] ’ cation of correlated activity: Hebb fi Bi, G. & Poo, M. (2001) Synaptic modi s postu- – 255:556 59. [rNC] Science Baddeley, A. (1992) Working memory. late revisited. Annual Review of Neuroscience 24:139 – 66. [SLF] Baddeley, A. D. & Hitch, G. J. (1974) Working memory. The Psychology of Learning Bickel, B., Banjade, G., Gaenszle, M., Lieven, E., Paudyal, N., Rai, I. P., Rai, M., Rai, 8:47 89. [rNC] – and Motivation x ordering in Chintang. N. K. & Stoll, S. (2007) Free pre fi Language 83(1):43 – Baddeley, R. & Attewell, D. (2009) The relationship between language and the 73. [SCL] environment: Information theory shows why we have only three lightness terms. Behavioral and Brain Bickerton, D. (1984) The language bioprogram hypothesis. 20:1100 – Psychological Science 107. [MLL] Sciences – 221. [aMHC] 7:173 Badets, A., Koch, I. & Philipp, A. M. (2016) A review of ideomotor approaches Bicknell, K. & Levy, R. (2010) A rational model of eye movement control in reading. to perception, cognition, action, and language: Advancing a cultural recy- In: Proceedings of the 48th annual meeting of the Association for Computational 15. 
doi: 10.1007/s00426-014- – 80:1 Psychological Research cling hypothesis. ̌ 78. Asso- , S. Carberry, S. Clark, & J. Nivre, pp. 1168 Linguistics – , ed. J. Hajic 0643-8. [AB] ciation for Computational Linguistics. [KB] Badets, A. & Osiurak, F. (2015) A goal-based mechanism for delayed motor inten- Bicknell, K., Tanenhaus, M. K. & Jaeger, T. F. (2015) Listeners can maintain and tion: Considerations from motor skills, tool use and action memory. Psycho- rationally update uncertainty about prior words. Manuscript submitted for 60. [AB] 79:345 logical Research – publication. [KB] Badets, A. & Rensonnet, C. (2015) Une approche idéomotrice de la cognition. nite trees. fi Blackburn, S. & Meyer-Viol, W. (1994) Linguistics, logic and Bulletin of – 115:591 ’ L Année psychologique 635. [AB] 29. [RK] – Interest Group in Pure and Applied Logics 2:3 Baker, M. (2001) The atoms of language . Basic Books. [aMHC] Blaubergs, M. S. & Braine, M. D. S. (1974) Short-term memory limitations on de- . Cambridge University Baker, M. C. (2008) The syntax of agreement and concord – 102:745 Journal of Experimental Psychology coding self- embedded sentences. Press. [DPM] 48. [rNC] Baker, M. C. (2013) On agreement and its relationship to case: Some generative Blokland, G. A. M., McMahon, K. L., Thompson, P. M., Martin, N. G., de Zubicaray, 32. [DPM] – 130(June):14 Lingua ideas and results. Child contribution to the achievement of joint reference. ’ G. I. & Wright, M. J. (2011) Heritability of working memory brain activation. Baldwin, D. (1991) Infants 62:874 – 90. [MLL] Development 90. [AL] – The Journal of Neuroscience 31:10882 Baldwin, D. A. (1993) Infants ’ ability to consult the speaker for clues to word ref- Boardman, I., Grossberg, S., Myers, C. & Cohen, M. (1999) Neural dynamics of 18. [MLL] erence. Journal of Child Language 20:395 – Percep- perceptual order and context effects for variable-rate speech syllables. Bannard, C. & Matthews, D. (2008) Stored word sequences in language learning. – tion and Psychophysics 6:1477 500. [SG] Psychological Science 48. [aMHC] – 19:241 Bock, J. K. (1982) Toward a cognitive psychology of syntax: Information processing contributions to sentence formulation. Psychological Review Nature Reviews Neuroscience Bar, M. (2004) Visual objects in context. 29. – 5:617 47. [aMHC] – 89:1 [aMHC] Bock, J. K. (1986) Meaning, sound, and syntax: Lexical priming in sentence pro- Bar, M. (2007) The proactive brain: Using analogies and associations to generate Journal of Experimental Psychology: Learning, Memory, and Cogni- duction. predictions. Trends in Cognitive Sciences 11:280 – 89. [aMHC] tion 12:575 – 86. [aMHC] Bard, E. G., Shillcock, R. C. & Altmann, G. T. M. (1988) The recognition of words 39. [aMHC] Bock, J. K. & Loebell, H. (1990) Framing sentences. Cognition 35:1 – after their acoustic offsets in spontaneous speech: Effects of subsequent – Bock, J. K. & Miller, C. A. (1991) Broken agreement. Cognitive Psychology 23:45 Perception and Psychophysics 408. [KB] context. 44:395 – 93. [aMHC] Barsalou, L. W. (2008) Grounded cognition. Annual Review of Psychology – 59:617 Natural Bock, K. (1987) Exploring levels of processing in sentence production. In: 45. [MLD] 63. Springer. [DAC] – , ed. G. Kempen, pp. 351 language generation Bates, E. & MacWhinney, B. (1982) Functionalist approaches to grammar. In: Bock, K. & Cutting, J. C. (1992) Regulating mental energy: Performance units , ed. E. Wanner & L. Gleitman, pp. Language acquisition: The state of the art in language production. 127. 
[rNC] 31:99 Journal of Memory and Language – 218. Cambridge University Press. [DPM] – 173 Cognitive Psychology 23:45 – Bock, K. & Miller, C. A. (1991) Broken agreement. Baumann, T. & Schlangen, D. (2012) INPRO_iSS: A component for just-in-time 93. [DAC] Proceedings of the ACL 2012 System Dem- incremental speech synthesis. In: Bod, R. (2009) From exemplar to grammar: A probabilistic analogy – based model of , pp. 103 108. Association for Computational Linguistics. – onstrations Cognitive Science – 33:752 93. [aMHC] language learning. [aMHC] Boeckx, C. & Leivada, E. (2013) Entangled parametric hierarchies: Problems for an Bavelas, J. B. & Gerwing, J. (2007) Conversational hand gestures and facial displays fi ed universal grammar. PLOS ONE 8(9):e72357. [aMHC] overspeci in face-to-face dialogue. In: Social communication , ed. K. Fiedler, pp. 283 – 308. Boemio, A., Fromm, S., Braun, A. & Poeppel, D. (2005) Hierarchical and asym- Psychology Press. [PGTH] metric temporal sensitivity in human auditory cortices. Nature Neuroscience Beckner, C., Blythe, R., Bybee, J., Christiansen, M. H., Croft, W., Ellis, N., Holland, 95. [GB] – 8(3):389 J., Ke, J., Larsen- Freeman, D. & Schoenemann, T. (2009) Language Bohnet, B. & Nivre, J. (2012) A transition-based system for joint part-of-speech 59 Language Learning is a complex adaptive system: Position paper. Proceedings of the tagging and labeled non-projective dependency parsing. In: (Suppl. 1):1 27. [aMHC] – 2012 Joint Conference on Empirical Methods in Natural Language Processing Behme, C. (2014a) Assessing direct and indirect evidence in linguistic research. – and Computational Natural Language Learning, Jeju Island, Korea, July 12 14, 83. [CB] – Topoi 33:373 65. Association for 2012 , ed. J. Tsujii, J. Henderson & M. Pasca, pp. 1455 – Evaluating Cartesian linguistics: From historic antecedents to Behme, C. (2014b) Computational Linguistics. [CG-R] computational modeling. Peter Lang. [CB] Borovsky, A., Elman, J. L. & Fernald, A. (2012) Knowing a lot for one s age: Vo- ’ Belletti, A., ed. (2004) Structures and beyond. The cartography of syntactic struc- cabulary skill and not age is associated with anticipatory incremental sentence tures, vol. 3. Oxford University Press. [DPM] Journal of Experimental Child Psychology interpretation in children and adults. Bellugi, U. & Fisher, S. (1972) A comparison of sign language and spoken language. – 112:417 36. [aMHC] 1:173 Cognition 200. [aMHC] – fi Börschinger, B. & Johnson, M. (2011) A particle lter algorithm for Bayesian word – Bellugi, U., Klima, E. S. & Siple, P. (1975) Remembering in signs. Cognition 3:93 segmentation. In: Proceedings of the Australasian Language Technology 25. [KE] 59 BEHAVIORAL AND BRAIN SCIENCES, 39 (2016) http://journals.cambridge.org Downloaded: 03 Jun 2016 IP address: 92.14.75.14

60 Christiansen & Chater: The Now-or-Never bottleneck: A fundamental constraint on language References/ , Canberra, Australia, ed. D. Mollá & D. Martinez, pp. Association Workshop Bybee, J. L. & Slobin, D. I. (1982) Rules and schemas in the development and use of 18. [ADE] – 10 – the English past tense. Language 58:265 89. [aMHC] Language Sciences Bortfeld, H., Morgan, J. L., Golinkoff, R. M. & Rathbun, K. (2005) Mommy and me: Campbell, L. (2000) What ’ s wrong with grammaticalization? 23:113 – 61. [aMHC] Familiar names help launch babies into speech-stream segmentation. Psycho- Cann, R. & Kempson, R. (2008) Production pressures, syntactic change and the 304. [RM, rNC] – 16(4):298 logical Science Language in emergence of clitic pronouns. In: ux: Dialogue coordination, fl Botvinick, M. M. (2008) Hierarchical models of behavior and prefrontal function. language variation, change and evolution , ed. R. Cooper & R. Kempson, 208. [rNC] – Trends in Cognitive Sciences 12:201 63. College Publications. [aMHC] – pp. 221 Bouzouita, M. & Chatzikyriakidis, S. (2009) Clitics as calci fi ed processing strategies. Cann, R., Kempson, R. & Marten, L. (2005) The dynamics of language: An intro- , ed. M. Butt & T. Hol- In: Proceedings of LFG09, Cambridge, UK, July 2009 . Elsevier (now published by Emerald Insight Publishers). [RK] duction loway-King, pp. 189 – http://www.stergioschatzi- 207. CSLI Press. Available at: Cann, R., Kempson, R. & Wedgwood, D. (2012) Representationalism and linguistic kyriakidis.com/uploads/1/0/3/6/10363759/lfg09bouzouitachatzikyriakidis_2. , ed. R. Kempson, T. Fernando & N. Philosophy of linguistics knowledge. In: pdf . [RK] Asher, pp. 357 – 402. Elsevier. [aMHC] Boyd, R. & Richerson, P. J. (1987) The evolution of ethnic markers. Cultural An- Cann, R., Purver, M. & Kempson, R. (2007) Context and wellformedness: The dy- thropology 2:65 – 79. [aMHC] namics of ellipsis. 58. [RK] – 5:333 Research on Language and Computation Branigan, H., Pickering, M. & Cleland, A. (2000) Syntactic co-ordination in dialogue. Papers and Reports on Child Carey, S. & Bartlett, E. (1978) Acquiring a single word. 75:13 25. [aMHC] Cognition – 29. [ADE] 15:17 Language Development – Bransford, J. D. & Johnson, M. K. (1972) Contextual prerequisites for understand- Carney, A. E., Widin, G. P. & Viemeister, N. F. (1977) Noncategorical perception of ing: Some investigations of comprehension and recall. Journal of Verbal stop consonants differing in VOT. Journal of the Acoustical Society of America – 11:717 Learning and Verbal Behavior 26. [rNC] 62:961 – 70. [KB] Auditory scene analysis: The perceptual organization of Bregman, A. S. (1990) Carr, M. F., Jadhav, S. P. & Frank, L. M. (2011) Hippocampal replay in the awake . MIT Press. [aMHC, rNC] sound state: A potential substrate for memory consolidation and retrieval. Nature Brennan, S. E. & Schober, M. F. (2001) How listeners compensate for dis uencies in fl Neuroscience 53. [aMHC] – 14:147 96. [PGTH] 44(2):274 spontaneous speech. Journal of Memory and Language – Carstairs-McCarthy, A. (1992) . Routledge. [aMHC] Current morphology The MIT Press. A prosodic model of sign language phonology. Brentari, D. (1998) Carstensen, A., Xu, J., Smith, C. & Regier, T. (2015) Language evolution in the lab [KE] tends toward informative communication. In: Proceedings of the 37th Annual Brinton, B., Fujiki, M., Loeb, D. F. & Winkler, E. (1986) Development of conver- Meeting of the Cognitive Science Society, Austin, TX, July 2015 , ed. D. C. cation. 
sational repair strategies in response to requests for clari Journal of fi Noelle, R. Dale, A. S. Warlaumont, J. Yoshimi, T. Matlock, C. D. Jennings & Speech, Language, and Hearing Research 29(1):75 – 81. [GB] P. P. Maglio, pp. 303 – 308. Cognitive Science Society. [MLL] Broadbent, D. (1958) Perception and communication . Pergamon Press. [aMHC] Catani, M., Jones, D. K. & ffytche, D. H. (2005) Perisylvian language networks of the Brodsky, P., Waterfall, H. & Edelman, S. (2007) Characterizing motherese: On the Annals of Neurology human brain. 16. [GB] – 57(1):8 Proceedings of the 29th computational structure of child-directed language. In: Cattell, J. M. (1886) The time it takes to see and name objects. – 65. Mind 11:63 August 2007, pp. 833 Meeting of the Cognitive Science Society, 38, ed. D. S. – [MCM] McNamara & J. G. Trafton. Cognitive Science Society. [rNC] Chacón, D., Imtiaz, M., Dasgupta, S., Murshed, S., Dan, M. & Phillips, C. (sub- Brown, G. D. A., Neath, I. & Chater, N. (2007) A temporal ratio model of memory. fi mitted) Locality in the processing of ller-gap dependencies in Bangla. – Psychological Review 114:539 76. [aMHC, rNC] [DAC] Brown, M., Dilley, L. C. & Tanenhaus, M. K. (2014) Probabilistic prosody: Effects of Chang, F., Dell, G. S. & Bock, K. (2006) Becoming syntactic. Psychological Review relative speech rate on perception of (a) word(s) several syllables earlier. In: 72. [aMHC] – 113:234 Proceedings of the 7th International Conference on Speech Prosody, Dublin, Chater, N. & Christiansen, M. H. (2010) Language acquisition meets language Ireland, May 20 – , ed. N. Campbell, D. Gibbon & D. Hirst. pp. 1154 23, 2014 – Cognitive Science evolution. 57. [AL, rNC] – 34:1131 58. Dublin. [KB] Chater, N., Crocker, M. J. & Pickering, M. J. (1998) The rational analysis of inquiry: Brown-Schmidt, S. & Konopka, A. E. (2011) Experimental approaches to referential , ed. M. Oaksford & N. The case of parsing. In: Rational models of cognition domains and the on-line processing of referring expressions in unscripted 26. [aMHC] Chater, pp. 441 – 68. Oxford University Press. [aMHC] 2:302 – Information conversation. Brown-Schmidt, S. & Konopka, A. (2015) Processes of incremental message plan- Chater, N., McCauley, S. M. & Christiansen, M. H. (2106) Language as skill: ning during conversation. 22:833 – 43. Psychonomic Bulletin and Review Journal of Memory and Language Intertwining comprehension and production. – 54. [aMHC, rNC] [aMHC] 89:244 Chater, N., Reali, F. & Christiansen, M. H. (2009) Restrictions on biological adap- Brown-Schmidt, S. & Tanenhaus, M. K. (2008) Real-time investigation of referential tation in language evolution. Proceedings of the National Academy of Sciences domains in unscripted conversation: A targeted language game approach. 20. [rNC] 106:1015 – 32:643 Cognitive Science – 84. [aMHC] Chater, N., Tenenbaum, J. B. & Yuille, A. (2006) Probabilistic models of cognition: Bubic, A., von Cramon, D. Y. & Schubotz, R. I. (2010) Prediction, cognition and the – 10:287 91. [aMHC] Trends in Cognitive Sciences Conceptual foundations. 15. [SW] – 4(25):1 Frontiers in Human Neuroscience brain. Chatzikyriakids, S. & Kempson, R. (2011) Standard modern and Pontic Greek Buonomano, D. V. & Maass, W. (2009) State-dependent computations: Spatiotemporal person restrictions: Feature-free dynamic account. Journal of Greek Linguistics – Nature Reviews Neuroscience 10:113 processingin cortical networks. 25. [SLF] – 10:127 66. 
Available at: http://www.kcl.ac.uk/innovation/groups/ds/publica- Burgess, N. & Hitch, G. J. (1999) Memory for serial order: A network model of the . [RK] tions/assets/chatzikyriakidis-kempson-jgl-draft.pdf phonological loop and its timing. 106:551. [rNC] Psychological Review Chen, D. & Manning, C. D. (2014) A fast and accurate dependency parser using ning “ fi ” In: Communication by chem- Burghardt, G. M. (1970) De communication. neural networks. In: Proceedings of the 2014 Conference on Empirical Methods , ed. J. W. Johnston Jr., D. G. Moulton & A. Turk, pp. 5 18. Ap- – ical signals on Natural Language Processing, Doha, Qatar, October 25 , ed. A. 29, 2014 – pleton-Century-Crofts. [AL] Moschitti, B. Pang & W. Daelemans, pp. 740 – 50. Association for Computa- Bybee, J. (2002) Word frequency and context of use in the lexical diffusion of pho- tional Linguistics. [CG-R] netically conditioned sound change. Language Variation and Change 14:261 – Cherry, E. C. (1953) Some experiments on the recognition of speech with one and 90. [aMHC] – 79. with two ears. 25:975 Journal of the Acoustical Society of America Bybee, J. (2006) From usage to grammar: The mind ’ s response to repetition. Lan- [aMHC] guage 33. [aMHC] – 82:711 Choi, J. D. & McCallum, A. (2013) Transition-based dependency parsing with Bybee, J. (2007) Frequency of use and the organization of language . Oxford Uni- st Annual Meeting of the Asso- Proceedings of the 51 selectional branching. In: versity Press. [aMHC, DPM] ciation for Computational Linguistics (Volume 1: Long Papers), So a, Bulgaria, fi Bybee, J. (2009) Language universals and usage-based theory. In: Language uni- 62. Association for August 4 – 9, 2013 , ed. P. Fung & M. Poesio, pp. 1052 – versals , ed. M. H. Christiansen, C. Collins & S. Edelman, pp. 17 – 39. Oxford Computational Linguistics. [CG-R] University Press. [aMHC] . Mouton. [aMHC, rNC] Syntactic structures Chomsky, N. (1957) Bybee, J. & Hopper, P., eds. (2001) Frequency and the emergence of linguistic Chomsky, N. (1965) . MIT Press. [aMHC] Aspects of the theory of syntax . John Benjamins. [aMHC] structure Chomsky, N. (1975) fl ections on language Re . Pantheon Books. [CB] Bybee, J. & McClelland, J. L. (2005) Alternatives to the combinatorial paradigm of , ed. M. Language and learning Chomsky, N. (1980) The linguistic approach. In: The Linguistic linguistic theory based on general principles of human cognition. – 30. Harvard University Press. [CB] Piattelli-Palmerini, pp. 107 410. [aMHC] – 22:381 Review Chomsky, N. (1981) Lectures on government and binding . Mouton de Gruyter. The evolution of grammar: Tense, Bybee, J., Perkins, R. D. & Pagliuca, W. (1994) [aMHC] . University of Chicago aspect and modality in the languages of the world . Praeger Publishing. [CB] Knowledge of language Chomsky, N. (1986) Press. [aMHC] The architecture of language . Oxford University Press. [CB] Chomsky, N. (2000) Bybee, J. & Scheibman, J. (1999) The effect of usage on degrees of constituency: The Chomsky, N. (2002) . Cambridge University Press. [CB] On nature and language – 37:575 Linguistics t in English. ’ reduction of don 96. [aMHC, rNC] BEHAVIORAL AND BRAIN SCIENCES, 39 (2016) 60 http://journals.cambridge.org Downloaded: 03 Jun 2016 IP address: 92.14.75.14

61 Christiansen & Chater: The Now-or-Never bottleneck: A fundamental constraint on language References/ Behavioral and Chomsky, N. (2010) Some simple evo-devo theses: How true might they be for Crain, S. (1991) Language acquisition in the absence of experience. language? In: Approaches to the evolution of language , ed. R. K. Larson, V. M. 50. [DPM] Brain Sciences 14:597 – Deprez & H. Yamakido. Cambridge University Press. [rNC] Crain, S. & Nakayama, M. (1987) Structure dependence in grammar formation. interviews with James Chomsky, N. (2012) Language 63:522 – 43. [rNC] Noam Chomsky, the science of language – Crain, S., Thornton, R. & Khlentzos, D. (2009) The case of the missing generaliza- Cambridge University Press. [CB] McGilvray. 55. [DPM] – 20(1):145 Cognitive Linguistics tions. Christiansen, M. H. & Chater, N. (1999) Toward a connectionist model of recursion in – 14. 205. [aMHC, SLF] – 23:157 Cognitive Science human linguistic performance. Crick, F. & Mitchison, G. (1983) The function of dream sleep. Nature 304:111 Christiansen, M. H. & Chater, N. (2008) Language as shaped by the brain. Behav- [aMHC] Radical construction grammar . Oxford University Press. Croft, W. (2001) 58. [aMHC, ADE, DPM, rNC] – 31(05):489 ioral & Brain Sciences Christiansen, M. H. & Chater, N. (2015) The language faculty that wasn t: A usage- ’ [aMHC] Frontiers in Psychology Crowder, R. G. (1993) Systems and principles in memory theory: Another critique of 6:1182. based account of natural language recursion. , ed. A. Collins, S. Gathercole, M. Conway Theories of memory pure memory. In: doi: 10.3389/fpsyg.2015.01182. [rNC] Creating language: Integrating evolution, & P. Morris. Erlbaum. [rNC] Christiansen, M. H. & Chater, N. (2016) Crowder, R. G. & Neath, I. (1991) The microscope metaphor in human memory. In: . MIT Press. [rNC] acquisition, and processing Christiansen, M. H. & MacDonald, M. C. (2009) A usage-based approach to recursion Relating theory and data: Essays on human memory in honour of Bennet in sentence processing. 59(Suppl. 1):126 61. [aMHC,rNC] – ., ed. W. E. Hockley & S. Lewandowsky. Erlbaum. [aMHC] Language Learning B. Murdock, Jr Christianson, K. & Ferreira, F. (2005) Conceptual accessibility and sentence pro- Cui, J., Gao, D., Chen, Y., Zou, X. & Wang, Y. (2010) Working memory in early- – 35. [DAC] 98:105 duction in a free word order language (Odawa). Cognition school-age children with Asperger ’ s syndrome. Journal of Autism and Devel- Christianson, K., Hollingworth, A., Halliwell, J. F. & Ferreira, F. (2001) Thematic- opmental Disorders 40:958 – 67. [AL] Culbertson, J. & Adger, D. (2014) Language learners privilege structured meaning – 42:368 Cognitive Psychology roles assigned along the garden path linger. over surface frequency. 111 Proceedings of the National Academy of Sciences 407. [FF, rNC] 47. [DPM] – (16):5842 Christianson, K. & Luke, S. G. (2011) Context strengthens initial misinterpretations Culbertson, J. & Newport, E. L. (2015) Harmonic biases in child learners: In support c Studies of Reading fi Scienti of text. – 66. [FF] 15:136 139:71 Christianson, K., Williams, C. C., Zacks, R. T. & Ferreira, F. (2006) Misinterpreta- – 82. [MLL] of language universals. Cognition Discourse Processes Culbertson, J., Smolensky, P. & Legendre, G. (2012) Learning biases predict a word tions of garden-path sentences by older and younger adults. 29. [MLL, DPM] – 122(3):306 Cognition – 38. [FF] order universal. 42:205 Culicover, P. W. (1999) Chun, M. M. & Potter, M. C. 
(1995) A two-stage model for multiple target detection Syntactic nuts: Hard cases, syntactic theory, and language . Oxford University Press. [aMHC] acquisition Journal of Experimental Psychology: Human in rapid serial visual presentation. Culicover, P. W. (2013) The role of linear order in the computation of referential 27. [MCP] 21:109 Perception and Performance – ‘ Cinque, G. (1996) The Lingua 136:125 – 44. [aMHC] program: Theoretical and typological ’ dependencies. antisymmetric Curtiss, S. (1977) wild child. implications. Journal of Linguistics ” 32(2):447 – 64. [DPM] “ Genie: A psycholinguistic study of a modern-day Academic Press. [RM] . Cinque, G. (1999) Adverbs and functional heads: A cross-linguistic perspective . De Gruyter Slips of the tongue and language production Cutler, A., ed. (1982) Oxford University Press. [DPM] s universal 20 and its exceptions. ’ Linguistic Mouton. [aMHC] Cinque, G. (2005) Deriving Greenberg fi Cutler, A., Hawkins, J. A. & Gilligan, G. (1985) The suf 32. [DPM] 36(3):315 xing preference: A pro- – Inquiry Cinque, G. (2013) Cognition, universal grammar, and typological generalizations. cessing explanation. Linguistics 23:723 – 58. [KE] – 65. doi: 10.1016/j.lingua.2012.10.007. [DPM] Dahan, D. (2010) The time course of interpretation in speech comprehension. Lingua 130:50 Behavioral Cisek, P. & Kalaska, J. F. (2001) Common codes for situated interaction. Current Directions in Psychological Science 19:121 – 26. [aMHC] 24:883 84. [AB] and Brain Sciences – . Benjamins. Dahl, Ö. (2004) The growth and maintenance of linguistic complexity Clark, A. (2013) Whatever next? Predictive brains, situated agents, and the future of [TB] – 253. [aMHC, SW] cognitive science. Behavioral and Brain Sciences 36(3):181 Dale, R. & Lupyan, G. (2012) Understanding the origins of morphological diversity: 15(3 Advances in Complex Systems – 4):1 – 16. The linguistic niche hypothesis. Clark, H. H. (1975) Bridging. In: Proceedings of the 1975 Workshop on Theoretical . [TB] Available at: http://doi.org/10.1142/S0219525911500172 , ed. B. L. Issues in Natural Language Processing, Cambridge, MA, June 1975 . John Darwin, C. (1871) The descent of man, and selection in relation to sex, vol. 1 – 74. Association for Computational Lin- Nash-Webber & R. Shank, pp. 169 Murray. [rNC] guistics. [rNC] Davenport, J. L. & Potter, M. C. (2004) Scene consistency in object and background . Cambridge University Press. [aMHC, GB, MLL] Using language Clark, H. H. (1996) 64. [MCP] – perception. Psychological Science 15:559 Clark, J., Yallop, C. & Fletcher, J. (2007) An introduction to phonetics and phonol- Davenport, J. L. & Potter, M. C. (2005) The locus of semantic priming in RSVP . Wiley-Blackwell. [aMHC] ogy, third edition – Memory and Cognition 33:241 target search. 48. [MCP] Clément, S., Demany, L. & Semal, C. (1999) Memory for pitch versus memory for Dawkins, R. (1986) The blind watchmaker: Why the evidence of evolution reveals a loudness. Journal of the Acoustical Society of America 106:2805 – 11. [aMHC] . Norton. [rNC] universe without design Cohen, M. A. & Grossberg, S. (1986) Neural dynamics of speech and language de Vries, M. H., Christiansen, M. H. & Petersson, K. M. (2011) Learning coding: Developmental programs, perceptual grouping, and competition for 5:10 Biolinguistics recursion: Multiple nested and crossed dependencies. – – 22. [SG] 5:1 Human Neurobiology short-term memory. 35. [aMHC] Colman, M. & Healey, P. G. T. (2011) The distribution of repair in dialogue. In: de Vries, M. 
(2005) Syntactic carpentry: An emergentist approach to syntax . ings of the Fourteenth West Coast Conference on Formal Linguistics, volume 15, Erlbaum. [aMHC, WO] Los Angeles, CA, March 1994 , pp. 381 – 95, ed. J. Camacho, L. Choueiri & M. O Approaches to Bilingualism Grady, W. (2013) The illusion of language acquisition. ’ Watanabe. University of Chicago Press. [aMHC] 85. [aMHC, WO, rNC] 3:253 – – 34:37 Linguistic Inquiry Phillips, C. (2003) Linear order and constituency. 90. The handbook of Grady, W. (2015a) Anaphora and the case for emergentism. In: ’ O [aMHC] , ed. B. MacWhinney & W. O ’ Grady, pp. 100 – 22. Wiley- language emergence Phillips, C. (2013) Some arguments and nonarguments for reductionist accounts of Blackwell. [aMHC, WO, rNC] – syntactic phenomena. Language and Cognitive Processes 28:156 87. [DAC] O – 65:6 32. [WO] Grady, W. (2015b) Processing determinism. ’ Language Learning cient fi Piantadosi, S., Tily, H. & Gibson, E. (2011) Word lengths are optimized for ef Oaksford, M. & Chater, N., eds. (1998) Rational models of cognition. . Oxford Uni- communication. – 108:3526 Proceedings of the National Academy of Sciences versity Press. [aMHC] 29. [aMHC, MLL] . Oxford University Press. Bayesian rationality Oaksford, M. & Chater, N. (2007) Piantadosi, S., Tily, H. & Gibson, E. (2012) The communicative function of ambi- [aMHC] 122:280 Cognition – 91. [aMHC] guity in language. Odling-Smee, F. J., Laland, K. N. & Feldman, M. W. (2003) Niche construction: The Piattelli-Palmarini, M., Hancock, R. & Bever, T. (2008) Language as ergonomic . Princeton University Press. [AL] neglected process in evolution, vol. MPB 37 Behavioral and Brain Sciences 31(5):530 – 31. perfection [Peer Commentary] Ohno, T. & Mito, S. (1988) . Productivity Just-in-time for today and tomorrow [DPM] Press. [aMHC] Pickering, M. J. & Branigan, H. P. (1998) The representation of verbs: Evidence Omaki, A., Davidson-White, I., Goro, T., Lidz, J. & Phillips, C. (2014) No fear of from syntactic priming in language production. Journal of Memory and Lan- s incremental interpretation in English and Japanese. ’ commitment: Children guage 39:633 – 51. [aMHC] 10:206 – 33. [DAC] Language Learning and Development Pickering, M. J. & Garrod, S. (2004) Toward a mechanistic psychology of dialogue. c visualization for the lan- fi Onnis, L. & Spivey, M. J. (2012) Toward a new scienti 27:169 – 226. [aMHC, rNC] Behavioral and Brain Sciences guage sciences. Information 3:124 – 50. [AL] Pickering, M. J. & Garrod, S. (2007) Do people use language production to make Onnis, L., Waterfall, H. R. & Edelman, S. (2008) Learn locally, act globally: Learning predictions during comprehension? Trends in Cognitive Sciences 11:105 – 10. – 30. [AL] 109:423 language from variation set cues. Cognition [aMHC, rNC] Orfanidou, E., Morgan, G., Adam, R. & McQueen, J. (2010) Recognition of signed Pickering, M. J. & Garrod, S. (2013a) An integrated theory of language production 47. [aMHC, AB, – and spoken language: Different sensory inputs, the same segmentation proce- 36: 329 Behavioral and Brain Sciences and comprehension. rNC] – dure. Journal of Memory and Language 62:272 83. [KE] BEHAVIORAL AND BRAIN SCIENCES, 39 (2016) 68 http://journals.cambridge.org Downloaded: 03 Jun 2016 IP address: 92.14.75.14

69 References/ Christiansen & Chater: The Now-or-Never bottleneck: A fundamental constraint on language Pickering, M. J. & Garrod, S. (2013b) How tightly are production and comprehen- Poulet, J. F. A. & Hedwig, B. (2006) New insights into corollary discharges mediated by identi 21. [aMHC] ed neural pathways. Trends in Neurosciences 30:14 – fi 92. Available at: – 4:377 Frontiers in Psychology sion interwoven? http://www. Princeton, N. J. & Stromswold, K. (2001) The heritability of language: A review and . [RK] ncbi.nlm.nih.gov/pmc/articles/PMC3636456/ – 23. [AL] Language metaanalysis of twin, adoption, and linkage studies. 77:647 . Laboratory Phonology VII. Pierrehumbert, J. (2002) Word-speci fi c phonetics Pulman,S.G.(1985)Aparser thatdoesn ’ t. ProceedingsoftheSecondEuropeanMeeting Mouton de Gruyter. [aMHC] of the Association for Computational Linguistics, Geneva, Switzerland, March Pietroski, P. (2008) Minimalist meaning, internalist interpretation. Biolinguistics , pp. 128 1985 – 35, Association for Computational Linguistics. [aMHC, rNC] 41. [DPM] – 2(4):317 Pulman, S. G. (1986) Grammars, parsers, and memory limitations. Language and Pine, J. M., Freudenthal, D., Krajewski, G. & Gobet, F. (2013) Do young children 1(3):197 Cognitive Processes 225. [ADE] – s law and the case of the determiner. ’ have adult-like syntactic categories? Zipf Purver, M., Cann, R. & Kempson, R. (2006) Grammars as parsers: Meeting the Cognition 127:345 60. [aMHC] – dialogue challenge. 326. Research in Language and Computation 4(2 – 3):289 – Pinker, S. (1984) Language learnability and language development . Harvard Uni- [PGTH] versity Press. [aMHC] Purver, M., Eshghi, A. & Hough, J. (2011) Incremental semantic construction in a Pinker, S. (1994) . William The language instinct: How the mind creates language Proceedings of the Ninth International Conference on Com- dialogue system. Morrow. [aMHC] – putational Semantics (IWCS),, 14, 2011, pp. 365 – 69, Oxford, UK, January 12 Pinker, S. & Bloom, P. (1990) Natural language and natural selection. Behavioral ed. J. Box & S. Pulman. Association for Computational Linguistics. [PGTH] 13:707 and Brain Sciences – 27. [aMHC] A grammar of con- Quirk, R., Greenbaum, S., Leech, G. N. & Svartvik, J. (1972) Pinker, S. & Prince, A. (1988) On language and connectionism: Analysis of a parallel . Longman. [DAC] temporary English Cognition distributed processing model of language acquisition. – 193. 28:73 Rabinovich, M., Huerta, R. & Laurent, G. (2008) Transient dynamics for neural [aMHC] 321:48 Science processing. 50. [SLF] – Pinkster, H. (2005) The language of Pliny the Elder. In: The language of Latin Prose , Ratcliff, R. (1990) Connectionist models of recognition memory: Constraints 56. Oxford University – ed. T. Reinhardt, M. Lapidge & J. N. Adams, pp. 239 – imposed by learning and forgetting functions. Psychological Review 97:285 Press. [SCL] 308. [aMHC] Pisoni, D. B. & Lazarus, J. H. (1974) Categorical and noncategorical modes of Eye movements in reading: Perceptual and language pro- Rayner, K., ed. (1983) speech perception along the voicing continuum. Journal of the Acoustical cesses . Academic Press. [MCP] 33. [KB] – 55:328 Society of America Rayner, K., ed. (1992) Eye movements and visual cognition: Scene perception and Pisoni, D. B. & Tash, J. (1974) Reaction times to comparisons within and across . Springer. [MCP] reading 90. [KB] 15:285 Perception & Psychophysics phonetic categories. – Reali, F. & Christiansen, M. H. 
(2005) Uncovering the richness of the stimulus: Poesio, M & Rieser, H. (2011) An incremental model of anaphora and reference Cognitive Science Structure dependence and indirect statistical evidence. 77. – 2:235 Dialogue and Discourse resolution based on resource situations. – 29(6):1007 28. [DPM, FHW, rNC] . [RK] http://dad.uni-bielefeld.de/index.php/dad/article/view/373/1461 Reali, F. & Christiansen, M. H. (2007a) Processing of relative clauses is made easier Poggio, T. & Edelman, S. (1990) A network that learns to recognize 3D objects. – 23. by frequency of occurrence. 57:1 Journal of Memory and Language 66. [rNC] – 343:263 Nature [aMHC] Postal, P. M. (2003) Remarks on the foundations of linguistics. The Philosophical Reali, F. & Christiansen, M. H. (2007b) Word-chunk frequencies affect the pro- – 51. [CB] Forum 34:233 Quarterly Journal of Experi- cessing of pronominal object-relative clauses. ontology. Postal, P. M. (2009) The incoherence of Chomsky s ’ “ Biolinguistic ” Biol- mental Psychology – 70. [aMHC] 60:161 – inguistics 23. [CB] 3:104 Journal of Exper- Reali, F. & Christiansen, M. H. (2009) On the necessity of an interdisciplinary ap- Potter, M. C. (1976) Short-term conceptual memory for pictures. 2:509 imental Psychology: Human Learning and Memory 22. [MCP] – Language universals proach to language universals. In: , ed. M. H. Christiansen, Potter, M. C. (1984) Rapid serial visual presentation (RSVP): A method for studying 77. Oxford University Press. [rNC] – C. Collins & S. Edelman, pp. 266 language processing. In: , ed. New methods in reading comprehension research Colorless green ideas sleep furiously Reali, F., Dale, R. & Christiansen, M. H. (2005) – D. Kieras & M. Just, pp. 91 18. Erlbaum. [MCP] revisited: A statistical perspective. In: Proceedings of the 27th Annual Meeting of – the Cognitive Science Society, Stresa, Italy, July 2005 26, ed. B. G. , pp. 1821 Memory and Cognition Potter, M. C. (1993) Very short-term conceptual memory. Bara, L. W. Barsalou & M. Bucciarelli. Erlbaum. [rNC] 61. [MCP] – 21:156 fi Reali, F. & Grif ths, T. (2009) The evolution of frequency distributions: Relating Potter, M. C. (2009) Conceptual short term memory. Scholarpedia 5(2):3334. regularization to inductive biases through iterated learning. – 111:317 Cognition [MCP] 28. [MLL, rNC] Potter, M. C. (2012) Conceptual short term memory in perception and Redington, M., Chater, N. & Finch, S. (1998) Distributional information: A powerful Frontiers in Psychology 3:113. doi: 10.3389/fpsyg.2012.00113. thought. 469. Cognitive Science 22:425 – cue for acquiring syntactic categories. [MCP, rNC] [aMHC] Potter, M. C. & Faulconer, B. A. (1975) Time to understand pictures and words. Regier, T., Kay, P. & Khetarpal, N. (2007) Color naming re ects optimal partitions of fl 38. [MCP] 253:437 Nature – Proceedings of the National Academy of Sciences – 104:1436 41. color space. Potter, M. C. & Lombardi, L. (1990) Regeneration in the short-term recall of sen- [MLL] Journal of Memory and Language tences. 29:633 – 54. [MCP] Remez, R. E., Fellowes, J. M. & Rubin, P. E. (1997) Talker identi fi cation based on Potter, M. C. & Lombardi, L. (1998) Syntactic priming in immediate recall of sen- Journal of Experimental Psychology: Human Perception phonetic information. 82. [aMHC, MCP] Journal of Memory and Language – 38:265 tences. and Performance – 66. [aMHC] 23:651 Potter, M. C., Kroll, J. F. & Harris, C. (1980) Comprehension and memory in rapid Remez, R. E., Ferro, D. F., Dubowski, K. R., Meer, J., Broder, R. S. 
& Davids, M. L. Attention and Performance VIII sequential reading. In: , ed. R. Nickerson, pp. (2010) Is desynchrony tolerance adaptable in the perceptual organization of 395 18. Erlbaum. [MCP] – Attention, Perception, and Psychophysics – 72:2054 speech? 58. [aMHC] Potter, M. C., Kroll, J. F., Yachzel, B., Carpenter, E. & Sherman, J. (1986) Pictures in Richerson, P. J. & Christiansen, M. H., eds. (2013) Cultural evolution: Society, Journal of Experimental Psychology: sentences: Understanding without words. technology, language and religion. MIT Press. [aMHC] General 115:281 – 94. [MCP] Rigotti, M., Barak, O., Warden, M. R., Wang, X. -J., Daw, N. D., Miller, E. K. & Potter, M. C., Moryadas, A., Abrams, I. & Noel, A. (1993) Word perception and Fusi, S. (2013) The importance of mixed selectivity in complex cognitive tasks. misperception in context. Journal of Experimental Psychology: Learning, Nature 497:585 – 90. [SLF] – Memory, and Cognition 19:3 22. [MCP] Rizzi, L. (1990) Relativized minimality . MIT Press. [ADE] Potter, M. C., Staub, A. & O ’ Connor, D. H. (2002) The time course of competition Trends Rizzolatti, G. & Arbib, M. A. (1998) Language within our grasp. [Viewpoint] for attention: Attention is initially labile. Journal of Experimental Psychology: in Neurosciences 21(5):188 – 94. [DPM] – 28:1149 Human Perception and Performance 62. [MCP] Roland, D., Elman, J. & Ferreira, V. (2006) Why is that? Structural prediction and Potter, M. C., Stiefbold, D. & Moryadas, A. (1998) Word selection in reading sen- Cognition ambiguity resolution in a very large corpus of English sentences. Journal of Experimental Psychol- tences: Preceding versus following contexts. – 72. [aMHC] 98:245 ogy: Learning, Memory, and Cognition 24:68 100. [MCP] – Rose, Y. & Brittain, J. (2011) Grammar matters: Evidence from phonological and Potter, M. C., Valian, V. V. & Faulconer, B. A. (1977) Representation of a sentence Selected proceedings morphological development in Northern East Cree. In: and its pragmatic implications: Verbal, imagistic, or abstract? Journal of Verbal of the 4th Conference on Generative Approaches to Language Acquisition 16:1 – 12. [MCP] Learning and Verbal Behavior 208, ed. M. – , pp. 193 North America (GALANA 2010), Somerville, MA Potter, M. C., Wyble, B., Hagmann, C. E. & McCourt, E. S. (2014) Detecting Pirvulescu, M. C. Cuervo, A. T. Pérez-Leroux, J. Steele & N. Strik. Cas- Attention, Perception, and Performance meaning in RSVP at 13 ms per picture. www.lingref.com , document cadilla Proceedings Project. Available at: 76(2):270 – 79. doi: 10.3758/s13414-013-0605-z. [MCP] #2596. [UL] Potter, M. C., Wyble, B., Pandav, R. & Olejarczyk, J. (2010) Picture detection in Ross, J. R. (1967) Constraints on variables in syntax. Unpublished doctoral disser- Journal of Experimental Psychology: Human Per- RSVP: Features or identity? tation. Department of Linguistics, MIT. [ADE] 94. [MCP] – 36:1486 ception and Performance 69 BEHAVIORAL AND BRAIN SCIENCES, 39 (2016) http://journals.cambridge.org Downloaded: 03 Jun 2016 IP address: 92.14.75.14

70 References/ Christiansen & Chater: The Now-or-Never bottleneck: A fundamental constraint on language Rumelhart, D. E. & McClelland, J. L. (1986) On learning the past tenses of English – 111:E4687 Proceedings of the National Academy of Sciences narrative speech. verbs. In: Parallel distributed processing: Explorations in the microstructure of 96. [aMHC] , ed. J. L. McClelland, cognition. Volume 2: Psychological and biological models Silver, M. R., Grossberg, S., Bullock, D., Histed, M. H. & Miller, E. K. (2011) A 71. MIT Press. – D. E. Rumelhart & the PDP Research Group, pp. 216 neural model of sequential movement planning and control of eye movements: [aMHC] Item-order-rank working memory and saccade selection by the supplementary elds. Neural Networks 26:29 – 58. [SG] Parallel Rumelhart, D. E., McClelland, J. L. & the PDP Research Group (1986a) fi eye Simon, H. A. (1956) Rational choice and the structure of the environment. Psy- distributed processing: Explorations in the microstructure of cognition, volumes – chological Review 63:129 38. [aMHC] . MIT Press. [aMHC] 1 and 2 Models of bounded rationality: Empirically grounded economic Simon, H. A. (1982) Parallel Rumelhart, D. E., McClelland, J. L. & the PDP Research Group (1988) . MIT Press. [aMHC, KB] reason , pp. 354 62. IEEE. [DPM] – distributed processing, vol. 1 403. – 61:393 Simon, H. A. & Chase, W. G. (1973) Skill in chess. American Scientist Rumelhart, D. E., Smolensky, P., McClelland, J. L. & Hinton, G. (1986b) Sequential [rNC] Parallel distributed processing: explora- thought processes in PDP models. In: Simons, D. J. & Levin, D. T. (1998) Failure to detect changes to people during a , ed, J. L. McClelland & D. E. tions in the microstructures of cognition, vol. 2 – Psychonomic Bulletin and Review real-world interaction. 49. [aMHC] 5:644 – 57. MIT Press. [rNC] Rumelhart, pp. 3 17:616 Singer, W. (2013) Cortical dynamics revisited. Trends in Cognitive Sciences – Saad, D., ed. (1998) Cambridge University On-line learning in neural networks. 26. [SLF] Press. [aMHC] Singleton, J. L. & Newport, E. L. (2004) When learners surpass their models: The Sacks, H., Schegloff, E. & Jefferson, G. (1974) Simplest systematics for organization acquisition of American Sign Language from inconsistent input. Cognitive 35. [SCL] – 50(4):696 Language of turn-taking for conversation. Psychology 49:370 – 407. [rNC] Sagarra, N. & Herschensohn, J. (2010) The role of pro fi ciency and working memory Siyanova-Chanturia, A., Conklin, K. & Van Heuven, W. J. B. (2011) Seeing a phrase Lingua in gender and number processing in L1 and L2 Spanish. – 120:2022 “ ” matters: The role of phrasal frequency in the processing of time and again 39. [aMHC] multiword sequences. Journal of Experimental Psychology: Learning, Memory, Sahin, N. T., Pinker, S., Cash, S. S., Schomer, D. & Halgren, E. (2009) Sequential – 37:776 and Cognition 784. [aMHC] s ’ processing of lexical, grammatical, and articulatory information within Broca Slattery, T. J., Sturt, P., Christianson, K., Yoshida, M. & Ferreira, F. (2013) Lingering Science 326:445. [aMHC] area. awed semantic pro- fl misinterpretations of garden path sentences arise from Sandler, W. (1986) The spreading hand autosegment of American Sign Language. – 69:104 Journal of Memory and Language cessing. 20. [FF, rNC] – Sign Language Studies 50:1 28. [KE] Smith, K. & Kirby, S. (2008) Cultural evolution: Implications for understanding the Gesture Sandler, W. (2012) Dedicated gestures and the emergence of sign language. 
human language faculty and its evolution. Philosophical Transactions of the 307. [rNC] 12:265 – 603. [aMHC] Royal Society B 363:3591 – Sandler, W., Aronoff, M., Meir, I. & Padden, C. (2011) The gradual emergence of Smith, K. & Wonnacott, E. (2010) Eliminating unpredictable variation through it- Natural Language and Linguistic Theory phonological form in a new language. 116:444 erated learning. Cognition – 49. [aMHC] 43. [KE] – 29:503 Smith, M. & Wheeldon, L. (1999) High level processing scope in spoken sentence Sandler, W., Meir, I., Padden, C. & Aronoff, M. (2005) The emergence of grammar: – 73:205 Cognition production. 46. [aMHC, DAC, rNC] Proceedings of the National Academy of Systematic structure in a new language. fl Smith, M. & Wheeldon, L. (2004) Horizontal information ow in spoken sentence Sciences 102:2661 – 65. [aMHC] production. Journal of Experimental Psychology: Learning, Memory, and Sanford, A. J. & Sturt, P. (2002) Depth of processing in language comprehension: – 30:675 Cognition 686. [MCM, rNC] Not noticing the evidence. 6:382 – 86. [rNC] Trends in Cognitive Sciences Snedeker, J. & Trueswell, J. (2003) Using prosody to avoid ambiguity: Effects of Sarma, V. (2003) Noncanonical word order: Topic and focus in adult and child Tamil. Journal of Memory and Language speaker awareness and referential context. Word order and scrambling , ed. S. Karimi, pp. 238 72. Blackwell. [UL] – In: 48:103 – 30. [aMHC] Saxton, M., Houston-Price, C. & Dawson, N. (2005) The prompt hypothesis: Clar- Snijders, L. (2012) Issues concerning constraints on discontinuous NPs in Latin. In: Applied Psycho- i fi cation requests as corrective input for grammatical errors. , July 1, 2012 – Proceedings of the LFG12 Conference, Bali, Indonesia, June 28 linguistics 26(3):393 – 14. [GB] http:// pp. 565 81, ed. M. Butt & T. H. King. CSLI Publications. Available at: – Schegloff, E. (2007) Sequence organization in interaction . Cambridge University web.stanford.edu/group/cslipublications/cslipublications/LFG/17/lfg12.html Press. [SCL] Schegloff, E. A., Jefferson, G. & Sacks, H. (1977) The preference for self-correction [SCL] Language 82. – 53(2):361 in the organization of repair in conversation. Solan, Z., Horn, D., Ruppin, E. & Edelman, S. (2005) Unsupervised learning of [PGTH] – natural languages. Proceedings of the National Academy of Science 102:11629 Schlenker, P. (2010) A phonological condition that targets discontinuous syntactic 34. [AL] 13. [SCL] 22:11 – units: Ma/mon suppletion in French. Snippets Sperling, G. (1960) The information available in brief visual presentations. Psycho- Schlesinger, I. M. (1975) Why a sentence in which a sentence in which a sentence is logical Monographs: General and Applied 74:1 29. [aMHC, KB] – 66. [rNC] Linguistics cult. fi embedded is embedded is dif 153:53 – Sperling, G., Budiansky, J. Spivak, J. G. & Johnson, M. C. (1971) Extremely rapid Schmidt, R. A. & Wrisberg, C. A. (2004) Motor learning and performance, third visual search: The maximum rate of scanning letters for the presence of a . Human Kinetics. [aMHC] edition numeral. 11. [MCP] – 174:307 Science Schultz, W., Dayan, P. & Montague, P. R. (1997) A neural substrate of prediction Staub, A. & Clifton, C., Jr. (2006) Syntactic prediction in language comprehension: 99. [aMHC] and reward. Science 275:1593 – Journal of Experimental Psychology: Learning, Evidence from either ... or. Schwab, E. C., Nusbaum, H. C. & Pisoni, D. B. (1985) Some effects of training on 36. [aMHC] 32:425 Memory, and Cognition – 408. 
[ADE] the perception of synthetic speech. 27:395 Human Factors – Steedman, M. (1987) Combinatory grammars and parasitic gaps. Natural Language Seidenberg, M. S. (1997) Language acquisition and use: Learning and applying 39. [aMHC] and Linguistic Theory 5:403 – 603. [aMHC] – 275:1599 Science probabilistic constraints. . MIT Press. [aMHC] The syntactic process Steedman, M. (2000) Seidenberg, M. S. & McClelland, J. L. (1989) A distributed, developmental model of Foraging theory . Princeton University Stephens, D. W. & Krebs, J. R. (1986) word recognition and naming. Psychological Review 96:523 – 68. [aMHC] Press. [AL] Seitz, A. R. & Watanabe, T. (2003) Psychophysics: Is subliminal learning really Stephens, G. J., Honey, C. J. & Hasson, U. (2013) A place for time: The spatio- passive? Nature 422:36. [ADE] Journal of temporal structure of neural dynamics during natural audition. Senghas, A., Kita, S. & Özyürek, A. (2004) Children creating core properties of Neurophysiology 26. [aMHC, CJH] – 110(9):2019 Science language: Evidence from an emerging sign language in Nicaragua. Stephens, G. J., Silbert, L. J. & Hasson, U. (2010) Speaker – listener neural coupling – 305:1779 82. [rNC] – PNAS 30. [aMHC] underlies successful communication. 107:14425 Shanks, D. R. & St. John, M. F. (1994) Characteristics of dissociable human learning Stivers, T., En eld, N. J., Brown, P., Englert, C., Hayashi, M., Heinemann, T., fi 17:367 95. [rNC] – Behavioral and Brain Sciences systems. Hoymann, G., Rossano, F., de Ruiter, J. P., Yoon, K.-Y. & Levinson, S. C. Shannon, C. (1948) A mathematical theory of communication. Bell System Technical (2009) Universals and cultural variation in turn-taking in conversation. – 27:623 Journal 56. [aMHC] – Proceedings of the National Academy of Sciences 106:10587 92. Shin, Y. K., Proctor, R. W. & Capaldi, E. J. (2010) A review of contemporary ideo- [aMHC, rNC] Psychological Bulletin 136:943 74. [AB] motor theory. – Studdert-Kennedy, M. (1986) Some developments in research on language behavior. North East Siegel, D. (1978) The adjacency constraint and the theory of morphology. In: Behavioral and social science: Fifty years of discovery: In commemoration of – 97. [aMHC] Linguistics Society 8:189 the fi ftieth anniversary of the “ Ogburn Report, ” recent social trends in the Sigman, M. & Dehaene, S. (2005) Parsing a cognitive task: A characterization of the , ed. N. J. Smelser & D. R. Gerstein, pp. 208 United States – 48. National mind PLOS Biology 3(2):e37. doi:10.1371/journal. s bottleneck. ’ Academy Press. [aMHC] pbio.0030037. [aMHC] Sturt, P. & Crocker, M. W. (1996) Monotonic syntactic processing: A cross-linguistic Silbert, L. J., Honey, C. J., Simony, E., Poeppel, D. & Hasson, U. (2014) Coupled study of attachment and reanalysis. Language and Cognitive Processes 11:449 – neural systems underlie the production and comprehension of naturalistic 94. [aMHC] BEHAVIORAL AND BRAIN SCIENCES, 39 (2016) 70 http://journals.cambridge.org Downloaded: 03 Jun 2016 IP address: 92.14.75.14

71 Christiansen & Chater: The Now-or-Never bottleneck: A fundamental constraint on language References/ Supalla, S. (1991) Manually coded English: The modality question in signed language Tylén, K., Christensen, P., Roepstorff, A., Lund, T., Østergaard, S. & Donald, M. , ed. P. Siple & Theoretical issues in sign language research development. In: (2015) Brains striving for coherence: Long-term cumulative plot formation in 109. University of Chicago Press. [KE] S. D. Fischer, pp. 85 – 121:106 NeuroImage the default mode network. 14. [rNC] – Supalla, S. & McKee, C. (2002) The role of manually coded English in language Tyler, L. K. & Warren, P. (1987) Local and global structure in spoken language Modality and structure in signed and spoken development of deaf children. In: 57. [FF] comprehension. Journal of Memory and Language 26(6):638 – 65. – , ed. R. P. Meier, K. Cormier & D. Quinto-Pozos, pp. 143 languages sequences on processing Japanese Uehara, K. & Bradley, D. (1996) The effect of -ga fi c-Asia conference on lan- Cambridge University Press. [KE] 11th Paci multiply center-embedded sentences. In: pp. 187 – 196. Seoul: Kyung Hee Uni- guage, information, and computation, Swaab, T., Brown, C. M. & Hagoort, P. (2003) Understanding words in sentence versity. [rNC] 86:326 contexts: The time course of ambiguity resolution. – Brain and Language Ullman, M. T. (2001) The declarative/procedural model of lexicon and grammar. 43. [aMHC] 69. [DPM] – 30:37 Journal of Psycholinguistic Research fi cation of syn- Swets, B., Desmet, T., Clifton, C. & Ferreira, F. (2008) Underspeci Valian, V., Solt, S. & Stewart, J. (2009) Abstract categories or limited-scope formu- Memory and Cognition tactic ambiguities: Evidence from self-paced reading. – s determiners. ’ lae? The case of children Journal of Child Language 36:743 16. [rNC] 36:201 – 78. [aMHC] Swinney, D. A. & Osterhout, L. (1990) Inference generation during auditory language Van Berkum, J. J., Brown, C. M., Zwitserlood, P., Kooijman, V. & Hagoort, P. (2005) comprehension. 33. [GB] Psychology of Learning and Motivation 25:17 – Anticipating upcoming words in discourse: Evidence from ERPs and reading Szostak, C. M. & Pitt, M. A. (2013) The prolonged in fl uence of subsequent context times. Journal of Experimental Psychology: Learning, Memory, and Cognition – on spoken word recognition. Attention, Perception, and Psychophysics 75:1533 31:443 – 67. [aMHC] 1546. [KB] van den Brink, D., Brown, C. M. & Hagoort, P. (2001) Electrophysiological evidence Talmy, L. (2000) MIT Press. [RM] Toward a cognitive semantics. for early contextual in fl uences during spoken-word recognition: N200 versus Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M. & Sedivy, J. C. (1995) Journal of Cognitive Neuroscience N400 effects. 13:967 – 85. [aMHC] Integration of visual and linguistic information in spoken language compre- Van Dyke, J. A. & Johns, C. L. (2012) Memory Interference as a determinant of hension. 34. [aMHC] 268:1632 Science – 11. Language and Linguistics Compass 6:193 – language comprehension. Tenenbaum, J. B., Kemp, C., Grif ths, T. L. & Goodman, N. D. (2011) How to grow fi [MCM, rNC] a mind: Statistics, structure, and abstraction. 85. [aMHC] – 331:1279 Science Van Everbroeck, E. (1999) Language type frequency and learnability: A connec- Thornton, R., MacDonald, M. C. & Gil, M. (1999) Pragmatic constraint on the in- tionist appraisal. In: Proceedings of the 21st Annual Conference of the Cognitive terpretation of complex noun phrases in Spanish and English. 
Journal of Ex- – , pp. 755 Science Society, Vancouver, British Columbia, Canada, August 1999 – 25:1347 perimental Psychology: Learning, Memory, and Cognition 65. 60, ed. M. Hahn & S. C. Stoness. Erlbaum. [aMHC] [aMHC] van Gompel, R. P. & Liversedge, S. P. (2003) The in fl uence of morphological in- Tincoff, R. & Jusczyk, P. W. (1999) Some beginnings of word comprehension in 6- Journal of Experimental Psy- formation on cataphoric pronoun assignment. Psychological Science 10(2):172 – 75. Available at: month-olds. http://doi.org/10. 39. [aMHC] – 29:128 chology: Learning, Memory, and Cognition . [RM] 1111/1467-9280.00127 van Gompel, R. P. G., Pickering, M. J., Pearson, J. & Jacob, G. (2006) The activation Tincoff, R. & Jusczyk, P. W. (2012) Six-month-olds comprehend words that refer to of inappropriate analyses in garden-path sentences: Evidence from structural http://doi.org/10.1111/j. 44. Available at: – Infancy 17(4):432 parts of the body. 55:335 priming. 62. [FF] Journal of Memory and Language – . [RM] 1532-7078.2011.00084.x van Soelen, I. L. C., Brouwer, R. M., van Leeuwen, M., Kahn, R. S., Hulshoff Pol, H. . First verbs: A case study of early grammatical development Tomasello, M. (1992) E. & Boomsma, D. I. (2011) Heritability of verbal and performance intelligence Cambridge University Press. [aMHC] Twin Research and Human Genetics in a pediatric longitudinal sample. 14:119 – Constructing a language: A usage-based theory of language Tomasello, M. (2003) 28. [AL] acquisition. Harvard University Press. [aMHC, DPM, rNC] Vasishth, S., Suckow, K., Lewis, R. L. & Kern, S. (2010) Short-term forgetting in Tomasello, M. (2006) Acquiring linguistic constructions. In: Handbook of child sentence comprehension: Crosslinguistic evidence from verb- nal structures. fi psychology. 2. Cognition, perception, and language , ed. W. Damon, R. Lerner, Language and Cognitive Processes 25:533 – 67. [rNC] D. Kuhn & R. Siegler, pp. 255 – 98. Wiley. [DPM] ’ s conversa- Tomasello, M., Conti-Ramsden, G. & Ewert, B. (1990) Young children Vicario, C. M., Martino, D. & Koch, G. (2013) Temporal accuracy and variability tions with their mothers and fathers: Differences in breakdown and repair. – 28. Neuroscience 245:121 in the left and right posterior parietal cortex. Journal of Child Language 17(1):115 – 30. [GB] [GB] Townsend, D. J. & Bever, T. G. (2001) Sentence comprehension: The integration of Vogler, C., Gschwind, L., Coyne, D., Freytag, V., Milnik, A., Egli, T., Heck, A., de . MIT Press. [aMHC] habits and rules Quervain, D. J. & Papassotiropoulos, A. (2014) Substantial SNP-based herita- Treisman, A. (1964) Selective attention in man. 16. British Medical Bulletin 20:12 – Translational Psychiatry 4: bility estimates for working memory performance. [aMHC] e438. [AL] Treisman, A. & Schmidt, H. (1982) Illusory conjunctions in the perception of Vouloumanos, A. & Werker, J. F. (2007) Listening to language at birth: Evidence for 41. [aMHC] – Cognitive Psychology 14:107 objects. Developmental Science 64. Available at: – a bias for speech in neonates. 10(2):159 Tremblay, A. & Baayen, H. (2010) Holistic processing of regular four-word se- . [RM] http://doi.org/10.1111/j.1467-7687.2007.00549.x fi ed words: Evidence quences: A behavioral and ERP study of the effects of structure, frequency, and Vroomen, J. & de Gelder, B. (1999) Lexical access of resyllabi 27(3):413 – from phoneme monitoring. 21. [SCL] Memory and Cognition , ed. Perspectives on formulaic language probability on immediate free recall. In: Wagemans, J., Elder, J. 
H., Kubovy, M., Palmer, S. E., Peterson, M. A., Singh, M. & 67. Continuum International Publishing. [aMHC] – D. Wood, pp. 151 von der Heydt, R. (2012) A century of Gestalt psychology in visual perception: Tremblay, A., Derwing, B., Libben, G. & Westbury, C. (2011) Processing advantages fi I. Perceptual grouping and Psychological Bulletin ground organization. – gure of lexical bundles: Evidence from self-paced reading and sentence recall tasks. 217. [rNC] – 138:1172 Language Learning 61:569 613. [aMHC] – Wagers, M., Lau, E. & Phillips, C. (2009) Agreement attraction in comprehension: Trotzke, A., Bader, M. & Frazier, L. (2013) Third factors and the performance in- Representations and processes. Journal of Memory and Language 61:206 – 37. 34. [rNC] – 7:1 Biolinguistics terface in language design. [DAC] Sociolinguistic typology: Social determinants of linguistic com- Trudgill, P. (2011) fl exibility of gram- Wagner, V., Jescheniak, J. D. & Schriefers, H. (2010) On the . Oxford University Press. [aMHC, TB] plexity matical advance planning during sentence production: Effects of cognitive load Trueswell, J. C. & Tanenhaus, M. K. (1994) Towards a lexicalist framework of on multiple lexical access. Journal of Experimental Psychology: Learning, Perspectives on sentence constraint-based syntactic ambiguity resolution. In: Memory, and Cognition 36:423 40. [MCM] – processing , ed. C. Clifton, L. Frazier & K. Rayner, pp. 155 – 79. Erlbaum. , ALICE A.I. Founda- Be your own botmaster, second edition Wallace, R. S. (2005) [aMHC] tion. [rNC] Trueswell, J. C., Medina, T. N., Hafri, A. & Gleitman, L. R. (2013) Propose but Wang, F. H. & Mintz, T. H. (under revision) The limits of associative learning in Cognitive Psychol- verify: Fast mapping meets cross-situational word learning. cross-situational word learning. [FHW] – 56. [FHW] 66(1):126 ogy Wang, M. D. (1970) The role of syntactic complexity as a determiner of compre- Trueswell, J. C., Sekerina, I., Hill, N. M. & Logrip, M. L. (1999) The kindergarten- Journal of Verbal Learning and Verbal Behavior 9:398 – 404. hensibility. Cognition path effect: Studying on-line sentence processing in young children. [rNC] 73:89 134. [aMHC] – Language – Wang, W. S.-Y. (1969) Competing changes as a cause of residue. 45:9 Trueswell, J. C., Tanenhaus, M. K. & Garnsey, S. M. (1994) Semantic in fl uences on 25. [aMHC] parsing: Use of thematic role information in syntactic ambiguity resolution. Wang, W. S.-Y., ed. (1977) The lexicon in phonological change . Mouton. [aMHC, 318. [aMHC] – 33:285 Journal of Memory and Language rNC] fi Trueswell, J. C., Tanenhaus, M. K. & Kello, C. (1993) Verb-speci c constraints in Wang, W. S.-Y. & Cheng, C.-C. (1977) Implementation of phonological change: The sentence processing: Separating effects of lexical preference from garden-paths. Shaung-feng Chinese case. In: The lexicon in phonological change , ed. W. S.-Y. 19:528 Journal of Experimental Psychology: Learning, Memory, and Cognition – Wang, pp. 86 – 100. Mouton. [aMHC] 53. [aMHC] BEHAVIORAL AND BRAIN SCIENCES, 39 (2016) 71 http://journals.cambridge.org Downloaded: 03 Jun 2016 IP address: 92.14.75.14

72 Christiansen & Chater: The Now-or-Never bottleneck: A fundamental constraint on language References/ Warren, P. & Marslen-Wilson, W. (1987) Continuous uptake of acoustic cues in Wilson, M. & Emmorey, K. (1998) A “ word length effect ” for sign language: Further spoken word recognition. 75. [aMHC] – 41:262 Perception & Psychophysics evidence on the role of language in structuring working memory. Memory and Wanna Warren, P., Speer, S. & Schafer, A. (2003) -contraction and prosodic dis- 90. [KE] 26(3):584 Cognition – Wellington Working Papers in Linguistics ambiguation in US and NZ English. Wilson, M. & Emmorey, K. (2006) Comparing sign language and speech reveals a 15:31 49. [WO] – 17:682 universal limit on short-term memory capacity. Psychological Science – Science Warren, R. M. (1970) Perceptual restoration of missing speech sounds. 167 83. [aMHC] (3917):392 – 93. [GB, MCM] Winograd, T. (1972) Understanding natural language. 3:1 – Cognitive Psychology Warren, R. M., Obusek, C. J., Farmer, R. M. & Warren, R. P. (1969) Auditory se- 191. [aMHC] – 164:586 Science quence: Confusion of patterns other than speech or music. Wolpert, D. M., Diedrichsen, J. & Flanagan, J. R. (2011) Principles of sensorimotor 87. [aMHC, CJH, rNC] 12:739 learning. Nature Reviews Neuroscience – 51. [aMHC, AB] Warren, R. M. & Sherman, G. L. (1974) Phonemic restorations based on subsequent Wolpert, D. M., Ghahramani, Z. & Flanagan, J. R. (2001) Perspectives and problems context. 16:150 – 56. [MCM] Perception and Psychophysics 94. [AB] – 5:487 Trends in Cognitive Sciences in motor learning. De- Wasow, T. & Arnold, J. (2003) Post-verbal constituent ordering in English. In: Wundt, W. (1904) The psychology of the sentence. In: Language and psychology: terminants of grammatical variation in English , ed. G. Rohdenburg & B. 32. Wiley. Historical aspects of psycholinguistics , ed. A. L. Blumenthal, pp. 9 – 54. Mouton de Gruyter. [aMHC] Mondorf, pp. 119 – [DAC] Watanabe, T., Náñez, J. E. & Sasaki, Y. (2001) Perceptual learning without per- fi Xanthos, A., Lahaa, S., Gillis, S., Stefany, U., Aksu-Koc, A., Christo dou, A., ception. Nature 413:844 48. [ADE] – Gagarina, N., Hrzica, G., Nihan Ketrez, F., Kilani-Schoch, M., Korecky-Kroll, Waters, A. J., Gobet, F. & Leyden, G. (2002) Visuospatial abilities of chess players. K., Kovacevic, M., Laalo, K., Palmovic, M., Pfeiler, B., Voeikova, M. D. & British Journal of Psychology – 65. [rNC] 93(4):557 Dressler, W. U. (2012) On the role of morphological richness in the early de- Waxman, S. R. & Gelman, S. A. (2009) Early word-learning entails reference, not 79. [rNC] – 31:461 First Language ection. fl velopment of noun and verb in Trends in Cognitive Sciences merely associations. 63. [FHW] – 13(6):258 . Oxford University Knowledge and learning in natural language Yang, C. (2002) Weber-Fox, C. M. & Neville, H. J. (1996) Maturational constraints on functional Press. [aMHC] specializations for language processing: ERP and behavioral evidence in bilin- Proceedings of the National Yang, C. (2013) Ontogeny and phylogeny of language. Journal of Cognitive Neuroscience 56. [RM] – gual speakers. 8(3):231 – 110:6324 27. [aMHC] Academy of Sciences Approaches to bootstrapping: Phonological, Weissenborn, J. & Höhle, B., eds. (2001) The art of memory. Yates, F. (1966) Routledge & Kegan Paul. [MLD] . lexical, syntactic and neurophysiological aspects of early language acquisition Yonata, L. (1999) Early metalinguistic competence: Speech monitoring and repair John Benjamins. [aMHC] behavior. 34. 
[GB] – 35(3):822 Developmental Psychology Werker, J. F., Yeung, H. H. & Yoshida, K. A. (2012) How do infants become experts at Yoshida, K. A., Pons, F., Maye, J. & Werker, J. F. (2010) Distributional phonetic 15(4):420 – 33. Available at: http://doi.org/ – 21(4):221 Current Directions in Psychological Science native-speech perception? Infancy learning at 10 months of age. . [RM] 10.1111/j.1532-7078.2009.00024.x 26. Available at: http://doi.org/10.1177/0963721412449459 . [RM, rNC] Yu, C. & Smith, L. B. (2007) Rapid word learning under uncertainty via cross-sit- Wexler, K. (2002) Lenneberg ’ s dream: Learning, normal language development and – 20. [FHW] uational statistics. Psychological Science 18(5):414 speci Language competence across populations: c language impairment. In: fi Yu, C., Smith, L. B., Klein, K. & Shiffrin, R. M. (2007) Hypothesis testing and as- Towards a de , ed. J. Schaffer & Y. Levy, nition of speci fi c language impairment fi sociative learning in cross-situational word learning: Are they one and the same? pp. 11 60. Erlbaum. [DPM] – Proceedings of the 29th Annual Conference of the Cognitive Science Society, In: Wicha, N. Y. Y., Moreno, E. M. & Kutas, M. (2004) Anticipating words and their , pp. 737 Nashville, TN, August 2007 42, ed. D. S. McNamara & J. G. Trafton. – gender: An event-related brain potential study of semantic integration, gender Cognitive Science Society. [FHW] expectancy, and gender agreement in Spanish sentence reading. Journal of Zhang, Y. & Clark, S. (2011) Syntactic processing using the generalized perceptron Cognitive Neuroscience 16:1272 – 88. [aMHC] 37(1):105 – 51. [CG-R] and beam search. Computational Linguistics Wiener, M., Turkeltaub, P. & Coslett, H. B. (2010) The image of time: A voxel-wise meta-analysis. 49(2):1728 – 40. [GB] NeuroImage Zhu, L., Chen, Y., Torrable, A., Freeman, W. & Yuille, A. L. (2010) Part and ap- Wilbur, R. B. & Nolkn, S. B. (1986) The duration of syllables in American Sign pearance sharing: Recursive compositional models for multi-view multi-object 80. [aMHC] – 29:263 Language and Speech Language. IEEE Computer Society Conference on Computer Vision and detection. In: Wilson, C. (2006) Learning phonology with substantive bias: An experimental and Pattern Recognition (CVPRW 2010), San Francisco, CA, June 2010 , Institute of Cognitive Science 30:945 – 82. [MLL] computational study of velar palatalization. Electrical and Electronics Engineers (IEEE). [aMHC] ” in working “ phonological loop Wilson, M. & Emmorey, K. (1997) A visual-spatial Zipf, G. (1936) The psychobiology of language . Routledge. [MLL] Memory and Cognition memory: Evidence from American Sign Language. Zipf, G. K. (1949) Human behavior and the principle of least effort . Addison- 20. [KE] – 25(3):313 Wesley. [aMHC, MLL] BEHAVIORAL AND BRAIN SCIENCES, 39 (2016) 72 http://journals.cambridge.org Downloaded: 03 Jun 2016 IP address: 92.14.75.14
