Testing What Has Been Taught: Helpful, High-Quality Assessments Start with a Strong Curriculum, by Laura S. Hamilton, American Educator, Winter 2010–2011, Vol. 34, No. 4, AFT


Testing What Has Been Taught
Helpful, High-Quality Assessments Start with a Strong Curriculum

By Laura S. Hamilton

In recent years, standardized, large-scale tests of student achievement have been given a central role in federal, state, and local efforts to improve K–12 education. Despite the widespread enthusiasm for assessment-based reforms, many of the current and proposed uses of large-scale assessments are based on unverified assumptions about the extent to which they will actually lead to improved teaching and learning, and insufficient attention has been paid to the characteristics of assessment programs that are likely to promote desired outcomes. Moreover, advocates of assessment-based reform often hold unrealistic expectations for what these assessments can and cannot do.

In light of the recently developed Common Core State Standards and the ongoing work to develop assessments aligned to those standards, now is a good time to pause and consider our state and federal assessment policies. If we are to actually improve schools, researchers and policymakers must address a few essential questions: How many purposes can one assessment serve? Can assessments meaningfully be aligned to standards, or is something more detailed, like a curriculum, necessary to guide both teachers and assessment developers? What would the key features of an assessment system designed to increase student learning and improve instruction be? While current assessment knowledge is not sufficient to fully answer these questions, in this article I offer an overview of what is known and several suggestions for improving our approach to assessment.

Laura S. Hamilton is a senior behavioral scientist with the RAND Corporation and an adjunct associate professor in the University of Pittsburgh’s Learning Sciences and Policy program. She has directed several large studies, including an investigation of the implementation of standards-based accountability in response to No Child Left Behind. She is currently working with the National Center on Performance Incentives to investigate teachers’ responses to pay-for-performance programs, and she serves on the committee that is revising the Standards for Educational and Psychological Testing.

ILLUSTRATIONS BY PAUL ZWOLAK
47 AMERICAN EDUCATOR | WINTER 2010–2011

Purposes of Assessment

Large-scale assessments of student achievement are currently being used to serve a number of purposes in K–12 education. Broadly speaking, these purposes can be described as focusing on providing information, imposing accountability, or some combination of the two. Increasingly, policymakers and others are placing multiple demands on large-scale testing programs to serve a wide variety of information and accountability purposes, and to inform decision making and induce change at different levels of the education system. Unfortunately, tests are seldom designed to address multiple purposes at once. Policymakers and the public must recognize that when a test designed for one purpose (e.g., to identify students’ strengths and weaknesses in algebra) is used for another purpose (e.g., to decide which students will be promoted to ninth grade or which teachers will receive bonuses), the resulting test scores may not provide valid information for both purposes. The use of the test to make decisions for purposes other than those for which it was validated is generally unwarranted.¹

Efforts to validate large-scale assessments are not able to keep pace with the public policies expanding their use. Though many policymakers are not heeding researchers’ warnings, there is evidence that most such assessments may not be serving any of their purposes adequately. At the classroom level, teachers tend to find that most accountability-focused tests are less useful than other information (such as homework, teacher-developed tests, or classroom observations) for informing instruction. In addition, the attachment of high stakes to existing tests has led to unintended and probably undesirable consequences (discussed below).

The Effects of High-Stakes Testing

Because much of today’s policy debate focuses on externally mandated assessments for use as tools of accountability, we can apply lessons learned from the past few decades, when accountability testing became nearly ubiquitous in public K–12 education. In brief, research (conducted by various individuals and organizations across numerous districts, states, and nations) indicates that teachers and other school and district staff reallocate resources (including time) toward tested content and away from untested content.² This reallocation occurs across subjects, across topics within subjects, and even across students when the performance of some students counts more than that of others for accountability purposes (e.g., some schools have provided extra help to students just below the cut score for proficient).³

The form of resource reallocation that has probably generated the most concern is the excessive emphasis on test-taking skills; it consumes time that should be spent teaching content. However, this is not the only form, and may not even be the most common. Reallocation also takes the form of increases in time spent engaging in instructional activities that are directed toward what is tested and how it is tested (such as focusing on short reading passages with closed-ended comprehension questions) and decreases in time spent on activities that are not tested (such as reading novels or writing extended essays). Because most large-scale tests rely on multiple-choice items or other formats that tend to emphasize discrete skills and knowledge rather than complex, extended problems, reallocation is likely to reduce the amount of class time and resources devoted to these more complex skills and processes.*

* It is worth pointing out that the findings regarding reallocation in response to high-stakes performance measures are not limited to education. They have been observed in sectors as varied as health care, transportation, and emergency preparedness.⁴

Reallocation is often thought of as something teachers do, but the decisions that lead to reallocation are often made at higher levels of the education system. Teachers report drawing on a variety of instructional resources (such as curriculum and pacing guides, test-preparation materials, professional development, and mandatory interim assessments), and school, district, and state administrators often design these resources to emphasize tested content.⁵ Worse, these resources are not always well aligned or designed in ways that promote high-quality instruction. For example, while some teachers have access to high-quality formative assessment systems that are linked to their local curricula and provide clear guidance for next steps, others obtain their interim data from mandatory assessments that do not provide formative feedback and may not be well aligned with what they are teaching.

The key lesson of all this research is that what is tested influences what is taught, in significant and sometimes unexpected, problematic ways. For example, one well-documented problem is score inflation. Scores on high-stakes tests tend to increase much more rapidly than scores on low- or no-stakes tests, as educators alter their instruction to better prepare students for the high-stakes test. Some of these score increases are legitimate and welcomed; some are the result of anything from drilling in test-taking strategies to outright cheating. The term “score inflation” refers to any score increase that is not caused by an increase in students’ learning of the skills and knowledge that the test is intended to measure.

Since at least the 1980s, one popular “solution” to the sometimes negative influence of testing on teaching has been calls for “tests worth teaching to,” based on the notion that if tests were of high quality and measured complex skills and process, instruction would follow suit. This idea resulted in the wave of performance-based assessments in the 1990s. Evidence from some states’ performance-based assessment programs suggests that these assessments can lead to some of the desired outcomes, such as increased emphasis on problem solving,⁶ but for the most part these efforts have failed to lead to fundamental changes in how teachers deliver instruction.⁷ Most states have backed away from performance-based assessment because of costs and technical problems (e.g., states that implemented portfolio assessments found that scoring tended to be inconsistent and expensive⁸). Moreover, evidence suggests that simply adopting performance-based assessment does not eliminate the problems of narrowing what is taught or score inflation.⁹ Although some have claimed that the Advanced Placement (AP) and International Baccalaureate (IB) programs might be considered successful implementations of the idea of tests worth teaching to, both of those programs’ exams are aligned to well-defined course content. So, while their tests are generally high in quality and doing well on these tests is a legitimate goal of AP and IB courses, the key to these programs appears to be well-aligned instructional materials and assessments, not assessments alone.

This brings us to another popular “solution”: standards. A number of factors have contributed to the appeal of standards-based teaching. One of these may have been the negative influence of high-stakes testing as a result of the minimum-competency testing movement. Standards may have seemed like a logical way to counter the narrowing of the curriculum and emphasis on lower-order, tested skills and content. However, efforts to promote more cognitively demanding instruction by building complex skills and knowledge into state or district content standards have been thwarted by the very tests used to assess those standards. Most states claim that their assessments are aligned with their standards, but these ostensibly aligned tests often sample only a subset of the standards,† with disproportionate emphasis on the lower-level content that is easier to test.¹⁰ Because standards and high-stakes tests are not fully aligned, educators understandably tend to rely more on the tests than on the standards for instructional guidance.¹¹

† Another problem is the low quality of the standards themselves, which tend to be either too vague to guide instruction or too detailed to be covered in one school year. For more on the problems with most states’ standards, see the Spring 2008 issue of American Educator, available at www.aft.org/newspubs/periodicals/ae/spring2008.

After 20 years of trying to align standards and tests, it is time to question whether this is even possible, at least in a meaningful way. Most standards are not highly specific or detailed. Typically, they are broad outcome statements that are wide open to interpretation. Assessments, however, are highly specific and detailed. Herein lies the problem with assessments aligned to standards: a teacher may faithfully and effectively teach to the standards all year and her students may learn a great deal, but her students may still do poorly on the test simply because the teacher and the test developer interpreted the standards differently. A curriculum, by specifying what knowledge and skills to teach and to test, could reduce the severity of this problem.

Clearly, assessment-based reforms (1) have not fully achieved policymakers’ goals, and (2) have led to unintended consequences. These findings raise concerns about the extent to which assessment can be viewed as a means for improving educational outcomes. At the same time, assessment clearly plays an important role in providing information that helps teachers and other educators improve. Moreover, because testing affects what is taught, assessment has the potential to contribute to positive educational change if it is designed and implemented appropriately.

Building a Better Assessment System

There is no research evidence to tell us definitively how to build an assessment system that will promote student learning and be resistant to the negative consequences that are common in high-stakes testing programs. One promising approach is to start with a detailed, coherent curriculum that is aligned with rigorous content standards, and then build an assessment system that measures the skills and knowledge emphasized in the curriculum. (Of course, using curriculum to guide assessment development would require a more consistent curriculum policy than currently exists in our states, a topic discussed throughout this issue of American Educator.) While it’s inevitable that assessment will continue to drive instructional decisions, the less desirable consequences may be mitigated by providing educators with a high-quality curriculum and a set of supports like sample lesson plans and quizzes, ongoing professional development, and more time to confer with colleagues. Ensuring that all the components are well aligned should give teachers confidence that if they teach the curriculum effectively, the result will be improved student learning as measured by the assessments.

The tendency to engage in practices that narrow the curriculum and cause score inflation stems in large part from a belief among educators that delivering the entire existing curriculum (or standards, in districts and schools that do not have a curriculum) will not ensure adequate coverage of the tested material. Teachers and principals understand that many aspects of their curricula/standards are not included on the accountability tests and that some of the tested material is not included in the curricula/standards (at least for that grade level).¹² A better-aligned system, modeled in part after the AP and IB programs (combined with some of the other suggestions discussed below), might help to assuage teachers’ concerns about coverage and enable them to worry less about what is likely to be on the test.

This idea is not inconsistent with earlier notions of standards-based reform,¹³ which advocated for alignment among not just standards and assessment, but standards, assessment, curriculum, and professional development. Many advocates of standards-based reform argued that standards should drive the development of both the curriculum and the assessments. While this makes sense in theory, in practice most standards are not written at a level of specificity that promotes the development of aligned curricula or assessments.¹⁴ To date, no state has even developed a statewide curriculum, much less based its assessment on a curriculum.

Even if a superb curriculum and well-aligned, high-quality assessment had been developed, our work would not be done. A sound accountability policy requires multiple sources of information and supports: not all of the outcomes that we want schools to promote can be measured easily or cheaply through large-scale assessments, and not all desired changes can be induced through improvements in assessment alone. Decision makers who understand the strong influence that high-stakes tests exert may, understandably, wish to rely heavily on assessment as a means to promote school improvement. For assessment to serve this role effectively, it must be designed in a way that supports rather than detracts from teachers’ efforts to engage in high-quality instruction. Research on the effects of various assessment-design features is limited, so any effort that relies heavily on assessment as a tool for school improvement should be carried out with caution. Nonetheless, it is worth reviewing what is known and looks promising. Here are four approaches to designing assessment and accountability policies that are likely to support school improvement.

First, an accountability system that is designed to reward or penalize districts, schools, or individuals on the basis of their performance should not rely exclusively on tests. Although there is extensive research being conducted to guide improvements in large-scale testing, it is likely that society will continue to expect schools to promote outcomes (like critical thinking and responsible citizenship) that cannot be measured well using tests. In addition, even if the perfect assessments could be designed, it is not realistic to expect that it would be practical or desirable to spend the time and money required to administer tests representing the full range of outcomes of interest. Accountability systems could supplement tests with non-test-based indicators of processes or outcomes, such as college-preparatory course taking, high school and college graduation rates, and apprenticeship completion rates. And, these systems could be designed in concert with current efforts by several teams of researchers and practitioners to develop improved test and nontest measures of teaching quality. When we look beyond tests alone to meet our information and accountability needs, a wide range of better options become available.

Of course, any supplemental measure should be evaluated using the same criteria for validity and reliability that are applied to test-based measures, and unintended consequences should be identified and addressed. One potential advantage of nontest indicators, such as peer and administrator observations and critiques of instruction, is that they might serve a more useful professional development function than test scores have, by providing teachers with clear, constructive feedback on their teaching. But if new measures (or rubrics) are used for both professional development and accountability purposes, investigations need to be designed to examine the validity of scores from those measures in light of each of those purposes, as well as the consequences that arise. Some problems, such as the tendency to focus on what is measured at the expense of what is not measured, are unlikely to be eliminated completely, so it will be important to monitor for undesirable consequences and modify the system as necessary to address them.

Second, for assessment and accountability to be useful, policymakers must consider ways to improve the quality of information from the tests themselves, and to mitigate the expected negative effects of using tests for high-stakes purposes. In particular, designers of testing programs should take steps to reduce the likelihood of curriculum narrowing and score inflation. As mentioned above, basing the test on a detailed curriculum instead of broad standards will probably help. Another promising approach is to design tests to minimize predictability from one administration to another, so that focusing instruction on particular item formats or styles will not be viewed as likely means to raising scores. A single test administered at one point in time can sample only a fraction of the material in the curriculum, so varying this material over time, along with the types of items designed to measure it, should result in reduced curriculum narrowing and score inflation. In short, if teachers had a high-quality curriculum and supporting materials at hand, and if the test were well-aligned but unpredictable, then teachers would probably just focus on helping all students master the skills and knowledge specified in the curriculum. Of course, the problem of testing higher-order knowledge and skills would remain, but in the near future technology may offer new opportunities to design cost-effective and high-quality performance-based measures.¹⁵

Third, any accountability system that seeks to support instructional improvement ought to include a high-quality formative assessment system, one that is aligned with the curriculum and provides clear instructional guidance rather than simply predicting students’ scores on the state test.¹⁶ But the assessment itself is just the beginning. The results must be accessible and available in a way that facilitates effective day-to-day use to guide instruction and be accompanied by ongoing professional development.

Finally, a number of other considerations need to be addressed when designing the testing components of an accountability policy, such as whether to focus the system on student or educator performance, on individual or group performance, on current achievement or growth, and on fixed targets or participant rankings.¹⁷ These need not be such stark tradeoffs, but they do need to be considered. Many policymakers seem to want to say “All of the above,” but such an unfocused and unwieldy accountability system would be very unlikely to promote school improvement.

Despite these challenges (and the dozens of more technical challenges that I have not addressed), it is likely that test-based accountability will be with us for some time. No doubt the policymakers who enthusiastically support such accountability are truly committed to school improvement, so they ought to see that heeding educators’ and researchers’ concerns about the purposes, meaningful uses, and technical limits of assessments is worthwhile. Working together, we can develop a program of large-scale assessment that addresses the information needs of educators, particularly at the classroom level, while also contributing to improved accountability policies.

Endnotes

1. American Educational Research Association, American Psychological Association, and National Council on Measurement in Education, Standards for Educational and Psychological Testing (Washington, DC: American Psychological Association, 1999); and Michael T. Kane, “Validation,” in Educational Measurement, 4th ed., ed. Robert L. Brennan (Westport, CT: American Council on Education/Praeger, 2006), 17–64.
2. For reviews of relevant literature, see Laura S. Hamilton, “Assessment as a Policy Tool,” Review of Research in Education 27, no. 1 (2003): 25–68; Jane Hannaway and Laura S. Hamilton, Accountability Policies: Implications for School and Classroom Practices (Washington, DC: Urban Institute, 2008); and Brian M. Stecher, “Consequences of Large-Scale, High-Stakes Testing on School and Classroom Practice,” in Making Sense of Test-Based Accountability in Education, ed. Laura S. Hamilton, Brian M. Stecher, and Stephen P. Klein (Santa Monica, CA: RAND, 2002).
3. See, for example, Jennifer Booher-Jennings, “Below the Bubble: ‘Educational Triage’ and the Texas Accountability System,” American Educational Research Journal 42, no. 2 (2005): 231–268.
4. Brian M. Stecher, Frank Camm, Cheryl L. Damberg, Laura S. Hamilton, Kathleen J. Mullen, Christopher Nelson, Paul Sorensen, Martin Wachs, Allison Yoh, and Gail L. Zellman, Toward a Culture of Consequences: Performance-Based Accountability Systems for Public Services (Santa Monica, CA: RAND, 2010).
5. Laura S. Hamilton, Brian M. Stecher, Jennifer Lin Russell, Julie A. Marsh, and Jeremy Miles, “Accountability and Teaching Practices: School-Level Actions and Teacher Responses,” in Strong States, Weak Schools: The Benefits and Dilemmas of Centralized Accountability, ed. Bruce Fuller, Melissa K. Henne, and Emily Hannum, vol. 16, Research in Sociology of Education (St. Louis, MO: Emerald Group Publishing, 2008), 31–66; and Brian M. Stecher, Scott Epstein, Laura S. Hamilton, Julie A. Marsh, Abby Robyn, Jennifer Sloan McCombs, Jennifer Russell, and Scott Naftel, Pain and Gain: Implementing No Child Left Behind in Three States, 2004–2006 (Santa Monica, CA: RAND, 2008).
6. Suzanne Lane, Carol S. Parke, and Clement A. Stone, “The Impact of a State Performance-Based Assessment and Accountability Program on Mathematics Instruction and Student Learning: Evidence from Survey Data and School Performance,” Educational Assessment 8, no. 4 (2002): 279–315.
7. John B. Diamond, “Where the Rubber Meets the Road: Rethinking the Connection between High-Stakes Testing Policy and Classroom Instruction,” Sociology of Education 80, no. 4 (2007): 285–313; and William A. Firestone, David Mayrowetz, and Janet Fairman, “Performance-Based Assessment and Instructional Change: The Effects of Testing in Maine and Maryland,” Educational Evaluation and Policy Analysis 20, no. 2 (1998): 95–113.
8. See Daniel Koretz, Brian M. Stecher, Stephen P. Klein, and Daniel McCaffrey, “The Vermont Portfolio Assessment Program: Findings and Implications,” Educational Measurement: Issues and Practice 13, no. 3 (1994): 5–16.
9. Daniel Koretz and Sheila Barron, The Validity of Gains in Scores on the Kentucky Instructional Results Information System (KIRIS) (Santa Monica, CA: RAND, 1998).
10. Robert Rothman, Jean B. Slattery, Jennifer L. Vranek, and Lauren B. Resnick, Benchmarking and Alignment of Standards and Testing (Los Angeles: National Center for Research on Evaluation, Standards, and Student Testing, 2002).
11. Brian M. Stecher and Tammi Chun, School and Classroom Practices during Two Years of Education Reform in Washington State (Los Angeles: National Center for Research on Evaluation, Standards, and Student Testing, 2001).
12. Stecher et al., Pain and Gain.
13. Marshall S. Smith and Jennifer O’Day, “Systemic School Reform,” in The Politics of Curriculum and Testing: The 1990 Yearbook of the Politics of Education Association, ed. Susan H. Fuhrman and Betty Malen (New York: Falmer Press, 1991), 233–267.
14. Heidi Glidden, “Common Ground: Clear, Specific Content Holds Teaching, Texts, and Tests Together,” American Educator 32, no. 1 (Spring 2008): 13–19.
15. Edys S. Quellmalz and James W. Pellegrino, “Technology and Testing,” Science 323, no. 5910 (2009): 75–79; and Bill Tucker, Beyond the Bubble: Technology and the Future of Student Assessment (Washington, DC: Education Sector, 2009), 1–9.
16. Marianne Perie, Scott Marion, Brian Gong, and Judy Wurtzel, The Role of Interim Assessments in a Comprehensive Assessment System: A Policy Brief (Washington, DC: Achieve, Aspen Institute, and National Center for the Improvement of Educational Assessment, 2007).
17. See, for example, Michael J. Podgursky and Matthew G. Springer, “Teacher Performance Pay: A Review,” Journal of Policy Analysis and Management 26, no. 4 (2007): 909–950.

There’s No Such Thing as a Reading Test

By E. D. Hirsch, Jr., and Robert Pondiscio

It is among the most common of nightmares. You dream of taking a test for which you are completely unprepared: you’ve never studied the material or even attended the course. For millions of American schoolchildren, it is a nightmare from which they cannot wake, a trial visited upon them each year when the law requires them to take reading tests with little preparation. Sure, formally preparing for reading tests has become more than just a ritual for schools. It is practically their raison d’être! Yet students are not prepared in the way they need to be.

Schools and teachers may indeed be making a Herculean effort to raise reading scores, but for the most part these efforts do little to improve reading achievement and prepare children for college, a career, and a lifetime of productive, engaged citizenship. This wasted effort is not because our teachers are of low quality. Rather, too many of our schools have fundamental misconceptions about reading comprehension: how it works, how to improve it, and how to test it.

Reading, like riding a bike, is typically thought of as a skill we acquire as children and generally never lose. When you think about your ability to read (if you think about it at all), the chances are good that you perceive it as not just a skill, but a readily transferable skill. Once you learn how to read, you can competently read a novel, a newspaper article, or the latest memo from your bank. Reading is reading is reading. Either you can do it, or you cannot.

As explained in the articles on pages 3 and 30, this view of reading is only partially correct. The ability to translate written symbols into sounds, commonly called “decoding,” is indeed a skill that can be taught and mastered. This explains why you are able to “read” nonsense words such as “rigfap” or “churbit.” But to be fully literate is to have the communicative power of language at your command: to read, write, listen, and speak with understanding.

Cognitive scientists describe comprehension as domain specific. If a baseball fan reads “A-Rod hit into a 6-4-3 double play to end the game,” he needs not another word to understand that the New York Yankees lost when Alex Rodriguez came up to bat with a man on first base and one out and then hit a ground ball to the shortstop, who threw to the second baseman, who relayed to first in time to catch Rodriguez for the final out. If you’ve never heard of A-Rod or a 6-4-3 double play and cannot reconstruct the game situation in your mind’s eye, you are not a poor reader. You merely lack the domain-specific vocabulary and knowledge of baseball needed to fill in the gaps. Even simple texts, like those on reading tests, are riddled with gaps: domain knowledge and vocabulary that the writer assumes the reader knows.

Think of reading as a two-lock box, requiring two keys to open. The first key is decoding skills. The second key is vocabulary sufficient to understand what is being decoded. Reading comprehension tests are basically vocabulary tests. The verbal portion of the SAT is essentially a vocabulary test. The verbal section of the Armed Forces Qualification Test, which predicts income level, job performance, and much else, is chiefly a vocabulary test. So, to lift us out of our low performance compared with other nations, narrow the achievement gap between groups, and offer low-income students a way out of poverty, all we need to do is greatly increase students’ vocabularies. That’s it.

Sounds great, but it is misleadingly facile, since vocabulary size is increased only trivially by explicit word study, and most word learning is slow and imperceptible. But, as Marilyn Jager Adams has shown (see page 3), it is much faster when teachers stay on a topic long enough to inculcate new knowledge, thereby creating a familiar context for learning new words. As a result, the only road to a large vocabulary is the gradual, cumulative acquisition of knowledge. Our minds are so formed that we can rarely know things without knowing the words for them, nor can we know words without knowing the attributes of the things referred to. So there’s just one reliable way to increase the vocabulary size of all students in a class: offer them a coherent, cumulative education starting in the earliest years (i.e., no later than kindergarten).

Today, we test our children’s reading ability without regard to whether we have given them the vocabulary and knowledge they need to be successful. Consider a reasonable, simple, even elegant alternative: tying the content of reading tests to specific curricular content. Here’s how it would work. Let’s say a state (or the nation) adopted a specific, content-rich, grade-by-grade core curriculum. And let’s say the fourth-grade science curriculum included the circulatory system, atoms and molecules, electricity, and the earth’s geologic layers and weather. The reading test should include not just the fiction and poetry that were part of the English language arts curriculum, but also nonfiction readings on the specific science topics addressed in the science curriculum. And other passages on the reading test would be taken from topics specified in the core curriculum in other subjects.

The benefits of such curriculum-based reading tests would be many: Tests would be fairer and offer a better reflection of how well a student had learned the particular year’s curriculum. Tests would also exhibit “consequential validity,” meaning they would actually improve education. Instead of wasting hours on mind-numbing test prep and reading-strategy lessons of limited value, the best test-preparation strategy would be learning the material in the curriculum.

By contrast, let’s imagine what it is like to be a fourth-grade boy in a struggling South Bronx elementary school, sitting for a high-stakes reading test. Because his school has large numbers of students below grade level, it has drastically cut back on science, social studies, art, music, even gym and recess, to focus on reading and math. He has spent much of the year practicing reading-comprehension strategies.

The test begins, and the very first passage concerns the customs of the Dutch colony of New Amsterdam. He does not know what a custom is; nor does he know who the Dutch were, or even what a colony is. He has never heard of Amsterdam, old or new. Certainly it has never come up in class. Without relevant vocabulary and knowledge, he struggles. Extra drilling in comprehension strategies would not help; he needs someone to teach him about New Amsterdam.

His low score comes in and the finger-pointing that plagues American education begins. But do not blame the tests. Taxpayers are entitled to know if the schools they support are any good, and reading tests, all things considered, are quite reliable. Do not blame the test writers. Since no state has adopted a common core curriculum, they have no idea what topics are being taught in school; their job is done when tests show certain technical characteristics. It is unfair to blame teachers, because they are mainly operating to the best of their abilities using the ineffective methods in which they were trained. And let’s not blame the parents of our struggling young man in the South Bronx. Is it unreasonable for them to assume that a child who dutifully goes to school every day will gain access to the same rich, enabling vocabulary and knowledge that more affluent children take for granted? This boy’s parents did not decide to minimize social studies and science instruction, thereby minimizing the chances that he would have the vocabulary and knowledge needed to comprehend the passages on the reading test.

Teaching skills, vocabulary, and knowledge is what schools are supposed to do. The only unreasonable thing is our refusal to see reading for what it really is, and to teach and test accordingly.

E. D. Hirsch, Jr., is the founder of the Core Knowledge Foundation and professor emeritus at the University of Virginia. His most recent book is The Making of Americans: Democracy and Our Schools. Robert Pondiscio is a former fifth-grade teacher; he writes about education at The Core Knowledge Blog. Adapted with permission from E. D. Hirsch, Jr., and Robert Pondiscio, “There’s No Such Thing as a Reading Test,” American Prospect 21, no. 6 (June 2010), www.prospect.org. All rights reserved.
