The Role of Program Quality in Determining Head Start's Impact on Child Development: Third Grade Follow Up to the Head Start Impact Study

Transcript

THE ROLE OF PROGRAM QUALITY IN DETERMINING HEAD START'S IMPACT ON CHILD DEVELOPMENT
OPRE Report 2014-10
March 2014
Third Grade Follow-Up to the Head Start Impact Study

THE ROLE OF PROGRAM QUALITY IN DETERMINING HEAD START'S IMPACT ON CHILD DEVELOPMENT
OPRE Report 2014-10
March 2014

Contract Number: HHSP23320062929YC

Project Director: Camilla Heid, Westat, 1600 Research Boulevard, Rockville, MD 20850

Submitted by: Laura R. Peck and Stephen H. Bell, Social & Economic Policy, Abt Associates, 4550 Montgomery Avenue, Suite 800 North, Bethesda, MD 20814

Submitted to: Jennifer Brooks, Project Officer, Office of Planning, Research and Evaluation, Administration for Children and Families, U.S. Department of Health and Human Services

This report is in the public domain. Permission to reproduce is not necessary.

Suggested citation: Peck, Laura R., and Stephen H. Bell. (2014). The Role of Program Quality in Determining Head Start's Impact on Child Development. OPRE Report #2014-10. Washington, DC: Office of Planning, Research and Evaluation, Administration for Children and Families, U.S. Department of Health and Human Services.

Disclaimer: The views expressed in this publication do not necessarily reflect the views or policies of the Office of Planning, Research and Evaluation, the Administration for Children and Families, or the U.S. Department of Health and Human Services.

This report and other reports sponsored by the Office of Planning, Research and Evaluation are available at http://www.acf.hhs.gov/programs/opre/index.html.

Overview

The Head Start Impact Study (HSIS) has shown that having access to Head Start improves children's preschool experiences and school readiness in certain areas, though few of those advantages persist through third grade (Puma et al., 2012). Scholars and practitioners alike have wondered whether impacts might be larger or more persistent for children who participate in high quality Head Start as opposed to lower quality Head Start. In response, this report examines a vital policy question: To what extent does variation in the quality of children's Head Start experiences affect children's development? The HSIS experimental evaluation, which involved a nationally representative sample and included rich data at baseline, about programs, and across several years of follow-up, provides an ideal source for answering this question.

Further informed by experts in the field, this report uses measures of quality based on the ECERS, the Arnett, and teacher reports to capture three distinct dimensions of the Head Start setting: (1) "resources," which are the physical characteristics available in the program; (2) the "interactions" between teacher and child; and (3) children's "exposure" to academic activities in the classroom. Slightly less than three-fourths of the Head Start children in the study were in high quality classrooms on the resources and interactions quality measures, while on the exposure to academic activities measure, about one-fourth of the Head Start children were in high quality classrooms. Prior research posits that richer resources and more favorable interactions should be associated with better cognitive and social outcomes. The relationship of exposure to academic activities among children of this age is less clear, with some reason to think that too much such exposure may not necessarily benefit children.

We find little evidence that quality matters to impacts of Head Start using the available quality measures from the study, across two age cohorts, three quality dimensions, five outcomes, and several years. The one exception is that for 3-year-old program entrants, low exposure quality, defined as less exposure to academic activities during Head Start participation, produces better behavioral impacts in the short run than more exposure to academic activities. Even so, there is no indication that either high quality Head Start or low quality Head Start in any dimension leads to program impacts lasting into third grade.

The analysis of quality makes use of the HSIS experimental evaluation design to capitalize on the fact that children were randomized into treatment and control groups, allowing any predicted quality subgroup of the treatment group to be matched to its counterpart in the control group. The analytic approach we take eliminates plausible rival explanations for observed impacts, an approach we advocate for future research that is otherwise challenged by potential selection bias on post-random assignment mediating factors, such as quality.

CONTENTS

Section 1: Introduction
Section 2: Background
Section 3: Data & Measures
Section 4: Analytic Methods
Section 5: Findings
Section 6: Conclusion
Works Cited
Appendix: Added Technical Details & Discussion of Alternative Assumptions

Section 1: Introduction

We know from the Head Start Impact Study (HSIS) that having access to Head Start moderately improves children's preschool experiences and school readiness in certain areas, with some of those advantages persisting through first grade but few lasting into third grade (Puma et al., 2012). Scholars and practitioners alike have wondered whether impacts might be larger for those who participate in high quality Head Start. To explore this, the current report considers the extent to which the quality of a child's Head Start experience affects children's development.

To evaluate the impacts of Head Start quality on children's development, we ask: To what extent does variation in the quality of children's Head Start result in variation in impacts on children's development? In other words, does evidence of impact differ for children participating in high quality Head Start programs from that for Head Start participants as a whole? To what extent might the main study findings understate what the program can accomplish in its strongest form?

Despite the importance of these questions to policy and practice, there are analytic challenges involved in addressing them. Among these challenges are: (1) defining the construct and potentially numerous dimensions of "quality" conceptually, (2) measuring "quality" as defined, with validity and reliability, from the study's data, and (3) determining impacts by variation in Head Start quality given that Head Start quality is undefined in the control group, who were not afforded access to Head Start. This report aims to tackle these main challenges and analyzes how the quality of children's Head Start experience influences the degree to which the program impacts their cognitive and social-emotional development.

In response to the first two challenges, we choose to operationalize quality in three distinct ways. For each of these quality constructs, we create index measures that collapse several variables, thereby potentially increasing the measures' reliability and validity. With regard to the third challenge, we know that (a) treatment group children who participate in high quality Head Start are likely to differ from treatment group children who participate in lower quality Head Start in ways that relate to their subsequent outcomes (e.g., social and intellectual development) independently of the Head Start program, and (b) the concept of Head Start quality is undefined in the control group. On the first of these two points, children who experience high quality Head Start certainly differ from those who do not in terms of where they live and to which Head Start centers their families apply for admission to the program. It is also possible that more motivated and organized parents navigate the Head Start options in their communities more effectively (e.g., manage to get their children placed in classrooms with higher quality teachers) while at the same time doing more to expand Head Start's impact in the way they interact with their children at home. Regarding the non-identifiability of control group children likely to participate in high and low quality programs (challenge (b) above), because of randomization we know that any subgroup that exists in one of the two randomly divided experimental samples must have a counterpart in the other sample.
Capitalizing on this property of the experimental design, we sort the HSIS control group on the same preexisting traits that characterize children in the Head Start treatment group who experienced low or high quality, and this serves as a benchmark for measuring the impact of Head Start at various quality levels through analytic procedures described in depth later in the report.

To capitalize on having an experimental design, we use baseline characteristics to identify subgroups of treatment group children who do not participate in Head Start (i.e., no-shows) and who participate at varying levels of Head Start quality. We then apply a totally symmetric procedure to identify their counterparts in the control group (see Peck, 2003, 2013). This allows us to calculate impacts by Head Start quality subgroupings, analogous to calculating impacts on subgroups of children defined (symmetrically in the treatment and control groups) by discrete, individual background characteristics such as sex or parental employment at enrollment.

In brief, we find little evidence that quality matters to impacts of Head Start on selected child outcomes, using the available quality measures from the study. The one exception is that for 3-year-old program entrants, low exposure quality, defined as less exposure to academic activities during Head Start participation, produces better behavioral impacts in the short run than more exposure to academic activities.

The remainder of this report proceeds as follows: Section 2 provides background on why we would expect variation in the quality of a child's preschool experience to be associated with variation in children's developmental outcomes and the trajectory of those outcomes over time. It also presents background on the HSIS's design, timing, and site coverage. Section 3 details the data that come from the HSIS and the particular measures we use in this research, including our measures of quality and selected outcomes of interest. Section 4 describes the methodological approach used to analyze the extent to which levels of Head Start quality result in variation in impacts on children's development. Section 5 presents the findings, and Section 6 concludes. An Appendix includes additional material on the analytic method, including details of alternative assumptions and results not otherwise presented in the main text.

Section 2: Background

This section discusses why and how the quality of Head Start should matter to children's cognitive and behavioral development. We then discuss the background of the Head Start Impact Study, the source for this report's data.

Why & How Should Quality Matter?

As surveyed in Mashburn et al. (2008), the "quality" of early childhood education refers to a wide range of features that children experience in preschool classrooms and school settings that are presumed to affect their development. Definitions of high quality education may include the nature of children's experiences in classrooms (e.g., the furnishings and learning materials accessible to children, the frequency of instructional activities, the interactions between teachers and children), characteristics of teachers (e.g., level of education and field of study), and the nature of the interaction with those teachers (Mashburn et al., 2008).

In the context of the HSIS, higher quality Head Start experiences are hypothesized to affect children's cognitive, academic, and social skill development differently from low quality experiences. For example, it is presumed that children will achieve better developmental outcomes (i.e., larger impacts) if they attend Head Start programs characterized by features such as: well-maintained furnishings, ample learning materials, instructionally rich and emotionally supportive interactions between teachers and children, teachers with bachelor's and advanced degrees in early childhood education or child development, and exposure to proficient peers. If these quality factors matter during a child's Head Start experience, then they may provide longer-term advantages to developmental progress into kindergarten and beyond.

Research is mixed regarding which of these quality features influence which developmental outcomes, and for whom. Prior research establishes strong theoretical and empirical support that physical resources and social interactions are most consistently associated with children's development. In contrast, teacher proficiency and peer competence appear only indirectly related (Mashburn et al., 2008). Based on this past research, we expect that the quality of the learning environments that children in Head Start experience may affect the extent of program impacts, both at the end of their initial exposure to Head Start and as they continue into school.

The Head Start Impact Study

The congressionally mandated Head Start Impact Study used a nationally representative sample of Head Start programs and newly entering 3- and 4-year-old children, and randomized children either to a Head Start group that had access to Head Start services in the initial year or to a control group that could receive any other non-Head Start services available in the community, chosen by their parents (e.g., Puma et al., 2005). About 60 percent of control group parents enrolled their children in some other type of preschool program in the first year. In addition, all children in the 3-year-old cohort could receive Head Start services in the second year. Under this randomized design, a simple comparison of outcomes for the two groups (treatment and control) yields an unbiased estimate of the impact of access to Head Start in the initial year on children's psychological development and school readiness (Puma et al., 2005).

This research design ensures that the two groups did not differ in any systematic or unmeasured way except through their access to Head Start services. It is important to note that, because the control group in the 3-year-old cohort was given access to Head Start in the second year, the findings for this age group reflect the added benefit of providing access to Head Start at age three, not the total benefit of having access to Head Start for two years. The study was designed to examine separately two cohorts of children, newly entering 3-year-olds and newly entering 4-year-olds. This design reflects the hypothesis that different program impacts may be associated with different ages of entry into Head Start.

In addition to random assignment, the HSIS is set apart from most program evaluations because it includes a nationally representative sample of programs and program participants, making its research findings generalizable to the national Head Start program as a whole as it existed in 2002-2003, not just to the studied sample of local programs and children. However, the study does not represent Head Start programs serving special populations, such as tribal Head Start programs, programs serving migrant and seasonal farm workers and their families, or Early Head Start. Further, the study does not represent the 15 percent of Head Start programs in which the oversubscription for the available Head Start "slots" was too small to allow for an adequate-sized control group. The study sample, spread over 23 different states, consisted of a total of 84 randomly selected local Head Start grantees/delegate agencies, 383 randomly selected Head Start centers, and a total of 4,667 newly entering children, including 2,559 3-year-olds and 2,108 4-year-olds.

At each of the included Head Start centers, program staff provided information about the study to parents at the time enrollment applications were distributed. Parents were told that enrollment procedures would be different for the 2002-03 Head Start year and that some decisions regarding enrollment would be made using a lottery-like process. Local agency staff implemented their typical process of reviewing enrollment applications and screening children for admission to Head Start based on criteria approved by their respective Policy Councils. No changes were made to these locally established ranking criteria for prioritizing which families to serve among a greater number of applicants than available, funded program slots. The study collected information on all children determined to be eligible for enrollment in Fall 2002, and an average sample of 27 children per included center was selected from this pool: 16 who were assigned to the Head Start group and 11 who were assigned to the control group (in centers where fewer children than expected were actually available, a smaller sample of children was selected). The randomized children formed two study samples: newly entering 3-year-olds (to be studied through two years of Head Start participation and beyond) and newly entering 4-year-olds (to be studied through one year of Head Start participation and beyond).

Section 3: Data & Measures

This section identifies the source of our data and details the variables we use to measure Head Start quality and selected key child outcomes.

Data Source

As noted above, the HSIS collected data from 383 randomly selected Head Start centers within 84 randomly selected Head Start grantee agencies, across 23 states. The respondents included parents, children, teachers, and other care providers. The resulting data set contains records for 4,667 newly entering children, including 2,559 3-year-olds and 2,108 4-year-olds. The data include follow-up records from the Head Start years (one year for 4-year-olds and two years for 3-year-olds) as well as the kindergarten, first grade, and third grade years.

The data set includes a rich set of baseline variables on the study's enrolled children, their families, and the Head Start centers in which they enrolled, as well as details on alternative care arrangements they might have had. The follow-up variables are similarly rich, including many measures of children's development in several domains, and of parenting and family experiences. We discuss the specific variables that are relevant to our analysis next.

Quality Measurement

We posit that three main dimensions of a child's Head Start experience exist, one "structural" and two "process-related." The structural measure of quality considers the "resources," which are the physical characteristics of the setting. The process-related measures of quality consider the interactions between teacher and child and exposure to academic activities in the classroom. As informed by the study's expert panel on quality, we contend that these measures capture different dimensions of quality such that we are justified in using each of them, independently, to analyze something about that specific quality experience. We discuss the specific operationalization of each of these quality measures next.

Resources

We use a measure of resources that represents a facility's physical structure and its contents, using 17 of the items in the Early Childhood Environment Rating Scale (ECERS), those that form the subscale on materials. The measure includes 17 specific variables, each of which we coded to range from 1 to 7. Specific elements include characterizations of the indoor space and furnishings, space for both gross and fine motor play, private space, child-related display, and the availability of items relating to art, dramatization, nature/science, and math/numbers. As an average of the 17 items, the resulting measure is also on a 1-to-7 scale, where we flagged those with an average score of 5 or greater as having "high" quality (recoded as 2) by this measure and those with a lower average score as having "low" quality (recoded as 1). Those treatment group members that were no-shows have a resulting score of zero.

Interactions

Our next quality measure aims to capture the quality of teacher-child interactions. It is an index computed from 31 variables, eight drawn from the ECERS and 23 from the Arnett Caregiver Interaction Scale. The eight ECERS elements include, for example, encouraging children to communicate, developing reasoning skills, and staff-child interactions. Each of these could range from 1 to 7 in value.

The Arnett elements included the following characteristics of staff interactions: kneeling/bending to the child's level, assisting children in making choices, exercising control over children, encouraging new experiences, being attentive when children speak, encouraging prosocial behavior, explaining reasons for child misbehavior, placing value on obedience, and speaking warmly to the children. Although the original values of these fell on a 1-to-4 scale, we recoded them so that they would have a 1-to-7 range and be comparable for averaging with the ECERS items. We defined "high" quality as an average score of 6 or higher and "low" quality as an average score below 6. As with the other two quality measures, zero is the classification for no-shows (and does not represent anything about quality experience).

Exposure

The process-related measure of exposure, as we have defined it, considers the frequency of academically focused activities that children experience in the classroom. The measure contains 19 teacher-reported variables including, for example: showing how to read a book, having child(ren) tell a story, discussing new words, learning names of letters, practicing letters' sounds, writing letters and one's own name, discussing the calendar/days of the week, counting, playing math games, and working with rulers and measuring cups. As with the resources quality measure, each of the items within this scale can range from 1 to 7, and our aggregate measure is an average of those items. Those with average scores of 6 or greater are identified as having "high" quality (recoded as 2) by this measure, and those with a lower average score are identified as having "low" quality (recoded as 1). Those treatment group members that were no-shows have a resulting score of zero. Some disagreement exists within the field regarding whether greater exposure to academic activities is age-appropriate and beneficial; nevertheless, we refer to those with higher scores as experiencing higher quality on the exposure measure.
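To make the construction of these three indices concrete, the sketch below illustrates the general recipe described above: average a set of 1-to-7 classroom items, compare the average to the measure-specific cutoff (5 for resources, 6 for interactions and exposure), and code no-shows as zero. This is only an illustrative sketch under those stated rules; the function and variable names (quality_code, items, attended) and the pandas-based implementation are our own assumptions, not the study's actual code.

```python
import pandas as pd

def quality_code(items: pd.DataFrame, attended: pd.Series, cutoff: float) -> pd.Series:
    """Collapse a set of 1-to-7 classroom items into a three-level quality code.

    items    -- one row per treatment group child, one column per scale item
    attended -- boolean Series: did the child ever attend Head Start?
    cutoff   -- measure-specific threshold (5 for resources, 6 for interactions and exposure)

    Returns 0 for no-shows, 1 for "low" quality, and 2 for "high" quality.
    """
    average = items.mean(axis=1)                 # average the 1-to-7 items
    code = (average >= cutoff).astype(int) + 1   # 1 = "low" quality, 2 = "high" quality
    code[~attended] = 0                          # no-shows carry no information about quality
    return code

# Hypothetical usage, assuming df holds the 17 ECERS materials items as columns e1..e17:
# resources = quality_code(df[[f"e{i}" for i in range(1, 18)]], df["attended"], cutoff=5)
```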

Exhibit 3-1 summarizes these three quality measures separately for the 3- and 4-year-old cohorts in the treatment group. It shows that about 17 percent of the 3-year-old cohort and 23 percent of the 4-year-old cohort never participated in Head Start. This means that, despite having been randomized at the time of their Spring 2002 application to attend Head Start, by Spring 2003 those children had not attended for even one day. Among those who did attend Head Start, 64 percent of the 3-year-old cohort and 73 percent of the 4-year-old cohort experienced high resource quality. Similarly, 72 percent and 79 percent of the two cohorts, respectively, experienced high quality interactions. A smaller proportion of each cohort (27 percent and 25 percent, respectively) experienced high quality exposure. As noted earlier, the field is conflicted on whether more exposure to academic activities is expected to be good for children's development.

Exhibit 3-1. Descriptive Statistics of Three Head Start Quality Measures among Treatment Group Members who Participated in Head Start, by Age Cohort

                                         3-Year-Old Cohort        4-Year-Old Cohort
                                         Number     Percent       Number     Percent
Head Start Treatment Group
  Never participated in HS                  243      16.6%           276      23.2%
  Participated in HS                      1,223      83.4%           915      76.8%
Among those who participated in HS...
Resources (range = 1-7)
  High quality (5+)                         684      64.2%           567      72.6%
  Lower quality (<5)                        382      35.8%           214      27.4%
Interactions (range = 1-7)
  High quality (6+)                         764      71.7%           617      79.0%
  Lower quality (<6)                        302      28.3%           164      21.0%
Exposure (range = 1-7)
  High quality (6+)                         278      27.4%           188      24.7%
  Lower quality (<6)                        735      72.6%           574      75.3%

Notes: Details of the elements comprising each measure appear in the narrative.

We chose the cutoffs we did (5 for resources and 6 for interactions and exposure) for the following reasons. For the resources measure, all items come from the ECERS, and it is common practice to use 5 as the threshold above which "high" quality is designated. The other two measures do not have a common convention. The interactions measure draws from a combination of Arnett and ECERS items, scaling them comparably and summing them. The choice of 6 out of 7 as the threshold for what designates "high" quality seems appropriate because of the distribution of resulting values on this measure: as Exhibit 3-1 shows, about 72 percent of the 3-year-old cohort in the treatment group and 79 percent of the 4-year-old cohort in the treatment group had high interactions quality. Since these percentages are already quite high, if we had lowered the threshold to 5 points on the 7-point scale we would have had less high-low variation to examine. As for exposure quality, this measure draws from teacher reports; like the interactions measure, it does not have a field-accepted threshold for what one might consider to be "high" quality. Furthermore, as noted above, whether more "exposure" to academic activities is helpful remains in debate. As a result, we chose the cut-point of 6 as our threshold in order to create a relatively high bar for designating "high" quality on this measure.

Outcome Measures

While the Head Start Impact Study explores many outcomes, for this analysis we examine five specific outcomes across two broad domains: cognitive and social-emotional. In the domain of cognitive outcomes, we include the PPVT and the Woodcock-Johnson Letter-Word Identification and Applied Problems variables. As key outcomes representing children's social-emotional development, we include a measure of Social Skills and Positive Approaches to Learning (which we refer to as "social competence") and Total Child Behavior Problems. We choose these specific measures in part because of our interest in children's development across the domains listed but also because they are consistently measured across all points of HSIS follow-up. Therefore we are able to identify the extent to which Head Start quality has differential impacts not just by the end of the Head Start year but also over time. The current Head Start quality analysis considers the first point of follow-up, which is when we know Head Start's overall influence to be strongest and we therefore have the greatest chance of detecting different, stronger impacts from high quality Head Start.

We also analyze the role of Head Start quality on children's outcomes in the following years of follow-up, which extend through the end of their third grade year. Each of the selected outcome variables is detailed below.

Cognitive Domain

Within the cognitive domain, "vocabulary knowledge" is a skill that represents children's oral language development, "pre-reading skills" focus on letter recognition, an important step toward reading proficiency, and "early math skills" include basic numeracy and math skills that are the foundation for more advanced quantitative development. These central cognitive outcomes are measured by the Peabody Picture Vocabulary Test (PPVT-III, adapted) and the Woodcock-Johnson III (WJ3) Letter-Word Identification and Applied Problems subtests, respectively. All three of these come from direct assessments of children and are available in each year of follow-up.

Social-Emotional Domain

In the social-emotional domain, we consider two variables: an overall Social Skills and Positive Approaches to Learning measure (which we shorten to "social competence"), as collected from interviews with parents at each of the follow-up points; and "total" child behaviors that are (1) aggressive or defiant, (2) inattentive or hyperactive, and (3) shy, withdrawn, or depressed. Each of these is described next.

Social Skills and Positive Approaches to Learning. Although many measures might represent the social-emotional domain of children's outcomes, we selected this measure, a composite of several elements, as follows. Social skills focus on cooperative and empathic behavior, such as "makes friends easily," "comforts or helps others," and "accepts friends' ideas in sharing and playing." Approaches to learning deal with curiosity, imagination, openness to new tasks and challenges, and having a positive attitude about gaining new knowledge and skills. Examples include "enjoys learning," "likes to try new things," and "shows imagination in work and play." The seven items that comprise this scale came from parents' judgments of whether each behavioral description was "not true," "sometimes true," or "very true" of the child. The scale's resulting scores can range from zero (meaning all the items were rated "not true" of the child) to 14 (meaning all the items were rated "very true" of the child).

Total Child Behavior Problems. Elements in the three subscales of this measure combine to form the Total Child Behavior Problems scale that we use. Parents were asked to rate their children on items dealing with specific behaviors, and they did so on a three-point scale of "not true," "sometimes true," or "very true." Example items include the extent to which the child "hits and fights with others," "can't concentrate, can't pay attention," and "is unhappy, sad, or depressed." The 14 items in the scale result in possible scores ranging from zero (all items marked "not true") to 28 (all items marked "very true"). (A brief illustration of this scoring appears at the end of this section.)

While the HSIS overall considers health and parenting domains as well, we focus this analysis of the role of Head Start quality specifically on this subset of outcomes in the cognitive and socio-emotional domains because prior theory and evidence indicate these are most proximally related to the quality of care and education.
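As a small illustration of how these parent-reported scales produce the stated ranges, the sketch below codes each response 0 ("not true"), 1 ("sometimes true"), or 2 ("very true") and sums across items, so seven items span 0 to 14 and fourteen items span 0 to 28. The coding function and names are our own illustration, inferred from the ranges reported above rather than taken from the study's instrument-scoring code.

```python
# Illustrative scoring sketch (ours), consistent with the 0-14 and 0-28 ranges described above.
RESPONSE_CODES = {"not true": 0, "sometimes true": 1, "very true": 2}

def scale_score(responses: list[str]) -> int:
    """Sum the coded item responses; e.g., 7 social competence items yield 0 to 14."""
    return sum(RESPONSE_CODES[r] for r in responses)

# Example: a child rated "very true" on all 7 social competence items scores 14,
# and "not true" on all 14 behavior problem items scores 0.
assert scale_score(["very true"] * 7) == 14
assert scale_score(["not true"] * 14) == 0
```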

Section 4: Analytic Methods

Comparing those children in high quality Head Start with those in low quality Head Start or with no Head Start exposure (all within the study's treatment group only) would result in impact estimates biased by selection. These samples would differ at baseline on unmeasured characteristics that lead to different outcomes independently of the effects of different quality Head Start experiences. So too would comparison of treatment group children exposed to a specific level of Head Start quality to the full control group, which is comprised of children who, had they been randomly assigned to the treatment group, would have, in distinct subgroups, not participated in Head Start, participated in low quality Head Start, or participated in high quality Head Start. Differences between these subgroups and the entire control group would reflect compositional distinctions, not simply the impact of Head Start.[1] To avoid these problems and capitalize on the experimental design of the HSIS, we use an approach established in Peck (2003) to create equivalent subgroups of treatment and control group members for separate analysis, an approach that therefore results in internally valid (i.e., unbiased) estimates of Head Start's impact on each such subgroup.[2] Because some misclassification of children is inevitable (e.g., predicting that a child who actually receives low quality Head Start is likely to receive high quality Head Start), we convert results for predicted quality subgroups to results for actual quality subgroups under certain assumptions. This translates (subject to the validity of the assumptions) the internally valid impact estimates for predicted quality subgroups into externally valid, and more policy relevant, impact estimates for actual quality subgroups.

Description of Analytic Procedure

The technique we use identifies in identical fashion sample members from the treatment and control groups who are predicted to participate in high quality Head Start programs, then estimates impacts on that subpopulation as one would in any experimental subgroup analysis. The symmetry of the identification procedure ensures that equivalent subgroups are compared and guarantees that the impact estimates are free from differential selection bias or other sources of internal bias. Thus, the symmetric selection of treatment and control subgroup members within the experimental data ensures unbiasedness of the impact estimates generated for the subgroups examined.

However, the subgroup for which the methodology produces unbiased impact estimates (children with the highest predicted probabilities of being in high quality Head Start programs, for example) is not necessarily the subgroup of policy interest: children who actually experience high quality Head Start. The predictive model, while symmetric for both treatment and control groups, is imperfect for both groups, potentially reducing the relevance (i.e., the external validity or generalizability) of the findings. This is why we develop and apply procedures to convert results so that they represent impacts on actual rather than predicted subgroup members, subject to certain assumptions.

[1] To check this proposition we conducted the described analysis, comparing low quality Head Start participants in the treatment group to the entire control group and then high quality Head Start participants in the treatment group to the entire control group.
The results bore little resemblance to the main findings of our analysis described here, which are not subject to selection biases of this type.
[2] Further discussion of this appears as a Method Note in Three Parts in Peck (2013), Bell and Peck (2013), and Harvill, Peck and Bell (2013).

The following steps, explained and justified next, are involved in carrying out this research approach:

1. Select random subsamples of the treatment group from which to predict the level of Head Start quality.
2. Using baseline characteristics, predict quality.
3. Use the resulting predicted quality variable to identify subgroups symmetrically in the treatment and control groups.
4. Analyze the impact of predicted quality by comparing mean outcomes between the symmetric treatment and control group subgroups created.
5. Convert results for predicted subgroups to represent impacts on actual subgroups under certain assumptions.

We also explore an alternative set of assumptions at Step 5 to examine the robustness of the findings to different conversion assumptions.

Step 1. Select random subsamples of the treatment group to predict Head Start quality.

A key feature of this approach to subgroup analysis is retaining the strength of the experimental design. In order to do this, an important first step is to select a strategy for ensuring symmetric identification of subgroups. While prior work has used a single external "modeling" subsample to do so, the approach we take here is to choose several modeling subsamples for use in out-of-sample prediction. Through this process, subgroups with equivalent predicted probabilities of participating in Head Start at a particular level of quality are identified in both treatment and control groups.

Using the entire treatment group at once for both subgroup prediction and impact analysis could introduce bias because of the better fit that is inevitable for the sample that is used for modeling. This has been referred to elsewhere as "overfitting bias" and can be avoided. To clarify, if the whole treatment group were used for prediction, then the model might more accurately identify the desired subgroup for treatment group cases than for predicted control group cases. This is because the prediction model would mold its parameters to the errors that exist in the outcome data due to random baseline variation between the groups. This would result in some unknown amount and direction of bias that is easily avoidable by keeping the predictive and impact estimation subsamples of the treatment group separate.[3] In this application, we select ten random 90-percent subsets of the treatment group from the combined 3-year-old and 4-year-old cohorts for predictive modeling,[4] as elaborated below.

Step 2. Using baseline characteristics, predict quality.

In this application, we create three distinct quality indicators for all members of the 3-year-old and 4-year-old treatment group cohorts, each with three levels: a value of 0 represents those who never participated in Head Start; a value of 1 represents "low quality" Head Start, among those who participated in the program; and a value of 2 represents "high quality" Head Start, also among those who participated in the program.

[3] Some have argued that the loss of sample size associated with choosing an external modeling sample imposes too great a cost (e.g., Gibson, 2003); but the problem of potential overfitting bias diminishes as sample size increases, making the step of selecting a random subsample for modeling even more important in smaller samples (Harvill, Peck & Bell, 2013).
[4] Initial examination of the predictions by cohort showed that the prediction rate was better for the pooled-cohort prediction, which justifies our choice to pool.

The specific threshold for dividing high quality from low quality is measure-specific, as defined in our measurement subsection above. With this categorical quality measure as our dependent variable, we used a generalized logit procedure to predict no-show and quality status, with explanatory variables including center, family, and child characteristics as follows:

• Center Characteristics: center of random assignment (series of dummy variables, omitting the dummy for one center)
• Family Characteristics: home language, both biological parents at home, primary caregiver's age, mother's education, biological mother's recent immigrant status, mother's marital status, mother gave birth to study child as a teen
• Child Characteristics: sex, age, race, language

We expected that the center of random assignment would be the best predictor of the quality of Head Start; we further allow it to proxy other community characteristics that might be associated with higher quality.[5] Other family- and child-level characteristics might also be associated with the quality of Head Start that a child experiences. Rather than basing our decision about which predictor variables to include on arbitrary or theoretical factors, we follow the lead of propensity score methods (to which our treatment group predictive modeling procedure is closely akin), which advocate a "kitchen sink" approach for generating the greatest explanatory power and best correct prediction rate possible. We are uninterested in interpreting any of the coefficients on our explanatory variables from the prediction model but instead have as our goal the best "hit rate": correctly matching those predicted to be in each of our three subgroups with their actual subgroup experience.

With each of the ten 90-percent subsamples drawn in Step 1, we predict the quality experience of the remaining 10 percent of the sample, both within the treatment and the control group. This involves "out of sample" prediction for the entire sample, eliminating concerns about overfitting and ensuring symmetric prediction of the quality-related subgroups within the treatment and control arms. Once we have replicated this process for the entire sample, we concatenate the subsamples together to maintain full use of the entire sample for analysis.

Step 3. Use the resulting predicted quality variable to identify subgroups.

Within the sample, each individual is designated to a subgroup (no-shows, low quality, and high quality) based on which category (0, 1, or 2) he or she has the highest probability of belonging to, given baseline characteristics.

Step 4. Analyze the impact of quality by comparing the treatment and control groups' mean outcomes, by subgroup.

Although this kind of analysis can involve a conventional split-sample subgroup analysis, we follow the HSIS's existing practice of pooling data and computing subgroups' impact estimates accordingly (see Puma et al., 2010b, for details). (A sketch of the prediction and subgroup-assignment procedure in Steps 1 through 3 appears after the footnotes below.)

[5] To gauge the validity of our assertion that "the center of random assignment would be the best predictor" of Head Start quality, we examined the correct prediction rates based on including only the center dummies and on adding the family and child characteristics to the center dummies. Our conclusion from this side analysis is that indeed the center dummies are the best predictors of quality. In fact, the family and child characteristics alone predict quality very poorly. The main reason to include the family and child characteristics in the model is
not to distinguish further between levels of Head Start quality but instead to better identify those individuals who classify as no-shows.
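To illustrate Steps 1 through 3, the sketch below shows one way the out-of-sample prediction and symmetric subgroup assignment could be implemented. The study describes a generalized logit fit on ten random 90-percent treatment-group subsamples; here scikit-learn's multi-class LogisticRegression stands in for that procedure, the ten modeling subsamples are formed as complementary random folds, the predictors are assumed to be pre-coded numeric dummies, and all data frame and column names (quality_code, predictors) are hypothetical. This is a sketch of the logic, not the study's actual code.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def predict_quality_subgroups(treat: pd.DataFrame, control: pd.DataFrame,
                              predictors: list, n_folds: int = 10, seed: int = 0):
    """Out-of-sample prediction of the three-level quality code (0 / 1 / 2).

    Treatment group children are split into n_folds random folds; each fold is scored by
    a model fit on the other folds (a 90-percent modeling subsample when n_folds = 10).
    Control group children, for whom quality is undefined, are split the same way and
    scored by the corresponding fold's model, so prediction is symmetric and entirely
    out of sample.  Each child is then assigned to the subgroup (no-show, low quality,
    high quality) with the highest predicted probability.
    """
    rng = np.random.default_rng(seed)
    X_t = treat[predictors].to_numpy(dtype=float)
    y_t = treat["quality_code"].to_numpy()
    X_c = control[predictors].to_numpy(dtype=float)

    fold_t = rng.permutation(len(treat)) % n_folds      # random partition of the treatment group
    fold_c = rng.permutation(len(control)) % n_folds    # matching partition of the control group
    probs_t = np.zeros((len(treat), 3))
    probs_c = np.zeros((len(control), 3))

    for k in range(n_folds):
        model = LogisticRegression(max_iter=1000)        # multi-class logit stand-in
        model.fit(X_t[fold_t != k], y_t[fold_t != k])    # fit on the 90% modeling subsample
        probs_t[fold_t == k] = model.predict_proba(X_t[fold_t == k])  # score held-out 10%
        probs_c[fold_c == k] = model.predict_proba(X_c[fold_c == k])  # score matching control fold
    # Columns of predict_proba follow model.classes_, i.e., 0, 1, 2 when all levels appear.

    pred_t = pd.Series(probs_t.argmax(axis=1), index=treat.index, name="predicted_subgroup")
    pred_c = pd.Series(probs_c.argmax(axis=1), index=control.index, name="predicted_subgroup")
    return pred_t, pred_c
```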

Step 5. Convert impacts for predicted quality subgroups to impacts on actual quality subgroups.

This final step converts the impact estimates from Step 4, which represent impacts on predicted subgroups, to represent impacts on actual subgroups, under certain assumptions. Here we discuss our preferred assumptions, and the Appendix elaborates on two alternative sets of assumptions and their implications.

To design the conversion process, we begin with three equations that posit that the impact on each of the three predicted subgroups (non-participants, called "no-shows" from here on, low quality participants, and high quality participants, respectively) is a weighted sum of the impacts on actual subgroups, where the weights are the proportions of each predicted subgroup that fall into each actual group:

$$I_N = s_N N_N + w_N L_N + g_N H_N$$
$$I_L = s_L N_L + w_L L_L + g_L H_L$$
$$I_H = s_H N_H + w_H L_H + g_H H_H$$

where the following notation applies:

I_N is the impact on predicted no-shows
I_L is the impact on predicted low quality participants
I_H is the impact on predicted high quality participants

N_N is the impact on predicted no-shows who are actual no-shows
N_L is the impact on predicted low quality participants who are actual no-shows
N_H is the impact on predicted high quality participants who are actual no-shows

L_N is the impact on predicted no-shows who are actual low quality participants
L_L is the impact on predicted low quality participants who are actual low quality participants
L_H is the impact on predicted high quality participants who are actual low quality participants

H_N is the impact on predicted no-shows who are actual high quality participants
H_L is the impact on predicted low quality participants who are actual high quality participants
H_H is the impact on predicted high quality participants who are actual high quality participants

s_N is the proportion of predicted no-shows who are actually no-shows
s_L is the proportion of predicted low quality participants who are actually in the no-show subgroup
s_H is the proportion of predicted high quality participants who are actually in the no-show subgroup

w_N is the proportion of predicted no-shows who are actually in the low quality subgroup
w_L is the proportion of predicted low quality participants who are actually in the low quality subgroup
w_H is the proportion of predicted high quality participants who are actually in the low quality subgroup

g_N is the proportion of predicted no-shows who are actually in the high quality subgroup
g_L is the proportion of predicted low quality participants who are actually in the high quality subgroup
g_H is the proportion of predicted high quality participants who are actually in the high quality subgroup

This set of three equations contains nine unknowns, and so some (six) assumptions are necessary in order to solve the system. In this application, we make the following six assumptions:

(1) N_N = 0: the impact on predicted no-shows who are actual no-shows is zero
(2) N_L = 0: the impact on predicted low quality participants who are actual no-shows is zero
(3) N_H = 0: the impact on predicted high quality participants who are actual no-shows is zero
(4) L_H = L_L: the impacts on low quality participants are the same for children predicted to be high quality participants and children predicted to be low quality participants
(5) H_H = H_L: the impacts on high quality participants are the same for children predicted to be high quality participants and children predicted to be low quality participants
(6) H_N - L_N = H_L - L_L: the impact on high quality participants differs from the impact on low quality participants by the same amount whether one looks at high and low quality cases predicted to be no-shows or high and low quality cases predicted to be low quality participants[6]

Ultimately, we must rearrange these equations, imposing our assumptions, to express the terms of interest (impacts on the actual subgroups) as a function of the elements that are known: the impacts on predicted subgroups and the proportions of those predicted to be in each group who are actually in each group. The resulting conversions are as follows:

$$L = \left[\frac{1-r}{w_N + g_N}\right] I_N + \left[\frac{r\,g_H\,(w_N + g_N) + (1-r)\,g_N\,(w_H + g_H)}{(w_N + g_N)\,(w_L\,g_H - w_H\,g_L)}\right] I_L - \left[\frac{r\,g_L\,(w_N + g_N) + (1-r)\,g_N\,(w_L + g_L)}{(w_N + g_N)\,(w_L\,g_H - w_H\,g_L)}\right] I_H$$

$$H = \left[\frac{1-p}{w_N + g_N}\right] I_N - \left[\frac{p\,w_H\,(w_N + g_N) + (1-p)\,w_N\,(w_H + g_H)}{(w_N + g_N)\,(w_L\,g_H - w_H\,g_L)}\right] I_L + \left[\frac{p\,w_L\,(w_N + g_N) + (1-p)\,w_N\,(w_L + g_L)}{(w_N + g_N)\,(w_L\,g_H - w_H\,g_L)}\right] I_H$$

where L and H denote the impacts on the actual low quality and actual high quality participant subgroups, respectively;
1-r is the proportion of low quality participants who are predicted as no-shows; and
1-p is the proportion of high quality participants who are predicted as no-shows.

The impact on the full actual no-show subgroup is a linear combination of N_N, N_L, and N_H, all assumed to be zero, making the overall impact on the full no-show sample zero, consistent with the conventional Bloom assumption (Bloom, 1984; Puma et al., 2005).

[6] Or whether one looks at high and low quality cases predicted to be high quality participants, once one combines this final assumption with the previous two assumptions to derive H_H - L_H = H_N - L_N.

The assumptions discussed here are our preferred assumptions and the ones that we use for our analysis. We discuss two alternative sets of assumptions in the Appendix and present results from those analyses. (A sketch implementing this conversion appears below.)
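The following sketch (ours, not part of the original report) expresses the conversion above in code form under the preferred assumptions. It first solves the I_L and I_H equations for the impacts on low and high quality children who were predicted to be participants, recovers the corresponding impacts for participants who were predicted to be no-shows, and then reweights by the shares r and p defined above. The s proportions drop out because the impacts on actual no-shows are assumed to be zero; the function name and dictionary-based inputs are our own illustrative choices.

```python
def convert_to_actual_impacts(I_N, I_L, I_H, w, g, r, p):
    """Convert impacts on predicted subgroups into impacts on actual subgroups,
    under the preferred assumptions (zero impacts on actual no-shows, equal impacts
    on participants whether predicted low or high, and a common high-low gap).

    I_N, I_L, I_H : estimated impacts for the predicted no-show, predicted low
                    quality, and predicted high quality subgroups
    w, g          : dicts keyed by "N", "L", "H" giving, for each predicted subgroup,
                    the share who are actually low quality (w) or high quality (g)
    r             : share of actual low quality participants not predicted as no-shows
    p             : share of actual high quality participants not predicted as no-shows
    """
    D = w["L"] * g["H"] - w["H"] * g["L"]              # determinant of the 2x2 system in I_L, I_H
    L_part = (g["H"] * I_L - g["L"] * I_H) / D          # low quality children predicted as participants
    H_part = (w["L"] * I_H - w["H"] * I_L) / D          # high quality children predicted as participants
    gap = H_part - L_part                               # common high-low gap (assumption 6)
    L_noshow = (I_N - g["N"] * gap) / (w["N"] + g["N"])  # low quality children predicted as no-shows
    H_noshow = L_noshow + gap                            # high quality children predicted as no-shows
    actual_L = (1 - r) * L_noshow + r * L_part          # reweight to the full actual low quality subgroup
    actual_H = (1 - p) * H_noshow + p * H_part          # reweight to the full actual high quality subgroup
    return actual_L, actual_H
```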

Section 5: Findings

This section reports the estimated impacts of low and high quality Head Start services on the children who actually received those services, subject to the assumptions described earlier. By assumption, for children in the experimental treatment group who never participated in Head Start, no impacts occurred: N = 0, as shown in Exhibits 5-1 through 5-5 for the 3-year-old cohort and in Exhibits 5-6 through 5-10 for the 4-year-old cohort. When the quality of Head Start services is divided between low and high for each of the quality dimensions examined, results get more interesting, as reported in the High, Low, and High-Low Difference rows of the exhibits. Like previous Head Start Impact Study reports involving subgroups, we confine discussion to measured impacts that we are confident (1) differ from zero and differ from impacts on a contrasting subgroup in the same division of the population, or (2) differ from zero in a consistent pattern across multiple years. We do not formally adjust for the increased potential for false positives that arises from conducting many hypothesis tests in exploratory research, but instead make an informal adjustment in our interpretation of results.

As Exhibit 5-1 shows, there is no evidence of statistically significant differences in impact on PPVT scores between high and low quality Head Start services for the 3-year-old cohort. A favorable impact of high resource quality exists in both the first and second years of Head Start for the 3-year-old cohort, though this was not significantly different from the impact for low resource quality in these years. The effect sizes that correspond to these absolute impacts in PPVT test score units range from 0.16 to 0.31 (impact divided by the standard deviation of the control group; a worked illustration of this calculation follows Exhibit 5-1).

Exhibit 5-1. Estimated Impacts on PPVT Scores for Actual Subgroups, by Quality Measure, by Follow-up Year, 3-Year-Old Cohort, Preferred Assumptions
Columns: End of HS Year 1 (2003); End of HS Year 2 (2004); End of Kindergarten (2005); End of First Grade (2006); End of Third Grade (2008)
405.7 251.4 Control Group Average 298.3 357.9 339.9 (29.5) (34.3) (36.6) (30.1) (28.4) (standard deviation) Resource a No -shows 0.0 0.0 0 0 0 .0 .0 .0 2.1 5.8 * 1.0 10.7 *** High 4.1 -4.4 -0.8 2.7 4.9 Low -0.9 1.8 -0.7 10.2 5.8 -Low Difference High 5.0 Interaction 0 0 0 0 0 .0 .0 .0 .0 .0 -shows No ** 4.8 4.5 7.6 3.5 0.7 High 11.2 -4.2 -0.5 -0.7 -4.3 Low 1.2 9.0 8.7 -3.5 4.2 High -Low Difference Exposure 0 0 0 0 .0 .0 .0 .0 .0 0 No -shows -5.7 -3.9 5.0 12.5 2.4 High 1.5 2.6 7.1 * 5.2 1.3 Low -0.1 -5.4 5.4 3.6 -10.9 High -Low Difference
Notes: The impact is the regression-adjusted difference between the treatment and control groups in the number of points on each outcome measure. Some high-low differences appear not to sum because of rounding.
(a) No statistical significance noted because no-show impact estimates are derived by assumption to be zero.
*** statistically significant: p<0.01; ** statistically significant: p<0.05; * statistically significant: p<0.10
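For reference, the effect sizes reported throughout this section follow the calculation noted above: the impact estimate divided by the control group standard deviation for the same outcome and follow-up year. Using purely illustrative numbers (not cells drawn from the exhibit):

$$\text{effect size} = \frac{\text{impact estimate}}{\text{standard deviation of the control group}}\,, \qquad \text{for example } \frac{6\ \text{points}}{30\ \text{points}} = 0.20 .$$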

A similar set of findings appears in examining the WJ3 Letter-Word scores for the 3-year-old cohort (Exhibit 5-2), including favorable impacts of high interaction quality and low exposure quality over the two pre-school years, the latter continuing into kindergarten. In two cases, a statistically significant difference in effectiveness between quality levels exists, favoring high interaction quality in the second Head Start year and low exposure quality in kindergarten. The latter finding suggests that greater exposure to academic activities during the Head Start year, as reported by teachers, disadvantages children in the 3-year-old cohort during their prekindergarten and kindergarten years. In contrast, the children with more exposure to academic activities in Head Start have largely unfavorable impacts on WJ3 Letter-Word scores, at least at the end of kindergarten. Effect sizes for the favorable impacts referenced here range from 0.21 to 0.35.

Exhibit 5-2. Estimated Impacts on WJ3 Letter-Word Scores for Actual Subgroups, by Quality Measure, by Follow-up Year, 3-Year-Old Cohort, Preferred Assumptions
Columns: End of HS Year 1 (2003); End of HS Year 2 (2004); End of Kindergarten (2005); End of First Grade (2006); End of Third Grade (2008)
8 307.6 482. 330. 1 383. 4 9 Control Group Average 432. (35.3) (27.6) (27.4) (31.6) (29.8) deviation) (standard Resource a 0.0 .0 0.0 No 0.0 0 .0 0 -shows 5.2 *** High 1 .0 1 .5 3 .9 10.3 -2 Low -0 .9 5.0 .7 -1 .8 -5 .2 High -Low Difference 5.3 6 .1 3 .7 3 .3 9 .1 nteraction I -shows 0.0 0.0 0.0 0 .0 No 0.0 3 9.7 7.5 ** 2.2 ** .8 3 .7 High 5.0 -8 .8 Low -7 .2 -8 .6 -6 .9 High 4.7 16.2 * 9.4 12. 4 -Low Difference 5 10. xposure E No -shows 0.0 0.0 0.0 0 .0 0 .0 9 ** -3 .7 -24. 7.2 -8.7 -10. 5 H igh ** 3.3 ** 5.7 * 8.1 9.1 4 .8 L ow High -Low Difference -1.9 -9 .5 -33. 0 * -12.0 -15.3
Notes: The impact is the regression-adjusted difference between the treatment and control groups in the number of points on each outcome measure. Some high-low differences appear not to sum because of rounding.
(a) No statistical significance noted because no-show impact estimates are derived by assumption to be zero.
*** statistically significant: p<0.01; ** statistically significant: p<0.05; * statistically significant: p<0.10

Exhibit 5-3 shows no noteworthy impacts on WJ3 Applied Problems by quality level for the 3-year-old cohort. One significant difference in impacts arises for social competence, however (Exhibit 5-4). As with WJ3 Letter-Word scores, children exposed to relatively fewer academic activities as reported by teachers (i.e., lower exposure quality) appear to benefit more from Head Start participation than other children in terms of their social competence in kindergarten.

Exhibit 5-3. Estimated Impacts on WJ3 Applied Problems Scores for Actual Subgroups, by Quality Measure, by Follow-up Year, 3-Year-Old Cohort, Preferred Assumptions
Columns: End of HS Year 1 (2003); End of HS Year 2 (2004); End of Kindergarten (2005); End of First Grade (2006); End of Third Grade (2008)
373.6 399.9 486.5 431.3 Control Group Average 453.7 (20.6) (21.8) (22.8) (27.4) (21.2) (standard deviation) Resource a 0.0 0.0 0.0 0.0 -shows No 0.0 9.8 ** 3.9 0.8 2.0 1.8 h Hig -3.2 -5.8 -3.8 1.6 -2.9 Lo w -Low Difference 7.1 6.5 13.6 0.4 4.7 High teraction In 0.0 0.0 0.0 0.0 0.0 No -shows * 5.1 1.1 5.4 2.5 3.3 Hig h Low -8.3 -7.9 0.2 -7.8 4.8 13.4 8.9 0.6 2.3 11.1 Difference h-Low Hig Exposure 0.0 -shows 0.0 0.0 0.0 No 0.0 2.5 6.6 -5.6 -0.5 -1.2 Hig h 6.2 -0.4 -0.2 2.5 0.9 Low 7.0 -5.4 -3.0 -2.0 -3.7 Hig h-Low Difference
Notes: The impact is the regression-adjusted difference between the treatment and control groups in the number of points on each outcome measure. Some high-low differences appear not to sum because of rounding.
(a) No statistical significance noted because no-show impact estimates are derived by assumption to be zero.
*** statistically significant: p<0.01; ** statistically significant: p<0.05; * statistically significant: p<0.10

Exhibit 5-4. Estimated Impacts on Social Competence for Actual Subgroups, by Quality Measure, by Follow-up Year, 3-Year-Old Cohort, Preferred Assumptions
Columns: End of HS Year 1 (2003); End of HS Year 2 (2004); End of Kindergarten (2005); End of First Grade (2006); End of Third Grade (2008)
Control Group Average 12.4 12. 5 0 12. 3 12. 12. 5 (1.7) (1.9) (1.8) (1.8) (1.8) (standard deviation) Resource a 0.0 -shows 0.0 0.0 0 .0 0 .0 No -0.2 0 .4 * 0.2 0 High .3 0 .4 ** Low 0.4 -0 .1 0 .4 -0 .4 0 .0 -0 .3 0 .5 -0.6 0 .7 0 .4 H igh -Low Difference I nteraction -shows 0.0 No 0.0 0.0 0 .0 0 .0 .3 0 0.0 0 .4 0 .3 0 .2 H igh -0 .4 0 .0 -0 .5 0 .2 Low -0.1 -Low Difference 0.1 0 .8 0 .3 High 0 .7 0 .1 Exposure -shows 0.0 No 0.0 0.0 0 .0 0 .0 High 0.0 0 .5 -0 .7 -0 .9 0 .7 0 Low 0 .1 -0.1 .6 ** 0.3 0 .2 .5 0.1 0 .4 -1 .3 * -1.2 0 -Low Difference H igh
Notes: The impact is the regression-adjusted difference between the treatment and control groups in the number of points on each outcome measure. Some high-low differences appear not to sum because of rounding.
(a) No statistical significance noted because no-show impact estimates are derived by assumption to be zero.
*** statistically significant: p<0.01; ** statistically significant: p<0.05; * statistically significant: p<0.10

Impacts on problem behaviors (Exhibit 5-5) with negative signs signal desired reductions in poor behavior. These occur for certain high and low quality subgroups over the first three years of follow-up, consistent with the social competence findings but stronger. High resource quality is associated with favorable impacts over the first three years of follow-up, and these impacts are statistically significantly stronger than the impacts on those experiencing low resource quality in two of those years. The reverse occurs for exposure quality: there are favorable impacts of low quality Head Start programs in the first year of the study and in kindergarten, and those impacts are statistically significantly different from the unfavorable impacts of high exposure quality found in those same years. Effect sizes for statistically significant findings by subgroup range from 0.17 to 0.56 standard deviation units.

Exhibit 5-5. Estimated Impacts on Problem Behaviors for Actual Subgroups, by Quality Measure, by Follow-up Year, 3-Year-Old Cohort, Preferred Assumptions
Columns: End of HS Year 1 (2003); End of HS Year 2 (2004); End of Kindergarten (2005); End of First Grade (2006); End of Third Grade (2008)
Control Group Average: 6.2 5.8 5.6 5.1 5.0
(standard deviation): (3.9) (3.8) (3.9) (3.6) (4.4)
Resource
No-shows a: 0.0 0.0 0.0 0.0 0.0
High: -1.0 ** -1.0 ** -0.5 ** -0.3 -1.0
Low: 0.8 0.4 0.7 1.0 * 0.6
High-Low Difference: -1.1 -1.3 * -1.7 -2.0 *** -1.2
Interaction
No-shows: 0.0 0.0 0.0 0.0 0.0
High: -0.9 * -0.7 -0.6 0.1 0.2
Low: -0.4 1.0 0.6 -0.8 -0.1
High-Low Difference: 0.3 -0.1 -2.0 -1.3 0.8
Exposure
No-shows: 0.0 0.0 0.0 0.0 0.0
High: 1.2 * 0.8 2.2 * 1.2 0.3
Low: -1.2 *** -0.9 -1.1 ** -0.6 0.0
High-Low Difference: 1.7 ** 2.4 3.4 ** 1.8 0.3
Notes: The impact is the regression-adjusted difference (impact) between the treatment and control groups in the number of points on each outcome measure. Some high-low differences appear not to sum because of rounding.
a No statistical significance noted because no-show impact estimates are derived by assumption to be zero.
*** statistically significant: p<0.01; ** statistically significant: p<0.05; * statistically significant: p<0.10

Considering the 3-year-old results across all quality dimensions, outcomes, and follow-up years, statistically significant differences in impact between high and low quality subgroups occur no more frequently than would be expected by chance when no true differences exist.7 The strongest evidence concerns the benefit of less exposure to academic activities: low exposure quality surpasses high exposure quality in generating favorable impacts four times, especially for behavioral development outcomes.

7 Among the 75 hypothesis tests of differences by quality level conducted on the 3-year-old cohort (three quality types, five follow-up years, five outcome measures), about seven or eight are expected to be statistically significant by chance at the 0.10 significance level. In fact, seven are statistically significant.
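The expectation in footnote 7 can be reproduced with simple arithmetic. The short calculation below is illustrative only: it treats the 75 tests as independent, which the correlated HSIS outcomes only approximate, and it is not part of the report's analysis.

    # Expected number of chance "significant" findings among 75 tests at the 0.10 level,
    # and the (approximate, independence-assuming) probability of observing at least 7.
    from math import comb

    n, p = 75, 0.10
    expected = n * p                                   # 7.5 expected false positives
    p_at_least_7 = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(7, n + 1))
    print(expected, round(p_at_least_7, 2))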

Next, Exhibits 5-6 to 5-10 report the results for the 4-year-old cohort. As Exhibit 5-6 shows, there are no statistically significant differences in impacts on PPVT scores for 4-year-olds between high and low quality subgroups. Even so, in the first year all three high quality subgroups experienced favorable PPVT impacts, with that result echoed two years later, at the end of first grade, for resource and interaction quality. Effect sizes for significant findings range from 0.17 to 0.40.

Exhibit 5-6. Estimated Impacts on PPVT Scores for Actual Subgroups, by Quality Measure, by Follow-up Year, 4-Year-Old Cohort, Preferred Assumptions
Columns: End of HS Year (2003); End of Kindergarten (2004); End of First Grade (2005); End of Third Grade (2007)
Control Group Average: 290.3 331.9 363.7 405.1
(standard deviation): (32.2) (28.7) (35.9) (39.1)
Resource
No-shows a: 0.0 0.0 0.0 0.0
High: 3.3 6.2 * 1.4 7.7 ***
Low: 2.2 11.1 0.1 8.5
High-Low Difference: -2.3 -9.6 7.6 1.1
Interaction
No-shows: 0.0 0.0 0.0 0.0
High: 3.2 7.4 ** 4.7 5.4 *
Low: 4.1 3.8 2.0 1.9
High-Low Difference: 1.3 2.7 3.5 1.4
Exposure
No-shows: 0.0 0.0 0.0 0.0
High: 14.5 ** 8.9 8.7 -2.5
Low: 4.2 3.1 4.3 4.5
High-Low Difference: 10.3 5.8 4.4 -7.0
Notes: The impact is the regression-adjusted difference (impact) between the treatment and control groups in the number of points on each outcome measure. Some high-low differences appear not to sum because of rounding.
a No statistical significance noted because no-show impact estimates are derived by assumption to be zero.
*** statistically significant: p<0.01; ** statistically significant: p<0.05; * statistically significant: p<0.10

Impacts on WJ3 Letter-Word (Exhibit 5-7) differ between high and low quality subgroups for resource and exposure quality in the Head Start year. Contrary to the 3-year-olds, for 4-year-olds more favorable impacts on Letter-Word scores occurred when children received low resource quality and high exposure to academic activities. A greater impact from high academic exposure also appears at the end of kindergarten. The findings here for individual subgroups are the most striking in magnitude among all results for both cohorts, ranging from 0.73 to 1.03 in effect size for statistically significant cases.

Almost none of the findings for WJ3 Applied Problems are noteworthy for the 4-year-old cohort (Exhibit 5-8). However, a statistically significant favorable effect for low resource quality occurs in kindergarten, an effect that is statistically significantly different from an unfavorable effect for high resource quality in that year. No meaningful impacts by quality level occur for the 4-year-old cohort on the two socio-emotional outcomes examined—social competence (Exhibit 5-9) and problem behaviors (Exhibit 5-10).

24 Section 5: Findings Letter Exhibit 5 -Word Scores for Actual Subgroups, by Quality -7. Estimated Impacts on WJ3 -Old Cohort, Preferred Assumptions Measure, by Follow -Year -up Year, 4 End of End of End of End of Kindergarten HS Year Third Grade First Grade 2004 2003 2005 2007 325.5 Control Group Average 378.2 480.6 433.3 (36.5) (31.6) (28.5) (28.7) deviation) (standard Resource a -shows 0.0 No 0.0 0.0 0.0 2.0 -4.3 2.1 0.1 High 23.1 13.2 6.0 4.7 *** Low -21.1 ** -17.5 High -5.9 -2.6 -Low Difference raction Inte No -shows 0.0 0.0 0.0 0.0 High 1.8 1.7 1.6 8.8 ** 7.0 -0.8 3.6 7.6 Low 1.8 2.6 -1.9 High -Low Difference -6.0 Exposure 0.0 -shows 0.0 0.0 0.0 No 29.4 *** 23.0 * 12.1 9.9 High Low -6.4 -1.2 0.6 1.6 *** -Low Difference * 13.3 9.3 27.8 High 29.4 The impact is the regression -adjusted difference (impact) between the treatment and control groups in the Notes: number of points on each outcome measure. Some high -low differences appear not to sum because of rounding. a No statistical significance noted because no -show impact estimates are derived by assumption to be zero. ** statistically significant: p<0.05; * statistically significant: p<0.10 *** statistically significant: p<0.01; -8. Estimated Impacts on WJ3 Applied Problems Scores for Actual Subgroups, by Quality Exhibit 5 -Year -up Year, 4 Measure, by Follow -Old Cohort, Preferred Assumptions End of E nd of End of End of Kindergarten HS Year First Grade Third Grade 2004 2003 2005 2007 Control Group Average 397.5 426. 3 7 454. 1 487. (21.9) (19.4) (24.0) (19.8) (standard deviation) Resource a 0.0 -shows 0.0 0 .0 No 0.0 0.1 High .4 * 0.1 -0 .2 -5 -1 13.3 .0 1 * 2.9 13. L ow -13.1 -18. 6 ** -Low Difference 0 .8 High -2.7 Interaction -s hows 0.0 No .0 0 .0 0 .0 0 .9 -1 5.3 0 .5 0 .1 H igh -0.3 -1 .0 3 .4 4 .4 L ow -Low Difference 5.6 1 .5 High -3 .3 -6 .3 Exposure -shows 0.0 No 0.0 0.0 0 .0 High -5.0 7 .8 5 .0 -1 .4 Low -2 .3 7.2 -0 .3 -0 .1 High -Low Difference -12.1 10. 1 5 .3 -1.3 Notes: The impact is the regression -adjusted difference (impact) between the treatment and control groups in the -low differences appear not to sum because of rounding. number of points on each outcome measure. Some high a No statistical significance noted because no -show impact estimates are derived by assumption to be zero. *** statistically significant: p<0.01; ** statistically significant: p<0.05; * statistically significant: p< 0.10 The Role Of Program Quality In Determining Head Start’s Impact On Child Development ▌ pg. 20

25 Section 5: Findings -9. Estimated Impacts on Social Competence for Actual Subgroups, by Quality Measure, Exhibit 5 -Year -Old Cohort, Preferred Assumptions by Follow -up Year, 4 End of End of End of End of HS Year Kindergarten First Grade Third Grade 2004 2003 2005 2007 12.5 12.6 12.1 12.6 Control Group Average (1.6) (1.5) (1.9) (1.8) (standard deviation) Resource a 0.0 0.0 0.0 No 0.0 -shows 0.0 High 0.1 0.1 -0.4 -0.3 0.0 -0.2 0.3 Low 0.1 -0.7 0.4 0.4 -L High ow Difference raction Inte -shows 0.0 0.0 0.0 0.0 No 0.0 High 0.0 0.0 0.3 -0.4 -0.4 0.0 -0.4 Low 0.7 0.5 0.1 0.4 -L ow Difference High Exposure 0.0 0.0 0.0 0.0 No -shows -0.6 0.4 -0.3 -0.9 High 0.0 0.0 0.2 0.1 Low -Low Difference -0.2 0.5 -0.8 -1.0 High The impact is the regression -adjusted difference (impact) between the treatment and control groups in the Notes: -low differences appear not to sum because of rounding. number of points on each outcome measure. Some high a No statistical significance noted because no -show impact estimates are derived by assumption to be zero. statistically significant: p<0.01; ** statistically significant: p<0.05; * statistically significant: p<0.10 *** Exhibit 5 -10. Estimated Impacts on Problem Behaviors for Actual Subgroups, by Quality Measure, -up Year, 4 -Year -Old Cohort, Preferred Assumptions by Follow End of E nd of End of End of Kindergarten HS Year First Grade Third Grade 2004 2003 2005 2007 Control Group Average 5.6 .2 5 .0 6 5 .1 (3.8) (3.3) (3.8) (4.2) (standard deviation) Resource a No -shows 0.0 0.0 0 .0 0.0 -0.6 0 .4 0 .1 0 .1 High -0 .4 0.3 -1 .0 -2 .2 * L ow -Low Difference -0.9 0 .8 1 .1 High 2 .3 nteraction I 0.0 -shows No 0 .0 0 .0 0.0 High -0.3 0 .2 -0 .5 -0 .7 Low -0 .2 0 .4 -0 .7 -0.3 -Low Difference 0.1 0 .5 -0 .9 High 0 .1 Exposure .0 No hows 0 -s 0 .0 0 .0 0.0 .8 -1 0.0 -0 .7 -1 .3 H igh -0.4 0 .4 .3 0 .1 -0 L ow .5 -1 0.4 -1 .1 -1 .5 H -Low Difference igh The impact is the regression -adjusted difference (impact) between the treatment and control groups in the Notes: -low differences appear not to sum because of rounding. number of points on each outcome measure. Some high a No statistical significance noted because no -show impact estimates are derived by assumption to be zero. statistically significant: p<0.10 *** statistically significant: p<0.01; ** statistically significant: p<0.05; * The Role Of Program Quality In Determining Head Start’s Impact On Child Development ▌ pg. 21

Appraising the evidence for the 4-year-old cohort as a whole from Exhibits 5-6 through 5-10 combined, there is less evidence of impact differentials by quality level than for the 3-year-old cohort; indeed, fewer instances than expected by chance when no true differences exist.8 Where the difference in impact is statistically significant, it is as likely to favor low quality in the area of resources as high quality in terms of greater exposure to academic activities. A developmentally based explanation for these findings—including why they would differ for the 4-year-old cohort from the 3-year-old cohort—is unclear.

As elaborated in the Appendix, we estimated these impacts as of the end of the first year of follow-up using two sets of alternative assumptions. One of these alternative sets of assumptions produces impact estimates that are substantially the same (identically so for the 4-year-old cohort) as those reported in the first columns of Exhibits 5-1 through 5-10, which are based on our preferred assumptions. The other alternative set produces impact estimates that, for the 3-year-old cohort, strengthen somewhat the evidence that high resource and interaction quality Head Start can produce better short-run cognitive impacts than low quality Head Start, and that less exposure to academic activities can produce more favorable behavioral impacts than greater such exposure. However, these more favorable alternative results may be an artifact of including an assumption that impacts on actual high quality Head Start participants are twice as large as impacts on actual low quality Head Start participants (for those children predicted to be high quality participants).9 Something of the same pattern is evident in the 4-year-old cohort—for potentially the same reason—but less noticeably.

8 With 60 tests of the hypothesis that the high and low subgroup impacts differ from each other, six are expected to be statistically significant by chance alone at the 0.10 significance level. We observe that four are.

9 Published standards in the scholarly literature indicate that where alternative assumptions are possible using the current technique and "the policy thrust of the findings varies across plausible scenarios...[the version of the findings to report]...should be based on the most plausible set of assumptions in the eyes of the researchers as declared prior to any analysis" (Bell & Peck, 2013).

Section 6: Conclusion

This report examines the influence of Head Start quality on children's selected cognitive and social-emotional developmental outcomes. Despite the importance to policy and practice of understanding the role of quality in influencing children's developmental progress, the HSIS has not previously sought to address it, primarily because of the analytic challenges involved in doing so. Among these challenges—each addressed here—are: (1) defining the construct and potentially numerous dimensions of "quality" conceptually, (2) making "quality" as defined measurable with validity and reliability from the study's data, and (3) determining impacts for varying levels of Head Start quality experienced in the treatment group given that Head Start quality is undefined for the control group. We believe our methodology effectively addresses all of these challenges. That said, a point for future research is relevant: while we used expert panel input to determine the absolute threshold for designating a child's Head Start experience as "high" quality in each of our three quality measures, other thresholds might be justified. Moreover, the measures10 used in the HSIS's early 2000s data collection were the best available at the time, but improvements in measuring quality have developed in the intervening decade, justifying alternative measurements of "quality" beyond what is possible with these data.

Applying these analytic innovations to the experimental HSIS evaluation data, we find little evidence that Head Start's impact varies systematically by the level of quality in the program for the available, limited quality measures. The frequency of statistically significant differences in impacts by quality level is no greater than one would expect to observe by chance alone when no true differences exist. The one exception to this pattern is the discovery that, for 3-year-olds, lower exposure to academic activities is associated with more favorable short-run impacts on social development. There is almost no indication that either high or low quality Head Start in any dimension leads to Head Start impacts that last into third grade for either age cohort, consistent with the overall findings of the Head Start Impact Study not disaggregated by quality level.

10 New measures, such as the CLASS (Classroom Assessment Scoring System) and ELLCO (Early Language and Literacy Classroom Observation), were not available at the time of the 2002 HSIS data collection.

Works Cited

Bell, Stephen H., & Laura R. Peck. (2013). "Using Symmetric Predication of Endogenous Subgroups for Causal Inferences about Program Effects under Robust Assumptions." American Journal of Evaluation, 34(3), 413-426. DOI: 10.1177/1098214013489338

Bloom, Howard S. (1984). "Accounting for No-shows in Experimental Evaluation Designs." Evaluation Review, 8(2), 225-246. DOI: 10.1177/0193841X8400800205

Downer, Jason, & Andrew Mashburn. (2013). "Do School Experiences Play a Role in Hindering or Promoting the Persistence of Head Start Impacts on Cognitive and Social Outcomes during Elementary School?" Draft manuscript.

Gibson, Christina M. (2003). "Privileging the Participant: The Importance of Subgroup Analysis in Social Welfare Evaluations." American Journal of Evaluation, 24(4), 443-469. DOI: 10.1177/109821400302400403

Harvill, Eleanor, Laura R. Peck, & Stephen H. Bell. (2013). "On Overfitting in Experimental Analysis of Symmetrically Predicted Endogenous Subgroups from Randomized Experimental Samples: Part Three of a Method Note in Three Parts." American Journal of Evaluation, 34(4), 545-566. DOI: 10.1177/1098214013503201

Harvill, Eleanor, & Laura R. Peck. (in progress). "Examining Prediction Quality Implications to Enhance the Social Impact Policy Pathfinder (SPI-Path)." Bethesda, MD: Abt Associates Inc. Unpublished manuscript.

Kemple, James J., & Jason C. Snipes. (2000). Career Academies: Impacts on Students' Engagement and Performance in High School. New York, NY: Manpower Demonstration Research Corporation.

Mashburn, Andrew J., Robert C. Pianta, Bridget K. Hamre, Jason T. Downer, Oscar A. Barbarin, Donna Bryant, Margaret Burchinal, Diane M. Early, & Carollee Howes. (2008). "Measures of Classroom Quality in Prekindergarten and Children's Development of Academic, Language, and Social Skills." Child Development, 79(3), 732-749.

Peck, Laura R. (2003). "Subgroup Analysis in Social Experiments: Measuring Program Impacts Based on Post-Treatment Choice." American Journal of Evaluation, 24(2), 157-187. DOI: 10.1016/S1098-2140(03)00031-6

Peck, Laura R. (2013). "On Analysis of Symmetrically-Predicted Endogenous Subgroups: Part One of a Method Note in Three Parts." American Journal of Evaluation, 34(2), 225-236. DOI: 10.1177/1098214013481666

Puma, Michael, Stephen Bell, Ronna Cook, Camilla Heid, & Michael Lopez, et al. (2005). Head Start Impact Study: First Year Findings. Washington, DC: U.S. Department of Health and Human Services, Administration for Children and Families.

Puma, Michael, Stephen Bell, Ronna Cook, Camilla Heid, et al. (2010a). Head Start Impact Study Final Report. Washington, DC: U.S. Department of Health and Human Services, Administration for Children and Families.

Puma, Michael, Stephen Bell, Ronna Cook, Camilla Heid, et al. (2010b). Head Start Impact Study Technical Report. Washington, DC: U.S. Department of Health and Human Services, Administration for Children and Families.

Puma, Mike, Stephen Bell, Ronna Cook, Camilla Heid, Pam Broene, Frank Jenkins, Andrew Mashburn, & Jason Downer. (2012). Third Grade Follow-up to the Head Start Impact Study: Final Report. Washington, DC: Office of Planning, Research and Evaluation, Administration for Children and Families, U.S. Department of Health and Human Services. OPRE Report 2012-45.

Appendix: Added Technical Details & Discussion of Alternative Assumptions

As discussed in the main body of this report, the technique we use for analyzing the influence of Head Start quality on children's developmental outcomes identifies predicted high quality sample members from the treatment group and the control group in symmetric fashion, and then estimates impacts on that subpopulation as one would in any experimental subgroup analysis. The symmetry of the selection procedure ensures that equivalent subgroups are compared and guarantees that the impact estimates are free from differential selection bias or any other sources of bias. Compared to conventional propensity score matching, for example, the symmetric selection of treatment and control subgroup members within experimental data ensures full internal validity—unbiasedness of the impact estimates generated for the subgroups examined. However, the subgroup for which the methodology produces unbiased impact estimates—children with the highest predicted probabilities of being in high quality Head Start programs—is not necessarily the subgroup of interest—children who actually experience high quality Head Start. The predictive model, while symmetric for both treatment and control groups, is imperfect for both groups, potentially reducing the relevance (i.e., external validity or generalizability) of the findings, which is why we ultimately convert the results so that they represent actual rather than predicted subgroup members.

The body of this report discusses the five steps involved in carrying out this subgroup analysis of the effects of Head Start quality on children's outcomes. This Appendix provides some additional details about the results of our analytic process—including the correct prediction rate and the predicted subgroups' estimated impacts—and then elaborates on two possible alternative sets of assumptions, providing and analyzing results.

Results of Prediction Process

As noted in the text, the analysis starts by predicting which individuals would not participate in Head Start or would experience low or high quality Head Start. If there were perfect prediction, then the ultimate conversion step would be unnecessary. But our prediction is not perfect. It is, however, better than random, and so we use this observation to justify using this approach.

As explained in Section 4, ten random subsets of the combined 3-year-old and 4-year-old treatment groups were used to develop a model predicting membership in the non-participant, low quality, and high quality subgroups. Because we observe both the predicted and actual quality measures within the treatment group, we can assess the predictive accuracy of the model. The following exhibits present information on the accurate proportions of the predicted subgroups. We report this information for each of the three measures of quality that we use, following with an exhibit that presents the notation that identifies each of these elements for its use in our subsequent conversion process.
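To fix ideas, the sketch below illustrates the symmetric-prediction logic in a simplified form. It is not the report's estimation code: the data, variable names, and model are hypothetical, and the report's implementation develops the prediction model on random subsets of the treatment group and applies the HSIS survey weights and regression adjustment. The sketch only shows why forming subgroups from a model fit to baseline characteristics, applied identically to both randomized arms, preserves the experimental comparison within each predicted subgroup.

    # Minimal sketch (hypothetical data): symmetric prediction of quality subgroups.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 4000
    X = rng.normal(size=(n, 5))             # baseline (pre-randomization) covariates
    treat = rng.integers(0, 2, size=n)      # random assignment indicator
    # In the HSIS, the quality subgroup (0 = no-show, 1 = low, 2 = high) is observed
    # only for treatment group members; here it is simulated for everyone but used
    # only for treatment-group rows when fitting the model.
    actual_group = rng.integers(0, 3, size=n)

    # Step 1: fit the membership model on treatment group members only.
    model = LogisticRegression(max_iter=1000)
    model.fit(X[treat == 1], actual_group[treat == 1])

    # Step 2: apply the SAME fitted model to treatment and control members alike,
    # so the predicted subgroups are defined symmetrically in both arms.
    predicted_group = model.predict(X)

    # Step 3: within each predicted subgroup, compare treatment and control outcomes.
    # (A raw difference in means is shown only to illustrate the structure.)
    y = rng.normal(size=n)                  # placeholder outcome
    for g, label in enumerate(["predicted no-show", "predicted low", "predicted high"]):
        in_g = predicted_group == g
        impact = y[in_g & (treat == 1)].mean() - y[in_g & (treat == 0)].mean()
        print(f"{label}: impact = {impact:.3f}")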
Exhibit A-1 cross-tabulates predicted quality subgroup membership in its rows by actual quality subgroup measurement in its columns. The following percentages appear:

• Row percentages that allocate members of a given predicted quality subgroup across actual quality categories (e.g., the top left entry in the exhibit indicates that 34.7 percent of predicted non-participants are actual non-participants);

• Column percentages that allocate members of a given actual quality subgroup across predicted quality categories (e.g., the top left bracketed entry indicates that 21.6 percent of actual non-participants are predicted as non-participants).

Exhibit A-1. Predicted by Actual Resource Quality
Columns (Actual Resource Quality): No-show; Low; High; Overall
Predicted No-show: 34.7 [21.6]; 23.0 [10.3]; 42.3 [9.1]; Overall 11.8
Predicted Low: 18.5 [27.2]; 62.3 [65.9]; 19.3 [9.9]; Overall 28.0
Predicted High: 16.2 [51.3]; 10.4 [23.8]; 73.4 [81.0]; Overall 60.2
Overall: 19.0; 26.4; 54.6; 100.0
Notes: Diagonal elements in bold represent the correct placement of predicted within actual groups. The first numbers in each cell represent the proportion of the predicted that are in the actual group (the "row" percent). The numbers in brackets represent the proportion of the actual that are in the predicted group (the "column" percent). n=2,245

The numbers in brackets along the diagonal of the exhibit show that our model correctly predicted 21.6 percent of no-shows, 65.9 percent of low quality participants, and 81.0 percent of high quality participants. The "Overall" rows and columns indicate that the predicted distribution of cases among the three groups (12, 28, and 60 percent for the non-participant, low quality, and high quality groups, respectively) is not wildly different from the actual distribution (of 19, 26, and 54 percent, respectively). These are unweighted numbers; they reflect only the subset of cases that are relevant for this analysis and should not be construed as being nationally representative, as weighted data would be.

As Exhibits A-2 and A-3 show, the correct prediction rate for the high quality subgroup is 82.8 and 51.4 percent, respectively, for the interactions and exposure measures. Our correct placement rates are lower for the interactions and exposure measures than for the resources measure, but overall we conclude that the correct placement rates are acceptable for advancing this method of analyzing the effects of Head Start quality. Another way to quantify the correct placement aggregates across the three groups. The overall hit rate that we achieve is 66 percent for resource quality, 64 percent for interactions quality, and 63 percent for exposure quality. In general, in applying this analytic method, this rate should reflect that there is some useful prediction taking place: that is, the prediction should be better than a random sorting of the data into three groups, and ideally meaningfully better, to instill confidence that the building blocks of the analysis—the experimentally-based impacts on predicted subgroups—are a reasonable starting point. We recognize that these terms—"meaningfully better" or "reasonable"—are subjective. In this case we reach the conclusion that the success of the prediction process is sufficient to warrant proceeding with the analysis. Current research is exploring how better to operationalize these constructs of "better" and "reasonable" to generate clear prescriptions for future applications (Harvill & Peck, in progress).
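The row percentages, column percentages, and overall hit rate just described can all be computed directly from a predicted-by-actual cross-tabulation. In the sketch below the cell counts are back-calculated approximately from the percentages in Exhibit A-1; they are illustrative, not the study's exact cell counts.

    # Accuracy summaries from a 3x3 confusion matrix (rows = predicted subgroup,
    # columns = actual subgroup, in the order no-show, low quality, high quality).
    import numpy as np

    counts = np.array([[ 92,  61, 112],
                       [116, 392, 121],
                       [219, 141, 992]])

    row_pct = counts / counts.sum(axis=1, keepdims=True) * 100   # Exhibit A-1 "row" percents
    col_pct = counts / counts.sum(axis=0, keepdims=True) * 100   # bracketed "column" percents
    hit_rate = np.trace(counts) / counts.sum() * 100             # overall correct-placement rate

    print(np.round(row_pct, 1))
    print(np.round(col_pct, 1))
    print(f"Overall hit rate: {hit_rate:.0f} percent")

Run on these approximate counts, the overall hit rate comes out near the 66 percent reported above for resource quality.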

Exhibit A-2. Predicted by Actual Interactions Quality
Columns (Actual Interactions Quality): No-show; Low; High; Total
Predicted No-show: 37.1 [21.6]; 15.7 [8.5]; 47.2 [8.6]; Overall 11.1
Predicted Low: 13.5 [12.7]; 57.4 [49.7]; 29.1 [8.6]; Overall 17.8
Predicted High: 16.6 [65.1]; 12.1 [41.9]; 70.3 [82.8]; Overall 71.2
Total: 19.0; 20.5; 60.5; 100.0
Notes: Diagonal elements in bold represent the correct placement of predicted within actual groups. The first numbers in each cell represent the proportion of the predicted that are in the actual group (the "row" percent). The numbers in brackets represent the proportion of the actual that are in the predicted group (the "column" percent). n=2,245

Exhibit A-3. Predicted by Actual Exposure Quality
Columns (Actual Exposure Quality): No-show; Low; High; Total
Predicted No-show: 38.3 [22.7]; 47.0 [9.2]; 14.6 [8.0]; Overall 11.6
Predicted Low: 17.5 [60.4]; 69.7 [79.7]; 12.8 [40.6]; Overall 67.6
Predicted High: 15.9 [16.9]; 31.4 [11.0]; 52.7 [51.4]; Overall 20.8
Total: 19.6; 59.1; 21.3; 100.0
Notes: Diagonal elements in bold represent the correct placement of predicted within actual groups. The first numbers in each cell represent the proportion of the predicted that are in the actual group (the "row" percent). The numbers in brackets represent the proportion of the actual that are in the predicted group (the "column" percent). n=2,178

In addition to these placement percentages that result from our analysis, we report here the notation that we use in representing the conversion of results from predicted to actual subgroups. Readers should be able to use Exhibit A-4 to identify the elements from Exhibits A-1 through A-3 that are needed as inputs into the conversion formulae to compute the conversion factors themselves.

Exhibit A-4. Predicted by Actual Quality, Notational Information for Conversion
Columns (Actual Quality): No-show; Low; High
Predicted No-show row: symbols w_N, g_N, and s_N (one per cell), plus the column percents (1-r) and (1-p)
Predicted Low row: symbols w_L, g_L, and s_L (one per cell), plus the column percent (q)
Predicted High row: symbols w_H, g_H, and s_H (one per cell)
Notes: The first symbol in each cell represents the proportion of the predicted that are in the actual group (the "row" percent). The symbol in parentheses represents the proportion of the actual that are in the predicted group (the "column" percent).

Estimated Impacts on Predicted Subgroups

As noted in the report, we follow the HSIS existing practice to pool data and compute subgroups' impact estimates for our analysis of the effects of Head Start on these quality subgroups of interest. Though not

33 Appendix A of primary interest to most readers, the two exhibits below present the estimated impacts on each of the five outcomes under examination, by year, across the three quality measures, for each cohort. In essence, these are the “building blocks” for the estimates of impact on actual subgroups in a later subsection, using , N H given elsewhere. The estimates in the exhibits reflect comparison of L the formulas for , and -selected subsamples of the treatment and control groups derived from baseline symmetrically characteristics; hence, they are purely experimental and free from selection and other sources of bias. However, they do not fully reflect the non-participant and low and high quality subgroups their labels imply, because predicted members of a subgroup often are not actual members of that subgroup. Exhibit A -5. Estimated Impacts on PPVT Scores for Predicted Subgroups, by Quality Measure, by -Year -Old Cohort, Preferred Assumptions -up Year, 3 Follow End of HS E nd of HS End of End of First End of Third Year 2 Year 1 Kindergarten Grade Grade 2004 2003 2005 2006 2008 Resource No hows 13.4 * -6.4 3 .1 3 .8 3 .0 -s 1.5 4.7 ** 0.3 *** 8.2 2 .6 H igh .0 1.1 -0.6 -0 .7 0 1.8 L ow Interaction -s hows 15.3 ** -8.3 1.6 2.1 No 0 .3 5.9 4.1 * 0.3 2.6 ** 2 .5 High 6.2 0.2 -0.3 -1.2 Low 0 .7 E xposure -shows 9.6 .8 -6.2 5.1 -1 No 0 .3 *** 3.1 -2.2 14.4 3.4 -1 .2 H igh Low 4.8 ** 3.1 -0 .1 2.0 3.1 * Notes: The impact is the regression -adjusted difference (impact) between the treatment and control groups in the number of points on each outcome measure. statistically significant: p<0.01; ** statistically significant: p<0.05; * statistically significant: p <0.10 *** Word Scores for Predicted Subgroups, by Quality Exhibit A -6. Estimated Impacts on WJ3 Letter- -up Year, 3 -Year -Old Cohort, Preferred Assumptions Measure, by Follow End of HS End of HS End of End of First End of Third Year 2 Year 1 Kindergarten Grade Grade 2004 2003 2005 2006 2008 Resource No 4.9 -0.4 3.6 2.4 2.1 -shows -0.1 3.9 *** 0.6 2.1 8.6 High 3.9 0.7 -2.0 -1.2 -2.7 Low Interaction No -shows 2.0 -6.8 ** 1.2 3.1 -0.4 1.9 5.3 ** 0.5 ** 1.3 8.5 High -2.8 -1.7 -3.7 -4.2 5.3 Low Exposure No -shows 0.1 -2.7 -5.2 -4.7 -5.8 ** High ** 0.5 -9.9 9.1 -2.9 -3.2 2.9 7.9 *** 4.2 ** 3.2 1.9 Low Notes: The impact is the regression -adjusted difference (impact) between the treatment and control groups in the number of points on each outcome measure. *** statistically significant: p<0.01; ** statistically significant: p<0.05; * statistically significant: p <0.10 The Role Of Program Quality In Determining Head Start’s Impact On Child Development ▌ pg. 29
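As the notes to these exhibits indicate, each entry is a regression-adjusted treatment-control difference. The minimal sketch below shows what such an estimate looks like; the data and variable names are hypothetical, and the report's own estimates additionally apply the HSIS weights and its full covariate set.

    # Regression-adjusted impact: coefficient on the treatment indicator in an
    # outcome regression that controls for baseline covariates (hypothetical data).
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    n = 1500
    df = pd.DataFrame({
        "treat": rng.integers(0, 2, size=n),       # random assignment indicator
        "base_score": rng.normal(size=n),          # baseline (fall 2002) measure
        "age_months": rng.normal(48, 4, size=n),
    })
    df["outcome"] = 2.0 * df["treat"] + 1.5 * df["base_score"] + rng.normal(size=n)

    fit = smf.ols("outcome ~ treat + base_score + age_months", data=df).fit()
    print(fit.params["treat"])   # regression-adjusted treatment-control difference

With survey weights, a weighted regression (for example, statsmodels' wls with a weights argument) would play the same role; the point of the sketch is only the structure of the estimate, not the report's specification.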

34 Appendix A -7. Estimated Impacts on WJ3 Applied Problems Scores for Predicted Subgroups, by Exhibit A -Old Cohort, Preferred Assumptions Quality Measure, by Follow-up Year, 3- Year End of HS E nd of HS End of End of First End of Third Year 2 Year 1 Kindergarten Grade Grade 2004 2003 2005 2006 2008 Resource 3.9 4.2 2.0 4.8 No -2 .0 -shows ** 2.0 -0.4 7.9 1 .2 1.2 igh H .2 -1.7 Low -3 .9 0.9 -1 -0.8 Interaction -shows No 5.9 1.5 6.4 -1 .1 2.7 High 6.9 ** 2.0 -0.5 1.2 1 .5 -3.9 -4 .5 0 .2 -3 .4 Low 2.2 xposure E -shows 3.7 0.9 1.7 3.0 -4 No .9 High 9.9 ** 3.3 -3.3 0 .3 0 .3 1 * Low -1.2 1.4 4.6 .1 0.5 Notes: The impact is the regression -adjusted difference (impact) between the treatment and control groups in the number of points on each outcome measure. *** statistically significant: p<0.01; ** statistically significant: p<0.05; * statistically significant: p <0.10 Exhibit A -8. Estimated Impacts on Social Competence for Predicted Subgroups, by Quality -up Year, 3 -Year Measure, by Follow -Old Cohort, Preferred Assumptions End of HS End of HS End of End of First End of Third Year 2 Year 1 Kindergarten Grade Grade 2004 2003 2005 2006 2008 Resource -shows 0.0 No 0.2 0.2 -0.1 0.2 ** 0.2 * 0.0 0.1 0.4 0.2 High 0.1 Low 0.3 -0.2 0.0 0.0 Interaction No -shows 0.0 0.0 0.4 0.1 0.0 High 0.3 ** 0.2 0.1 0.3 ** 0.0 0.3 -0.1 Low 0.1 -0.2 0.0 Exposure No -shows -0.3 0.2 0.2 0.0 -0.1 0.2 0.3 -0.2 -0.4 0.5 ** High Low 0.0 0.1 0.3 ** 0.1 0.3 * Notes: -adjusted difference (impact) between the treatment and control groups in the The impact is the regression number of points on each outcome measure. *** statistically significant: p<0.01; ** statistically significant: p<0.05; * statistically significant: p <0.10 The Role Of Program Quality In Determining Head Start’s Impact On Child Development ▌ pg. 30

35 Appendix A -9. Estimated Impacts on Problem Behaviors for Predicted Subgroups, by Quality Exhibit A -Year -Old Cohort, Preferred Assumptions Measure, by Follow -up Year, 3 End of HS E nd of HS End of End of First End of Third Year 2 Year 1 Kindergarten Grade Grade 2004 2003 2005 2006 2008 Resource -0.6 0 .1 No .1 -0.6 0 .6 -shows -0 -0.3 -1.2 *** -0.7 * -0.6 ** -0 .2 igh H 0.0 0.2 0.5 0.4 0 .4 Low nteraction I -shows 0 .1 -0.2 -0 .9 0 .2 No -0.5 -0.6 * -0.4 0.1 *** 0 .1 High -0.9 .0 -0.9 0 .3 0.2 -0.3 0 L ow xposure E -shows -0.6 0 .2 -0.2 * -0.5 No 0 .8 High -0.1 0.1 0.8 0 .5 0 .1 Low -1.0 *** -0.6 -0 .5 -0.2 -0 .1 Notes: -adjusted difference (impact) between the treatment and control groups in the The impact is the regression number of points on each outcome measure. *** statistically significant: p<0.05; * statistically significant: p <0.10 ** statistically significant: p<0.01; Exhibit A -10. Estimated Impacts on PPVT Scores for Predicted Subgroups, by Quality Measure, by Follow -up Year, 4 -Year -Old Cohort, Preferred Assumptions End of End of End of End of Kindergarten HS Year First Grade Third Grade 2004 2003 2005 2007 Resource 1.3 No -4.6 -shows 3.2 -3.6 5.2 ** 3.2 5.5 *** 3.3 High 2.8 8.2 1.5 * 5.8 Low * Interaction -shows 3.8 No 2.5 7.6 * -2.0 High *** 3.5 3.8 * 3.0 * 6.1 3.4 2.5 2.5 Low 2.3 Exposure 3.2 No -2.2 -shows 3.8 0.0 0.4 4.2 6.3 5.9 High 3.1 *** 3.9 * 4.0 Low 5.9 * Notes: The impact is the regression -adjusted difference (impact) between the treatment and control groups in the number of points on each outcome measure. *** statistically significant: p<0.01; ** statistically significant: p<0.05; * statistically significant: p<0.10 The Role Of Program Quality In Determining Head Start’s Impact On Child Development ▌ pg. 31

36 Appendix A -11. Estimated Impacts on WJ3 Letter -Word Scores for Predicted Subgroups, by Quality Exhibit A -Year -up Year, 4 -Old Cohort, Preferred Assumptions Measure, by Follow End of End of End of End of Kindergarten HS Year First Grade Third Grade 2004 2003 2005 2007 Resource 2.9 No -6.7 -3.6 -shows -2.5 * -1.2 1.8 2.7 5.5 High 14.4 * 7.9 4.9 4.1 *** Low Interaction -shows 4.7 1.3 No -3.7 -1.0 High 8.0 ** 1.1 2.3 2.4 7.5 0.0 3.2 5.2 Low Exposure -shows 7.5 2.1 -5.4 -2.3 No 9.9 * 6.9 5.9 7.7 High ** -1.7 1.6 2.2 7.4 Low The impact is the regression Notes: -adjusted difference (impact) between the treatment and control groups in the number of points on each outcome measure. statistically significant: p< *** ** statistically significant: p<0.05; * statistically significant: p<0.10 0.01; Exhibit A -12. Estimated Impacts on WJ3 Applied Problems Scores for Predicted Subgroups, by Quality Measure, by Follow-up Year, 4- Year -Old Cohort, Preferred Assumptions End of End of End of End of Kindergarten HS Year First Grade Third Grade 2004 2003 2005 2007 Resource -shows 3.7 -4.2 No 0.5 -1.7 * 1.9 -1.9 * 0.5 -0.1 High 7.8 * 8.0 1.9 -0.5 Low Interaction No -shows 6.8 * -2.8 3.0 -2.3 0.6 0.2 -0.5 3.2 High 0.0 1.7 2.3 Low 2.7 Exposure 1.1 -shows No * -2.2 -1.8 7.1 High -1.6 3.7 2.5 -0.6 -0.3 0.4 3.9 0.0 Low Notes: The impact is the regression -adjusted difference (impact) between the treatment and control groups in the number of points on each outcome measure. *** statistically significant: p<0.01; ** statistically significant: p<0.05; * statistically significant: p <0.10 The Role Of Program Quality In Determining Head Start’s Impact On Child Development ▌ pg. 32

37 Appendix A -13. Estimated Impacts on Social Competence for Predicted Subgroups, by Quality Exhibit A -Year -Old Cohort, Preferred Assumptions Measure, by Follow -up Year, 4 End of E nd of End of End of Kindergarten HS Year First Grade Third Grade 2004 2003 2005 2007 Resource 0.0 -0.1 -0.1 No .1 -shows 0 0 .1 0.1 -0 .2 High 0.0 .1 -0.2 0.1 -0 .1 0 L ow Interaction -shows 0.1 -0.1 No .2 0 .1 -0 .1 0.1 0.2 0 .1 -0 H igh -0.7 ** -0.1 0.0 -0 .3 L ow Exposure -shows 0.2 0.2 0 .1 0 .4 No High 0 .2 -0.3 -0 .5 0.2 0.0 0.1 Low -0 .1 -0.2 Notes: The impact is the regression -adjusted difference (impact) between the treatment and control groups in the number of points on each outcome measure. statistically significant: p<0.01; ** *** * statistically significant: p<0.10 statistically significant: p<0.05; Exhibit A -14. Estimated Impacts on Problem Behaviors for Predicted Subgroups, by Quality Measure, by Follow -up Year, 4 -Old Cohort, Preferred Assumptions -Year End of End of End of End of Kindergarten HS Year Third Grade First Grade 2004 2003 2007 2005 Resource -shows 1.0 * 0.6 0.5 -0.6 No 0.2 -0.1 -0.2 0.0 High * -0.3 -0.7 -1.3 -0.4 Low Interaction 0.9 -shows No ** 0.1 -0.3 1.1 High -0.1 0.1 -0.4 -0.6 0.0 -0.2 -0.1 -0.6 Low Exposure -shows 0.8 0.6 0.3 No -0.8 -0.7 -0.3 -0.7 -1.0 High -0.4 0.0 0.1 -0.1 Low The impact is the regression -adjusted difference (impact) between the treatment and control groups in the Notes: number of points on each outcome measure. statistically significant: p<0.01; ** statistically significant: p<0.05; * statistically significant: p <0.10 *** It is with the information embedded in Exhibits A-5 through A-14 that we impose our conversion factors 11 We elaborate in order to “reallocate” the estimated impacts from the predicted to the actual subgroups. native assumptions for a single year of outcomes, the first year next on the implications of imposing alter of follow-up. 11 Some may find it interesting to note that there is about the same frequency of statistically significant effects : 35 of the 225 tests for the 3 - observed in the predicted subgroups as reported for the converted, actual impacts -old cohort and 23 of the 180 tests for the 4- year -old cohort are statistically significant. year The Role Of Program Quality In Determining Head Start’s Impact On Child Development ▌ pg. 33

Discussion of Alternative Assumptions

We chose the preferred set of assumptions discussed in the main body of this report because we believe them to be reasonable. Nevertheless, we recognize the possibility that results on the impact of actual Head Start quality level may vary under other assumptions.12 To assess the robustness of the results to the assumptions, we apply two alternative sets of assumptions and re-compute Step 5 of the procedure. In both instances, we retain the first three assumptions from above as non-controversial:

(1') (1") N_N = 0 – The impact on predicted no-shows who are actual no-shows is zero.

(2') (2") N_L = 0 – The impact on predicted low quality participants who are actual no-shows is zero.

(3') (3") N_H = 0 – The impact on predicted high quality participants who are actual no-shows is zero.

The fourth assumption from before is also retained:

(4') (4") L_H = L_L – The impacts on low quality participants are the same for children predicted to be high quality participants and children predicted to be low quality participants.

This leaves two assumptions to reconsider. In the first alternative scenario, our goal is to adopt final two assumptions that are reasonable in our view but sufficiently different from the original assumptions to provide a strong contrast in the sensitivity analysis:

(5') L_N = L_L – The impacts on low quality participants are the same for children predicted to be low quality participants and children predicted to be no-shows.

This assumption, in conjunction with (4'), postulates that low quality Head Start has the same impact on three types of children with different propensities to participate in it: predicted no-shows, predicted low quality, and predicted high quality. It seems more reasonable to suppose that a relatively weak version of the program has a uniform (and possibly smaller) impact of this sort than that high quality Head Start does. Indeed, with the original assumption (5) replaced by (5'), no assumptions about homogeneous impacts from high quality Head Start participation are made in this scenario.

(6') H_H = 2 L_H – The impact on predicted high quality participants who are actual high quality participants is two times the impact on predicted high quality participants who are actual low quality participants.

Assuming a magnitude relationship of this sort, as opposed to strict equality, puts a new twist into the first alternative scenario. It is in fact no more exacting an assumption than that of pure equality made for the low quality participants, and it is the least extreme simple multiplicative relationship. It involves children of similar background characteristics (for the characteristics that predict high quality participation), but for otherwise similar children we assume here that actual high quality services have a larger impact than actual low quality services.

12 Bell and Peck (2013) consider how the validity of the assumptions can be improved through strategic choices of background variables to include in the quality prediction model. This work argues that the best choice of predictors are those exogenous variables that most strongly predict membership in the endogenous subgroup of interest (here, either high quality Head Start participation or low quality participation) but that are otherwise unrelated to program impact magnitude.
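To make the mechanics of these assumptions concrete, the sketch below (with entirely hypothetical numbers) solves the mixture equations they imply: each predicted subgroup's experimental impact is treated as a weighted average of the impacts on that subgroup's actual no-show, low quality, and high quality members. This is an illustration of the logic only, not the report's computations; the report's own derivation expresses the solution in closed form, as described next.

    # Illustrative conversion from impacts on PREDICTED subgroups to impacts on
    # ACTUAL subgroups under the first alternative scenario (all numbers hypothetical).
    # Row proportions from a predicted-by-actual cross-tab like Exhibit A-1.
    rows = {
        "pred_noshow": {"noshow": 0.35, "low": 0.23, "high": 0.42},
        "pred_low":    {"noshow": 0.19, "low": 0.62, "high": 0.19},
        "pred_high":   {"noshow": 0.16, "low": 0.10, "high": 0.74},
    }
    # Experimental impact estimates for the predicted subgroups (hypothetical values).
    I = {"pred_noshow": 1.0, "pred_low": 3.0, "pred_high": 6.0}
    # Shares of ACTUAL high quality participants falling in each predicted group
    # (the bracketed "column" percentages; hypothetical values here).
    col_high = {"pred_noshow": 0.09, "pred_low": 0.10, "pred_high": 0.81}

    # Assumptions (1')-(6'): actual no-shows have zero impact; actual low quality
    # participants share one impact L across predicted groups; actual high quality
    # participants in the predicted-high group have impact 2*L.
    L = I["pred_high"] / (rows["pred_high"]["low"] + 2 * rows["pred_high"]["high"])

    # Impacts on actual high quality members inside the other predicted groups follow
    # from the mixture identity I_g = g_low*L + g_high*H_g (the no-show term is zero).
    H_within = {
        g: (I[g] - rows[g]["low"] * L) / rows[g]["high"]
        for g in ("pred_noshow", "pred_low")
    }
    H_within["pred_high"] = 2 * L

    # Overall impact on actual high quality participants: weight the within-group
    # impacts by where actual high quality participants sit across predicted groups.
    H = sum(col_high[g] * H_within[g] for g in col_high)

    print(f"Impact on actual low quality participants:  {L:.2f}")
    print(f"Impact on actual high quality participants: {H:.2f}")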

Inserting these new assumptions into the derivation results in the following formulas for impacts on the actual low quality subgroup (L) and the actual high quality subgroup (H):

[Conversion formulas for L and H under the first alternative scenario omitted: each expresses the actual-subgroup impact as a weighted combination of the predicted-subgroup impact estimates I_N, I_L, and I_H, with weights built from the proportions w, g, p, and q defined in Exhibit A-4.]

where q is the proportion of high quality participants predicted as low quality participants. As before, N = 0.

The final alternative scenario examines the sensitivity of the findings in the range between the other two scenarios. Here, we return to the original fifth assumption:

(5") H_H = H_L – The impacts on high quality participants are the same for children predicted to be high quality participants and children predicted to be low quality participants,

and we add to it an assumption from the second scenario:

(6") L_N = L_L – The impacts on low quality participants are the same for children predicted to be low quality participants and children predicted to be no-shows.

This set of assumptions leads to the following formulas for impacts on the actual low quality subgroup (L) and the actual high quality subgroup (H):

[Conversion formulas for L and H under the second alternative scenario omitted: again, each is a weighted combination of I_N, I_L, and I_H using the proportions defined in Exhibit A-4.]

As before, N = 0.

Sensitivity to Assumptions

Exhibits A-15 and A-16 report the results from imposing the first set of alternative assumptions. Examining the differences and similarities between these results and those in the main text for the preferred set of assumptions, we make the following observations. For the 3-year-old cohort, the alternative assumptions strengthen the case that low quality Head Start can have favorable effects on child development in the first Head Start year. But they also add to the evidence of statistically significant differences in effectiveness by quality level favoring high quality Head Start. The same general pattern is evident in the 4-year-old cohort under these assumptions, but less noticeably so.

40 Appendix A -15. Estimated Impacts for Actual Subgroups, by Quality Measure and Outcome, at the Exhibit A -Old Cohort, First Alternative Assumptions end of the first Head Start year (2003), 3- Year Outcomes Social Cognitive -Emotional Social Problem WJ3 Applied WJ3 Letter- Actual Quality P Competence Word Behaviors PVT Problems Resource a 0.0 0.0 0.0 0.0 -shows No 0.0 * 9.0 *** 8.4 *** 4.2 ** 0.1 -0.4 H igh -0.1 4.7 *** 4.0 ** -0.4 *** 4.4 Low *** 0.0 4.6 ** 3.7 * 0.1 0.2 igh -Low Difference H Interaction 0.0 0.0 0.0 0.0 No -shows 0 .0 -0.5 8.2 *** 7.3 ** 4.8 ** -0.0 *** H igh 3.9 *** 4.9 *** 2.7 * -0.0 -0.3 * Low 0 .0 -0.2 2.1 ** -Low Difference * 4.3 High 2.5 Exposure 0.0 0.0 0.0 0.0 No 0 .0 -shows 13.9 15.8 12. 0 -0 .1 -2 .6 *** High .2 * 5.1 ** 2.3 -0.0 -0 Low 5.7 .8 *** -Low Difference 8.2 10.8 High -0.1 -2 9.7 Notes: -adjusted difference (impact) between the treatment and control groups in the The impact is the regression number of points on each outcome measure. a No statistical significance noted because no -show impact estimates are derived by assumption to be zero. *** statistically significant: p<0.01; ** statistically significant: p<0.05; * statistically significant: p<0.10 Exhibit A -16. Estimated Impacts for Actual Subgroups, by Quality Measure and Outcome, at the Old Cohort, First Alternative Assumptions end of the Head Start year (2003), 4-Year- Outcomes Social -Emotional Cognitive - Problem WJ3 Letter Social WJ3 Applied Actual Quality Word Competence Problems Behaviors PPVT Resource a 0.0 -shows 0.0 0.0 0.0 0.0 No 7.0 *** 10.4 *** 5.3 -0.1 0.1 High 0.0 ** * 1.5 3.1 0.3 * 3.6 Low -0.1 0.2 3.8 -Low Difference 7.3 *** High 3.4 Interaction No 0.0 -shows 0.0 0.0 0.0 0.0 3.2 5.9 7.2 *** *** -0.1 -0.2 High Low *** 4.7 ** 2.5 0.0 -0.2 3.3 -Low Difference 2.6 2.5 * 0.7 -0.1 0.0 High Exposure -shows 0.0 No 0.0 0.0 0.0 0.0 -0.6 -5.3 14.6 0.0 5.0 High 0.1 *** *** 12.0 6.5 -0.1 -0.2 Low High -Low Difference -1.5 -17.2 14.4 0.1 -0.5 Notes: The impact is the regression -adjusted difference (impact) between the treatment and control groups in the number of points on each outcome measure. a No statistical significance noted because no -show impact estimates are derived by assumption to be zero. *** statistically significant: p<0.01; ** statistically significant: p<0.05; * statistically significant: p<0.10 The Role Of Program Quality In Determining Head Start’s Impact On Child Development ▌ pg. 36

Next, we report the results from imposing the second set of alternative assumptions in Exhibits A-17 and A-18. While we might characterize the second alternative assumptions as being a middle ground between our preferred assumptions and the first alternative assumptions, the results from imposing these assumptions are very similar to the results estimated by using our preferred assumptions—almost identical in the case of the 4-year-old cohort. This is the case even though the conversion factors themselves appear to be quite different in their structure. In turn, an interpretation of the differences between the results generated by imposing the first and second sets of alternative assumptions is likewise identical to the discussion of the differences between the results generated by imposing the preferred assumptions and the first set of alternative assumptions. Overall, the meaning of our conclusions does not differ when we impose these alternative assumptions. Given that one alternative set of assumptions provides modestly different results and the other alternative set provides largely identical results, we feel justified basing our conclusions regarding the impact of being in Head Start by quality subgroup on the preferred assumptions.

Exhibit A-17. Estimated Impacts for Actual Subgroups, by Quality Measure and Outcome, at the End of the First Head Start Year (2003), 3-Year-Old Cohort, Second Alternative Assumptions
Columns (Outcomes): Cognitive — PPVT, WJ3 Letter-Word, WJ3 Applied Problems; Social-Emotional — Social Competence, Problem Behaviors
Resource
No-shows a: 0.0 0.0 0.0 0.0 0.0
High: ** *** 10.7 *** 11.5 10.0 ** -0.2 -1.0
Low: -4.3 0.3 0.4 4.2 3.1
High-Low Difference: -1.3 8.4 6.4 14.2 -0.5
Interaction
No-shows: 0.0 0.0 0.0 0.0 0.0
High: 8.0 9.7 ** 5.5 ** 0.0 -0.6 *
Low: -0.5 -0.2 4.7 5.7 -2.8
High-Low Difference: 10.8 4.0 0.8 0.2 -0.1
Exposure
No-shows: 0.0 0.0 0.0 0.0 0.0
High: 1.3 16.5 6.2 2.6 0.1
Low: -0.1 9.4 ** 6.2 -1.1 *** 5.7
High-Low Difference: -3.6 10.9 -3.2 0.1 2.4 **
Notes: The impact is the regression-adjusted difference (impact) between the treatment and control groups in the number of points on each outcome measure.
a No statistical significance noted because no-show impact estimates are derived by assumption to be zero.
*** statistically significant: p<0.01; ** statistically significant: p<0.05; * statistically significant: p<0.10

42 Appendix A -18. Estimated Impacts for Actual Subgroups, by Quality Measure and Outcome, at the Exhibit A end of the Head Start year (2003) 4-Year- Old Cohort, Second Alternative Assumptions Outcomes Social -Emotional Cognitive WJ3 Applied Problem WJ3 Letter - Social Actual Quality Word Competence Behaviors Problems PPVT Resource a 0.0 -shows 0.0 0.0 0.0 No 0.0 -0.5 6.0 * 1.3 -0.4 0.0 High 8.9 24.5 *** 14.3 -0.3 0.1 Low -14.7 -Low Difference -23.2 *** -3.0 0.4 -0.6 High Interaction 0.0 No 0.0 -shows 0.0 0.0 0.0 -0.2 7.6 *** 8.7 ** 5.3 0.0 High -0.4 -0.5 0.1 7.9 Low 3.5 -Low Difference 4.1 0.9 5.2 0.5 High 0.2 Exposure 0.0 0.0 No -shows 0.0 0.0 0.0 High 14.8 * 28.1 *** -6.9 -0.3 0.3 * 4.0 2.1 7.9 0.0 -0.5 Low -14.8 26.0 ** -0.2 -0.8 10.8 High -Low Difference Notes: The impact is the regression -adjusted difference (impact) between the treatment and control groups in the number of points on each outcome measure. a No statistical significance noted because no -show impact estimates are derived by assumption to be zero. *** statistically significant: p<0.01; ** statistically significant: p<0.05; * statistically significant: p<0.10 The Role Of Program Quality In Determining Head Start’s Impact On Child Development ▌ pg. 38
