The Multigroup Entropy Index (Also Known as Theil’s H or the Information Theory Index)[1]

John Iceland
University of Maryland
December 2004

[1] These indexes were prepared under contract to the U.S. Census Bureau.

Table of Contents

Summary
Data Source
Race and Ethnicity
Geographic Areas
Residential Pattern Measures
Dual-Group Entropy Indexes
References

Summary

This website contains three sets of residential-pattern indicators for 1980, 1990, and 2000.

1. The highlighted measure is the “multigroup entropy index,” which is also known as the multigroup version of Theil’s H or the multigroup information theory index. This is a measure of “evenness.”
2. “Diversity” scores are also available; these are used in the calculation of the multigroup entropy index. A diversity score measures the extent to which several groups are present in a metropolitan area, regardless of their distribution across census tracts.
3. Dual-group entropy indexes are included here, where the reference group consists of all people not of the main group in question.[2] Two-group entropy indexes are computed for Non-Hispanic Whites, Non-Hispanic African Americans, Non-Hispanic Asians and Pacific Islanders, Non-Hispanic American Indians and Alaska Natives, Non-Hispanics of other races, and Hispanics.

[2] Dual-group entropy indexes were also included in the 2002 report by Iceland, Weinberg, and Steinmetz. In that report, the entropy index indicated the segregation of each of several groups (Blacks, Hispanics, Asians and Pacific Islanders, and American Indians and Alaska Natives) from non-Hispanic Whites.

Data Source

These indexes are based on data from the 1980, 1990, and 2000 decennial censuses (the 100 percent data). The main data issues involved in calculating racial and ethnic residential patterns revolve around the definition of racial and ethnic categories, geographic boundaries, and residential-pattern measures.

Race and Ethnicity

In 1977, the Office of Management and Budget (OMB) issued its Statistical Policy Directive 15, which provided the framework for federal data collection on race and ethnicity to federal agencies, including the Census Bureau for the 1980 decennial census. The OMB directed agencies to focus on data collection for four racial groups (White; Negro or Black; American Indian, Eskimo, or Aleut; and Asian or Pacific Islander) and one ethnicity (Hispanic, Latino, or Spanish origin). The questions on the 1980 and 1990 censuses asked individuals to self-identify with one of these four racial groups[3] and to state whether they were Hispanic or not.

[3] The Population Censuses have a special dispensation from OMB to allow individuals to designate “Some Other Race” rather than one of those specifically listed. Because of Congressional directives, the decennial census questions also ask about specific Asian and Pacific Islander races (e.g., Chinese).

After much research and public comment in the 1990s, the OMB revised the Nation’s racial classification to include five categories: White, Black or African American, American Indian or Alaska Native, Asian, and Native Hawaiian or Other Pacific Islander. An additional major change was to permit individuals to self-identify as being of “one or more races.” While a small fraction of the population had already been doing so on previous census forms, this new directive made the practice permissible in data collection activities.

This change naturally challenges researchers to determine the best way to present historically compatible data. To facilitate comparisons across time, minority race/ethnicity definitions were used that could be closely reproduced in all three decades and that closely approximate 1990 census categories. Six mutually exclusive and exhaustive categories were constructed: Non-Hispanic Whites, Non-Hispanic African Americans, Non-Hispanic Asians and Pacific Islanders, Non-Hispanic American Indians and Alaska Natives, Non-Hispanics of other races, and Hispanics. Having mutually exclusive and exhaustive categories is essential for constructing a single multiracial index. For Census 2000, this involved combining the Asian and Native Hawaiian or Other Pacific Islander groups. In addition, non-Hispanic people who identified themselves as being of two or more races in 2000 were also categorized as “Other,” since people could not mark more than one race in 1980 or 1990. Census 2000 figures indicate that 4.6 million people, or 1.6 percent of the population, designated themselves as multiracial (and non-Hispanic). Because of the relatively small number of multiracial people, the impact of the creation of this category in Census 2000 on segregation is small.[4] People who reported being Hispanic were categorized as such, regardless of their response to the race question.

[4] As a way of testing the sensitivity of the information theory index calculated here to differences in race categories, an alternative race classification scheme was tested with the Census 2000 data: instead of the six categories described above, eight were constructed. The two extra categories were created by splitting the Asian and Pacific Islander category into two (Asians, and Native Hawaiians and Other Pacific Islanders), and by splitting the non-Hispanic Other category into non-Hispanic “Other” and non-Hispanics who marked two or more races. The mean entropy index for all 331 metropolitan areas in 2000 was 0.181 using six categories and 0.180 using eight categories, indicating the very small effect of choosing between these two alternatives. The correlation between the two is over 0.99.

Geographic Areas

Residential pattern indexes often measure the distribution of different groups across units within larger areas. Thus, to measure residential patterns, one has to define both the appropriate larger area and its component parts. The larger areas here are represented by metropolitan areas, as these are reasonable approximations of housing markets. These are operationalized by using independent and primary metropolitan statistical areas, referred to hereafter as metropolitan areas, or MAs. To facilitate comparisons over time, the definition of MA boundaries in effect during Census 2000 (issued by the Office of Management and Budget on June 30, 1999) was used. Minor Civil Division-based MAs were used in New England. To address the second geographic consideration, the component parts, this analysis uses census tracts. These units are designed with the intent of representing neighborhoods and are delineated with substantial local input, which makes them a reasonable choice from a heuristic perspective.

In 2000, there were 331 MAs in the U.S. For this analysis, six MAs were omitted (Barnstable-Yarmouth, MA; Flagstaff, AZ-UT; Greenville, NC; Jonesboro, AR; Myrtle Beach, SC; and Punta Gorda, FL) because they had fewer than 9 census tracts and populations of less than 41,000 in 1980. All other MAs used had populations of at least 50,000 in 1980, which is typically one of the criteria for defining an area as an MA.

Residential Pattern Measures

Residential pattern measures, usually referred to as “residential segregation” measures in the social scientific literature, have been the subject of extensive research for many years, and a number of different measures have been developed over time (e.g., see Massey and Denton, 1988; Iceland, Weinberg, and Steinmetz, 2002). Reardon and Firebaugh (2002) note that all major reviews of such indexes limit their discussion to dichotomous measures (e.g., Duncan and Duncan, 1955; James and Taeuber, 1985; Massey and Denton, 1988; White, 1986; Zoloth, 1976; Massey, White, and Phua, 1996). The earliest of the multigroup indexes is the information theory index (H), sometimes referred to as the entropy index, which was defined by Theil (Theil, 1972; Theil and Finezza, 1971). The entropy index is a measure of “evenness”: the extent to which groups are evenly distributed among organizational units (Massey and Denton, 1988). More specifically, Theil described the entropy index as a measure of the average difference between a unit’s group proportions and those of the system as a whole (Theil, 1972). H can also be interpreted as the difference between the diversity (entropy) of the system and the weighted average diversity of the individual units, expressed as a fraction of the total diversity of the system (Reardon and Firebaugh, 2002).

The entropy score, which is a measure of diversity, and the entropy index, which measures the distribution of groups across neighborhoods, are discussed below. The first is used in the calculation of the latter. The entropy score is defined by the following formulas, from Massey and Denton (1988). First, a metropolitan area’s entropy score is calculated as:

$$E = \sum_{r=1}^{R} \Pi_r \ln(1/\Pi_r)$$

where $\Pi_r$ refers to a particular racial/ethnic group’s proportion of the whole metropolitan area population and $R$ is the number of groups. All logarithmic calculations use the natural log.[5] Unlike the entropy index defined below, this partial formula describes the diversity in a metropolitan area. The higher the number, the more diverse an area. The maximum level of entropy is given by the natural log of the number of groups used in the calculations. With six racial/ethnic groups, the maximum entropy is $\ln 6$, or 1.792. The maximum score occurs when all groups have equal representation in the geographic area, such that with six groups each would comprise about 17 percent of the area’s population. This is typically not referred to as a measure of “segregation” because it does not measure the distribution of these groups across a metropolitan area. A metropolitan area, for example, can be very diverse if all minority groups are present, but also very highly “segregated” if all groups live exclusively in their own neighborhoods.

[5] When the proportion of a particular group in a given census tract ($\pi_{ri}$) is 0, the log term is set to 0. This is the preferred procedure here, as the absence of a group (or multiple groups) should result in a 0 increase in the diversity score (where a higher score indicates more diversity).

A unit within the metropolitan area, such as a census tract, would analogously have its entropy score, or diversity, defined as:

$$E_i = \sum_{r=1}^{R} \pi_{ri} \ln(1/\pi_{ri})$$

where $\pi_{ri}$ refers to a particular racial/ethnic group’s proportion of the population in tract $i$.

The entropy index is the weighted average deviation of each unit’s entropy from the metropolitan-wide entropy, expressed as a fraction of the metropolitan area’s total entropy:

$$H = \sum_{i=1}^{n} \left[ \frac{t_i (E - E_i)}{E T} \right]$$

where $T$ is the metropolitan area population, $t_i$ refers to the total population of tract $i$, $n$ is the number of tracts, and $E_i$ and $E$ represent tract $i$’s diversity (entropy) and metropolitan area diversity, respectively. The entropy index varies between 0, when all areas have the same composition as the entire metropolitan area (i.e., maximum integration), and a high of 1, when all areas contain one group only (maximum segregation). While the diversity score is influenced by the relative size of the various groups in a metropolitan area, the entropy index, being a measure of evenness, is not. Rather, it measures how evenly groups are distributed across metropolitan area neighborhoods, regardless of the size of each of the groups.

Other multigroup segregation indexes exist, such as a generalized dissimilarity index and an index of relative diversity. In a detailed review of six multigroup indexes (dissimilarity, Gini, entropy, squared coefficient of variation (CV), relative diversity, and normalized exposure), Reardon and Firebaugh (2002) conclude that the entropy index is clearly the superior measure. They note, for example, that entropy is the only index that obeys the “principle of transfers”: the index declines when an individual of group m moves from unit i to unit j, where the proportion of persons of group m is higher in unit i than in unit j. The entropy index can also be decomposed into its component parts. For these reasons, the entropy index was calculated here.

Dual-Group Entropy Indexes

In addition to the multigroup entropy index, indexes for particular groups are also available here. These employ a two-group entropy index (H) calculation, which uses the same formulas specified above, where the distribution of each of the six groups in question (Non-Hispanic Whites, Non-Hispanic African Americans, Non-Hispanic Asians and Pacific Islanders, Non-Hispanic American Indians and Alaska Natives, Non-Hispanics of other races, and Hispanics) is compared to the distribution of all other groups combined. In other words, the reference group for these calculations consists of those who are not of the racial/ethnic group being considered. Additional discussion and analyses of these indexes are contained in Iceland (2004).

References

Duncan, Otis Dudley and Beverly Duncan. 1955. “A methodological analysis of segregation indexes.” American Sociological Review 20: 210-217.

Iceland, John. 2004. “Beyond Black and White: Residential Segregation in Multiethnic America.” Social Science Research 33, 2 (June): 248-271.

Iceland, John, Daniel H. Weinberg, and Erika Steinmetz. 2002. Racial and Ethnic Residential Segregation in the United States: 1980-2000. U.S. Census Bureau, Census Special Report, CENSR-3. Washington, DC: U.S. Government Printing Office.

James, David R. and Karl E. Taeuber. 1985. “Measures of segregation.” Sociological Methodology 14: 1-32.

Massey, Douglas S. and Nancy A. Denton. 1988. “The Dimensions of Residential Segregation.” Social Forces 67: 281-315.

Massey, Douglas S., Michael J. White, and Voon Chin Phua. 1996. “The Dimensions of Segregation Revisited.” Sociological Methods and Research 25, 2 (November): 172-206.

Reardon, Sean F. and Glenn Firebaugh. 2002. “Measures of Multigroup Segregation.” Sociological Methodology 32, 1 (January): 33-67.

Theil, Henri. 1972. Statistical Decomposition Analysis. Amsterdam: North-Holland Publishing Company.

Theil, Henri and Anthony J. Finezza. 1971. “A note on the measurement of racial integration of schools by means of informational concepts.” Journal of Mathematical Sociology 1: 187-194.

White, Michael J. 1986. “Segregation and diversity measures in population distribution.” Population Index 52: 198-221.

Zoloth, Barbara S. 1976. “Alternative measures of school segregation.” Land Economics 52: 278-298.
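As an illustration of the entropy (diversity) score formula in the Residential Pattern Measures section, the following Python sketch computes E for a single area from its group counts. It is a minimal sketch under stated assumptions, not the code used to produce these indexes: the function name `diversity` and the sample counts are hypothetical, and a group with a zero count is simply skipped, matching the convention that an absent group contributes 0 to the score.

```python
import math


def diversity(counts):
    """Entropy (diversity) score E = sum over groups of p * ln(1/p).

    `counts` holds the population of each racial/ethnic group in one area
    (a metropolitan area or a census tract). Hypothetical helper for illustration.
    """
    total = sum(counts)
    if total == 0:
        return 0.0  # assumption: an empty area is treated as having no diversity
    score = 0.0
    for c in counts:
        if c > 0:  # a group with proportion 0 contributes 0 (its log term is set to 0)
            p = c / total
            score += p * math.log(1.0 / p)  # natural log
    return score


# Six equally sized groups reach the maximum, ln(6).
print(round(diversity([1000] * 6), 3))  # 1.792
# Only two groups present, in equal shares: ln(2).
print(round(diversity([500, 500, 0, 0, 0, 0]), 3))  # 0.693
# A single group gives no diversity at all.
print(diversity([800, 0, 0, 0, 0, 0]))  # 0.0
```

Because the score depends only on group shares within one area, it says nothing about how those groups are spread across tracts; that is the information the entropy index adds.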
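The multigroup entropy index H and the dual-group version described in the Dual-Group Entropy Indexes section can be sketched the same way. In this hypothetical Python sketch (function names and data are illustrative assumptions, not the published code), each tract is a list of group counts in a fixed order, the metropolitan totals are the column sums, and the dual-group index simply collapses each tract to two counts (the group in question and everyone else) before applying the same H formula.

```python
import math


def diversity(counts):
    """Entropy score E = sum of p * ln(1/p); groups with zero count contribute 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # (c / total) * ln(total / c) equals p * ln(1/p) with p = c / total
    return sum((c / total) * math.log(total / c) for c in counts if c > 0)


def entropy_index(tracts):
    """Multigroup entropy index H = sum_i t_i * (E - E_i) / (E * T).

    `tracts` is a list of per-tract count lists, all in the same group order.
    """
    n_groups = len(tracts[0])
    metro = [sum(tract[g] for tract in tracts) for g in range(n_groups)]
    T = sum(metro)        # metropolitan area population
    E = diversity(metro)  # metropolitan area diversity
    if E == 0:
        raise ValueError("H is undefined when only one group is present")
    return sum(sum(tract) * (E - diversity(tract)) for tract in tracts) / (E * T)


def dual_group_index(tracts, g):
    """Two-group H: group g versus all other groups combined."""
    collapsed = [[tract[g], sum(tract) - tract[g]] for tract in tracts]
    return entropy_index(collapsed)


# Every tract mirrors the metropolitan mix: maximum integration.
even = [[50, 30, 20], [50, 30, 20]]
print(round(entropy_index(even), 3))  # 0.0
# Each tract holds a single group: maximum segregation.
separate = [[100, 0, 0], [0, 100, 0], [0, 0, 100]]
print(round(entropy_index(separate), 3))  # 1.0
print(round(dual_group_index(separate, 0), 3))  # 1.0
# Group 0 concentrated in the first tract, compared with all others combined.
print(round(dual_group_index([[90, 5, 5], [10, 45, 45]], 0), 3))  # 0.531
```

The sketch raises an error for an area containing a single group, where the division by E makes H undefined; how such a case would be handled is not stated in the text, so that choice is an assumption.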