Simple Random Sampling
Moulinath Banerjee
University of Michigan
September 11, 2012

1 Simple Random Sampling

The goal is to estimate the mean and the variance of a variable of interest in a finite population by collecting a random sample from it. Suppose there are $N$ members of the population, numbered 1 through $N$, and let the values assumed by the variable of interest be $x_1, x_2, \dots, x_N$. Not all the $x_i$'s are necessarily distinct. For example, if we are interested in estimating the proportion of Democrats in a population of voters, we might assign $x_i = 1$ if the $i$'th voter is a Democrat and $x_i = 0$ if they are a Republican. In this case the population proportion of Democrats is the mean of the $x_i$'s. We denote the distinct values of the $x_i$'s by $\xi_1, \xi_2, \dots, \xi_m$, and let $n_i$ denote the frequency of $\xi_i$ in the population. The population mean $\mu$ is given by

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{1}{N}\sum_{i=1}^{m} n_i \xi_i,$$

and the population variance $\sigma^2$ is

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2 = \frac{1}{N}\sum_{i=1}^{m} n_i \xi_i^2 - \mu^2.$$

We denote the relative frequencies of the $\xi_i$'s in the population by $\{p_1, p_2, \dots, p_m\}$, where $p_i = n_i/N$.

1.1 SRSWR: simple random sampling with replacement

A sample of size $n$ is collected with replacement from the population. Thus, an individual is drawn (randomly), their $x$ value is recorded, and the individual is then returned to the population. Next, a second individual is drawn, and the process continues $n$ times.
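As a minimal sketch, assuming a small, hypothetical integer-valued population, the quantities $\mu$, $\sigma^2$ and $p_i$ defined above, and the with-replacement draw just described, can be computed in Python with exact `fractions.Fraction` arithmetic. The helper names `relative_frequencies`, `population_moments` and `srswr` are illustrative and do not come from the notes. The raw-value and frequency forms of $\mu$ and $\sigma^2$ are checked against each other.

```python
import random
from collections import Counter
from fractions import Fraction


def relative_frequencies(x):
    """p_i = n_i / N for each distinct value xi_i (hypothetical helper)."""
    N = len(x)
    return {xi: Fraction(n_i, N) for xi, n_i in Counter(x).items()}


def population_moments(x):
    """Return (mu, sigma^2) for integer values x_1, ..., x_N, checking that the
    raw-value form and the frequency form agree."""
    N = len(x)
    # Raw form: mu = (1/N) sum_i x_i, sigma^2 = (1/N) sum_i (x_i - mu)^2
    mu = Fraction(sum(x), N)
    sigma2 = sum((v - mu) ** 2 for v in x) / N
    # Frequency form: mu = (1/N) sum_i n_i xi_i, sigma^2 = (1/N) sum_i n_i xi_i^2 - mu^2
    freq = Counter(x)  # distinct value xi_i -> frequency n_i
    mu_f = Fraction(sum(n_i * xi for xi, n_i in freq.items()), N)
    sigma2_f = Fraction(sum(n_i * xi * xi for xi, n_i in freq.items()), N) - mu_f ** 2
    assert (mu, sigma2) == (mu_f, sigma2_f)
    return mu, sigma2


def srswr(x, n, rng):
    """SRSWR: n independent draws, each uniform over all N members."""
    return [x[rng.randrange(len(x))] for _ in range(n)]


voters = [1, 1, 1, 0, 0]  # hypothetical: 1 = Democrat, 0 = Republican
print(relative_frequencies(voters))  # {1: Fraction(3, 5), 0: Fraction(2, 5)}
print(population_moments(voters))    # (Fraction(3, 5), Fraction(6, 25))
print(srswr(voters, 4, random.Random(0)))
```

Each call to `rng.randrange` picks any of the $N$ members with probability $1/N$, regardless of earlier draws. This is what makes the recorded values i.i.d.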

Let $X_1, X_2, \dots, X_n$ be the random variables obtained in this way. Then the $X_i$'s are an i.i.d. sample from the distribution of a random variable $X$ such that $P(X = \xi_j) = p_j$ for $j = 1, 2, \dots, m$. Let

$$\hat\mu = \bar X = \frac{1}{n}\sum_{i=1}^{n} X_i \quad\text{and}\quad \hat\sigma^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar X)^2.$$

Then

$$E(\hat\mu) = \mu \quad\text{and}\quad E(\hat\sigma^2) = \sigma^2.$$

This will be proved in class. The standard CLT can be used to construct a C.I. (confidence interval) for $\mu$.

1.2 SRSWOR: simple random sampling without replacement

A sample of size $n$ is collected without replacement from the population. Thus the first member is chosen at random from the population. Once the first member has been chosen, the second member is chosen at random from the remaining $N - 1$ members, and so on, until there are $n$ members in the sample. A typical sample therefore looks like $(k_1, k_2, \dots, k_n)$. The number of all possible ordered samples is easily seen to be $N \times (N-1) \times \cdots \times (N-n+1)$, and each ordered sample has the same probability of being selected, namely $\big(\prod_{l=1}^{n}(N-l+1)\big)^{-1}$.

Let $X_1, X_2, \dots, X_n$ denote the observed values of the variable for the members in the sample. Thus $X_1 = x_{k_1}, \dots, X_n = x_{k_n}$. The $X_i$'s are random variables, and $X_1$ is easily seen to have marginal distribution given by

$$P(X_1 = \xi_j) = p_j, \qquad j = 1, 2, \dots, m.$$

In fact, each $X_i$ has the same distribution as $X_1$. To see this, note that the event $\{X_i = \xi_1\}$ happens if and only if the $i$'th member of the sample is one of the members of the population whose variable value equals $\xi_1$. Without loss of generality, assume that the first $n_1$ members have variable value equal to $\xi_1$. In other words, $x_1 = x_2 = \cdots = x_{n_1} = \xi_1$. Now, the number of ordered samples in which the $i$'th member of the sample is population member 1 is precisely $\prod_{l=1}^{n-1}(N - 1 - l + 1) = \prod_{l=1}^{n-1}(N - l)$, since the remaining $n - 1$ positions are filled from the other $N - 1$ members. Thus the chance that the $i$'th member is population member 1 is precisely

$$\frac{\prod_{l=1}^{n-1}(N-l)}{\prod_{l=1}^{n}(N-l+1)} = \frac{1}{N}.$$

Similarly, the chance that the $i$'th member is population member $s$, where $s$ is between 1 and $n_1$, is $1/N$. Thus, the chance that $X_i = \xi_1$ is simply $n_1 \times 1/N = n_1/N = p_1$. It can be argued similarly that $P(X_i = \xi_j) = p_j$ for all $j$.

We now consider the joint distribution of $(X_i, X_j)$ for $i \neq j$. We will show that the joint distribution of $(X_i, X_j)$ for any $i \neq j$ is the same as that of $(X_1, X_2)$. Now, it is not difficult to see that

$$P(X_1 = \xi_r, X_2 = \xi_s) = \frac{n_r}{N}\cdot\frac{n_s}{N-1} \qquad \text{for } r \neq s.$$
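The marginal and pairwise claims above can be checked by brute force on a tiny population. The Python sketch below enumerates every ordered sample $(k_1, \dots, k_n)$ with `itertools.permutations`, treats the samples as equally likely, and compares the resulting probabilities with $p_j$ and with $n_r n_s/(N(N-1))$, or $n_s(n_s-1)/(N(N-1))$ when $r = s$. The helper `srswor_distributions` and the population are hypothetical. Enumeration grows like $N!/(N-n)!$, so this is only a sanity check for very small $N$.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def srswor_distributions(x, n):
    """Exact marginal and pairwise joint laws of (X_1, ..., X_n) under SRSWOR.

    Enumerates every ordered sample (k_1, ..., k_n) of distinct member indices.
    All N(N-1)...(N-n+1) of them are equally likely, so each probability is a
    count divided by the number of ordered samples (positions are 0-based):
      marginal[i][v]        = P(X_{i+1} = v)
      joint[(i, j)][(a, b)] = P(X_{i+1} = a, X_{j+1} = b), i != j
    """
    samples = list(permutations(range(len(x)), n))
    total = len(samples)
    marginal = [Counter() for _ in range(n)]
    joint = {}
    for k in samples:
        values = [x[k_i] for k_i in k]  # X_i = x_{k_i}
        for i in range(n):
            marginal[i][values[i]] += 1
            for j in range(n):
                if i != j:
                    joint.setdefault((i, j), Counter())[(values[i], values[j])] += 1
    marginal = [{v: Fraction(c, total) for v, c in m.items()} for m in marginal]
    joint = {ij: {ab: Fraction(c, total) for ab, c in cnt.items()} for ij, cnt in joint.items()}
    return marginal, joint


x = [1, 1, 1, 0, 0, 2]  # hypothetical population, N = 6
N, n = len(x), 3
freq = Counter(x)       # n_j for each distinct value xi_j
marginal, joint = srswor_distributions(x, n)
for i in range(n):      # every position has the population distribution p_j = n_j / N
    assert marginal[i] == {v: Fraction(c, N) for v, c in freq.items()}
for dist in joint.values():  # every pair (i, j) has the same joint law as (X_1, X_2)
    for r in freq:
        for s in freq:
            same = 1 if r == s else 0
            assert dist.get((r, s), 0) == Fraction(freq[r] * (freq[s] - same), N * (N - 1))
```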

Similarly, when both draws take the same value,

$$P(X_1 = \xi_s, X_2 = \xi_s) = \frac{n_s}{N}\cdot\frac{n_s - 1}{N-1}.$$

Now, consider $P(X_i = \xi_r, X_j = \xi_s)$ for $i \neq j$, with $r$ and $s$ different. Without loss of generality, let $i < j$, and also let $x_1 = \cdots = x_{n_r} = \xi_r$ and $x_{n_r+1} = \cdots = x_{n_r+n_s} = \xi_s$. Thus the event $\{X_i = \xi_r, X_j = \xi_s\}$ is the disjoint union of the events $\{k_i = u, k_j = v\}$, where $1 \le u \le n_r$ and $n_r + 1 \le v \le n_r + n_s$. There are $n_r n_s$ such pairs. Now, the number of ordered samples that lead to $k_i = u$ and $k_j = v$ is precisely

$$(N-2) \times (N-3) \times \cdots \times (N - 2 - (n-2) + 1),$$

since we are fixing the $i$'th and the $j$'th members at the distinct integers $u$ and $v$ respectively, and then choosing the remaining $n - 2$ members out of the remaining $N - 2$ population members. Hence the required probability is

$$\frac{(N-2) \times \cdots \times (N-n+1)}{N(N-1) \times \cdots \times (N-n+1)} = \frac{1}{N(N-1)}.$$

Thus the probability of the event $\{X_i = \xi_r, X_j = \xi_s\}$ is just $n_r n_s/(N(N-1))$, which is the same as the probability of $\{X_1 = \xi_r, X_2 = \xi_s\}$. The case when $r = s$ can be handled similarly.

We are now in a position to study the properties of the sample-based estimates of $\mu$ and $\sigma^2$. We estimate $\mu$ by $\hat\mu = \bar X = (X_1 + X_2 + \cdots + X_n)/n$ and $\sigma^2$ by $\hat\sigma^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar X)^2$. In the sampling with replacement case, we have seen that $\hat\mu$ and $\hat\sigma^2$ are unbiased estimates of $\mu$ and $\sigma^2$ respectively. In the SRSWOR case, $X_1, X_2, \dots, X_n$ are identically distributed as $X_1$, and it is easy to check that $E(X_1) = \mu$ and $\mathrm{Var}(X_1) = \sigma^2$. Now,

$$E(\hat\mu) = E\Big(\frac{1}{n}\sum_{i=1}^{n} X_i\Big) = \frac{1}{n} \times n\mu = \mu.$$

Thus, as in SRSWR, $\bar X$ is an unbiased estimator of $\mu$. In SRSWR, $\mathrm{Var}(\bar X) = \sigma^2/n$. However, this is not the case with SRSWOR, because the $X_i$'s are not independent and the correlation must consequently be taken into account. To compute the variance of $\bar X$, we proceed as follows:

$$\mathrm{Var}(\bar X) = \frac{1}{n^2}\sum_{i,j}\mathrm{Cov}(X_i, X_j) = \frac{1}{n^2}\Big(\sum_{i=1}^{n}\mathrm{Var}(X_i) + \sum_{i \neq j}\mathrm{Cov}(X_i, X_j)\Big) = \frac{\sigma^2}{n} + \frac{n-1}{n}\,\mathrm{Cov}(X_1, X_2).$$

The last step uses the fact that all $n(n-1)$ pairs $(X_i, X_j)$ with $i \neq j$ have the same joint distribution as $(X_1, X_2)$. Now,

$$\mathrm{Cov}(X_1, X_2) = E(X_1 X_2) - \mu^2.$$
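The decomposition of $\mathrm{Var}(\bar X)$ above can be confirmed exactly on a small example. The sketch below makes the same brute-force assumptions as before: a hypothetical integer-valued population, and every ordered sample treated as equally likely. The helper `exact_moments` is illustrative. It computes $\mathrm{Var}(\bar X)$ and $\mathrm{Cov}(X_1, X_2)$ by averaging over all ordered samples and checks that $\mathrm{Var}(\bar X) = \sigma^2/n + ((n-1)/n)\,\mathrm{Cov}(X_1, X_2)$.

```python
from fractions import Fraction
from itertools import permutations


def exact_moments(x, n):
    """Return (sigma^2, Var(Xbar), Cov(X_1, X_2)) under SRSWOR.

    Every ordered sample of n distinct members is equally likely, so an
    expectation is a plain average over all of them. Assumes integer values
    in x (for exact Fraction arithmetic) and 2 <= n <= len(x).
    """
    N = len(x)
    samples = list(permutations(x, n))  # members with equal values stay distinct by position

    def expect(f):
        return sum(Fraction(f(s)) for s in samples) / len(samples)

    mu = Fraction(sum(x), N)
    sigma2 = sum((v - mu) ** 2 for v in x) / N
    var_xbar = expect(lambda s: (Fraction(sum(s), n) - mu) ** 2)  # uses E(Xbar) = mu
    cov12 = expect(lambda s: s[0] * s[1]) - mu ** 2               # E(X_1 X_2) - mu^2
    return sigma2, var_xbar, cov12


x = [4, 7, 7, 1, 0, 3]  # hypothetical population, N = 6
n = 3
sigma2, var_xbar, cov12 = exact_moments(x, n)
# Var(Xbar) = sigma^2/n + ((n-1)/n) Cov(X_1, X_2)
assert var_xbar == sigma2 / n + Fraction(n - 1, n) * cov12
```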

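Also, letting $p_{ij}$ denote the probability that $X_1 = \xi_i$ and $X_2 = \xi_j$, we have

$$\begin{aligned}
E(X_1 X_2) &= \sum_{i,j} \xi_i \xi_j\, p_{ij}
= \sum_{i=1}^{m} p_i \xi_i \sum_{j=1}^{m} \xi_j \frac{p_{ij}}{p_i} \\
&= \sum_{i=1}^{m} p_i \xi_i \Big(\sum_{j=1}^{m} \xi_j \frac{n_j}{N-1} - \frac{\xi_i}{N-1}\Big) \\
&= \sum_{i=1}^{m} p_i \xi_i \Big(\frac{N}{N-1}\sum_{j=1}^{m} p_j \xi_j - \frac{\xi_i}{N-1}\Big) \\
&= \frac{N}{N-1}\Big(\sum_{i=1}^{m} p_i \xi_i\Big)^2 - \frac{1}{N-1}\sum_{i=1}^{m} p_i \xi_i^2 \\
&= \frac{N}{N-1}\,\mu^2 - \frac{1}{N-1}\,(\sigma^2 + \mu^2) \\
&= \mu^2 - \frac{\sigma^2}{N-1}.
\end{aligned}$$

Here the second line uses $p_{ij}/p_i = P(X_2 = \xi_j \mid X_1 = \xi_i)$, which equals $n_j/(N-1)$ for $j \neq i$ and $(n_i - 1)/(N-1)$ for $j = i$. The fifth line uses $\sum_i p_i \xi_i = \mu$ and $\sum_i p_i \xi_i^2 = \sigma^2 + \mu^2$. Thus,

$$\mathrm{Cov}(X_1, X_2) = \Big(\mu^2 - \frac{\sigma^2}{N-1}\Big) - \mu^2 = -\frac{\sigma^2}{N-1}.$$

Now, plugging the above into the expression for $\mathrm{Var}(\bar X)$, we get

$$\mathrm{Var}(\bar X) = \frac{\sigma^2}{n}\Big(1 - \frac{n-1}{N-1}\Big).$$

Thus we get what is called a "finite population correction factor" for the variance, namely $1 - (n-1)/(N-1) = (N-n)/(N-1)$. Now, if we try to estimate $\sigma^2$ as before by $s^2$, where

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar X)^2,$$

the estimate is no longer unbiased (in contrast to what happens with SRSWR).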
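To see the finite population correction at work, the hypothetical helper `fpc_var_xbar` below implements the closed form for $\mathrm{Var}(\bar X)$. The helpers `enumerated_var_xbar` and `enumerated_e_x1x2` recompute $\mathrm{Var}(\bar X)$ and $E(X_1 X_2)$ by brute-force enumeration for a small, made-up integer population. This is a sketch for checking the algebra, not an estimation routine.

```python
from fractions import Fraction
from itertools import permutations


def fpc_var_xbar(sigma2, n, N):
    """Closed form: Var(Xbar) = (sigma^2/n) * (1 - (n-1)/(N-1)); requires N >= 2."""
    return sigma2 / n * (1 - Fraction(n - 1, N - 1))


def enumerated_var_xbar(x, n):
    """Var(Xbar) by averaging (Xbar - mu)^2 over all ordered SRSWOR samples."""
    mu = Fraction(sum(x), len(x))
    samples = list(permutations(x, n))  # members with equal values stay distinct by position
    return sum((Fraction(sum(s), n) - mu) ** 2 for s in samples) / len(samples)


def enumerated_e_x1x2(x):
    """E(X_1 X_2): only the first two draws matter, i.e. ordered pairs of distinct members."""
    pairs = list(permutations(x, 2))
    return Fraction(sum(a * b for a, b in pairs), len(pairs))


x = [4, 7, 7, 1, 0, 3]  # hypothetical integer-valued population, N = 6
N = len(x)
mu = Fraction(sum(x), N)
sigma2 = sum((v - mu) ** 2 for v in x) / N

assert enumerated_e_x1x2(x) == mu ** 2 - sigma2 / (N - 1)
for n in range(1, N + 1):
    assert enumerated_var_xbar(x, n) == fpc_var_xbar(sigma2, n, N)
    print(n, fpc_var_xbar(sigma2, n, N), sigma2 / n)  # SRSWOR vs SRSWR variance of Xbar
```

At $n = 1$ the correction factor is 1, and the formula reduces to $\sigma^2$. At $n = N$ the factor is 0, reflecting that a complete census of the population has no sampling variability.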

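Rather,

$$\begin{aligned}
E(s^2) &= \frac{1}{n-1}\,E\Big(\sum_{i=1}^{n}(X_i - \bar X)^2\Big) \\
&= \frac{1}{n-1}\,E\Big(\sum_{i=1}^{n} X_i^2 - n\bar X^2\Big) \\
&= \frac{1}{n-1}\Big(\sum_{i=1}^{n} E(X_i^2) - n\,E(\bar X^2)\Big) \\
&= \frac{1}{n-1}\Big(n\sigma^2 + n\mu^2 - n\big(\mathrm{Var}(\bar X) + \mu^2\big)\Big) \\
&= \frac{1}{n-1}\Big(n\sigma^2 - \Big(1 - \frac{n-1}{N-1}\Big)\sigma^2\Big) \\
&= \frac{\sigma^2}{n-1}\cdot\frac{nN - n - N + n}{N-1} \\
&= \frac{\sigma^2}{n-1}\cdot\frac{N(n-1)}{N-1} \\
&= \frac{N}{N-1}\,\sigma^2.
\end{aligned}$$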
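The bias result can be confirmed in the same way. In this sketch, which uses the hypothetical helpers `sample_variance` and `expected_s2` and a made-up population, $E(s^2)$ is averaged exactly over all $N^n$ equally likely ordered draws with replacement (`itertools.product`) and over all ordered samples without replacement (`itertools.permutations`). It also checks the consequence that $\frac{N-1}{N}s^2$ is unbiased for $\sigma^2$ under SRSWOR, which follows directly from the last display.

```python
from fractions import Fraction
from itertools import permutations, product


def sample_variance(s):
    """s^2 = (1/(n-1)) * sum_i (X_i - Xbar)^2, exact for integer data (n >= 2)."""
    n = len(s)
    xbar = Fraction(sum(s), n)
    return sum((v - xbar) ** 2 for v in s) / (n - 1)


def expected_s2(x, n, with_replacement):
    """E(s^2) as the plain average over all equally likely ordered samples."""
    if with_replacement:
        samples = product(x, repeat=n)  # N**n sequences: SRSWR
    else:
        samples = permutations(x, n)    # N(N-1)...(N-n+1) sequences: SRSWOR
    values = [sample_variance(s) for s in samples]
    return sum(values) / len(values)


x = [4, 7, 7, 1, 0, 3]  # hypothetical population, N = 6
N, n = len(x), 3
mu = Fraction(sum(x), N)
sigma2 = sum((v - mu) ** 2 for v in x) / N

assert expected_s2(x, n, with_replacement=True) == sigma2                        # SRSWR: unbiased
assert expected_s2(x, n, with_replacement=False) == Fraction(N, N - 1) * sigma2  # SRSWOR: N sigma^2/(N-1)
assert Fraction(N - 1, N) * expected_s2(x, n, with_replacement=False) == sigma2  # rescaled s^2 is unbiased
```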
