sampling

Simple Random Sampling
Moulinath Banerjee
University of Michigan
September 11, 2012

1 Simple Random Sampling

The goal is to estimate the mean and the variance of a variable of interest in a finite population by collecting a random sample from it. Suppose there are $N$ members of the population, numbered 1 through $N$, and let the values assumed by the variable of interest be $x_1, x_2, \ldots, x_N$. Not all the $x_i$'s are necessarily distinct. For example, if we are interested in estimating the proportion of Democrats in a population of voters, we might assign $x_i = 1$ if the $i$'th voter is a Democrat and $x_i = 0$ if they are a Republican; in this case the population proportion of Democrats is the mean of the $x_i$'s. We denote the distinct values of the $x_i$'s by $\xi_1, \xi_2, \ldots, \xi_m$ and let $n_i$ denote the frequency of $\xi_i$ in the population. The population mean $\mu$ is given by

\[ \mu = \frac{1}{N} \sum_{i=1}^{N} x_i = \frac{1}{N} \sum_{i=1}^{m} n_i \xi_i , \]

and the population variance $\sigma^2$ is

\[ \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2 = \frac{1}{N} \sum_{i=1}^{N} x_i^2 - \mu^2 = \frac{1}{N} \sum_{i=1}^{m} n_i \xi_i^2 - \mu^2 . \]

We denote the relative frequencies of the $\xi_i$'s in the population by $\{p_1, p_2, \ldots, p_m\}$, where $p_i = n_i / N$.

1.1 SRSWR: simple random sampling with replacement

A sample of size $n$ is collected with replacement from the population. Thus, an individual is drawn (randomly), their $x$ value is recorded, and the individual is then returned to the population. Now, a second individual is drawn, and the process continues $n$ times. Let $X_1, X_2, \ldots, X_n$ be the random variables obtained thus. Then, the $X_i$'s are an i.i.d. sample from the distribution of a random variable $X$ such that $P(X = \xi_j) = p_j$ for $j = 1, 2, \ldots, m$. Let

\[ \hat{\mu} = \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i , \quad \text{and} \quad \hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 . \]

Then,

\[ E(\hat{\mu}) = \mu \quad \text{and} \quad E(\hat{\sigma}^2) = \sigma^2 . \]

This will be proved in class. The standard CLT can be used to construct a C.I. (confidence interval) for $\mu$.

1.2 SRSWOR: simple random sampling without replacement

A sample of size $n$ is collected without replacement from the population. Thus the first member is chosen at random from the population, and once the first member has been chosen, the second member is chosen at random from the remaining $N - 1$ members and so on, till there are $n$ members in the sample. A typical sample therefore looks like $(k_1, k_2, \ldots, k_n)$, the number of all possible ordered samples is easily seen to be $N \times (N-1) \times \cdots \times (N-n+1)$, and each ordered sample has equal probability, namely $1 / \prod_{l=1}^{n} (N - l + 1)$, of being selected.

Let $X_1, X_2, \ldots, X_n$ denote the observed values of the variable for the members in the sample; thus $X_1 = x_{k_1}, \ldots, X_n = x_{k_n}$. The $X_i$'s are random variables and $X_1$ is easily seen to have marginal distribution given by

\[ P(X_1 = \xi_j) = p_j , \quad j = 1, 2, \ldots, m . \]

In fact each $X_i$ has the same distribution as $X_1$. To see this, note that the event $\{X_i = \xi_1\}$ happens if and only if the $i$'th member of the sample is one of the members of the population whose variable value equals $\xi_1$. Without loss of generality assume that the first $n_1$ members have variable value equal to $\xi_1$; in other words $x_1 = x_2 = \cdots = x_{n_1} = \xi_1$. Now, the number of ordered samples in which the $i$'th member is 1 is precisely $\prod_{l=1}^{n-1} ((N-1) - l + 1) = \prod_{l=1}^{n-1} (N - l)$. Thus the chance that the $i$'th member is 1 is precisely

\[ \frac{\prod_{l=1}^{n-1} (N - l)}{\prod_{l=1}^{n} (N - l + 1)} = \frac{1}{N} . \]

Similarly, the chance that the $i$'th member is $s$, where $s$ is between 1 and $n_1$, is $1/N$. Thus, the chance of $\{X_i = \xi_1\}$ is simply $n_1 \times 1/N = n_1/N = p_1$. It can be argued similarly that $P(X_i = \xi_j) = p_j$ for all $j$.

We now consider the joint distribution of $(X_i, X_j)$ for $i \neq j$. We will show that the joint distribution of $(X_i, X_j)$ for any $i \neq j$ is the same as that of $(X_1, X_2)$. Now, it is not difficult to see that

\[ P(X_1 = \xi_r, X_2 = \xi_s) = \frac{n_r}{N} \cdot \frac{n_s}{N-1} \quad \text{for } r \neq s , \]

and

\[ P(X_1 = \xi_s, X_2 = \xi_s) = \frac{n_s}{N} \cdot \frac{n_s - 1}{N-1} . \]

Now, consider $P(X_i = \xi_r, X_j = \xi_s)$ for $i \neq j$, and with $r$ and $s$ different. Without loss of generality let $i < j$ and also let $x_1 = x_2 = \cdots = x_{n_r} = \xi_r$ and let $x_{n_r+1} = \cdots = x_{n_r+n_s} = \xi_s$. Thus the event $\{X_i = \xi_r, X_j = \xi_s\}$ is the disjoint union of the events $\{k_i = u, k_j = v\}$ where $1 \le u \le n_r$ and $n_r + 1 \le v \le n_r + n_s$, and there are $n_r n_s$ such pairs. Now, the number of ordered samples that lead to $k_i = u$ and $k_j = v$ is precisely $(N-2) \times (N-3) \times \cdots \times (N - 2 - (n-2) + 1)$ (since we are fixing the $i$'th and the $j$'th members at $u$ and $v$ respectively and then choosing $n - 2$ distinct integers out of the remaining $N - 2$ population members). Hence the required probability is

\[ \frac{(N-2) \times (N-3) \times \cdots \times (N-n+1)}{N \times (N-1) \times \cdots \times (N-n+1)} = \frac{1}{N(N-1)} . \]

Thus the probability of the event $\{X_i = \xi_r, X_j = \xi_s\}$ is just $n_r n_s \times 1/(N(N-1)) = n_r n_s / (N(N-1))$, which is the same as the probability of $\{X_1 = \xi_r, X_2 = \xi_s\}$. The case when $r = s$ can be similarly handled.

We are now in a position to study the properties of the sample-based estimates of $\mu$ and $\sigma^2$. We estimate $\mu$ by $\hat{\mu} = (X_1 + X_2 + \cdots + X_n)/n = \bar{X}$ and $\sigma^2$ by the sample variance $\hat{\sigma}^2 = (1/(n-1)) \sum_{i=1}^{n} (X_i - \bar{X})^2$. In the sampling with replacement case, we have seen that $\hat{\mu}$ and $\hat{\sigma}^2$ are unbiased estimates of $\mu$ and $\sigma^2$ respectively. In the SRSWOR case $X_1, X_2, \ldots, X_n$ are identically distributed as $X_1$ and it is easy to check that $E(X_1) = \mu$ and $\mathrm{Var}(X_1) = \sigma^2$. Now,

\[ E(\hat{\mu}) = \frac{1}{n} \sum_{i=1}^{n} E(X_i) = \frac{1}{n} \times n\mu = \mu . \]

Thus, as in SRSWR, $\bar{X}$ is an unbiased estimator of $\mu$. In SRSWR, $\mathrm{Var}(\bar{X}) = \sigma^2 / n$; however, this is not the case with SRSWOR because the $X_i$'s are not independent, and the correlation between them consequently needs to be taken into account while computing the variance of $\bar{X}$. To compute the variance of $\bar{X}$ we proceed as follows.

\begin{align*}
\mathrm{Var}(\bar{X}) &= \frac{1}{n^2} \sum_{i,j} \mathrm{Cov}(X_i, X_j) \\
&= \frac{1}{n^2} \left( \sum_{i=1}^{n} \mathrm{Var}(X_i) + \sum_{i \neq j} \mathrm{Cov}(X_i, X_j) \right) \\
&= \frac{1}{n} \sigma^2 + \frac{n-1}{n} \mathrm{Cov}(X_1, X_2) .
\end{align*}

Now,

\[ \mathrm{Cov}(X_1, X_2) = E(X_1 X_2) - \mu^2 . \]
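The marginal and joint distributions claimed above for SRSWOR can be checked exactly on a small example. The following is a minimal sketch, not part of the original notes: the toy population, the helper name `srswor_distributions`, and the choice of positions are all assumptions made for illustration. It enumerates every ordered sample without replacement and uses exact rational arithmetic.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def srswor_distributions(population, n, i, j):
    """Exact marginal of X_i and joint of (X_i, X_j) under SRSWOR.

    Positions i and j are 0-based. Every ordered sample of n distinct
    population indices is enumerated and given equal probability.
    """
    N = len(population)
    samples = list(permutations(range(N), n))  # all ordered samples
    weight = Fraction(1, len(samples))         # each is equally likely
    marginal, joint = Counter(), Counter()
    for ks in samples:
        xi, xj = population[ks[i]], population[ks[j]]
        marginal[xi] += weight
        joint[(xi, xj)] += weight
    return marginal, joint


# Hypothetical toy population (an assumption for illustration): N = 6.
population = [0, 0, 1, 1, 1, 5]
N, n = len(population), 3
freq = Counter(population)  # n_r for each distinct value xi_r

# Look at the first and third sample positions rather than (X_1, X_2).
marginal, joint = srswor_distributions(population, n, i=0, j=2)

for r in freq:
    # Marginal: P(X_i = xi_r) = n_r / N
    assert marginal[r] == Fraction(freq[r], N)
    for s in freq:
        if r != s:
            # Joint, r != s: n_r * n_s / (N (N - 1))
            expected = Fraction(freq[r] * freq[s], N * (N - 1))
        else:
            # Joint, r == s: n_s * (n_s - 1) / (N (N - 1))
            expected = Fraction(freq[s] * (freq[s] - 1), N * (N - 1))
        assert joint[(r, s)] == expected

print("marginal and joint distributions match the formulas")
```

Enumeration is only practical for very small $N$ and $n$, but because it uses `Fraction` the comparison is an exact identity rather than a Monte Carlo approximation.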

Also, letting $p_{ij}$ denote the probability that $X_1 = \xi_i$ and $X_2 = \xi_j$, we have

\begin{align*}
E(X_1 X_2) &= \sum_{i,j} \xi_i \xi_j p_{ij} \\
&= \sum_{i=1}^{m} \xi_i p_i \sum_{j=1}^{m} \xi_j \frac{p_{ij}}{p_i} \\
&= \sum_{i=1}^{m} \xi_i p_i \left( \sum_{j=1}^{m} \xi_j \frac{n_j}{N-1} - \frac{\xi_i}{N-1} \right) \\
&= \sum_{i=1}^{m} \xi_i p_i \left( \frac{N}{N-1} \sum_{j=1}^{m} \xi_j p_j - \frac{\xi_i}{N-1} \right) \\
&= \frac{N}{N-1} \left( \sum_{i=1}^{m} p_i \xi_i \right)^2 - \frac{1}{N-1} \sum_{i=1}^{m} p_i \xi_i^2 \\
&= \frac{N}{N-1} \mu^2 - \frac{1}{N-1} (\sigma^2 + \mu^2) \\
&= \left( -\frac{1}{N-1} \sigma^2 \right) + \mu^2 .
\end{align*}

Thus,

\[ \mathrm{Cov}(X_1, X_2) = \left( -\frac{1}{N-1} \sigma^2 + \mu^2 \right) - \mu^2 = -\frac{1}{N-1} \sigma^2 . \]

Now, plugging the above into the expression for $\mathrm{Var}(\bar{X})$, we get

\[ \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} \left( 1 - \frac{n-1}{N-1} \right) . \]

Thus, we get what is called a "finite population correction factor" for the variance.

Now, if we try to estimate $\sigma^2$ as before, by $s^2$, where

\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 , \]

the estimate is no longer unbiased (in contrast to what happens with SRSWR). Rather,

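\begin{align*}
E(s^2) &= \frac{1}{n-1} E\left( \sum_{i=1}^{n} (X_i - \bar{X})^2 \right) \\
&= \frac{1}{n-1} E\left( \sum_{i=1}^{n} X_i^2 - n \bar{X}^2 \right) \\
&= \frac{1}{n-1} \left( \sum_{i=1}^{n} E(X_i^2) - n E(\bar{X}^2) \right) \\
&= \frac{1}{n-1} \left( n\sigma^2 + n\mu^2 - n \left( \mathrm{Var}(\bar{X}) + \mu^2 \right) \right) \\
&= \frac{1}{n-1} \left( n\sigma^2 - n \, \frac{\sigma^2}{n} \left( 1 - \frac{n-1}{N-1} \right) \right) \\
&= \frac{1}{n-1} \, \sigma^2 \, \frac{nN - n - N + n}{N-1} \\
&= \frac{1}{n-1} \, \sigma^2 \, \frac{N(n-1)}{N-1} \\
&= \frac{N}{N-1} \sigma^2 .
\end{align*}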
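The three results for SRSWOR, namely $E(\bar{X}) = \mu$, $\mathrm{Var}(\bar{X}) = (\sigma^2/n)(1 - (n-1)/(N-1))$ and $E(s^2) = N\sigma^2/(N-1)$, can be confirmed exactly on a small population, alongside the corresponding SRSWR results. This is a minimal sketch under stated assumptions: the toy population and the helper name `exact_moments` are hypothetical and do not come from the notes, and full enumeration is feasible only for tiny $N$ and $n$.

```python
from fractions import Fraction
from itertools import permutations, product


def exact_moments(population, n, replace):
    """Return (E[Xbar], Var[Xbar], E[s^2]) exactly.

    All equally likely ordered samples of size n are enumerated:
    with replacement via product, without replacement via permutations.
    """
    N = len(population)
    if replace:
        index_samples = product(range(N), repeat=n)
    else:
        index_samples = permutations(range(N), n)

    count = 0
    sum_mean = sum_mean_sq = sum_s2 = Fraction(0)
    for ks in index_samples:
        xs = [Fraction(population[k]) for k in ks]
        xbar = sum(xs) / n
        s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
        sum_mean += xbar
        sum_mean_sq += xbar ** 2
        sum_s2 += s2
        count += 1

    e_mean = sum_mean / count
    var_mean = sum_mean_sq / count - e_mean ** 2
    return e_mean, var_mean, sum_s2 / count


# Hypothetical toy population (an assumption for illustration): N = 6.
population = [0, 0, 1, 1, 1, 5]
N, n = len(population), 3
mu = Fraction(sum(population), N)
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / N  # divisor N

# SRSWOR: finite population correction and biased s^2.
e_mean, var_mean, e_s2 = exact_moments(population, n, replace=False)
assert e_mean == mu
assert var_mean == sigma2 / n * (1 - Fraction(n - 1, N - 1))
assert e_s2 == Fraction(N, N - 1) * sigma2

# SRSWR: no correction factor, and s^2 is unbiased for sigma^2.
e_mean_wr, var_mean_wr, e_s2_wr = exact_moments(population, n, replace=True)
assert e_mean_wr == mu
assert var_mean_wr == sigma2 / n
assert e_s2_wr == sigma2

print("SRSWOR and SRSWR moment identities hold exactly")
```

Since $E(s^2) = N\sigma^2/(N-1)$ under SRSWOR, rescaling to $((N-1)/N)\, s^2$ gives an unbiased estimate of $\sigma^2$ as defined here with divisor $N$.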
