Fundamentals of the mathematical theory of sampling

In the practice of statistical observation one distinguishes two kinds of observation: complete (continuous) observation, when all objects of a set are studied, and sampling (selective) observation, when only a part of the objects is studied. An example of complete observation is a population census covering the entire population of a country. An example of sampling observation is a sociological survey covering a part of the population of a country, a region, etc.

The whole set of objects (observations) to be studied is called the parent population. The part of the objects selected from the parent population for direct study is called the sample population, or the sample. The numbers of objects (observations) in a parent population and in a sample are called their volumes. A parent population can have either a finite or an infinite volume.

The essence of the sampling method is to judge the properties of a parent population as a whole from some part of it (a sample).

Advantages of sampling:

· It substantially saves resources (material, labor, time);

· It is the only possible method when the parent population is infinite or when the investigation destroys the observed objects (for example, testing the durability of electric bulbs, the limiting operating modes of devices, etc.);

· For the same expenditure of resources, it allows more in-depth research by expanding the research program;

· It reduces registration errors, i.e. discrepancies between the true and the registered values of an attribute.

The main disadvantage of sampling is the research error known as representation error. However, the inevitable errors that arise in sample research because only a part of the objects is studied can be estimated in advance and, with a correctly organized sample, reduced to practically insignificant values.

To make it possible to judge a parent population from sample data, the sample should be selected randomly.

A sample is representative if it reproduces the parent population sufficiently well.

One distinguishes the following types of samples:

· Properly random sample, formed by a random choice of elements without partitioning the population into parts or groups;

· Mechanical sample, in which elements of the parent population are selected at a fixed interval. For example, if the sample volume should be 10 % of the population (a 10-percent sample), every 10th element is selected;

· Typical (stratified) sample, in which elements are selected randomly from the typical groups into which the parent population is partitioned by some attribute;

· Serial (nested) sample, in which whole groups (series) of the population, rather than individual elements, are selected randomly, and the selected series are subjected to complete (continuous) observation.
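The four types of samples listed above can be sketched in Python. This is only a minimal illustration: the population of 100 numbered units, the two typical groups, the ten series and all function names are hypothetical and are not taken from the lecture.

```python
import random

def simple_random_sample(population, n, rng):
    """Properly random sample: n elements chosen at random, no grouping."""
    return rng.sample(population, n)

def mechanical_sample(population, step, start=0):
    """Mechanical sample: every `step`-th element, beginning at index `start`."""
    return population[start::step]

def stratified_sample(groups, fraction, rng):
    """Typical (stratified) sample: a random draw from every typical group."""
    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

def serial_sample(series, n_series, rng):
    """Serial (nested) sample: whole series are drawn and observed completely."""
    chosen = rng.sample(series, n_series)
    return [element for s in chosen for element in s]

rng = random.Random(0)  # fixed seed (arbitrary) so the sketch is reproducible
population = list(range(1, 101))                             # hypothetical units 1..100
groups = {"A": population[:60], "B": population[60:]}        # hypothetical typical groups
series = [population[i:i + 10] for i in range(0, 100, 10)]   # ten series of ten units

print(simple_random_sample(population, 10, rng))
print(mechanical_sample(population, 10))      # every 10th unit: a 10-percent sample
print(stratified_sample(groups, 0.1, rng))    # 6 units from group A and 4 from group B
print(serial_sample(series, 2, rng))          # two whole series, 20 units in total
```

In this sketch the stratified draw is proportional to the group sizes; other allocation rules are possible and are not discussed here.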

One uses two methods of forming a sample:

· Selection with replacement (the returned-ball scheme), when each randomly selected element is returned to the parent population and can be selected again;

· Selection without replacement (the non-returned-ball scheme), when a selected element is not returned to the parent population.
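The difference between the two schemes can be seen in a short Python sketch. The urn of five labelled elements is a hypothetical example; `random.choices` draws with replacement and `random.sample` draws without replacement.

```python
import random

rng = random.Random(1)            # fixed seed (arbitrary) for reproducibility
urn = ["a", "b", "c", "d", "e"]   # hypothetical parent population of 5 elements

# Selection with replacement: each drawn element returns to the urn, so
# repeats are possible and the sample may even exceed the population volume.
with_replacement = rng.choices(urn, k=8)

# Selection without replacement: every element appears at most once,
# so the sample volume cannot exceed the population volume.
without_replacement = rng.sample(urn, k=4)

print(with_replacement)
print(without_replacement)
```

Drawing 8 elements from 5 necessarily produces repeats under the first scheme, while asking `rng.sample` for more than 5 elements would raise a `ValueError`.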

The mathematical theory of sampling is based on the properly random sample. Denote:

$x_i$ – the values of an attribute (random variable $X$);

$N$ and $n$ – the volumes of the parent population and of the sample;

$N_i$ and $n_i$ – the numbers of elements of the parent population and of the sample with the value $x_i$ of the attribute;

$M$ and $m$ – the numbers of elements of the parent population and of the sample that have the given attribute.

The arithmetic means of the distribution of an attribute in the parent population and in the sample are called the parent mean and the sample mean respectively, and the dispersions (variances) of these distributions are called the parent dispersion and the sample dispersion. The ratios of the numbers of elements of the parent population and of the sample that have some attribute A to their volumes are called the parent part and the sample part (proportion) respectively.

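Name of characteristic | Parent population | Sample
Mean | $\bar{x}_0 = \frac{1}{N}\sum_i x_i N_i$ | $\bar{x} = \frac{1}{n}\sum_i x_i n_i$
Dispersion | $\sigma^2 = \frac{1}{N}\sum_i (x_i - \bar{x}_0)^2 N_i$ | $s^2 = \frac{1}{n}\sum_i (x_i - \bar{x})^2 n_i$
Part | $p = M/N$ | $w = m/n$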
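As a small illustration of these characteristics, the following Python sketch computes the mean, the dispersion and the part for a frequency table. The function names and the sample data are hypothetical. Exact fractions are used so that the results are not affected by rounding.

```python
from fractions import Fraction

def mean(values, counts):
    """Arithmetic mean of a distribution given by values and their frequencies."""
    total = sum(counts)
    return Fraction(sum(x * c for x, c in zip(values, counts)), total)

def dispersion(values, counts):
    """Dispersion: frequency-weighted mean squared deviation from the mean."""
    total = sum(counts)
    m = mean(values, counts)
    return sum((x - m) ** 2 * c for x, c in zip(values, counts)) / total

def part(count_with_attribute, volume):
    """Part: share of elements that have the attribute."""
    return Fraction(count_with_attribute, volume)

# Hypothetical sample of volume n = 10: values x_i with frequencies n_i.
x = [1, 2, 3]
n_i = [2, 5, 3]

print(mean(x, n_i))        # 21/10
print(dispersion(x, n_i))  # 49/100
print(part(4, 10))         # 2/5, if 4 of the 10 elements have the attribute
```

The same functions apply to a parent population if the frequencies $N_i$ and the volume $N$ are used instead of $n_i$ and $n$.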

The main problem of sampling is the estimation of the parameters (characteristics) of a parent population from sample data.

The theoretical basis for the applicability of sampling is the law of large numbers, according to which, as the sample volume increases without bound, it is practically certain that the random sample characteristics approach arbitrarily closely (converge in probability to) the corresponding parameters of the parent population.
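The behaviour described by the law of large numbers can be observed in a simple simulation. The sketch below assumes a hypothetical parent population (rolls of a fair die, whose parent mean is 3.5) and prints the sample mean for growing sample volumes. The function name and the seed are illustrative choices.

```python
import random

def die_sample_mean(n, rng):
    """Sample mean of n independent rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

PARENT_MEAN = 3.5  # mean of the values 1..6, each taken with probability 1/6

rng = random.Random(42)  # fixed seed (arbitrary) for reproducibility
for n in (10, 1_000, 100_000):
    x_bar = die_sample_mean(n, rng)
    print(f"n = {n:>7}: sample mean = {x_bar:.4f}, "
          f"deviation = {abs(x_bar - PARENT_MEAN):.4f}")
```

A single run only suggests the trend: the deviation is random and need not decrease at every step, but large deviations become less and less probable as n grows.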

Let us formulate the problem of parameter estimation in general form. Let the distribution of an attribute $X$ in the parent population be given by the probability function $\varphi(x_i, \theta) = P(X = x_i)$ (for a discrete random variable $X$) or by the probability density $\varphi(x, \theta)$ (for a continuous random variable $X$), which contains an unknown parameter $\theta$. For example, this is the parameter $\lambda$ of the Poisson distribution or the parameters $a$ and $\sigma^2$ of the normal distribution.

One tries to judge the parameter $\theta$ from a sample consisting of the values (variants) $x_1, x_2, \dots, x_n$. These values can be considered as particular values (realizations) of $n$ independent random variables $X_1, X_2, \dots, X_n$, each of which has the same distribution law as the random variable $X$.

An estimate $\tilde{\theta}_n$ of a parameter $\theta$ is any function of the results of observations of the random variable $X$ (a statistic) by means of which one judges the value of the parameter $\theta$:

$$\tilde{\theta}_n = \tilde{\theta}_n(X_1, X_2, \dots, X_n).$$

An estimate $\tilde{\theta}_n$ of a parameter $\theta$ is unbiased if its mathematical expectation is equal to the estimated parameter, i.e. $M(\tilde{\theta}_n) = \theta$. Otherwise it is biased.

The requirement of unbiasedness guarantees the absence of systematic errors in estimation.

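If for a finite sample volume $n$ the mathematical expectation of an estimate $\tilde{\theta}_n$ of a parameter $\theta$ differs from the parameter, $M(\tilde{\theta}_n) \ne \theta$, i.e. the bias of the estimate is $b(\tilde{\theta}_n) = M(\tilde{\theta}_n) - \theta \ne 0$, but $\lim_{n \to \infty} b(\tilde{\theta}_n) = 0$, then such an estimate is said to be asymptotically unbiased.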
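A classical example of a biased but asymptotically unbiased estimate is the sample dispersion with divisor n: for selection with replacement its expectation equals (n − 1)/n times the parent dispersion. The Python sketch below checks this by exact enumeration of all samples from a small population. The population {1, 2, 3} and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def dispersion(values):
    """Dispersion with divisor len(values) (not the corrected n - 1 version)."""
    k = len(values)
    m = Fraction(sum(values), k)
    return sum((v - m) ** 2 for v in values) / k

population = [1, 2, 3]                      # hypothetical parent population
parent_dispersion = dispersion(population)  # equals 2/3

def expected_sample_dispersion(n):
    """Average the sample dispersion over all equally likely samples of
    volume n selected with replacement from the population."""
    samples = list(product(population, repeat=n))
    return sum(dispersion(s) for s in samples) / len(samples)

for n in (2, 3, 4):
    expected = expected_sample_dispersion(n)
    bias = expected - parent_dispersion
    print(f"n = {n}: expectation = {expected}, bias = {bias}")
```

The computed biases are −1/3, −2/9 and −1/6: the bias is nonzero for every finite n but shrinks as n grows, which is the behaviour of an asymptotically unbiased estimate.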

An estimate $\tilde{\theta}_n$ of a parameter $\theta$ is consistent if it satisfies the law of large numbers, i.e. it converges in probability to the estimated parameter: for any $\varepsilon > 0$,

$$\lim_{n \to \infty} P\left(|\tilde{\theta}_n - \theta| < \varepsilon\right) = 1.$$

When consistent estimates are used, increasing the sample volume is justified, since significant estimation errors then become improbable. Therefore only consistent estimates are of practical value.
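A standard illustration of consistency, given here as a sketch under stated assumptions, is the sample mean $\bar{X}$ of $n$ independent observations with expectation $a$ and finite dispersion $\sigma^2$ (these symbols are assumed for the example). Chebyshev's inequality bounds the probability of a significant error, and the bound tends to zero as the sample volume grows:

```latex
M(\bar{X}) = a, \qquad D(\bar{X}) = \frac{\sigma^2}{n},
\qquad
P\bigl(|\bar{X} - a| \ge \varepsilon\bigr) \le \frac{\sigma^2}{n\,\varepsilon^2}
\xrightarrow[n \to \infty]{} 0 \quad \text{for any } \varepsilon > 0 .
```

Hence, under these assumptions, the sample mean converges in probability to $a$.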

An unbiased estimate $\tilde{\theta}_n$ of a parameter $\theta$ is efficient if it has the least dispersion among all possible unbiased estimates of the parameter $\theta$ calculated from samples of the same volume $n$.

The efficiency of an estimate is determined by the ratio $e = \sigma^2_{\mathrm{eff}} / \sigma^2_{\tilde{\theta}_n}$, where $\sigma^2_{\mathrm{eff}}$ and $\sigma^2_{\tilde{\theta}_n}$ are the dispersions of the efficient and the given estimates respectively. The closer $e$ is to 1, the more efficient the estimate. If $e \to 1$ as $n \to \infty$, such an estimate is said to be asymptotically efficient.
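As a worked illustration (an example chosen here, not taken from the lecture), consider a normal parent population with parameters $a$ and $\sigma^2$, for which the sample mean $\bar{X}$ is known to be an efficient estimate of $a$. Compare it with the hypothetical estimate $\bar{X}'$ that averages only the first $n/2$ observations ($n$ even). Both are unbiased, but:

```latex
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad D(\bar{X}) = \frac{\sigma^2}{n};
\qquad
\bar{X}' = \frac{2}{n}\sum_{i=1}^{n/2} X_i, \qquad D(\bar{X}') = \frac{2\sigma^2}{n};
\qquad
e = \frac{D(\bar{X})}{D(\bar{X}')} = \frac{\sigma^2/n}{2\sigma^2/n} = \frac{1}{2}.
```

The efficiency of $\bar{X}'$ equals 1/2 for every $n$, so this estimate is neither efficient nor asymptotically efficient, although it is unbiased and consistent.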

It is desirable to use, as statistical estimates of the parameters of a parent population, estimates that simultaneously satisfy the requirements of unbiasedness, consistency and efficiency. However, this cannot always be achieved. It may turn out that, for simplicity of calculation, it is expedient to use slightly biased estimates, or estimates with a greater dispersion than the efficient ones.

Glossary

sampling – выборочный метод; sample – выборка

census – перепись; parent population – генеральная совокупность

essence – сущность; judgment – суждение; inevitable – неизбежный

representative – представительный; estimate – оценка

unbiased – несмещенная; unbiasedness – несмещенность

consistent – состоятельная

Exercises for Seminar 13

13.1. Find the group means of a population consisting of two groups:

the first group:   x_i  0.1  0.4  0.6
                   n_i   3    2    5

the second group:  x_i  0.1  0.3  0.4
                   n_i  10    4    6

13.2. Distribution of a statistical population is given as follows:

x_i   4   7  10  15
n_i  10  15  20   5

Find the dispersion of the population using the formula

13.3. Find the intragroup, intergroup and general dispersions of a population consisting of three groups:

the first group:   x_i   1   2   8
                   n_i  30  15   5

the second group:  x_i   1   6
                   n_i  10  15

the third group:   x_i   3   8
                   n_i  20   5

13.4. Find the average value of a random variable given by the following distribution:

X 13.8 13.9 14.1 14.2
N

13.5. Compute D(X) and σ(X) for a random variable X given by the following distribution:

X 13.8 13.9 14.1 14.2
N

Exercises for Homework 13

13.6. Distribution of a statistical population is given as follows:

x_i  1   4  5
n_i  6  11  3

Prove that the sum of the products of the deviations from the mean and the corresponding frequencies is equal to zero.

13.7. Find the intragroup, intergroup and general dispersions of a population consisting of two groups:

the first group:   x_i  2  7
                   n_i  6  4

the second group:  x_i  2  7
                   n_i  2  8

13.8. Find the sample and corrected dispersions of the variation series composed of the sample data:

variant:    1  2  5  8  9
frequency:  3  4  6  4  3

13.9. Find the average value, dispersion and mean square deviation of a random variable given by the following distribution:

X 9.8 9.9 10.1 10.2
N

13.10. Determine and D(Y) for the statistical distribution

Y
W 0.10 0.20 0.15 0.25 0.05 0.12 0.08 0.05

L E C T U R E 14
