Introduction:
Data means information or a set of given facts. The data is usually collected through census or surveys. Statistics is defined as the collection, presentation, analysis
and interpretation of numerical (statistical) data.
Variable (or variate)
A variable (or variate) which is not capable of assuming all values in a given range is called a discrete variable.
A variable which is capable of assuming all the numerical values in a given range is called a continuous variable.
Frequency Distribution:
Let the data regarding the weights (in kgs) of 20 students of a class be given as
This is called the raw data. This is also called an individual series. We note that some of the weights (values of the quantitative variable) are repeated. If there are 3
students having weight 50 kg, then we say the frequency of 50 is 3. Therefore, the number of times the value of the item is repeated is called the frequency of that
value. The table containing the weights and the corresponding frequencies is given as
Tally bars are used to count the number of times each value of the variable has occurred. In order of magnitude, the frequency distribution is written as follows:
We denote the total number of students, that is, the total frequency, by n, i.e. $n = \sum f_i$. Also, we denote the different values of the variable x by xi and the corresponding
frequencies by fi.
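The counting step described above can be sketched in Python. The 20 weights below are made-up illustration values, not the data of the original example, and the names `weights`, `freq` and `n` are chosen only for this sketch.

```python
from collections import Counter

# Hypothetical raw data: weights (kg) of 20 students (not from the text).
weights = [50, 45, 52, 50, 48, 45, 50, 52, 48, 55,
           45, 48, 52, 55, 48, 52, 45, 48, 55, 52]

# Frequency fi of each distinct value xi, written in order of magnitude.
freq = dict(sorted(Counter(weights).items()))

# Total frequency n = sum of all fi.
n = sum(freq.values())

for x, f in freq.items():
    print(x, "|" * f, f)  # value, tally bars, frequency
```

Here the value 50 occurs three times, so its frequency is 3.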
The classes are written in two forms.
(i) Inclusive form: In this case, the lower limit of a class is not equal to the upper limit of the previous class. For example: 45 - 49, 50 - 54, 55 - 59, 60 - 64 are in
inclusive form.
However, in the class 45 - 49, all items with values greater than or equal to 44.5 but less than 49.5 are to be taken. Thus the actual limits are 44.5 - 49.5, 49.5 - 54.5,
54.5 - 59.5, 59.5 - 64.5.
(ii) Exclusive form: In this case, the lower limit of a class is equal to the upper limit of the previous class. For example, we may have classes of the form 45 - 50,
50 - 55, 55 - 60, 60 - 65, etc. The value 50 is counted in the class "50 and under 55" and not in "45 and under 50".
In both forms, the length of a class (upper limit - lower limit, measured between the actual limits) is the same.
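As a small illustration of how inclusive classes turn into actual limits, the sketch below widens each class by half the gap between consecutive classes. The helper name `actual_limits` is hypothetical, and the sketch assumes the gap is the same between every pair of neighbouring classes.

```python
def actual_limits(classes):
    """Convert inclusive classes [(lower, upper), ...] to actual limits."""
    # Half the gap between one class's upper limit and the next lower limit.
    gap = (classes[1][0] - classes[0][1]) / 2
    return [(lower - gap, upper + gap) for lower, upper in classes]


inclusive = [(45, 49), (50, 54), (55, 59), (60, 64)]
boundaries = actual_limits(inclusive)
print(boundaries)
```

With these classes every actual class has length 5, for example 49.5 - 44.5.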
RELATIVE AND CUMULATIVE FREQUENCY
Relative Frequency:
The relative frequency gives useful information about the data, particularly when the class frequencies are large and total frequency is very large.
Relative frequency = $\frac{\text{class frequency}}{\text{total frequency}} = \frac{f_i}{n}$.
Cumulative Frequency of a value (or class of values) is obtained by adding all the frequencies of all values (or classes of values) less than or equal to that under
consideration. Cumulative frequency is an important concept and is useful in determining the measures of location.
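Both quantities can be computed in a few lines. This sketch assumes relative frequency means class frequency divided by total frequency; the values and frequencies are made up for illustration.

```python
from itertools import accumulate

# Hypothetical frequency distribution (values and frequencies are made up).
values = [45, 48, 50, 52, 55]
freqs = [4, 5, 3, 5, 3]

n = sum(freqs)  # total frequency

# Relative frequency of each value: fi / n.
relative = [f / n for f in freqs]

# Cumulative frequency: running total of the frequencies up to each value.
cumulative = list(accumulate(freqs))

for x, f, r, c in zip(values, freqs, relative, cumulative):
    print(x, f, r, c)
```

The last cumulative frequency always equals the total frequency, and the relative frequencies add up to 1.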
TYPES OF AVERAGES
(a) Mean
(i) Arithmetic Mean (ii) Weighted arithmetic mean
(iii) Geometric Mean (iv) Weighted Geometric Mean
(v) Harmonic Mean (vi) Weighted Harmonic Mean
(b) Median
(c) Mode
THE ARITHMETIC MEAN:
The arithmetic mean of a statistical data is defined as the ratio of the sum of all the values of the variable and the total number of items. It is denoted by A.M.
Calculation of Arithmetic Mean:
Let x1, x2, ...., xn be a set of n observed values of a statistical data. We denote the arithmetic mean, or simply the mean, by $\bar{x}$. Therefore, for this individual data,
the arithmetic mean is defined as
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
When the observations xi, i = 1, 2, ...., n are very large, the arithmetic mean is calculated as follows:
$\bar{x} = A + \frac{1}{n}\sum_{i=1}^{n} d_i$,
where di = xi - A, i = 1, 2, ...., n, and
A is the assumed mean.
(ii) For a Frequency Distribution
(a) Let us consider a frequency distribution. Let xi be the values of the variable and fi be the corresponding frequencies, that is, the grouped data is (xi, fi), i = 1, 2,
...., n. If the values of the variable are given as intervals or classes, the mid-values of the classes are taken as xi. Then the arithmetic mean of the frequency distribution is defined as
$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$.
(b) Short - cut Method
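The mean of this frequency distribution is
$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{\sum f_i (a + d_i)}{\sum f_i}$
or $\bar{x} = \frac{a \sum f_i + \sum f_i d_i}{\sum f_i}$.
Hence,
$\bar{x} = a + \frac{\sum f_i d_i}{\sum f_i}$,
where di = xi - a, a is the assumed mean.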
(c) Step Deviation Method
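In this case, define $d_i = \frac{x_i - a}{h}$,
or, xi = a + hdi,
where h is the length of the class intervals and a is the assumed mean.
Then, $\sum x_i f_i = \sum (a + h d_i) f_i$
$= a \sum f_i + h \sum d_i f_i$.
Thus
$\bar{x} = a + h \, \frac{\sum f_i d_i}{\sum f_i}$,
where a = assumed mean,
h = length of class interval,
fi = frequency of each value of the variable,
$d_i = \frac{x_i - a}{h}$.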
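The three routes to the mean of a frequency distribution (direct, short-cut with an assumed mean a, and step deviation with class length h) should agree. The sketch below checks this on made-up class mid-values and frequencies; the choices a = 25 and h = 10 are assumptions for the example only.

```python
import math

# Hypothetical grouped data: mid-values of classes 0-10, 10-20, ..., 40-50.
mids = [5, 15, 25, 35, 45]
freqs = [3, 7, 10, 6, 4]

N = sum(freqs)   # total frequency
a, h = 25, 10    # assumed mean and class length (chosen for illustration)

# Direct method: sum(fi * xi) / sum(fi).
direct = sum(f * x for x, f in zip(mids, freqs)) / N

# Short-cut method: a + sum(fi * di) / sum(fi), with di = xi - a.
short_cut = a + sum(f * (x - a) for x, f in zip(mids, freqs)) / N

# Step deviation method: a + h * sum(fi * di) / sum(fi), with di = (xi - a) / h.
step_dev = a + h * sum(f * (x - a) / h for x, f in zip(mids, freqs)) / N

print(direct, short_cut, step_dev)
assert math.isclose(direct, short_cut) and math.isclose(direct, step_dev)
```

The shifted and scaled deviations are smaller numbers than the xi themselves, which is the practical point of the two short methods when working by hand.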
Weighted Arithmetic Mean
If w1, w2, w3, ...., wn are the weights assigned to the values x1, x2, x3, ...., xn respectively, then the weighted average is defined as:
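Weighted Arithmetic Mean = $\frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$.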
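A minimal sketch of the weighted arithmetic mean, taken here as the sum of wi * xi divided by the sum of the weights. The function name and the marks and weights are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(wi * xi) / sum(wi)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical marks in three papers with weights 1, 2 and 3.
marks = [70, 80, 90]
paper_weights = [1, 2, 3]

result = weighted_mean(marks, paper_weights)
print(result)  # (70*1 + 80*2 + 90*3) / 6
```

With equal weights the function reduces to the ordinary arithmetic mean.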
Geometric Mean
If x1, x2, ...., xn are n values of a variable x, all of them being positive, then the geometric mean G is defined as $G = (x_1 x_2 x_3 \cdots x_n)^{1/n}$.
Geometric mean for frequency distribution:
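The geometric mean of n values x1, x2, x3, ...., xn of a variable x, occurring with frequencies f1, f2, f3, ...., fn respectively, is given by
$G = \left(x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n}\right)^{1/N}$
or $G = \text{antilog}\left(\frac{1}{N}\sum_{i=1}^{n} f_i \log x_i\right)$, where $N = \sum_{i=1}^{n} f_i$.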
Harmonic Mean
The harmonic mean of n items x1, x2, x3,...., xn is defined as:
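Harmonic Mean = $\frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$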
Harmonic Mean of Frequency Distribution:
Let x1, x2, x3, ...., xn be n items which occur with frequencies f1, f2, f3, ...., fn respectively. Then their Harmonic Mean is given by:
Harmonic Mean = $\frac{\sum_{i=1}^{n} f_i}{\sum_{i=1}^{n} \frac{f_i}{x_i}}$.
Relation between Arithmetic Mean, Geometric Mean and Harmonic Mean:
The arithmetic mean (A. M.), Geometric mean (G.M.) and Harmonic Mean (H.M.) for a given set of observations of a series are related as under:
$\text{A.M.} \geq \text{G.M.} \geq \text{H.M.}$
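The ordering of the three means can be checked numerically. The sketch below assumes positive observations. It computes the geometric mean through logarithms (the "antilog" form) and accepts optional frequencies; the function names and the data are illustrative only.

```python
import math


def arithmetic_mean(xs, fs=None):
    fs = fs or [1] * len(xs)
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def geometric_mean(xs, fs=None):
    # antilog of (1/N) * sum(fi * log xi); requires every xi > 0.
    fs = fs or [1] * len(xs)
    return math.exp(sum(f * math.log(x) for x, f in zip(xs, fs)) / sum(fs))


def harmonic_mean(xs, fs=None):
    # sum(fi) / sum(fi / xi); requires every xi > 0.
    fs = fs or [1] * len(xs)
    return sum(fs) / sum(f / x for x, f in zip(xs, fs))


data = [2, 4, 8]  # hypothetical positive observations
am, gm, hm = arithmetic_mean(data), geometric_mean(data), harmonic_mean(data)
print(am, gm, hm)
assert am >= gm >= hm
```

Passing frequencies, for example `geometric_mean([2, 8], [1, 1])`, gives the frequency-distribution versions of the same means.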
Median:
Median is defined as the middle most or the central value of the variables in a set of observations, when the observations are arranged either in ascending or in
descending order of their magnitudes. It divides the arranged series in two equal parts. Median is a position average, whereas, the arithmetic mean is the calculated
average. When a series consists of an even number of terms, median is the arithmetic mean of the two central items. It is generally denoted by M.
Case I: When n is odd.
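In this case the $\left(\frac{n+1}{2}\right)$th value is the median, i.e. M = $\left(\frac{n+1}{2}\right)$th term.
Case II: When n is even.
In this case there are two middle terms, the $\left(\frac{n}{2}\right)$th and the $\left(\frac{n}{2} + 1\right)$th. The median is the average of these two terms, i.e. M = $\frac{1}{2}\left[\left(\frac{n}{2}\right)\text{th term} + \left(\frac{n}{2} + 1\right)\text{th term}\right]$.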
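Cases I and II translate directly into code: sort the observations, then take the middle term when n is odd or the average of the two middle terms when n is even. The function name `median` and the sample lists are illustrative.

```python
def median(values):
    """Median of ungrouped data (Cases I and II)."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)th term, which is index mid when counting from 0.
        return s[mid]
    # n even: average of the (n / 2)th and (n / 2 + 1)th terms.
    return (s[mid - 1] + s[mid]) / 2


odd_case = median([7, 3, 9, 1, 5])   # sorted: 1 3 5 7 9
even_case = median([7, 3, 9, 1])     # sorted: 1 3 7 9
print(odd_case, even_case)
```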
Case III: When the series is continuous.
In this case the data is given in the form of a frequency table with class intervals. We prepare the cumulative frequency table and determine the median class,
i.e. the class in which the $\left(\frac{n}{2}\right)$th observation lies, and the following formula is used to calculate the median:
$M = L + \frac{\frac{n}{2} - C}{f} \times i$, where
L = lower limit of the class in which the median lies
n = total frequency, i.e. the sum of all the class frequencies
f = frequency of the class in which the median lies
C = cumulative frequency of the class preceding the median class
i = width of the class-interval of the class in which the median lies.
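A sketch of Case III, assuming the usual interpolation formula M = L + ((n/2 - C) / f) * i with the symbols defined above. The classes and frequencies are made up, and the classes are assumed to be given by their actual limits in ascending order.

```python
def grouped_median(classes, freqs):
    """Median of a continuous series; classes are (lower, upper) actual limits."""
    n = sum(freqs)
    cum = 0  # C: cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cum + f >= n / 2:
            # This is the median class: interpolate inside it.
            return lower + (n / 2 - cum) / f * (upper - lower)
        cum += f
    raise ValueError("empty frequency distribution")


# Hypothetical distribution.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [3, 7, 10, 6, 4]

m = grouped_median(classes, freqs)
print(m)
```

Here n = 30, so the 15th observation falls in the class 20 - 30, with C = 10, f = 10 and i = 10.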
Find the median of the wage distribution.
Mode:
Mode is defined as that value in a series which occurs most frequently. In a frequency distribution, the mode is that variate which has the maximum frequency. This
measure is used when it is important to know which value occurs most frequently.
Continuous Frequency Distribution:
i) Modal Class: It is that class in grouped frequency distribution in which the mode lies.
Mode = $L + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times i$, where
L = the lower limit of the modal class
i = the width of the modal class
f1 = the frequency of the class preceding modal class
fm = the frequency of the modal class
f2 = the frequency of the class succeeding modal class.
If the above formula fails, then Mode = $L + \frac{f_2}{f_1 + f_2} \times i$, where L, f1, f2, i have their usual meanings.
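The modal-class calculation can be sketched as follows, assuming the common interpolation formula Mode = L + (fm - f1) / (2fm - f1 - f2) * i. The sketch treats a missing neighbouring class as having frequency 0 and raises an error when the denominator is zero; both choices are assumptions of this example, as are the data.

```python
def grouped_mode(classes, freqs):
    """Mode of a continuous series; classes are (lower, upper) actual limits."""
    # Index of the modal class (first class with the maximum frequency).
    m = max(range(len(freqs)), key=freqs.__getitem__)
    lower, upper = classes[m]
    width = upper - lower
    fm = freqs[m]
    f1 = freqs[m - 1] if m > 0 else 0               # preceding class
    f2 = freqs[m + 1] if m + 1 < len(freqs) else 0  # succeeding class
    denom = 2 * fm - f1 - f2
    if denom == 0:
        raise ValueError("interpolation formula fails: 2*fm - f1 - f2 is zero")
    return lower + (fm - f1) / denom * width


# Hypothetical distribution.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [3, 7, 10, 6, 4]

mode = grouped_mode(classes, freqs)
print(mode)
```

Here the modal class is 20 - 30 with fm = 10, f1 = 7 and f2 = 6.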
Symmetrical Distribution:
A distribution in which mean, median and mode coincide is called symmetrical distribution.
Relation between Mean, Median and Mode:
Symmetrical distribution:
A distribution is symmetrical when equal frequencies are found at equal distances on either side of the mode. In this case, mean, median and
mode coincide.
Thus, Mean = Median = Mode.
Asymmetrical distribution:
In this distribution, variations do not have symmetry. If the distribution is moderately asymmetrical then mean, median and mode are connected by the formula
Mode = 3 Median - 2 Mean.
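The empirical relation is a one-line computation. The helper name and the input numbers below are hypothetical, and the result is only an estimate for moderately asymmetrical distributions.

```python
def empirical_mode(mean, median):
    """Estimate the mode from Mode = 3 Median - 2 Mean."""
    return 3 * median - 2 * mean


estimate = empirical_mode(mean=26, median=25)
print(estimate)
```

When mean and median are equal, the estimate returns the same value, which matches the symmetrical case.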
Measure of Dispersion:
Dispersion is defined as the scatter or spread of the observed values of a quantitative variable from a central value.
Normally, the following measures of dispersion are used:
(a) Range
(b) Mean Deviation
(c) Standard Deviation
(a) Range:
It is the simplest form of measuring the variation. The range of a set of values is the difference between the largest and the smallest values in the set. Range gives
very limited information. It tells the difference between the extreme values but nothing about the variations between the other values.
(b) Mean Deviation:
The mean deviation is defined as the arithmetic mean of the absolute values of the deviations of the observed values from mean or median.
Method for Calculation of Mean Deviation
Case-I : For ungrouped data
Let x1, x2, x3,.... , xn be n observations. Then
Mean deviation from mean = $\frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$
where $\bar{x}$ = mean value of the given observations,
n = total number of observations or items.
Mean deviation from median = $\frac{1}{n}\sum_{i=1}^{n} |x_i - M|$
where M = median of the given observations.
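Case-II: For grouped data
Let x1, x2, x3,...., xn occur with frequencies f1, f2, f3, ...., fn respectively and let $N = \sum_{i=1}^{n} f_i$.
Then Mean deviation from mean = $\frac{1}{N}\sum_{i=1}^{n} f_i |x_i - \bar{x}|$
where $\bar{x}$ = mean.
Mean deviation from median = $\frac{1}{N}\sum_{i=1}^{n} f_i |x_i - M|$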
where M = median.
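Both cases can be covered by one small function: the mean deviation is the frequency-weighted average of |xi - centre|, where the centre is the mean or the median. The function name, the `about` parameter and the data are illustrative, and the frequencies are assumed to be non-negative integers.

```python
import statistics


def mean_deviation(values, freqs=None, about="mean"):
    """Mean deviation from the mean or median, with optional frequencies."""
    if freqs is None:
        freqs = [1] * len(values)  # ungrouped data: every frequency is 1
    total = sum(freqs)
    # Expand to individual observations so the standard-library
    # mean and median can be applied directly.
    expanded = [x for x, f in zip(values, freqs) for _ in range(f)]
    if about == "mean":
        centre = statistics.mean(expanded)
    else:
        centre = statistics.median(expanded)
    return sum(f * abs(x - centre) for x, f in zip(values, freqs)) / total


md_mean = mean_deviation([2, 4, 6, 8, 10])                     # ungrouped
md_median = mean_deviation([2, 4, 6, 8, 10], about="median")
md_grouped = mean_deviation([1, 2, 3], freqs=[1, 2, 1])        # grouped
print(md_mean, md_median, md_grouped)
```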
Standard Deviation:
Standard deviation of a given set of observations is defined as the positive square root of the average of squared deviations of all observations taken from their
arithmetic mean. It is generally denoted by the Greek letter $\sigma$ or by s.
Variance
The square of the standard deviation is called the variance and is denoted by $\sigma^2$.
Method of Calculating Standard Deviation:
(a) For ungrouped data
Direct Method:
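Let us consider n observations x1, x2, ...., xn. Let the arithmetic mean of these observations be $\bar{x}$. Then the standard deviation is given by
$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$
$= \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2}$
$= \sqrt{\frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2}$.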
Short Cut Method:
This method is applied to calculate standard deviation, when the mean of the data comes out to be a fraction. In this case, we shift xi by A (assumed mean) i.e.
define di = xi - A and then find the standard deviation.
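We have $x_i - \bar{x} = d_i - \bar{d}$, where $\bar{d} = \frac{1}{n}\sum d_i$.
Hence,
$\sigma = \sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2}$
$= \sqrt{\frac{1}{n}\sum (d_i - \bar{d})^2}$
$= \sqrt{\frac{\sum d_i^2}{n} - \left(\frac{\sum d_i}{n}\right)^2}$
where di = xi - A,
A = assumed mean,
n = total number of observations.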
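The direct and short-cut methods give the same standard deviation, because shifting every observation by A does not change the spread. The sketch below checks this, using sqrt(sum(di^2)/n - (sum(di)/n)^2) with di = xi - A for the short-cut form. The data and the assumed mean A = 8 are made up.

```python
import math


def sd_direct(xs):
    """Square root of the average squared deviation from the mean."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)


def sd_short_cut(xs, A):
    """Same quantity computed from deviations di = xi - A."""
    n = len(xs)
    d = [x - A for x in xs]
    return math.sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2)


data = [4, 7, 8, 10, 12]  # hypothetical observations; the mean is 8.2
direct = sd_direct(data)
short_cut = sd_short_cut(data, A=8)
print(direct, short_cut)
assert math.isclose(direct, short_cut)
```

Choosing A close to the true mean keeps the di small, which is what makes the hand calculation easier.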
(b) For grouped data
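If a variate x takes values x1, x2, ...., xn with respective frequencies f1, f2, ...., fn, then the standard deviation is given by
$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}$, where $N = \sum_{i=1}^{n} f_i$ and $\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i$.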
If class intervals are given, then mid values of class intervals give the values of variate x.
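But when the mean has a fractional value, the following formula is applied to calculate the standard deviation:
$\sigma = \sqrt{\frac{\sum f_i d_i^2}{N} - \left(\frac{\sum f_i d_i}{N}\right)^2}$
where di = xi - A, A is the assumed mean and $N = \sum f_i$.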
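For grouped data the same idea applies with frequencies as weights. This sketch uses sqrt(sum(fi*di^2)/N - (sum(fi*di)/N)^2) with di = xi - A and N the total frequency. The mid-values, frequencies and the choice A = 25 are illustrative.

```python
import math


def grouped_sd(values, freqs, A=0):
    """Standard deviation of a frequency distribution via deviations from A."""
    N = sum(freqs)
    first = sum(f * (x - A) for x, f in zip(values, freqs)) / N        # sum(fi*di)/N
    second = sum(f * (x - A) ** 2 for x, f in zip(values, freqs)) / N  # sum(fi*di^2)/N
    return math.sqrt(second - first ** 2)


# Hypothetical class mid-values and frequencies.
mids = [5, 15, 25, 35, 45]
freqs = [3, 7, 10, 6, 4]

sd_assumed = grouped_sd(mids, freqs, A=25)
sd_plain = grouped_sd(mids, freqs)  # A = 0 uses the xi themselves
print(sd_assumed, sd_plain)
assert math.isclose(sd_assumed, sd_plain)
```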
Combined Standard Deviation:
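Let $\sigma_1$ and $\sigma_2$ be the standard deviations of the two groups containing n1 and n2 items respectively. Let $\bar{x}_1$ and $\bar{x}_2$ be their respective A.M. Let $\bar{x}$ and $\sigma$ be
the A.M. and S.D. of the combined group respectively. Then
$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$,
$\sigma = \sqrt{\frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}}$
where $d_1 = \bar{x}_1 - \bar{x}$ and $d_2 = \bar{x}_2 - \bar{x}$.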
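One way to gain confidence in a combined standard deviation is to compare it with the value computed from the pooled observations. The sketch assumes the standard formula in which each group contributes n_k * (sd_k^2 + d_k^2), with d_k the difference between the group mean and the combined mean. The two groups are made-up data.

```python
import math
import statistics


def combined_mean_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Mean and standard deviation of two groups taken together."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    d1, d2 = mean1 - mean, mean2 - mean
    variance = (n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / n
    return mean, math.sqrt(variance)


# Hypothetical groups.
g1 = [2, 4, 6]
g2 = [1, 3, 5, 7, 9]

mean, sd = combined_mean_sd(
    len(g1), statistics.mean(g1), statistics.pstdev(g1),
    len(g2), statistics.mean(g2), statistics.pstdev(g2),
)

# Cross-check against the pooled data (population standard deviation).
pooled_sd = statistics.pstdev(g1 + g2)
print(mean, sd, pooled_sd)
assert math.isclose(sd, pooled_sd)
```

`statistics.pstdev` is the population standard deviation (divisor n), which matches the definition used in these notes.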
Coefficient of Variation:
For comparing two or more series for variability, we calculate the coefficient of standard deviation and the coefficient of variation.
The coefficient of standard deviation is defined as: coefficient of standard deviation = $\frac{\sigma}{\bar{x}}$.
The coefficient of variation is defined as: coefficient of variation = $\frac{\sigma}{\bar{x}} \times 100$.
Coefficient of variation gives us a measure of scattering (dispersion). Scattering is less if the coefficient of variation is small.
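To compare the variability of two series, the coefficient of variation can be computed as (standard deviation / mean) * 100. The sketch assumes that definition and uses the population standard deviation; both series are made up and share the same mean.

```python
import math
import statistics


def coefficient_of_variation(xs):
    """Coefficient of variation in percent: (sd / mean) * 100."""
    return statistics.pstdev(xs) / statistics.mean(xs) * 100


# Two hypothetical series with the same mean (50) but different spread.
series_a = [48, 50, 52]
series_b = [30, 50, 70]

cv_a = coefficient_of_variation(series_a)
cv_b = coefficient_of_variation(series_b)
print(cv_a, cv_b)

# The series with the smaller coefficient is the less scattered one.
assert cv_a < cv_b
```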