AIEEE Concepts®: Statistics

Statistics

Introduction:

Data means information or a set of given facts. Data are usually collected through censuses or surveys. Statistics is defined as the collection, presentation, analysis and interpretation of numerical (statistical) data.

Variable (or variate)

A variable (or variate) that cannot take all values in a given range, but only certain separate values (for example, the number of students in a class), is called a discrete variable.

A variable that can take all numerical values in a given range (for example, weight in kg) is called a continuous variable.

Frequency Distribution:

Suppose the weights (in kg) of the 20 students of a class are recorded one by one.

Such a list is called raw data, or an individual series. Some of the weights (values of the quantitative variable) are repeated. If 3 students weigh 50 kg, we say that the frequency of 50 is 3. In general, the number of times a value occurs is called the frequency of that value. A table listing the weights together with their frequencies is called a frequency distribution table.

Tally bars are used to count the number of times each value of the variable occurs. The frequency distribution is then written with the values arranged in order of magnitude.

We denote the different values of the variable $x$ by $x_i$ and the corresponding frequencies by $f_i$. The total number of students, that is, the total frequency, is denoted by $n$, i.e. $n = \sum f_i$.
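As a rough sketch of the tallying step, the Python snippet below counts how often each value occurs and lists the pairs $(x_i, f_i)$ in order of magnitude. The weights are made-up sample values, not the data referred to above, and `frequency_distribution` is a hypothetical helper name.

```python
from collections import Counter

def frequency_distribution(values):
    """Return (x_i, f_i) pairs sorted by x_i (order of magnitude)."""
    counts = Counter(values)        # does the job of the tally bars
    return sorted(counts.items())

# Hypothetical weights in kg, for illustration only.
weights = [50, 52, 48, 50, 55, 52, 50, 48]
table = frequency_distribution(weights)
n = sum(f for _, f in table)        # total frequency n = sum of f_i

print(table)  # [(48, 2), (50, 3), (52, 2), (55, 1)]
print(n)      # 8
```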

When the data are grouped into class intervals (classes), the classes can be written in two forms.

(i) Inclusive form: In this case, the lower limit of a class is not equal to the upper limit of the previous class. For example, the classes 45 - 49, 50 - 54, 55 - 59, 60 - 64 are in inclusive form.

However, the class 45 - 49 actually includes all items with values greater than or equal to 44.5 but less than 49.5. Thus the actual limits are 44.5 - 49.5, 49.5 - 54.5, 54.5 - 59.5, 59.5 - 64.5.

(ii) Exclusive form: In this case, the lower limit of a class is equal to the upper limit of the previous class. For example, we may have classes of the form 45 - 50, 50 - 55, 55 - 60, 60 - 65, etc. The value 50 is counted in the class "50 and under 55", not in "45 and under 50".

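In both forms, the length of a class is its upper limit minus its lower limit (using the actual limits for inclusive classes), and in the examples above every class has the same length, 5.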
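A minimal Python sketch of the two conventions: converting inclusive classes to their actual limits, and placing a value in an exclusive class using "lower limit and under upper limit". The helper names are hypothetical, and the conversion assumes integer data with a gap of 1 between consecutive classes.

```python
def actual_limits(inclusive_classes, gap=1):
    """Turn inclusive classes such as (45, 49) into actual limits (44.5, 49.5).

    Assumes a gap of `gap` units between one class's upper limit and the
    next class's lower limit (1 for integer data); half of it is moved
    to each side.
    """
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in inclusive_classes]

def exclusive_class_of(value, exclusive_classes):
    """Return the exclusive class (lower, upper) with lower <= value < upper."""
    for lower, upper in exclusive_classes:
        if lower <= value < upper:
            return (lower, upper)
    return None  # value lies outside every class

inclusive = [(45, 49), (50, 54), (55, 59), (60, 64)]
print(actual_limits(inclusive))
# [(44.5, 49.5), (49.5, 54.5), (54.5, 59.5), (59.5, 64.5)]

exclusive = [(45, 50), (50, 55), (55, 60), (60, 65)]
print(exclusive_class_of(50, exclusive))  # (50, 55)
```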

RELATIVE AND CUMULATIVE FREQUENCY

Relative Frequency:

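The relative frequency of a class expresses its frequency as a fraction of the total frequency. It gives useful information about the data, particularly when the class frequencies and the total frequency are very large.

Relative frequency $= \dfrac{\text{frequency of the class}}{\text{total frequency}} = \dfrac{f_i}{n}$.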

The cumulative frequency of a value (or class of values) is obtained by adding the frequencies of all values (or classes of values) less than or equal to the one under consideration. Cumulative frequency is an important concept and is useful in determining measures of location such as the median.
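The two definitions above can be sketched in a few lines of Python, assuming the class frequencies are already listed in increasing order of class. The frequencies are hypothetical and `relative_and_cumulative` is an illustrative name.

```python
from itertools import accumulate

def relative_and_cumulative(freqs):
    n = sum(freqs)                          # total frequency
    relative = [f / n for f in freqs]       # f_i / n
    cumulative = list(accumulate(freqs))    # running totals, class by class
    return relative, cumulative

# Hypothetical class frequencies, listed in increasing order of class.
freqs = [2, 3, 5, 6, 4]
rel, cum = relative_and_cumulative(freqs)
print(rel)  # [0.1, 0.15, 0.25, 0.3, 0.2]
print(cum)  # [2, 5, 10, 16, 20]
```

The last cumulative frequency always equals the total frequency $n$, and the relative frequencies add up to 1.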

TYPES OF AVERAGES

(a) Mean

(i) Arithmetic Mean (ii) Weighted arithmetic mean

(iii) Geometric Mean (iv) Weighted Geometric Mean

(v) Harmonic Mean (vi) Weighted Harmonic Mean

(b) Median

(c) Mode

THE ARITHMETIC MEAN:

The arithmetic mean of a set of statistical data is the sum of all the values of the variable divided by the total number of items. It is denoted by A.M.

Calculation of Arithmetic Mean:

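(i) For Individual Data

Let $x_1, x_2, \dots, x_n$ be a set of $n$ observed values of a statistical data set. We denote the arithmetic mean, or simply the mean, by $\bar{x}$. For such individual data, the arithmetic mean is defined as

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$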

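When the observations $x_i$, $i = 1, 2, \dots, n$, are very large, the arithmetic mean is calculated as

$$\bar{x} = A + \frac{1}{n}\sum_{i=1}^{n} d_i,$$

where $d_i = x_i - A$, $i = 1, 2, \dots, n$, and $A$ is the assumed mean. This follows from $\sum x_i = \sum (A + d_i) = nA + \sum d_i$.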
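A minimal Python check that the direct formula and the assumed-mean formula give the same result. The observations and the choice $A = 1020$ are hypothetical; any value of $A$ works, and a round number close to the data simply keeps the deviations small.

```python
def mean_direct(xs):
    """x̄ = (x_1 + ... + x_n) / n."""
    return sum(xs) / len(xs)

def mean_assumed(xs, A):
    """x̄ = A + (Σ d_i) / n, with d_i = x_i - A."""
    d = [x - A for x in xs]
    return A + sum(d) / len(xs)

# Hypothetical large observations, for illustration only.
xs = [1012, 1018, 1025, 1031, 1009]
print(mean_direct(xs))         # 1019.0
print(mean_assumed(xs, 1020))  # 1019.0
```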

(ii) For a Frequency Distribution

(a) Let us consider a frequency distribution. Let $x_i$ be the values of the variable and $f_i$ the corresponding frequencies, so that the grouped data is $(x_i, f_i)$, $i = 1, 2, \dots, n$. If the values of the variable are given as class intervals, the mid-values of the classes are taken as $x_i$. The arithmetic mean of the frequency distribution is then defined as

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}.$$

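(b) Short-cut Method

Writing $x_i = a + d_i$, the mean of this frequency distribution is

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{\sum f_i (a + d_i)}{\sum f_i} = \frac{a\sum f_i + \sum f_i d_i}{\sum f_i}.$$

Hence,

$$\bar{x} = a + \frac{\sum f_i d_i}{\sum f_i},$$

where $d_i = x_i - a$ and $a$ is the assumed mean.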

(c) Step Deviation Method

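In this case, define $d_i = \dfrac{x_i - a}{h}$, or $x_i = a + h d_i$, where $h$ is the length of the class intervals and $a$ is the assumed mean.

Then,

$$\sum x_i f_i = \sum (a + h d_i) f_i = a\sum f_i + h\sum d_i f_i.$$

Dividing both sides by $\sum f_i$, we get

$$\bar{x} = a + h\,\frac{\sum f_i d_i}{\sum f_i},$$

where a = assumed mean,

h = length of class interval

f_i = frequency of each value

$d_i = \dfrac{x_i - a}{h}$.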
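The three methods for a frequency distribution can be compared in a short Python sketch. The classes and frequencies are hypothetical, the mid-values of the classes are used as $x_i$, and $a = 25$, $h = 10$ are just convenient choices; all three functions should agree.

```python
def midpoints(classes):
    return [(lower + upper) / 2 for lower, upper in classes]

def grouped_mean_direct(x, f):
    return sum(fi * xi for xi, fi in zip(x, f)) / sum(f)

def grouped_mean_shortcut(x, f, a):
    d = [xi - a for xi in x]                        # d_i = x_i - a
    return a + sum(fi * di for di, fi in zip(d, f)) / sum(f)

def grouped_mean_step(x, f, a, h):
    d = [(xi - a) / h for xi in x]                  # d_i = (x_i - a) / h
    return a + h * sum(fi * di for di, fi in zip(d, f)) / sum(f)

# Hypothetical grouped data, for illustration only.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
f = [5, 8, 15, 7, 5]
x = midpoints(classes)                              # 5.0, 15.0, ..., 45.0

print(grouped_mean_direct(x, f))              # 24.75
print(grouped_mean_shortcut(x, f, a=25))      # 24.75
print(grouped_mean_step(x, f, a=25, h=10))    # 24.75
```

The step-deviation version works with the small numbers $-2, -1, 0, 1, 2$, which is why it is convenient by hand.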

Weighted Arithmetic Mean

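If $w_1, w_2, w_3, \dots, w_n$ are the weights assigned to the values $x_1, x_2, x_3, \dots, x_n$ respectively, then the weighted average is defined as:

$$\text{Weighted Arithmetic Mean} = \frac{w_1x_1 + w_2x_2 + \dots + w_nx_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum w_i x_i}{\sum w_i}.$$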
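A minimal Python sketch of the weighted mean. The marks and weights are hypothetical; with all weights equal, the formula reduces to the ordinary arithmetic mean.

```python
def weighted_mean(x, w):
    """Σ w_i x_i / Σ w_i."""
    return sum(wi * xi for xi, wi in zip(x, w)) / sum(w)

# Hypothetical marks and weights, for illustration only.
marks = [80, 70, 90]
weights = [2, 1, 1]
print(weighted_mean(marks, weights))  # 80.0
```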

Geometric Mean

If $x_1, x_2, \dots, x_n$ are $n$ values of a variable $x$, all of them positive, then the geometric mean $G$ is defined as $G = (x_1 x_2 x_3 \cdots x_n)^{1/n}$.

Geometric mean for frequency distribution:

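The geometric mean of $n$ values $x_1, x_2, x_3, \dots, x_n$ of a variable $x$, occurring with frequencies $f_1, f_2, f_3, \dots, f_n$ respectively, is given by

$$G = \left(x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n}\right)^{1/N}, \qquad N = f_1 + f_2 + \dots + f_n,$$

or

$$G = \operatorname{antilog}\left(\frac{\sum f_i \log x_i}{N}\right).$$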
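A short Python sketch of both forms, assuming all values are positive. The frequency version uses base-10 logarithms and the antilog, as in the formula above; the sample values are hypothetical.

```python
import math

def geometric_mean(xs):
    """G = (x_1 x_2 ... x_n)^(1/n); assumes every x_i > 0."""
    return math.prod(xs) ** (1 / len(xs))

def geometric_mean_freq(x, f):
    """G = antilog(Σ f_i log x_i / N) with N = Σ f_i, using base-10 logs."""
    N = sum(f)
    mean_log = sum(fi * math.log10(xi) for xi, fi in zip(x, f)) / N
    return 10 ** mean_log   # antilog (base 10)

print(geometric_mean([2, 8]))                         # 4.0
print(round(geometric_mean_freq([2, 4], [2, 1]), 4))  # 2.5198, i.e. (2*2*4)^(1/3)
```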

Harmonic Mean

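The harmonic mean of $n$ items $x_1, x_2, x_3, \dots, x_n$, none of them zero, is defined as:

$$\text{Harmonic Mean} = \frac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \dots + \dfrac{1}{x_n}} = \frac{n}{\sum \dfrac{1}{x_i}}.$$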

Harmonic Mean of Frequency Distribution:

Let $x_1, x_2, x_3, \dots, x_n$ be $n$ items which occur with frequencies $f_1, f_2, f_3, \dots, f_n$ respectively. Then their harmonic mean is given by:

$$\text{Harmonic Mean} = \frac{\sum f_i}{\sum \dfrac{f_i}{x_i}}.$$
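A minimal Python sketch of the harmonic mean, for plain items and for a frequency distribution. The speeds are a hypothetical illustration: covering equal distances at 60 km/h and at 40 km/h gives an average speed equal to their harmonic mean, 48 km/h.

```python
def harmonic_mean(xs):
    """H.M. = n / Σ (1/x_i); assumes no x_i is zero."""
    return len(xs) / sum(1 / x for x in xs)

def harmonic_mean_freq(x, f):
    """H.M. = Σ f_i / Σ (f_i / x_i)."""
    return sum(f) / sum(fi / xi for xi, fi in zip(x, f))

# Hypothetical example: equal distances covered at 60 km/h and at 40 km/h.
print(round(harmonic_mean([60, 40]), 6))   # 48.0
print(harmonic_mean_freq([2, 4], [2, 1]))  # 2.4, same as harmonic_mean([2, 2, 4])
```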

Relation between Arithmetic Mean, Geometric Mean and Harmonic Mean:

For a given set of positive observations, the arithmetic mean (A.M.), geometric mean (G.M.) and harmonic mean (H.M.) are related as follows:

$$\text{A.M.} \ge \text{G.M.} \ge \text{H.M.}$$

Equality holds if and only if all the observations are equal.
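A quick numerical check of the inequality in Python, using hypothetical positive observations. With all observations equal, the three means coincide up to floating-point rounding.

```python
import math

def am_gm_hm(xs):
    """Return (A.M., G.M., H.M.) of positive observations."""
    n = len(xs)
    am = sum(xs) / n
    gm = math.prod(xs) ** (1 / n)
    hm = n / sum(1 / x for x in xs)
    return am, gm, hm

am, gm, hm = am_gm_hm([2, 8])
print(am, gm, hm)          # 5.0 4.0 3.2
assert am >= gm >= hm      # A.M. >= G.M. >= H.M.

# With all observations equal, the three means coincide (up to rounding).
print(am_gm_hm([5, 5, 5]))
```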

Median:

The median is the middle-most, or central, value of a set of observations when the observations are arranged in ascending or descending order of magnitude. It divides the arranged series into two equal parts. The median is a positional average, whereas the arithmetic mean is a calculated average. When a series has an even number of terms, the median is the arithmetic mean of the two central items. It is generally denoted by M.

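Case I: When n is odd.

Arrange the $n$ observations in ascending (or descending) order. In this case the $\left(\frac{n+1}{2}\right)$th value is the median, i.e.

$$M = \left(\frac{n+1}{2}\right)\text{th term}.$$

Case II: When n is even.

In this case there are two middle terms, the $\left(\frac{n}{2}\right)$th and the $\left(\frac{n}{2}+1\right)$th. The median is the average of these two terms, i.e.

$$M = \frac{\left(\frac{n}{2}\right)\text{th term} + \left(\frac{n}{2}+1\right)\text{th term}}{2}.$$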
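Cases I and II can be sketched in Python as follows. The positions in the formulas are 1-based, so the code subtracts 1 to index a Python list; the sample values are hypothetical.

```python
def median(xs):
    s = sorted(xs)                          # arrange in ascending order
    n = len(s)
    if n % 2 == 1:
        # Case I: the ((n + 1)/2)th value; subtract 1 for 0-based indexing.
        return s[(n + 1) // 2 - 1]
    # Case II: mean of the (n/2)th and (n/2 + 1)th values.
    return (s[n // 2 - 1] + s[n // 2]) / 2

print(median([7, 3, 9, 1, 5]))  # 5
print(median([7, 3, 9, 1]))     # 5.0
```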

Case III: When the series is continuous.

In this case the data are given as a frequency table with class intervals. We prepare the cumulative frequency table and determine the median class, i.e. the class in which the $\left(\frac{n}{2}\right)$th observation lies, and then use the following formula to calculate the median:

$$M = L + \frac{\frac{n}{2} - C}{f} \times i,$$

where

L = lower limit of the class in which the median lies

n = total number of frequencies, i.e., $n = \sum f_i$.

f = frequency of the class in which the median lies

C = cumulative frequency of the class preceding the median class

i = width of the class-interval of the class in which the median lies.
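A minimal Python sketch of the Case III formula, assuming classes of equal width listed in increasing order. The classes and frequencies are hypothetical (they are not the wage distribution mentioned below), and `grouped_median` is an illustrative name.

```python
from itertools import accumulate

def grouped_median(lower_limits, freqs, width):
    """M = L + (n/2 - C) / f * i for classes of equal width i."""
    n = sum(freqs)
    cum = list(accumulate(freqs))
    # Median class: the first class whose cumulative frequency reaches n/2.
    k = next(j for j, cf in enumerate(cum) if cf >= n / 2)
    L = lower_limits[k]
    C = cum[k - 1] if k > 0 else 0    # c.f. of the class before the median class
    f = freqs[k]
    return L + (n / 2 - C) / f * width

# Hypothetical classes 0-10, 10-20, ..., 40-50, for illustration only.
lower = [0, 10, 20, 30, 40]
freqs = [5, 8, 15, 7, 5]
print(round(grouped_median(lower, freqs, 10), 2))  # 24.67
```

Here $n = 40$, the cumulative frequencies are 5, 13, 28, 35, 40, so the median class is 20 - 30 with $L = 20$, $C = 13$, $f = 15$.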

Find the median of the wage distribution.

Mode:

Mode is the value in a series that occurs most frequently. In a frequency distribution, the mode is the variate with the maximum frequency. This measure is used when it is important to know which value occurs most frequently.

Continuous Frequency Distribution:

i) Modal Class: It is the class of a grouped frequency distribution in which the mode lies, i.e. the class with the maximum frequency. The mode is given by

$$\text{Mode} = L + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times i,$$

where

L = the lower limit of the modal class

i = the width of the modal class

f₁ = the frequency of the class preceding the modal class

f_m = the frequency of the modal class

f₂ = the frequency of the class succeeding the modal class.

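If the above formula fails (for example, when its denominator $2f_m - f_1 - f_2$ is zero), then

$$\text{Mode} = L + \frac{f_2}{f_1 + f_2} \times i,$$

where L, f₁, f₂, i have their usual meanings.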
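A minimal Python sketch of the first mode formula only, using the same hypothetical grouped data as in the median sketch. It assumes equal class widths, takes the first class with the highest frequency as the modal class, and treats a missing neighbouring class (before the first or after the last class) as having frequency 0; that last point is an assumption, not something stated above.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode = L + (f_m - f_1) / (2 f_m - f_1 - f_2) * i (first formula only)."""
    m = max(range(len(freqs)), key=lambda j: freqs[j])   # modal class index
    L, fm = lower_limits[m], freqs[m]
    # Assumption: a missing neighbouring class counts as frequency 0.
    f1 = freqs[m - 1] if m > 0 else 0
    f2 = freqs[m + 1] if m + 1 < len(freqs) else 0
    return L + (fm - f1) / (2 * fm - f1 - f2) * width

# Hypothetical classes 0-10, 10-20, ..., 40-50, for illustration only.
lower = [0, 10, 20, 30, 40]
freqs = [5, 8, 15, 7, 5]
print(round(grouped_mode(lower, freqs, 10), 2))  # 24.67
```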

Symmetrical Distribution:

A distribution in which the mean, median and mode coincide is called a symmetrical distribution.

Relation between Mean, Median and Mode:

Symmetrical distribution:

A distribution is symmetrical when the same frequencies occur at the same distance on either side of the mode. In this case, the mean, median and mode coincide.

Thus, Mean = Median = Mode.

Asymmetrical distribution:

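In an asymmetrical (skewed) distribution, the frequencies are not distributed symmetrically about the mode. If the distribution is moderately asymmetrical, the mean, median and mode are approximately related by the empirical formula

$$\text{Mode} \approx 3\,\text{Median} - 2\,\text{Mean}.$$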

Measure of Dispersion:

Dispersion is defined as the scatter or spread of the observed values of a quantitative variable about a central value.

The following measures of dispersion are commonly used:

(a) Range

(b) Mean Deviation

(c) Standard Deviation

(a) Range:

It is the simplest measure of variation. The range of a set of values is the difference between the largest and the smallest values in the set, i.e. Range $= x_{\max} - x_{\min}$. Range gives very limited information: it tells the difference between the extreme values but nothing about the variation among the other values.

(b) Mean Deviation:

The mean deviation is defined as the arithmetic mean of the absolute values of the deviations of the observed values from the mean or the median.

Method for Calculation of Mean Deviation

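Case-I: For ungrouped data

Let $x_1, x_2, x_3, \dots, x_n$ be $n$ observations. Then

$$\text{Mean deviation from mean} = \frac{1}{n}\sum_{i=1}^{n} \left|x_i - \bar{x}\right|,$$

where

$\bar{x}$ = mean value of the given observations,

n = total number of observations or items.

$$\text{Mean deviation from median} = \frac{1}{n}\sum_{i=1}^{n} \left|x_i - M\right|,$$

where M = median of the given observations.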

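Case-II: For grouped data

Let $x_1, x_2, x_3, \dots, x_n$ occur with frequencies $f_1, f_2, f_3, \dots, f_n$ respectively, and let $N = \sum f_i$. Then

$$\text{Mean deviation from mean} = \frac{1}{N}\sum f_i \left|x_i - \bar{x}\right|,$$

where $\bar{x}$ = mean.

$$\text{Mean deviation from median} = \frac{1}{N}\sum f_i \left|x_i - M\right|,$$

where M = median.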
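Both cases can be sketched in Python with one function each, passing either the mean or the median as the central value. The data are hypothetical; they happen to show that the mean deviation about the median is smaller than about the mean, which holds in general.

```python
import statistics

def mean_deviation(xs, about):
    """(1/n) Σ |x_i - about|, where `about` is the mean or the median."""
    return sum(abs(x - about) for x in xs) / len(xs)

def mean_deviation_grouped(x, f, about):
    """(1/N) Σ f_i |x_i - about|, with N = Σ f_i."""
    return sum(fi * abs(xi - about) for xi, fi in zip(x, f)) / sum(f)

# Hypothetical ungrouped data, for illustration only.
xs = [1, 2, 3, 10]
mean = sum(xs) / len(xs)         # 4.0
med = statistics.median(xs)      # 2.5
print(mean_deviation(xs, mean))  # 3.0
print(mean_deviation(xs, med))   # 2.5
```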

Standard Deviation:

The standard deviation of a given set of observations is defined as the positive square root of the average of the squared deviations of all observations from their arithmetic mean. It is generally denoted by the Greek letter $\sigma$ (sigma) or by s.

Variance

The square of the standard deviation is called the variance and is denoted by $\sigma^2$.

Method of Calculating Standard Deviation:

(a) For ungrouped data

Direct Method:

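Let us consider $n$ observations $x_1, x_2, \dots, x_n$ with arithmetic mean $\bar{x}$. Then the standard deviation is given by

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)^2}.$$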

Short Cut Method:

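This method is used to calculate the standard deviation when the mean of the data comes out to be a fraction. In this case, we shift each $x_i$ by $A$ (the assumed mean), i.e. define $d_i = x_i - A$, and then find the standard deviation of the $d_i$.

Since subtracting the same constant from every observation does not change their spread, we have

$$\sigma_x = \sigma_d.$$

Hence,

$$\sigma = \sqrt{\frac{\sum d_i^2}{n} - \left(\frac{\sum d_i}{n}\right)^2},$$

where $d_i = x_i - A$,

A = assumed mean,

n = total number of observations.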
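A minimal Python sketch of the direct and short-cut methods for ungrouped data, on hypothetical observations. Both divide by $n$, as in the definition above, so they match the population standard deviation (Python's `statistics.pstdev`), not the sample version that divides by $n - 1$.

```python
import math

def sd_direct(xs):
    """σ = sqrt( Σ (x_i - x̄)^2 / n )."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)

def sd_shortcut(xs, A):
    """σ = sqrt( Σ d_i^2 / n - (Σ d_i / n)^2 ), with d_i = x_i - A."""
    n = len(xs)
    d = [x - A for x in xs]
    return math.sqrt(sum(di * di for di in d) / n - (sum(d) / n) ** 2)

# Hypothetical data, for illustration only.
xs = [2, 4, 4, 4, 5, 5, 7, 9]
print(sd_direct(xs))       # 2.0
print(sd_shortcut(xs, 4))  # 2.0
```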

(b) For grouped data

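If a variate $x$ takes values $x_1, x_2, \dots, x_n$ with respective frequencies $f_1, f_2, \dots, f_n$, then the standard deviation is given by

$$\sigma = \sqrt{\frac{1}{N}\sum f_i\left(x_i - \bar{x}\right)^2},$$

where

$$N = \sum f_i.$$

If class intervals are given, the mid-values of the class intervals are taken as the values of the variate $x$.

When the mean has a fractional value, the following formula is applied instead to calculate the standard deviation:

$$\sigma = \sqrt{\frac{\sum f_i d_i^2}{N} - \left(\frac{\sum f_i d_i}{N}\right)^2},$$

where $d_i = x_i - A$ and A is the assumed mean.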
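The grouped formulas can be sketched in Python in the same way. The data are hypothetical: the value 5 occurring once, 15 twice and 25 once, whose standard deviation is $\sqrt{50}$; both functions divide by $N = \sum f_i$.

```python
import math

def sd_grouped(x, f):
    """σ = sqrt( Σ f_i (x_i - x̄)^2 / N ), N = Σ f_i."""
    N = sum(f)
    mean = sum(fi * xi for xi, fi in zip(x, f)) / N
    return math.sqrt(sum(fi * (xi - mean) ** 2 for xi, fi in zip(x, f)) / N)

def sd_grouped_shortcut(x, f, A):
    """σ = sqrt( Σ f_i d_i^2 / N - (Σ f_i d_i / N)^2 ), d_i = x_i - A."""
    N = sum(f)
    d = [xi - A for xi in x]
    s1 = sum(fi * di for di, fi in zip(d, f))
    s2 = sum(fi * di * di for di, fi in zip(d, f))
    return math.sqrt(s2 / N - (s1 / N) ** 2)

# Hypothetical data: value 5 once, 15 twice, 25 once.
x, f = [5, 15, 25], [1, 2, 1]
print(round(sd_grouped(x, f), 4))                 # 7.0711
print(round(sd_grouped_shortcut(x, f, A=10), 4))  # 7.0711
```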

Combined Standard Deviation:

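Let $\sigma_1$ and $\sigma_2$ be the standard deviations of two groups containing $n_1$ and $n_2$ items respectively, and let $\bar{x}_1$ and $\bar{x}_2$ be their respective A.M.s. Let $\bar{x}$ and $\sigma$ be the A.M. and S.D. of the combined group respectively. Then

$$\sigma^2 = \frac{n_1\left(\sigma_1^2 + d_1^2\right) + n_2\left(\sigma_2^2 + d_2^2\right)}{n_1 + n_2},$$

where $d_1 = \bar{x}_1 - \bar{x}$ and $d_2 = \bar{x}_2 - \bar{x}$, with the combined mean $\bar{x} = \dfrac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$.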
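A minimal Python sketch of the combined mean and standard deviation. The two groups are hypothetical: [2, 4, 6] (mean 4, S.D. $\sqrt{8/3}$) and [8, 10] (mean 9, S.D. 1). Pooling the raw values [2, 4, 6, 8, 10] directly gives mean 6 and S.D. $\sqrt{8}$, which the formula reproduces.

```python
import math

def combined_mean_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combine the A.M.s and S.D.s of two groups."""
    mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1, d2 = mean1 - mean, mean2 - mean
    var = (n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / (n1 + n2)
    return mean, math.sqrt(var)

# Hypothetical groups: [2, 4, 6] and [8, 10].
mean, sd = combined_mean_sd(3, 4, math.sqrt(8 / 3), 2, 9, 1)
print(mean)          # 6.0
print(round(sd, 6))  # 2.828427, i.e. sqrt(8)
```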

Coefficient of Variation:

For comparing two or more series for variability, we calculate the coefficient of standard deviation and the coefficient of variation.

The coefficient of standard deviation is defined as:

$$\text{coefficient of standard deviation} = \frac{\sigma}{\bar{x}}.$$

The coefficient of variation is defined as:

$$\text{coefficient of variation} = \frac{\sigma}{\bar{x}} \times 100.$$

The coefficient of variation measures scattering (dispersion) relative to the mean. The smaller the coefficient of variation, the less scattered, or more consistent, the series.
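A short Python sketch comparing two hypothetical series by their coefficients of variation. It uses `statistics.pstdev` (divide by $n$, matching the definition of $\sigma$ used here) and `statistics.fmean` for the mean; the series with the smaller value is the more consistent one.

```python
import statistics

def coefficient_of_sd(xs):
    """σ / x̄, with σ the population S.D. (divide by n, as in the text)."""
    return statistics.pstdev(xs) / statistics.fmean(xs)

def coefficient_of_variation(xs):
    """(σ / x̄) × 100."""
    return coefficient_of_sd(xs) * 100

# Hypothetical scores of two batsmen, for illustration only.
a = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, S.D. 2
b = [48, 50, 52]               # mean 50, S.D. about 1.63
print(round(coefficient_of_variation(a), 2))  # 40.0
print(round(coefficient_of_variation(b), 2))  # 3.27
```

Series b has the much smaller coefficient of variation, so its scores are far less scattered relative to their mean.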
