Tuesday, 3 January 2017
MEASURES OF VARIABILITY
INTRODUCTION
The three measures of central tendency (the mean, the median, and the mode) describe what is typical or representative of a set of scores as a whole. The next step is to find some measure of the variability of the scores, that is, of the “scatter” or “spread” of the separate scores around their central tendency. If a group is homogeneous, that is, made up of individuals of nearly the same ability, most of its scores will fall around the same point on the scale, the range will be relatively short and the variability small. But if the group contains individuals of widely differing capacities, scores will be strung out from high to low, the range will be relatively wide and the variability large.
Four measures have been devised to indicate the variability or dispersion within a set of measures. These are (1) the range, (2) the quartile deviation or Q, (3) the standard deviation or SD, and (4) the average deviation. Here we focus on three of them: (1) the range, (2) the quartile deviation or Q, and (3) the standard deviation or SD.
CALCULATION OF MEASURES OF DISPERSION
1) THE RANGE
The range may be defined as the interval between the highest and lowest scores. The range is the most general measure of spread or scatter, and is computed when we wish to make a rough comparison of two or more groups for variability. The range takes account of the extremes of the series of scores only, and is unreliable when N is small or when there are large gaps (zero f’s) in the frequency distribution. Suppose that the highest score in a distribution is 120, that there is a gap of 20 points before we reach the next score of 100, and that the lowest score is 60. The single high score of 120 then increases the range from 40 (100 - 60) to 60 (120 - 60).
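The effect of one isolated high score on the range can be checked with a short Python sketch. The function name `score_range` and the scores between 60 and 100 are illustrative assumptions; only the values 60, 100 and 120 come from the example above.

```python
def score_range(scores):
    """Return the range: highest score minus lowest score."""
    return max(scores) - min(scores)

# Hypothetical scores; only 60, 100 and 120 are taken from the text.
scores = [60, 72, 85, 91, 100]
with_high_score = scores + [120]  # add the single isolated high score

print(score_range(scores))           # 40
print(score_range(with_high_score))  # 60
```

Adding one score changes the range by half its original size, although every other score is unchanged.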
USE OF RANGE
a) When the data are too scant or too scattered to justify the computation of a more precise measure of variability.
b) When a knowledge of extreme scores or of total spread is all that is wanted.
MERITS OF RANGE
It can be easily calculated and
understood.
DEMERITS OF RANGE
a) It helps us to make only a rough comparison of two or more groups with respect to the variability of the scores concerned.
b) It is very greatly affected by fluctuations in sampling. Its value is not stable from sample to sample.
c) It takes into account only the two extreme end scores of a distribution and is unreliable when N is small or when there are large gaps in the frequency distribution.
d) The range does not take into account the composition of a group or the nature of the distribution of the scores within the extremes. The range of a symmetrical and of an asymmetrical distribution can be identical.
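The last point can be illustrated with a minimal Python sketch. Both sets of scores below are hypothetical values chosen for illustration; they are not taken from the text.

```python
def score_range(scores):
    """Return the range: highest score minus lowest score."""
    return max(scores) - min(scores)

# Hypothetical data: one symmetrical set and one strongly skewed set.
symmetrical = [10, 20, 30, 40, 50]
skewed = [10, 11, 12, 13, 50]

print(score_range(symmetrical))  # 40
print(score_range(skewed))       # 40
```

The two sets share the same extremes, so the range cannot tell them apart even though the scores in between are arranged very differently.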
2) THE QUARTILE DEVIATION
The quartile
deviation or Q is one-half the scale distance between the 75th and
25th percentiles in a frequency distribution. The 25th
percentile or Q1 is the first quartile on the score scale, the point
below which lie 25% of the scores. The 75th percentile or Q3
is the third quartile on the score scale, the point below which lie 75% of the
scores. Quartile deviation is expressed by the formula:
Q = (Q3-Q1)/2
To find Q, we must first compute the 75th and 25th percentiles. These statistics are found in exactly the same way as the median, which is the 50th percentile or Q2. The only difference is that ¼ of N is counted off from the low end of the distribution to find Q1 and that ¾ of N is counted off to find Q3. The formulas are:
Q1 = l + [i (N/4 – Cum f1)] / fq
Q3 = l + [i (3N/4 – Cum f1)] / fq
Where l = the exact lower limit of the interval in which the quartile falls,
i = the length of the interval,
Cum f1 = the cumulative f up to the interval which contains the quartile,
fq = the f on the interval containing the quartile.
Quartile deviation of the ungrouped data can be calculated as follows:
Scores: 12, 12, 15, 17, 20, 25, 25, 26, 33, 37, and 4
Q = (Q3 - Q1)/2
Since there are 11 scores arranged in ascending order, Q1 is the (N + 1)/4 = 3rd score and Q3 is the 3(N + 1)/4 = 9th score. So,
Q = (9th score – 3rd score)/2
= (33 – 15)/2
= 9
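The same calculation can be sketched in Python. Two assumptions are made here and are not stated in the text: the quartile positions are found with the (N + 1)/4 rule, which gives the 3rd and 9th scores when N = 11, and the final score in the list is taken as 40, a placeholder value. Any final score of 37 or more leaves the 3rd and 9th scores unchanged.

```python
def quartile_deviation(scores):
    """Quartile deviation of ungrouped scores via the (N + 1)/4 position rule.

    This sketch only handles the case where (N + 1) is a multiple of 4,
    so that Q1 and Q3 fall exactly on a score.
    """
    ordered = sorted(scores)
    n = len(ordered)
    if (n + 1) % 4 != 0:
        raise ValueError("sketch only handles N where (N + 1) is a multiple of 4")
    q1_position = (n + 1) // 4       # 3rd score when N = 11
    q3_position = 3 * (n + 1) // 4   # 9th score when N = 11
    q1 = ordered[q1_position - 1]    # positions are 1-based, indexes 0-based
    q3 = ordered[q3_position - 1]
    return (q3 - q1) / 2

# The last value (40) is an assumed placeholder, not a value from the text.
scores = [12, 12, 15, 17, 20, 25, 25, 26, 33, 37, 40]
print(quartile_deviation(scores))  # 9.0
```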
Quartile deviation from grouped data can be calculated as follows:
Class | Frequency | Cumulative frequency
120 – 139 | 50 | 1000
100 – 119 | 150 | 950
80 – 99 | 500 | 800
60 – 79 | 250 | 300
40 – 59 | 50 | 50
N = 1000
Q1 = l + [i (N/4 – Cum f1)] / fq
= 59.5 + [20 (250 – 50)] / 250
= 75.5
Q3 = l + [i (3N/4 – Cum f1)] / fq
= 79.5 + [20 (750 – 300)] / 500
= 97.5
Q = (Q3 – Q1) / 2
= (97.5 – 75.5) / 2
= 11
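The interpolation used for Q1 and Q3 can be written as a small Python function. This is a sketch under stated assumptions: the name `grouped_quartile` and the tuple layout are illustrative, and class limits are assumed to be whole scores, so the exact lower limit is the stated lower limit minus 0.5.

```python
def grouped_quartile(classes, fraction):
    """Interpolated quartile point for grouped data.

    classes:  list of (lower_score, upper_score, frequency), lowest class first.
    fraction: 0.25 for Q1, 0.75 for Q3.
    """
    n = sum(f for _, _, f in classes)
    target = fraction * n            # N/4 or 3N/4
    cum_f = 0                        # cumulative f below the current class
    for lower, upper, f in classes:
        if cum_f + f >= target:      # the quartile falls in this class
            exact_lower = lower - 0.5        # l
            width = upper - lower + 1        # i
            return exact_lower + width * (target - cum_f) / f
        cum_f += f
    raise ValueError("fraction must lie between 0 and 1")

# Frequency table from the worked example, lowest class first.
classes = [(40, 59, 50), (60, 79, 250), (80, 99, 500),
           (100, 119, 150), (120, 139, 50)]

q1 = grouped_quartile(classes, 0.25)
q3 = grouped_quartile(classes, 0.75)
print(q1, q3, (q3 - q1) / 2)  # 75.5 97.5 11.0
```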
USE OF QUARTILE DEVIATION
a) When the median is taken as a measure of central tendency.
b) When the details of the distribution at either end are not available, as in open-end distributions.
c) When there are scattered or extreme scores
which would influence the standard deviation disproportionately.
MERITS OF QUARTILE DEVIATION
a) It is a more representative and trustworthy measure of variability than the range.
b) It is a good index of score density at the middle of the distribution.
c) It is useful in indicating the skewness of a distribution.
d) Like the median, it is applicable to open-end distributions.
DEMERITS OF QUARTILE DEVIATION
a) It is not capable of further algebraic treatment.
b) It is possible for two distributions to have equal quartile deviation, but quite dissimilar variability at the lower and upper 25% scores. This may lead to incorrect conclusions.
c) It is unduly affected by a considerable clustering of scores at any one end of a distribution.
3) THE STANDARD DEVIATION
The standard deviation or SD is the most stable index of variability and is customarily employed in experimental work and in research studies. The SD differs from the average deviation in several respects. In computing the average deviation, we disregard signs and treat all deviations as positive, whereas in finding the SD we avoid the difficulty of signs by squaring the separate deviations. Again, the squared deviations used in computing the SD are always taken from the mean, never from the median or mode. The conventional symbol for the SD is the Greek letter sigma (σ).
The formula for calculating the SD of a few ungrouped scores is
σ = √(∑x² / N)
where x is the deviation of a score from the mean and N is the number of scores.
Let us find the SD from the following scores:
16, 18, 20, 22, 24
The mean of the scores = (16 + 18 + 20 + 22 + 24)/5 = 20
Scores | Mean | Deviation from the mean (x) | Square of deviation (x²)
16 | 20 | -4 | 16
18 | 20 | -2 | 4
20 | 20 | 0 | 0
22 | 20 | 2 | 4
24 | 20 | 4 | 16
N = 5 | ∑x² = 40
σ = √(∑x²/N)
= √(40/5)
= 2.83
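The steps of this calculation (mean, deviations, squares, average, square root) can be sketched in Python as follows. The function name `standard_deviation` is an illustrative choice; the divisor is N, as in the formula used here.

```python
import math

def standard_deviation(scores):
    """SD as the square root of the mean squared deviation from the mean."""
    n = len(scores)
    mean = sum(scores) / n
    squared_deviations = [(s - mean) ** 2 for s in scores]  # the x² column
    return math.sqrt(sum(squared_deviations) / n)

scores = [16, 18, 20, 22, 24]
print(round(standard_deviation(scores), 2))  # 2.83
```

Python's standard library function `statistics.pstdev` also divides by N and should give the same value for these scores.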
SD from the grouped data can be obtained by the formula:
σ = √(∑fx²/N)
Class | Midpoint (X) | Frequency (f) | fX | Deviation from mean (x) | fx | fx²
0-2 | 1 | 1 | 1 | -10 | -10 | 100
3-5 | 4 | 3 | 12 | -7 | -21 | 147
6-8 | 7 | 1 | 7 | -4 | -4 | 16
9-11 | 10 | 2 | 20 | -1 | -2 | 2
12-14 | 13 | 3 | 39 | 2 | 6 | 12
15-17 | 16 | 3 | 48 | 5 | 15 | 75
18-20 | 19 | 2 | 38 | 8 | 16 | 128
N = 15 | ∑fX = 165 | ∑fx² = 480
Mean = ∑fX / N
= 165 / 15 = 11
σ = √(∑fx²/N)
= √(480/15)
= 5.66
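The grouped calculation can be reproduced with a short Python sketch that works from the class midpoints and frequencies in the table. The function name `grouped_sd` is an illustrative assumption.

```python
import math

def grouped_sd(midpoints, freqs):
    """Return (mean, SD) for grouped data using class midpoints."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n       # sum(fX) / N
    sum_fx2 = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs))
    return mean, math.sqrt(sum_fx2 / n)

# Midpoints and frequencies from the table above.
midpoints = [1, 4, 7, 10, 13, 16, 19]
freqs = [1, 3, 1, 2, 3, 3, 2]

mean, sd = grouped_sd(midpoints, freqs)
print(mean, round(sd, 2))  # 11.0 5.66
```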
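If frequencies are large, this procedure may involve complex calculations. So a short cut method can be used to calculate SD using the formula:
σ = i × √[(∑fX²/N) – (∑fX/N)²]
where X is the deviation of each class from an assumed mean class, counted in class-interval steps, and i is the length of the interval.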
Class | Frequency (f) | Deviation (X) | fX | fX²
50-54 | 3 | 4 | 12 | 48
45-49 | 4 | 3 | 12 | 36
40-44 | 5 | 2 | 10 | 20
35-39 | 8 | 1 | 8 | 8
30-34 | 10 | 0 | 0 | 0
25-29 | 6 | -1 | -6 | 6
20-24 | 4 | -2 | -8 | 16
15-19 | 4 | -3 | -12 | 36
10-14 | 3 | -4 | -12 | 48
5-9 | 3 | -5 | -15 | 75
Sum of positive fX = +42, sum of negative fX = -53
i = 5 | N = 50 | ∑fX = -11 | ∑fX² = 293
σ = i × √[(∑fX²/N) – (∑fX/N)²]
= 5 × √[(293/50) – (-11/50)²]
= 5 × √5.81 = 12.05
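The short cut method can be sketched in Python using the frequencies and step deviations from the table. The function name `shortcut_sd` and its parameter names are illustrative assumptions.

```python
import math

def shortcut_sd(freqs, steps, interval):
    """SD by the short cut (step deviation) method.

    freqs:    class frequencies (f)
    steps:    deviation of each class from the assumed mean class,
              in class-interval units (the X column)
    interval: length of the class interval (i)
    """
    n = sum(freqs)
    sum_fx = sum(f * d for f, d in zip(freqs, steps))
    sum_fx2 = sum(f * d * d for f, d in zip(freqs, steps))
    return interval * math.sqrt(sum_fx2 / n - (sum_fx / n) ** 2)

# Values from the table above, highest class first.
freqs = [3, 4, 5, 8, 10, 6, 4, 4, 3, 3]
steps = [4, 3, 2, 1, 0, -1, -2, -3, -4, -5]

print(round(shortcut_sd(freqs, steps, 5), 2))  # 12.05
```

Because the correction term (∑fX/N)² is squared, the sign of ∑fX does not affect the result.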
USE OF STANDARD DEVIATION
a) When the statistic having the greatest stability is sought.
b) When extreme deviations should exercise a proportionally greater effect upon variability.
c) When coefficients of correlation and other statistics are subsequently to be computed.
MERITS OF STANDARD DEVIATION
a) It is well defined and its value is always definite.
b) It is based on all the scores in the data.
c) It is amenable to algebraic treatment and possesses many useful mathematical properties, which is why it is used in many advanced statistical studies.
d) It is less affected by fluctuations in sampling than most other measures of variability.
DEMERITS OF STANDARD DEVIATION
a) Statistical interpretation using SD is comparatively difficult.
b) It gives more weightage to extreme scores and less to those which are near the mean, because the squares of the deviations are taken. These squares become very large as the deviations increase.
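The extra weight that squaring gives to extreme scores can be seen in a small Python sketch. It reuses the scores 16, 18, 20, 22, 24 from the ungrouped SD example and compares the share of the total deviation contributed by the two extreme scores before and after squaring; the variable names are illustrative.

```python
scores = [16, 18, 20, 22, 24]
mean = sum(scores) / len(scores)

abs_devs = [abs(s - mean) for s in scores]   # 4, 2, 0, 2, 4
sq_devs = [(s - mean) ** 2 for s in scores]  # 16, 4, 0, 4, 16

# Share of the total contributed by the lowest and highest scores.
extreme_share_abs = (abs_devs[0] + abs_devs[-1]) / sum(abs_devs)
extreme_share_sq = (sq_devs[0] + sq_devs[-1]) / sum(sq_devs)

print(round(extreme_share_abs, 2), round(extreme_share_sq, 2))  # 0.67 0.8
```

The two extreme scores account for about two-thirds of the total absolute deviation but four-fifths of the total squared deviation.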
SUMMARY
There is a tendency for data to be dispersed, scattered or to show variability around the average or the central value. This tendency is known as dispersion or variability. The range is the simplest but a very rough measure of variability. It is the difference between the highest and the lowest scores of the series, and thus depends only on the position of the two extreme scores. As such, it is not reliable.
Quartile deviation is defined as the semi-interquartile range and is computed by the formula
Q = (Q3 – Q1) / 2
where Q1 and Q3 represent the first and third quartiles of the distribution. It is more stable than the range, but it also fails to take into account the fluctuations of all the items in the series. Standard deviation, denoted by the symbol σ, is the square root of the arithmetic average of the squared deviations of scores from the mean of the distribution. It is regarded as the most stable and reliable measure of variability.
For ungrouped data, the formula for SD is σ = √(∑x²/N). For grouped data, the formula is σ = √(∑fx²/N), and the shortcut formula is σ = i × √[(∑fX²/N) – (∑fX/N)²].
When further statistics are to be computed from a measure of dispersion, the standard deviation (SD) is preferred to all other measures of variability.