Tuesday, 3 January 2017

MEASURES OF VARIABILITY



INTRODUCTION
                          
The three measures of central tendency (the mean, the median, and the mode) each give a single value that is typical or representative of a set of scores as a whole. The next step is to find some measure of the variability of the scores, that is, of the “scatter” or “spread” of the separate scores around their central tendency. If a group is homogeneous, that is, made up of individuals of nearly the same ability, most of its scores will fall around the same point on the scale, the range will be relatively short and the variability small. But if the group contains individuals of widely differing capacities, scores will be strung out from high to low, the range will be relatively wide and the variability large.
Four measures have been devised to indicate the variability or dispersion within a set of scores: (1) the range, (2) the quartile deviation or Q, (3) the standard deviation or SD, and (4) the average deviation. Here we focus on three of them: the range, the quartile deviation, and the standard deviation.


CALCULATION OF MEASURES OF DISPERSION

1)    THE RANGE
The range may be defined as the interval between the highest and lowest scores. The range is the most general measure of spread or scatter, and is computed when we wish to make a rough comparison of two or more groups for variability. The range takes account of the extremes of the series of scores only, and is unreliable when N is small or when there are large gaps (zero f’s) in the frequency distribution. Suppose that the highest score in a distribution is 120, that there is a gap of 20 points before we reach the next score of 100, and that the lowest score is 60. The single high score of 120 then increases the range from 40 (100 - 60) to 60 (120 - 60).
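The effect of a single extreme score on the range can be sketched in a few lines of Python. This is only an illustration: the individual scores are invented, and only the values 60, 100 and 120 come from the example in the text.

```python
def score_range(scores):
    """Return the range: the highest score minus the lowest score."""
    return max(scores) - min(scores)


# Hypothetical scores between 60 and 100 (invented for illustration).
scores = [60, 72, 75, 81, 88, 93, 100]

range_without_extreme = score_range(scores)        # 100 - 60
range_with_extreme = score_range(scores + [120])   # 120 - 60

print(range_without_extreme, range_with_extreme)
```

Adding one score of 120 changes nothing about the other seven scores, yet the range grows by half.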

  USE OF RANGE
a)                 When the data are too scant or too scattered to justify the computation of a more precise measure of variability.
b)                When a knowledge of extreme scores or of total spread is all that is wanted.

   MERITS OF RANGE
                 It can be easily calculated and understood.
  

DEMERITS OF RANGE
a)                 It helps us to make only a rough comparison of two or more groups with respect to the variability of the scores concerned.
b)                It is very greatly affected by fluctuations in sampling. Its value is never stable.
c)                 It takes into account only the two extreme end scores of a distribution and is unreliable when N is small or when there are large gaps in the frequency distribution.
d)                The range does not take into account the composition of a group or the nature of the distribution of the scores within the extremes. The range of a symmetrical and an asymmetrical distribution can be identical.
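The last demerit can be seen with two small sets of scores. Both sets below are hypothetical and chosen only to make the point: one is evenly spread, the other is crowded at the low end, yet the range is the same.

```python
def score_range(scores):
    """Return the range: the highest score minus the lowest score."""
    return max(scores) - min(scores)


# Hypothetical data: same extremes, very different shapes.
symmetrical = [10, 20, 30, 40, 50]    # evenly spread around 30
asymmetrical = [10, 11, 12, 13, 50]   # clustered near the lowest score

print(score_range(symmetrical), score_range(asymmetrical))
```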

2)  THE QUARTILE DEVIATION
                            The quartile deviation or Q is one-half the scale distance between the 75th and 25th percentiles in a frequency distribution. The 25th percentile or Q1 is the first quartile on the score scale, the point below which lie 25% of the scores. The 75th percentile or Q3 is the third quartile on the score scale, the point below which lie 75% of the scores. Quartile deviation is expressed by the formula:
                            Q = (Q3-Q1)/2
To find Q, we must first compute the 75th and 25th percentiles. These statistics are found in exactly the same way as the median, which is the 50th percentile or Q2. The only difference is that N/4 is counted off from the low end of the distribution to find Q1 and 3N/4 is counted off to find Q3. The formulas are:
                            Q1 = l + [i (N/4 - Cum f1)] / fq
                            Q3 = l + [i (3N/4 - Cum f1)] / fq
where        l = the exact lower limit of the interval in which the quartile falls,
             i = the length of the interval,
        Cum f1 = the cumulative f up to (below) the interval which contains the quartile,
            fq = the f on the interval containing the quartile.

Quartile deviation of the ungrouped data can be calculated as follows:

Scores: 12, 12, 15, 17, 20, 25, 25, 26, 33, 37, and 4
                  Q = (Q3 - Q1)/2
Since there are 11 scores, Q1 is the 3rd score, because (N + 1)/4 = 3, and Q3 is the 9th score, because 3(N + 1)/4 = 9. So,
                  Q = (9th score - 3rd score)/2
                    = (33 - 15)/2
                    = 9
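The same calculation can be sketched in Python. Two assumptions are mine rather than the text's: the positions are taken by the (N + 1)/4 and 3(N + 1)/4 rule, which is one common convention and happens to give the 3rd and 9th scores here, and the final score is entered as 40, a hypothetical value chosen only to keep the list in ascending order (it does not affect Q1 or Q3).

```python
def quartile_deviation(scores):
    """Q = (Q3 - Q1) / 2, taking Q1 and Q3 at positions (N+1)/4 and 3(N+1)/4.

    Minimal sketch: it only handles the case where N + 1 is a multiple
    of 4, so that both positions are whole numbers.
    """
    ordered = sorted(scores)
    n = len(ordered)
    if (n + 1) % 4 != 0:
        raise ValueError("this sketch needs N + 1 to be a multiple of 4")
    q1 = ordered[(n + 1) // 4 - 1]        # 1-based position -> 0-based index
    q3 = ordered[3 * (n + 1) // 4 - 1]
    return (q3 - q1) / 2


# The last value (40) is a hypothetical stand-in for the final score.
scores = [12, 12, 15, 17, 20, 25, 25, 26, 33, 37, 40]
q = quartile_deviation(scores)
print(q)
```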
           
Quartile deviation from grouped data can be calculated as follows:
Class          Frequency     Cumulative frequency
120 – 139          50               1000
100 – 119         150                950
80 – 99           500                800
60 – 79           250                300
40 – 59            50                 50
               N = 1000
                                                                      
                                               
                                    Q1 = l + [i (N/4 – Cum f1)]/ fq
                                          = 59.5 + [20 (250 - 50)] / 250
                                          = 75.5
                                  
                                    Q3 = l + [i (3N/4 – Cum f1)] / fq
                                          = 79.5 + [20 (750 – 300)] / 500
                                          = 97.5
                                                                       
                                      Q = (Q3 – Q1) / 2
                                          = (97.5 – 75.5) / 2
                                          = 11
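The grouped calculation above can be checked with a short Python sketch of the interpolation formula. The function name and the way the table is passed in (pairs of exact lower limit and frequency, lowest class first) are my own choices for illustration.

```python
def grouped_quartile_point(classes, fraction, interval):
    """Interpolate the score below which `fraction` of the N cases lie.

    `classes` is a list of (exact_lower_limit, frequency) pairs ordered
    from the lowest class to the highest. Implements
    l + i * (fraction * N - Cum f1) / fq.
    """
    n = sum(f for _, f in classes)
    target = fraction * n
    cum_below = 0
    for lower, f in classes:
        if f > 0 and cum_below + f >= target:
            return lower + interval * (target - cum_below) / f
        cum_below += f
    raise ValueError("fraction must lie between 0 and 1")


# Frequency table from the text, with exact lower limits (e.g. 59.5 for 60-79).
classes = [(39.5, 50), (59.5, 250), (79.5, 500), (99.5, 150), (119.5, 50)]

q1 = grouped_quartile_point(classes, 0.25, interval=20)
q3 = grouped_quartile_point(classes, 0.75, interval=20)
q = (q3 - q1) / 2
print(q1, q3, q)
```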

USE OF QUARTILE DEVIATION
a)    When the median is taken as a measure of central tendency.
b)    When the details of the distribution at either end are not available, that is, when the distribution is incomplete or open-ended.
c)   When there are scattered or extreme scores which would influence the standard deviation disproportionately.


MERITS OF QUARTILE DEVIATION
a)    It is a more representative and trustworthy measure of variability than the range.
b)   It is a good index of score density at the middle of the distribution.
c)    It is useful in indicating the skewness of a distribution.
d)   Like the median, it is applicable to open-end distributions.

DEMERITS OF QUARTILE DEVIATION
a)    It is not capable of further algebraic treatment.
b)   It is possible for two distributions to have equal quartile deviations but quite dissimilar variability in the lower and upper 25% of the scores. This may lead to incorrect conclusions.
c)    It is unduly affected by a considerable clustering of scores at any one end of a distribution.

3) THE STANDARD DEVIATION
                    The standard deviation or SD is the most stable index of variability and is customarily employed in experimental work and in research studies. The SD differs from the average deviation in several respects. In computing the average deviation, we disregard signs and treat all deviations as positive, whereas in finding the SD we avoid the difficulty of signs by squaring the separate deviations. Again, the squared deviations used in computing the SD are always taken from the mean, never from the median or mode. The conventional symbol for the SD is the Greek letter sigma (σ).
The formula for calculating the SD of a few scores is
                                                        σ = √(∑x² / N)
where x is the deviation of a score from the mean and N is the number of scores.
                                  Let us find the SD from the following scores:
                                                            16, 18, 20, 22, 24
                                  The mean of the scores = (16+18+20+22+24)/5= 20


Score     Mean     Deviation from the mean (x)     Square of deviation (x²)
16        20                  -4                              16
18        20                  -2                               4
20        20                   0                               0
22        20                   2                               4
24        20                   4                              16
N = 5                                                   ∑x² = 40

                                           σ = √(∑x² / N)
                                             = √(40 / 5)
                                             = √8
                                             = 2.83
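As a quick check of the worked example, here is a minimal Python sketch of the same steps: find the mean, square each deviation from it, average the squares, and take the square root. The function name is my own.

```python
import math


def standard_deviation(scores):
    """SD as defined in the text: square root of the mean squared deviation."""
    n = len(scores)
    mean = sum(scores) / n
    sum_of_squares = sum((score - mean) ** 2 for score in scores)  # sum of x squared
    return math.sqrt(sum_of_squares / n)                           # divide by N


sd = standard_deviation([16, 18, 20, 22, 24])
print(round(sd, 2))
```

Note that this divides by N, as the text does. Python's standard library function `statistics.pstdev` uses the same divisor, whereas `statistics.stdev` divides by N - 1.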

The SD from grouped data can be obtained by the formula:
                                         σ = √(∑fx² / N)
where x is the deviation of a class midpoint from the mean and f is the frequency of that class.

Class     Midpoint (X)     Frequency (f)     fX      Deviation from mean (x)     fx      fx²
0-2            1                1             1              -10                 -10     100
3-5            4                3            12               -7                 -21     147
6-8            7                1             7               -4                  -4      16
9-11          10                2            20               -1                  -2       2
12-14         13                3            39                2                   6      12
15-17         16                3            48                5                  15      75
18-20         19                2            38                8                  16     128
                           N = 15       ∑fX = 165                                  ∑fx² = 480


                                                          Mean = ∑fX / N
                                                               = 165 / 15 = 11

                                                          σ    = √(∑fx² / N)
                                                               = √(480 / 15)
                                                               = √32 = 5.66
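The grouped-data calculation can be reproduced with a short Python sketch that works from the midpoints and frequencies in the table. The function name and argument layout are my own choices for illustration.

```python
import math


def grouped_mean_and_sd(midpoints, frequencies):
    """Return (mean, SD) for grouped data, using class midpoints as scores."""
    n = sum(frequencies)
    mean = sum(f * x for x, f in zip(midpoints, frequencies)) / n      # sum fX / N
    sum_fx2 = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies))
    return mean, math.sqrt(sum_fx2 / n)                                # sqrt(sum fx^2 / N)


midpoints = [1, 4, 7, 10, 13, 16, 19]
frequencies = [1, 3, 1, 2, 3, 3, 2]

mean, sd = grouped_mean_and_sd(midpoints, frequencies)
print(mean, round(sd, 2))
```

For this table the sketch gives a mean of 11 and an SD of about 5.66, the square root of 480/15.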

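If the frequencies are large, this procedure may involve laborious calculations. A short-cut method can then be used, in which the deviations are taken from an assumed mean and counted in units of the class interval:
                                    σ = i × √[(∑fX² / N) - (∑fX / N)²]
where X is the deviation of a class from the class containing the assumed mean, counted in class intervals, f is the frequency of the class, and i is the length of the interval.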



Class      Frequency (f)     Deviation (X)      fX      fX²
50-54            3                 4            12       48
45-49            4                 3            12       36
40-44            5                 2            10       20
35-39            8                 1             8        8
                                              (+42)
30-34           10                 0             0        0
25-29            6                -1            -6        6
20-24            4                -2            -8       16
15-19            4                -3           -12       36
10-14            3                -4           -12       48
5-9              3                -5           -15       75
                                              (-53)
i = 5        N = 50                       ∑fX = -11    ∑fX² = 293


                                                          σ = i × √[(∑fX² / N) - (∑fX / N)²]
                                                            = 5 × √[(293/50) - (-11/50)²]
                                                            = 5 × √5.81 = 12.05
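The short-cut calculation can be verified with a small Python sketch. It takes the frequencies and the step deviations from the table (deviations counted in class intervals from the class with X = 0); the function name is my own.

```python
import math


def shortcut_sd(frequencies, step_deviations, interval):
    """SD by the short-cut method: i * sqrt(sum fX^2 / N - (sum fX / N)^2)."""
    n = sum(frequencies)
    sum_fx = sum(f * d for f, d in zip(frequencies, step_deviations))
    sum_fx2 = sum(f * d * d for f, d in zip(frequencies, step_deviations))
    return interval * math.sqrt(sum_fx2 / n - (sum_fx / n) ** 2)


# Rows of the table, from the highest class to the lowest.
frequencies = [3, 4, 5, 8, 10, 6, 4, 4, 3, 3]
step_deviations = [4, 3, 2, 1, 0, -1, -2, -3, -4, -5]

sd = shortcut_sd(frequencies, step_deviations, interval=5)
print(round(sd, 2))
```

Because the correction term (∑fX / N) is squared, the sign of ∑fX does not affect the result.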
USE OF STANDARD DEVIATION
a)    When the statistic having the greatest stability is sought.
b)    When extreme deviations should exercise a proportionally greater effect upon variability.
c)   When coefficients of correlation and other statistics are subsequently to be computed.

 MERITS OF STANDARD DEVIATION
a)     It is well defined and its value is always definite.
b)    It is based on all the scores in the data.
c)     It is amenable to algebraic treatment and possesses many useful mathematical properties, which is why it is used in many advanced statistical studies.
d)    It is less affected by fluctuations in sampling than most other measures of variability.

DEMERITS OF STANDARD DEVIATION
a)                 Statistical interpretation using SD is comparatively difficult.
b)                It gives more weightage to extreme scores and less to those which are near the mean, because the squares of the deviations are taken. These squares will become very large as the deviations increase.

SUMMARY
                            There is a tendency for data to be dispersed, scattered or to show variability around the average or the central value. This tendency is known as dispersion or variability. The range is the simplest but a very rough measure of variability. It is the difference between the highest and the lowest scores of the series; since it depends only on the position of the two extreme scores, it is not reliable.

The quartile deviation is defined as the semi-interquartile range and is computed by the formula
                                    Q = (Q3 - Q1) / 2
where Q1 and Q3 represent the first and third quartiles of the distribution. It is more stable than the range, but it also fails to take into account the fluctuations of all the items in the series. The standard deviation, denoted by the symbol σ, is the square root of the arithmetic average of the squared deviations of the scores from the mean of the distribution. It is regarded as the most stable and reliable measure of variability.

For ungrouped data, the formula for the SD is σ = √(∑x² / N).

For grouped data, the formula for the SD is σ = √(∑fx² / N).

The short-cut formula is σ = i × √[(∑fX² / N) - (∑fX / N)²].

                      When further statistics are to be computed from a measure of dispersion, the standard deviation (SD) is preferred to all other measures of variability.

REFERENCES
a)                 Henry E. Garrett, Statistics in Psychology and Education.
b)                S. K. Mangal, Statistics in Psychology and Education.


