Degree of spread or variation of the variable about a central value
To know how widely the observations are spread on either side of the average
Range
Simplest method
Difference between the value of the largest item and the smallest item
Mean Deviation
Average of the absolute deviations from the arithmetic mean
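A minimal sketch of the two measures above, assuming a small made-up list of readings (the data and the function names are illustrative only). Absolute deviations are used for the mean deviation, since signed deviations from the arithmetic mean always sum to zero.

```python
from statistics import mean


def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)


def mean_deviation(values):
    """Mean deviation: average absolute deviation from the arithmetic mean."""
    m = mean(values)
    return sum(abs(x - m) for x in values) / len(values)


# Hypothetical readings, chosen only for illustration.
readings = [2, 4, 4, 4, 5, 5, 7, 9]

print(value_range(readings))     # 7
print(mean_deviation(readings))  # 1.5
```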
Standard Deviation
Most common and most appropriate measure of dispersion
Root-mean-square (RMS) deviation of the values from their mean
Equal to the square root of the variance
Interpretation of SD
Large SD: Data points are far from the mean
Small SD: Data points are clustered closely around the mean
Uses of SD in biostatistics
– Summarizes the deviation of a large distribution from mean
– Indicates whether the difference of an individual value from the mean is likely to be due to chance
– Helps in finding the standard error
– Helps in finding the suitable size of sample for valid conclusions
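The RMS definition of SD can be sketched as follows, assuming the same kind of made-up readings as an illustration. The function `rms_deviation` is a hypothetical helper that divides by n (population SD); Python's `statistics.stdev` divides by n − 1 instead, which is the usual choice when the data are a sample.

```python
import math
from statistics import mean, pstdev, stdev


def rms_deviation(values):
    """SD as the square root of the mean squared deviation from the mean."""
    m = mean(values)
    variance = sum((x - m) ** 2 for x in values) / len(values)
    return math.sqrt(variance)


# Hypothetical readings, chosen only for illustration.
readings = [2, 4, 4, 4, 5, 5, 7, 9]

print(rms_deviation(readings))  # population SD (divides by n)
print(pstdev(readings))         # same quantity from the standard library
print(stdev(readings))          # sample SD (divides by n - 1), slightly larger
```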
Coefficient of Variation (COV)
Used to compare relative variability
Unit-free measure to compare dispersion of one variable with another
COV = SD/Mean × 100
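As an illustration of why a unit-free measure is useful, the sketch below compares two variables recorded in different units. The heights and weights are invented for the example, and the population SD is assumed.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """COV = SD / Mean x 100, expressed as a percentage."""
    return pstdev(values) / mean(values) * 100


# Hypothetical measurements in different units.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

cov_heights = coefficient_of_variation(heights_cm)
cov_weights = coefficient_of_variation(weights_kg)

# Both lists have the same SD, but weight varies more relative to its mean.
print(round(cov_heights, 1), round(cov_weights, 1))
```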
Standard Error of Mean (SE mean)
measure of variability of sample summaries
SE mean is the SD of the sample means
measure of chance variation
It does not mean an error
SE mean = SD/√Sample size
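A short sketch of the formula above; the SD and sample sizes are hypothetical. It shows that quadrupling the sample size halves the SE mean.

```python
import math


def standard_error_of_mean(sd, sample_size):
    """SE mean = SD / sqrt(sample size)."""
    return sd / math.sqrt(sample_size)


# Hypothetical SD of 10 units with two different sample sizes.
print(standard_error_of_mean(10, 25))   # 2.0
print(standard_error_of_mean(10, 100))  # 1.0
```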
Z Score (Standard Score)
aka Normal deviate
Is the difference of a value from the group mean, expressed as a multiple of the SD
Z score = (Individual level − Mean)/Standard deviation
Indicates how many SDs an observation is above or below the mean
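The Z score formula can be sketched as below. The group mean of 100 and SD of 15 are assumed values used only for illustration.

```python
def z_score(value, mean, sd):
    """Z score = (individual level - mean) / standard deviation."""
    return (value - mean) / sd


# Hypothetical group with mean 100 and SD 15.
print(z_score(130, 100, 15))  # 2.0  -> two SDs above the mean
print(z_score(85, 100, 15))   # -1.0 -> one SD below the mean
```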
DISTRIBUTIONS
Poisson Distribution
Discrete probability distribution
Gives the probability of a given no. of events occurring in a fixed period of time
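The notes do not give the formula; the standard Poisson probability is P(k) = λ^k × e^(−λ) / k!, where λ is the average no. of events per period. A minimal sketch, with an assumed rate of 2 events per period:

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k events when the average per period is `rate`."""
    return rate ** k * math.exp(-rate) / math.factorial(k)


# Hypothetical average of 2 events per period.
for k in range(5):
    print(k, round(poisson_pmf(k, 2), 4))
```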
Normal Distribution
aka Gaussian distribution
Distribution of values of a quantitative variable such that they are symmetric with respect to a middle value & then the frequencies taper off rapidly and symmetrically on both sides
Bell shaped distribution
Mean= Median = Mode
Bilaterally symmetrical curve
For the standard normal distribution (Z distribution):
Mean = 0
SD = 1
SD = √Variance
In the standard normal distribution, SD = 1, thus Variance = 1
• Mean ± 1SD (μ ± 1σ) covers about 68.3% of values
• Mean ± 2SD (μ ± 2σ) covers about 95.4% of values
• Mean ± 3SD (μ ± 3σ) covers about 99.7% of values
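The coverage figures can be checked numerically. For a normal distribution, the fraction of values within k SDs of the mean is erf(k/√2); the sketch below assumes only Python's standard `math.erf`.

```python
import math


def coverage_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"mean ± {k} SD covers {coverage_within(k) * 100:.1f}% of values")
```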
Skewness of Central Tendency
Measure of asymmetry of a probability distribution of a random variable
Measures of skewness
Pearson’s mode or 1st Skewness coefficient = (Mean – Mode)/SD
Pearson’s median or 2nd Skewness coefficient = 3(Mean – Median)/SD
Quartile skewness = (Q3 – 2Q2 + Q1)/(Q3 – Q1)
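A sketch of the three skewness measures, assuming a small invented data set with a long right tail. The population SD is assumed, and quartiles use the default method of `statistics.quantiles`; other quartile conventions give slightly different values.

```python
from statistics import mean, median, mode, pstdev, quantiles


def pearson_mode_skewness(values):
    """1st coefficient: (mean - mode) / SD."""
    return (mean(values) - mode(values)) / pstdev(values)


def pearson_median_skewness(values):
    """2nd coefficient: 3 x (mean - median) / SD."""
    return 3 * (mean(values) - median(values)) / pstdev(values)


def quartile_skewness(values):
    """(Q3 - 2*Q2 + Q1) / (Q3 - Q1)."""
    q1, q2, q3 = quantiles(values, n=4)
    return (q3 - 2 * q2 + q1) / (q3 - q1)


# Hypothetical data with a long right tail (mean 4.5, median 3, mode 2).
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]

print(pearson_mode_skewness(right_skewed))
print(pearson_median_skewness(right_skewed))
print(quartile_skewness(right_skewed))
```

All three values come out positive for this data set, and a symmetric list such as 1, 2, 3, 4, 5 gives zero for the median and quartile coefficients.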
Asymmetrical distributions
Right (positive) skew
Right skewed curve
Mean > Median > Mode
Left (negative) skew
Left skewed curve
Mean < Median < Mode
SAMPLING
Sample size depends upon
-The effect size (usually the difference between 2 groups)
- The population SD (for continuous data)
- The desired power of the experiment to detect the postulated effect
Power = 1 – beta
-The significance level (alpha)
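The four factors above can be combined in a sample size formula. The notes do not state one; the sketch below assumes a commonly used normal-approximation formula for comparing the means of two equal-sized groups with a two-sided test: n per group = 2 × (z for alpha/2 + z for power)² × SD² / (effect size)². The function name and the default alpha and power are illustrative choices.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, sd, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided comparison of two means.

    Assumed formula (normal approximation, equal group sizes):
    n = 2 * ((z_alpha + z_beta) * sd / effect_size) ** 2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance level
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    n = 2 * ((z_alpha + z_beta) * sd / effect_size) ** 2
    return math.ceil(n)


# Hypothetical study: detect a difference of 5 units when the SD is 10.
print(sample_size_per_group(effect_size=5, sd=10))  # 63
```

A smaller effect size, a larger SD, a higher power, or a stricter alpha each increase the required sample size.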
Types of Sampling
Random Sampling
aka Probability sampling, Non-purposive sampling
Non-random Sampling
aka Non-probability sampling, Purposive sampling
Types of Random Sampling
Simple Random Sampling
Every unit of population has equal and known chance of being selected
aka Unrestricted random sampling
Applicable for small, homogeneous and readily available populations
Used in clinical trials
Methods
Lottery method
Random no. tables
Computer software
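The computer software method can be sketched with Python's standard `random` module. The patient ID list is hypothetical, and the seed is fixed only to make the illustration repeatable.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Draw units without replacement; every unit has an equal chance."""
    rng = random.Random(seed)
    return rng.sample(population, sample_size)


# Hypothetical sampling frame of 100 patient IDs.
patient_ids = list(range(1, 101))
chosen = simple_random_sample(patient_ids, 10, seed=42)
print(chosen)
```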
Systematic Random Sampling
Based on sampling fraction
Every Kth unit in the population list is chosen, where K is the sampling interval
Sampling Interval (K) = Total no. of units in population/Total no. of units in sample
Applicable for large, non-homogeneous populations where a complete list of individuals is available
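A minimal sketch of systematic sampling from a complete list. It assumes a random starting point within the first interval, which the notes do not mention but is the usual practice; the population list is hypothetical.

```python
import random


def systematic_sample(population, sample_size, seed=None):
    """Pick every Kth unit, where K = population size // sample size."""
    k = len(population) // sample_size   # sampling interval K
    rng = random.Random(seed)
    start = rng.randrange(k)             # assumed random start in first interval
    return [population[start + i * k] for i in range(sample_size)]


# Hypothetical list of 1000 units, sample of 50, so K = 20.
units = list(range(1, 1001))
picked = systematic_sample(units, 50, seed=1)
print(picked[:5])
```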
Stratified Random Sampling
Non-homogeneous population is divided into homogeneous groups/classes (strata); a sample is drawn from each stratum at random, in proportion to its size
Applicable for large non-homogeneous population
Gives more representative sample than simple random sampling
None of the categories is under or over-represented
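Proportional allocation can be sketched as below. The strata names and sizes are invented, and each stratum's share is rounded to a whole number, so with awkward proportions the total may differ slightly from the requested sample size.

```python
import random


def stratified_sample(strata, total_sample_size, seed=None):
    """Draw a simple random sample from each stratum, in proportion to its size."""
    rng = random.Random(seed)
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Share of the sample allotted to this stratum (rounded).
        share = round(total_sample_size * len(units) / population_size)
        sample[name] = rng.sample(units, share)
    return sample


# Hypothetical population: 600 urban and 400 rural residents.
strata = {
    "urban": list(range(600)),
    "rural": list(range(600, 1000)),
}
sample = stratified_sample(strata, 50, seed=3)
print({name: len(units) for name, units in sample.items()})
```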
Multistage Random Sampling
Is done in successive stages; each successive sampling unit is nested in the previous sampling unit
Eg, In large country surveys, states are chosen, then districts, then villages, then every 10th person in village as final sampling unit
Advantage: Introduces flexibility in sampling
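The nesting of stages in the example above can be sketched as follows. The sampling frame (4 states, 3 districts each, 3 villages each, 30 residents each) is entirely hypothetical, as are the numbers drawn at each stage.

```python
import random


def multistage_sample(country, n_states, n_districts, n_villages,
                      every_nth=10, seed=None):
    """States -> districts -> villages -> every nth resident."""
    rng = random.Random(seed)
    selected = []
    for state in rng.sample(sorted(country), n_states):
        districts = country[state]
        for district in rng.sample(sorted(districts), n_districts):
            villages = districts[district]
            for village in rng.sample(sorted(villages), n_villages):
                residents = villages[village]
                # Final stage: every nth person on the village list.
                selected.extend(residents[every_nth - 1::every_nth])
    return selected


# Hypothetical nested sampling frame.
country = {
    f"state{s}": {
        f"district{d}": {
            f"village{v}": [f"s{s}-d{d}-v{v}-p{p}" for p in range(1, 31)]
            for v in range(3)
        }
        for d in range(3)
    }
    for s in range(4)
}

people = multistage_sample(country, n_states=2, n_districts=2, n_villages=1, seed=1)
print(len(people))  # 2 states x 2 districts x 1 village x 3 residents = 12
```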
Multiphase Random Sampling
Is done in successive phases; part of information is obtained from whole sample and part from the sub-sample
Eg, In a TB survey, Mantoux test done in first phase, then X-ray done in all Mantoux positives, then sputum examined in all those with positive X-ray findings
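The TB survey example can be sketched as a sequence of filters, where each phase examines only those positive in the previous phase. The records and field names are hypothetical.

```python
def multiphase_survey(sample):
    """Return who is examined in each phase of the TB survey example."""
    mantoux_tested = list(sample)  # phase 1: whole sample
    xray_tested = [p for p in mantoux_tested if p["mantoux_positive"]]
    sputum_tested = [p for p in xray_tested if p["xray_positive"]]
    return mantoux_tested, xray_tested, sputum_tested


# Hypothetical records; xray_positive is None when no X-ray was taken.
survey = [
    {"id": 1, "mantoux_positive": True, "xray_positive": True},
    {"id": 2, "mantoux_positive": True, "xray_positive": False},
    {"id": 3, "mantoux_positive": False, "xray_positive": None},
    {"id": 4, "mantoux_positive": True, "xray_positive": True},
]

phase1, phase2, phase3 = multiphase_survey(survey)
print(len(phase1), len(phase2), len(phase3))  # 4 3 2
```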
Cluster Random Sampling
Applicable when units of population are natural groups or clusters
WHO technique used in CRS:
30 × 7 technique (total = 210 children)
* 30 clusters, each containing
* 7 children aged 12–23 months, who are assessed for completion of primary immunization (till Measles vaccine)
Clusters are heterogeneous within themselves but homogeneous with respect to each other
Sampling interval is also calculated in CRS
Accuracy: Low error rate of only ± 5%
Limitation: Clusters cannot be compared with each other.
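One way the sampling interval is used in cluster selection is probability-proportional-to-size selection: villages are listed with cumulative populations, the interval is total population / 30, and a cluster is taken wherever a random start plus successive intervals falls. The notes do not spell this procedure out, so the sketch below is an assumption about the method, and the village populations are invented.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def select_clusters(populations, n_clusters=30, seed=None):
    """Pick clusters with probability proportional to population size."""
    names = list(populations)
    cumulative = list(accumulate(populations[name] for name in names))
    interval = cumulative[-1] / n_clusters   # sampling interval
    rng = random.Random(seed)
    start = rng.random() * interval          # random start within first interval
    chosen = []
    for i in range(n_clusters):
        point = start + i * interval
        # First village whose cumulative population reaches the point.
        index = min(bisect_left(cumulative, point), len(names) - 1)
        chosen.append(names[index])
    return chosen


# Hypothetical list of 60 villages with their populations.
populations = {f"village{i}": 100 + 10 * i for i in range(60)}
clusters = select_clusters(populations, seed=7)
print(len(clusters), "clusters x 7 children =", len(clusters) * 7)
```

A village much larger than the sampling interval can be picked more than once under this scheme, in which case it contributes more than one cluster.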
Types of Non-Random Sampling
Convenience Sampling
Patients are selected, in part or in whole, at the convenience of the researcher
Limited attempt to ensure that sample is an accurate representation of population
Quota Sampling
Population is first segmented into mutually exclusive sub-groups (quotas), then judgment is used to select the units from each group non-randomly
Type of convenience sampling
Snow-ball Sampling
A technique for developing a research sample where existing study subjects recruit future subjects from among their acquaintances.
Thus the sample group appears to grow like a rolling snowball
Is often used in hidden populations which are difficult for researchers to access