What is a statistic?
A statistic is any quantity calculated from the data in a sample, such as the mean or the standard deviation, that characterises an important aspect of the sample. Statistics are useful mathematical tools for data analysis, so useful that statistical analyses are performed by individuals from many disciplines – from those in STEM fields, such as people who conduct clinical studies, to those in finance and management.
Why bother with statistics?
All experimental measurements contain some variability, so when one makes experimental measurements, one should question their reliability. No conclusion can be drawn with 100% certainty, because doing so would require an infinite number of measurements. Statistical analysis gives us tools to determine estimates from a finite number of measurements, and from these estimates we can draw conclusions at stated levels of probability.
Data Analysis using Statistics
Analysing a large number of experimental measurements by hand can be laborious and time-consuming. For this reason, we strongly recommend using software such as Microsoft Excel to perform all calculations. Microsoft Excel has a feature called “Descriptive Statistics”, which we discussed in another article. For more advanced statistical analyses, the likes of GraphPad Prism and Minitab may be used.
In the analytical sciences, the number of measurements or data points, sometimes referred to as the sample size, is given the symbol n. Replicate measurements are typically performed in triplicate (n = 3). At times, an analytical scientist may perform measurements using sample sizes higher than 3.
Sample Mean and Sample Standard Deviation
The absorbance values of a solution of aspirin were obtained in nonuplicate using UV-Vis spectrophotometry and are shown below:
0.273, 0.275, 0.271, 0.275, 0.274, 0.275, 0.279, 0.278, 0.281
Perform an analysis of the measurements above using statistics.
Mean and Standard Deviation
The mean (x̄) is the average of the data. It is calculated by adding all of the data values (x) and dividing the sum by the sample size (n), as per the formula below:
$$\bar{x} = \frac{\sum x}{n}$$
Given that the measurements were performed in nonuplicate, n is equal to 9. For the problem shown above, the mean works out as 0.276. The calculation is shown below:
$$\bar{x} = \frac{0.273 + 0.275 + 0.271 + 0.275 + 0.274 + 0.275 + 0.279 + 0.278 + 0.281}{9} = \frac{2.481}{9} \approx 0.276$$
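For readers who prefer to check the arithmetic in code rather than a spreadsheet, the mean calculation can be sketched in Python as follows. The variable names are illustrative choices and are not part of the original article.

```python
# Absorbance readings from the worked example (n = 9).
absorbances = [0.273, 0.275, 0.271, 0.275, 0.274, 0.275, 0.279, 0.278, 0.281]

n = len(absorbances)            # sample size
mean = sum(absorbances) / n     # x-bar = (sum of x) / n

print(f"n = {n}, mean = {mean:.3f}")  # prints: n = 9, mean = 0.276
```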
We can calculate a summary measure of the variability (or dispersion or spread) of the absorbance measurements above using a statistic known as the standard deviation (s). The sample standard deviation is defined by the formula below:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
The sample size minus one (n-1) is referred to as the degrees of freedom and the square of the standard deviation is known as the variance. In the problem shown above, n-1 = 8. A low standard deviation is indicative of low statistical dispersion for a set of measurements. If one wishes to calculate the standard deviation by hand, one may do so by doing the following:
Step 1: Determine the mean
We determined the mean earlier as x̄ = 0.276.
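Step 2: Tabulate the data. Place the mean value under the x̄ column and the nine absorbance values under the x column. Leave the (x-x̄) and (x-x̄)² columns blank for now.
Step 3: Fill the (x-x̄) column for the deviation from the mean by subtracting the mean (x̄) from each absorbance value in each row.
Step 4: Fill the (x-x̄)² column for the squared deviation by squaring the (x-x̄) value in each row. Using the rounded mean of 0.276, the completed table is:
x        x̄        (x-x̄)     (x-x̄)²
0.273    0.276    -0.003    0.000009
0.275    0.276    -0.001    0.000001
0.271    0.276    -0.005    0.000025
0.275    0.276    -0.001    0.000001
0.274    0.276    -0.002    0.000004
0.275    0.276    -0.001    0.000001
0.279    0.276     0.003    0.000009
0.278    0.276     0.002    0.000004
0.281    0.276     0.005    0.000025
Sum                         0.000079
Step 5: Add the (x-x̄)² values and then divide by the degrees of freedom to determine the variance: 0.000079 / 8 = 0.000009875, which rounds to 0.000010.
Step 6: Calculate the standard deviation by determining the square root of the variance.
The square root of the unrounded variance, 0.000009875, works out as 0.0031. (Taking the root of the rounded value 0.000010 would give 0.0032, so carry the unrounded figure through the calculation.)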
A low standard deviation of 0.0031 is indicative of low scatter for the absorbance values. Again, since this calculation can be labour-intensive, we recommend the use of software.
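The six steps above can also be followed in a few lines of Python. This is a minimal sketch, not the article's own method: it uses the unrounded mean, so individual deviations may differ in the last digit from a hand calculation based on 0.276, although the final values agree to the precision quoted. The cross-check uses `statistics.stdev` from the standard library, which computes the sample (n-1) standard deviation.

```python
import math
import statistics

absorbances = [0.273, 0.275, 0.271, 0.275, 0.274, 0.275, 0.279, 0.278, 0.281]

n = len(absorbances)
mean = sum(absorbances) / n                    # Step 1: mean (kept unrounded)

deviations = [x - mean for x in absorbances]   # Step 3: x - x-bar
squared = [d ** 2 for d in deviations]         # Step 4: (x - x-bar)^2

variance = sum(squared) / (n - 1)              # Step 5: divide by degrees of freedom
std_dev = math.sqrt(variance)                  # Step 6: square root of the variance

# Print the working table described in Step 2.
print(f"{'x':>7} {'x - mean':>10} {'(x - mean)^2':>14}")
for x, d, sq in zip(absorbances, deviations, squared):
    print(f"{x:7.3f} {d:10.4f} {sq:14.7f}")

print(f"variance = {variance:.6f}")
print(f"s = {std_dev:.4f}")

# Cross-check against the standard library's sample standard deviation.
assert math.isclose(std_dev, statistics.stdev(absorbances))
```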
Relative Standard Deviation or Coefficient of Variation
We can further assess the precision of a measurement by calculating a statistic known as the Coefficient of Variation (Cv), which is also referred to as the Relative Standard Deviation (RSD). In the analytical sciences, Cv values are often in the 1-5% range. Values of less than 1% are considered excellent. Cv as a percentage is simply the standard deviation divided by the mean and multiplied by 100:
$$C_v\,(\%) = \frac{s}{\bar{x}} \times 100$$
A set of measurements with a small standard deviation and small coefficient of variation is deemed to be precise. It is important to note that greater precision does not necessarily mean greater accuracy!
For the nine absorbance values above, the Cv works out as (0.0031 / 0.276) × 100 ≈ 1.1%, or 1% to the nearest whole number.
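As a quick check, the coefficient of variation can be computed with Python's standard `statistics` module. The sketch below assumes the sample (n-1) standard deviation, as used throughout this article; the variable names are illustrative.

```python
import statistics

absorbances = [0.273, 0.275, 0.271, 0.275, 0.274, 0.275, 0.279, 0.278, 0.281]

mean = statistics.mean(absorbances)
std_dev = statistics.stdev(absorbances)   # sample standard deviation (n - 1)

cv_percent = std_dev / mean * 100         # Cv (%) = (s / x-bar) * 100

print(f"Cv = {cv_percent:.1f}%")  # prints: Cv = 1.1%
```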
Standard deviations and Cv values provide insight into the dispersion of data. Recall that we cannot draw conclusions with 100% certainty, as doing so would require an infinite number of measurements, and that we use statistics to calculate estimates of “true values” from a finite number of data points. In order to quote uncertainty, we construct confidence intervals (CI) at certain probability levels, typically 90%, 95% or 99% (95% is quite common). A CI is a range of values about the sample mean which, at a given probability, contains the “true mean” (µ). Note: For this part, you will need statistical tables! The CI is given by the expression below:
$$\mu = \bar{x} \pm t_{n-1} \frac{s}{\sqrt{n}}$$
Using the expression above, we can construct a 95% confidence interval for the absorbance values. We have already determined the sample size (n) = 9, degrees of freedom (n-1) = 8, mean (x̄) = 0.276 and standard deviation (s) = 0.0031. We can obtain t-values from statistical tables or calculators at certain confidence levels. Based on statistical tables, $t_{n-1}$ = 2.306 at the 95% confidence level, where the degrees of freedom (n-1) = 8. All we need to do now to construct a 95% CI is to fill the expression above with the numbers and perform our calculations:
$$\mu = 0.276 \pm 2.306 \times \frac{0.0031}{\sqrt{9}} = 0.276 \pm 0.002$$
We can state with 95% confidence that the “true mean” absorbance value for the solution of aspirin analysed lies between 0.274 and 0.278.
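The interval can be reproduced with a short Python sketch. The t-value of 2.306 is hard-coded from the statistical tables quoted above (the constant name `T_95_DF8` is our own), so the code applies only to a 95% interval with 8 degrees of freedom. Because the code carries the unrounded mean and standard deviation, the limits may differ in the last quoted digit from a hand calculation that rounds intermediate values.

```python
import math
import statistics

absorbances = [0.273, 0.275, 0.271, 0.275, 0.274, 0.275, 0.279, 0.278, 0.281]

# Two-tailed t-value for 95% confidence and 8 degrees of freedom,
# taken from statistical tables (hard-coded assumption).
T_95_DF8 = 2.306

n = len(absorbances)
mean = statistics.mean(absorbances)
std_dev = statistics.stdev(absorbances)          # sample standard deviation (n - 1)

half_width = T_95_DF8 * std_dev / math.sqrt(n)   # t * s / sqrt(n)
lower = mean - half_width
upper = mean + half_width

print(f"95% CI: {mean:.4f} +/- {half_width:.4f} ({lower:.4f} to {upper:.4f})")
```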
Notes on the CI:
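- Note that the width of the CI is inversely proportional to the square root of n (the t-value also falls as n increases).
- Therefore, the CI becomes narrower as n increases, but with diminishing returns: quadrupling n only halves the s/√n term.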
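To illustrate how the s/√n term in the CI expression shrinks as the sample size grows, here is a small Python sketch. It assumes, purely for illustration, that the standard deviation stays fixed at 0.0031 as n changes; in practice s would be re-estimated and the t-value would also change. The helper name `standard_error` is our own.

```python
import math


def standard_error(std_dev: float, n: int) -> float:
    """Return s / sqrt(n), the term that scales the width of the CI."""
    return std_dev / math.sqrt(n)


s = 0.0031  # standard deviation from the worked example, assumed constant here

# Each fourfold increase in n halves s / sqrt(n).
for n in (9, 36, 144):
    print(f"n = {n:3d}: s/sqrt(n) = {standard_error(s, n):.5f}")
```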
The Median and the Mode
Sometimes, it may also be useful to report the mode and the median.
- The median is simply the middle number when the numbers in a dataset are arranged in order (or the mean of the two middle numbers when n is even).
- The mode is the most commonly occurring value in a dataset.
The median and the mode for the nine absorbance values work out as 0.275.
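Both values can be confirmed with Python's standard `statistics` module, as in the brief sketch below. Sorting the data first makes the middle value easy to see by eye.

```python
import statistics

absorbances = [0.273, 0.275, 0.271, 0.275, 0.274, 0.275, 0.279, 0.278, 0.281]

print(sorted(absorbances))                 # the fifth of nine sorted values is the median

median = statistics.median(absorbances)    # middle value of the sorted data
mode = statistics.mode(absorbances)        # most commonly occurring value

print(f"median = {median}, mode = {mode}")  # prints: median = 0.275, mode = 0.275
```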
Descriptive Statistics Using Excel
Excel can rapidly perform statistical calculations using the Descriptive Statistics feature. The output for the nonuplicate measurements of absorbance would be as follows:
A sample was analysed by HPLC in octuplicate. The peak areas (arbitrary units) are as follows:
2542980, 2790246, 2456146, 3099715, 2455472, 2766540, 2656349, 2940676
Perform statistical analyses of the peak area values above and comment on your calculations. Your answers should match, or at least be similar to, those shown below:
Cv = 9%
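If you would like to check your working for this exercise, the following Python sketch computes the main statistics for the eight peak areas. It is one possible way to verify the answer, again assuming the sample (n-1) standard deviation; the variable names are illustrative.

```python
import statistics

peak_areas = [2542980, 2790246, 2456146, 3099715,
              2455472, 2766540, 2656349, 2940676]

n = len(peak_areas)
mean = statistics.mean(peak_areas)
median = statistics.median(peak_areas)    # mean of the two middle values, since n is even
std_dev = statistics.stdev(peak_areas)    # sample standard deviation (n - 1)
cv_percent = std_dev / mean * 100

print(f"n = {n}")
print(f"mean = {mean:.1f}")
print(f"median = {median:.1f}")
print(f"s = {std_dev:.0f}")
print(f"Cv = {cv_percent:.0f}%")
```

A Cv of about 9% lies above the 1-5% range mentioned earlier, which suggests that the precision of these octuplicate measurements is comparatively poor.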