Sunday, January 17, 2010

How to lie with statistics

1. The sample with the built-in bias (sources of bias)

a. A report based on sampling must use a representative sample, which is one from which every source of bias has been removed.

b. The dependability of a sample can be destroyed just as easily by invisible sources of bias as by these visible ones. That is, even if you can't find a source of demonstrable bias, allow yourself some degree of skepticism about the results as long as there is a possibility of bias somewhere. There always is.

c. The test of the random sample is this: Does every name or thing in the whole group have an equal chance to be in the sample?

The purely random sample is the only kind that can be examined with entire confidence by means of statistical theory, but it is so difficult and expensive to obtain for many uses that sheer cost eliminates it. A more economical substitute, which is almost universally used in such fields as opinion polling and market research, is called stratified random sampling. To get a stratified sample, you divide your universe into several groups (strata), give each group a share of the sample in proportion to its known prevalence, and then draw a random sample within each group.

d. Bias can also be introduced by unknown factors, e.g. a respondent's desire to give a pleasing answer.
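
To make the stratified sampling in item c concrete, here is a minimal sketch in Python. The function name `stratified_sample`, the `stratum_of` callback, and the renter/owner universe are all made up for illustration; they are not from the book. It assumes you already know which group each member belongs to, and it allocates draws in proportion to each group's size.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, sample_size, seed=None):
    """Draw a proportionally allocated stratified random sample.

    Illustrative helper (hypothetical name): `stratum_of` maps an item
    to the label of the group it belongs to.
    """
    rng = random.Random(seed)

    # Divide the universe into groups (strata).
    strata = defaultdict(list)
    for item in population:
        strata[stratum_of(item)].append(item)

    total = len(population)
    sample = []
    for members in strata.values():
        # Each group's quota is proportional to its share of the universe.
        # Rounding means the quotas may not always sum exactly to sample_size.
        quota = round(sample_size * len(members) / total)
        # Within the group, every member has an equal chance of being drawn.
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


# Hypothetical universe: 700 renters and 300 owners.
population = [("renter", i) for i in range(700)] + [("owner", i) for i in range(300)]
sample = stratified_sample(population, lambda person: person[0], 100, seed=1)
counts = {group: sum(1 for s in sample if s[0] == group) for group in ("renter", "owner")}
print(counts)  # {'renter': 70, 'owner': 30}
```

The group proportions in the sample match the universe by construction, but that only removes bias along the dimension you stratified on. Anything you did not stratify on (item d) can still skew the result.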

2. The well-chosen average

Average of what? Who's included? What kind of average (mean, median, mode)? Who says so, how does he know, and how accurate is the figure?
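
A small sketch of why the kind of average matters, using Python's standard `statistics` module. The income figures are invented for illustration: several modest earners and one very large one.

```python
import statistics

# Hypothetical annual incomes (made-up numbers): one large value skews the mean.
incomes = [20_000, 20_000, 20_000, 30_000, 30_000, 40_000, 50_000, 450_000]

mean = statistics.mean(incomes)      # arithmetic average
median = statistics.median(incomes)  # middle value: half earn more, half earn less
mode = statistics.mode(incomes)      # most frequently occurring value

print(f"mean:   {mean:,.0f}")
print(f"median: {median:,.0f}")
print(f"mode:   {mode:,.0f}")
```

With these numbers the mean is 82,500, the median is 30,000, and the mode is 20,000. All three can honestly be called "the average", which is why an unqualified average tells you little.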

3. The little figures that are not there

The little figure that is not there is often the one that tells the range of things or their deviation from the average that is given. Knowing nothing about a subject is frequently healthier than knowing what is not so, and a little learning may be a dangerous thing.
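
To illustrate the missing figure, here is a short sketch with two invented data sets that share the same average but differ widely in spread. The `summarize` helper and the temperature values are hypothetical, and the population standard deviation is only one of several reasonable ways to measure deviation.

```python
import statistics


def summarize(values):
    """Report the average together with the figures that usually go missing."""
    return {
        "mean": statistics.mean(values),
        "low": min(values),
        "high": max(values),
        "stdev": statistics.pstdev(values),  # population standard deviation
    }


# Hypothetical temperatures for two places with the same average of 61.
steady = [58, 60, 61, 62, 64]
swingy = [20, 45, 61, 77, 102]

steady_summary = summarize(steady)
swingy_summary = summarize(swingy)
print(steady_summary)
print(swingy_summary)
```

Both lists average 61, but the first ranges from 58 to 64 and the second from 20 to 102. Reporting the average alone would make the two look identical.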
