Can we have confidence in statistics and why not? :)

in #science, last year

Since the beginning of the pandemic, we have been bombarded with statistical information about the virus: number of sick, number of deaths, death rate, rate of spread, the famous curve that has to be flattened, the probability that someone will catch the virus...

Statistics is a powerful tool for analyzing data of all kinds, and the modern world cannot be imagined without it. Used the right way it can be very helpful, and misused it can be very harmful. This means that we have to be careful when interpreting the numbers and predictions we are given.
The first important thing for reliability is the methodology of gathering information: the output cannot be trusted if the input is not valid enough, and invalid input leads to misguided decisions.
The next important thing is that statistical data are more reliable when they cover a larger share of the people (or data) in question. Predictions are made from information about a piece of the population and then generalized, so the bigger the sample, the better the prediction. Still, there are situations where more does not mean better, because of the standard deviation and other indicators of how useful the data are. For example, if we have 100 persons and each of them has a different number of coins (from 1 to 100), statistically they have 50.5 coins on average with an SD of about 29. If we generalize from that, the result is useless: nobody actually has that many coins, and for almost everyone it would be a wrong guess.
(Statistical packages have correction algorithms for small samples and missing values, but that is a somewhat more complex story, and even with those corrections it is not the same as having real data.)
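To check the coin example, here is a minimal sketch in Python. It assumes the 100 persons hold exactly 1, 2, ..., 100 coins (one distinct amount each); the variable names are just for illustration.

```python
import statistics

# Assumption: 100 persons holding 1, 2, ..., 100 coins, one distinct amount each.
coins = list(range(1, 101))

mean = statistics.mean(coins)              # arithmetic mean
sd_population = statistics.pstdev(coins)   # SD treating the 100 persons as the whole population
sd_sample = statistics.stdev(coins)        # SD treating them as a sample (n - 1 in the denominator)

# How many persons hold exactly the average, and how many are within one coin of it?
exact_matches = sum(1 for c in coins if c == mean)
within_one_coin = sum(1 for c in coins if abs(c - mean) <= 1)

print(f"mean = {mean}, population SD = {sd_population:.2f}, sample SD = {sd_sample:.2f}")
print(f"persons holding exactly the mean: {exact_matches}")
print(f"persons within one coin of the mean: {within_one_coin}")
```

The average describes the group well enough, but as a guess about any single person it is almost always wrong, which is the whole point of the example.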

What does all this mean for us as individuals? Not much :) Why?
Let us take this pandemic as an example. Statistics may say that the probability of catching the virus is 1 or 2% (the number changes as new data come in; here it is just an example and does not have to be accurate). Now we think, "hey, this is not much, why all the noise about it? The chances are so small, I can't really be in danger"; then we go to the store and behave as usual, without paying attention to preventive measures. A few days later, we realize that we are sick. "How did that happen, why was I so unlucky to be in that 1%?!" And now we have become data for some new statistic, which is no longer 1 but 2%.

The thing is that statistics is not very useful for us at the individual level. There the outcome is binary: you will either catch the virus or not, get symptoms or not, die or not. That depends on our immune system and current environment, but also on our decisions about what we are going to do and how. The same holds for other things in our lives.

Let's say that the probability of experiencing a firearm attack is 0.5%. Does that really tell us how safe we are on the streets? In theory, generalized like that, yes. For me as a person, it means that it will either happen or not. I can influence that with my decisions about when and where I go, considering which places are more dangerous; except, of course, when I find myself in a bank during a robbery, which would be really unfortunate circumstances.

The next example is a very common one: talking about the average salary, a maneuver with which politicians try to tell us how rich we are. Really? Well, they are for sure :) The rest of us either have that average 1000 or 2000 euros or we don't. Besides, how valid is that information without knowing the cost of basic living supplies (expensive clothes, restaurants and vacations not included :))?

Bottom line: statistical data and individual lives are interdependent, and realizing the limitations of both is a big step in decision making.

P.S. Summer is coming, and here is an example of a common commercial spin: "If you buy this product, it will increase your chance of losing weight by up to 50%." Great, with my way of life I had just 10%, so now it's 60%, right? Wrong. The current chance is 10%. 50% of 10 is 5. 10 + 5 = 15, so the new chance is at most 15%. Time to work out, then :D
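The same arithmetic as a small Python sketch, in case it helps to see the two readings side by side. The 10% baseline and the 50% claim are the example numbers from above, and the function name is made up for illustration.

```python
def apply_relative_increase(baseline: float, relative_increase: float) -> float:
    """Return the new chance after a relative (percentage-of-baseline) increase."""
    return baseline * (1 + relative_increase)

baseline_chance = 0.10   # example value: my current chance of losing weight
claimed_increase = 0.50  # the advertised "up to 50%" increase

# Misreading: adding the 50% as percentage points on top of the baseline.
naive_reading = baseline_chance + claimed_increase

# What the claim actually says: 50% more than the baseline.
actual_chance = apply_relative_increase(baseline_chance, claimed_increase)

print(f"naive reading: {naive_reading:.0%}")
print(f"actual chance: {actual_chance:.0%}")
```

The gap between the two readings is the spin: a relative increase sounds big precisely when the baseline is small.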

Keep your physical distance and change the statistics ;)