Made You Think: 53: The Devil is in The Data: How to Lie with Statistics by Darrell Huff

Sep 4, 2018

“When you hear a statistic say that the average American brushes their teeth 1.02 times a day, ask yourself how could they have figured it out? Does it make sense that it could have been researched effectively? In this case they would have had to ask and don't you think it's a safe assumption that people lied?”

In this episode of Made You Think, Neil and Nat discuss How to Lie with Statistics by Darrell Huff. In this book we learn how to spot deceptive statistics, ways surveys are manipulated and the hidden agenda behind every piece of data.

“If you can’t prove what you want to prove, demonstrate something else and pretend that they are the same thing. In the daze that follows the collision of statistics with the human mind, hardly anybody will notice the difference. The semi-attached figure is a device guaranteed to stand you in good stead. It always has.”

We cover a wide range of topics, including:

Biased samples & discarded data
Stereotypes, demographics and diversity in data
The Sphinx, Aquatic Apes and Conspiracy Theories
Grapefruits, Graphs and Guantanamo
How to question and uncover the truth behind statistics

And much more. Please enjoy, and be sure to grab a copy of How to Lie with Statistics by Darrell Huff!

You can also listen on Google Play Music, SoundCloud, YouTube, or in any other podcasting app by searching “Made You Think.”

If you enjoyed this episode, be sure to check out our episode on Influence by Robert B. Cialdini for a book with a similar structure, or the book Fooled by Randomness by Nassim Taleb for more on the deception of data.

Be sure to join our mailing list to find out about what books are coming up, giveaways we're running, special events, and more.

Show Topics

01:01 – Fun book to read, great pocket guide. Easy to internalize many of the ideas. Useful for everyday life and not getting tricked by data. People rely on data, easily let their opinion be swayed by statistics. The book shows there are so many ways to game a statistic. Learning these rules will serve you well.

03:02 – Lots of overlap with Fooled by Randomness, similar themes for similar problems. This is not a new book. It was published in 1954 and is more relevant today than ever.

03:38 – Amazon reviews, can’t rely on reviews to be honest, for books, restaurants etc. People give arbitrary scores for unrelated reasons. Scoring using 1-5 or 1-10 isn’t a useful benchmark. Don’t use 7 as a score, 6 or 8 have more concrete meanings. Book reviews are skewed by the emotion you feel after reading. Books that are feel-good are rated higher even if they’re not useful over the long term.

07:23 – Bonus material, 25 minutes, mini-episode on Sphinx conspiracy theories. Check out the Patreon to get it.

07:33 – Book structure, 6 chapters. Different ways statistics can be manipulated. Final chapter gives questions on how to talk back to statistics. How to think about data. Similar layout and structure to the book Influence.

08:21 – Biased samples. Where a sample is not representative or too narrow, the results will be skewed in the same way. Psychiatrist example – everyone seems neurotic if you only work with neurotic people. Jimmy Fallon sketch, testing people’s geography knowledge. The joke is that Americans are stupid but they only show those that fail. The environment and the element of surprise affect the data too. Biased data can’t tell you anything useful.

10:39 – Media portrayal of Trump voters. Using unflattering stereotypes that then become accepted as the norm. Media also uses the tactic of showing biased stereotypes of protests and violence to influence opinions on the Middle East.

11:54 – Statistics on deaths in school vs military. Total deaths may be more in school but this data gets used to imply probability and likelihood of death – which is a completely different statistic. Presenting data one way to provoke an alternative interpretation. Data is being used to tell a story that serves an agenda. When we hear a statistic we assume it’s real, we need to question it more.

14:06 – Discarded data – Example of Gallup polls, who answers these polls? Do you know anyone who has been polled? This suggests that the sample is not truly representative. Twitter surveys on evolution and skewed data due to restrictive demographics in sampling. The method of survey affects the outcome. Phone polls vs online polls change age demographic. Difficulty of getting a representative sample. All samples will be biased in some way. The key is knowing what the bias in your sample is so it can be corrected or highlighted. Hillary Clinton, opinion polls. Bernie Sanders on healthcare spending.

17:56 – Averages and mean, mode & median. How “average” can mean 3 different things, each suited to certain scenarios. The term average doesn’t mean a lot on its own, you need to understand how it was calculated. The mean is hugely skewed by a single outlier but outliers make little difference to the median. As Taleb says, never cross a river that’s on average four feet deep. Averages for income, height, grades, education and how they should be calculated. You can use the mean on things like education because there is a limit to the number of degrees someone can have.
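To make the outlier point concrete, here is a minimal sketch in Python. The incomes are made-up numbers chosen only for illustration, and `summarize` is a hypothetical helper, not something from the book or the episode.

```python
from statistics import mean, median, mode

# Hypothetical yearly incomes for a small neighborhood (made-up numbers).
incomes = [30_000, 30_000, 35_000, 40_000, 45_000]
# The same neighborhood after one very rich person moves in.
incomes_with_outlier = incomes + [5_000_000]

def summarize(values):
    """Return the three common 'averages' for a list of numbers."""
    return {"mean": mean(values), "median": median(values), "mode": mode(values)}

before = summarize(incomes)
after = summarize(incomes_with_outlier)

print("before:", before)
print("after: ", after)
```

With these numbers the mean jumps from 36,000 to over 860,000, while the median only moves from 35,000 to 37,500 and the mode stays at 30,000. Which "average" gets quoted changes the story completely.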

21:34 – Startups and how they calculate their daily active users or revenue per user can be deceptive. Year to date revenue gives a better understanding than monthly. Incomes in a neighborhood can change depending on the average that is used. One can seem high to prop up real estate figures. The other can seem low to support home owner association protests. Both use the same data manipulated to serve an agenda and presented in different ways. When to use the mode? Use the mode when dealing with non-numerical values to discover the most fashionable or most popular item.

26:35 – Health resort promoting ill-informed seminars on the nutritional value of meat. Lots of common myths that we don’t do much research on. The top result on Google is not always accurate, it isn’t being fact checked so we should know to research these things.

28:15 – Bonus material. Sphinx and conspiracy theories. Theories not being taken seriously by archeologists. Aquatic apes, crony beliefs and things we want to be true.

29:51 – Difficult to research everything you hear, you have a time limitation on having to form a belief. Find sources that you can trust and discount those who don’t have the authority to speak on a particular matter. Testing authority & parents. Authority and taking advice of doctors despite how long ago their education may have been.

32:01 – Dangers of listening to people who are not experts in a particular topic. Who is qualified to talk on a particular subject? Everyone thinks everyone should have an opinion on everything. If you trust someone in one area, don’t trust them on everything. The danger of intellectual heroes. Being fans of Taleb but knowing he is not always right. Admire someone’s work but don’t look to them for guidance on everything. You don’t have to agree with all of someone’s opinions. Don’t criticize someone for favouring one viewpoint of a person you think is completely bad.

34:32 – Difficulties of political debate. Not possible to openly agree with Trump on a specific idea like tariffs. People automatically assume you agree with him on everything. Opioid manufacturers being indicted, seems like a great idea but you can’t voice those opinions. Politics as the new religion. Now it is more like picking a side and blindly sticking to it. Loss of discourse. Idea sports.

38:21 – Political parties flip ideals when they are in charge. No incentive to pay down the national debt. Involves imposing unpopular cuts and taxes. Cutting unnecessary spending seems logical. Latest military jet, expensive but unfit for purpose.

43:40 – Changing opinion of Trump. He wasn’t as radical as everyone was expecting. He wants to win a second term. Bernie Sanders may be more the type of person to make radical changes. Bernie Sanders as a dream podcast guest. Debating with Andrew Yang. 2020 Election.

44:54 – Discarded data. Companies continue to run experiments until they get the outcome they want. Significant portions of experiments have been discarded. What is classed as a statistically significant result? If you run 1000 experiments and 999 fail to show significant results, you can still present the 1 result as showing something significant without presenting the rest of the data. Antidepressant studies show negligible impact compared to a placebo but also had lots of negative side effects. Yet only those studies that showed net positive effects got published.

46:48 – Cosmetics and food companies regularly use skewed samples in their data. Skin complaints and using regression to the mean as proof of a product working. Companies start another study and keep going until they get the results they need.
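A rough sketch of why repeating experiments until one "works" is so effective. It assumes the conventional 5% significance threshold (the episode does not name one), that the experiments are independent, and that there is no real effect at all. `chance_of_false_positive` is a hypothetical helper name.

```python
def chance_of_false_positive(num_experiments, alpha=0.05):
    """Probability that at least one of several independent experiments
    looks 'significant' purely by chance, when no real effect exists."""
    return 1 - (1 - alpha) ** num_experiments

def expected_false_positives(num_experiments, alpha=0.05):
    """Average number of chance 'hits' across the experiments."""
    return num_experiments * alpha

for n in (1, 14, 100, 1000):
    print(n, round(chance_of_false_positive(n), 4), expected_false_positives(n))
```

Under these assumptions, 14 experiments are already more likely than not to produce at least one "significant" result, which is why the discarded experiments matter as much as the published one.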

47:49 – Big Data. The larger the data set, the more likely you will be able to prove whatever you want by slicing the data in particular ways. Correlation and causation. Nicolas Cage movies vs School Shootings. Ice cream consumption vs murder rates. Church of the Flying Spaghetti Monster. Climate change vs Piracy. Nicolas Cage movies vs Swimming Pool Drownings. You can pair any two things together that rise and fall in the same trends. This does not mean that one affects the other.

Small samples have a huge variance. It’s possible to get 8/10 heads when flipping a coin but far less likely to get 80/100, even though the proportion is the same. You can get a significant-looking result by using a smaller data set. Most pharmaceutical tests are not done on women. Most drugs go to market without being thoroughly tested on the female biology, the interaction with estrogen, birth control. Limited studies on the interactions with other drugs. You would think it should be tested alongside common medications. Grapefruit juice and other fruits have properties in them that amplify the potency of certain drugs so you have to be careful not to take it alongside certain medications.
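The coin-flip comparison can be checked directly with the binomial formula. This is a small sketch assuming a fair coin and counting "at least" that many heads; `prob_at_least` is a hypothetical helper name.

```python
from math import comb

def prob_at_least(heads, flips, p=0.5):
    """Probability of getting at least `heads` heads in `flips` tosses
    of a coin that lands heads with probability p."""
    return sum(
        comb(flips, k) * p**k * (1 - p) ** (flips - k)
        for k in range(heads, flips + 1)
    )

p_small = prob_at_least(8, 10)    # 80% heads in a small sample
p_large = prob_at_least(80, 100)  # 80% heads in a larger sample

print(f"8+ heads out of 10:   {p_small:.4f}")
print(f"80+ heads out of 100: {p_large:.2e}")
```

The small sample hits 80% heads about 5% of the time by luck alone, while the larger sample essentially never does. The same headline percentage means very different things at different sample sizes.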

54:39 – Graph manipulation. Show 3 graphs of the same data from different perspectives and they look totally different. Axes that don’t start at zero don’t show the whole picture. Zooming in on a portion of the results makes the incline of the line on the graph steeper or shallower according to the data included. How you frame the graph makes a difference in the perception of the same data.

58:08 – The semi-attached figure is when you say one thing and imply another. You can’t say something cures colds but you can say it kills 300k germs in 11 seconds in a test tube. This data then lets people make up their own minds and infer an incorrect conclusion. Cigarette statistics and the preferred brand of physicians. The statistic doesn’t tell you anything. Weather and the number of accidents. Even though fog is more dangerous there will always be more accidents in clear weather because there are more clear-weather days than foggy days. Trying to compare 2 stocks by share price is a common mistake.
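A quick way to see the truncated-axis effect without drawing anything: compare how tall two bars would look relative to each other depending on where the axis starts. The values 100 and 105 are made up, and `apparent_ratio` is a hypothetical helper.

```python
def apparent_ratio(small, large, axis_start=0.0):
    """Ratio of the drawn bar heights when the y-axis begins at axis_start."""
    if axis_start >= small:
        raise ValueError("axis_start must be below the smaller value")
    return (large - axis_start) / (small - axis_start)

# Two made-up values that differ by 5%.
honest = apparent_ratio(100, 105)                    # axis starts at zero
truncated = apparent_ratio(100, 105, axis_start=95)  # axis starts at 95

print(honest)
print(truncated)
```

With the axis starting at zero the second bar looks 5% taller. Start the axis at 95 and the same 5% difference is drawn as a bar twice as tall.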
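The fog example comes down to comparing totals versus rates. Here is a sketch with invented numbers (the day and accident counts are assumptions for illustration, not real data).

```python
def accidents_per_day(accidents, days):
    """Accident rate: how many accidents happen on a typical day of this kind."""
    return accidents / days

# Invented yearly figures: many clear days, few foggy days.
clear = {"days": 300, "accidents": 600}
fog = {"days": 65, "accidents": 260}

clear_rate = accidents_per_day(clear["accidents"], clear["days"])
fog_rate = accidents_per_day(fog["accidents"], fog["days"])

print("total accidents, clear vs fog:", clear["accidents"], fog["accidents"])
print("accidents per day, clear vs fog:", clear_rate, fog_rate)
```

In this made-up year clear weather has more than twice as many accidents in total, yet a foggy day is twice as dangerous as a clear one. The total answers "where do most accidents happen?", not "which is riskier?".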

01:02:08 – Statistics used for catchy headlines and for their shock value. Accidents in the home are more common – makes you feel like it’s more dangerous. True of anywhere that you spend the most time. You can represent the same data in many different ways so it sounds completely different.

01:04:38 – Correlation vs causation. Smoking vs low grades. Easy to infer that one causes the other but it could be the opposite or other lifestyle factors. People who eat McDonalds vs heart disease and correlating that to eating meat. Beer bellies and the correlation to poor health. Often combined factors including environment and other common habits associated with beer drinking.

01:08:40 – Changing attitudes to college. Myth of college equaling success. Ignores the other factors in how you got to college that contribute to your success. You don’t get to see alternative histories. College popularity is dropping, poor choice of investment.

01:11:01 – How to talk to a statistic, questions to ask to understand the data you are being presented with. Who Says So? Who is telling you this information and what is their bias or agenda? When presented with impossible statistics think how did they get that data? Look at the demographics of academic psychological studies – most participants are college students. Think about whether studies can be replicated.

01:15:29 – How Does He Know? Look out for evidence of a biased sample or a sample that has been improperly selected. Is the sample big enough to give a reliable conclusion?

01:15:44 – What Were their methods? Does it make sense that people could actually know this information? Cancer diagnosis and changing rates. Survival seems longer as we are detecting it earlier, which doesn’t necessarily mean the treatments have an impact. People are also living long enough to become more susceptible to cancer. And the population is growing, so naturally the numbers will rise.

01:17:25 – What’s Missing? Looking at raw data can give you a true picture. Johns Hopkins and female students. Look at startup growth, how they measure it. Percentages don’t tell you if they have 100 users or 10k users. Raw percentages are misleading. This also happens with diversity, gender. Expecting women to be exactly 50% of elected representatives. However that doesn’t account for the application pool and what happens when you reach that 50%. Do you limit diversity? Male vs Female leadership in Wall Street Organizations. Sexism. Dichotomy creates oppression. When you try to balance you create an alternative discrimination.

01:23:10 – Did somebody change the subject? The reasons for collecting data often skew the results. Do people want to be counted, are people incentivized to give a truthful answer? China example, two different census records, one for military and tax reasons, the second for famine relief.

01:24:27 – Does it make sense? If you hear a statistic that seems implausible or too incredible, that’s usually a good sign to be skeptical.

01:25:08 – Bonus material, Sphinx conspiracy theories, join the Patreon to access it. Overall a good book, quick read, quite entertaining and funny. Super useful. Internalize the questions and use them against outlandish statistics. Look for multiple examples to prove something is good or bad. People often take one or two experiences and extrapolate that to mean always.

01:31:54 – If you want to know everything that's coming up on the show, get access to that on our Patreon. You also get our detailed book notes and really fun bonus material. We also do monthly Hangouts, next one is going to be like mid-September. We don't like ads, so we're going with the crowdfunded method.

If you want another way to support the podcast, go to MadeYouThinkPodcast.com/support. We have some of our wonderful partners there. Tell your friends about the show, shout us out on PornHub. Leave a review on iTunes.

Hit us up on Twitter, @NatEliason and @TheRealNeilS, we'll see you all next week.

If you enjoyed this episode, don’t forget to subscribe at https://madeyouthinkpodcast.com
