On the Dunning-Kruger Effect - have you seen
The Dunning-Kruger Effect is Autocorrelation
https://economicsfromthetopdown.com/2022/04/08/the-dunning-kruger-effect-is-autocorrelation/
? Whatever the arguments, the fact that you can put a random signal through the statistical pipeline and recover the D-K effect at the other end seems pretty damning to my mind.
To be certain I'd have to replicate it and run the scripts myself. The author seems to me the kind who does that already: he does his own replications and publishes open source and open data. I'm not motivated enough to put the effort into this. But given that you were motivated enough to blog about it - maybe you will be less lazy than me, replicate it, and report what you find.
Thanks for writing on Substack for all of us to read. :-)
Yeah, I've seen that! It's a big problem with the original paper. As far as I know it doesn't generalize to all of the papers finding a similar effect, since the criticism relies on individuals making a relative judgment of their abilities (e.g., above or below average), while other studies have them guess their objective score. But the regression-to-the-mean issue is related and generalizes further, I think.
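To make that regression-to-the-mean worry concrete, here's a minimal sketch of my own (a toy Python simulation, not any particular paper's analysis; all numbers are made up): self-estimates are unbiased and equally noisy readouts of true skill, yet grouping people by their measured test score still makes the bottom group look overconfident and the top group look underconfident.
```python
# Toy regression-to-the-mean sketch: no metacognitive deficit is built in,
# but binning by the noisy *measured* score still produces the D-K pattern.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
true_skill = rng.normal(50, 10, n)
test_score = true_skill + rng.normal(0, 8, n)     # noisy measurement of skill
self_estimate = true_skill + rng.normal(0, 8, n)  # unbiased, equally noisy estimate

edges = np.percentile(test_score, [25, 50, 75])
quartile = np.digitize(test_score, edges)         # 0 = bottom quartile, 3 = top
for q in range(4):
    m = quartile == q
    print(f"Q{q + 1}: measured = {test_score[m].mean():5.1f}, "
          f"self-estimate = {self_estimate[m].mean():5.1f}")
# Bottom quartile "overestimates" and top quartile "underestimates" purely
# because extreme measured scores regress toward the mean.
```
In this toy setup the gap shrinks as the measurement noise shrinks, which is one way the critique could in principle be checked against real data.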
Didn't expect to respond, butttt - Procrastination seems less apt to occur these days than it used to. I am 84 - no telling when I won't be here to 'do the deed', to 'engage'. I sense there is a stronger possibility that I may not be here tomorrow morning to respond to your (anyone's) commentary. Therefore, I am less apt to procrastinate. Love your commentary, Tommy! Makes me thoughtful.
Thank you for including my question! I’m flattered that you thought it was interesting enough to write about.
It was a good question! And gave me an excuse to read/think more about it than I have before!
On Dunning-Kruger, what do you make of the finding that the classic presentation of the data is basically autocorrelation? https://www.scientificamerican.com/article/the-dunning-kruger-effect-isnt-what-you-think-it-is/ ... https://economicsfromthetopdown.com/2022/04/08/the-dunning-kruger-effect-is-autocorrelation/
Apparently, you can find the D-K "effect" in random data that's compiled the way they compiled data for their study.
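For what it's worth, here's a minimal sketch of what that looks like (my own toy Python version, not the post's actual scripts): actual scores and self-estimates are completely independent random numbers, yet binning by actual-score quartile reproduces the familiar crossing pattern.
```python
# Toy demo: actual percentile and self-estimated percentile are *independent*
# random numbers, but the classic "unskilled and unaware" picture appears anyway.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
actual = rng.uniform(0, 100, n)      # actual percentile (pure noise)
perceived = rng.uniform(0, 100, n)   # self-estimated percentile (unrelated noise)

quartile = np.digitize(actual, np.percentile(actual, [25, 50, 75]))
for q in range(4):
    m = quartile == q
    print(f"Q{q + 1}: actual mean = {actual[m].mean():5.1f}, "
          f"perceived mean = {perceived[m].mean():5.1f}")
# Bottom quartile: actual ~12, perceived ~50 -> looks like huge overestimation
# Top quartile:    actual ~88, perceived ~50 -> looks like underestimation
```
The "effect" here is just that perceived scores average about 50 in every bin while the actual scores define the bins, which is the autocorrelation point the post is making.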
It's a good criticism of the original paper! I don't think it holds for some other forms of the finding (like when people rate themselves on an objective rather than a relative scale), but it definitely does apply to the original paper. It's surprising how influential that paper was given how problematic the analysis is.
> It's surprising how influential that paper was given how problematic the analysis is
I wonder if it tells enough people what they want to hear, so they uncritically amplify it. The real "D-K Effect" may be exposing the cognitive bias of those who parrot the study's conclusions.
My not-so-relevant question is: what are your thoughts on that dead salmon fMRI study?
It's been a while since I read it, but I liked it! It's a fun, vivid example of the perils of doing shitty statistics. fMRI as a method is particularly susceptible to bad statistics because it's so high-dimensional, so it's super easy to get false positives.
What I don't love is that the study is sometimes used as a blanket condemnation of fMRI. But the problem isn't fMRI itself (which is a very limited but useful tool); it's certain ways of analyzing fMRI data that the paper was pointing to as problematic.
Thank you. Yes, that's what I understood about it. But it has been on my mind, since it seems to me that the statistics required for more advanced technologies are usually much more complicated than their typical users are trained for. The more advanced the technology, the higher the required technical and statistical knowledge, the easier it is to get positive results, and the fancier the articles. So both the incentives and the knowledge gap would presumably lead to problematic research practices.
That's a reasonable concern. But there are some interesting details in the interaction between the complexity of the technology and the complexity of the required stats that at least make the relationship more complicated. The basic problem the salmon study was pointing to was the need to correct for multiple comparisons, which is a pretty basic statistical idea (there's a quick sketch of it just below this comment). But then there's a bunch of other complicated stuff--remapping each brain onto a common brain, taking into account the lag time of the hemodynamic response, etc.--that is way more tool-specific, and I wouldn't expect most fMRI users to fully understand it (let alone be able to code it on their own). But luckily there are common analysis tools for fMRI that take care of all of that (and, typically, the standard multiple-comparisons issue) without the user needing to know the nitty-gritty details.
Meanwhile, there is plenty of basic survey or other behavioral data that would be best studied with more sophisticated stats. Dunning-Kruger is an interesting example: the data is super simple, no advanced technology, but the meta-cognition interpretation has only survived due to a lack of the stats knowledge needed to properly test it (I've seen similar examples in my field, where entire careers have been built on a theory that rested entirely on bad stats).
I don't know, this was long and rambly. Maybe this would make a good topic for a future reader-questions post so I can say something more coherent about it.
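To make the multiple-comparisons point from the salmon study concrete, here's a hedged toy sketch (my own made-up numbers, not a real fMRI pipeline): run an independent t-test at each of 10,000 "voxels" of pure noise and count how many come out "significant", with and without a blunt Bonferroni correction.
```python
# Toy multiple-comparisons demo: there is no signal anywhere, yet roughly
# 5% of 10,000 uncorrected voxel-wise tests come out "significant" at p < .05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_voxels, n_scans = 10_000, 20
condition_a = rng.normal(0, 1, (n_voxels, n_scans))  # pure noise, no real effect
condition_b = rng.normal(0, 1, (n_voxels, n_scans))

_, p = stats.ttest_ind(condition_a, condition_b, axis=1)
print("uncorrected  p < .05:", int(np.sum(p < 0.05)))             # ~500 false positives
print("Bonferroni   p < .05:", int(np.sum(p < 0.05 / n_voxels)))  # usually 0
```
Real fMRI packages use less blunt corrections (cluster-based thresholds, false discovery rate), but the basic issue is the same one the salmon paper was dramatizing.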
Great explanation. Thank you. You are right, I was mostly thinking about tool-specific challenges and the lag between creating tools and creating sufficiently tuned analysis packages. But I totally agree that more basic assessment approaches have also been subject to bad statistical practices.
Just to clarify, I mean with regard to the point that everybody needs more statistics than is usually taught outside statistics departments.
D-K https://www.al.com/living/2018/09/you_think_youre_a_pretty_good.html
...interviewed 50 people who were currently hospitalized from traffic accidents in which they were the driver. Most of these accidents were described as "hit fixed object" hard enough to overturn the car. Police reports placed clear blame on the hospitalized drivers in more than two-thirds of these accidents and suggested it in many of the others. Yet when asked to rate their driving skills, the hospitalized drivers rated themselves closer to "expert" than to "very poor" on a 9-point scale. A second group of drivers, matched to the accident group in terms of age, sex, race, and education level, all with excellent driving records, rated themselves no higher and no lower than the hospitalized accident group.
I wrote about Dunning-Kruger, the better than average effect, and all sorts of other overestimations of ability and control here. Later research by Dunning shows that people at all levels of ability exaggerate their competence, with the effect being particularly strong among the least and, surprisingly, the most competent. https://eclecticinquiries.substack.com/p/positive-illusions-the-psychology
I tried to follow you back but I wasn't allowed to. Thanks for following me. I will keep trying.
I highly recommend this post: https://wmbriggs.substack.com/p/the-case-for-ending-government-funding