Severe kudos for how well researched and thought out this article is. I shared it with several friends who work in psychology. I’m a subbie for life!
Love the idea of preregistration. Knowing that a study failed could be as important as knowing it succeeded!
Absolutely!
Perhaps the most surprising (and concerning) thing about reading medical literature is how few studies share any raw data. So many seem to just give averages. I am aware that the data would have to be anonymised. And perhaps there are other reasons I'm not aware of, like proprietary reasons?
While physics has more raw data sharing, I was still quite concerned about the paucity of studies that shared their raw data.
It is much more difficult to hide poor scientific practice with raw data. The Data Colada blog series on Francesca Gino's raw data is a good example.
Agreed. If medicine is anything like psychology, the reason for not sharing the raw data is pretty straightforward: there's no incentive to. Journals rarely ask researchers to, and people rarely look at it, so it's just more work to put the data in a presentable format, write up an explanation of the data, etc.
The incentives need to be aligned, which means, again, attaching prestige to studies that share data and having prestigious journals require it.
That is a good point and makes a lot of sense, Tommy, though I would add that when I was submitting raw data as a physics researcher, it was as raw as it is possible to be. In fact, a lot of the time it would literally be a .raw file! Not presentable, no explanation attached, just a data dump into a repository, with a link to it at the end of the paper. It was unlikely to be looked at, but it was more of a full-transparency, nothing-to-hide gesture. Though I was surprised how frequently I would end up using something like a dataset from a 15-year-old paper that no one had cited - plenty of PhD students out there putting off writing their theses!
I do like the idea that the incentives need to be aligned. Thinking out loud: how would one attach prestige to studies that preregister and share data? Would it be something like a two-pronged approach, holding journals to account through something like a Retraction Watch (https://retractionwatch.com/) style table, and then running studies into which practices produce papers that are more likely to be reproducible and of higher quality? I mean, I'm assuming raw data would help, but I don't actually know...
Meandering thoughts aside, good article, Tommy - it was a pleasure reading it!
The biggest area of leverage for aligning incentives, IMO, sits with editors. Make it journal policy that studies need to make their data publicly available (some journals already do this, so there's precedent, and as you point out it doesn't need to be onerous). Preregistration is a bit harder since it is more of a structural change, but if the big journals and their editors at least put the pieces in place (making a big deal about submitting pre-registrations to them, clearly marking papers that were pre-registered), then the community could align around pre-registered studies carrying more weight, and things would shift in that direction. There's already some movement, e.g. https://www.nature.com/nature/for-authors/registered-reports; we just have to push much harder.
You make a lot of sense, and based on your arguments, I would be inclined to agree that the onus lies with the editors. You can count me in on the push to make preregistration and raw data hold more weight. I'm not entirely sure what that would entail, but I'm in! Thanks for the conversation, Tommy. I really enjoyed it. Happy holidays!
Preregistration is a good idea, but it's also true that sometimes the data can answer a question you didn't ask. In which case, I suppose that is worth another string of experiments.
I'm all for using the data you have to try to answer additional questions. This can be a great way to generate new theories and hypotheses and is more economical than running a new experiment for every new idea -- but these should be explicitly labeled as exploratory analyses and the results held as more tentative than pre-registered ones!
You can report unexpected findings even with a preregistration; the thing to do would just be to make clear that this isn't something you preregistered.
Nice one, Tommy! In ecology, we had our own major drama over fabrication a few years ago: search for Jonathan Pruitt. He seemed equally unwilling to take the blame, despite dozens of retractions following detailed scrutiny of his papers, and the fallout for all his coauthors was huge. It even got the name “Pruittgate”.
Preregistration has been called for for a while. I like the idea, though it probably works best for clinical studies or highly controlled experiments.
Great article. Quantitative research is fraught with paradoxes. Likert scales: are they ordinal or interval? Do I put a number in front of each answer option or not? Should the options be displayed horizontally or vertically? Should the questions be worded negatively or positively? The degrees of freedom are endless…
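The point about endless degrees of freedom can be made concrete with a back-of-the-envelope calculation. If the same data can be analysed in k different ways (different codings of a Likert item, different exclusion rules, different outcome measures) and only the version that reaches p < 0.05 gets reported, the chance of at least one false positive when there is no real effect is 1 - (1 - α)^k. This is a minimal sketch that assumes each variant is an independent test at α = 0.05; real variants computed on the same data overlap, so the true inflation is usually smaller, though still above α. The function name `familywise_false_positive_rate` is purely illustrative.

```python
def familywise_false_positive_rate(alpha: float, k: int) -> float:
    """Chance that at least one of k independent tests at level alpha
    comes out 'significant' when every null hypothesis is true.

    Assumption: the k analysis variants are independent. Variants run on
    the same data are correlated in practice, so this is a simplification.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if k < 1:
        raise ValueError("k must be at least 1")
    return 1.0 - (1.0 - alpha) ** k


if __name__ == "__main__":
    # One pre-specified analysis versus several unreported alternatives.
    for k in (1, 2, 5, 10, 20):
        rate = familywise_false_positive_rate(0.05, k)
        print(f"{k:>2} analysis variants -> {rate:.1%} chance of a false positive")
```

Under the independence assumption, five analysis variants already push the false-positive chance from 5% to about 23%, which is one reason preregistering a single planned analysis matters.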
Tommy, I frequently work in the field of management and leadership, and have been exposed to academic research there on numerous occasions. It feels like an absolute wild west of questionable claims, questionable research practices, and irreplicability. I haven't researched it in the depth you have for psych research, but when I read the papers, so many of them just don't 'feel' right. There are so many contributing, subjective variables in how someone performs at work, or in the effects of 'leadership', that frequently aren't duly acknowledged or accounted for, so I often don't understand how the authors reached their conclusions.
Leadership is especially hard, as so much of it is about influencing a dependent variable, performance, that has no universally agreed definition.
By the way, this is worth checking out if you're interested in replication crisis stuff:
https://www.speakandregret.michaelinzlicht.com/p/revisiting-stereotype-threat
...As is abundantly clear: replication issues are ongoing, and major findings are still crumbling.
Et tu, stereotype threat?
Thank you for the depth you provided. I have read a similar article about this somewhere. I am not schooled in this area, but I have always gravitated towards the "scientific spectrum", and as a lay person I place more value on a scientific result than on just someone's "opinion" or "belief". I have also noticed how science gets adjusted over time as the things we can comprehend or access become "better" or different. Personally, I think we are all better off because of the people who work on things such as this. Also, when people are presented with a scientific conclusion, most do not consider what goes into reaching it. I know that there are very strict and stringent frameworks for experiments, and if we can account for every possibility in order to reach a solid conclusion, we must do so, if only because of the reach a published result has: we cannot be sure what bedrock it will become part of, or know in advance how it will be used to draw other conclusions. It is a very difficult field. Mind-boggling in a way.
I work in psychology, so this article addresses issues in my field. I started my PhD in 2015, a few years into the maelstrom, at a time when the replication crisis was the big topic. Things seem to have quieted down since then. A few changes have been introduced, but I haven't seen the kind of revolutionary shifts in the field I'd have hoped for.
If anything, I regard the situation as far bleaker. Even if we set aside fraud cases, which are significant but not as big a contributor to low replication rates as p-hacking and other bad research practices, there are still deeper issues. Here are a few:
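To illustrate how p-hacking alone, with no fabrication at all, can produce spurious findings, here is a small simulation of one common practice, optional stopping: checking the data after every batch of participants and stopping as soon as p < 0.05. It is a sketch under stated assumptions: there is no true effect, the outcome is normally distributed with a known standard deviation of 1 (so a simple two-sided z-test applies), and the batch size, maximum sample size, seed, and function names are arbitrary, hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def two_sided_p(sample_mean: float, n: int) -> float:
    """Two-sided z-test p-value for H0: mu = 0, assuming sigma = 1 is known."""
    z = sample_mean * math.sqrt(n)
    return 2.0 * (1.0 - STANDARD_NORMAL.cdf(abs(z)))


def false_positive_rate(trials: int, batch: int, max_n: int,
                        peek: bool, alpha: float = 0.05, seed: int = 1) -> float:
    """Fraction of simulated null studies that end up 'significant'.

    peek=False: test once, at max_n (a pre-specified plan).
    peek=True:  test after every batch and stop at the first p < alpha.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        total, n = 0.0, 0
        significant = False
        while n < max_n:
            # Collect another batch of observations; the true effect is zero.
            total += sum(rng.gauss(0.0, 1.0) for _ in range(batch))
            n += batch
            # Without peeking, only the final sample size is ever tested.
            if (peek or n >= max_n) and two_sided_p(total / n, n) < alpha:
                significant = True
                break
        hits += significant
    return hits / trials


if __name__ == "__main__":
    fixed = false_positive_rate(trials=2000, batch=10, max_n=100, peek=False)
    peeking = false_positive_rate(trials=2000, batch=10, max_n=100, peek=True)
    print(f"one planned test:             {fixed:.3f}")
    print(f"test after every batch of 10: {peeking:.3f}")
```

The single planned test should hover around the nominal 5%, while the peeking strategy lands well above it, even though every simulated study has no real effect.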
(1) There is a lack of any substantive, unifying theories grounding a great deal of research. One might say (and there are papers on this) that we have a Theory Crisis.
(2) Many studies have extremely poor generalizability. Call this the Generalizability Crisis (see: https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/article/abs/generalizability-crisis/AD386115BA539A759ACB3093760F4824).
(3) There are problems of measurement and validity. So we may say we have a "validity crisis" (see: https://replicationindex.com/2019/02/16/the-validation-crisis-in-psychology/).
My own work focuses on problems of measurement and validity in the instruments used to assess how nonphilosophers think about philosophical questions, primarily moral realism and antirealism. I believe the available evidence suggests that the measures we use for these purposes are generally invalid. So, even if they replicated, it would be moot: they aren't measuring what people think they are, and cannot serve the empirical purposes they're put to.
This highlights a more insidious problem than low replication rates: studies can be uninformative or misleading even if they replicate.
Worst case scenario, I say we start a Substack called “Null.”
Seriously, though, I started my career in healthcare research, and now I work with researchers to help promote their work. I often think and talk about the idea that a null finding is still a finding, but you’re right that the incentives to elevate that work just aren’t there. Thanks for this extremely thorough write-up about these issues!
Part 2 talks about experiments in psychology, actually:
https://federicosotodelalba.substack.com/p/beauty?r=4up0lp
https://federicosotodelalba.substack.com/p/last?r=4up0lp
Could AI, instead of humans, review research practices, including pre-registered studies?
Probably, and it will only become more capable of doing so with time.