*Every Monday our authors provide a round-up of some of the most recently published peer reviewed articles from the field. We don’t cover everything, or even what’s most important – just a few papers that have interested the author. Visit our Resources page for links to more journals or follow the HealthEconBot. If you’d like to write one of our weekly journal round-ups, get in touch.*

**Researcher Requests for Inappropriate Analysis and Reporting: A U.S. Survey of Consulting Biostatisticians**. Annals of Internal Medicine. [PubMed] *Published October 2018.*

I have spent a fair bit of time masquerading as a statistician. While I frequently try to push for Bayesian analyses where appropriate, I have still had to do Frequentist work including power and sample size calculations. In principle these power calculations serve a good purpose: if the study is likely to produce very uncertain results it won’t contribute much to scientific knowledge and so won’t justify its cost. A power calculation can also indicate that a two-arm trial would be preferred over a three-arm trial despite losing an important comparison. But many power analyses, I suspect, are purely for show; all that is wanted is the false assurance of some official-looking statistics to demonstrate that a particular design is good enough. Now, I’ve never worked on economic evaluation, but I can imagine that the same pressures can sometimes exist to achieve a certain result. This study presents a survey of 400 US-based statisticians, which asks them how frequently they are asked to do some inappropriate analysis or reporting and to rate how egregious the request is. For example, the most severe request is thought to be to falsify statistical significance. But the survey also includes common requests, such as not showing plots because they don’t reveal an effect as significant as thought, downplaying ‘insignificant’ findings, or dressing up post hoc power calculations as a priori analyses. I would think that those responding to this survey are less likely to be those who comply with such requests, and the survey does not ask them whether they did. But it wouldn’t be a big leap to suggest that there are those who do comply, career pressures being what they are. We already know that statistics are widely misused and misreported, especially p-values. Whether this is due to ignorance or malfeasance, I’ll let the reader decide.
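To make the two-arm versus three-arm point concrete, here is a minimal sketch of an honest a priori power calculation, using a normal approximation for a comparison of two means. The effect size, standard deviation, and total sample of 300 are made-up numbers for illustration, not figures from the survey, and the three-arm case ignores any multiplicity adjustment (which would lower power further).

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_two_means(n_per_arm, delta, sigma, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two arm means."""
    se = sigma * math.sqrt(2.0 / n_per_arm)
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    # Ignores the negligible chance of rejecting in the wrong direction.
    return _STD_NORMAL.cdf(delta / se - z_crit)


def n_per_arm_for_power(delta, sigma, power=0.8, alpha=0.05):
    """Smallest n per arm reaching the target power (normal approximation)."""
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical design inputs, chosen only for illustration.
TOTAL_PARTICIPANTS = 300
EFFECT, SD = 0.3, 1.0

# The same budget split across two arms or three arms.
power_two_arm = power_two_means(TOTAL_PARTICIPANTS // 2, EFFECT, SD)
power_three_arm = power_two_means(TOTAL_PARTICIPANTS // 3, EFFECT, SD)
required_n = n_per_arm_for_power(EFFECT, SD)

print(f"two arms:   power per comparison = {power_two_arm:.2f}")
print(f"three arms: power per comparison = {power_three_arm:.2f}")
print(f"n per arm for 80% power: {required_n}")
```

Under these assumptions, splitting a fixed sample across three arms leaves each pairwise comparison noticeably less powered than the single two-arm comparison. The calculation is only meaningful if the inputs are fixed before the data are seen.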

**Many Analysts, One Data Set: Making Transparent How Variations in Analytic Choices Affect Results**. Advances in Methods and Practices in Psychological Science. [PsyArXiv] *Published August 2018.*

Every data analysis requires a large number of decisions. From receiving the raw data, the analyst must decide what to do with missing or outlying values, which observations to include or exclude, whether any transformations of the data are required, how to code and combine categorical variables, how to define the outcome(s), and so forth. Each of these decisions leads to a different analysis, and if all possible analyses were enumerated there could be a myriad. Gelman and Loken called this the ‘garden of forking paths’ after the short story by Jorge Luis Borges, who explored this idea. Gelman and Loken identify this as the source of the problem called p-hacking. It’s not that researchers are conducting thousands of analyses and publishing the one with the statistically significant result, but that each decision along the way may be favourable towards finding a statistically significant result. Do the outliers go against what you were hypothesising? Exclude them. Is there a nice long tail of the distribution in the treatment group? Don’t take logs.

This article explores the garden of forking paths by getting a number of analysts to try to answer the same question with the same data set. The question was: are darker skinned soccer players more likely to receive a red card than their lighter skinned counterparts? The data set provided had information on league, country, position, skin tone (based on subjective rating), and previous cards. Unsurprisingly there was a wide range of results, with point estimates ranging from odds ratios of 0.89 to 2.93, and a similar range of standard errors. Looking at the list of analyses, I see a couple that I might have pursued, both producing vastly different results. The authors see this as demonstrating the usefulness of crowdsourcing analyses. At the very least it should be a stark warning to any analyst to be transparent with every decision and to consider its consequences.
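The forking paths idea can be sketched in a few lines. The example below is purely illustrative and has nothing to do with the red card data: it simulates a skewed outcome with no true treatment effect, then runs the same comparison down four paths defined by two of the decisions mentioned above (take logs or not, drop outliers or not). The 3 standard deviation outlier rule and all names are assumptions of mine.

```python
import itertools
import math
import random
import statistics


def simulate(n=200, seed=1):
    """Skewed, positive outcomes for two groups with NO true difference."""
    rng = random.Random(seed)
    treated = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
    control = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
    return treated, control


def prepare(values, take_logs, drop_outliers):
    """Apply one combination of analytic choices to a single group."""
    if take_logs:
        values = [math.log(v) for v in values]
    if drop_outliers:
        centre = statistics.mean(values)
        spread = statistics.stdev(values)
        # Hypothetical rule: discard points more than 3 SDs from the mean.
        values = [v for v in values if abs(v - centre) <= 3.0 * spread]
    return values


def z_statistic(treated, control):
    """Difference in means divided by its (unpooled) standard error."""
    diff = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    return diff / se


treated, control = simulate()
results = {}
for take_logs, drop_outliers in itertools.product([False, True], repeat=2):
    t = prepare(treated, take_logs, drop_outliers)
    c = prepare(control, take_logs, drop_outliers)
    results[(take_logs, drop_outliers)] = z_statistic(t, c)

for (take_logs, drop_outliers), z in results.items():
    print(f"logs={take_logs!s:5} drop_outliers={drop_outliers!s:5} z={z:+.2f}")
```

Each path is defensible on its own, yet they give different test statistics from identical data. With only two binary decisions there are already four analyses; a real analysis has many more forks.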

**Front-Door Versus Back-Door Adjustment With Unmeasured Confounding: Bias Formulas for Front-Door and Hybrid Adjustments With Application to a Job Training Program**. Journal of the American Statistical Association. *Published October 2018.*

Econometricians love instrumental variables. Without any supporting evidence, I would be willing to conjecture it is the most widely used type of analysis in empirical economic causal inference. When the assumptions are met it is a great tool, but decent instruments are hard to come by. We’ve covered a number of unconvincing applications on this blog where the instrument might be weak or not exogenous, and some of my own analyses have been criticised (rightfully) on these grounds. But, and we often forget, there are other causal inference techniques. One of these, which I think is unfamiliar to most economists, is the ‘front-door’ adjustment. Consider the following diagram:

On the right is the instrumental variable type causal model. Provided Z satisfies an exclusion restriction, i.e. it is independent of the unmeasured confounder U (along with some other assumptions), it can be used to estimate the causal effect of A on Y. The front-door approach, on the left, shows a causal diagram in which there is a post-treatment variable, M, unrelated to U, which causes the outcome Y. Pearl showed that, under a set of assumptions comparable to those for instrumental variables, namely that the effect of A on Y is entirely mediated by M and that there are no common causes of A and M or of M and Y, M can be used to identify the causal effect of A on Y. This article discusses the front-door approach in the context of estimating the effect of a jobs training program (a favourite of James Heckman). The instrumental variable approach uses random assignment to the program, while the front-door analysis, in the absence of randomisation, uses program enrollment as its mediating variable. The paper considers the effect of the assumptions breaking down, and shows the front-door estimator to be fairly robust.
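For readers who have not seen front-door adjustment in action, here is a minimal simulated sketch. It is not the paper's estimator or data: it assumes a simple linear model with made-up coefficients, in which U confounds A and Y, and A affects Y only through M. In that linear setting the front-door estimate reduces to the product of two regression coefficients, the effect of A on M and the effect of M on Y holding A fixed.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Sample covariance; cov(xs, xs) is the sample variance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


rng = random.Random(2018)
N = 20_000

# Hypothetical structural model: U -> A, U -> Y, A -> M -> Y.
A_TO_M, M_TO_Y = 0.5, 0.8
TRUE_EFFECT = A_TO_M * M_TO_Y  # A affects Y only through M

u = [rng.gauss(0.0, 1.0) for _ in range(N)]              # unmeasured confounder
a = [ui + rng.gauss(0.0, 1.0) for ui in u]               # treatment
m = [A_TO_M * ai + rng.gauss(0.0, 1.0) for ai in a]      # mediator, no U
y = [M_TO_Y * mi + ui + rng.gauss(0.0, 1.0) for mi, ui in zip(m, u)]

# Naive regression of Y on A is confounded by U.
naive = cov(a, y) / cov(a, a)

# Step 1: effect of A on M (unconfounded, since U does not touch M).
a_to_m_hat = cov(a, m) / cov(a, a)

# Step 2: coefficient on M from regressing Y on M and A together;
# conditioning on A blocks the back-door path M <- A <- U -> Y.
denominator = cov(m, m) * cov(a, a) - cov(a, m) ** 2
m_to_y_hat = (cov(m, y) * cov(a, a) - cov(a, y) * cov(a, m)) / denominator

front_door = a_to_m_hat * m_to_y_hat

print(f"true effect:         {TRUE_EFFECT:.2f}")
print(f"naive estimate:      {naive:.2f}")
print(f"front-door estimate: {front_door:.2f}")
```

In this simulation the naive estimate is badly biased by U, while the front-door estimate recovers the true effect without ever observing U and without an instrument. That only holds because the simulated data satisfy the front-door assumptions exactly; the paper's contribution is to work out what happens when they do not.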
