The conversation around impact factors and the assessment of research outputs, amplified by the recent ‘splash’ boycott by Randy Schekman, is turning my mind to a different aspect of science – and indeed society – and that is the use of metrics.
We are becoming better and better at producing metrics: more of the things we do are digitised, and by coordinating what we do more carefully we can ‘instrument’ our lives better. Familiar examples might be monitoring household electricity meters to improve energy consumption, analysing traffic patterns to control traffic flow, or even tracking the movement of people in stores to improve sales.
At the workplace it’s more about how many citations we have, how much grant funding we obtain, how many conferences we participate in, how much disk space we use… even how often we tweet. All these things usually have fairly ‘low friction’ instrumentation (with notable exceptions).
This means there is a lot more quantitative data about us as scientists out there than ever before, particularly our ‘outputs’ and related citations, and mostly with an emphasis on the traditional (often maligned) Impact Factor of journals and increasingly on “altmetrics”. This is only going to intensify in the future.
Data driven… to a point
At one level this is great. I’m a big believer in data-driven decisions in science, and logically this should be extended to other arenas. But on another level, metrics can be dangerous.
Four dangers of metrics
- Metrics are low-dimensional rankings of high-dimensional spaces;
- Metrics are horribly confounded and correlated;
- A few metrics are more easily ‘gamed’ than a broad array of metrics;
- There is a shift towards arguments that are supported by available metrics.
The tangle of multidimensional metrics
A metric, by definition, provides a single dimension on which to place people or things (in this case scientists). The big downside is that we know that science is considered “good” only after evaluating it on many levels. It can’t be judged usefully along any single, linear metric. On a big-picture, strategic level, one has to consider things within the context of different disciplines. Then there is an aspect of ‘science community’ – successful science needs both people who are excellent mentors and community drivers, and the ‘lone cats’ who tend to keep to themselves. Even at the smallest level, you have to have a diversity of thinking patterns (even within the same discipline, even with the same modus operandi) for science to be really good. It would be a disaster if scientists were too homogeneous. Metrics implicitly make an assumption of low dimensionality (in the most extreme case, of a single dimension), which, by its very definition, cannot capture this multi-dimensional space.
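To make the dimensionality point concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the three scientists, the three axes (papers, mentoring, community-building), the scores and the weights. It only shows that collapsing several dimensions into one number forces a choice of weights, and that the ranking follows the weights rather than anything intrinsic to the people.

```python
# Hypothetical scores for three made-up scientists on three made-up axes
# (papers, mentoring, community-building), each on an arbitrary 0-10 scale.
scores = {
    "A": (9, 2, 3),
    "B": (4, 8, 5),
    "C": (5, 5, 9),
}


def rank(weights):
    """Collapse each score vector to a single weighted sum and sort, best first."""
    totals = {
        name: sum(w * s for w, s in zip(weights, vec))
        for name, vec in scores.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


# Three arbitrary weightings of the same underlying data.
papers_only = rank((1, 0, 0))       # ['A', 'C', 'B']
mentoring_heavy = rank((1, 2, 1))   # ['B', 'C', 'A']
community_heavy = rank((1, 1, 2))   # ['C', 'B', 'A']

print(papers_only, mentoring_heavy, community_heavy)
```

Each weighting puts a different person on top, even though nothing about the three people has changed. A single published metric is just one such weighting, chosen by someone else.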
Clearly, there are going to be a lot of factors blending into metrics, and a lot of those will be unwanted confounders and/or correlation structures that confuse the picture. Some of this is well known: for example, different subfields have very different citation rates; parents who take career breaks to raise children (the majority being women) will often have a different readout of their career through this period. Perhaps less widely considered is that institutions in less well-resourced countries actually do have poorer access to the ‘hidden’ channels of meetings and workshops of science.
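As a rough illustration of the subfield effect, the sketch below divides each paper's citation count by the mean for its subfield. The papers, subfields and counts are all made up, and dividing by the field mean is only one simple way to adjust; it is an assumption for this example, not a recommended method, and it deals with just one confounder among many.

```python
from statistics import mean

# Invented (paper id, subfield, citation count) records.
papers = [
    ("p1", "genomics", 120),
    ("p2", "genomics", 80),
    ("p3", "taxonomy", 12),
    ("p4", "taxonomy", 4),
]


def field_means(records):
    """Mean citation count per subfield."""
    by_field = {}
    for _, field, cites in records:
        by_field.setdefault(field, []).append(cites)
    return {field: mean(counts) for field, counts in by_field.items()}


def normalised(records):
    """Citations divided by the mean of the paper's own subfield."""
    means = field_means(records)
    return {pid: cites / means[field] for pid, field, cites in records}


raw = {pid: cites for pid, _, cites in papers}
adjusted = normalised(papers)

raw_order = sorted(raw, key=raw.get, reverse=True)                  # ['p1', 'p2', 'p3', 'p4']
adjusted_order = sorted(adjusted, key=adjusted.get, reverse=True)   # ['p3', 'p1', 'p2', 'p4']

print(raw_order, adjusted_order)
```

On raw counts the taxonomy paper p3 sits behind both genomics papers; relative to its own field it comes out first. The adjusted number is still a single metric, of course, with its own blind spots.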
Some of the correlations are hard to untangle. Currently, many good scientists like to publish in Science, Nature and Cell, and so … judging people by their Science, Nature and Cell papers is (again, currently) an ‘informative proxy’. But this confounding goes way deeper than one or two factors; rather, it is a really crazy series of things: a ‘fashion’ in a particular discipline, a ‘momentum’ effect in a particular field, attendance at certain conferences, the tweeting and blogging of papers…
Because of the complex correlation between these factors, people can use a whole series of implicit or explicit proxies for success to get a reasonable estimation of where someone might be placed in this broad correlation structure. The harder question is: why is this scientist – or this project proposed by this scientist – in this position in the correlation structure? What happens next if we fund this project/scientist/scheme?
Gaming the system
Making the judgement call
One unconscious aspect of using metrics is the way it affects the whole judgement process. I’ve seen committees – and myself sometimes when I catch myself at it – shift towards making arguments based on available metrics, rather than stepping back and saying, “These metrics are one of a number of inputs, including my own judgement of their work”.
One needs to almost read past the numbers – even if they are poor – and ask, “Is the science worth it?” In the worst case, the person or committee making that judgement call will be asked to justify the decision based entirely on metrics, in order to present a sort of watertight argument. But there are real dangers in believing – against all evidence – that metrics are adequate measures. That said, this is the counter-argument to ‘using objective evidence’ and ‘removing establishment bias’ – the very thing that using metrics helps counter. There has to be a balance.
- We need more, not fewer, metrics, and to have a diversity of metrics presented to us when we make judgements. This might make interpretation seem more complicated, and therefore harder to judge. And that is, in many cases, correct – it is more complicated and it is hard to judge these things.
- We need good research on metrics and confounders. At the very least this will help expose their strengths and weaknesses; even better, it will potentially make it possible to adjust for (perhaps unexpected) major influencing factors.
- We should collectively accept that, even with a large number of somewhat un-confounded metrics, there will still be confounders we have not thought about. And even if there were perfect, unconfounded metrics, we would still have to decide which aspects of this high-dimensional space we want to select; after all, selecting just one area of ‘science’ is, well, not going to be good.
- We should trust the judgement of committees, in particular when they ‘re-rank’ against metrics. Indeed, if there is a committee whose results can be accurately predicted by its input metrics, what’s the point of that grouping?
My thinking on this subject has been influenced by two great books. One is Daniel Kahneman’s “Thinking, Fast and Slow”, which I’ve blogged about previously. The other is Nate Silver’s excellent “The Signal and the Noise”. Both are seriously worth reading, for any scientist.