How (not) to measure progress in science?

I’m a bit late to the party but I’ve been enjoying some Collison podcast backlog and realized I had more to say about the “diminishing returns of science” trope that does the rounds from time to time.

Simply stated, the thesis suggests that a variety of metrics employed to measure progress of science all seemingly concur that despite increasing numbers of PhDs and the net accumulation of knowledge, major new discoveries are few and far between, at least compared to science in prior ages.

For a process that’s devoted to discovering knowledge, science is poorly understood by nearly everyone, including scientists. It may not be surprising to find that GAAP metrics don’t neatly translate into an industry that has resolutely resisted market forces since the beginning of time, but there are less glib reasons why measuring progress in science is actually highly non-trivial.

I’m going to go into more detail later in this post, but the fundamental issue is that science is like Tetris, in that it’s both cumulative and self-compressing. In science, unlike in, say, philosophy, the measure of a field’s maturity is the brevity of its textbooks and the unity of its underlying theoretical framework. With the benefit of hindsight, the discoveries of a century ago appear neatly ordered, with the most (contingently) significant results retaining salience while the rest have slipped away.

Where do we start?

I first became aware of this idea circulating in the startup world around 2014, when Peter Thiel was giving talks such as this one.

I found that Thiel had somewhat limited insight here, though I recognize that a scientific background isn’t necessarily a requirement to contribute to this question. Still, comparing rate of financial return between venture capital and fundamental research is a bit gauche, particularly if one happens to be a VC. Of course, such comparisons will be made but it’s germane to begin with the requisite 45 minutes of throat clearing about Keynesian capital overabundance and selection bias.

That is, we don’t typically get lectures from failed (unlucky) VCs about the genius of the market. My rule of thumb is that if 5 sigma is good enough for particle physics, it’s good enough for VC as well. Anyone can get lucky once. Get lucky 5 times in a row and that’s more interesting.

I was more interested when Patrick Collison and Michael Nielsen entered the arena in 2018, with an article in The Atlantic asking if science was stagnant.

I’ve met both men socially a couple of times and must begin by stating I have nothing but the highest respect for them personally, and their intentions in this endeavor. I also think that their suggested program of attempting more detailed study of this area is a damn good idea.

However, getting traction is difficult if the foundations are awry and there are aspects of the article that need further attention. Collison has collated other responses here so it’s gratifying that there is an ongoing conversation in this area. Certainly, science coming to a grinding halt is the stuff of civilizational nightmares.

Much of what I’ll write here may be obvious to some readers, perhaps less so to others. It seemed less obvious to me despite being familiar with these ideas for years and working in science for a decade or so, so I’m writing them down.

In summary, the article leans heavily on statistics and surveys about Nobel Prize-winning discoveries to make the case that despite exponentially increasing PhDs, publications, and science funding, few major discoveries have been made very recently.

At the risk of appearing a bit snarky, I would ask a hypothetical question: Over time Stripe has hired exponentially greater numbers of talented software engineers and yet has continued to ship about one product a year. Why?

To be fair, I could ask this of any software startup. The answer, of course, is that it’s complicated. Larger organizations move more slowly. Incremental growth in a market suffers diminishing returns. More ambitious products have more onerous compliance requirements. Competition. Technical debt.

To get a little deeper, like many ambitious companies Stripe has teams working on fairly fundamental research questions in cryptography and computer science. These team members are among the smartest, best resourced humans to have ever lived. How long until we get a Stripe publication that’s as significant as the Church-Turing Thesis?

This is a silly question, intended to provoke more than illuminate. But I think it underscores that progress in science is not purely an organizational problem. Academia is undoubtedly riven with dozens of major inefficiencies, some old and some new. I even wrote a whole book about ways in which assimilating this way of life can challenge humans. See the chapter on leaving academia for more information along these lines. The point is that even private research outside academia, despite enormous leaps in capability, doesn’t necessarily see itself as being in the Nobel Prize game.

And so, with that primer, we turn to the Nobel Prize. Awarded once a year in various fields for outstanding research, the prize itself has numerous well documented limitations. Using it as a mechanism to measure progress in science, no matter how well intentioned, is unlikely to result in deep insight. Doing so rests on a variety of flawed assumptions, chief among them being that science progress is conventionally measurable and the Nobel Prize performs that sort of measurement. Unfortunately, this is fairly far from the truth.

Before we tackle the salience of historical scientific research in general, it’s worth enumerating known biases in the Nobel Prize alone:
– Cadence of once a year, to at most three living participants, doesn’t scale with increased number of scientists, increased lifespans, or scale of scientific research.
– Nobel Committee is notoriously, and increasingly, conservative, reliably shunning certain demographics.
– Nobel Committee strongly prefers established results, meaning that in general old scientists get prizes for work they did when much younger, often work that may not have seemed that significant at the time.
– Scientists who die, or leave the field, or leave science, never get prizes.
– Women hardly ever win, particularly in physics.
– Standards and practices on the Committee have changed over time.
– Standards and practices of science outside the Committee have changed over time.

Earlier in the article, however, Collison and Nielsen talk about a survey which asked scientists to evaluate which of a pair of given Nobel Prize discoveries was more significant. This approach ameliorates some of the peculiarities of the Prize system; however, as designed it cannot give unequivocal evidence. Science is progressive, and later results build on earlier ones. The significance, and salience, of earlier discoveries is enhanced, or overshadowed, by later ones in the same area.

Judging the relative merit of the discovery of the neutron or the Higgs Boson is a pointless exercise. The Higgs is the final page of a story that began with the neutron, a story that involves the contributions of at least tens of thousands of scientists, most unknown even to their own close families. If the Higgs had been discovered before the neutron, it would be more significant by far! So in some ways the question as posed asks “which of these discoveries was made first?” Are we so surprised to find that answers to this question, averaged and graphed, are biased to the left?

Collison and Nielsen’s article follows this by talking about the lack of Earth-shaking discoveries, such as Einstein’s final formulation of General Relativity in 1915. Today that seems like as good a date as any to bake a cake with equations on it, but the reality for working scientists is that the ways in which GR “radically changed our understanding of space, time, mass, energy, and gravity” began, in many ways, with Maxwell in the 1870s and continue to the present day. There were several other prominent mathematicians (Hilbert among them) also working on similar geometric formulations of gravity, while GR was not widely accepted in physics for decades. Many of the more interesting cosmological consequences were not appreciated until the universe was found to be expanding and the cosmic microwave background (CMB) discovered, and details are still being actively researched today. I worked in this field for five years and I am 100% certain that by 2050, 90% of what I learned will be utterly irrelevant; I just have no way of knowing which 90%.

The article discusses (and largely rejects) the idea that science is reaching a point of diminishing returns because all the easy stuff has been found and we’re approaching a more-or-less complete knowledge of nature. The death of physics has been predicted in the past, just prior to the discovery of quantum mechanics. It’s true that subfields of physics wax and wane depending on funding priorities and the somewhat stochastic fine-grained nature of discovery. Nuclear physics has run out of superpowers and high energy physics has certainly run out of big accelerators for the time being. But grad students and postdocs are highly fungible and may be counted on to reliably find the next big thing. Just because we haven’t yet digested the significance of the body of knowledge produced by our own generation doesn’t mean that it’s intrinsically worthless.

The article closes with a brief discussion of the idea of productivity slowdown. Economists measuring nation-state level productivity find that gains in per-person productivity in the US and other developed nations have largely tailed off since their post-WW2 peaks between the 1950s and 1970s. Obviously at this scale economic behavior is multifactorial (to say the least), but the specific mention of the Concorde as a false harbinger does highlight the omission of the single most glaring factor in economic changes in the last quarter of the 20th century: the loss of predictably cheap oil. If I’m right, exploding capacity and plunging electricity costs currently occurring due to developments in photovoltaics and batteries will reverse this trend and enable supersonic air travel. We’ll see before we’re old!

Finally, let’s talk about the single most troubling aspect of the science progress measurement problem. Science is axiomatically different from nearly every other human pursuit, in that it’s cumulative and self-compressing. Scientists often joke that if they’re lucky they’ll get a Nature paper in a career – a single really solid bit of research that, years later, will be half a paragraph or a footnote in a textbook. Scientific progress often isn’t measured in pages of text produced, but pages of text removed. A great insight will allow two previously disparate phenomena to be understood under the same concept, and thus, the knowledge is compressed.

So when we attempt to measure the salience of the net contribution of an individual historical scientist today, it’s very difficult to propagate that distribution forwards or backwards in time, or to analyse it in isolation from other contemporary work. This is not simply a matter of bunging a bunch of weights into the Perron-Frobenius model and redoing PageRank. Everything is contingent.

I will, however, suggest a useful mental model. Imagine that the contribution of individual scientists can be modeled with a power law distribution. Landau actually tried to do something like this, for real.

Over time the rankings will shift and the absolute value will rise and fall but, generally speaking, salience falls with time, and often for reasons beyond any individual’s control, and often for reasons unrelated to the intrinsic quality or utility of that person’s work. Within someone’s career, if their salience is consistently high and the gods are favorable, the Swedish Committee may bestow The Prize but in many ways this is an inherently (and inconsistently) biased sample of an already biased probability distribution.
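To make the mental model a little more tangible, here is a minimal toy sketch in Python. Everything in it is an assumption made for illustration rather than anything measured: the Pareto exponent `alpha`, the yearly `decay` factor, the size of the random shocks, and the function names `simulate_salience` and `top_k` are all hypothetical. It draws initial salience from a power law, lets it drift downwards with random year-to-year shocks standing in for contingency, and then checks how much of the original top ten survives.

```python
import random


def simulate_salience(n_scientists=1000, alpha=1.5, decay=0.95, years=50, seed=0):
    """Toy model: power-law initial salience followed by noisy decay.

    All parameter values are illustrative assumptions, not fitted to data.
    """
    rng = random.Random(seed)
    # Initial salience drawn from a Pareto (power-law) distribution.
    initial = [rng.paretovariate(alpha) for _ in range(n_scientists)]
    current = list(initial)
    for _ in range(years):
        # Each year salience shrinks on average, with a random multiplicative
        # shock standing in for later discoveries, fashion, and luck.
        current = [s * decay * rng.lognormvariate(0.0, 0.2) for s in current]
    return initial, current


def top_k(values, k):
    """Return the set of indices of the k largest values."""
    order = sorted(range(len(values)), key=values.__getitem__, reverse=True)
    return set(order[:k])


initial, current = simulate_salience()
top_share = sum(sorted(initial, reverse=True)[:10]) / sum(initial)
survivors = len(top_k(initial, 10) & top_k(current, 10))
print(f"share of initial salience held by the top 1%: {top_share:.2f}")
print(f"initial top 10 still in the top 10 after 50 years: {survivors}/10")
```

Changing `seed` reshuffles who ends up on top, which is rather the point: in this sketch the final ranking depends on the accumulated shocks as well as on where each person started.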

This all seems a bit handwavey, so I’ll give a concrete example.

I studied physics as an undergrad in building A28 at the University of Sydney. Built in 1920, its striking facade is decorated with the debossed names of famous physicists of the time. Einstein is not listed. Indeed, 1920 predates almost all of quantum mechanics and subatomic physics. It is a fun exercise to read “Anathem” by Neal Stephenson and map all the renamed physics in there to relatively obscure physics arcana in our own universe.

As someone who has read a couple of undergraduate physics textbooks, I could associate each of the names with an effect or equation, but I would be surprised if non-physicists knew most of the names, which are: Archimedes, Roger Bacon, Copernicus, Kepler, Galileo, Newton, Huyghens, Dalton, Fresnel, Fourier, Carnot, Faraday, Maxwell, Helmholtz, Kelvin, Boltzmann, Roentgen, and Bessel. I challenge the interested reader to write a paragraph, from memory, on the key contributions of each of these people.

For the purposes of this blog, I decided to research who did what and when, and it turns out that most of these physicists made their discoveries between 1800 and 1850, which is to say, 70-120 years before the building was built. A more comprehensive list of physicists active during this period can be found here. Attentive readers will have noticed that Collison and Nielsen’s primary thesis is that exciting discoveries in physics dried up by 1950, 70 years ago. Coincidence, or perhaps low confidence in one’s ability to predict the long term value of recent discoveries isn’t a new phenomenon?

To take just one example, consider Thomas Young, who is best known today for the classic double slit experiment. A renowned polymath “who made notable contributions to the fields of vision, light, solid mechanics, energy, physiology, language, musical harmony, and Egyptology.” Widely considered to be the smartest working scientist of his generation, though apparently not good enough for the facade of A28. Best known today for an experiment that can be repeated by a four year old with a cheap laser. And yet in his own lifetime, despite considerable talents, he struggled as well as anyone else to wrest knowledge from the unordered chaos of the universe.

Science is a largely artisanal endeavor whose discoveries are always made by a huge number of people working in parallel. Tiny pieces of the puzzle are worked out by people who, in many cases, remain unaware of each other’s existence. The lucky few get an obscure equation named after them. Measuring the rate of equation naming is not a good way to understand the progress of science!

So how might we go about measuring the progress of science?

To take a utilitarian perspective, I think it’s fairly widely agreed that the human condition, both individually and collectively, has improved markedly over the last century. Amongst many others, Human Progress has collated impressive datasets showing rapid, and accelerating, improvement in key indicators such as hunger, poverty, literacy, freedom, life expectancy, exposure to violence, and access to markets. For sure, much of the improvement can be attributed to wider implementation of existing technology, and in some cases rather antique technology at that.

Food scarcity was largely (and unexpectedly) solved in the 20th century with the invention of the Haber process (1913) and Borlaug’s dwarf wheat (1950s), which were widely deployed within decades.

There is no doubt in my mind, however, that on a per-minute or per-smile basis, the material resources my contemporaries enjoy are overwhelmingly the result of new inventions, which is to say, new applications of relatively recently discovered science. The most transformative of these are personal computers and the internet, but I am convinced that we’re not even half way through chapter one of that story.

Manufacturing and automation are also salient examples. One could argue that there’s no reason that, for example, Tesla cars couldn’t have been built on a Ford production line in 1920 (perhaps without the autopilot and computer screen) but that would require overlooking the vast foundation of incremental knowledge gains necessary to make something as banal (and alien!) as a lithium battery cell for only $1.50 – cheaper than a loaf of bread.

A more thorough accounting premised on applied utility will show, I believe, accelerating scientific knowledge generation, diffusion, and application for the improvement of the human condition in every corner of the globe.

10 thoughts on “How (not) to measure progress in science?”

  1. Another metaphor:

    Science is like programming code. There are a thousand bugs in the code (unknowns about how our world works) and thousands more programmers (scientists) trying to fix each and every bug.

    Just as in programming, when a scientist solves one small bug (an unknown) it often creates one hundred *more* bugs! However as the relentless push of thousands of programmers marches on through time, the bugs (at least perhaps in one method among many) begin to recede and fundamental understanding and wisdom of that part of the universe begin to emerge.

    As the last of the thousands of programmers corrects the final bug, the entire method begins to work flawlessly for the first time and humanity marvels at our new understanding. Now we have a new equation that perfectly models the world around us more accurately! That last programmer is your Einstein or Newton or Curie. Not just brilliant scientists, but the lucky ones to bring together often hundreds of years worth of understanding and work, to build and give to the world new understanding and power.

    But there are still other parts of the program with bugs in it. Still thousands of scientists working tirelessly to squash them. And still yet more work to be done as after a new method is finally solved, it often leads to everything having to be re-written again from scratch.

    Perhaps one day the program will be complete, but there is so much to learn and of course, it’s the journey that’s often the real lesson.

  2. Maybe the reason “increasing numbers of PhDs” hasn’t translated into more scientific progress is that the number of really good people with PhDs has not increased, only the number of mediocre people with PhDs.

    1. There’s a common misconception that people deemed responsible for large discoveries are geniuses, apart from the rest. The reality is that most science is done by PhD students because they are cheap, and the most important factors are luck, dedication, and raw intelligence in that order. Most of what PhD students do is drudgery.

  3. I’d like to point out a couple things about the list of scientist names that I think are unintended exaggerations.

    One is that you say “I would be surprised if non-physicists knew any of the names,” but then the list of names contains Archimedes, Galileo and Newton. Surely you would not be surprised if non-physicists regularly knew these people? I would personally be very surprised if the majority of Americans did not know the name Galileo alone, even if they couldn’t say anything accurate about him.

    The second thing is when you say “it turns out that nearly all these physicists made their discoveries between 1800 and 1850”. Seven of these people died before 1800; 61% is certainly a majority, but could by no means be considered “nearly all”.

    Overall, I liked the post, and thanks for contributing to the discourse about the progress of science!

  4. What if science is limited? My firm belief is that science is limited. The number of fundamental laws will be limited.
    If one agrees with this, then we would simply see a saturation effect, in the sense that the laws that are ‘easy’ to derive have been found in the meantime, so what is left over is more difficult to tackle.
    In my daily work we have the saying that achieving 80% of your tasks takes 20% of your time, and the remaining 20% takes 80% of your time.
    As a side note: to my knowledge the Nobel Committee tends to avoid awarding progress made in applied science. Because of this, the value-for-money metric has an inherent problem, in my view.

  5. Maybe the way our brains are wired is changing. At a time and age when knowledge is mainly stored on machines, and human brains are quicker at working out where to find answers than at using their accumulated knowledge to solve problems, the general output decreases. PhD or no PhD.

  6. What a strange essay. I’m put in mind of this beautiful observation about Mrs. Ferrars from “Sense and Sensibility:”

    “She was not a woman of many words: for, unlike people in general, she proportioned them to the number of her ideas.”
