What can the state learn from reliability and ML engineering?
Focus on fast detection and response instead of trying to avoid failure
In a great interview on Sean Carroll's Mindscape podcast, Niall Ferguson analyses the government responses to COVID in the US and in Taiwan and concludes that for a system to be robust, it is better to be generally paranoid than to be very specifically prepared for particular threats (transcript, 0:33:31.6):
If you were a highly bureaucratic public health system that optimized for central control and risk minimization, as I think CDC, the Centers for Disease Control in the United States, did, your response to the news of a novel pathogen from Wuhan was in fact to make it harder to test for that pathogen rather than making it much easier. And this was an epic fail that emanated from, I think, an increasingly sclerotic bureaucratic culture at CDC that would have been unrecognizable to the people who founded the institution in the post-war period.
By contrast, in Taiwan, the response to any threat from China, regardless of whether it’s authentically viral or it is information warfare, is immediate and rapid reaction, and the Taiwanese ramped up testing, launched contact tracing and had a digital system of quarantining within weeks of the first news, even before the WHO had confirmed it was human-to-human transmission. I take from that that it is better to be generally paranoid, which I think you have to be in Taiwan for obvious reasons, and in South Korea, than it is to be very specifically prepared.
The problem about Western bureaucracies is that they are excessively precise in the risks that they are preparing for. This was also true in financial regulation. Financial regulation before 2008 had become increasingly complicated: The Basel rules on bank capital adequacy went from a few pages to enormous numbers of pages, and paradoxically the more they prepared for the specific eventuality of a stress test on bank balance sheets, the less prepared they were. I was highly amused to find as I was researching Doom just how many pandemic preparedness reports the American bureaucracy had produced in the years prior to 2020.
The reliability engineer Eric Brewer made exactly the same point 20 years ago in his paper "Lessons from Giant-Scale Services" (emphasis mine):
The traditional metric for availability is uptime, which is the fraction of time a site is handling traffic. Uptime is typically measured in nines, and traditional infrastructure systems such as the phone system aim for four or five nines (“four nines” implies 0.9999 uptime, or less than 60 seconds of downtime per week). Two related metrics are meantime-between-failure (MTBF) and mean-time-to-repair (MTTR). We can think of uptime as: uptime = (MTBF – MTTR)/MTBF.
Following this equation, we can improve uptime either by reducing the frequency of failures or reducing the time to fix them. Although the former is more pleasing aesthetically, the latter is much easier to accomplish with evolving systems. For example, to see if a component has an MTBF of one week requires well more than a week of testing under heavy realistic load. If the component fails, you have to start over, possibly repeating the process many times. Conversely, measuring the MTTR takes minutes or less and achieving a 10-percent improvement takes orders of magnitude less total time because of the very fast debugging cycle. In addition, new features tend to reduce MTBF but have relatively little impact on MTTR, which makes it more stable. Thus, giant-scale systems should focus on improving MTTR and simply apply best effort to MTBF.
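To make the arithmetic concrete, here is a minimal sketch (my own numbers, not Brewer's) showing that halving MTTR improves uptime exactly as much as doubling MTBF, while being far quicker to verify in practice:

```python
# Uptime as defined in the quote: (MTBF - MTTR) / MTBF.
# The numbers below are invented purely for illustration.

def uptime(mtbf_hours: float, mttr_hours: float) -> float:
    return (mtbf_hours - mttr_hours) / mtbf_hours

week = 7 * 24  # hours

base = uptime(mtbf_hours=week, mttr_hours=1.0)             # ~0.9940
doubled_mtbf = uptime(mtbf_hours=2 * week, mttr_hours=1.0)  # ~0.9970
halved_mttr = uptime(mtbf_hours=week, mttr_hours=0.5)       # ~0.9970

print(f"baseline: {base:.4f}, "
      f"2x MTBF: {doubled_mtbf:.4f}, "
      f"2x faster repair: {halved_mttr:.4f}")
```

Both improvements move uptime by the same amount, but confirming the MTTR gain takes minutes of measurement rather than weeks of failure-free operation under load.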
Non-hierarchical networks are more agile
In the same interview, Ferguson points out that non-hierarchical networks deal with uncertainty and nonlinear influences better than hierarchical systems do.
In a book called The Pity of War, I was trying to understand why, despite its significant material disadvantages, the German army out-performed in battle the other armies, in particular the British army. And one of the things I learnt from military historians was that although one thinks of Prussian militarism as quintessentially hierarchical, in reality that was the British army. It was in the British Army that you sat in the foxhole waiting for orders, even when it was unlikely that orders could reach you.
The German army in fact had a culture of delegated authority to NCOs that encouraged small groups to take the initiative in the battlefield. So although I didn’t know anything about it at the time, I could have written that in terms of network structure, the German army was far more of a distributed network, and therefore, in fact, far more versatile and responsive to changing battlefield conditions. It was the British army that was actually held back by the waiting for orders mentality.
This is one of the principles of high-reliability organisations: entrust the people at the sharp end to do what they think is needed for safety, as Sidney Dekker writes in "Drift into Failure".
Analogy between overspecification and overfitting
Shortly afterwards, Sean Carroll recounts an anecdote about the overspecification of the proposed EU constitution:
I remember I was teaching at a summer school in Europe, right when the EU constitution was being debated, and one of the guest lecturers, it was a very broad summer school, one of the guest lectures was from Stephen Breyer, the US Supreme Court Justice. And he was too polite to say it out loud, but he’s like, this proposed EU constitution, hundreds of pages, and he’s like, here’s our constitution, that’s worked pretty well, it’s a couple of pages long, because it doesn’t try to pretend to anticipate every single possible thing, and you have to allow for the system to breathe a little bit.
Here we can see an obvious connection to model overfitting. An over-specified constitution is unlikely to "age well" in a rapidly changing environment, just as an overfitted model performs poorly on new data that it was not trained on.
Buterin and Weyl also recognised this analogy from a slightly different perspective in their essay "Central Planning as Overfitting".
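A toy illustration of the analogy (my own sketch, not from their essay): a high-degree polynomial "over-specifies" its training data and then generalises poorly, just as an over-specified constitution copes poorly with circumstances its drafters did not anticipate.

```python
# Fit a modest and an over-specified polynomial to the same noisy data
# and compare how they do on data they have never seen.
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    x = rng.uniform(-1, 1, n)
    return x, np.sin(np.pi * x) + rng.normal(0, 0.2, n)  # noisy "world"

x_train, y_train = sample(15)
x_test, y_test = sample(200)  # the "new environment" the model never saw

for degree in (3, 12):
    coeffs = np.polyfit(x_train, y_train, degree)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: test MSE = {test_mse:.3f}")
# The degree-12 fit typically hugs the training points more closely
# yet does worse on the held-out data.
```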
Regularisation based on a feedback loop can perhaps be applied to the public policy design process (e.g., environmental policy), but not to the process of writing a constitution, because a constitution cannot easily be "iterated" on. So we should apply other regularisation techniques when writing a constitution.
An equivalent of dropout, perhaps, would be to remove each chapter from the constitution one by one and to check whether the remaining chapters still provide roughly the same guidance on the matters that the "dropped-out" chapter specifically addressed. This process should lead to chapters formulated generically enough to also provide guidance in the new areas of politics and ethics opened up by technological and social developments.
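As a rough sketch of what such a "chapter dropout" check might look like, here is a leave-one-out loop. Everything in it — the chapter names, the scenarios, and the string-matching notion of "coverage" — is hypothetical and purely illustrative; in reality, coverage would be a matter of legal judgement rather than keyword lookup.

```python
# Hypothetical leave-one-out check over constitutional "chapters".
CHAPTERS = {
    "speech": ["censorship", "press", "protest"],
    "privacy": ["surveillance", "data collection", "correspondence"],
    "due_process": ["detention", "fair trial", "appeal"],
}

SCENARIOS = ["surveillance", "protest", "detention", "data collection"]

def covered(scenario: str, chapters: dict) -> bool:
    """A scenario counts as covered if any remaining chapter addresses it."""
    return any(scenario in topics for topics in chapters.values())

# Drop each chapter in turn and see which scenarios lose their guidance.
for name in CHAPTERS:
    remaining = {k: v for k, v in CHAPTERS.items() if k != name}
    gaps = [s for s in SCENARIOS if not covered(s, remaining)]
    print(f"without '{name}': uncovered scenarios = {gaps or 'none'}")
```

A chapter whose removal leaves gaps is one whose content is too narrowly specified; rewriting it more generically (or spreading its principles across other chapters) is the constitutional analogue of making the model robust to dropped units.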
What other principles of reliability and ML engineering can we apply in state and policy work?
One of the principles of reliability engineering is to monitor as many runtime metrics as possible. The applications of this principle in government are obvious. However, there is also a counter-position that one should only collect data for a specific purpose. And, as Nassim Taleb warns, the number of spurious correlations grows quadratically with the number of collected metrics, so we should be very careful when using such data for correlation analysis.
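A quick simulation (my own sketch; the sample size and the 0.3 cutoff are arbitrary) makes Taleb's warning tangible: even among completely independent random metrics, the number of pairs that look "correlated" grows roughly with the square of the number of metrics, simply because there are m*(m-1)/2 pairs to compare.

```python
# Count apparently significant correlations among independent random metrics.
import numpy as np

rng = np.random.default_rng(0)
n_observations = 50

for m in (10, 20, 40, 80):
    data = rng.normal(size=(n_observations, m))   # m unrelated metrics
    corr = np.corrcoef(data, rowvar=False)        # m x m correlation matrix
    upper = corr[np.triu_indices(m, k=1)]         # each pair counted once
    spurious = int(np.sum(np.abs(upper) > 0.3))   # arbitrary "looks correlated" cutoff
    print(f"{m:3d} metrics -> {m*(m-1)//2:4d} pairs, {spurious:3d} look 'correlated'")
```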