We are the teachers of a new intelligence now emerging on the planet. The challenge is how best to scale our teaching in order to keep up with the breakneck pace of automation. The future of machine governance depends upon our ability to do so.
It turns out that one of the biggest problems we face comes down to training machines on tasks that lack clearly articulated objectives. These kinds of problems are extremely common, and we humans happen to be quite good at solving them. That’s why one encouraging approach to training machines on ill-defined problems draws on human preferences.
In machine learning, objectives are often expressed as ‘reward functions.’ A reward function provides feedback that tells a machine learning system whether it’s headed in the right direction. When DeepMind trained its algorithms to play old-fashioned Atari video games, it used the game score as its reward function, a simple, objective measure for evaluating and improving the algorithm’s strategies. The problem is that many of the world’s problems lack these kinds of objective reward functions.
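To make that concrete, here’s a toy sketch of a score-based reward of the kind used in the Atari work. The loop and the “game” are hypothetical stand-ins, not DeepMind’s actual code:

```python
# A toy sketch of score-based reward, in the spirit of DeepMind's Atari
# work: the reward at each step is just the change in the game's score.
# The "game" here is a hypothetical stand-in, not a real emulator.

def score_reward(previous_score: int, current_score: int) -> int:
    """Reward is the change in the on-screen score since the last step."""
    return current_score - previous_score

previous_score = 0
for step in range(10):
    # Stand-in for reading the score out of the game's state.
    current_score = previous_score + (1 if step % 3 == 0 else 0)
    reward = score_reward(previous_score, current_score)
    # A real learner would use this reward to adjust its policy here,
    # e.g. agent.update(observation, action, reward).
    previous_score = current_score
```

The point is that the game hands the learner an unambiguous number to optimize. Most real-world tasks hand us nothing of the sort.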
Take self-driving cars. While it’s true that aspects of collision avoidance can be turned into objective reward functions, those aren’t the only factors that make a self-driving vehicle successful. For mainstream adoption to take off, we also need to account for the comfort of the ride, its convenience, how fun it is, and countless other subjective variables. To turn these subjective measures into the kind of data needed for useful reward functions, we need good systems for collecting the subjective preferences of human riders.
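As an illustration, one might imagine a driving reward that blends objective safety terms with a subjective comfort score learned from riders. The sketch below is purely hypothetical; every feature name and weight is invented:

```python
# A hedged sketch of combining objective and subjective terms into one
# reward function for a self-driving car. Every name, feature, and weight
# below is invented for illustration.

def driving_reward(collision: bool, jerk: float, comfort_score: float) -> float:
    safety = -100.0 if collision else 0.0   # objective: hard penalty for crashes
    smoothness = -abs(jerk)                 # objective proxy for ride quality
    subjective = comfort_score              # learned from rider feedback
    return safety + 0.5 * smoothness + subjective

print(driving_reward(collision=False, jerk=0.2, comfort_score=0.8))  # 0.7
```

The hard part isn’t the arithmetic; it’s filling in that comfort score, which has to come from actual human riders.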
In order for our future intelligent systems to make use of subjective human feedback, we will need reliable, cost-effective mechanisms for capturing human preferences and turning that information into reward functions.
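One well-studied mechanism is learning a reward function from pairwise comparisons: show a person two options, record which one they prefer, and fit a model that scores the preferred option higher. The sketch below illustrates the idea with a simple Bradley-Terry-style model, along the lines of Christiano et al.’s work on deep reinforcement learning from human preferences; the feature names and data are made up:

```python
# A minimal sketch of learning a reward function from pairwise human
# preferences, in the spirit of preference-based RL (Christiano et al., 2017).
# The features and data are made up; this is not a real library.
import math

def reward(weights, features):
    """A linear reward model: weighted sum of an option's features."""
    return sum(w * f for w, f in zip(weights, features))

def train_on_preferences(preferences, n_features, lr=0.1, epochs=200):
    """Fit weights so preferred options score higher (Bradley-Terry model)."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for preferred, rejected in preferences:
            # Probability the model assigns to the human's actual choice.
            margin = reward(weights, preferred) - reward(weights, rejected)
            p = 1.0 / (1.0 + math.exp(-margin))
            # Gradient ascent on log-likelihood: push the preferred
            # option's features up and the rejected option's down.
            for i in range(n_features):
                weights[i] += lr * (1.0 - p) * (preferred[i] - rejected[i])
    return weights

# Each pair: (features of the ride the rider preferred, features of the other).
# Hypothetical features: (ride smoothness, minutes of trip time saved).
preferences = [((0.9, 0.2), (0.4, 0.8)), ((0.8, 0.1), (0.3, 0.9))]
print(train_on_preferences(preferences, n_features=2))
```

What falls out is a learned trade-off, here between comfort and speed, that can then stand in as the reward function for tasks where no game score exists.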
Some of that capturing of human preferences will happen during product development, as companies train algorithms for their new products and services. In most cases, though, this initial training will prove insufficient. We will want our products and services to learn and adjust as we use them over time. In a world where every product is tethered to the Internet of Things, we will come to see all products as services, and we will expect those services to learn from their interactions with us.
We already expect this kind of learning, thanks to the personalization that the Internet has made possible. I expect Netflix to customize its recommendations, Google to tailor my search results, and Amazon to personalize my shopping experience.
In this sense, personalization acts as a stepping stone for building reward functions that will enable machine governance based on human preferences.
‘Ello Gov’na
As a society, we need to think deliberately about the governance of machines. While the word “governance” connotes things like governments and boards of directors, a “governor” is also a mechanical device: governors have long been used to constrain the speed of steam engines and other industrial machines. Now that machines do much more than whir and spin, our concerns extend beyond simply throttling their operating speeds. As machines become smarter, we will need to govern them much as we govern human systems.