Can We Control Superior Artificial Intelligence?
Developments in Artificial Intelligence may one day lead to the creation of superintelligence. The inherent difficulty of controlling a system with an intellect far superior to our own is important to address, says philosopher Nick Bostrom.
Professor Bostrom is founding Director of the Future of Humanity Institute at the University of Oxford. His work covers such subjects like the consequences of future technologies and the concept of existential risk: dangers that threaten the continued existe...
Developments in Artificial Intelligence may one day lead to the creation of superintelligence. The inherent difficulty of controlling a system with an intellect far superior to our own is important to address, says philosopher Nick Bostrom.
Professor Bostrom is founding Director of the Future of Humanity Institute at the University of Oxford. His work covers such subjects like the consequences of future technologies and the concept of existential risk: dangers that threaten the continued existence of humanity. He’s also known for raising the question about whether or not we live in a computer simulation.
In a paper Bostrom defines superintelligence as ‘any intellect that vastly outperforms the best human brains in practically every field, including scientific creativity, general wisdom, and social skills’. If such an AI system is given access to actuators like robots it would be an immensely powerful agent. And even when confined to a single machine restricted to information exchange only this vast intellectual resource would have a huge impact on the way humanity develops. Amongst other things superintelligence would rapidly accelerate scientific and technological progress and, as a probable result, create even more intelligent AI systems.
There’s a safety risk in creating a machine more intelligent than any human or even all humans combined since there is no guarantee we can control the outcome of its actions, Bostrom said in a speech given at the Artificial General Intelligence conference held in Oxford last December.
In terms of risk assessment the safest assumption is that if something can be done, the AI can do it. That is to say, if the laws of physics or any other restrictions hard coded into the universe don’t prevent it, superintelligence will eventually reach the point where it can make it happen. That’s a boon if the system has the best interest of humanity in mind. If, however, its overarching goal would be to turn all available resources into paperclips, Bostrom offers, we’re facing an existential risk.
In his speech Bostrom explores two different approaches to controlling artificial superintelligence: capability control and motivational control. But he immediately adds that every control mechanisms has its weakness.
Capability control can take different forms. There is physical containment. Locking the AI up in a box with no access to actuators or the Internet. But there is always the risk it will hack its way out of the box. Also people would have to communicate with the AI otherwise it would be useless. And people aren’t infallible systems, they can be compromised.
Bostrom suggests designing tripwires into the system. Specific occurrences (an unhealthy interest in paperclips for instance) will cause the system to shut down. However, this would require anticipating all possible suspect behaviors in the design phase. Secondly, the superintelligence may become aware of the tripwires and disable them.
Another way to control the AI is with incentives. The problem with doling out reward signals for good behavior is that the AI might circumvent human control over the virtual cookie jar. In a worst case scenario it would eliminate humans to get direct access to the reward button. Junkie AI.
Another approach is motivational control: engineering human friendly motivations into system in the design phase. One way of going about that is direct specification, programming good behavior directly into the AI. But human values are ‘fragile and complex’, says Bostrom, you can’t code those easily in C++.
A more feasible method is indirect specification, designing a system that learns over time just like humans acquire values as they mature. The problem here is that scientists don’t really know how this works in humans. Bostrom thinks this approach is one of the more promising avenues of controlling a superintelligence and suggests to research this subject more thoroughly.
Bostrom concludes that some contol mechanisms are less promising while other worthy of research. But no method has come anywhere close to providing a guaranteed solution to the superintelligence control problem.
Photo: Traderightuk
Professor Bostrom is founding Director of the Future of Humanity Institute at the University of Oxford. His work covers such subjects like the consequences of future technologies and the concept of existential risk: dangers that threaten the continued existence of humanity. He’s also known for raising the question about whether or not we live in a computer simulation.
In a paper Bostrom defines superintelligence as ‘any intellect that vastly outperforms the best human brains in practically every field, including scientific creativity, general wisdom, and social skills’. If such an AI system is given access to actuators like robots it would be an immensely powerful agent. And even when confined to a single machine restricted to information exchange only this vast intellectual resource would have a huge impact on the way humanity develops. Amongst other things superintelligence would rapidly accelerate scientific and technological progress and, as a probable result, create even more intelligent AI systems.
There’s a safety risk in creating a machine more intelligent than any human or even all humans combined since there is no guarantee we can control the outcome of its actions, Bostrom said in a speech given at the Artificial General Intelligence conference held in Oxford last December.
In terms of risk assessment the safest assumption is that if something can be done, the AI can do it. That is to say, if the laws of physics or any other restrictions hard coded into the universe don’t prevent it, superintelligence will eventually reach the point where it can make it happen. That’s a boon if the system has the best interest of humanity in mind. If, however, its overarching goal would be to turn all available resources into paperclips, Bostrom offers, we’re facing an existential risk.
In his speech Bostrom explores two different approaches to controlling artificial superintelligence: capability control and motivational control. But he immediately adds that every control mechanisms has its weakness.
Capability control can take different forms. There is physical containment. Locking the AI up in a box with no access to actuators or the Internet. But there is always the risk it will hack its way out of the box. Also people would have to communicate with the AI otherwise it would be useless. And people aren’t infallible systems, they can be compromised.
Bostrom suggests designing tripwires into the system. Specific occurrences (an unhealthy interest in paperclips for instance) will cause the system to shut down. However, this would require anticipating all possible suspect behaviors in the design phase. Secondly, the superintelligence may become aware of the tripwires and disable them.
Another way to control the AI is with incentives. The problem with doling out reward signals for good behavior is that the AI might circumvent human control over the virtual cookie jar. In a worst case scenario it would eliminate humans to get direct access to the reward button. Junkie AI.
Another approach is motivational control: engineering human friendly motivations into system in the design phase. One way of going about that is direct specification, programming good behavior directly into the AI. But human values are ‘fragile and complex’, says Bostrom, you can’t code those easily in C++.
A more feasible method is indirect specification, designing a system that learns over time just like humans acquire values as they mature. The problem here is that scientists don’t really know how this works in humans. Bostrom thinks this approach is one of the more promising avenues of controlling a superintelligence and suggests to research this subject more thoroughly.
Bostrom concludes that some contol mechanisms are less promising while other worthy of research. But no method has come anywhere close to providing a guaranteed solution to the superintelligence control problem.
Photo: Traderightuk