How Can We Restrict the Artificial Intelligence of the Future?

As artificial intelligence becomes increasingly involved in our lives, placing certain limits on it is becoming critically important. To some extent, it would not be an exaggeration to describe this as a matter of life and death.
Could Artificial Intelligence Go Out of Control?
Preventing potential risks while using new technologies has always been part of humanity’s strategy for taming the “beast” of technology. That is why we have highly sophisticated electrical protection systems, detailed traffic rules, countless road safety devices, and even an enormous internet security industry.
After all, the danger of electric shock does not lead us to shut down electricity across an entire city. Instead, we confine it within layers of safeguards so that the technology can serve humanity securely. Today, a similar situation is emerging with AI. Much as early humans panicked when they first encountered fire, more than a century of science fiction has conditioned people to think first of robots ruling the Earth when confronted with AI. In my view, this possibility is like an asteroid hitting the planet: it is hypothetical, it could happen, but no one knows when.
However, as AI develops rapidly and is applied ever more widely, the dangers and uncertainties of this new technology are gradually becoming visible. Where are AI’s “insulating tape” and “circuit breaker”? Not long ago, DeepMind revealed in a blog post that AI models can fall into confusion and lose control, and that it is preparing an “AI insurance mechanism” that could completely shut a system down in an emergency. In other words, once a malicious tendency is detected in an AI, the system would proactively terminate all of its activity.
Of course, research in this area is still largely exploratory, and it raises a series of questions worth thinking about. If there were something like an AI fuse switch, under what circumstances should it stop AI from operating? Are there other ways to ensure AI safety?
What AI Risks Do We Need to Guard Against?
Fire may be one of the most destructive technologies in human history, but at least no one seriously blames “the evil of fire” or “Prometheus’s original sin.” AI, however, is somewhat different. The complexity of deep neural networks means that AI’s operating logic can, in some cases, be unexplainable or unpredictable. This is the widely discussed AI black-box problem. AI often feels mysterious and frightening, so what are the actual dangers of AI in real-world applications?
Bias
Artificial intelligence has already been shown to learn rudeness and racism. We have seen previous cases in which AI reflected bias related to jobs and race. For example, in March 2016, Microsoft launched a chatbot named Tay. Less than a day after its release, Tay had transformed from a cute 19-year-old “girl” persona into an “AI madman” spewing offensive language and racist remarks, prompting Microsoft to urgently take the product offline.
The root cause of this phenomenon is that AI learns by absorbing conversational data from social networks. But the data itself contains biased and discriminatory language, so the AI learns harmful things and incorporates them into its behavioral patterns. How can we make AI learn only what we consider good? At present, there does not seem to be a good answer.
Fraud and Destruction
People can not only teach bad things to AI; they can also use AI directly to do harm, and this is not uncommon. As early as 2015, cases were identified in the UK of AI models being used to imitate users’ tone of voice in email and telecommunications fraud. In addition, many hackers have demonstrated AI’s ability to steal passwords and break through security systems. In many countries, including China, criminals have begun using AI technology to fabricate e-commerce accounts and transaction orders in order to deceive investors into continuing to put in money.
Beyond Human Cognition
As a computer algorithm, AI’s cognition is not grounded in human common sense, yet both ordinary people and researchers often overlook this. A famous case involved DeepMind training an AI on a boat-racing game and discovering that the deep learning model eventually settled on strategies that no normal human player would be likely to choose.
This should be taken seriously by everyone. In a self-driving scenario, for example, if AI begins to think in ways that do not follow human traffic rules, it might decide to drive off an overpass directly to the ground or travel against traffic because it sees those options as more efficient.
This is not alarmism. Current research has found that even slight damage to road signs can greatly interfere with computer vision. After all, the way a machine interprets an incorrect sign is not the same way a human “thinks” about it.
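To make the fragility concrete, here is a minimal sketch of the fast gradient sign method (FGSM), a standard textbook way to show how a tiny, nearly invisible perturbation can flip an image classifier’s prediction. The pretrained ResNet, the input size, and the epsilon value are assumptions chosen for illustration, not the setup used in the road-sign studies.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Minimal FGSM sketch: perturb an image slightly so a classifier changes its mind.
# The pretrained ResNet and the epsilon value here are illustrative assumptions.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

def fgsm_attack(image, label, epsilon=0.01):
    """Return an adversarially perturbed copy of `image` (a normalized tensor)."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step in the direction that increases the loss, bounded by epsilon per pixel.
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.detach()

# Usage sketch: x is a normalized (1, 3, 224, 224) tensor, y = torch.tensor([true_class]).
# x_adv = fgsm_attack(x, y)
# print(model(x).argmax(1), model(x_adv).argmax(1))  # often differ despite the tiny change
```

The same idea scales up to physical perturbations such as stickers on a road sign, which is why small changes that a human driver would barely notice can still mislead a vision system.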
What Can We Do to Restrict AI?
Losing control over AI may differ from the risks posed by any other technology in human history. Because AI absorbs vast amounts of data and transforms it internally in complex ways, the difficulty we face is that there is no simple safety rule for AI the way there is for electricity; its risks are hidden and elusive. So how can we impose limits on AI?
There are several ideas circulating in the industry. It should be noted that this is not a discussion that leads to one single conclusion. In my view, actually placing limits on AI will require a comprehensive solution in which different approaches work together.
The Executioner Theory
This idea can be traced back to DeepMind, mentioned earlier. The AI safety technology they are developing can be understood as placing an “AI executioner” behind complex AI tasks, always ready to act. The principle is to build another AI system with strong capabilities and its own safety logic, which uses reinforcement learning mechanisms to monitor other AI models at all times. Once it detects a risk in another AI, the executioner immediately terminates that AI’s activity.
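As a rough illustration of this architecture, the sketch below wraps a running task model with a second risk-scoring monitor and halts the task when the score crosses a threshold. The Agent and Monitor classes and the 0.9 threshold are hypothetical placeholders invented for illustration; DeepMind has not published this as a concrete interface.

```python
from dataclasses import dataclass

# Hypothetical "executioner" pattern: a second model scores each proposed action
# for risk and terminates the task if the score exceeds a threshold.

@dataclass
class Decision:
    action: str
    context: dict

class Monitor:
    """Stand-in for a trained safety model that estimates how risky a decision is."""
    def risk_score(self, decision: Decision) -> float:
        return 0.0  # a real monitor would run its own model here

class Agent:
    """Stand-in for the task model being supervised."""
    def propose(self, observation) -> Decision:
        return Decision(action="noop", context={"obs": observation})

def supervised_step(agent: Agent, monitor: Monitor, observation, threshold: float = 0.9):
    """Run one step of the agent, but let the monitor veto and terminate it."""
    decision = agent.propose(observation)
    if monitor.risk_score(decision) >= threshold:
        # The "executioner" path: refuse to act and shut the agent down entirely.
        raise SystemExit("risk threshold exceeded; agent terminated")
    return decision  # considered safe to execute
```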
In fact, the concept of “interruptibility” has long been central to DeepMind’s work on AI safety. In December 2017, they published a research report titled “Safe Interruptible Agents,” showing how to ensure that an intelligent agent’s performance would not be affected when it is restarted after being interrupted.
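To give some intuition for what “not affected by interruption” means, the toy sketch below uses tabular Q-learning, which is off-policy: even when an overseer overrides the chosen action, the task values are updated from what actually happened, so the final greedy policy still solves the task. The five-state chain environment, the 20% interruption rate, and the hyperparameters are all invented for illustration; this is an intuition aid, not a reproduction of the DeepMind result.

```python
import random
import numpy as np

# Toy sketch of safe interruptibility: Q-learning is off-policy, so overriding the
# agent's action (an "interruption") does not bias what it learns about the task.
N_STATES, N_ACTIONS = 5, 2     # actions: 0 = left, 1 = right; reward sits at the right end
Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, epsilon = 0.1, 0.95, 0.1

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def interrupted(state):
    """Stand-in for a human/overseer override signal."""
    return random.random() < 0.2

state = 0
for _ in range(10_000):
    action = random.randrange(N_ACTIONS) if random.random() < epsilon else int(Q[state].argmax())
    if interrupted(state):
        action = 0  # the overseer forces a "safe" action instead of the agent's choice
    next_state, reward = step(state, action)
    # Off-policy target uses the max over next actions, so the forced action does not
    # corrupt the learned value of the task itself.
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = 0 if next_state == N_STATES - 1 else next_state

print(Q.argmax(axis=1))  # the learned greedy policy still heads toward the reward
```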
Technically, using AI to monitor AI is feasible, but it also introduces new problems. For example, with increasingly complex deep neural networks, models for tracing the source of problems may consume an unbearable amount of labor and cost. And then there is the obvious question: who monitors the “AI executioner”?
The Prosecutor Theory
Whether it is discrimination or behavior that goes beyond human cognition, much of it can be attributed to the black-box nature of deep learning. So is there a way to see through the black box and help human developers identify exactly where an AI went wrong, so that it can be corrected rather than simply interrupted?
I believe that making the black box safe and controllable is one of the main directions of AI safety research. At present, there are two major ways of interpreting the black box.
One is to use AI to inspect and trace AI. For example, attention mechanisms can be used to design neural network models specifically to replicate and track the trajectories of other AI models, thereby identifying the source of bad training outcomes and helping developers make corrections. The other is to use tools that make the structure of deep learning models visible—in other words, making the black box transparent. In this way, when AI fails, R&D personnel can relatively easily inspect the training process of each layer and locate the problem.
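As a small, concrete example of the second approach, the sketch below computes a plain gradient saliency map, one of the simplest ways to show which input pixels most influenced a vision model’s prediction. The pretrained ResNet and the random input stand in for a real model and image and are assumptions for illustration, not any particular vendor’s debugging tool.

```python
import torch
import torchvision.models as models

# Gradient saliency sketch: which input pixels most influenced the prediction?
# The pretrained ResNet and the random input below are illustrative assumptions.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

def saliency_map(image):
    """Return per-pixel influence of `image` (1, 3, H, W) on the top predicted class."""
    image = image.clone().detach().requires_grad_(True)
    scores = model(image)
    scores[0, scores.argmax()].backward()
    # Max absolute gradient across color channels gives a 2-D saliency map.
    return image.grad.abs().max(dim=1).values[0]

# Usage sketch with a random tensor in place of a real photo.
x = torch.rand(1, 3, 224, 224)
heatmap = saliency_map(x)
print(heatmap.shape)  # (224, 224): brighter cells mark more influential pixels
```

Real interpretability pipelines layer far more on top of this, such as attention tracing and layer-by-layer visualization, but even this crude map lets a developer check whether the model is attending to the object itself or to spurious background features.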
However, whether it is an AI prosecutor or a human prosecutor, today’s black-box interpretation technologies usually handle only relatively simple deep learning models. At the same time, these solutions generally require large numbers of people to participate, and, more problematically, those people must be highly skilled technically.
The Ethicist
In many ways, preventing AI from doing harm is not merely a technical problem. For example, whether training data is biased largely depends on whether the data providers themselves are biased. Likewise, many AI discrimination problems are caused by developers’ desire to improve business efficiency, which is also a moral issue. In addition, whether it is possible to restrain the urge to develop AI weapons and AI surveillance tools is a matter of social and international responsibility.
To prevent these problems from spreading, AI should not be constrained only from a technical perspective; broad social mechanisms should also be introduced. Earlier this year, 14 organizations and universities—including OpenAI, Oxford University, and Cambridge University—published a research report titled “The Malicious Use of Artificial Intelligence.” The report points out that we should recognize that today’s AI research achievements are dual-use, like a double-edged sword, and that in order to control the risks brought by AI, policymakers should work closely with technologists to investigate, prevent, and mitigate possible malicious uses of AI. In the field of AI, norms and ethical frameworks should be prioritized.
It is clear that restricting AI through technology, law, morality, and research practice has become an international consensus. But this is easy to say and may prove very difficult to do.
No matter which approach is used to restrict AI, we ultimately have to confront a philosophical question: human nature itself is contradictory, yet we must use unified principles to govern artificial intelligence that imitates humanity. Who will support plans to restrict AI? As AI increasingly depends on training data from human society, human value judgments will also be passed into AI, including some of the moral barriers and distortions that exist within society.
To prevent AI from doing harm, we must first define the boundary between good and evil. Is that boundary absolutely correct? Can those responsible for defining it truly meet the requirement of not doing evil? We all know that Google’s image-recognition system once labeled photos of Black people as gorillas, which is clearly a form of discrimination.
In addition, are rules for restricting AI consistent across nations? Today, more and more AI companies, international industrial organizations, and even governments are beginning to focus on AI ethics and to formulate internationally unified ethical standards for AI. But would unified AI regulations violate the customs of some countries? Would they hinder AI research in certain regions? For example, is the European Union’s privacy-protection policy for AI research really suitable for the entire world?
These ethical issues surrounding AI are almost inherently contradictory. Even in the longer-term future, will human judgment truly be better than AI’s? At certain moments, when we use technology to interrupt unpredictable machine-learning behavior, does that actually reveal human weakness or ignorance? Or does it stop new possibilities in which technology creates more technology?
Problems and solutions always move forward in alternation.
Suppose humanity creates a system to restrict AI. Should it be used to stop AI at the moment it awakens autonomous thought? But what moment would that be? And would the system making that judgment itself satisfy the conditions of the very restrictions we seek to impose?


