Why care about AI Safety?
Will AI really cause a catastrophe? Hopefully not! AI has tremendous potential for making the world a better place, especially as the technology continues to develop. We’re already seeing some beneficial applications of AI to healthcare, accessibility, language translation, automotive safety, and art creation, to name just a few. However, advanced AI also poses some serious risks.
Already with the current state of the technology, AI can be (and indeed is) used by malicious actors to cause harm: spreading fake news, or helping oppressive regimes surveil and control their citizens (think of real-time facial recognition or China's social credit system). AI systems may also inadvertently cause harm because they are trained on vast amounts of data gathered from the web. Although this approach enables the systems to learn about a wide range of topics, it also exposes them to, and leads them to learn from, harmful content such as hate speech, misinformation, and biases against various groups.
Given the fast pace of technical innovation and of applications across different domains of society, it is easy to imagine more harmful consequences (intended or unintended) in the near term: just think of terrorists using AI to design biochemical weapons, governments deploying autonomous weapons that harm civilians, or scammers faking a loved one's voice in an attempt to get money from you. Furthermore, the spread of AI systems could contribute to nuclear instability by sabotaging early-warning mechanisms, or lay the groundwork for power transitions that aggravate geopolitical tensions and increase the chance of war. On a socio-economic level, AI's rapid growth can exacerbate inequality and cause labor displacement, leaving many without the means to sustain themselves. With regard to our epistemic security, the use of AI to disseminate misinformation and drive mass manipulation campaigns may cripple people’s ability to separate truth from falsehood. To make matters worse, the intense competition in AI development raises valid concerns about a high-stakes race in which AI labs might be tempted to prioritize speed over safety, possibly leaving society exposed to inadequately vetted technologies with unpredictable consequences.
More speculatively, some experts believe that advanced AI systems could seek power or control over humans. It’s possible that future AI systems will be qualitatively different from those we see today - they may be able to form sophisticated plans to achieve their goals, and understand the world well enough to strategically evaluate the relevant obstacles and opportunities. Furthermore, if deployed autonomously (i.e. without humans directly controlling which tasks they perform), an advanced AI system may attempt to acquire resources or resist shutdown attempts, since these are useful strategies for many goals its designers might specify. Stuart Russell, professor of computer science at the University of California, Berkeley, and author of “the most popular artificial intelligence textbook in the world”, offers one hypothetical example: an AI tasked with combating the acidification of the oceans. To do this, the machine develops a catalyst that enables a rapid chemical reaction between ocean and atmosphere, restoring the oceans’ pH levels. In the process, however, it also depletes most of the atmosphere’s oxygen, leaving humans and animals to die of asphyxiation. This particular scenario may never come to pass, but to see why such catastrophic failures could be hard to prevent more generally, see DeepMind's research on specification gaming and goal misgeneralization.
It’s also worth reflecting on the possibility that an advanced AI system of this kind could outmaneuver humanity’s best efforts to stop it. Meta’s Cicero model reached human-level performance in Diplomacy (a strategic board game), demonstrating that AI systems can successfully negotiate with humans; this suggests that an advanced AI system could manipulate humans into assisting or trusting it. In addition, AI systems such as GPT-3.5 Turbo are swiftly becoming proficient at writing computer code. Combined with models like ACT-1, which can take actions on the internet, it seems that advanced AI systems could become formidable computer hackers. Hacking creates a variety of opportunities: an AI system might, for example, steal financial resources to purchase more computational power, enabling it to train for longer or even deploy copies of itself. These considerations and more have led some of the most cited AI researchers of all time, such as Yoshua Bengio and Geoffrey Hinton, to state that it is, at the very least, “not inconceivable” that AI ends up “wiping out humanity”.
It’s important to balance our collective effort to address the severe societal harms already unfolding from current AI (many of which could be risk factors in ultimately catastrophic outcomes) against the need to prepare for potential extinction-level harms from upcoming advanced AI systems. While many highly reputable experts are concerned primarily about extinction-level risks from advanced AI systems, others argue that the focus has shifted too far from the damage that is already being done or is close on the horizon. At AISIG, we aim to cover all such efforts to ensure AI will be beneficial for humanity, now and in the future.
As we stand on the cusp of a new technological chapter, it's imperative that we recognize the awe-inspiring opportunities that AI could open up for us. At AISIG, we strongly believe in the transformative power of AI - from conquering diseases and revolutionizing education, to perhaps even unraveling the mysteries of the universe. Imagine an AI-assisted world where the blind can ‘see’, languages are no barrier, and creativity reaches unimaginable heights. We are passionate about AI and truly excited about the prospects it holds. However, let us not forget that ‘with great power comes great responsibility’. The immense capabilities and impact of AI mean that, if not handled with extreme care, the risks could be catastrophic. We do not oppose AI; rather, we aspire to guide its advancement responsibly. Our call for AI safety embodies a commitment to ensuring that we can harness AI’s potential in a manner that benefits all of humanity. Through diligence and careful consideration of risks, let us pave the way for AI to be one of the most remarkable and positive forces in our shared history.
Participate in our courses
We facilitate the Center for AI Safety course “AI Safety, Ethics, and Society” in two distinct cohorts, each tailored to a different approach to AI Safety: Technical and Governance. We host both courses on-site in Groningen. This updated course has a broader scope than the previously facilitated AGISF, addressing not only control issues and misalignment but also risks like malicious use, accidents, and societal dependence (enfeeblement).
Splitting the course into two cohorts allows us to better tailor it to our diverse audience, split mainly between students with a more technical background and students with a legal one. The main difference between the Technical and Governance cohorts is the depth to which certain topics are covered. For example, the Technical cohort will focus more on the technical aspects of AI Safety, with extra sessions on related topics (e.g. mechanistic interpretability, adversarial attacks), whereas the Governance cohort will spend more time on the governance aspects of AI Safety, going into more detail on case studies. Both cohorts have the same workload.
The workload consists of 2-4 hours of weekly readings and 2 hours of weekly learning discussions/interactive tutorials led by experienced facilitators with knowledge of the field.
Please note that these are not courses from the University of Groningen.
Introductory resources
Preventing an AI-related catastrophe (Benjamin Hilton) + audio version
AI experts are increasingly afraid of what they’re creating (Kelsey Piper)
Why I Think More NLP Researchers Should Engage with AI Safety Concerns (Samuel Bowman)
Why Would AI "Aim" To Defeat Humanity? (Holden Karnofsky) + audio version
The alignment problem from a deep learning perspective (Richard Ngo, Lawrence Chan, Sören Mindermann)
Benefits & Risks of Artificial Intelligence (Ariel Conn)
Alternatively, here are some related podcasts: