Connecting graduate students in artificial intelligence

Third AI Safety Discussion

KEC 2057

The Fall 2022 AIGSA reading group follows the AGI (Artificial General Intelligence) Safety Fundamentals curriculum. Join us for the third meeting, even if you missed earlier meetings! Lunch will be provided.

This week’s topic is Threat models and types of solutions. Here’s an introduction from Richard Ngo, the developer of the curriculum:

How might misaligned AGIs cause existential catastrophes, and how might we stop them? Christiano (2019) outlines two threat models: the first focuses on outer misalignment, the second on inner misalignment. Muehlhauser and Salamon (2012) outline a core intuition for why we might be unable to prevent these risks: that progress in AI will at some point speed up dramatically. Ngo (2020) evaluates the implications of these intuitions for our ability to control misaligned AGIs.

How might we prevent these scenarios? Christiano (2020) gives a broad overview of the landscape of different contributions to making AIs aligned, with a particular focus on some of the techniques we’ll be covering in later weeks.

To prepare, please spend ~1 hour reading these short pieces:

  1. What failure looks like (Christiano, 2019) (10 mins)
  2. Intelligence explosion: evidence and import (Muehlhauser and Salamon, 2012) (only pages 10-15) (15 mins)
  3. AGI safety from first principles (Ngo, 2020) (only section 5: Control) (15 mins)
  4. AI alignment landscape (Christiano, 2020) (35 mins)