Predictability and Surprise in Large Generative Models

Citation

Authors: Deep Ganguli et al.
Year: 2022
Venue:
URL:

Abstract

This paper analyzes the paradox that large generative models are highly predictable in aggregate (via scaling laws) yet unpredictable in their specific capabilities and outputs, with implications for AI policy.

Summary

The paper reconciles the apparent contradiction between predictable aggregate scaling behavior and the unpredictable emergence of specific capabilities, and offers recommendations for policy and deployment.

Key Contributions

  1. Framework distinguishing smooth general vs. abrupt specific scaling
  2. Analysis of predictability-unpredictability paradox
  3. Policy recommendations for AI deployment
  4. Economic value analysis of language models

Core Concepts & Definitions

Smooth General Capability Scaling

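Test loss falls as a power law in compute, data, and parameters, each with its own positive fitted exponent:

  • $L(C) \propto C^{-\alpha_C}$ (compute)
  • $L(D) \propto D^{-\alpha_D}$ (data)
  • $L(N) \propto N^{-\alpha_N}$ (parameters)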
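
As a minimal sketch of what a power-law relationship between loss and a scale variable looks like in practice, the snippet below fits loss = a * x^(-alpha) by ordinary least squares in log-log space and extrapolates one order of magnitude past the fitted range. The data points are synthetic, generated from assumed values of a and alpha purely for illustration; they are not measurements from the paper, and the function name `fit_power_law` is hypothetical.

```python
import math

def fit_power_law(xs, losses):
    """Fit loss = a * x**(-alpha) by least squares on (log x, log loss)."""
    log_x = [math.log(x) for x in xs]
    log_l = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_l = sum(log_l) / n
    # The slope of the regression line in log-log space equals -alpha.
    cov = sum((u - mean_x) * (v - mean_l) for u, v in zip(log_x, log_l))
    var = sum((u - mean_x) ** 2 for u in log_x)
    slope = cov / var
    intercept = mean_l - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic example: an exact power law with a = 10 and alpha = 0.05
# (assumed values, not taken from the paper).
compute = [1e18, 1e19, 1e20, 1e21]
loss = [10.0 * c ** -0.05 for c in compute]

a, alpha = fit_power_law(compute, loss)
# Extrapolate one order of magnitude beyond the fitted range.
predicted = a * (1e22) ** -alpha
print(f"alpha={alpha:.3f}, predicted loss at 1e22: {predicted:.3f}")
```

A fit of this kind predicts the aggregate loss at a larger scale; it says nothing about which individual tasks a model at that scale will be able to do.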

Abrupt Specific Capability Scaling

Specific capabilities can suddenly emerge at particular scales:

  • GPT-3 3-digit addition: <1% (N<6B) → 80% (N=175B)
  • Gopher MMLU: ~30% (N<6B) → 60% (N=280B)
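
A capability such as 3-digit addition can only be measured by probing for it directly at each model size. The following is a minimal sketch of such a probe, assuming a hypothetical `model_fn` that maps a prompt string to a completion string. The prompt format, the function names, and the two stand-in models are illustrative assumptions, not the evaluation setup used for GPT-3.

```python
import random

def addition_accuracy(model_fn, n_problems=200, seed=0):
    """Fraction of random 3-digit addition prompts answered exactly."""
    rng = random.Random(seed)  # fixed seed so runs are comparable
    correct = 0
    for _ in range(n_problems):
        a, b = rng.randint(100, 999), rng.randint(100, 999)
        prompt = f"Q: What is {a} plus {b}?\nA:"
        answer = model_fn(prompt).strip()
        correct += answer == str(a + b)
    return correct / n_problems

def oracle_model(prompt):
    """Stand-in that always answers correctly (not a real language model)."""
    words = prompt.replace("?", " ").split()
    a, b = int(words[3]), int(words[5])
    return f" {a + b}"

def constant_model(prompt):
    """Stand-in that ignores the prompt entirely."""
    return " 0"

print(addition_accuracy(oracle_model))
print(addition_accuracy(constant_model))
```

Running the same probe across a family of model sizes is what would reveal a jump like the one reported above, where accuracy is near zero below some scale and high above it.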

Open-Endedness

Models can produce outputs for essentially any input, making comprehensive testing impossible.
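
A rough count makes the scale of the problem concrete. The sketch below computes the order of magnitude of the number of distinct token sequences of a given length. The vocabulary size of 50,000 and the prompt length of 100 tokens are illustrative assumptions, not figures from the paper.

```python
import math

def log10_num_prompts(vocab_size, length):
    """log10 of the number of distinct sequences of exactly `length` tokens."""
    return length * math.log10(vocab_size)

# Assumed values, for illustration only.
VOCAB_SIZE = 50_000
PROMPT_LENGTH = 100

exponent = log10_num_prompts(VOCAB_SIZE, PROMPT_LENGTH)
print(f"about 10^{exponent:.0f} possible prompts of {PROMPT_LENGTH} tokens")
```

Even at a billion evaluations per second, a space of this size cannot be enumerated, so any test suite only ever samples it.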

Distinguishing Features

  1. Smooth general capability scaling
  2. Abrupt specific capability scaling
  3. Unknown specific capabilities until tested
  4. Open-ended outputs

Main Results

  1. Scaling laws enable prediction of general but not specific capabilities
  2. Analogy: daily weather (specific, volatile) vs. seasonal averages (general, predictable)
  3. Language models become increasingly capable of acting as recommendation systems as they scale
  4. Recommendations: continuous monitoring, staged deployment, capability discovery protocols

Relevance to Project

Medium — Contextual/policy framing:

  • Explains why our skill framework matters (unpredictable specific capabilities)
  • Smooth vs. abrupt distinction relates to our task-relative ontology
  • Motivates systematic capability-discovery (ACD)
  • Policy implications for skill assessment deployment

Questions & Notes

  • Can our algebraic framework help predict which specific capabilities will emerge?
  • Does their smooth/abrupt distinction map to our mereological/algebraic structures?
  • Their “open-endedness” relates to our generative ontology concept