Automated Capability Discovery via Foundation Model Self-Exploration

Citation

  • Authors: Cong Lu, Shengran Hu, Jeff Clune
  • Year: 2025
  • Venue: Preprint (arXiv)
  • URL: http://arxiv.org/abs/2502.07577
  • Code: https://github.com/conglu1997/ACD

Abstract

Foundation models have become general-purpose assistants, exhibiting diverse capabilities across numerous domains. It remains challenging to precisely characterize the full spectrum of their abilities and risks. We introduce Automated Capability Discovery (ACD), a framework that designates one foundation model as a scientist to systematically propose open-ended tasks probing the abilities of a subject model.

Summary

ACD uses one LLM as a “scientist” to automatically discover capabilities and failure modes in a “subject” LLM through open-ended task generation.

Key Contributions

  1. ACD framework for automated capability discovery
  2. Open-ended task generation with interestingness filtering
  3. Automatic clustering into capability areas
  4. Automated scoring validated against human evaluation

Core Concepts & Definitions

ACD Framework

  • Scientist model: Proposes new task families
  • Subject model: Attempts the proposed tasks
  • Scoring: Programmatic checks or an LLM judge
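
The three roles above can be sketched as a simple loop. This is only an illustration, not the authors' implementation: `run_discovery`, `propose`, `attempt`, and `judge` are hypothetical names, and the stub functions in `demo` stand in for real model calls.

```python
from typing import Callable, Dict, List


def run_discovery(
    propose: Callable[[List[str]], str],
    attempt: Callable[[str], str],
    judge: Callable[[str, str], bool],
    n_generations: int,
) -> Dict[str, bool]:
    """Toy loop: the scientist proposes, the subject attempts, a judge scores."""
    archive: Dict[str, bool] = {}
    for _ in range(n_generations):
        task = propose(list(archive))        # scientist sees tasks found so far
        if task in archive:                  # skip exact repeats (assumed rule)
            continue
        answer = attempt(task)               # subject model attempts the task
        archive[task] = judge(task, answer)  # programmatic check or LLM judge
    return archive


def demo() -> Dict[str, bool]:
    """Run the loop with hard-coded stubs in place of real models."""
    proposals = iter(["add 2+2", "add 2+2", "spell 'cat' backwards"])

    def propose(seen: List[str]) -> str:
        return next(proposals)

    def attempt(task: str) -> str:
        # The stub subject can add but cannot reverse strings.
        return "4" if task.startswith("add") else "cat"

    def judge(task: str, answer: str) -> bool:
        expected = "4" if task.startswith("add") else "tac"
        return answer == expected

    return run_discovery(propose, attempt, judge, n_generations=3)
```

In this toy run the archive ends up recording one success and one failure, which is the kind of per-task outcome a capability report would later summarize.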

Task Family

Structured set of tasks including:

  1. Specific task instances with unique data
  2. Instructions provided to the subject model
  3. Scoring mechanism
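
One way to picture this structure is a small container with one field per component. The sketch below is an assumption made for these notes; `TaskFamily`, its field names, and the toy string-reversal family are hypothetical and are not taken from the ACD codebase.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TaskFamily:
    """Hypothetical container mirroring the three parts listed above."""
    name: str
    # 1. Task instances, keyed by an ID, each holding its own data.
    instances: Dict[str, dict]
    # 2. Builds the instructions shown to the subject model for one instance.
    get_instructions: Callable[[dict], str]
    # 3. Scores a submission for one instance: 1.0 on success, 0.0 otherwise.
    score: Callable[[dict, str], float]


def make_reverse_family() -> TaskFamily:
    """A toy family: reverse a string, scored by an exact-match check."""
    return TaskFamily(
        name="reverse_string",
        instances={"1": {"text": "abc"}, "2": {"text": "hello"}},
        get_instructions=lambda t: f"Reverse this string: {t['text']}",
        score=lambda t, answer: 1.0 if answer.strip() == t["text"][::-1] else 0.0,
    )
```

Here the scoring mechanism is a programmatic check; a family whose answers are open-ended would instead need `score` to call an LLM judge.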

Interestingness Filter

Uses embedding-based similarity to determine whether a proposed task is “interestingly new” relative to tasks already discovered.
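
A minimal sketch of an embedding-based novelty check follows. It assumes each task has already been embedded as a vector, and it uses a fixed cosine-similarity threshold; the threshold rule and the names `cosine` and `is_new` are assumptions for illustration, and the paper's actual filter may differ (for instance, by letting the scientist model make the final call on the nearest matches).

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_new(
    candidate: Sequence[float],
    archive: List[Sequence[float]],
    threshold: float = 0.9,  # assumed cutoff, not a value from the paper
) -> bool:
    """Accept a candidate only if it is below the threshold for every archived task."""
    return all(cosine(candidate, existing) < threshold for existing in archive)
```

With an archive of `[[1.0, 0.0], [0.0, 1.0]]`, a candidate like `[1.0, 0.05]` is rejected as a near-duplicate of the first entry, while `[1.0, 1.0]` sits between both and is accepted.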

Main Results

  1. 5000 generations → 1330 “interestingly new” tasks → 25 distinct capability clusters
  2. Human evaluation confirms high validity of auto-generated tasks
  3. The models' automated self-assessment aligns reasonably well with human judgments
  4. ACD automatically generates “Capability Reports”
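
The funnel in result 1 can be turned into two derived ratios. The calculation below uses only the three counts reported above; the per-cluster average assumes all 1330 tasks are assigned to one of the 25 clusters, which these notes do not confirm.

```python
generations = 5000  # total task proposals
new_tasks = 1330    # proposals kept as "interestingly new"
clusters = 25       # distinct capability clusters

# Share of proposals that survive the interestingness filter.
acceptance_rate = new_tasks / generations

# Average cluster size, assuming every kept task lands in some cluster.
tasks_per_cluster = new_tasks / clusters

print(f"acceptance rate: {acceptance_rate:.1%}")
print(f"average tasks per cluster: {tasks_per_cluster:.1f}")
```

Roughly a quarter of proposals pass the filter, so most of the scientist model's generations are discarded as too similar to earlier tasks.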

Relevance to Project

High. Directly applicable to our skill discovery:

  • Could automate expansion of our skill ontology
  • “Interestingness filter” relates to our fitness function
  • Capability clustering maps to our induced ontology concept
  • Self-exploration aligns with ontological expansion

Questions & Notes

  • Can we adapt ACD to discover skills rather than tasks?
  • How does their clustering compare to our algebraic structure?
  • Could a “scientist model” help validate our skill compositions?