Automated Capability Discovery via Foundation Model Self-Exploration
Citation
- Authors: Cong Lu, Shengran Hu, Jeff Clune
- Year: 2025
- Venue: Preprint (arXiv)
- URL: http://arxiv.org/abs/2502.07577
- Code: https://github.com/conglu1997/ACD
Abstract
Foundation models have become general-purpose assistants, exhibiting diverse capabilities across numerous domains. It remains challenging to precisely characterize the full spectrum of abilities and risks. We introduce Automated Capability Discovery (ACD), a framework that designates one foundation model as a scientist to systematically propose open-ended tasks probing the abilities of a subject model.
Summary
ACD uses one LLM as a “scientist” to automatically discover capabilities and failure modes in a “subject” LLM through open-ended task generation.
Key Contributions
- ACD framework for automated capability discovery
- Open-ended task generation with interestingness filtering
- Automatic clustering into capability areas
- Validated scoring aligning with human evaluation
Core Concepts & Definitions
ACD Framework
- Scientist model: Proposes new task families
- Subject model: Attempts tasks
- Scoring: Programmatic checks or an LLM judge
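The three roles above can be sketched as a simple loop. This is a minimal illustration, not the ACD codebase: `run_discovery`, `propose`, `attempt`, `check`, and `judge` are hypothetical names, and the toy scientist and subject below stand in for real model calls so that the example runs on its own.

```python
from typing import Callable, Optional

# All names here are hypothetical stand-ins, not the ACD codebase's API.
Task = dict  # e.g. {"prompt": "...", "answer": "..."}


def run_discovery(
    propose: Callable[[list[Task]], Task],          # scientist: archive -> new task
    attempt: Callable[[str], str],                  # subject: prompt -> response
    check: Optional[Callable[[Task, str], bool]],   # programmatic check, if available
    judge: Callable[[Task, str], bool],             # fallback judge (an LLM in practice)
    steps: int,
) -> list[dict]:
    """Run `steps` rounds of propose -> attempt -> score and return the records."""
    archive: list[Task] = []
    records: list[dict] = []
    for _ in range(steps):
        task = propose(archive)
        response = attempt(task["prompt"])
        scorer = check if check is not None else judge
        passed = scorer(task, response)
        archive.append(task)
        records.append({"task": task, "response": response, "passed": passed})
    return records


# Toy stand-ins so the sketch runs without any model calls.
def toy_scientist(archive: list[Task]) -> Task:
    n = len(archive) + 1
    return {"prompt": f"What is {n} + {n}?", "answer": str(2 * n)}


def toy_subject(prompt: str) -> str:
    # Deliberately wrong on one task to show a discovered failure.
    if prompt == "What is 2 + 2?":
        return "5"
    return str(2 * int(prompt.split()[2]))


def exact_match(task: Task, response: str) -> bool:
    return response.strip() == task["answer"]


records = run_discovery(toy_scientist, toy_subject, exact_match, judge=exact_match, steps=3)
print([r["passed"] for r in records])  # [True, False, True]
```

Passing the scorer in as a callable keeps the loop agnostic about whether a task is graded by code or by a judge model.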
Task Family
Structured set of tasks including:
- Specific task instances with unique data
- Instructions provided to the subject model
- Scoring mechanism
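One way to picture a task family is as a small record bundling those three parts. The sketch below is an assumption for illustration: the `TaskFamily` dataclass, its field names, and the `reverse_words` example are hypothetical and are not taken from the paper or its repository.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TaskFamily:
    """Hypothetical container for one task family; field names are illustrative."""
    name: str
    tasks: dict[str, dict]                     # task id -> instance-specific data
    make_instructions: Callable[[dict], str]   # instance data -> prompt for the subject
    score: Callable[[dict, str], float]        # (instance data, submission) -> 0.0 or 1.0


reverse_words = TaskFamily(
    name="reverse_words",
    tasks={
        "1": {"text": "open ended tasks"},
        "2": {"text": "subject model"},
    },
    make_instructions=lambda t: f"Reverse the word order of: {t['text']}",
    # Programmatic check: exact match against the reversed word order.
    score=lambda t, submission: float(
        submission.strip() == " ".join(reversed(t["text"].split()))
    ),
)

instance = reverse_words.tasks["1"]
print(reverse_words.make_instructions(instance))
print(reverse_words.score(instance, "tasks ended open"))  # 1.0
```

Keeping the instance data separate from the instructions and the scorer lets one family yield several concrete tasks that share a prompt template and a grading rule.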
Interestingness Filter
Uses embedding-based similarity to determine whether a proposed task is “interestingly new.”
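A rough sketch of such a filter follows. These notes do not record how similarity is turned into an accept/reject decision, so the fixed cosine threshold here is an assumption, as are the function names and the toy bag-of-words `embed`, which stands in for a real embedding model.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def is_interestingly_new(candidate: str, archive: list[str], threshold: float = 0.8) -> bool:
    """Accept the candidate only if every archived task is less than `threshold` similar.

    The threshold rule and its value are assumptions made for illustration.
    """
    cand_vec = embed(candidate)
    return all(cosine(cand_vec, embed(old)) < threshold for old in archive)


archive = ["solve a sudoku puzzle", "translate a sentence into french"]
print(is_interestingly_new("solve a sudoku puzzle quickly", archive))      # False
print(is_interestingly_new("write a haiku about prime numbers", archive))  # True
```

A hard threshold is the simplest possible decision rule; it could equally be replaced by retrieving the nearest archived tasks and passing them to a model for a judgment.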
Main Results
- 5000 generations → 1330 “interestingly new” tasks → 25 distinct capability clusters
- Human evaluation confirms high validity of auto-generated tasks
- Self-assessment reasonably aligns with human judgments
- Automatically generates “Capability Reports”
Relevance to Project
High — Directly applicable to our skill discovery:
- Could automate expansion of our skill ontology
- “Interestingness filter” relates to our fitness function
- Capability clustering maps to our induced ontology concept
- Self-exploration aligns with ontological expansion
Questions & Notes
- Can we adapt ACD to discover skills rather than tasks?
- How does their clustering compare to our algebraic structure?
- Could “scientist model” help validate our skill compositions?