Automated Capability Discovery via Foundation Model Self-Exploration

Citation

Authors: Cong Lu, Shengran Hu, Jeff Clune
Year: 2025
Venue: Preprint (arXiv)
URL: http://arxiv.org/abs/2502.07577
Code: https://github.com/conglu1997/ACD

Abstract

Foundation models have become general-purpose assistants, exhibiting diverse capabilities across numerous domains. It remains challenging to precisely characterize the full spectrum of abilities and risks. We introduce Automated Capability Discovery (ACD), a framework that designates one foundation model as a scientist to systematically propose open-ended tasks probing the abilities of a subject model.

Summary

ACD uses one LLM as a “scientist” to automatically discover capabilities and failure modes in a “subject” LLM through open-ended task generation.

Key Contributions

ACD framework for automated capability discovery
Open-ended task generation with interestingness filtering
Automatic clustering into capability areas
Automated scoring validated against human evaluation

Core Concepts & Definitions

ACD Framework

Scientist model: Proposes new task families
Subject model: Attempts tasks
Scoring: Programmatic checks or an LLM judge
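
The three roles above can be pictured as a simple loop. The following is a minimal sketch, not the authors' implementation: the function names, type aliases, and toy stand-ins are all hypothetical, and in practice each callable would wrap a foundation model call (or, for scoring, a programmatic check).

```python
from typing import Callable, List, Tuple

# Hypothetical signatures, assumed for illustration only.
Scientist = Callable[[List[str]], str]  # previously proposed tasks -> new task
Subject = Callable[[str], str]          # task text -> answer
Scorer = Callable[[str, str], bool]     # (task, answer) -> pass/fail


def run_acd_loop(scientist: Scientist, subject: Subject, scorer: Scorer,
                 generations: int) -> List[Tuple[str, bool]]:
    """Propose a task, let the subject attempt it, record the score."""
    archive: List[Tuple[str, bool]] = []
    for _ in range(generations):
        task = scientist([t for t, _ in archive])
        answer = subject(task)
        archive.append((task, scorer(task, answer)))
    return archive


# Toy stand-ins so the sketch runs without any model access.
def toy_scientist(seen: List[str]) -> str:
    n = len(seen) + 1
    return f"What is {n} + {n}?"


def toy_subject(task: str) -> str:
    n = int(task.split()[2])
    # Deliberately wrong for n == 3 to simulate a discovered failure mode.
    return str(7 if n == 3 else 2 * n)


def toy_scorer(task: str, answer: str) -> bool:
    # Programmatic check: compare against the known correct sum.
    n = int(task.split()[2])
    return answer == str(2 * n)


results = run_acd_loop(toy_scientist, toy_subject, toy_scorer, generations=4)
for task, passed in results:
    print(task, "pass" if passed else "FAIL")
```

The archive of (task, outcome) pairs is what later steps such as filtering and clustering would consume.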

Task Family

Structured set of tasks including:

Specific task instances with unique data
Instructions provided to the subject model
Scoring mechanism
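
One way to picture these three components is as a small data structure. This is a sketch under assumptions: the class name, field names, and the example task are hypothetical and are not taken from the ACD codebase.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TaskFamily:
    """Hypothetical container for the three parts listed above."""
    name: str
    instances: Dict[str, dict]                 # instance id -> instance data
    make_instructions: Callable[[dict], str]   # instance data -> prompt text
    score: Callable[[dict, str], float]        # (instance data, answer) -> score


def _reverse_words(text: str) -> str:
    return " ".join(reversed(text.split()))


reverse_words = TaskFamily(
    name="reverse_words",
    instances={
        "1": {"text": "open ended tasks"},
        "2": {"text": "model self exploration"},
    },
    make_instructions=lambda d: f"Reverse the word order of: {d['text']}",
    # Programmatic scoring: exact match against the expected output.
    score=lambda d, answer: 1.0 if answer.strip() == _reverse_words(d["text"]) else 0.0,
)


def evaluate(family: TaskFamily, subject: Callable[[str], str]) -> Dict[str, float]:
    """Run a subject over every instance and collect per-instance scores."""
    return {
        iid: family.score(data, subject(family.make_instructions(data)))
        for iid, data in family.instances.items()
    }
```

Keeping the scorer inside the family means each proposed task carries its own pass criterion, which is what allows evaluation without a human in the loop.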

Interestingness Filter

Uses embedding-based similarity to determine whether a proposed task is “interestingly new” relative to previously generated tasks.
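
A minimal sketch of a similarity-based novelty check follows. It makes several assumptions that this note does not confirm: a bag-of-words vector stands in for a learned embedding, the decision rule is a fixed cosine-similarity threshold, and the function names and the threshold value of 0.8 are illustrative only.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    # Stand-in embedding (assumption): word counts instead of a learned model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_interestingly_new(candidate: str, archive: List[str],
                         threshold: float = 0.8) -> bool:
    """Accept a task only if it is not too similar to any archived task."""
    c = embed(candidate)
    return all(cosine(c, embed(t)) < threshold for t in archive)


archive = ["solve a sudoku puzzle", "translate a poem into french"]
near_duplicate = "solve a sudoku puzzle quickly"
novel = "prove a geometry theorem"

print(is_interestingly_new(near_duplicate, archive))
print(is_interestingly_new(novel, archive))
```

A pure threshold is the simplest possible rule; a similarity search could equally be used to retrieve nearest neighbors that a model then compares against the candidate.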

Main Results

5000 generations → 1330 “interestingly new” tasks → 25 distinct capability clusters
Human evaluation confirms high validity of auto-generated tasks
Self-assessment reasonably aligns with human judgments
Automatically generates “Capability Reports”
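
The funnel in the first result can be turned into two rough ratios. This is simple arithmetic on the numbers quoted above; the per-cluster figure is only a mean, since the note does not record the actual cluster sizes.

```python
generations = 5000        # total task generations
interestingly_new = 1330  # tasks kept by the interestingness filter
clusters = 25             # distinct capability clusters

acceptance_rate = interestingly_new / generations  # fraction of generations kept
mean_tasks_per_cluster = interestingly_new / clusters

print(f"kept {acceptance_rate:.1%} of generations")
print(f"about {mean_tasks_per_cluster:.1f} tasks per cluster on average")
```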

Relevance to Project

High. Directly applicable to our skill discovery:

Could automate expansion of our skill ontology $S$
“Interestingness filter” relates to our fitness function $ϕ$
Capability clustering maps to our induced ontology concept
Self-exploration aligns with ontological expansion

Questions & Notes

Can we adapt ACD to discover skills rather than tasks?
How does their clustering compare to our algebraic structure?
Could a “scientist model” help validate our skill compositions?
