SELF: Self-Evolution with Language Feedback
Citation
Authors: Xiao Lu et al.
Year: 2024
Venue:
URL:
Abstract
How can LLMs continuously self-improve without external rewards or human intervention? This paper presents a two-phase framework: (1) meta-skill learning, which teaches the model self-feedback and self-refinement, and (2) iterative self-evolution, in which the model generates responses, refines them, filters the results, and self-trains on what survives.
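The phase-2 loop described above (generate, refine, filter, self-train) can be sketched as follows. This is a minimal reconstruction of the control flow only; the five callables are hypothetical stand-ins for LLM inference and fine-tuning calls, not the paper's implementation.

```python
"""Hedged sketch of SELF's iterative self-evolution loop (phase 2).

All callables are hypothetical placeholders; only the loop structure
follows the paper's description: generate -> feedback -> refine ->
filter -> self-train, repeated for several iterations.
"""
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SelfEvolver:
    generate: Callable[[str], str]             # q -> r (initial response)
    feedback: Callable[[str, str], str]        # (q, r) -> f (self-feedback)
    refine: Callable[[str, str, str], str]     # (q, r, f) -> refined response
    keep: Callable[[str, str], bool]           # quality filter on (q, r_hat)
    train: Callable[[List[Tuple[str, str]]], None]  # self-training step
    history: List[int] = field(default_factory=list)  # kept-examples per iter

    def evolve(self, prompts: List[str], iterations: int = 3) -> None:
        for _ in range(iterations):
            corpus: List[Tuple[str, str]] = []
            for q in prompts:
                r = self.generate(q)           # draft an answer
                f = self.feedback(q, r)        # critique it (meta-skill 1)
                r_hat = self.refine(q, r, f)   # revise it (meta-skill 2)
                if self.keep(q, r_hat):        # keep only filtered examples
                    corpus.append((q, r_hat))
            self.train(corpus)                 # fine-tune on filtered data
            self.history.append(len(corpus))
```

In the paper the filter and training steps operate on the same model that generates and refines; here they are separate callables purely to keep the sketch self-contained.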
Summary
SELF enables autonomous model improvement through explicitly trained meta-skills for self-evaluation and self-refinement, followed by iterative self-evolution cycles.
Key Contributions
- Defines self-feedback and self-refinement as learnable meta-skills
- Two-phase training framework (meta-skill learning + self-evolution)
- Demonstrates meta-skill transfer to smaller models
- Shows progressive improvement through iteration
Core Concepts & Definitions
Meta-Skills (SELF definition)
- Self-Feedback Ability: evaluate the model's own responses via natural-language feedback
- Self-Refinement Ability: optimize responses based on the model's own feedback
Self-Refinement Distribution
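The equation under this heading appears to have been lost in extraction. A plausible reconstruction, treating refinement as marginalizing over self-generated feedback (the symbols $q$ for question, $r$ for initial response, $f$ for feedback, and $\hat{r}$ for refined response are my labels, not necessarily the paper's notation):

```latex
p_\theta(\hat{r} \mid q, r) \;=\; \sum_{f} p_\theta(\hat{r} \mid q, r, f)\, p_\theta(f \mid q, r)
```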
Meta-Skill Learning Objective
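The objective under this heading also appears to be missing. A hedged reconstruction consistent with fine-tuning on (question, response, feedback, refined response) tuples, where $\mathcal{D}_{\text{meta}}$ is the meta-skill training corpus and $q, r, f, \hat{r}$ are my labels for question, response, feedback, and refined response:

```latex
\mathcal{L}_{\text{meta}}(\theta) \;=\; -\,\mathbb{E}_{(q,\, r,\, f,\, \hat{r}) \sim \mathcal{D}_{\text{meta}}}\big[\log p_\theta(f, \hat{r} \mid q, r)\big]
```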
Main Results
- Vicuna-7B: 14.09% → 29.64% accuracy on GSM8K (3 iterations)
- Progressive improvement: +6.82% on GSM8K, +4.9% on SVAMP
- Meta-skill learning alone provides +6.82% boost
- Self-refinement transfers to smaller models (a capability previously reported only in large models)
Relevance to Project
Medium. Relevant to our meta-skill conceptualization:
- Their meta-skills (feedback, refinement) are procedural, not compositional
- Contrast with Fan et al.’s compositional meta-skills
- Self-evolution relates to our ontological-expansion concept
- Training objective could inform our fitness function learning
Questions & Notes
- How do procedural meta-skills (SELF) relate to compositional meta-skills (Fan)?
- Can self-refinement be viewed as a skill composition operation?
- Their iterative improvement resembles our generative ontology expansion