SELF: Self-Evolution with Language Feedback

Citation

Authors: Xiao Lu et al.
Year: 2024
Venue:
URL:

Abstract

How can LLMs continuously self-improve without external rewards or human intervention? This paper presents a two-phase framework: (1) meta-skill learning, which teaches self-feedback and self-refinement, and (2) iterative self-evolution, in which the model generates responses, refines them, filters the results, and self-trains on the filtered data.

Summary

SELF enables autonomous model improvement through explicitly trained meta-skills for self-evaluation and self-refinement, followed by iterative self-evolution cycles.
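The generate → feedback → refine → filter → self-train cycle can be sketched as a loop. This is a toy illustration, not the paper's implementation: the `ToyModel` class and all method names (`generate`, `feedback`, `refine`, `accepts`, `finetune`) are stand-ins I invented for the LLM calls and training step.

```python
class ToyModel:
    """Toy stand-in for an LLM with trained meta-skills (illustration only)."""

    def __init__(self):
        self.quality = 0  # crude proxy for model capability

    def generate(self, x):
        return f"draft answer to {x}"

    def feedback(self, x, y):
        # Self-feedback: a natural-language critique of the model's own response.
        return "be more precise"

    def refine(self, x, y, f):
        # Self-refinement: revise the response conditioned on the feedback.
        return y + " [refined]"

    def accepts(self, x, y):
        # Quality filter: keep only responses that were actually refined.
        return "[refined]" in y

    def finetune(self, corpus):
        # Self-training step: here just bump the capability proxy.
        self.quality += len(corpus)


def self_evolve(model, prompts, iterations=3):
    """One SELF-style self-evolution loop (sketch):
    generate -> self-feedback -> self-refine -> filter -> self-train."""
    for _ in range(iterations):
        corpus = []
        for x in prompts:
            y = model.generate(x)
            f = model.feedback(x, y)       # natural-language self-critique
            y_ref = model.refine(x, y, f)  # improved response
            if model.accepts(x, y_ref):    # filter low-quality refinements
                corpus.append((x, y_ref))
        model.finetune(corpus)             # self-train on the filtered corpus
    return model
```

The key structural point is that every step in the inner loop is performed by the model itself; no external reward model or human label enters the cycle.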

Key Contributions

  1. Defines self-feedback and self-refinement as learnable meta-skills
  2. Two-phase training framework (meta-skill learning + self-evolution)
  3. Demonstrates meta-skill transfer to smaller models
  4. Shows progressive improvement through iteration

Core Concepts & Definitions

Meta-Skills (SELF definition)

  1. Self-Feedback Ability: evaluate the model's own responses via natural-language critiques.
  2. Self-Refinement Ability: optimize responses conditioned on the self-feedback.

Self-Refinement Distribution
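The formula is missing from these notes; as a hedged reconstruction (symbols are my own and may differ from the paper's), self-refinement can be written as sampling a revised response conditioned on the prompt, the initial response, and the self-generated feedback:

```latex
% Sketch (assumed notation): x prompt, y initial response,
% f self-feedback, \hat{y} refined response, p_\theta the model.
f \sim p_\theta(\cdot \mid x, y), \qquad
\hat{y} \sim p_\theta(\cdot \mid x, y, f)
```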

Meta-Skill Learning Objective
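The objective is also missing here; a plausible hedged reconstruction is a standard supervised fine-tuning loss over a meta-skill corpus of (prompt, response, feedback, refined response) tuples, with notation assumed rather than taken from the paper:

```latex
% Sketch (assumed notation): D_meta is the meta-skill training corpus.
\mathcal{L}_{\text{meta}}(\theta)
  = -\,\mathbb{E}_{(x,\, y,\, f,\, \hat{y}) \sim D_{\text{meta}}}
    \left[ \log p_\theta(f \mid x, y) + \log p_\theta(\hat{y} \mid x, y, f) \right]
```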

Main Results

  1. Vicuna-7B: 14.09% → 29.64% accuracy on GSM8K after 3 self-evolution iterations
  2. Progressive improvement across iterations: +6.82% on GSM8K, +4.9% on SVAMP
  3. Meta-skill learning alone provides a +6.82% boost
  4. Self-refinement transfers to smaller models (previously observed only in large models)

Relevance to Project

Medium — Relevant for meta-skill conceptualization:

  • Their meta-skills (feedback, refinement) are procedural, not compositional
  • Contrast with Fan et al.’s compositional meta-skills
  • Self-evolution relates to our ontological-expansion concept
  • Training objective could inform our fitness function learning

Questions & Notes

  • How do procedural meta-skills (SELF) relate to compositional meta-skills (Fan)?
  • Can self-refinement be viewed as a skill composition operation?
  • Their iterative improvement resembles our generative ontology expansion