Evaluation
Definition
Skill evaluation is the assessment of whether and how well a model can apply skills to tasks.
In the Literature
SKILL-MIX
- SKILL-MIX: Generate text demonstrating a random combination of skills on a random topic
- Metrics:
  - Skill Fraction: proportion of the requested skills exhibited
  - Full Marks Ratio: proportion of generations achieving a perfect score
- Auto-grading: GPT-4 / LLaMA-2-70B judges
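The two metrics above can be sketched as follows. This is a minimal illustration that assumes the grader returns one boolean per requested skill plus a single flag for any remaining rubric items. The names `GradedSample`, `skill_fraction`, and `full_marks_ratio` are hypothetical, and the published rubric may attach further conditions to each metric.

```python
from dataclasses import dataclass


@dataclass
class GradedSample:
    """One generated text with the grader's verdicts (hypothetical structure)."""
    skills_exhibited: list[bool]      # one flag per requested skill
    other_checks_passed: bool = True  # assumed flag for non-skill rubric items


def skill_fraction(samples: list[GradedSample]) -> float:
    """Mean, over samples, of the share of requested skills the text exhibits."""
    if not samples:
        return 0.0
    per_sample = [
        sum(s.skills_exhibited) / len(s.skills_exhibited)
        for s in samples
        if s.skills_exhibited
    ]
    return sum(per_sample) / len(per_sample) if per_sample else 0.0


def full_marks_ratio(samples: list[GradedSample]) -> float:
    """Share of samples where every skill and every other check passed."""
    if not samples:
        return 0.0
    perfect = sum(
        1 for s in samples
        if all(s.skills_exhibited) and s.other_checks_passed
    )
    return perfect / len(samples)


samples = [
    GradedSample([True, True]),
    GradedSample([True, False]),
    GradedSample([True, True], other_checks_passed=False),
]
print(skill_fraction(samples), full_marks_ratio(samples))
```

Averaging per sample, not pooling all skill flags, keeps samples with different numbers of requested skills equally weighted; that choice is an assumption of this sketch.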
Beyond Stochastic Parrot Criterion
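A model surpasses memorization if it succeeds on skill-and-topic combinations too numerous to have all appeared in its training data.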
Competence (Arora & Goyal)
ACD (Lu et al.)
- Automated task generation for capability discovery
- Interestingness filtering for novel evaluations
- Capability clustering and reporting
In This Project
Fitness Function
Measures how well a given skill solves a given task.
Evaluation Decomposition
The interaction term captures emergent effects beyond individual skills.
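As a rough illustration of what an interaction term could look like, the sketch below assumes a purely additive baseline: the interaction is whatever composite fitness remains after subtracting the individual skills' fitness scores. The decomposition formula itself is not shown in this note, so the additive form and the name `interaction_term` are assumptions, not the project's definition.

```python
def interaction_term(composite_fitness: float,
                     individual_fitness: list[float]) -> float:
    """Residual of the composite score over an additive baseline.

    Assumption: the baseline is the plain sum of individual fitness scores.
    A positive value suggests the skills help each other on the task;
    a negative value suggests interference.
    """
    return composite_fitness - sum(individual_fitness)


# Hypothetical scores: two skills evaluated alone, then composed.
residual = interaction_term(0.9, [0.3, 0.4])
print(residual)  # approximately 0.2
```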
Task-Relative Assessment
The task-relative set defines which skills are competent for a given task.
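One simple way to realize such a set is a threshold on fitness, sketched below. The threshold rule, its default value, and the name `competent_skills` are assumptions made for illustration; the note does not state how membership is decided.

```python
def competent_skills(fitness_by_skill: dict[str, float],
                     threshold: float = 0.5) -> set[str]:
    """Return the skills whose fitness on one task meets the threshold.

    `fitness_by_skill` maps a skill name to its fitness on a single,
    fixed task, so the result is relative to that task only.
    """
    return {
        name
        for name, score in fitness_by_skill.items()
        if score >= threshold
    }


# Hypothetical fitness scores for one task.
scores = {"metaphor": 0.9, "modus_ponens": 0.2, "red_herring": 0.5}
print(sorted(competent_skills(scores)))
```

Calling the function with scores from a different task may give a different set for the same skills, which is the sense in which the assessment is task-relative.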
Related Concepts
- skills — What is evaluated
- composition — Composite skill evaluation
- emergence — Sudden capability appearance