Automated Capability Discovery
via Foundation Model Self-Exploration
Abstract
Foundation models have become general-purpose assistants, exhibiting diverse capabilities across numerous domains through training on web-scale data. It remains challenging to precisely characterize even a fraction of the full spectrum of capabilities and potential risks in any new model. Existing evaluation approaches often require significant human effort, and designing ever-harder challenges for more capable models is becoming increasingly demanding. We introduce Automated Capability Discovery (ACD), a framework that designates one foundation model as a scientist to systematically propose open-ended tasks probing the abilities of a subject model (potentially itself). By combining frontier models with ideas from the field of open-endedness, ACD automatically and systematically uncovers both surprising capabilities and failures in the subject model. We demonstrate ACD across a range of foundation models (including the GPT, Claude, and Llama series), showing that it automatically reveals thousands of capabilities that would be challenging for any single team to uncover. We further validate our method's automated scoring with extensive human surveys, observing high agreement between model-generated and human evaluations. By leveraging foundation models' ability to both create tasks and self-evaluate, ACD is a significant step toward scalable, automated evaluation of novel AI systems. All code is open-sourced at https://github.com/conglu1997/ACD.
Overview of the ACD Algorithm
Automated Capability Discovery (ACD) designates one foundation model as a scientist to systematically propose open-ended tasks probing the abilities of a subject model (potentially itself). By combining frontier models with ideas from open-endedness, ACD automatically and systematically uncovers both surprising capabilities and failures in the subject model.
ACD operates in a loop:
- Maintain an Archive: An archive of discovered tasks is seeded with trivial tasks and updated iteratively.
- Propose a New Task Family: The scientist generates a new task family in Python code—including specific task instances, instructions, and scoring mechanisms—using chain-of-thought and self-reflection.
- Filter for Novelty: Proposed tasks that overlap too closely with existing tasks are discarded.
- Test the Subject Model: The subject model attempts the tasks, and its performance is logged as discovered capabilities or failure modes.
This systematic approach enables ACD to uncover thousands of capabilities that would be challenging for any single team to identify manually.
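For concreteness, the sketch below expresses this loop in Python, the same language the scientist uses to write task families. It is a minimal illustration under assumed names: `TaskFamily`, `scientist_propose`, `is_novel`, and `subject_attempt` are hypothetical stand-ins for the scientist and subject model calls, not the API of the open-sourced codebase.

```python
# Minimal sketch of the ACD loop. All names below are hypothetical placeholders
# for the scientist/subject model calls; the open-sourced implementation at
# https://github.com/conglu1997/ACD differs in detail.
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class TaskFamily:
    """A proposed task family: instructions, concrete instances, and a scorer."""
    name: str
    instructions: str
    tasks: Dict[str, str]               # task_id -> task-specific prompt
    score: Callable[[str, str], float]  # (task, response) -> score in [0, 1]


def scientist_propose(archive: List[TaskFamily]) -> Optional[TaskFamily]:
    """Placeholder: prompt the scientist model (with chain-of-thought and
    self-reflection) to write a new TaskFamily as Python code, conditioned
    on a sample of previously archived tasks."""


def is_novel(candidate: TaskFamily, archive: List[TaskFamily]) -> bool:
    """Placeholder: discard candidates that overlap too closely with tasks
    already in the archive."""


def subject_attempt(subject, family: TaskFamily) -> Dict[str, float]:
    """Placeholder: have the subject model attempt every task instance in the
    family and score each response."""


def acd_loop(subject, seed_families: List[TaskFamily], n_iterations: int = 100):
    archive = list(seed_families)                   # seeded with trivial tasks
    discoveries: List[Tuple[TaskFamily, Dict[str, float]]] = []
    for _ in range(n_iterations):
        family = scientist_propose(archive)         # propose a new task family
        if family is None or not is_novel(family, archive):
            continue                                # novelty filter
        results = subject_attempt(subject, family)  # test the subject model
        archive.append(family)
        discoveries.append((family, results))       # capability or failure mode
    return discoveries
```

Because each proposed task family carries its own instructions and scoring mechanism, the loop can run end-to-end without human intervention, and the resulting archive doubles as a record of discovered capabilities and failure modes.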
Interactive ACD Visualizations
The visualization below shows the generated tasks where GPT-4o serves as both scientist and subject, revealing task clusters that span areas such as puzzle-solving, code generation, and creative writing.
Some Fun Discovered Tasks
Before deployment, ACD helps developers map out areas where the model systematically fails and uncover unexpected behaviors, which is crucial for building safer, more robust AI systems. For example, some fun discovered tasks for GPT-4o (see image below) reveal that, despite its power, GPT-4o sometimes fails to execute a basic sequence of arithmetic operations or recognize very simple patterns, even as it solves highly complex logical puzzles.

We also explored different scientist/subject pairings. Using Llama3-8B as the subject instead of GPT-4o uncovered distinct failure modes and emerging skills, offering further insight into model capabilities. For instance, this pairing shows that even this state-of-the-art open-source model can struggle with basic spatial reasoning and is prone to infinite repetition when tasks require complex reasoning.

Generated Capability Report
The capability report below summarizes the key discoveries made by ACD on the GPT-4o model. It provides an overview of both the capabilities and failure modes identified during the automated evaluation process. You can view the report directly in the embedded viewer or download it for a detailed look.
As discussed in the main paper's Report Generation section, ACD can automatically produce a structured report summarizing discovered capabilities, highlighting consistent successes, failures, and key insights. This compact overview assists developers and safety auditors in understanding the model's capabilities.
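As a rough illustration, the hedged sketch below shows one way such a report could be assembled from the archive built by the loop above: task families are grouped into consistent successes and failures by mean score, and a summarization prompt is handed to the scientist model. The function names, the 0.5 threshold, and the prompt wording are illustrative assumptions rather than the released report-generation pipeline.

```python
# Hypothetical sketch of assembling a capability report from the discoveries
# returned by acd_loop above; names, threshold, and prompt wording are
# illustrative assumptions, not the released implementation.
def summarize_discoveries(discoveries, success_threshold=0.5):
    """Split discovered task families into consistent successes and failures
    based on the subject model's mean score per family."""
    successes, failures = [], []
    for family, results in discoveries:
        mean_score = sum(results.values()) / max(len(results), 1)
        bucket = successes if mean_score >= success_threshold else failures
        bucket.append((family.name, mean_score))
    return successes, failures


def build_report_prompt(successes, failures):
    """Build a prompt asking the scientist model to write the final report."""
    return (
        "Write a structured capability report for the subject model.\n"
        f"Consistent successes: {successes}\n"
        f"Consistent failures: {failures}\n"
        "Highlight surprising capabilities, systematic failure modes, and key "
        "insights for developers and safety auditors."
    )
```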
Generated Capability Report for GPT-4o
More Generated Tasks and Reports with Different Scientist and Subject Models
We have generated a series of tasks and corresponding reports using various combinations of scientist and subject models. For example:
Example 1: Using GPT-4o as the scientist and Llama3-8B as the subject, we produced an interactive visualization and a detailed PDF report. Explore the interactive visualization and read the full report.
Example 2: In another setup, Claude 3.5 Sonnet served as the scientist while GPT-4o was used as the subject. Check out the interactive visualization and view the comprehensive report.
Each visualization offers an in-depth look into the task generation process, while the reports provide detailed analyses of the outputs and performance metrics.
Human Evaluation Results
To evaluate the quality of the generated tasks, we conducted human surveys. The results confirm that the majority of tasks are both clear and valid. Furthermore, the automated scoring aligns closely with human judgment for nearly all tasks—except for the most challenging ones—reinforcing our confidence in the self-evaluation loop.

Conclusion
We expect ACD to scale to (and improve with) more advanced models. We are excited to test ACD with powerful reasoning models such as DeepSeek-R1 and OpenAI's o1 and o3, enabling them to probe themselves for hidden strengths and risks. By coupling frontier models with open-endedness principles, we see ACD as a significant step toward building safer, more robust AI systems, and as an example of how increased AI capabilities can enhance AI safety rather than detract from it.
Citation
@misc{lu2025automatedcapabilitydiscoverymodel,
      title={Automated Capability Discovery via Model Self-Exploration},
      author={Cong Lu and Shengran Hu and Jeff Clune},
      year={2025},
      eprint={2502.07577},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2502.07577},
}
Acknowledgements
This work was supported by the Vector Institute, the Canada CIFAR AI Chairs program, grants from Schmidt Futures and Open Philanthropy, an NSERC Discovery Grant, and a generous donation from Rafael Cosman.
We thank Aaron Dharna, Ben Norman, Jenny Zhang, Noah Goodman, and Rory Greig for insightful discussions and feedback on early drafts of this work.
The website template was borrowed from Jon Barron.