This task can be performed using AutoAgent
AutoAgent: autonomous harness engineering for smarter, faster testing
Best product for this task
AutoAgent
oss
AutoAgent is an open-source framework for autonomous harness engineering where a meta-agent rewrites an LLM agent’s harness, runs Harbor benchmarks, and hill-climbs on scores. You define the loop in program.md and let it iteratively optimize prompts, tools, and orchestration.

What to expect from an ideal product
- AutoAgent runs a meta-agent that automatically rewrites your LLM agent's harness, including its prompts, and scores each revision against Harbor benchmarks
- The framework hill-climbs on those benchmark scores: it keeps a revision only when it scores better than the current best
- You define the optimization loop in a program.md file, and AutoAgent runs the iterative testing and refinement without manual intervention
- The system optimizes tools and orchestration patterns as well as prompts, so tuning covers the whole agent harness rather than prompts alone
- Because every change is benchmarked, you get measured performance differences instead of guessing whether a prompt change actually helps
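
The hill-climbing idea in the list above can be sketched in a few lines of Python. This is only an illustration of the general technique, not AutoAgent's actual code or API: the `hill_climb`, `toy_propose`, and `toy_score` names, the `OPTIONS` table, and the dictionary representation of a harness are all hypothetical, and the toy scoring function stands in for a real benchmark run.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical harness representation: one setting per tunable part.
Harness = Dict[str, str]

# Hypothetical search space, invented for this sketch.
OPTIONS: Dict[str, List[str]] = {
    "prompt": ["terse", "step_by_step"],
    "tools": ["none", "shell"],
    "orchestration": ["single_pass", "plan_then_act"],
}


def toy_propose(harness: Harness, rng: random.Random) -> Harness:
    """Stand-in for the meta-agent: change one randomly chosen setting."""
    candidate = dict(harness)  # copy so the current best is never mutated
    key = rng.choice(sorted(OPTIONS))
    candidate[key] = rng.choice(OPTIONS[key])
    return candidate


def toy_score(harness: Harness) -> float:
    """Stand-in for a benchmark run: fraction of settings matching a fixed target."""
    target = {
        "prompt": "step_by_step",
        "tools": "shell",
        "orchestration": "plan_then_act",
    }
    return sum(harness[k] == v for k, v in target.items()) / len(target)


def hill_climb(
    initial: Harness,
    propose: Callable[[Harness, random.Random], Harness],
    score: Callable[[Harness], float],
    steps: int,
    seed: int = 0,
) -> Tuple[Harness, float, List[float]]:
    """Keep a candidate only when it scores strictly higher than the current best."""
    rng = random.Random(seed)
    best = dict(initial)
    best_score = score(best)
    history = [best_score]  # best score after each step, starting with the baseline
    for _ in range(steps):
        candidate = propose(best, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
        history.append(best_score)
    return best, best_score, history


if __name__ == "__main__":
    start = {"prompt": "terse", "tools": "none", "orchestration": "single_pass"}
    best, best_score, history = hill_climb(start, toy_propose, toy_score, steps=50)
    print(best, best_score)
```

Because a candidate replaces the current best only on a strict improvement, the recorded best score never decreases from one step to the next. In a real setup the proposal step would be a model rewriting the harness and the scoring step would be a benchmark run, both far slower and noisier than these stand-ins.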
