4 results for “evals”
Evaluate, score, and systematically improve prompts in the codebase. Identifies weak prompts, generates test cases, scores outputs, and proposes optimized versions. Use when the user says "improve this prompt", "why is the AI doing X", "eval my prompts", or "optimize the agent".
Texas Hold'em poker intelligence for AI agents: hand evaluation, pot odds, position strategy, and live player profiling.
Guides agents through autonomous ManiSkill and VSLAM evaluation, tuning, verification, memory storage, and summary writing using the Autolab MCP server and repo tooling.
Iteratively optimize thermal designs by solving the 2D steady-state heat equation (a Poisson PDE). Parameterize heat source placement and material conductivity, simulate temperature distributions, evaluate performance metrics, and propose improvements autonomously. Use when the user asks to design, optimize, or analyze heat sinks, thermal layouts, cooling systems, or any steady-state thermal problem on a 2D domain.