The Gym (Reinforcement Learning Sandbox)
The Gym is an isolated sandbox environment designed for safe code evaluation, testing, and benchmarking. It allows SYNAPSEED to compile and run Rust code fragments in ephemeral environments without polluting the main project or risking system stability.
Purpose
- Verification: Verify that code generated by the LLM actually compiles and passes tests.
- Benchmarking: Measure the performance of different code implementations.
- Adversarial Testing: Mutate code to check whether test suites are robust enough to catch injected bugs (mutation testing).
- Fuzzing: Auto-generate property-based tests to find edge cases.
How It Works
- Scenario Creation: A Scenario is defined with source code, optional tests, and dependencies.
- Isolation: The Gym creates a temporary Cargo project in a system temp directory.
- Execution: It compiles the code and runs the tests and benchmarks using cargo test and cargo bench.
- Reporting: It returns a Report with compilation status, test results, performance metrics, and a composite score.
- Cleanup: The temporary directory is automatically deleted after execution.
MCP Tool: train
The Gym is exposed via the train MCP tool.
{
  "name": "train",
  "args": {
    "source": "pub fn add(a: i32, b: i32) -> i32 { a + b }",
    "tests": "use eval_project::add;\n#[test]\nfn test_add() { assert_eq!(add(2, 2), 4); }",
    "adversarial": true
  }
}
Advanced Features
Adversarial Mutation Testing
When enabled ("adversarial": true), the Gym intentionally introduces "mutants" (small deliberate bugs) into the code, for example changing a + b to a - b, and runs the test suite against each one. A mutant is "killed" if at least one test fails. If the tests still pass despite the bug, the mutant "survives," which indicates a weak test suite. The Gym reports a Mutation Score: the percentage of mutants that were killed.
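A minimal sketch of the mutate, test, and score loop follows. It assumes purely textual mutations and is not the Gym's actual mutation engine. The names Mutant, arithmetic_mutants, evaluate, and mutation_score are hypothetical. The tests_pass closure stands in for writing a mutant into the sandbox and running cargo test. Real mutation tools usually rewrite the syntax tree. This version flips every + and - character (skipping the - in ->), so it would also touch operators inside strings or comments.

```rust
/// One mutant: the mutated source plus a human-readable description of the change.
pub struct Mutant {
    pub source: String,
    pub description: String,
}

/// Produces one mutant per `+` or `-` character by swapping it for the other.
/// The `-` of a `->` return arrow is skipped. Textual, not syntax-aware.
pub fn arithmetic_mutants(source: &str) -> Vec<Mutant> {
    let bytes = source.as_bytes();
    let mut mutants = Vec::new();
    for (i, &b) in bytes.iter().enumerate() {
        let replacement = match b {
            b'+' => "-",
            b'-' if bytes.get(i + 1) != Some(&b'>') => "+",
            _ => continue,
        };
        // `+` and `-` are ASCII, so `i` and `i + 1` are valid char boundaries.
        let mut mutated = String::with_capacity(source.len());
        mutated.push_str(&source[..i]);
        mutated.push_str(replacement);
        mutated.push_str(&source[i + 1..]);
        mutants.push(Mutant {
            source: mutated,
            description: format!("byte {}: '{}' -> '{}'", i, b as char, replacement),
        });
    }
    mutants
}

/// Runs the suite against every mutant and returns (killed, total).
/// A mutant is killed when `tests_pass` returns false for it.
pub fn evaluate<F: Fn(&str) -> bool>(source: &str, tests_pass: F) -> (usize, usize) {
    let mutants = arithmetic_mutants(source);
    let killed = mutants
        .iter()
        .filter(|m| !tests_pass(m.source.as_str()))
        .count();
    (killed, mutants.len())
}

/// Mutation score as a percentage (killed / total * 100); None if there were no mutants.
pub fn mutation_score(killed: usize, total: usize) -> Option<f64> {
    if total == 0 {
        None
    } else {
        Some(killed as f64 / total as f64 * 100.0)
    }
}
```

For the train example's source, this yields a single mutant (a - b). A suite that checks add(2, 2) == 4 kills it, giving a score of 100. A suite that asserts nothing lets it survive and scores 0.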
Fuzzing
The Gym can integrate with proptest to automatically generate property-based tests for public functions, trying to find inputs that cause panics or assertion failures.
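The sketch below swaps proptest for a hand-rolled, std-only stand-in so that it runs without external crates. It shows only the core idea of property-based testing: generate many random inputs, evaluate a property on each, and report the first input that makes it return false or panic. XorShift and check_property are hypothetical names. Inputs are limited to (i32, i32) pairs, and unlike proptest there are no configurable strategies and no shrinking of failing inputs.

```rust
use std::panic;

/// Tiny xorshift64 PRNG (shifts 13, 7, 17); the state must never be zero.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Truncates the 64-bit output to an arbitrary i32 (any sign, full range).
    fn next_i32(&mut self) -> i32 {
        self.next_u64() as i32
    }
}

/// Checks `property` on `cases` random (i32, i32) pairs drawn from `seed`.
/// Returns the first pair for which the property returns false or panics, or None.
pub fn check_property<F>(seed: u64, cases: usize, property: F) -> Option<(i32, i32)>
where
    F: Fn(i32, i32) -> bool + panic::RefUnwindSafe,
{
    let mut rng = XorShift(seed.max(1)); // avoid the all-zero state
    for _ in 0..cases {
        let (a, b) = (rng.next_i32(), rng.next_i32());
        // A panic inside the property counts as a failure, like a failing assertion.
        let ok = panic::catch_unwind(|| property(a, b)).unwrap_or(false);
        if !ok {
            return Some((a, b));
        }
    }
    None
}
```

For example, check_property(5, 100, |a, b| a.wrapping_add(b) == b.wrapping_add(a)) finds no counterexample. A property that panics is reported as failing on its first input. catch_unwind only intercepts unwinding panics, and the default panic hook still prints the panic message to stderr.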