
The Gym (Reinforcement Learning Sandbox)

The Gym is an isolated sandbox environment designed for safe code evaluation, testing, and benchmarking. It allows SYNAPSEED to compile and run Rust code fragments in ephemeral environments without polluting the main project or risking system stability.

Purpose

  • Verification: Verify that code generated by the LLM actually compiles and passes tests.
  • Benchmarking: Measure the performance of different code implementations.
  • Adversarial Testing: Mutate code to ensure test suites are robust (mutation testing).
  • Fuzzing: Auto-generate property-based tests to find edge cases.

How It Works

  1. Scenario Creation: A Scenario is defined with source code, optional tests, and dependencies.
  2. Isolation: The Gym creates a temporary Cargo project in a system temp directory.
  3. Execution: It compiles the code, then runs the tests and benchmarks using cargo test and cargo bench.
  4. Reporting: Returns a Report with compilation status, test results, performance metrics, and a composite score.
  5. Cleanup: The temporary directory is automatically deleted after execution.

MCP Tool: train

The Gym is exposed via the train MCP tool.

Example call (JSON):
{
  "name": "train",
  "args": {
    "source": "pub fn add(a: i32, b: i32) -> i32 { a + b }",
    "tests": "use eval_project::add;\n#[test]\nfn test_add() { assert_eq!(add(2, 2), 4); }",
    "adversarial": true
  }
}

Advanced Features

Adversarial Mutation Testing

When enabled, the Gym deliberately introduces "mutants" (small bugs) into the code, for example by changing a + b to a - b, and then runs the test suite against each one. If the tests fail, the mutant is "killed"; if they still pass despite the bug, the mutant "survives," which indicates a weak test suite. The Gym calculates a Mutation Score as the percentage of mutants killed.
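
A rough sketch of the idea follows, assuming simple text-level operator swaps. The Gym's real mutation operators are not documented here, so the SWAPS table and both function names are hypothetical. Each mutant differs from the original in exactly one place, and the score is the share of mutants that the tests catch.

```rust
/// Operator swaps used as mutation operators in this sketch: (original, mutated).
const SWAPS: [(&str, &str); 4] = [
    (" + ", " - "),
    (" - ", " + "),
    (" == ", " != "),
    (" < ", " >= "),
];

/// Produces one mutant per operator occurrence, changing a single site each time.
pub fn generate_mutants(source: &str) -> Vec<String> {
    let mut mutants = Vec::new();
    for &(from, to) in SWAPS.iter() {
        for (idx, _) in source.match_indices(from) {
            let mut mutant = String::with_capacity(source.len() + to.len());
            mutant.push_str(&source[..idx]);
            mutant.push_str(to);
            mutant.push_str(&source[idx + from.len()..]);
            mutants.push(mutant);
        }
    }
    mutants
}

/// Percentage of mutants killed by the test suite.
/// Assumption: with no mutants at all, the score is reported as 0.0.
pub fn mutation_score(killed: usize, total: usize) -> f64 {
    if total == 0 {
        return 0.0;
    }
    100.0 * killed as f64 / total as f64
}
```

For the add function from the train example, this yields a single mutant with a - b in place of a + b. A test asserting add(2, 2) == 4 kills it, whereas a test that only checks add(0, 0) == 0 would let it survive.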

Fuzzing

The Gym can integrate with proptest to automatically generate property-based tests for public functions, trying to find inputs that cause panics or assertion failures.
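
To illustrate the kind of check such a generated test performs, here is a dependency-free sketch. It does not use proptest: it swaps in a hand-rolled xorshift generator and std::panic::catch_unwind, and it performs no shrinking of failing inputs. All names in it are hypothetical, and it is limited to functions that take two i32 values.

```rust
use std::panic;

/// Boundary values tried first, since they trigger many arithmetic panics.
const EDGE_CASES: [i32; 5] = [0, 1, -1, i32::MIN, i32::MAX];

/// One step of a xorshift64 pseudo-random generator (state must be non-zero).
fn next_u64(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Draws an i32, returning an edge case roughly a quarter of the time.
fn next_i32(state: &mut u64) -> i32 {
    let r = next_u64(state);
    if r % 4 == 0 {
        EDGE_CASES[((r >> 8) % EDGE_CASES.len() as u64) as usize]
    } else {
        (r >> 32) as i32
    }
}

/// Returns true if calling `f(a, b)` panics.
fn panics(f: fn(i32, i32) -> i32, a: i32, b: i32) -> bool {
    panic::catch_unwind(move || f(a, b)).is_err()
}

/// Searches for an input pair that makes `f` panic: all edge-case pairs first,
/// then `iterations` pseudo-random pairs derived from `seed`.
pub fn find_panicking_input(
    f: fn(i32, i32) -> i32,
    iterations: usize,
    seed: u64,
) -> Option<(i32, i32)> {
    for &a in EDGE_CASES.iter() {
        for &b in EDGE_CASES.iter() {
            if panics(f, a, b) {
                return Some((a, b));
            }
        }
    }
    let mut state = seed.max(1);
    for _ in 0..iterations {
        let (a, b) = (next_i32(&mut state), next_i32(&mut state));
        if panics(f, a, b) {
            return Some((a, b));
        }
    }
    None
}

/// Example target: integer division panics when `b` is zero.
pub fn divide(a: i32, b: i32) -> i32 {
    a / b
}
```

Running find_panicking_input(divide, 100, 42) reports a pair with a zero divisor, the sort of edge case a hand-written test can easily miss. A real proptest-based test would additionally shrink the failing input to a minimal counterexample.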

Released under the Apache License 2.0.