Determinism contract
Every GitignoreOutput is uniquely identified by:
GitignoreOutput(
content: str,
rules: tuple[Rule, ...],
output_hash: str, # "sha256:<hex>"
rules_table_version: str, # carried from the rules table
core_version: str, # the package version of the generator
)For a fixed input, all five fields are stable across runs, machines, and Python interpreters of the same major version.
Three layers of enforcement
1. Property tests
Located in packages/occam-gitignore-core/tests/test_determinism.py:
test_generate_is_byte_identical_across_runs— callgenerate(...)twice with the same arguments; assertout.contentandout.output_hashmatch.test_generate_is_invariant_to_input_order— shuffle the tree and the user extras; assert the output is byte-identical.test_output_carries_versions— bothcore_versionandrules_table_versionare non-empty and present in the result.
2. Snapshots
Located in packages/occam-gitignore-core/tests/test_snapshots.py. Five golden files in tests/snapshots/ capture the canonical output for typical project shapes (Python, Node+TS, Python+Docker, Java, Rust). The test:
- Re-runs
generate(...)with the recorded inputs. - Asserts byte-equality with the snapshot.
- Asserts the
sha256ofcontentmatches a hash hard-coded in the test. - Asserts
out.output_hashmatches that hash.
If any of those four checks fails, the change is intentional — and the snapshot and hash must be updated together.
3. Benchmark stability
Run with --repeats N. Each case is generated N times. The bench computes:
stability = (#cases whose hash is identical across all N runs) / (#cases)A regression to < 1.0 is a release blocker — the CI workflow exits with code 1 in that case (see _exit_code in occam_gitignore_bench.__main__).
What can change the hash legitimately
- A new template file (or an edit to an existing template).
- A new
rules_table.json(its version is content-addressed too). - A change in
core_version. - A change in
GenerateOptions(include_comments,include_provenance,extras).
What must never change the hash
- Process restarts, machine, OS, locale, timezone.
- The order in which the input tree is enumerated.
- The order in which user extras are passed.
PYTHONHASHSEED(we always sort before iteration).