
Fine-Tuning

Source: src/renderer/src/components/EngineInterface.tsx

Overview

Fine-tune models with LoRA and QLoRA using Apple's MLX framework. Training runs entirely on-device in unified memory; no GPU rental or cloud services are required.

Presets

Three built-in configurations:

| Preset | Epochs | Learning Rate | Batch | LoRA Rank | Use Case |
|---|---|---|---|---|---|
| Draft | 1 | 1e-5 | 2 | 4 | Quick test runs |
| Balanced | 3 | 2e-5 | 4 | 8 | General fine-tuning |
| Deep | 10 | 1e-5 | 2 | 16 | Maximum quality |

All parameters can be overridden manually.
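The preset-plus-override pattern can be sketched as plain data. The values below mirror the table above; the `apply_preset` helper is illustrative, not the app's actual code.

```python
# Preset values taken from the table above; dict keys are assumed names.
PRESETS = {
    "draft":    {"epochs": 1,  "learning_rate": 1e-5, "batch_size": 2, "lora_rank": 4},
    "balanced": {"epochs": 3,  "learning_rate": 2e-5, "batch_size": 4, "lora_rank": 8},
    "deep":     {"epochs": 10, "learning_rate": 1e-5, "batch_size": 2, "lora_rank": 16},
}

def apply_preset(name: str, **overrides) -> dict:
    """Start from a preset, then override any parameter manually."""
    config = dict(PRESETS[name])
    config.update(overrides)
    return config
```

For example, `apply_preset("balanced", epochs=5)` keeps the Balanced learning rate and LoRA rank but trains for 5 epochs.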

Hyperparameters

Training

| Parameter | Range | Default | Description |
|---|---|---|---|
| Epochs | 1–100 | 3 | Number of passes over the dataset |
| Learning Rate | >0 to 1.0 | 1e-4 | Step size for weight updates |
| Batch Size | 1–64 | 1 | Samples per training step |
| Max Sequence Length | 64–32768 | 512 | Maximum token length per sample |
| Seed | integer or null | null | Random seed for reproducibility |

LoRA

| Parameter | Range | Default | Description |
|---|---|---|---|
| Rank | 1–256 | 8 | LoRA rank (lower = fewer parameters) |
| Alpha | >0 | 16.0 | Scaling factor for LoRA updates |
| Dropout | 0–1.0 | 0.0 | Dropout probability in LoRA layers |
| Layers | 1–128 | 8 | Number of layers to apply LoRA to |
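Rank directly controls adapter size: each adapted projection adds two low-rank matrices, A (hidden × rank) and B (rank × hidden). A rough sketch of the trainable-parameter count, assuming a hidden size and two adapted projections per layer (both assumptions; the doc does not specify them):

```python
def lora_trainable_params(rank: int, hidden_size: int, layers: int,
                          projections_per_layer: int = 2) -> int:
    # Each adapted projection contributes A (hidden x rank) and
    # B (rank x hidden), i.e. 2 * rank * hidden_size parameters.
    per_projection = 2 * rank * hidden_size
    return per_projection * projections_per_layer * layers
```

With the defaults (rank 8, 8 layers) and a hypothetical hidden size of 4096, this comes to about one million trainable parameters, versus billions in the base model.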

Dataset

The training dataset must be a JSONL file with instruction, input (optional), and output fields. Use the Data Preparation page to convert CSV files.
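A minimal example of the expected format (field contents are illustrative; `input` may be omitted):

```jsonl
{"instruction": "Summarize the function.", "input": "def add(a, b): return a + b", "output": "Adds two numbers."}
{"instruction": "Say hello in French.", "output": "Bonjour"}
```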

Select the dataset file via the file picker. The UI shows a preview of the first few entries.

Training Flow

  1. Select a base model (must be loaded).
  2. Choose a preset or configure manually.
  3. Select a JSONL dataset.
  4. Name the fine-tuning job.
  5. Click "Start Training".
  6. The backend spawns a training thread; the frontend polls job status every 2 seconds.
  7. Loss curve and metrics are displayed in real-time.
  8. On completion, the adapter is saved to ~/.silicon-studio/adapters/.
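The polling in steps 6–7 can be sketched against the `GET /api/engine/jobs/{id}` endpoint described below. The endpoint path comes from this doc; the status field names, status values, and base URL are assumptions.

```python
import json
import time
import urllib.request

# Assumed terminal status values; the doc does not enumerate them.
TERMINAL_STATES = {"completed", "failed", "cancelled"}

def is_terminal(status: dict) -> bool:
    """True once a job has stopped running."""
    return status.get("status") in TERMINAL_STATES

def poll_job(job_id: str, base_url: str = "http://localhost:8000",
             interval: float = 2.0) -> dict:
    """Poll GET /api/engine/jobs/{id} until the job reaches a terminal state."""
    while True:
        with urllib.request.urlopen(f"{base_url}/api/engine/jobs/{job_id}") as resp:
            status = json.load(resp)
        if is_terminal(status):
            return status
        time.sleep(interval)
```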

Post-Training

After training completes:

  • The adapter is saved alongside the base model path.
  • It appears in the Models list with is_finetuned: true.
  • It can be exported via Model Export at various quantization levels.

DPO Preference Training

SiliconDev captures preference data automatically: every time you approve or reject a diff in the Code Workspace, the prompt and the proposed change are logged as a DPO pair (chosen/rejected).

Preference Training Tab

The fine-tuning UI includes a "Preference Training" tab that shows:

  • Pair counter: How many approve/reject pairs have been collected.
  • Train with Preferences button: Launches a DPO training job using the collected pairs.
  • Loss curve: Same real-time visualization as LoRA training.

DPO Data Format

Pairs are stored in ~/.silicon-studio/datasets/dpo_pairs.jsonl:

```jsonl
{"prompt": "...", "chosen": "...", "rejected": "...", "metadata": {"tool": "patch_file", "file": "src/auth.py"}}
```
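Appending a pair in this format is straightforward. The path matches the doc; the helper itself is a sketch, not the app's actual logging code.

```python
import json
from pathlib import Path

DPO_PATH = Path.home() / ".silicon-studio" / "datasets" / "dpo_pairs.jsonl"

def log_dpo_pair(prompt: str, chosen: str, rejected: str, metadata: dict,
                 path: Path = DPO_PATH) -> None:
    """Append one preference pair in the chosen/rejected format shown above."""
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {"prompt": prompt, "chosen": chosen,
              "rejected": rejected, "metadata": metadata}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```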

Inspector Scoring

The agent's Inspector adds a numeric quality score (1-10) to each response. This score acts as an automatic quality gate and can be used as a reward signal for RLAIF-style training.
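One way to use the score, sketched under assumptions: the gate threshold and the [0, 1] reward normalization below are illustrative choices, not documented behavior.

```python
def passes_quality_gate(score: int, threshold: int = 7) -> bool:
    # Threshold of 7 is an assumed cutoff; the doc only says scores run 1-10.
    return 1 <= score <= 10 and score >= threshold

def to_reward(score: int) -> float:
    """Normalize a 1-10 Inspector score to [0, 1] for RLAIF-style training."""
    return (score - 1) / 9.0
```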

Limitations

  • One training job at a time.
  • Training speed depends on model size and available unified memory.
  • Very large models (30B+) may not fit in memory for fine-tuning on 16GB machines.
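A back-of-the-envelope fit check illustrates the last point. The 1.5x overhead factor (optimizer state, activations, KV caches) and the bytes-per-parameter figures are rough assumptions, not measurements.

```python
def fits_in_memory(params_billion: float, bytes_per_param: float,
                   unified_memory_gb: float, overhead: float = 1.5) -> bool:
    # Rule of thumb: model weights plus ~50% overhead must fit in
    # unified memory. Illustrative only.
    weights_gb = params_billion * bytes_per_param
    return weights_gb * overhead <= unified_memory_gb
```

By this estimate a 30B model in fp16 (2 bytes/param, ~60 GB of weights) is far out of reach on a 16 GB machine, while a 7B model at 4-bit (~3.5 GB) fits comfortably.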

Backend

Training is implemented in backend/app/engine/service.py using mlx_lm.lora(). The training config is written to a temporary YAML file and passed to the MLX training loop. Job status (loss, step, epoch) is tracked in memory and exposed via GET /api/engine/jobs/{id}.
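The in-memory job tracking can be sketched as a small thread-safe registry. This is illustrative code in the spirit of the backend, not the actual implementation; only the field names (loss, step, epoch) come from the doc.

```python
import threading
import uuid

class JobRegistry:
    """Illustrative in-memory job tracker; the training thread calls
    update(), and GET /api/engine/jobs/{id} would serialize get()."""

    def __init__(self) -> None:
        self._jobs: dict[str, dict] = {}
        self._lock = threading.Lock()

    def create(self) -> str:
        job_id = str(uuid.uuid4())
        with self._lock:
            self._jobs[job_id] = {"status": "running", "loss": None,
                                  "step": 0, "epoch": 0}
        return job_id

    def update(self, job_id: str, **fields) -> None:
        with self._lock:
            self._jobs[job_id].update(fields)

    def get(self, job_id: str) -> dict:
        with self._lock:
            return dict(self._jobs[job_id])
```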

Released under the MIT License.