ALMA Architecture
This document provides a comprehensive overview of ALMA's architecture, design principles, and implementation details.
Overview
ALMA follows a 4-layer architecture that separates concerns and enables clean abstractions:
┌───────────────────────────────────────────────────────────┐
│  L4: Intent Layer                                         │
│  User interfaces (CLI, API, Web UI)                       │
└─────────────────────┬─────────────────────────────────────┘
                      │
┌─────────────────────▼─────────────────────────────────────┐
│  L3: Reasoning & Orchestration                            │
│  1. Intent Parsing (Qwen/LLM)                             │
│  2. Resilient Workflow (LangGraph State Machine)          │
│     [Validate -> Check -> Heal -> Execute -> Verify]      │
└─────────────────────┬─────────────────────────────────────┘
                      │
┌─────────────────────▼─────────────────────────────────────┐
│  L2: Interface Layer                                      │
│  Model Context Protocol (MCP) Server                      │
│  Standardized Tooling for LLMs (List, Deploy, etc.)       │
└─────────────────────┬─────────────────────────────────────┘
                      │
┌─────────────────────▼─────────────────────────────────────┐
│  L1: Execution Layer                                      │
│  ProxmoxEngine (Task-Aware) + Docker + K8s                │
└───────────────────────────────────────────────────────────┘
4-Layer Architecture
Layer 4: Intent Layer
Purpose: User interaction and interface management
Components:
- CLI (alma/cli/): Command-line interface using Typer
- REST API (alma/api/): FastAPI-based HTTP API
- Web UI (alma-web/): React-based browser dashboard
Key Features:
- Multiple input formats (natural language, YAML, JSON)
- Rich output formatting (tables, progress bars)
- Interactive mode for complex workflows
- API key authentication (configurable)
Example CLI Commands:
alma deploy blueprint.yaml
alma council convene "web app with database"
alma templates list
alma dashboard
Example API Calls:
POST /api/v1/conversation/generate-blueprint
POST /api/v1/blueprints/
POST /api/v1/ipr/
GET /api/v1/blueprints/{id}/deploy
Layer 3.5: Cognitive Layer (The Brain)
Purpose: Acts as a middleware between raw user intent and LLM execution. It provides safety, context, and personality.
Component: alma/core/cognitive.py
Key Sub-Systems:
Context Tracker (FocusContext)
- Responsibility: Tracks which resource the conversation currently refers to.
- Logic: If the user says "Scale it to 5", the tracker resolves "it" to the active_resource_id (e.g., vm-web-01). It also detects context switches.
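The pronoun resolution described above can be sketched as follows. This is a minimal illustration, not the code in alma/core/cognitive.py: only the FocusContext name and the active_resource_id field come from this document, while the pronoun list and the focus and resolve methods are assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Pronouns treated as references to the resource in focus (illustrative list).
PRONOUNS = re.compile(r"\b(it|that|this)\b", re.IGNORECASE)


@dataclass
class FocusContext:
    """Hypothetical tracker; method names are assumptions for this sketch."""

    active_resource_id: Optional[str] = None

    def focus(self, resource_id: str) -> bool:
        """Set the active resource and report whether the context switched."""
        switched = (
            self.active_resource_id is not None
            and self.active_resource_id != resource_id
        )
        self.active_resource_id = resource_id
        return switched

    def resolve(self, utterance: str) -> str:
        """Replace pronouns with the active resource ID, if one is set."""
        if self.active_resource_id is None:
            return utterance
        # A callable replacement avoids backslash handling in the resource ID.
        return PRONOUNS.sub(lambda _: self.active_resource_id, utterance)


ctx = FocusContext()
ctx.focus("vm-web-01")
resolved = ctx.resolve("Scale it to 5")
print(resolved)  # Scale vm-web-01 to 5
```

A word-level regex is a deliberately crude stand-in; a real tracker would likely rely on the LLM or a parser to decide which words are references.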
Risk Guard (RiskProfile)
- Responsibility: Prevents catastrophic errors driven by emotion or haste.
- Matrix:
  - High Frustration + Destructive Intent = BLOCK
  - Low Frustration + Destructive Intent = CONFIRMATION REQUIRED
  - Any Emotion + Read Intent = ALLOW
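The matrix above can be expressed as a small decision function. This is a sketch under assumptions: the enum and function names are hypothetical, and because the matrix only covers read and destructive intents, no other intent types are modeled here.

```python
from enum import Enum


class Frustration(Enum):
    LOW = "low"
    HIGH = "high"


class Intent(Enum):
    READ = "read"
    DESTRUCTIVE = "destructive"


class Decision(Enum):
    ALLOW = "allow"
    CONFIRM = "confirmation_required"
    BLOCK = "block"


def evaluate(frustration: Frustration, intent: Intent) -> Decision:
    """Apply the three rows of the risk matrix."""
    if intent is Intent.READ:
        # Read intents are allowed regardless of emotion.
        return Decision.ALLOW
    if frustration is Frustration.HIGH:
        # Destructive intent under high frustration is blocked outright.
        return Decision.BLOCK
    # Destructive intent under low frustration needs explicit confirmation.
    return Decision.CONFIRM


print(evaluate(Frustration.HIGH, Intent.DESTRUCTIVE))  # Decision.BLOCK
```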
Adaptive Persona Engine
- Responsibility: Modulates the tone of voice.
- Modes:
  - ARCHITECT: Verbose, explanatory, suggests best practices. (Trigger: create, design)
  - OPERATOR: Concise, JSON-heavy, status-focused. (Trigger: deploy, scale)
  - MEDIC: Systematic, inquisitive, reassuring. (Trigger: troubleshoot, fix)
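One way to picture the trigger-to-mode mapping is a keyword lookup. The sketch below assumes triggers are matched as whole words in the user's message; the Persona enum and select_persona function are hypothetical names, and the real engine may well use the LLM's intent classification instead.

```python
from enum import Enum
from typing import Optional


class Persona(Enum):
    ARCHITECT = "architect"
    OPERATOR = "operator"
    MEDIC = "medic"


# Trigger words taken from the mode list above.
TRIGGERS = {
    "create": Persona.ARCHITECT,
    "design": Persona.ARCHITECT,
    "deploy": Persona.OPERATOR,
    "scale": Persona.OPERATOR,
    "troubleshoot": Persona.MEDIC,
    "fix": Persona.MEDIC,
}


def select_persona(utterance: str) -> Optional[Persona]:
    """Return the persona for the first trigger word found, or None."""
    for word in utterance.lower().split():
        persona = TRIGGERS.get(word.strip(".,!?\"'"))
        if persona is not None:
            return persona
    return None


print(select_persona("Please design a network for the web tier"))
```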
Layer 2: Modeling Layer
Purpose: Declarative infrastructure representation
Components:
- Schemas (alma/schemas/): Pydantic models for validation
- Database Models (alma/models/): SQLAlchemy ORM models
- Blueprint Parser: YAML ↔ Python object conversion
Blueprint Structure:
version: "1.0"
name: my-infrastructure
description: "Production web application"
resources:
- type: compute | network | storage | service
name: resource-name
provider: proxmox | docker | fake
specs:
# Provider-specific specifications
dependencies:
- other-resource-name
metadata:
# Additional metadata
metadata:
environment: production
  owner: platform-team
Validation:
- Schema validation (Pydantic)
- Dependency graph validation
- Provider compatibility checks
- Resource quota checks
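Dependency graph validation can be illustrated with the standard library's graphlib module. The sketch assumes resources are plain dicts with the name and dependencies keys from the blueprint structure; deployment_order is a hypothetical helper, not a function from the ALMA codebase.

```python
from graphlib import CycleError, TopologicalSorter


def deployment_order(resources: list[dict]) -> list[str]:
    """Validate dependencies and return resource names in deployable order."""
    names = [r["name"] for r in resources]
    known = set(names)
    if len(known) != len(names):
        raise ValueError("duplicate resource names in blueprint")

    graph = {}
    for resource in resources:
        deps = resource.get("dependencies") or []
        missing = sorted(set(deps) - known)
        if missing:
            raise ValueError(
                f"{resource['name']} depends on unknown resources: {missing}"
            )
        # Map each resource to the set of resources it must wait for.
        graph[resource["name"]] = set(deps)

    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # The second element of args holds the detected cycle.
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc


blueprint_resources = [
    {"name": "web", "dependencies": ["db", "net"]},
    {"name": "db", "dependencies": ["net"]},
    {"name": "net"},
]
order = deployment_order(blueprint_resources)
print(order)  # ['net', 'db', 'web']
```

The returned list doubles as a topological deployment order, so validation and ordering can share one pass over the graph.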
Layer 1: Execution Layer
Purpose: Actual infrastructure provisioning and management
Components:
- Engine Interface (alma/engines/base.py): Abstract base class
- Engine Plugins: Provider-specific implementations
  - FakeEngine: Testing and development
  - ProxmoxEngine: Proxmox VE integration
  - DockerEngine (alma/engines/docker.py): Docker integration
  - (Future) AnsibleEngine, etc.
- Controller: Orchestrates deployment workflow
- State Manager: Tracks resource state
Engine Interface:
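from abc import ABC

class Engine(ABC):
    # Abridged signatures; result types are quoted because they are defined elsewhere.
    async def validate_blueprint(self, blueprint) -> bool: ...
    async def deploy(self, blueprint) -> "DeploymentResult": ...
    async def get_state(self, resource_id) -> "ResourceState": ...
    async def destroy(self, resource_id) -> bool: ...
    async def rollback(self, deployment_id) -> bool: ...
    async def health_check(self) -> bool: ...
Deployment Flow: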
1. Validate blueprint
2. Resolve dependencies
3. For each resource in topological order:
   a. Check if exists
   b. Create or update
   c. Verify state
   d. Update database
4. Return deployment result
Component Details
Database Layer
Technology: PostgreSQL (production) / SQLite (development)
Tables:
- system_blueprints: Stores blueprint definitions
- infrastructure_pull_requests: IPR workflow
- (Future) deployments: Deployment history
- (Future) resource_states: Current infrastructure state
Migrations: Alembic for schema versioning
Example Schema:
CREATE TABLE system_blueprints (
    id INTEGER PRIMARY KEY,
    version VARCHAR(50) NOT NULL,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    resources JSON NOT NULL,
    metadata JSON,
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);
API Layer
Framework: FastAPI
Routes:
/ # Health check
/api/v1/blueprints/* # Blueprint CRUD
/api/v1/ipr/* # IPR workflow
/api/v1/conversation/* # AI chat
Features:
- OpenAPI documentation (/docs)
- Async request handling
- CORS support
- Dependency injection
- Automatic validation
LLM Integration
Model: Configurable (defaults to Qwen/Qwen2.5-0.5B-Instruct when using local Transformers; any OpenAI-compatible endpoint is also supported)
Architecture:
User Input
↓
Prompt Template
↓
LLM Processing (Qwen2.5)
↓
Response Parsing
↓
Structured Output
Prompt Engineering:
- System prompts define AI role
- Few-shot examples for consistency
- JSON/YAML extraction
- Temperature control for creativity
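The response-parsing and JSON extraction steps can be sketched as below. This is an illustration under assumptions: extract_json is a hypothetical helper, and it assumes the model either wraps its JSON in a fenced block or embeds a single JSON object in surrounding prose.

```python
import json
import re
from typing import Any

# Build the fence marker programmatically so this example stays readable.
FENCE_MARK = "`" * 3
FENCED = re.compile(FENCE_MARK + r"(?:json)?\s*(.*?)" + FENCE_MARK, re.DOTALL)


def extract_json(response: str) -> Any:
    """Pull the first JSON object out of a free-form LLM response."""
    match = FENCED.search(response)
    candidate = match.group(1) if match else response
    # Fall back to the outermost pair of braces in the candidate text.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in response")
    # json.JSONDecodeError subclasses ValueError, so callers catch one type.
    return json.loads(candidate[start : end + 1])


reply = (
    "Sure! Here is the blueprint:\n"
    + FENCE_MARK + "json\n"
    + '{"name": "web", "resources": []}\n'
    + FENCE_MARK + "\nLet me know if you need changes."
)
parsed = extract_json(reply)
print(parsed)  # {'name': 'web', 'resources': []}
```

The parsed object would then go through schema validation, which is where malformed but syntactically valid output gets rejected.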
Data Flow
Blueprint Creation Flow
1. User Request (CLI/API)
↓
2. Intent Classification (LLM)
↓
3. Blueprint Generation (LLM + Templates)
↓
4. Validation (Pydantic schemas)
↓
5. Save to Database (SQLAlchemy)
↓
6. Return Blueprint ID
Deployment Flow
1. Create IPR (optional)
↓
2. Review & Approval (human)
↓
3. Load Blueprint
↓
4. Select Engine (based on provider)
↓
5. Validate with Engine
↓
6. Resolve Dependencies
↓
7. Deploy Resources (topological order)
↓
8. Update State
↓
9. Return Deployment Result
Rollback Flow
1. Identify Deployment
↓
2. Load Target State
↓
3. Calculate Diff
↓
4. Generate Rollback Plan
↓
5. Execute Rollback (reverse order)
↓
6. Verify State
↓
7. Update Database
Design Principles
1. Separation of Concerns
Each layer has a single, well-defined responsibility:
- Intent: User interaction
- Reasoning: Intent parsing and workflow orchestration
- Modeling: Data representation
- Execution: Infrastructure operations
2. Plugin Architecture
Engines are pluggable modules that implement a common interface:
- Easy to add new providers
- Testable with FakeEngine
- Provider-agnostic core logic
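A plugin registry is one common way to get this kind of pluggability. The sketch below is an assumption-laden illustration: the register decorator, the ENGINES dict, and get_engine are hypothetical, the interface is cut down to two methods, and deploy returns a plain list of names rather than the DeploymentResult used by the real interface.

```python
import asyncio
from abc import ABC, abstractmethod


class Engine(ABC):
    """Reduced interface for this sketch; the full one has more methods."""

    @abstractmethod
    async def deploy(self, blueprint: dict) -> list[str]: ...

    @abstractmethod
    async def health_check(self) -> bool: ...


# Maps the blueprint's provider field to an engine class (hypothetical registry).
ENGINES: dict[str, type[Engine]] = {}


def register(provider: str):
    """Class decorator that adds an engine to the registry."""
    def decorator(cls: type[Engine]) -> type[Engine]:
        ENGINES[provider] = cls
        return cls
    return decorator


@register("fake")
class FakeEngine(Engine):
    """In-memory engine that records deployments without touching anything."""

    def __init__(self) -> None:
        self.deployed: list[str] = []

    async def deploy(self, blueprint: dict) -> list[str]:
        for resource in blueprint["resources"]:
            self.deployed.append(resource["name"])
        return list(self.deployed)

    async def health_check(self) -> bool:
        return True


def get_engine(provider: str) -> Engine:
    """Core logic only knows the provider name, not the engine class."""
    if provider not in ENGINES:
        raise ValueError(f"no engine registered for provider {provider!r}")
    return ENGINES[provider]()


engine = get_engine("fake")
result = asyncio.run(engine.deploy({"resources": [{"name": "web"}, {"name": "db"}]}))
print(result)  # ['web', 'db']
```

Adding a provider then means writing one class and one decorator line, with no change to the code that calls get_engine.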
3. Declarative Infrastructure
Blueprints describe what, not how:
- Idempotent operations
- Version-controlled
- Provider-independent (mostly)
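Idempotency here means that applying the same blueprint twice leaves the infrastructure unchanged the second time. A minimal sketch, assuming desired and actual state are both dicts keyed by resource name; the plan and apply helpers are hypothetical and ignore deletion of resources that are absent from the blueprint.

```python
def plan(desired: dict[str, dict], actual: dict[str, dict]) -> list[tuple[str, str]]:
    """Compare desired specs with actual state and list the needed actions."""
    actions = []
    for name, specs in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != specs:
            actions.append(("update", name))
        # Resources that already match need no action.
    return actions


def apply(desired: dict[str, dict], actual: dict[str, dict]) -> list[tuple[str, str]]:
    """Execute the plan against an in-memory state and return what was done."""
    actions = plan(desired, actual)
    for _, name in actions:
        actual[name] = dict(desired[name])
    return actions


desired_state = {"web": {"cpu": 2}, "db": {"cpu": 4}}
actual_state = {"web": {"cpu": 1}}

first_run = apply(desired_state, actual_state)
second_run = apply(desired_state, actual_state)
print(first_run)   # [('update', 'web'), ('create', 'db')]
print(second_run)  # []
```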
4. Human-in-the-Loop
Critical operations require human approval:
- IPR system for deployments
- Dry-run mode for testing
- Rollback capabilities
5. Async Everything
All I/O operations are async:
- Better resource utilization
- Handles concurrent requests
- Non-blocking LLM inference
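Non-blocking inference usually means moving the synchronous model call off the event loop. The sketch below uses asyncio.to_thread for this; blocking_inference is a stand-in for a real model call, and whether ALMA uses to_thread or its own thread pool is not stated in this document.

```python
import asyncio
import time


def blocking_inference(prompt: str) -> str:
    """Stand-in for a synchronous model call (hypothetical)."""
    time.sleep(0.2)  # Simulates CPU- or GPU-bound work.
    return f"echo: {prompt}"


async def generate(prompt: str) -> str:
    """Run the blocking call in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(blocking_inference, prompt)


async def main() -> list[str]:
    # The three calls overlap instead of running back to back.
    return list(await asyncio.gather(*(generate(p) for p in ["a", "b", "c"])))


results = asyncio.run(main())
print(results)  # ['echo: a', 'echo: b', 'echo: c']
```

While the worker threads are busy, the event loop can keep serving other requests, which is the point of the "non-blocking" claim.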
6. Type Safety
Strong typing throughout:
- Pydantic for runtime validation
- MyPy for static type checking
- Explicit contracts between layers
Technology Stack
Core
- Python 3.10+: Modern async/await
- FastAPI: High-performance API framework
- Pydantic: Data validation
- SQLAlchemy 2.0: Async ORM
- Alembic: Database migrations
AI/ML
- Transformers (optional): Hugging Face library for local LLM inference
- PyTorch (optional): Required when using local Transformers models
- LangGraph / LangChain: Workflow orchestration and LLM tooling
CLI
- Typer: CLI framework
- Rich: Terminal formatting
- PyYAML: YAML processing
Database
- PostgreSQL: Production database
- SQLite: Development database (default)
- AsyncPG: Async PostgreSQL driver
Testing
- Pytest: Test framework
- HTTPX: Async HTTP client
- Coverage: Code coverage
DevOps
- Docker: Containerization
- GitHub Actions: CI/CD
- Pre-commit: Code quality hooks
Performance Considerations
LLM Optimization
- Singleton pattern for model caching
- Warmup on startup
- Thread pool for inference
- Device selection (CPU/GPU/MPS)
Database Optimization
- Connection pooling
- Async queries
- Indexes on foreign keys
- JSON field indexing (PostgreSQL)
API Optimization
- Async request handling
- Response caching (future)
- Pagination for large results
- Streaming for long operations
Security
Authentication
- API key authentication (configurable via ALMA_API_KEY environment variable)
- OAuth2 integration (planned)
- Role-based access control (planned)
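An API key check against the ALMA_API_KEY environment variable might look like the sketch below. The is_authorized helper is hypothetical, and treating an unset variable as "authentication disabled" is an assumption drawn only from the word "configurable"; the real behavior may differ.

```python
import hmac
import os
from typing import Optional


def is_authorized(presented_key: Optional[str]) -> bool:
    """Compare the presented key with ALMA_API_KEY in constant time."""
    expected = os.environ.get("ALMA_API_KEY")
    if not expected:
        # Assumption: authentication is disabled when no key is configured.
        return True
    if presented_key is None:
        return False
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(presented_key.encode(), expected.encode())


os.environ["ALMA_API_KEY"] = "example-key"
print(is_authorized("example-key"))  # True
print(is_authorized("wrong-key"))    # False
```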
Data Protection
- Environment variables for secrets
- No secrets in blueprints
- Encrypted database connections
Audit Trail
- All changes logged
- IPR approval workflow
- Deployment history
Scalability
Horizontal Scaling
- Stateless API servers
- Shared database
- Load balancer ready
Vertical Scaling
- Async I/O handles many connections
- LLM model size configurable
- Database connection pooling
Current Status and Roadmap
Completed
- [x] Prometheus metrics (alma/middleware/metrics.py)
- [x] Web UI: React-based dashboard (alma-web/)
- [x] Event sourcing (alma/core/events.py)
- [x] CQRS pattern (alma/core/cqrs.py)
- [x] Docker engine (alma/engines/docker.py)
- [x] WebSocket support for real-time updates (/ws/deployments)
- [x] GraphQL API (/graphql): basic system status queries
- [x] API key authentication
Planned
- [ ] Multi-tenancy
- [ ] RBAC (Role-Based Access Control)
- [ ] Native Terraform Provider (Go wrapper)
- [ ] Kubernetes Operator (CRD Controller)