
Project Status & Documentation Index

ALMA: Infrastructure as Conversation
Current Version: 0.8.8


Feature Status

| Feature | Status | Notes |
|---|---|---|
| Enhanced Function Calling | Complete | alma/core/tools.py (13 tools) |
| Streaming Responses | Complete | SSE endpoints in alma/api/routes/conversation.py |
| Blueprint Templates | Complete | alma/config/blueprints.yaml (10 templates) |
| Rate Limiting | Complete | alma/middleware/rate_limit.py |
| Metrics Collection | Complete | alma/middleware/metrics.py (Prometheus) |
| API Key Authentication | Complete | Configurable via ALMA_API_KEY env var |
| Web UI | Complete | React-based dashboard in alma-web/ |
| LangGraph Workflow | Complete | alma/core/agent/graph.py |
| Multi-Agent Council | Complete | alma/core/agent/council.py |
| WebSocket Updates | Complete | /ws/deployments |
| GraphQL API | Partial | Basic system status only (/graphql) |
| Kubernetes Engine | Experimental | alma/engines/kubernetes.py |
| RBAC | Planned | Not yet implemented |
| Multi-tenancy | Planned | Not yet implemented |

Documentation

User Documentation

| Document | Purpose |
|---|---|
| README.md | Project overview, quick start |
| USER_GUIDE.md | User guide with examples |
| API_REFERENCE.md | API documentation |

Technical Documentation

| Document | Purpose |
|---|---|
| PRODUCTION_DEPLOYMENT.md | Production setup & operations |
| RATE_LIMITING_AND_METRICS.md | Monitoring reference |
| ARCHITECTURE.md | Technical architecture |
| TOOLS_API.md | LLM tools documentation |
| STREAMING_AND_TEMPLATES.md | Streaming & templates |

Feature Highlights

1. Enhanced Function Calling (13 Tools)

The LLM can call these infrastructure tools:

  • create_blueprint - Generate infrastructure blueprints
  • validate_blueprint - Syntax and semantic validation
  • estimate_resources - Resource requirement calculation
  • optimize_costs - Cost reduction recommendations
  • security_audit - Security compliance checks
  • generate_deployment_plan - Step-by-step deployment guides
  • troubleshoot_issue - Problem diagnosis
  • compare_blueprints - Version comparison
  • suggest_architecture - Best practices recommendations
  • calculate_capacity - Capacity planning
  • migrate_infrastructure - Migration strategies
  • check_compliance - Compliance verification
  • forecast_metrics - Predictive analytics
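
One way to picture how a tool call emitted by the LLM might be routed to one of these tools is a name-keyed registry. The sketch below is illustrative only: the `TOOLS` registry, the `tool` decorator, `dispatch`, the assumed `{"name": ..., "arguments": {...}}` call shape, and the toy `estimate_resources` arguments are all assumptions, not the contents of alma/core/tools.py.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry mapping tool names to handlers (not ALMA's actual code).
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {}


def tool(name: str) -> Callable:
    """Register a handler under the tool name the LLM is expected to emit."""
    def decorator(func: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
        TOOLS[name] = func
        return func
    return decorator


@tool("estimate_resources")
def estimate_resources(instances: int, cpu_per_instance: int) -> Dict[str, Any]:
    # Toy calculation standing in for real resource estimation.
    return {"total_cpu": instances * cpu_per_instance}


def dispatch(tool_call_json: str) -> Dict[str, Any]:
    """Parse a tool call of the assumed shape and run the matching handler."""
    call = json.loads(tool_call_json)
    handler = TOOLS.get(call["name"])
    if handler is None:
        # Unknown names are reported back instead of raising.
        return {"error": f"unknown tool: {call['name']}"}
    return handler(**call.get("arguments", {}))
```

Returning an error dictionary for unknown tool names, rather than raising, lets the caller feed the failure back to the model as a normal tool result.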

2. Streaming Responses (SSE)

Real-time streaming for LLM operations:

  • Endpoints: /api/v1/conversation/chat-stream, /api/v1/blueprints/generate-blueprint-stream
  • Implementation: Server-Sent Events (SSE)
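
On the client side, an SSE stream is a sequence of text lines in which a blank line ends each event. The following is a minimal parser sketch for that wire format, using only the standard library. The function name `parse_sse` is hypothetical, and the event names and payloads ALMA actually emits are not documented here, so the sample values in the usage note are made up.

```python
from typing import Dict, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per SSE event from an iterable of text lines."""
    event: Dict[str, str] = {}
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event; events without
            # any data lines are dropped.
            if data_parts:
                event["data"] = "\n".join(data_parts)
                yield event
            event, data_parts = {}, []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is stripped
        if field == "data":
            data_parts.append(value)  # multiple data lines are joined with "\n"
        else:
            event[field] = value
```

For example, feeding it the lines `"event: token"`, `"data: Hello"`, and an empty line yields `{"event": "token", "data": "Hello"}`.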

3. Blueprint Templates (10 Templates)

Pre-built infrastructure topologies (see alma/config/blueprints.yaml):

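| Template | Complexity | Resources |
|---|---|---|
| simple-web-app | Low | 3 |
| ha-web-app | Medium | 8 |
| microservices-k8s | High | 15+ |
| postgres-ha | Medium | 6 |
| data-pipeline | High | 12 |
| ml-training | High | 10 |
| zero-trust-network | Medium | 9 |
| observability-stack | Medium | 7 |
| api-gateway | Low | 4 |
| redis-cluster | Medium | 5 |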

4. Rate Limiting

Token bucket algorithm with per-endpoint limits:

  • Global: 60 RPM per IP
  • Burst: 10 requests (customizable)
  • Per-Endpoint Limits:
    • Chat streaming: 20 RPM (LLM intensive)
    • Blueprint generation: 30 RPM
    • Tool execution: 40 RPM
    • CRUD operations: 100 RPM
  • Headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset
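
To make the token bucket idea concrete, here is a minimal single-bucket sketch using the global figures above (60 RPM, burst of 10). It is not the code in alma/middleware/rate_limit.py: the class name, the injectable `clock` parameter, and the choice to start with a full bucket are assumptions, and how ALMA combines the global and per-endpoint limits is not shown.

```python
import time
from typing import Callable


class TokenBucket:
    """Token bucket: refills at `rate_per_minute`, holds at most `burst` tokens."""

    def __init__(self, rate_per_minute: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate_per_second = rate_per_minute / 60.0
        self.capacity = float(burst)
        self.tokens = float(burst)  # assumption: start full so an initial burst is allowed
        self.clock = clock
        self.last_refill = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when the request should be limited."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Add the tokens earned since the last call, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_second)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With `TokenBucket(60, 10)`, ten back-to-back requests pass, the eleventh is refused, and one further request is admitted for each second that elapses. A per-IP limiter would keep one such bucket per client address.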

5. Metrics Collection

Prometheus metrics endpoint (/metrics) with:

  • HTTP requests (total, duration, sizes)
  • LLM operations (requests, tokens, duration)
  • Blueprint operations (CRUD, validation)
  • Deployments (operations, duration)
  • Tool executions (by tool, status)
  • Rate limiting (hits, clients)
  • System metrics (connections, cache)

A pre-configured Grafana dashboard is included in grafana-dashboard.json.
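
For quick checks without Grafana, the text returned by /metrics can be summarized with a few lines of Python. This sketch sums sample values per metric name from Prometheus text exposition output. The helper name `sum_samples` is hypothetical, and the labels in the usage note are invented for illustration; only the metric name `http_requests_total` appears elsewhere in this document.

```python
from collections import defaultdict
from typing import Dict


def sum_samples(exposition_text: str) -> Dict[str, float]:
    """Sum sample values per metric name, ignoring labels and comment lines."""
    totals: Dict[str, float] = defaultdict(float)
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and "# HELP" / "# TYPE" lines
        if "}" in line:
            # Form: name{labels} value [timestamp]
            name = line.split("{", 1)[0]
            fields = line.rsplit("}", 1)[1].split()
        else:
            # Form: name value [timestamp]
            name, *fields = line.split()
        if not fields:
            continue  # malformed line without a value
        totals[name] += float(fields[0])
    return dict(totals)
```

Given two `http_requests_total` samples with values 12 and 3 under different labels, the function reports a combined `http_requests_total` of 15.0.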


Getting Started

Installation

bash
git clone https://github.com/fabriziosalmi/alma.git
cd alma
python3 -m venv venv
source venv/bin/activate
pip install -e .
python run_server.py

First API Call

bash
curl -X POST http://localhost:8000/api/v1/conversation/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Create a simple web application"}'
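
For scripts, the same call can be made from Python's standard library. The sketch below mirrors the curl command above; the helper names `build_chat_request` and `send` are hypothetical, and because the response schema is not described here, the body is returned as raw text rather than parsed.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the curl example


def build_chat_request(message: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request for the chat endpoint."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/conversation/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request to a running server and return the raw response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, `print(send(build_chat_request("Create a simple web application")))` issues the same request as the curl example.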

Explore Templates

bash
# List templates
curl http://localhost:8000/api/v1/templates

# Get specific template
curl http://localhost:8000/api/v1/templates/ha-web-app

Check Metrics

bash
# Prometheus format
curl http://localhost:8000/metrics

# Human-readable
curl http://localhost:8000/api/v1/monitoring/metrics/summary

API Endpoints Summary

Core APIs

| Endpoint | Method | Purpose |
|---|---|---|
| /api/v1/blueprints | GET, POST | Blueprint CRUD |
| /api/v1/blueprints/{id} | GET, PUT, DELETE | Single blueprint |
| /api/v1/blueprints/generate-blueprint | POST | AI generation |
| /api/v1/blueprints/generate-blueprint-stream | POST | AI streaming |
| /api/v1/conversation/chat | POST | Chat interface |
| /api/v1/conversation/chat-stream | POST | Chat streaming |
| /api/v1/iprs | GET, POST | IPR management |
| /api/v1/iprs/{id}/review | POST | Review IPR |
| /api/v1/iprs/{id}/deploy | POST | Deploy IPR |
| /api/v1/tools | GET | List tools |
| /api/v1/tools/execute | POST | Execute tool |
| /api/v1/templates | GET | List templates |
| /api/v1/templates/{id} | GET | Get template |
| /api/v1/templates/{id}/customize | POST | Customize template |

Monitoring APIs

| Endpoint | Method | Purpose |
|---|---|---|
| /metrics | GET | Prometheus metrics |
| /api/v1/monitoring/metrics/summary | GET | Metrics summary |
| /api/v1/monitoring/rate-limit/stats | GET | Rate limit stats |
| /api/v1/monitoring/health/detailed | GET | Health check |

Production Deployment

bash
# Start full stack with monitoring
docker-compose -f docker-compose.metrics.yml up -d

Includes:

  • ALMA API server
  • PostgreSQL 15
  • Redis 7
  • Prometheus
  • Grafana (with pre-configured dashboard)

Access:

See PRODUCTION_DEPLOYMENT.md for full setup details.


Testing

bash
# Unit tests
pytest tests/unit/ -v

# Test rate limiting (prints only the HTTP status code of each request)
for i in {1..70}; do curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/api/v1/blueprints; done

# Test metrics
curl http://localhost:8000/metrics | grep http_requests_total

# Test streaming (-N disables curl's output buffering)
curl -N -X POST http://localhost:8000/api/v1/conversation/chat-stream \
  -H "Content-Type: application/json" \
  -d '{"message": "Create Kubernetes cluster"}'

Security

Implemented

  • IP-based rate limiting
  • SQL injection protection (SQLAlchemy ORM)
  • Input validation (Pydantic)
  • CORS configuration
  • API key authentication (configurable via ALMA_API_KEY)
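
As an illustration of the API key check, the sketch below compares a supplied key against the ALMA_API_KEY environment variable in constant time. It is not ALMA's middleware: the function name, the `X-API-Key` header name, and the behavior of allowing all requests when the variable is unset are assumptions made for the example.

```python
import hmac
import os
from typing import Mapping, Optional


def is_authorized(headers: Mapping[str, str], header_name: str = "X-API-Key") -> bool:
    """Return True when no key is configured or the supplied key matches ALMA_API_KEY."""
    expected: Optional[str] = os.environ.get("ALMA_API_KEY")
    if not expected:
        return True  # assumption: authentication is disabled when the variable is unset
    supplied = headers.get(header_name, "")
    # compare_digest avoids leaking, through timing, how much of the key matched.
    return hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))
```

Using `hmac.compare_digest` instead of `==` is the usual precaution when comparing secrets, since an ordinary string comparison can return early on the first mismatching character.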

Planned

  • JWT token support
  • RBAC (Role-Based Access Control)
  • OAuth2 integration
  • Audit logging
  • Encryption at rest

Roadmap

Near Term

  • RBAC: Fine-grained role-based access control.
  • Native K8s Operator: CRD-based management instead of API-driven.

Future

  • Cloud Portability: Abstracted blueprint definitions for AWS, Azure, GCP.
  • Hybrid Deployment: Unified management plane for On-Prem and Cloud.
  • Anomaly Detection: ML-based resource usage monitoring.

Support & Community


Implementation Status

Completed

  • [x] Core Engines: Proxmox (primary), Docker, Kubernetes (experimental), Simulation.
  • [x] IPR Workflow: Create, Review, Deploy lifecycle.
  • [x] LLM Integration: Configurable LLM backend with 13 specialized tools.
  • [x] Multi-Agent Council: Architect, SecOps, FinOps agents for blueprint review.
  • [x] LangGraph Workflow: State machine for deployment orchestration.
  • [x] Observability: Prometheus metrics.
  • [x] Rate Limiting: Token-bucket per-IP limiting.
  • [x] Streaming: SSE endpoints for real-time LLM output.
  • [x] API Key Authentication: Configurable auth middleware.
  • [x] Web UI: React-based dashboard (alma-web/).
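
The Create, Review, Deploy lifecycle of an IPR can be thought of as a small state machine. The sketch below is one possible reading of that lifecycle, not ALMA's implementation: the state names (`CREATED`, `APPROVED`, `REJECTED`, `DEPLOYED`), the `ALLOWED` table, and the `transition` helper are all hypothetical.

```python
from enum import Enum
from typing import Dict, Set


class IPRState(Enum):
    # Hypothetical states; the real IPR model may use different names.
    CREATED = "created"
    APPROVED = "approved"
    REJECTED = "rejected"
    DEPLOYED = "deployed"


# Allowed transitions: review moves CREATED to APPROVED or REJECTED,
# and only an APPROVED request may be deployed.
ALLOWED: Dict[IPRState, Set[IPRState]] = {
    IPRState.CREATED: {IPRState.APPROVED, IPRState.REJECTED},
    IPRState.APPROVED: {IPRState.DEPLOYED},
    IPRState.REJECTED: set(),
    IPRState.DEPLOYED: set(),
}


def transition(current: IPRState, target: IPRState) -> IPRState:
    """Return the new state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Keeping the transitions in a table makes the rule "no deployment without review" explicit: `transition(IPRState.CREATED, IPRState.DEPLOYED)` raises `ValueError`.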

Active Development

  • [ ] Native K8s Operator: CRD-based management.
  • [ ] RBAC: Fine-grained role-based access control.
  • [ ] Multi-tenancy: Isolated namespaces per team.

Released under the Apache 2.0 License.