# Async & MLOps

Async LLM client, multi-tier caching, and structured logging.

This document covers Trinity's infrastructure for LLM content generation:

- **Async/Await**: HTTP client for concurrent LLM requests
- **Multi-Tier Caching**: Memory, Redis (optional), and filesystem caching
- **Structured Logging**: JSON format for production observability
- **Developer Experience**: Makefile shortcuts and Docker support
## Async LLM Client

### Overview
Trinity's async LLM client supports concurrent requests:

- Non-blocking I/O for LLM calls
- Exponential backoff retry (max 3 attempts)
- Circuit breaker (fail-fast after repeated errors)
- Multi-tier cache support
- Structured logging with correlation IDs
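The retry behaviour listed above is a plain exponential backoff. A minimal sketch of the idea (the helper name and delays here are illustrative, not Trinity's actual implementation):

```python
import asyncio

# Illustrative exponential-backoff retry (not Trinity's actual code).
async def with_retries(coro_factory, max_attempts=3, base_delay=0.5):
    """Run coro_factory(), retrying on failure with delays 0.5s, 1s, 2s, ..."""
    for attempt in range(max_attempts):
        try:
            return await coro_factory()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: propagate the last error
            await asyncio.sleep(base_delay * 2 ** attempt)
```

Used with the client below, this would look like `await with_retries(lambda: client.generate_content_async(prompt))`.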
### Usage

```python
from src.llm_client import AsyncLLMClient
import asyncio

async def generate():
    async with AsyncLLMClient(
        provider="ollama",
        model="llama3.2:3b",
        api_url="http://localhost:11434"
    ) as client:
        # Single request
        response = await client.generate_content_async(
            prompt="Write a bio for a developer",
            correlation_id="build-123"
        )

        # Concurrent requests
        prompts = [
            "Write a bio",
            "Write a project description",
            "Write a skills section"
        ]
        responses = await asyncio.gather(
            *[client.generate_content_async(p) for p in prompts]
        )
        return responses

results = asyncio.run(generate())
```

### Circuit Breaker
```python
from trinity.utils.circuit_breaker import CircuitBreaker, CircuitBreakerOpen
from trinity.utils.structured_logger import get_logger

logger = get_logger(__name__)

breaker = CircuitBreaker(
    failure_threshold=5,  # Open after 5 failures
    timeout=60            # Stay open for 60 seconds
)

async with AsyncLLMClient() as client:
    try:
        response = await breaker.call_async(
            client.generate_content_async,
            prompt="Write a bio"
        )
    except CircuitBreakerOpen:
        logger.error("Circuit breaker open, service unavailable")
        # Use cached response or fallback
```

**States:**
- **CLOSED**: Normal operation
- **OPEN**: Failing fast (timeout period)
- **HALF_OPEN**: Testing if service recovered
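The three states map to a small state machine: failures accumulate while CLOSED, the breaker opens at the threshold, and after the timeout one trial call is allowed. A minimal synchronous sketch (field names are assumed; this is not Trinity's actual `CircuitBreaker`):

```python
import time

# Minimal circuit-breaker sketch (illustrative, not Trinity's implementation).
class SimpleBreaker:
    def __init__(self, failure_threshold=5, timeout=60):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.failures = 0
        self.opened_at = None  # None means not currently open

    @property
    def state(self):
        if self.opened_at is None:
            return "CLOSED"
        if time.monotonic() - self.opened_at >= self.timeout:
            return "HALF_OPEN"  # timeout elapsed: allow one trial call
        return "OPEN"

    def call(self, fn, *args):
        if self.state == "OPEN":
            raise RuntimeError("circuit open")  # fail fast
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # -> OPEN
            raise
        # Success: reset to CLOSED.
        self.failures = 0
        self.opened_at = None
        return result
```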
### When to Use Async vs Sync

**Use Async When:**

- Generating multiple pages in one build
- Batch processing
- Using cloud LLM providers (network latency)

**Use Sync When:**

- Single-page generation
- Simple scripts or prototypes
- Local testing

**Important:** Async requires Python 3.10+ and `httpx[http2]`.
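The payoff is easy to demonstrate with simulated latency: three "requests" of 0.1 s each finish in roughly 0.1 s when gathered concurrently, versus ~0.3 s run one after another (`asyncio.sleep` stands in for the network call; no real LLM is involved):

```python
import asyncio
import time

async def fake_llm_call(prompt, latency=0.1):
    # Stand-in for a network-bound LLM request.
    await asyncio.sleep(latency)
    return f"response to: {prompt}"

async def main():
    prompts = ["bio", "project", "skills"]
    start = time.monotonic()
    # All three calls overlap, so total time is ~0.1 s, not ~0.3 s.
    responses = await asyncio.gather(*[fake_llm_call(p) for p in prompts])
    return responses, time.monotonic() - start

responses, elapsed = asyncio.run(main())
```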
## Multi-Tier Caching

### Overview

Trinity implements a 3-tier cache for LLM responses:

- **Memory Cache** (in-process LRU, <1ms, cleared on restart)
- **Redis Cache** (distributed, 5-10ms, optional; requires a Redis server)
- **Filesystem Cache** (persistent, 20-50ms, `.cache/` directory)
### Architecture

```
        CacheManager (Unified API)
                 │
        ┌────────┼────────┐
        ▼        ▼        ▼
     Memory    Redis     File
      LRU     (opt.)   .cache/
```

Lookup order: memory → Redis → filesystem → LLM API call → store in all tiers.
### Configuration

```yaml
# config/settings.yaml
cache:
  enabled: true
  tiers:
    - memory      # Always recommended
    - redis       # Optional (requires Redis server)
    - filesystem  # Always recommended
  redis:
    host: localhost
    port: 6379
    db: 0
    password: null
    ttl: 3600  # 1 hour (seconds)
  filesystem:
    directory: .cache
    max_size_mb: 100
```

### Usage
**Automatic Caching** (via `AsyncLLMClient`):

```python
async with AsyncLLMClient() as client:
    # First call: cache MISS (calls the LLM)
    response1 = await client.generate_content_async(prompt)

    # Second call with the same prompt: cache HIT (<1ms)
    response2 = await client.generate_content_async(prompt)
```

**Manual Cache Control:**
```python
from trinity.utils.cache_manager import CacheManager

cache = CacheManager()
cache.set("my-key", {"data": "value"}, ttl=3600)
value = cache.get("my-key")  # Returns dict or None
cache.delete("my-key")
cache.clear_all()
```

### Cache Key Generation
Cache keys are derived from:
- Prompt content (hashed)
- Model name
- Provider
- Temperature/top_p settings
```python
import hashlib

def generate_cache_key(prompt, model, provider, temperature, top_p):
    # Include sampling settings so different settings don't share entries
    content = f"{provider}:{model}:{temperature}:{top_p}:{prompt}"
    return hashlib.sha256(content.encode()).hexdigest()[:16]
```

### Redis Setup (Optional)
```bash
# macOS
brew install redis && brew services start redis

# Ubuntu/Debian
sudo apt-get install redis-server && sudo systemctl start redis

# Docker
docker run -d -p 6379:6379 redis:7-alpine

# Verify
redis-cli ping  # Should return "PONG"
```

## Structured Logging
### Overview

Trinity uses structured logging:

- **Development**: Human-readable colored output (default)
- **Production** (`TRINITY_ENV=Production`): JSON format for log aggregation
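The production format can be produced with a small `logging.Formatter` subclass. An illustrative sketch (the field names follow the example JSON output later on this page, but this is not Trinity's actual formatter):

```python
import json
import logging
from datetime import datetime, timezone

# Illustrative JSON formatter (not Trinity's actual structured_logger).
class JsonFormatter(logging.Formatter):
    def format(self, record):
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Carry through extra fields such as correlation_id, if present.
        if hasattr(record, "correlation_id"):
            entry["correlation_id"] = record.correlation_id
        return json.dumps(entry)
```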
### Configuration

```yaml
# config/logging.yaml
default_profile: development

profiles:
  development:
    level: DEBUG
    format: human
  production:
    level: INFO
    format: json
  testing:
    level: WARNING
    format: json
```

### Usage
```python
from trinity.utils.structured_logger import get_logger

logger = get_logger(__name__)

# Simple message
logger.info("server_started")

# With structured context
logger.info("request_processed", extra={
    "method": "POST",
    "path": "/generate",
    "duration_ms": 234,
    "status_code": 200
})
```

### JSON Output (Production)
```json
{
  "timestamp": "2025-01-27T12:34:56.789Z",
  "level": "INFO",
  "logger": "trinity.main",
  "message": "build_started",
  "correlation_id": "550e8400-e29b-41d4-a716-446655440000"
}
```

### Log Profiles
```bash
# Development (human-readable)
LOG_PROFILE=development python main.py

# Production (JSON)
LOG_PROFILE=production python main.py > logs/trinity.log

# Testing (minimal output)
LOG_PROFILE=testing pytest
```

### Makefile Commands
```bash
make logs          # View all logs
make logs-json     # View JSON logs
make logs-errors   # View only errors
make logs-analyze  # Analyze with jq
make logs-clear    # Clear all logs
```

## Docker
### Quick Start

```bash
make docker-build
make docker-run
docker-compose logs -f trinity
make docker-stop
```

### docker-compose.yml
```yaml
version: '3.8'

services:
  trinity:
    build:
      context: .
      dockerfile: Dockerfile.dev
    container_name: trinity-core
    environment:
      - LOG_PROFILE=production
      - CACHE_REDIS_HOST=redis
    volumes:
      - ./logs:/app/logs
      - ./output:/app/output
      - ./.cache:/app/.cache
    depends_on:
      - redis
    networks:
      - trinity-network

  redis:
    image: redis:7-alpine
    container_name: trinity-redis
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data
    networks:
      - trinity-network

volumes:
  redis-data:

networks:
  trinity-network:
    driver: bridge
```

### Environment Variables
```bash
LOG_LEVEL=INFO
LOG_FORMAT=json
LOG_PROFILE=production
TRINITY_ENV=Production  # Enable JSON telemetry to stdout
CACHE_REDIS_HOST=redis
CACHE_REDIS_PORT=6379
LLM_PROVIDER=ollama
LLM_API_URL=http://host.docker.internal:11434
LLM_MODEL=llama3.2:3b
```

## Makefile Reference
```bash
# Setup
make install        # Install dependencies
make setup          # Full setup (venv + deps)

# Testing
make test           # Run all tests
make test-coverage  # With coverage report
make test-async     # Only async tests

# Code Quality
make format         # Format with black
make lint           # Lint with ruff
make type-check     # Type check with mypy

# Build
make build          # Build with default theme

# Cache
make cache-clear    # Clear all caches

# Logs
make logs           # View all logs
make logs-errors    # View only errors

# Docker
make docker-build   # Build image
make docker-run     # Run container
```

See Development → Setup for the full Makefile reference.
## Next Steps

- Retry Logic with Heuristics - Full pipeline
- Setup Guide - Installation and configuration
- Self-Healing Features - Predictor and Healer
- LLM and Caching - Advanced caching strategies