
Async & MLOps

Async LLM client, multi-tier caching, and structured logging

This document covers Trinity's infrastructure for LLM content generation:

  • Async/Await: HTTP client for concurrent LLM requests
  • Multi-Tier Caching: Memory, Redis (optional), and filesystem caching
  • Structured Logging: JSON format for production observability
  • Developer Experience: Makefile shortcuts and Docker support

Async LLM Client

Overview

Trinity's async LLM client issues concurrent, non-blocking requests to LLM providers. Key features:

  • Non-blocking I/O for LLM calls
  • Exponential backoff retry (max 3 attempts)
  • Circuit breaker (fail-fast after repeated errors)
  • Multi-tier cache support
  • Structured logging with correlation IDs

Usage

python
from src.llm_client import AsyncLLMClient
import asyncio

async def generate():
    async with AsyncLLMClient(
        provider="ollama",
        model="llama3.2:3b",
        api_url="http://localhost:11434"
    ) as client:
        
        # Single request
        response = await client.generate_content_async(
            prompt="Write a bio for a developer",
            correlation_id="build-123"
        )
        
        # Concurrent requests
        prompts = [
            "Write a bio",
            "Write a project description",
            "Write a skills section"
        ]
        
        responses = await asyncio.gather(
            *[client.generate_content_async(p) for p in prompts]
        )
        
        return responses

results = asyncio.run(generate())

Circuit Breaker

python
import logging

from src.llm_client import AsyncLLMClient
from trinity.utils.circuit_breaker import CircuitBreaker, CircuitBreakerOpen

logger = logging.getLogger(__name__)

breaker = CircuitBreaker(
    failure_threshold=5,   # Open after 5 failures
    timeout=60            # Stay open for 60 seconds
)

async with AsyncLLMClient() as client:
    try:
        response = await breaker.call_async(
            client.generate_content_async,
            prompt="Write a bio"
        )
    except CircuitBreakerOpen:
        logger.error("Circuit breaker open, service unavailable")
        # Use cached response or fallback

States:

  • CLOSED: Normal operation
  • OPEN: Failing fast (timeout period)
  • HALF_OPEN: Testing if service recovered
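
The three states form a small state machine. A minimal sketch, assuming only the `failure_threshold` and `timeout` parameters from the example above (the class and method names here are illustrative, not Trinity's API):

```python
import time

class SimpleBreaker:
    """Minimal circuit breaker: CLOSED -> OPEN -> HALF_OPEN -> CLOSED."""

    def __init__(self, failure_threshold=5, timeout=60):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.failures = 0
        self.opened_at = None  # monotonic timestamp when the breaker opened

    @property
    def state(self):
        if self.opened_at is None:
            return "CLOSED"
        if time.monotonic() - self.opened_at >= self.timeout:
            return "HALF_OPEN"  # timeout elapsed: allow one probe call
        return "OPEN"

    def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            raise RuntimeError("circuit open")  # fail fast, no real call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        # Any success (CLOSED or HALF_OPEN probe) resets the breaker
        self.failures = 0
        self.opened_at = None
        return result
```

A successful probe call in HALF_OPEN closes the breaker again; another failure re-opens it for a fresh timeout period.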

When to Use Async vs Sync

Use Async When:

  • Generating multiple pages in one build
  • Batch processing
  • Using cloud LLM providers (network latency)

Use Sync When:

  • Single-page generation
  • Simple scripts or prototypes
  • Local testing

Important: Async requires Python 3.10+ and httpx[http2].
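
The payoff from concurrency is easy to demonstrate with a stand-in for network latency; here `asyncio.sleep` plays the role of an LLM round-trip (this is a self-contained illustration, not Trinity code):

```python
import asyncio
import time

async def fake_llm_call(prompt: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for network + generation latency
    return f"response to: {prompt}"

async def main():
    prompts = ["bio", "project description", "skills section"]

    # Sequential: total time ~ sum of latencies (~0.3s for 3 calls)
    start = time.monotonic()
    seq = [await fake_llm_call(p) for p in prompts]
    sequential_s = time.monotonic() - start

    # Concurrent: total time ~ the slowest single call (~0.1s)
    start = time.monotonic()
    conc = await asyncio.gather(*[fake_llm_call(p) for p in prompts])
    concurrent_s = time.monotonic() - start

    assert seq == conc  # same results, different wall-clock time
    return sequential_s, concurrent_s

sequential_s, concurrent_s = asyncio.run(main())
print(f"sequential: {sequential_s:.2f}s, concurrent: {concurrent_s:.2f}s")
```

With a real provider the gap grows with each additional page generated in the build.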


Multi-Tier Caching

Overview

Trinity implements a 3-tier cache for LLM responses:

  1. Memory Cache (in-process LRU, <1ms, cleared on restart)
  2. Redis Cache (distributed, 5-10ms, optional - requires Redis server)
  3. Filesystem Cache (persistent, 20-50ms, .cache/ directory)

Architecture

         CacheManager (Unified API)
                    │
        ┌───────────┼───────────┐
        ▼           ▼           ▼
     Memory       Redis        File
      (LRU)      (opt.)      .cache/

Lookup order: memory → Redis → filesystem → LLM API call → store in all tiers.
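
The lookup order can be sketched with the two in-process tiers; a real `CacheManager` would slot Redis between them. This version is illustrative, with promotion to memory on a filesystem hit and write-through on `set`:

```python
import json
import tempfile
from pathlib import Path

class TieredCache:
    """Read-through, write-through cache: memory first, then filesystem."""

    def __init__(self, directory: str):
        self.memory: dict = {}
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        return self.directory / f"{key}.json"

    def get(self, key: str):
        if key in self.memory:            # tier 1: in-process, <1ms
            return self.memory[key]
        path = self._path(key)
        if path.exists():                 # tier 3: persistent, 20-50ms
            value = json.loads(path.read_text())
            self.memory[key] = value      # promote to memory on hit
            return value
        return None                       # full miss: caller invokes the LLM

    def set(self, key: str, value) -> None:
        self.memory[key] = value          # store in all tiers
        self._path(key).write_text(json.dumps(value))

cache = TieredCache(tempfile.mkdtemp())
cache.set("bio", {"text": "a developer bio"})
cache.memory.clear()                      # simulate a process restart
print(cache.get("bio"))                   # still served, from disk
```

Because the filesystem tier survives restarts, a rebuilt site reuses yesterday's LLM responses for unchanged prompts.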

Configuration

yaml
# config/settings.yaml
cache:
  enabled: true
  tiers:
    - memory       # Always recommended
    - redis        # Optional (requires Redis server)
    - filesystem   # Always recommended
  
  redis:
    host: localhost
    port: 6379
    db: 0
    password: null
  
  ttl: 3600  # 1 hour (seconds)
  
  filesystem:
    directory: .cache
    max_size_mb: 100

Usage

Automatic Caching (via AsyncLLMClient):

python
async with AsyncLLMClient() as client:
    # First call: cache MISS (calls LLM)
    response1 = await client.generate_content_async(prompt)
    
    # Second call with same prompt: cache HIT (<1ms)
    response2 = await client.generate_content_async(prompt)

Manual Cache Control:

python
from trinity.utils.cache_manager import CacheManager

cache = CacheManager()
cache.set("my-key", {"data": "value"}, ttl=3600)
value = cache.get("my-key")  # Returns dict or None
cache.delete("my-key")
cache.clear_all()

Cache Key Generation

Cache keys are derived from:

  • Prompt content (hashed)
  • Model name
  • Provider
  • Temperature/top_p settings

python
import hashlib

def generate_cache_key(prompt, model, provider, temperature=0.7, top_p=1.0):
    # Illustrative sketch: every input that affects the output must be hashed,
    # otherwise a changed temperature would return a stale cached response.
    content = f"{provider}:{model}:{temperature}:{top_p}:{prompt}"
    return hashlib.sha256(content.encode()).hexdigest()[:16]

Redis Setup (Optional)

bash
# macOS
brew install redis && brew services start redis

# Ubuntu/Debian
sudo apt-get install redis-server && sudo systemctl start redis

# Docker
docker run -d -p 6379:6379 redis:7-alpine

# Verify
redis-cli ping  # Should return "PONG"
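
Since the Redis tier is optional, a build should degrade gracefully when no server is running. One way to probe availability without pulling in a client library (a hypothetical helper, not part of Trinity):

```python
import socket

def redis_reachable(host: str = "localhost", port: int = 6379,
                    timeout: float = 0.5) -> bool:
    """Return True if something is listening on the Redis port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:                 # refused, unreachable, or timed out
        return False

tiers = ["memory", "filesystem"]
if redis_reachable():
    tiers.insert(1, "redis")        # enable the middle tier only when available
print("active cache tiers:", tiers)
```

This only confirms a listener exists; a full health check would issue `PING` and expect `PONG`, as `redis-cli` does above.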

Structured Logging

Overview

Trinity uses structured logging, with the output format selected per environment:

  • Development: Human-readable colored output (default)
  • Production (TRINITY_ENV=Production): JSON format for log aggregation

Configuration

yaml
# config/logging.yaml
default_profile: development

profiles:
  development:
    level: DEBUG
    format: human
  
  production:
    level: INFO
    format: json
  
  testing:
    level: WARNING
    format: json

Usage

python
from trinity.utils.structured_logger import get_logger

logger = get_logger(__name__)

# Simple message
logger.info("server_started")

# With structured context
logger.info("request_processed", extra={
    "method": "POST",
    "path": "/generate",
    "duration_ms": 234,
    "status_code": 200
})
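
Under the hood, a JSON profile amounts to a `logging.Formatter` that emits one JSON object per record. A minimal sketch, assuming field names like those in Trinity's JSON output (this is not Trinity's actual formatter):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    # Attribute names present on every LogRecord; anything else came via extra=
    RESERVED = set(logging.LogRecord("", 0, "", 0, "", (), None).__dict__)

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Merge structured context passed via `extra={...}`
        for key, value in record.__dict__.items():
            if key not in self.RESERVED and not key.startswith("_"):
                payload[key] = value
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("trinity.demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("request_processed", extra={"duration_ms": 234, "status_code": 200})
```

The `extra` keys land on the record as plain attributes, which is why the formatter can lift them straight into the JSON payload.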

JSON Output (Production)

json
{
  "timestamp": "2025-01-27T12:34:56.789Z",
  "level": "INFO",
  "logger": "trinity.main",
  "message": "build_started",
  "correlation_id": "550e8400-e29b-41d4-a716-446655440000"
}
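
A correlation ID can be threaded through concurrent work without passing it to every function, e.g. with `contextvars` (an illustrative pattern; the source does not specify how Trinity wires this internally):

```python
import asyncio
import contextvars
import uuid

correlation_id = contextvars.ContextVar("correlation_id", default="-")

def log(message: str) -> str:
    # Every log line picks up the ID of the current task automatically
    return f"[{correlation_id.get()}] {message}"

async def build_page(name: str) -> str:
    correlation_id.set(str(uuid.uuid4())[:8])  # one ID per build task
    await asyncio.sleep(0.01)
    return log(f"built {name}")

async def main():
    # Each gathered task gets its own context copy, so IDs never leak across tasks
    return await asyncio.gather(build_page("bio"), build_page("skills"))

for line in asyncio.run(main()):
    print(line)
```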

Log Profiles

bash
# Development (human-readable)
LOG_PROFILE=development python main.py

# Production (JSON)
LOG_PROFILE=production python main.py > logs/trinity.log

# Testing (minimal output)
LOG_PROFILE=testing pytest

Makefile Commands

bash
make logs              # View all logs
make logs-json         # View JSON logs
make logs-errors       # View only errors
make logs-analyze      # Analyze with jq
make logs-clear        # Clear all logs

Docker

Quick Start

bash
make docker-build                 # Build the image
make docker-run                   # Start the trinity and redis containers
docker-compose logs -f trinity    # Follow container logs
make docker-stop                  # Stop the containers

docker-compose.yml

yaml
version: '3.8'

services:
  trinity:
    build:
      context: .
      dockerfile: Dockerfile.dev
    container_name: trinity-core
    environment:
      - LOG_PROFILE=production
      - CACHE_REDIS_HOST=redis
    volumes:
      - ./logs:/app/logs
      - ./output:/app/output
      - ./.cache:/app/.cache
    depends_on:
      - redis
    networks:
      - trinity-network

  redis:
    image: redis:7-alpine
    container_name: trinity-redis
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data
    networks:
      - trinity-network

volumes:
  redis-data:

networks:
  trinity-network:
    driver: bridge

Environment Variables

bash
LOG_LEVEL=INFO
LOG_FORMAT=json
LOG_PROFILE=production
TRINITY_ENV=Production  # Enable JSON telemetry to stdout

CACHE_REDIS_HOST=redis
CACHE_REDIS_PORT=6379

LLM_PROVIDER=ollama
LLM_API_URL=http://host.docker.internal:11434
LLM_MODEL=llama3.2:3b
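
These variables can be resolved with plain `os.getenv`, falling back to sensible defaults; the variable names follow the list above, while the helper itself is a hypothetical sketch:

```python
import os

def load_llm_settings(env=None) -> dict:
    """Resolve LLM settings from environment variables with defaults."""
    env = os.environ if env is None else env
    return {
        "provider": env.get("LLM_PROVIDER", "ollama"),
        "api_url": env.get("LLM_API_URL", "http://localhost:11434"),
        "model": env.get("LLM_MODEL", "llama3.2:3b"),
        "log_profile": env.get("LOG_PROFILE", "development"),
    }

# Inside Docker, compose's `environment:` block overrides the defaults:
settings = load_llm_settings({"LLM_API_URL": "http://host.docker.internal:11434"})
print(settings["api_url"])
```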

Makefile Reference

bash
# Setup
make install          # Install dependencies
make setup           # Full setup (venv + deps)

# Testing
make test            # Run all tests
make test-coverage   # With coverage report
make test-async      # Only async tests

# Code Quality
make format          # Format with black
make lint            # Lint with ruff
make type-check      # Type check with mypy

# Build
make build           # Build with default theme

# Cache
make cache-clear     # Clear all caches

# Logs
make logs            # View all logs
make logs-errors     # View only errors

# Docker
make docker-build    # Build image
make docker-run      # Run container

See Development → Setup for full Makefile reference.



Released under the MIT License.