# LLM Response Caching

Multi-tier caching to avoid redundant LLM API calls.
Trinity implements a 3-tier cache for LLM responses to avoid repeating identical API calls across builds.
## Overview

LLM API calls are slow (typically 2-5 seconds) and can be costly when using cloud providers. When the same prompt is used in multiple builds, caching avoids redundant calls.
Cache tiers:

- Memory cache: <1ms access (in-process LRU, cleared on restart)
- Redis cache: 5-10ms access (distributed; optional, requires a Redis server)
- Filesystem cache: 20-50ms access (persistent, stored in the `.cache/` directory)
## Architecture

```text
       CacheManager (Unified API)
                 │
       ┌─────────┼─────────┐
       ▼         ▼         ▼
    Memory     Redis      File
     Tier      (opt.)    .cache/
     LRU                 persist
```

Lookup order:

1. Check the memory cache (fastest).
2. On miss, check the Redis cache (if configured).
3. On miss, check the filesystem cache.
4. On miss, call the LLM API.
5. Store the result in all enabled cache tiers.
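The lookup order above can be sketched generically. Assuming each tier exposes `get`/`set`, a minimal read-through lookup with backfill might look like this (`tiered_get` and `DictTier` are hypothetical names for illustration, not part of Trinity's API):

```python
from typing import Any, Optional

def tiered_get(tiers: list, key: str) -> Optional[Any]:
    """Check tiers fastest-first; on a hit, backfill every faster tier."""
    for i, tier in enumerate(tiers):
        value = tier.get(key)
        if value is not None:
            for faster in tiers[:i]:
                faster.set(key, value)  # promote to faster tiers
            return value
    return None  # full miss: caller falls through to the LLM API

# Dict-backed stand-ins for real cache tiers
class DictTier:
    def __init__(self):
        self.data = {}
    def get(self, key):
        return self.data.get(key)
    def set(self, key, value):
        self.data[key] = value

memory, disk = DictTier(), DictTier()
disk.set("k", "v")                      # only the slower tier has the entry
print(tiered_get([memory, disk], "k"))  # v
print(memory.get("k"))                  # v (backfilled on the hit)
```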
## Cache Tiers

### Tier 1: Memory Cache
```python
from typing import Any, Optional

from cachetools import LRUCache

class MemoryCache:
    def __init__(self, max_size: int = 100):
        self.cache = LRUCache(maxsize=max_size)

    def get(self, key: str) -> Optional[Any]:
        return self.cache.get(key)

    def set(self, key: str, value: Any) -> None:
        self.cache[key] = value
```

- Speed: <1ms
- Capacity: 100 entries (configurable)
- Persistence: cleared on restart
- Scope: single process
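If `cachetools` is unavailable, the same LRU behavior can be sketched with only the standard library. This `OrderedDict`-based class is a hypothetical stand-in, not Trinity's implementation:

```python
from collections import OrderedDict
from typing import Any, Optional

class SimpleLRUCache:
    def __init__(self, max_size: int = 100):
        self.max_size = max_size
        self.cache: OrderedDict = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        if key not in self.cache:
            return None
        self.cache.move_to_end(key)  # mark as most recently used
        return self.cache[key]

    def set(self, key: str, value: Any) -> None:
        self.cache[key] = value
        self.cache.move_to_end(key)
        if len(self.cache) > self.max_size:
            self.cache.popitem(last=False)  # evict least recently used

lru = SimpleLRUCache(max_size=2)
lru.set("a", 1)
lru.set("b", 2)
lru.get("a")         # touch "a" so "b" becomes least recently used
lru.set("c", 3)      # capacity exceeded: evicts "b"
print(lru.get("b"))  # None
print(lru.get("a"))  # 1
```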
### Tier 2: Redis Cache (optional)
```python
import json
from typing import Any, Optional

import redis

class RedisCache:
    def __init__(self, host="localhost", port=6379, db=0, ttl=3600):
        self.redis = redis.Redis(host=host, port=port, db=db, decode_responses=True)
        self.ttl = ttl  # entry lifetime in seconds

    def get(self, key: str) -> Optional[Any]:
        value = self.redis.get(key)
        return json.loads(value) if value else None

    def set(self, key: str, value: Any) -> None:
        self.redis.setex(key, self.ttl, json.dumps(value))
```

- Speed: 5-10ms
- Persistence: configurable (TTL-based expiry)
- Scope: shared across processes/servers
- Requires: a running Redis server
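Because this tier is optional, a common pattern is to degrade gracefully when the server (or the `redis` package itself) is unavailable, treating failures as cache misses. A hedged sketch (`SafeRedisCache` is a hypothetical illustration, not Trinity's class):

```python
import json
from typing import Any, Optional

try:
    import redis  # optional dependency
except ImportError:
    redis = None

class SafeRedisCache:
    """Treats any Redis failure as a cache miss instead of an error."""

    def __init__(self, host="localhost", port=6379, db=0, ttl=3600):
        self.ttl = ttl
        self.client = None
        if redis is not None:
            self.client = redis.Redis(host=host, port=port, db=db,
                                      decode_responses=True)

    def get(self, key: str) -> Optional[Any]:
        if self.client is None:
            return None
        try:
            value = self.client.get(key)
            return json.loads(value) if value else None
        except Exception:  # e.g. redis.ConnectionError
            return None

    def set(self, key: str, value: Any) -> None:
        if self.client is None:
            return
        try:
            self.client.setex(key, self.ttl, json.dumps(value))
        except Exception:
            pass  # silently skip this tier; slower tiers still work
```

With this wrapper, a missing or unreachable Redis simply drops one tier of the cache rather than failing the build.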
### Tier 3: Filesystem Cache
```python
import hashlib
import json
from pathlib import Path
from typing import Any, Optional

class FilesystemCache:
    def __init__(self, cache_dir: Path = Path(".cache")):
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(exist_ok=True)

    def _get_path(self, key: str) -> Path:
        # Hash the key so arbitrary prompt text maps to a safe filename
        key_hash = hashlib.sha256(key.encode()).hexdigest()
        return self.cache_dir / f"{key_hash}.json"

    def get(self, key: str) -> Optional[Any]:
        path = self._get_path(key)
        if path.exists():
            with open(path) as f:
                return json.load(f)
        return None

    def set(self, key: str, value: Any) -> None:
        path = self._get_path(key)
        with open(path, 'w') as f:
            json.dump(value, f)
```

- Speed: 20-50ms
- Persistence: survives restarts
- Scope: single machine
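The persistence property can be seen concretely by pointing two independent instances at the same directory: an entry written by one is visible to the other, which is what lets this tier survive process restarts. The condensed class below mirrors the Tier 3 sketch above; the demo uses a temporary directory:

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Any, Optional

class FilesystemCache:
    def __init__(self, cache_dir: Path):
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _get_path(self, key: str) -> Path:
        return self.cache_dir / (hashlib.sha256(key.encode()).hexdigest() + ".json")

    def get(self, key: str) -> Optional[Any]:
        path = self._get_path(key)
        return json.loads(path.read_text()) if path.exists() else None

    def set(self, key: str, value: Any) -> None:
        self._get_path(key).write_text(json.dumps(value))

with tempfile.TemporaryDirectory() as d:
    cache = FilesystemCache(Path(d))
    cache.set("llm:abc", {"text": "cached response"})
    # A second instance sharing the directory sees the entry
    cache2 = FilesystemCache(Path(d))
    roundtrip = cache2.get("llm:abc")

print(roundtrip)  # {'text': 'cached response'}
```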
## Cache Key Generation
Cache keys are generated from:
- Prompt content (hashed)
- Model name
- Provider
- Generation parameters (temperature, top_p)
```python
import hashlib
import json

def generate_cache_key(prompt, model, provider, temperature=0.7, top_p=0.9):
    key_data = {
        "prompt": prompt,
        "model": model,
        "provider": provider,
        "temperature": temperature,
        "top_p": top_p,
    }
    # Canonical JSON (sorted keys) so identical inputs always hash identically
    key_string = json.dumps(key_data, sort_keys=True)
    key_hash = hashlib.sha256(key_string.encode()).hexdigest()
    return f"llm:{key_hash[:16]}"
```

## Implementation
### CacheManager
```python
from enum import Enum
from typing import Any, Optional

class CacheTier(Enum):
    MEMORY = "memory"
    REDIS = "redis"
    FILESYSTEM = "filesystem"

class CacheManager:
    def __init__(self, enabled_tiers=None, ttl=3600):
        self.enabled_tiers = enabled_tiers or [CacheTier.MEMORY, CacheTier.FILESYSTEM]
        self.memory = MemoryCache() if CacheTier.MEMORY in self.enabled_tiers else None
        self.redis = RedisCache(ttl=ttl) if CacheTier.REDIS in self.enabled_tiers else None
        self.filesystem = FilesystemCache() if CacheTier.FILESYSTEM in self.enabled_tiers else None

    def get(self, key: str) -> Optional[Any]:
        # Tier 1: memory (fastest)
        if self.memory:
            value = self.memory.get(key)
            if value is not None:
                return value
        # Tier 2: Redis; backfill memory on a hit
        if self.redis:
            value = self.redis.get(key)
            if value is not None:
                if self.memory:
                    self.memory.set(key, value)
                return value
        # Tier 3: filesystem; backfill the faster tiers on a hit
        if self.filesystem:
            value = self.filesystem.get(key)
            if value is not None:
                if self.memory:
                    self.memory.set(key, value)
                if self.redis:
                    self.redis.set(key, value)
                return value
        return None

    def set(self, key: str, value: Any) -> None:
        if self.memory:
            self.memory.set(key, value)
        if self.redis:
            self.redis.set(key, value)
        if self.filesystem:
            self.filesystem.set(key, value)
```

## Integration with LLM Client
Caching is applied automatically in the async LLM client:
```python
async def generate_content_async(self, prompt, model="llama3.2:3b", use_cache=True):
    cache_key = generate_cache_key(prompt, model, self.provider)
    if use_cache:
        cached = self.cache_manager.get(cache_key)
        if cached is not None:
            return cached  # cache hit: skip the API call entirely
    response = await self._call_llm(prompt, model)
    if use_cache:
        self.cache_manager.set(cache_key, response)
    return response
```

## Configuration
```yaml
# config/settings.yaml
cache:
  enabled: true
  tiers:
    - memory       # Always recommended
    - redis        # Optional (requires Redis server)
    - filesystem   # Always recommended
  memory:
    max_size: 100
  redis:
    host: localhost
    port: 6379
    db: 0
    password: null
    ttl: 3600
  filesystem:
    directory: .cache
    max_size_mb: 100
```

### Environment Variables
```bash
export CACHE_ENABLED=true
export CACHE_TTL=3600
export CACHE_REDIS_HOST=localhost
export CACHE_REDIS_PORT=6379
export CACHE_REDIS_DB=0
```

## Cache Management
```bash
# Clear all cache tiers
make cache-clear

# Or manually
rm -rf .cache/
redis-cli FLUSHDB  # if using Redis
```

## Redis Setup (Optional)
```bash
# macOS
brew install redis && brew services start redis

# Ubuntu/Debian
sudo apt-get install redis-server && sudo systemctl start redis

# Docker
docker run -d -p 6379:6379 redis:7-alpine

# Verify
redis-cli ping  # Returns "PONG"
```

## Security Considerations
- Do not cache API keys or credentials
- Use Redis AUTH if exposing Redis externally
- Validate cached responses if security is critical
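The last point can be made concrete. One approach (an illustrative sketch, not Trinity's implementation; `pack`/`unpack` and `SECRET` are hypothetical names) is to sign each cached entry with an HMAC so tampered or corrupted cache files are treated as misses:

```python
import hashlib
import hmac
import json
from typing import Any, Optional

SECRET = b"replace-with-a-real-secret"  # e.g. loaded from the environment

def pack(value: Any) -> str:
    """Serialize a value with an HMAC-SHA256 signature attached."""
    payload = json.dumps(value, sort_keys=True)
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return json.dumps({"payload": payload, "sig": sig})

def unpack(raw: str) -> Optional[Any]:
    """Return the value only if the signature verifies; otherwise a miss."""
    entry = json.loads(raw)
    expected = hmac.new(SECRET, entry["payload"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, entry["sig"]):
        return None  # tampered or corrupted entry
    return json.loads(entry["payload"])

stored = pack({"text": "response"})
print(unpack(stored))                        # {'text': 'response'}
tampered = stored.replace("response", "evil")
print(unpack(tampered))                      # None
```

`hmac.compare_digest` is used instead of `==` to avoid timing side channels during verification.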
## Troubleshooting
**Cache not working:**

```python
from trinity.utils.cache_manager import CacheManager

cache = CacheManager()
print(cache.enabled_tiers)  # confirm which tiers are active
```

**Redis connection issues:**

```bash
redis-cli ping  # Should return "PONG"
```

**High memory usage:**

```yaml
cache:
  memory:
    max_size: 50  # Reduce from default 100
```

## Next Steps
- Async & MLOps - Async LLM client
- Setup Guide - Installation
- Self-Healing - Layout validation