
LLM Response Caching

Multi-tier caching to avoid redundant LLM API calls

Trinity implements a 3-tier cache for LLM responses to avoid repeating identical API calls across builds.


Overview

LLM API calls are slow (typically 2-5 seconds) and can be costly with cloud providers. When the same prompt appears in multiple builds, caching avoids redundant calls.

Cache tiers:

  • Memory cache: <1ms access (in-process LRU, cleared on restart)
  • Redis cache: 5-10ms access (distributed, optional - requires Redis)
  • Filesystem cache: 20-50ms access (persistent, .cache/ directory)

Architecture

       CacheManager (Unified API)
                    │
          ┌─────────┼─────────┐
          ▼         ▼         ▼
       Memory     Redis      File
        (LRU)    (opt.)   (.cache/,
                          persistent)

Lookup order:

  1. Check memory cache (fastest)
  2. If miss, check Redis cache (if configured)
  3. If miss, check filesystem cache
  4. If miss, call LLM API
  5. Store result in all enabled cache tiers

Cache Tiers

Tier 1: Memory Cache

python
from typing import Any, Optional

from cachetools import LRUCache

class MemoryCache:
    def __init__(self, max_size: int = 100):
        self.cache = LRUCache(maxsize=max_size)

    def get(self, key: str) -> Optional[Any]:
        return self.cache.get(key)

    def set(self, key: str, value: Any) -> None:
        self.cache[key] = value
  • Speed: <1ms
  • Capacity: 100 entries (configurable)
  • Persistence: cleared on restart
  • Scope: single process
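The LRU policy means that once the cache holds max_size entries, the least recently used entry is evicted on the next insert. A stdlib-only equivalent (using collections.OrderedDict in place of cachetools, for illustration) shows the behavior:

```python
from collections import OrderedDict
from typing import Any, Optional

class TinyLRU:
    """Minimal LRU cache, behaviorally equivalent to cachetools.LRUCache."""

    def __init__(self, max_size: int = 100):
        self.max_size = max_size
        self._data: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least recently used

cache = TinyLRU(max_size=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # touch "a" so it becomes most recently used
cache.set("c", 3)  # evicts "b", the least recently used entry
```

Because "a" was read before the insert of "c", it survives the eviction and "b" does not.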

Tier 2: Redis Cache (optional)

python
import json
from typing import Any, Optional

import redis

class RedisCache:
    def __init__(self, host="localhost", port=6379, db=0, ttl=3600):
        self.redis = redis.Redis(host=host, port=port, db=db, decode_responses=True)
        self.ttl = ttl

    def get(self, key: str) -> Optional[Any]:
        value = self.redis.get(key)
        return json.loads(value) if value is not None else None

    def set(self, key: str, value: Any) -> None:
        # SETEX stores the value with an expiry, so stale entries age out
        self.redis.setex(key, self.ttl, json.dumps(value))
  • Speed: 5-10ms
  • Persistence: configurable
  • Scope: shared across processes/servers
  • Requires: running Redis server

Tier 3: Filesystem Cache

python
import hashlib
import json
from pathlib import Path
from typing import Any, Optional

class FilesystemCache:
    def __init__(self, cache_dir: Path = Path(".cache")):
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _get_path(self, key: str) -> Path:
        # Hash the key so arbitrary prompt text maps to a safe filename
        key_hash = hashlib.sha256(key.encode()).hexdigest()
        return self.cache_dir / f"{key_hash}.json"

    def get(self, key: str) -> Optional[Any]:
        path = self._get_path(key)
        if path.exists():
            with open(path) as f:
                return json.load(f)
        return None

    def set(self, key: str, value: Any) -> None:
        path = self._get_path(key)
        with open(path, "w") as f:
            json.dump(value, f)
  • Speed: 20-50ms
  • Persistence: survives restarts
  • Scope: single machine
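The filesystem tier grows without bound unless it is pruned; a size cap (like the max_size_mb setting in the configuration section) can be enforced by deleting the oldest entries first. A minimal sketch, where prune_cache is an illustrative helper rather than part of Trinity's API:

```python
import tempfile
from pathlib import Path

def prune_cache(cache_dir: Path, max_size_mb: int = 100) -> int:
    """Delete oldest cache files until the directory fits under max_size_mb.

    Illustrative helper, not part of Trinity's API; returns the number removed.
    """
    files = sorted(cache_dir.glob("*.json"), key=lambda p: p.stat().st_mtime)
    total = sum(p.stat().st_size for p in files)
    limit = max_size_mb * 1024 * 1024
    removed = 0
    for path in files:  # oldest entries are deleted first
        if total <= limit:
            break
        total -= path.stat().st_size
        path.unlink()
        removed += 1
    return removed

# Demo against a throwaway directory
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "old.json").write_text("x" * 10)
(demo_dir / "new.json").write_text("y" * 10)
untouched = prune_cache(demo_dir, max_size_mb=1)  # well under the limit
removed = prune_cache(demo_dir, max_size_mb=0)    # force full eviction
```

Evicting by modification time is a simple approximation of LRU; a real implementation might track access time instead.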

Cache Key Generation

Cache keys are generated from:

  1. Prompt content (hashed)
  2. Model name
  3. Provider
  4. Generation parameters (temperature, top_p)
python
import hashlib
import json

def generate_cache_key(prompt, model, provider, temperature=0.7, top_p=0.9):
    key_data = {
        "prompt": prompt,
        "model": model,
        "provider": provider,
        "temperature": temperature,
        "top_p": top_p
    }
    key_string = json.dumps(key_data, sort_keys=True)
    key_hash = hashlib.sha256(key_string.encode()).hexdigest()
    return f"llm:{key_hash[:16]}"
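The sort_keys=True argument is what makes the key deterministic: Python dicts preserve insertion order, so without it two logically identical parameter sets could serialize differently and miss the cache. For example:

```python
import hashlib
import json

a = {"model": "llama3.2:3b", "temperature": 0.7}
b = {"temperature": 0.7, "model": "llama3.2:3b"}  # same data, different order

# Without sort_keys the serializations differ; with it they match.
assert json.dumps(a) != json.dumps(b)
assert json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True)

key_a = hashlib.sha256(json.dumps(a, sort_keys=True).encode()).hexdigest()[:16]
key_b = hashlib.sha256(json.dumps(b, sort_keys=True).encode()).hexdigest()[:16]
assert key_a == key_b  # identical inputs always produce the same cache key
```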

Implementation

CacheManager

python
from enum import Enum
from typing import Any, Optional

class CacheTier(Enum):
    MEMORY = "memory"
    REDIS = "redis"
    FILESYSTEM = "filesystem"

class CacheManager:
    def __init__(self, enabled_tiers=None, ttl=3600):
        self.enabled_tiers = enabled_tiers or [CacheTier.MEMORY, CacheTier.FILESYSTEM]
        self.memory = MemoryCache() if CacheTier.MEMORY in self.enabled_tiers else None
        self.redis = RedisCache(ttl=ttl) if CacheTier.REDIS in self.enabled_tiers else None
        self.filesystem = FilesystemCache() if CacheTier.FILESYSTEM in self.enabled_tiers else None

    def get(self, key: str) -> Optional[Any]:
        # Tier 1: memory (fastest)
        if self.memory:
            value = self.memory.get(key)
            if value is not None:
                return value

        # Tier 2: Redis; promote hits into memory
        if self.redis:
            value = self.redis.get(key)
            if value is not None:
                if self.memory:
                    self.memory.set(key, value)
                return value

        # Tier 3: filesystem; promote hits into the faster tiers
        if self.filesystem:
            value = self.filesystem.get(key)
            if value is not None:
                if self.memory:
                    self.memory.set(key, value)
                if self.redis:
                    self.redis.set(key, value)
                return value

        return None

    def set(self, key: str, value: Any) -> None:
        # Write-through to every enabled tier
        if self.memory:
            self.memory.set(key, value)
        if self.redis:
            self.redis.set(key, value)
        if self.filesystem:
            self.filesystem.set(key, value)
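The promotion on a lower-tier hit is worth noting: after a filesystem hit, the value is backfilled into the faster tiers so subsequent lookups are served from memory. A self-contained sketch using dict-backed stand-in tiers (not the real tier classes) demonstrates the lookup-then-promote order:

```python
from typing import Any, Optional

class DictTier:
    """Stand-in for a cache tier, backed by a plain dict."""

    def __init__(self):
        self.data = {}

    def get(self, key: str) -> Optional[Any]:
        return self.data.get(key)

    def set(self, key: str, value: Any) -> None:
        self.data[key] = value

memory, filesystem = DictTier(), DictTier()

def lookup(key: str) -> Optional[Any]:
    value = memory.get(key)
    if value is not None:
        return value
    value = filesystem.get(key)
    if value is not None:
        memory.set(key, value)  # promote the hit into the faster tier
    return value

# Simulate a fresh process: the value survives only on the filesystem tier.
filesystem.set("llm:abc", "cached response")
assert memory.get("llm:abc") is None
assert lookup("llm:abc") == "cached response"
assert memory.get("llm:abc") == "cached response"  # promoted after the hit
```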

Integration with LLM Client

Caching is applied automatically in the async LLM client:

python
async def generate_content_async(self, prompt, model="llama3.2:3b", use_cache=True):
    cache_key = generate_cache_key(prompt, model, self.provider)
    
    if use_cache:
        cached = self.cache_manager.get(cache_key)
        if cached is not None:
            return cached
    
    response = await self._call_llm(prompt, model)
    
    if use_cache:
        self.cache_manager.set(cache_key, response)
    
    return response
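The effect of this check-call-store pattern is that repeated identical prompts cost only one real API call. A runnable sketch with a fake client (the class, its dict cache, and _call_llm are stand-ins, not Trinity's real implementation):

```python
import asyncio

class FakeLLMClient:
    """Stand-in client demonstrating the cache-check/call/store pattern."""

    def __init__(self):
        self.cache = {}      # stands in for CacheManager
        self.api_calls = 0

    async def _call_llm(self, prompt, model):
        self.api_calls += 1  # each real API call is slow and costly
        return f"response to: {prompt}"

    async def generate_content_async(self, prompt, model="llama3.2:3b", use_cache=True):
        cache_key = (prompt, model)  # real code uses generate_cache_key()
        if use_cache and cache_key in self.cache:
            return self.cache[cache_key]
        response = await self._call_llm(prompt, model)
        if use_cache:
            self.cache[cache_key] = response
        return response

async def main():
    client = FakeLLMClient()
    first = await client.generate_content_async("hello")
    second = await client.generate_content_async("hello")  # served from cache
    assert first == second
    assert client.api_calls == 1  # one real call despite two requests
    return client

client = asyncio.run(main())
```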

Configuration

yaml
# config/settings.yaml
cache:
  enabled: true
  
  tiers:
    - memory       # Always recommended
    - redis        # Optional (requires Redis server)
    - filesystem   # Always recommended
  
  memory:
    max_size: 100
  
  redis:
    host: localhost
    port: 6379
    db: 0
    password: null
    ttl: 3600
  
  filesystem:
    directory: .cache
    max_size_mb: 100

Environment Variables

bash
export CACHE_ENABLED=true
export CACHE_TTL=3600
export CACHE_REDIS_HOST=localhost
export CACHE_REDIS_PORT=6379
export CACHE_REDIS_DB=0
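These variables can be consumed with a small helper; the sketch below shows one plausible mapping from the CACHE_* names above to a settings dict (the helper itself is illustrative, not Trinity's loader):

```python
import os

def cache_settings_from_env() -> dict:
    """Read the CACHE_* environment variables, falling back to defaults."""
    return {
        "enabled": os.environ.get("CACHE_ENABLED", "true").lower() == "true",
        "ttl": int(os.environ.get("CACHE_TTL", "3600")),
        "redis": {
            "host": os.environ.get("CACHE_REDIS_HOST", "localhost"),
            "port": int(os.environ.get("CACHE_REDIS_PORT", "6379")),
            "db": int(os.environ.get("CACHE_REDIS_DB", "0")),
        },
    }

settings = cache_settings_from_env()
```

Environment variables typically override the YAML file, which in turn overrides built-in defaults.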

Cache Management

bash
# Clear all cache tiers
make cache-clear

# Or manually
rm -rf .cache/
redis-cli FLUSHDB  # if using Redis

Redis Setup (Optional)

bash
# macOS
brew install redis && brew services start redis

# Ubuntu/Debian
sudo apt-get install redis-server && sudo systemctl start redis

# Docker
docker run -d -p 6379:6379 redis:7-alpine

# Verify
redis-cli ping  # Returns "PONG"

Security Considerations

  • Do not cache API keys or credentials
  • Use Redis AUTH if exposing Redis externally
  • Validate cached responses if security is critical
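One way to validate cached responses is to sign each entry on write and verify it on read, so a tampered cache file is treated as a miss. A minimal sketch with stdlib hmac (the scheme and SECRET handling are illustrative, not something Trinity ships):

```python
import hashlib
import hmac
import json

SECRET = b"replace-with-a-real-secret"  # illustrative; load from secure config

def sign_entry(value) -> dict:
    """Wrap a cache value with an HMAC over its canonical JSON form."""
    payload = json.dumps(value, sort_keys=True)
    mac = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "mac": mac}

def verify_entry(entry: dict):
    """Return the cached value, or None if verification fails."""
    expected = hmac.new(SECRET, entry["payload"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, entry["mac"]):
        return None  # treat tampered entries as a cache miss
    return json.loads(entry["payload"])
```

hmac.compare_digest is used instead of == to avoid timing side channels when comparing signatures.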

Troubleshooting

Cache not working:

python
from trinity.utils.cache_manager import CacheManager

cache = CacheManager()
print(cache.enabled_tiers)

Redis connection issues:

bash
redis-cli ping  # Should return "PONG"

High memory usage:

yaml
cache:
  memory:
    max_size: 50  # Reduce from default 100
