New Relic's Infinite Tracing Processor is an implementation of the OpenTelemetry Collector tailsamplingprocessor. In addition to the upstream features, it supports scalable and durable distributed processing by using a distributed cache for shared state storage. This documentation explains how to configure it.
Supported caches
The processor supports any Redis-compatible cache implementation. It has been tested and validated with Redis and Valkey in both single-instance and cluster configurations. For production deployments, we recommend using cluster mode (sharded) to ensure high availability and scalability. To enable distributed caching, add the distributed_cache configuration to your tail_sampling processor section:

    tail_sampling:
      distributed_cache:
        connection:
          address: redis://localhost:6379/0
          password: 'local'
        trace_window_expiration: 30s  # Default: how long to wait after last span before evaluating
        processor_name: "itc"         # Name of the processor
        data_compression:
          format: lz4                 # Optional: compression format (none, snappy, zstd, lz4); lz4 recommended

Important
Configuration behavior: When distributed_cache is configured, the processor automatically uses the distributed cache for state management. If distributed_cache is omitted entirely, the collector will use in-memory processing instead.
The address parameter must specify a valid Redis-compatible server address, using the standard format:

    redis[s]://[[username][:password]@][host][:port][/db-number]

Alternatively, you can embed the credentials directly in the address parameter:

    tail_sampling:
      distributed_cache:
        connection:
          address: redis://:yourpassword@localhost:6379/0

The processor is implemented in Go and uses the go-redis client library.
Configuration parameters
The distributed_cache section supports the following parameters:
Connection settings
| Parameter | Type | Default | Description |
|---|---|---|---|
| connection.address | string | required | Redis connection string (format: redis://host:port/db). For cluster mode, use comma-separated addresses (e.g., redis://node1:6379,redis://node2:6379) |
| connection.password | string | "" | Redis password for authentication |
Redis client timeouts and connection pool
All settings are optional and have defaults aligned with the 10s ingestion_response_timeout.
| Parameter | Type | Default | Description |
|---|---|---|---|
| connection.dial_timeout | duration | 2s | Timeout for establishing new connections to Redis |
| connection.read_timeout | duration | 500ms | Timeout for socket reads. Commands fail with a timeout error if exceeded |
| connection.write_timeout | duration | 500ms | Timeout for socket writes. Commands fail with a timeout error if exceeded |
| connection.pool_timeout | duration | 3s | Time to wait for a connection from the pool if all connections are busy |
| connection.pool_size | int | 20 | Maximum number of socket connections per Redis node |
| connection.min_idle_conns | int | 5 | Minimum number of idle connections to maintain for quick reuse |
| connection.max_idle_conns | int | 10 | Maximum number of idle connections to keep open |
| connection.conn_max_idle_time | duration | 1m | Maximum time a connection may be idle before being closed |
| connection.conn_max_lifetime | duration | 30m | Maximum time a connection may be reused before being closed |
| connection.max_retries | int | 3 | Maximum number of command retries before giving up |
| connection.min_retry_backoff | duration | 100ms | Minimum backoff between retries |
| connection.max_retry_backoff | duration | 1.5s | Maximum backoff between retries (exponential backoff capped at this value) |
| connection.max_redirects | int | 5 | Maximum number of redirects to follow in cluster mode |
Timeout alignment:
The default Redis client timeouts are aligned with the ingestion_response_timeout (default: 10s) to ensure Redis operations complete before workers time out:
- Worst case calculation: PoolTimeout (3s) + Operation (0.5s) + 3 retries × (0.5s + backoff) ≈ 7s, which is less than 10s ✅
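The worst-case estimate can be reproduced with a short calculation. The following is a minimal Python sketch, not processor code: it assumes each retry costs one full operation timeout plus one backoff, and it brackets the backoff between a doubling schedule that starts at min_retry_backoff and a pessimistic case where every backoff sits at max_retry_backoff. Any jitter a real client applies is not modeled, and the helper name worst_case_ms is hypothetical.

```python
# Defaults taken from the connection settings table, in milliseconds.
POOL_TIMEOUT_MS = 3000
OP_TIMEOUT_MS = 500            # read_timeout / write_timeout
MAX_RETRIES = 3
MIN_BACKOFF_MS = 100
MAX_BACKOFF_MS = 1500
INGESTION_RESPONSE_TIMEOUT_MS = 10_000


def worst_case_ms(backoffs_ms):
    """Pool wait + first attempt + (timed-out attempt + backoff) per retry."""
    retries = sum(OP_TIMEOUT_MS + b for b in backoffs_ms)
    return POOL_TIMEOUT_MS + OP_TIMEOUT_MS + retries


# Assumption: backoff doubles from the minimum and is capped at the maximum.
doubling = [min(MIN_BACKOFF_MS * 2**i, MAX_BACKOFF_MS) for i in range(MAX_RETRIES)]
# Pessimistic bound: every backoff already sits at the cap.
capped = [MAX_BACKOFF_MS] * MAX_RETRIES

lower_ms = worst_case_ms(doubling)  # 3000 + 500 + 3*500 + (100 + 200 + 400)
upper_ms = worst_case_ms(capped)    # 3000 + 500 + 3*(500 + 1500)

print(f"doubling backoff: {lower_ms} ms, capped backoff: {upper_ms} ms")
print("fits budget:", upper_ms < INGESTION_RESPONSE_TIMEOUT_MS)
```

With the defaults, the two bounds come out to 5.7s and 9.5s, both under the 10s ingestion_response_timeout, and the ≈7s figure falls between them. Re-running the calculation with your own values is a quick way to check that a tuned configuration still fits the budget.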
Tuning guidelines:
- High-latency Redis (cross-region, VPN): Increase timeouts to 2-3x defaults (e.g., 1-1.5s read/write) and reduce max_retries to 2
- Very fast Redis (same host/rack): Can reduce timeouts further (e.g., 250ms) for faster failure detection
- High throughput: Increase pool_size to 30-50 to avoid connection pool exhaustion
- Unreliable network: Increase max_retries to 5-7 and adjust backoff settings
Cluster replica options
The connection.replica section controls cluster replica routing (cluster mode only).
| Parameter | Type | Default | Description |
|---|---|---|---|
| connection.replica.read_only_replicas | bool | true | Enable routing read commands to replica nodes. Default is true for improved scalability. Set to false if strong read consistency is required. |
| connection.replica.route_by_latency | bool | false | Route commands to the closest node based on latency (automatically enables read_only_replicas) |
| connection.replica.route_randomly | bool | false | Route commands to a random node (automatically enables read_only_replicas) |
Tip
Replica read benefits: When running with a Redis cluster that has replica nodes, enabling replica reads distributes read load across both primary and replica nodes, significantly improving read throughput (2-3x) and reducing load on primary nodes.
Important considerations:
- Replication lag: Replicas may lag behind the primary by milliseconds to seconds
- Cluster-only: These options only work with Redis cluster deployments
- Read operations (Get, LRange) may be served by replica nodes
- Write operations (SetNX, Del, Lua scripts) always route to primary nodes
Data compression
| Parameter | Type | Default | Description |
|---|---|---|---|
| data_compression | string | none | Compression algorithm for trace data. Options: none, snappy, zstd, lz4 |
Tip
Compression tradeoffs:
- none: No CPU overhead, highest Redis memory usage
- snappy: Fast compression/decompression, good compression ratio
- zstd: Best compression ratio, more CPU usage
- lz4: Very fast, moderate compression ratio
Choose based on your bottleneck: network bandwidth and Redis storage versus CPU availability.
Trace management
| Parameter | Type | Default | Description |
|---|---|---|---|
| trace_window_expiration | duration | 30s | How long to wait for spans before evaluating a trace |
| traces_ttl | duration | 5m | Time-to-live for trace data in Redis |
| cache_ttl | duration | 30m | Time-to-live for sampling decisions |
| processor_name | string | "" | Optional processor name for Redis keys and metrics (useful for multi-tenant deployments) |
TTL guidelines:
- traces_ttl should be long enough to handle retries and late spans
- cache_ttl should be much longer than traces_ttl to handle late-arriving spans
- Longer cache_ttl reduces duplicate evaluations but increases Redis memory usage
Retry and recovery
| Parameter | Type | Default | Description |
|---|---|---|---|
| max_retries | int | 2 | Maximum retry attempts for failed trace evaluations |
| in_flight_timeout | duration | Same as trace_window_expiration | Timeout for in-flight batch processing before considered orphaned |
| recover_interval | duration | 5s | How often to check for orphaned batches |
Important
Orphan recovery: Orphaned batches occur when a collector crashes mid-evaluation. The orphan recovery process re-queues these traces for evaluation by another collector instance.
Evaluation settings
| Parameter | Type | Default | Description |
|---|---|---|---|
| evaluation_interval | duration | 1s | How often to check for traces ready for evaluation |
| max_traces_per_batch | int | 1000 | Maximum number of traces to evaluate per batch |
| rate_limiter | bool | false | Enable blocking rate limiter for concurrent trace processing |
Rate limiter:
The rate_limiter option controls backpressure behavior when the concurrent trace limit (num_traces) is reached:
- false (default): No rate limiting. The processor accepts traces without blocking, relying on Redis for storage. This is the recommended setting for most Redis deployments.
- true: Enables a blocking rate limiter that applies backpressure when num_traces concurrent traces are being processed. New traces will block until a slot becomes available.
When to enable:
- High-memory environments where you want strict control over concurrent trace processing
- When Redis memory is constrained and you need to limit the rate of trace ingestion
- To prevent overwhelming downstream consumers with sudden traffic bursts
Partitioning
| Parameter | Type | Default | Description |
|---|---|---|---|
| partitions | int | 6 | Number of partitions for load distribution across Redis |
| partition_workers | int | 6 | Number of concurrent evaluation workers |
| partition_buffer_max_traces | int | 10000 | Maximum traces buffered per partition before flushing (2 workers per partition process in parallel) |
Partitioning benefits:
Distributes load across multiple Redis key ranges
Enables parallel evaluation across multiple workers
Improves throughput in multi-collector deployments
Tip
Partition scaling: A partition is a logical shard of trace data in Redis that enables parallel processing and horizontal scaling. Traces are assigned to partitions using consistent hashing on the trace ID. Each partition can be processed independently and concurrently, enabling both vertical scaling (more CPU cores) and horizontal scaling (more collector instances).
Important:
- partitions should be at least 3x the number of Redis nodes needed for your workload.
- partition_workers should typically be less than or equal to the number of partitions.
Ingestion settings
| Parameter | Type | Default | Description |
|---|---|---|---|
| ingestion_workers | int | 6 | Number of goroutines processing traces from the shared ingestion channel |
| ingestion_buffer_size | int | 10000 | Capacity of the shared ingestion channel for buffering incoming traces |
| ingestion_channel_timeout | duration | 500ms | Maximum time to wait when sending traces to the ingestion channel. If exceeded, traces are dropped |
| ingestion_response_timeout | duration | 10s | Maximum time to wait for a worker to process and respond. Prevents indefinite blocking if workers are stuck |
| hashing_strategy | string | rendezvous | Hashing algorithm for partition selection. Options: rendezvous (recommended, 3x faster) or consistent |
Ingestion architecture:
The processor uses a shared channel with configurable workers for trace ingestion:
- Incoming traces are sent to a shared buffered channel
- Multiple workers pull from the channel and route traces to appropriate partitions
- Workers hash trace IDs using the configured hashing strategy to determine partition assignment
Configuration guidelines:
Buffer Size: Should absorb traffic bursts. Recommended: 10k-60k traces
Workers: Number of concurrent goroutines processing traces. Typically 1-2 workers per partition is optimal
Channel Timeout: How long to wait if buffer is full. Short timeout (500ms) fails fast on saturation
Response Timeout: Protects against stuck workers. Default: 10s is appropriate for normal Redis operations
Hashing Strategy: Algorithm for determining trace partition assignment
- rendezvous (default): Provides superior load distribution for 2-99 partitions. Best choice for typical deployments.
- consistent: Maintains performance when using 100+ partitions where rendezvous becomes slow. Trades slightly less optimal load distribution for better performance at scale.
- Both strategies ensure the same trace always maps to the same partition (deterministic)
- Choose rendezvous for better load distribution (up to 99 partitions), consistent for performance at scale (100+)
Core configuration (applies to Redis mode)
| Parameter | Type | Default | Description |
|---|---|---|---|
| num_traces | int | 50000 | Maximum number of traces processed concurrently |
| policies | array | required | Sampling policy definitions |
Complete configuration example

    processors:
      tail_sampling:
        num_traces: 5_000_000
        distributed_cache:
          # Connection
          connection:
            address: "redis://redis-cluster:6379/0"
            password: "your-redis-password"

            # Connection pool settings (optional - tune for your environment)
            pool_size: 30
            read_timeout: 2s
            write_timeout: 2s
            pool_timeout: 5s
            max_retries: 5

            # Replica read options (cluster mode only)
            replica:
              read_only_replicas: true  # Default: enabled for improved scalability
              route_by_latency: true    # Route to closest node (recommended)

          # Compression
          data_compression: snappy

          # Trace Management
          trace_window_expiration: 30s
          traces_ttl: 2m   # 120s (allow extra time for retries)
          cache_ttl: 1h    # 3600s (keep decisions longer)
          processor_name: "prod-cluster-1"

          # Retry and Recovery
          max_retries: 3
          in_flight_timeout: 45s
          recover_interval: 10s

          # Evaluation
          evaluation_interval: 1s
          max_traces_per_batch: 10000
          rate_limiter: false  # Recommended for Redis mode

          # Partitioning
          partitions: 8
          partition_workers: 8
          partition_buffer_max_traces: 1000

          # Ingestion
          ingestion_workers: 12            # 1.5 workers per partition
          ingestion_buffer_size: 40000     # 40k trace buffer
          ingestion_channel_timeout: 500ms
          ingestion_response_timeout: 10s
          hashing_strategy: rendezvous     # default, best for less than 100 partitions

        # Sampling policies
        policies:
          - name: errors
            type: status_code
            status_code: {status_codes: [ERROR]}
          - name: slow-traces
            type: latency
            latency: {threshold_ms: 1000}
          - name: sample-10-percent
            type: probabilistic
            probabilistic: {sampling_percentage: 10}

Trace evaluation
This section covers the parameters that control when traces are evaluated and how long data persists in Redis.
Evaluation timing and frequency
These parameters control when and how often the processor evaluates traces for sampling decisions:
| Parameter | Type | Default | Description |
|---|---|---|---|
| evaluation_interval | duration | 1s | How often to check for traces ready for evaluation |
| max_traces_per_batch | int | 1000 | Maximum number of traces to evaluate per batch |
| partition_workers | int | 6 | Number of concurrent evaluation workers processing partitions |
How evaluation works:
- Every evaluation_interval, workers check for traces that have been idle for at least trace_window_expiration
- Up to max_traces_per_batch traces are pulled from Redis per evaluation cycle
- partition_workers evaluate batches concurrently across partitions
Tuning guidance:
- Faster decisions: Decrease evaluation_interval (e.g., 500ms) for lower latency, at the cost of higher Redis load
- Higher throughput: Increase max_traces_per_batch (e.g., 5000-10000) to process more traces per cycle
- More parallelism: Increase partition_workers to match available CPU cores
TTL and expiration
The processor uses multiple TTL layers that work together to ensure traces are properly evaluated while managing Redis memory efficiently.
How TTL works in distributed mode
When using distributed_cache, the processor implements a multi-stage TTL system that differs from the in-memory processor:
Trace lifecycle stages:
- Collection phase: Spans arrive and are stored in Redis
- Evaluation phase: After trace_window_expiration, the trace is ready for a sampling decision
- Retention phase: Trace data persists for traces_ttl to handle retries and late spans
- Cache phase: Sampling decisions persist for cache_ttl to prevent duplicate evaluations
Important
Key difference from in-memory mode: The trace_window_expiration parameter replaces decision_wait and implements a sliding window approach:
- Each time new spans arrive for a trace, the evaluation timer resets
- Traces with ongoing activity stay active longer than traces that have stopped receiving spans
- This dynamic behavior better handles real-world span arrival patterns
Why cascading TTLs matter:
The TTL hierarchy ensures data availability throughout the trace lifecycle:
- cache_ttl (longest) handles late-arriving spans hours after evaluation
- traces_ttl (medium) provides buffer for retries and orphan recovery
- trace_window_expiration (shortest) controls when evaluation begins
Properly configured TTLs prevent data loss, duplicate evaluations, and incomplete traces while optimizing Redis memory usage.
Tip
Configuration principle: Each TTL should be significantly longer than the one before it (typically 5-10x). This creates safety buffers that account for processing delays, retries, and late-arriving data.
TTL hierarchy and default values
The processor uses a cascading TTL structure where each layer provides protection and buffer time for the layer below. Understanding these relationships is critical for reliable operation:

    trace_window_expiration (30s)
        ↓ [trace ready for evaluation]
    in_flight_timeout (30s default)
        ↓ [evaluation completes or times out]
    traces_ttl (5m)
        ↓ [trace data deleted from Redis]
    cache_ttl (30m)
        ↓ [decision expires, late spans re-evaluated]

1. Trace collection window: trace_window_expiration
Default: 30s | Config: distributed_cache.trace_window_expiration
- Purpose: Controls when a trace is ready for sampling evaluation
- Behavior: Sliding window that resets each time new spans arrive for a trace
- Example: If a trace receives spans at t=0s, t=15s, and t=28s, evaluation begins at t=58s (28s + 30s window)
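The sliding-window example above can be expressed as a one-line rule: a trace becomes eligible for evaluation one window after its most recent span. The sketch below is illustrative only; the function name evaluation_ready_at and the use of plain second offsets are assumptions, not the processor's internal representation.

```python
def evaluation_ready_at(span_arrivals_s, window_s=30.0):
    """Return the time (in seconds) at which a trace becomes eligible for evaluation.

    Each new span resets the timer, so only the latest arrival matters.
    """
    if not span_arrivals_s:
        raise ValueError("a trace needs at least one span")
    return max(span_arrivals_s) + window_s


# Spans at t=0s, t=15s and t=28s with the default 30s window.
print(evaluation_ready_at([0, 15, 28]))  # 58.0
```

A trace that keeps receiving spans keeps pushing this time forward, which is why long-running traces stay in the collection phase longer than short ones.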
Tuning guidance:
- Shorter values (15-20s): Faster sampling decisions, but risk of incomplete traces if spans arrive slowly
- Longer values (45-60s): More complete traces, but higher latency and memory usage
- Typical range: 20-45 seconds depending on your span arrival patterns
2. Batch processing timeout: in_flight_timeout
Default: Same as trace_window_expiration | Config: distributed_cache.in_flight_timeout
- Purpose: Maximum time a batch can be in processing before being considered orphaned
- Behavior: Prevents data loss if a collector crashes during evaluation
- Orphan recovery: Batches exceeding this timeout are automatically re-queued for evaluation by another collector
Tuning guidance:
Should be ≥ trace_window_expiration: Ensures enough time for normal evaluation
Increase if: Your evaluation policies are computationally expensive (complex OTTL, regex)
Monitor: otelcol_processor_tail_sampling_sampling_decision_timer_latency to ensure evaluations complete within this window
Tip
Relationship with trace_window_expiration: Setting in_flight_timeout equal to trace_window_expiration works well for most deployments. Only increase if you observe frequent orphaned batch recoveries due to slow policy evaluation.
3. Trace data retention: traces_ttl
Default: 5m | Config: distributed_cache.traces_ttl
- Purpose: How long trace span data persists in Redis after initial storage
- Behavior: Provides buffer time for retries, late spans, and orphan recovery
- Critical constraint: Must be significantly longer than trace_window_expiration + in_flight_timeout
Recommended formula:

    traces_ttl ≥ (trace_window_expiration + in_flight_timeout + max_retries × evaluation_interval) × 2

Example with defaults:

    traces_ttl ≥ (30s + 30s + 2 retries × 1s) × 2 = 124s, which the 5m default satisfies ✅

Tuning guidance:
Memory-constrained: Use shorter TTL (2-3m) but risk losing data for very late spans
Late span tolerance: Use longer TTL (10-15m) to handle delayed span arrivals
Standard production: 5-10 minutes provides good balance
Important
Too short = data loss: If traces_ttl is too short, traces may be deleted before evaluation completes, especially during retries or orphan recovery. This results in partial or missing traces.
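To check a planned traces_ttl against the recommended formula, the arithmetic can be scripted. This is a small sketch under the assumption that all durations are expressed in seconds; the helper name min_traces_ttl_s is hypothetical and the safety factor of 2 is taken from the formula in this section.

```python
def min_traces_ttl_s(trace_window_expiration_s, in_flight_timeout_s,
                     max_retries, evaluation_interval_s, safety_factor=2):
    """Recommended lower bound for traces_ttl, in seconds."""
    base = (trace_window_expiration_s
            + in_flight_timeout_s
            + max_retries * evaluation_interval_s)
    return base * safety_factor


# Defaults: 30s window, 30s in-flight timeout, 2 retries, 1s evaluation interval.
minimum = min_traces_ttl_s(30, 30, 2, 1)
configured = 5 * 60  # default traces_ttl of 5m

print(f"minimum: {minimum}s, configured: {configured}s, ok: {configured >= minimum}")
```

If you raise trace_window_expiration or in_flight_timeout, re-run the check before shortening traces_ttl to save memory.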
4. Decision cache retention: cache_ttl
Default: 30m | Config: distributed_cache.cache_ttl
- Purpose: How long sampling decisions (sampled/not-sampled) are cached
- Behavior: Prevents duplicate evaluation when late spans arrive after trace has been evaluated
- Critical constraint: Must be much longer than traces_ttl
Recommended formula:

    cache_ttl ≥ traces_ttl × 6

Why much longer?
- Late-arriving spans can arrive minutes or hours after the trace completed
- Decision cache prevents re-evaluating traces when very late spans arrive
- Without cached decision, late spans would be evaluated as incomplete traces (incorrect sampling decision)
Tuning guidance:
- Standard production: 30m-2h balances memory usage and late span handling
- High late-span rate: 2-4h ensures decisions persist for very delayed data
- Memory-constrained: 15-30m minimum, but expect more duplicate evaluations
Memory impact:
Each decision: ~50 bytes per trace ID
At 10,000 spans/sec with 20 spans/trace → 500 traces/sec
30-minute cache: ~900,000 decisions × 50 bytes = ~45 MB
2-hour cache: ~3.6M decisions × 50 bytes = ~180 MB
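The memory figures above follow from a simple product of trace rate, cache lifetime, and per-decision size. The sketch below reproduces them; the ~50 bytes per decision and the 20 spans per trace are the assumptions stated in this section, and the function name decision_cache_bytes is hypothetical.

```python
def decision_cache_bytes(spans_per_sec, spans_per_trace, cache_ttl_s,
                         bytes_per_decision=50):
    """Approximate Redis memory held by cached sampling decisions."""
    traces_per_sec = spans_per_sec / spans_per_trace
    return traces_per_sec * cache_ttl_s * bytes_per_decision


half_hour_mb = decision_cache_bytes(10_000, 20, 30 * 60) / 1_000_000
two_hours_mb = decision_cache_bytes(10_000, 20, 2 * 60 * 60) / 1_000_000

print(f"30-minute cache: ~{half_hour_mb:.0f} MB")
print(f"2-hour cache: ~{two_hours_mb:.0f} MB")
```

Because the result scales linearly with cache_ttl, doubling the cache lifetime doubles the decision-cache footprint, which stays small compared with trace data storage.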
Tip
Monitor cache effectiveness: Track the otelcol_processor_tail_sampling_early_releases_from_cache_decision metric. High values indicate the cache is preventing duplicate evaluations effectively.
TTL configuration examples
Low-latency, memory-constrained:

    distributed_cache:
      trace_window_expiration: 20s
      in_flight_timeout: 20s
      traces_ttl: 2m
      cache_ttl: 15m
      evaluation_interval: 500ms
      max_traces_per_batch: 2000

High-throughput, late-span tolerant:

    distributed_cache:
      trace_window_expiration: 45s
      in_flight_timeout: 60s
      traces_ttl: 10m
      cache_ttl: 2h
      evaluation_interval: 1s
      max_traces_per_batch: 10000

Balanced production (recommended):

    distributed_cache:
      trace_window_expiration: 30s
      in_flight_timeout: 45s  # Extra buffer for complex policies
      traces_ttl: 5m
      cache_ttl: 30m
      evaluation_interval: 1s
      max_traces_per_batch: 5000

Retry and recovery
| Parameter | Type | Default | Description |
|---|---|---|---|
| max_retries | int | 2 | Maximum retry attempts for failed trace evaluations |
| recover_interval | duration | 5s | How often to check for orphaned batches |
Orphan recovery:
Orphaned batches occur when a collector crashes mid-evaluation. The orphan recovery process runs every recover_interval and:
- Identifies batches that have exceeded in_flight_timeout
- Re-queues these traces for evaluation by another collector instance
- Ensures no traces are lost due to collector failures
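The orphan check described above boils down to comparing how long each batch has been in flight against in_flight_timeout. The following sketch illustrates that rule only; the InFlightBatch type and find_orphans function are hypothetical, no Redis access is shown, and the processor's real bookkeeping may differ.

```python
from dataclasses import dataclass


@dataclass
class InFlightBatch:
    batch_id: str
    started_at_s: float  # when a collector claimed the batch


def find_orphans(batches, now_s, in_flight_timeout_s=30.0):
    """Return batches that have been in flight longer than the timeout."""
    return [b for b in batches if now_s - b.started_at_s > in_flight_timeout_s]


batches = [InFlightBatch("batch-a", 0.0), InFlightBatch("batch-b", 40.0)]
orphans = find_orphans(batches, now_s=45.0)
print([b.batch_id for b in orphans])  # ['batch-a']
```

In this picture, a check like find_orphans would run once per recover_interval, so a shorter interval means orphaned batches are noticed sooner at the cost of more frequent checks.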
Tuning guidance:
- Increase max_retries (3-5) if experiencing transient Redis errors
- Decrease recover_interval (2-3s) for faster recovery in high-availability environments
- Monitor recovery metrics to identify if collectors are crashing frequently
Partitioning and scaling
Partitions are the key to achieving high throughput and horizontal scalability in Redis-based tail sampling. This section explains how partitions work and how to configure them for optimal performance.
What is a partition?
A partition is a logical shard of trace data in Redis that enables parallel processing and horizontal scaling. Think of partitions as separate queues where traces are distributed based on their trace ID.
Key concepts:
Each partition maintains its own pending traces queue in Redis
Traces are assigned to partitions using a configurable hashing strategy (rendezvous or consistent) on the trace ID
Each partition can be processed independently and concurrently
Partitions enable both vertical scaling (more CPU cores) and horizontal scaling (more collector instances)
Caution
Important: Changing the number of partitions when there's a cluster already running will cause data loss, since traces cannot be located anymore with a different partition count.
How partitioning works

            Incoming Traces
                   |
                   v
    ┌─────────────────────────────┐
    │      Hashing Strategy       │  trace_id → rendezvous or consistent hash
    │   (rendezvous by default)   │
    └─────────────────────────────┘
                   |
         ├──────────┬──────────┬──────────┐
         v          v          v          v
    ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
    │Partition│ │Partition│ │Partition│ │Partition│
    │    0    │ │    1    │ │    2    │ │    3    │
    │ (Redis) │ │ (Redis) │ │ (Redis) │ │ (Redis) │
    └─────────┘ └─────────┘ └─────────┘ └─────────┘
         |          |          |          |
         v          v          v          v
    ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
    │ Worker  │ │ Worker  │ │ Worker  │ │ Worker  │
    │    0    │ │    1    │ │    2    │ │    3    │
    │(Goroutine)│(Goroutine)│(Goroutine)│(Goroutine)│
    └─────────┘ └─────────┘ └─────────┘ └─────────┘
         |          |          |          |
         └──────────┴──────────┴──────────┘
                         |
                         v
                  Sampled Traces

Flow:
- Ingestion: Trace ID is hashed using the configured hashing strategy to determine partition assignment
- Storage: Trace data stored in Redis under partition-specific keys
- Evaluation: Worker assigned to that partition pulls and evaluates traces
- Concurrency: All partition workers run in parallel, processing different traces simultaneously
Hashing strategy
The processor supports two hashing algorithms for partition selection. The choice depends on the number of partitions:
| Strategy | Load Distribution | Performance | Best For |
|---|---|---|---|
| rendezvous (default) | Superior load balancing | Fast for up to 99 partitions | Standard deployments (2-99 partitions) - best load distribution for typical production workloads |
| consistent | Good distribution | Maintains performance with 100+ partitions | Very large scale (100+ partitions) - preserves performance when rendezvous becomes slow |
Key characteristics:
- Both strategies are deterministic - the same trace always maps to the same partition
- Rendezvous provides better load distribution but becomes slow with 100+ partitions
- Consistent hashing maintains performance at high partition counts (100+)
- Choose based on partition count: rendezvous for better distribution (2-99), consistent for performance at scale (100+)
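To make the rendezvous idea concrete, here is a minimal Python sketch of rendezvous (highest-random-weight) hashing. It is a generic illustration of the technique, not the processor's implementation: the use of SHA-256, the "trace_id:partition" key layout, and the function name rendezvous_partition are all assumptions.

```python
import hashlib


def rendezvous_partition(trace_id: str, partitions: int) -> int:
    """Pick the partition whose (trace_id, partition) hash scores highest."""

    def score(partition: int) -> int:
        digest = hashlib.sha256(f"{trace_id}:{partition}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(range(partitions), key=score)


trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
first = rendezvous_partition(trace_id, 8)
second = rendezvous_partition(trace_id, 8)

# Deterministic: the same trace ID always lands on the same partition.
print(first == second)
```

Each lookup scores every partition, so the work per trace grows linearly with the partition count. That is consistent with the guidance above to prefer consistent hashing once you reach 100+ partitions.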
Standard configuration (most deployments):

    distributed_cache:
      hashing_strategy: rendezvous  # default, best load distribution for 2-99 partitions
      partitions: 8

Very large scale configuration (100+ partitions):

    distributed_cache:
      hashing_strategy: consistent  # maintains performance with 100+ partitions
      partitions: 200

Important
Choosing the right strategy:
- Rendezvous (default): Use for deployments with up to 99 partitions. Provides superior load distribution for the vast majority of production workloads.
- Consistent: Use when scaling to 100+ partitions where rendezvous becomes slow. Trades slightly less optimal distribution for maintained performance at scale.
- Important: Once chosen, changing strategies requires clearing existing data as traces will map to different partitions.
Partition configuration parameters
Use partitions to control how many logical shards you have and partition_workers to set how many workers process them:

    distributed_cache:
      partitions: 8          # Number of logical shards in Redis
      partition_workers: 8   # Number of workers processing partitions

Worker behavior:
- 8 partitions + 8 workers: Each worker processes one partition every evaluation_interval ✅ Balanced
- 8 partitions + 16 workers: Each partition evaluated twice per interval (redundant, wastes resources)
- 8 partitions + 4 workers: Only half the partitions evaluated per interval (slower, but less CPU/Redis load)
Tip
Tuning tip: Setting fewer workers per instance (partition_workers < partitions) reduces stress on Redis and the collector, useful when running many collector instances.
Partition sizing guidelines
| Scenario | Partitions | Partition Workers | Reasoning |
|---|---|---|---|
| Development | 2-4 | 2-4 | Minimal overhead, easy debugging |
| Standard Production (15k spans/sec) | 4-8 | 4-8 | Balanced parallelism and Redis key count |
| High Volume (more than 100k spans/sec) | 12-24 | 12-24 | Maximize throughput |
Important
Important sizing rules:
- partitions should be at least 3x the number of Redis nodes needed for your workload
- partition_workers should typically be ≤ partitions
- Changing partition count loses existing data - traces cannot be located after partition count changes
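The first two sizing rules are easy to check mechanically before rolling out a configuration. The sketch below is a hypothetical pre-deployment helper (the function name check_partition_sizing and its messages are assumptions); it only encodes the rules listed above and does not read any collector configuration.

```python
def check_partition_sizing(partitions, partition_workers, redis_nodes):
    """Return a list of warnings for settings that break the sizing rules."""
    warnings = []
    if partitions < 3 * redis_nodes:
        warnings.append(
            f"partitions={partitions} is below 3x the Redis node count ({3 * redis_nodes})"
        )
    if partition_workers > partitions:
        warnings.append(
            f"partition_workers={partition_workers} exceeds partitions={partitions}"
        )
    return warnings


print(check_partition_sizing(partitions=12, partition_workers=6, redis_nodes=3))  # []
print(check_partition_sizing(partitions=4, partition_workers=8, redis_nodes=3))
```

The third rule cannot be validated this way: changing the partition count on a running cluster loses existing data, so settle on a partition count before going to production.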
Partition configuration examples
Single collector (4-core machine):

    distributed_cache:
      partitions: 4
      partition_workers: 4
      partition_buffer_max_traces: 5000

Multi-collector (3 instances, 8-core each):

    distributed_cache:
      partitions: 12         # 3x more than single collector
      partition_workers: 6   # Each collector processes 6 partitions
      partition_buffer_max_traces: 10000

High-volume (10+ collectors):

    distributed_cache:
      partitions: 24
      partition_workers: 4   # Fewer per collector to share load
      partition_buffer_max_traces: 20000

Sizing and performance
Caution
Critical bottlenecks: Redis performance for tail sampling is primarily constrained by CPU and network, not memory. Focus your sizing and optimization efforts on:
- Network throughput and latency between collectors and Redis
- CPU capacity for compression/decompression and Lua script execution
- Memory capacity (typically sufficient if CPU and network are properly sized)
Proper Redis instance sizing requires understanding your workload characteristics:
- Spans per second: Example assumes 10,000 spans/sec throughput
- Average span size: Example assumes 900 bytes (marshaled protobuf format)
1. CPU requirements
CPU is the primary bottleneck for Redis in tail sampling workloads due to:
Compression/decompression overhead:
- Every span is compressed before storage and decompressed on retrieval
- snappy or lz4: ~5-15% CPU overhead per operation
- zstd: ~15-30% CPU overhead (higher compression ratio but more CPU intensive)
- For 10,000 spans/sec, expect 1-2 CPU cores dedicated to compression alone
Lua script execution:
- Atomic batch operations use Lua scripts for consistency
- Scripts execute on a single Redis core (Redis is single-threaded per operation)
- High evaluation rates can saturate a single core
- Recommendation: Use Redis cluster mode to distribute Lua execution across multiple nodes
CPU sizing guidelines:
Single Redis instance: Minimum 4 vCPUs for 10,000 spans/sec with compression
Redis cluster: 3+ nodes with 4 vCPUs each for high availability and load distribution
Without compression: Reduce CPU requirements by ~30-40% but increase network and memory needs
Tip
Monitoring CPU: Watch for CPU saturation (more than 80% utilization) as the first indicator of scaling needs. If CPU-bound, either add cluster nodes or reduce compression overhead.
2. Network requirements
Network bandwidth and latency directly impact sampling throughput:
Bandwidth calculations:
For 10,000 spans/sec at 900 bytes per span:
- Ingestion traffic (collectors → Redis): 10,000 × 900 bytes = 9 MB/sec = ~72 Mbps
- Evaluation traffic (Redis → collectors): ~9 MB/sec = ~72 Mbps (reading traces for evaluation)
- Total bidirectional: ~18 MB/sec = ~144 Mbps
With 25% compression (snappy/lz4):
- Compressed traffic: ~108 Mbps bidirectional
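The bandwidth figures above can be recomputed for your own span rate and span size. The following sketch assumes decimal units (1 Mbps = 1,000,000 bits/sec), that evaluation reads roughly mirror ingestion writes, and a flat 25% size reduction from compression, as in this section; the function name mbps is hypothetical.

```python
def mbps(spans_per_sec, span_bytes):
    """One-way traffic in megabits per second (decimal units)."""
    return spans_per_sec * span_bytes * 8 / 1_000_000


one_way = mbps(10_000, 900)               # collectors -> Redis
bidirectional = 2 * one_way               # plus Redis -> collectors for evaluation
compressed = bidirectional * (1 - 0.25)   # assumed 25% reduction (snappy/lz4)

print(f"one way: {one_way:.0f} Mbps")
print(f"bidirectional: {bidirectional:.0f} Mbps")
print(f"bidirectional, compressed: {compressed:.0f} Mbps")
```

These are steady-state averages. Traffic bursts and Redis protocol overhead are not included, so leave headroom when sizing network interfaces.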
Network sizing guidelines:
Co-located (same datacenter/VPC): 1 Gbps network interfaces are sufficient for most workloads
Cross-region: Expect 10-50ms latency - increase timeouts and use compression to reduce bandwidth
Connection pooling: The default pool_size of 20 supports ~5,000-10,000 spans/sec. Increase to 30-50 for higher throughput
Important
Network is critical: Round-trip time between collectors and Redis directly impacts end-to-end sampling latency. Deploy Redis with low-latency network connectivity (less than 5ms) to collectors. Use cluster mode with replica reads to distribute network load.
3. Memory requirements
While memory is less constrained than CPU and network, proper sizing prevents evictions and ensures data availability.
Memory estimation formula

    Total Memory = (Trace Data) + (Decision Caches) + (Overhead)

Trace data storage
Trace data is stored in Redis for the full traces_ttl period to support late-arriving spans and trace recovery:
Per-span storage: ~900 bytes (serialized protobuf)
Storage duration: Controlled by traces_ttl (1 hour in this sizing example; the parameter tables above list a default of 5m)
Active collection window: Controlled by trace_window_expiration (default: 30s)
Formula:

    Memory ≈ spans_per_second × traces_ttl × 900 bytes

Important
Active window vs. total retention: Traces are collected during an active window of ~30 seconds (trace_window_expiration), but persist in Redis for the full traces_ttl period (1 hour in this example). This allows the processor to handle late-arriving spans and recover orphaned traces. Your Redis sizing must account for the full retention period, not just the active window.
Example calculation: At 10,000 spans/second with a 1-hour traces_ttl:

    10,000 spans/sec × 3600 sec × 900 bytes = 32.4 GB

With lz4 compression (we have observed a 25% reduction):

    32.4 GB × 0.75 = 24.3 GB

Note: This calculation represents the primary memory consumer. Actual Redis memory usage may be slightly higher due to decision caches and internal data structures.
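Because trace data memory scales linearly with traces_ttl, it is worth recomputing the estimate for the retention you actually plan to run. The sketch below applies the formula from this section; decimal gigabytes, the 900-byte span size, and a flat compression reduction are assumptions, and the function name trace_data_gb is hypothetical.

```python
def trace_data_gb(spans_per_sec, traces_ttl_s, span_bytes=900,
                  compression_reduction=0.0):
    """Approximate Redis memory for trace data, in decimal gigabytes."""
    raw_bytes = spans_per_sec * traces_ttl_s * span_bytes
    return raw_bytes * (1 - compression_reduction) / 1e9


one_hour = trace_data_gb(10_000, 3600)
one_hour_lz4 = trace_data_gb(10_000, 3600, compression_reduction=0.25)
five_minutes = trace_data_gb(10_000, 5 * 60)

print(f"1h retention: {one_hour:.1f} GB, with lz4: {one_hour_lz4:.1f} GB")
print(f"5m retention: {five_minutes:.1f} GB")
```

At the same throughput, a 5-minute traces_ttl needs roughly one twelfth of the memory of a 1-hour retention, so the retention period is the main lever for Redis memory sizing.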
Decision cache storage
When `distributed_cache` is used, the decision caches are stored in Redis without explicit size limits. Instead, Redis applies its native LRU eviction policy (configured via `maxmemory-policy`) to manage memory. Each trace ID requires approximately 50 bytes of storage:
- Sampled cache: Managed by Redis LRU eviction
- Non-sampled cache: Managed by Redis LRU eviction
- Typical overhead per trace ID: ~50 bytes
Tip
Memory management: Configure Redis with `maxmemory-policy allkeys-lru` to allow automatic eviction of old decision cache entries when memory limits are reached. Decision cache keys use TTL-based expiration (controlled by `cache_ttl`) rather than fixed size limits.
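Because the decision caches have no fixed size, a rough upper bound can be derived from the ~50 bytes per trace ID quoted above. The sketch below assumes every decision stays cached for the full `cache_ttl` and is never evicted early. The function name, the trace rate, and the one-hour `cache_ttl` used in the example are hypothetical inputs, not documented defaults.

```python
def decision_cache_bytes(traces_per_second, cache_ttl_seconds, bytes_per_trace_id=50):
    """Rough upper bound: one cached decision per trace ID for the whole cache_ttl."""
    return traces_per_second * cache_ttl_seconds * bytes_per_trace_id


# Hypothetical workload: 500 new traces/sec (10,000 spans/sec at 20 spans per trace),
# with decisions kept for an assumed cache_ttl of 1 hour.
estimate = decision_cache_bytes(500, 3600)
print(f"{estimate / 1e6:.0f} MB")  # prints "90 MB"
```

Under these assumptions the decision caches are small next to the trace data, which is consistent with treating them as a minor addition in the sizing figures.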
Batch processing overhead
- Current batch queue: Minimal (trace IDs + scores in a sorted set)
- In-flight batches: `max_traces_per_batch × average_spans_per_trace × 900 bytes`
Example calculation: 500 traces per batch (default) with an average of 20 spans per trace:

    500 × 20 × 900 bytes = 9 MB per batch

Batch size influences memory usage during evaluation. In-flight batch memory is temporary and is released after processing completes.
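The same arithmetic as a small helper, in case you want to try other batch sizes. This is only a sketch: the function name is illustrative, the 500-trace default comes from the text above, and the 20 spans per trace is the example value rather than a measured figure.

```python
def in_flight_batch_bytes(max_traces_per_batch=500, average_spans_per_trace=20,
                          bytes_per_span=900):
    """Temporary memory held by one batch while it is being evaluated."""
    return max_traces_per_batch * average_spans_per_trace * bytes_per_span


batch = in_flight_batch_bytes()
print(f"{batch / 1e6:.0f} MB per batch")  # prints "9 MB per batch"
```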
Complete sizing example
Workload parameters:
- Throughput: 10,000 spans/second
- Average span size: 900 bytes
- Storage period: 1 hour (`traces_ttl`)
- Deployment: Redis cluster with 3 nodes
Resource requirements:
| Resource | Without Compression | With lz4 Compression (25% reduction) |
|---|---|---|
| CPU per node | 2-3 vCPUs | 3-4 vCPUs (compression overhead) |
| Network bandwidth | ~144 Mbps bidirectional | ~108 Mbps bidirectional |
| Memory (total) | ~40.5 GB + decision cache | ~30.4 GB + decision cache |
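The bandwidth row is consistent with each span crossing the network once in each direction (written to Redis, then read back for evaluation). That reading is an assumption made for this sketch, as is the function name; the table itself only gives the resulting figures.

```python
def bidirectional_mbps(spans_per_second, bytes_per_span=900, compression_reduction=0.0):
    """Approximate collector-to-Redis traffic in megabits per second, both directions."""
    one_way_bits = spans_per_second * bytes_per_span * 8 * (1.0 - compression_reduction)
    # Assumption: every span is sent to Redis once and fetched once.
    return 2 * one_way_bits / 1e6


print(f"{bidirectional_mbps(10_000):.0f} Mbps without compression")
print(f"{bidirectional_mbps(10_000, compression_reduction=0.25):.0f} Mbps with lz4")
```

For the example workload this reproduces the ~144 Mbps and ~108 Mbps values in the table.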
Memory breakdown with compression:
| Component | Memory required |
|---|---|
| Trace data (1-hour retention) | 24.3 GB |
| Decision caches | Variable (LRU-managed) |
| Batch processing | ~7 MB |
| Redis overhead (25%) | ~6.1 GB |
| Total (minimum) | ~30.4 GB + decision cache |
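The breakdown can be reproduced for other workloads with a short script. This is a sketch under the table's own assumptions: 900 bytes per span, a 25% lz4 reduction, and 25% Redis overhead applied to the trace data. The function and key names are illustrative, and the decision caches and the small batch-processing term are left out, as in the "Total (minimum)" row.

```python
GB = 1e9  # decimal gigabytes, as used in the table


def sizing_breakdown(spans_per_second, traces_ttl_seconds, bytes_per_span=900,
                     compression_reduction=0.25, redis_overhead=0.25):
    """Return the main memory components in GB for a given workload."""
    trace_data = (spans_per_second * traces_ttl_seconds * bytes_per_span
                  * (1.0 - compression_reduction))
    overhead = trace_data * redis_overhead
    return {
        "trace_data_gb": trace_data / GB,
        "redis_overhead_gb": overhead / GB,
        "total_min_gb": (trace_data + overhead) / GB,
    }


breakdown = sizing_breakdown(10_000, 3600)
for name, value in breakdown.items():
    print(f"{name}: {value:.1f}")
```

Passing `compression_reduction=0.0` gives the uncompressed total of about 40.5 GB shown in the resource table.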
Recommended Redis cluster configuration:

    # 3-node Redis cluster (e.g., AWS cache.r6g.xlarge)
    Nodes: 3
    vCPUs per node: 4
    Memory per node: 25 GB (75 GB total cluster)
    Network: 1 Gbps or better
    Region: Co-located with collectors (less than 5 ms latency)

Important
Sizing guidance:
- CPU-first approach: Size for CPU requirements first, then verify memory and network adequacy
- Cluster mode strongly recommended: Distributes CPU, network, and memory load across nodes
- Monitoring: Track CPU utilization, network throughput, and memory usage to identify bottlenecks
- Scaling: If CPU-bound (more than 70% utilization), add cluster nodes. If network-bound, enable compression or add nodes
- Buffer for spikes: Provision 20-30% additional capacity beyond steady-state requirements
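The buffer and scaling rules above can be expressed as two small helpers. This is an illustrative sketch only: the function names and the returned strings are made up for the example, and the 25% default buffer is simply the midpoint of the 20-30% range.

```python
def provisioned_capacity(steady_state, buffer_fraction=0.25):
    """Add spike headroom (20-30% suggested above) to a steady-state requirement."""
    return steady_state * (1.0 + buffer_fraction)


def scaling_hint(cpu_utilization, network_bound=False):
    """Apply the scaling guidance: CPU above 70% first, then network."""
    if cpu_utilization > 0.70:
        return "add cluster nodes"
    if network_bound:
        return "enable compression or add nodes"
    return "no action"


print(f"{provisioned_capacity(30.4):.1f} GB")  # 30.4 GB steady state plus 25% buffer
print(scaling_hint(0.82))
print(scaling_hint(0.40, network_bound=True))
```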
Default configuration architecture
The default configuration values are designed for a reference deployment supporting 1 million spans per minute (~16,700 spans/sec):
Collector deployment:
- 3 collector instances
- 4 vCPUs per instance
- 8 GB RAM per instance
Redis cluster:
- 3 Redis instances (AWS cache.r6g.xlarge: 4 vCPUs, 25.01 GiB memory each)
- Configured as a cluster for high availability and load distribution
- Co-located with collectors for low-latency access
This reference architecture provides a starting point for production deployments. Adjust based on your actual throughput and latency requirements.
Metrics reference
The tail sampling processor emits the following metrics in Redis-distributed mode to help you monitor performance and diagnose issues.
Available metrics
| Metric name | Dimensions | Description | Use case |
|---|---|---|---|
| `otelcol_processor_tail_sampling_batches` | partition, processor | Number of batch operations | Monitor batch processing rate across partitions |
| `otelcol_processor_tail_sampling_sampling_decision_timer_latency` | partition, processor | Sampling decision timer latency (ms) | Track overall evaluation performance per partition |
| `otelcol_processor_tail_sampling_sampling_policy_evaluation_error` | partition, processor | Policy evaluation error count | Detect policy configuration issues |
| `otelcol_processor_tail_sampling_count_traces_sampled` | policy, decision, partition, processor | Count of traces sampled/not sampled per policy | Track per-policy sampling decisions |
| `otelcol_processor_tail_sampling_count_spans_sampled` | policy, decision, partition, processor | Count of spans sampled/not sampled per policy | Span-level sampling statistics |
| `otelcol_processor_tail_sampling_global_count_traces_sampled` | decision, partition, processor | Global count of traces sampled by at least one policy | Overall sampling rate monitoring |
| `otelcol_processor_tail_sampling_early_releases_from_cache_decision` | sampled | Spans immediately released due to cache hit | Decision cache effectiveness |
| `otelcol_processor_tail_sampling_new_trace_id_received` | partition, processor | Count of new traces received | Trace ingestion rate per partition |
| `otelcol_processor_tail_sampling_new_span_received` | partition, processor | Count of new spans received | Span ingestion rate per partition |
| `otelcol_processor_tail_sampling_traces_dropped` | partition, processor | Traces dropped due to saving errors | Error detection and troubleshooting |
| `otelcol_processor_tail_sampling_spans_dropped` | partition, processor | Spans dropped due to saving errors | Error detection and troubleshooting |
| `otelcol_processor_tail_sampling_count_traces_deleted` | deleted, partition, processor | Count of traces deleted from storage | Cleanup monitoring |
Dimension details
- `policy`: Name of the sampling policy that made the decision
- `sampled`: Whether the decision was to sample (`true`/`false`)
- `decision`: The sampling decision type (`sampled`, `not_sampled`, `dropped`)
- `deleted`: Whether deletion was successful (`true`/`false`)
- `partition`: Partition identifier (hex-encoded hash, e.g., `{a1b2c3d4...}`), which ensures Redis Cluster hash tag compatibility
- `processor`: Processor instance identifier (from the `distributed_cache.processor_name` config)
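As an example of using the `decision` dimension, the overall sampling rate can be derived from `otelcol_processor_tail_sampling_global_count_traces_sampled` once its values are summed across partitions. The sketch below assumes you have already read those counter values from your metrics backend; the function name and the numbers are hypothetical, and counting `dropped` traces in the denominator is a choice made for this example.

```python
def sampling_rate(counts_by_decision):
    """Fraction of traces with decision "sampled" among all counted decisions."""
    sampled = counts_by_decision.get("sampled", 0)
    total = sum(counts_by_decision.values())
    return sampled / total if total else 0.0


# Hypothetical counter values, already summed across partitions.
counts = {"sampled": 1_200, "not_sampled": 10_800, "dropped": 0}
print(f"{sampling_rate(counts):.1%}")  # prints "10.0%"
```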
Tip
Partition identifiers: Partition values are deterministic SHA256 hashes of the partition index combined with the processor name. Check collector logs at startup to see the mapping of partition indices to hash values.
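To illustrate what a deterministic, hash-tag-compatible identifier looks like, here is a minimal sketch. The way the processor name and partition index are joined (`"<name>:<index>"`) and the truncation to 8 hex characters are assumptions made for this example; the processor's actual input layout and identifier length are not documented here, so use the startup logs for the real mapping.

```python
import hashlib


def partition_id(processor_name, partition_index, hex_chars=8):
    """Illustrative partition identifier: truncated SHA256 hex digest in braces."""
    # Hypothetical input layout; the processor may combine these values differently.
    data = f"{processor_name}:{partition_index}".encode("utf-8")
    digest = hashlib.sha256(data).hexdigest()
    # Braces form a Redis Cluster hash tag, so keys sharing the tag map to the same slot.
    return "{" + digest[:hex_chars] + "}"


ids = [partition_id("itc", index) for index in range(4)]
print(ids)
```

Because the hash depends only on its inputs, every collector configured with the same `processor_name` would compute the same identifier for a given partition index.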
Redis-compatible cache requirements
The processor uses the cache as distributed storage for the following trace data:
- Trace and span attributes
- Active trace data
- Sampling decision cache
The processor runs a Lua script to interact atomically with the Redis cache. Lua script support is typically enabled by default in Redis-compatible caches. No additional configuration is required unless you have explicitly disabled this feature.