
How Senren Works

Senren uses a state-based synchronization architecture to manage infrastructure across multiple clouds.

The Core Insight

Instead of sending individual commands ("create this", "delete that"), the control plane publishes complete desired state. Regional planes compute diffs and reconcile.

This provides:

  • Idempotency: Apply the same state 100 times = same result
  • Eventual consistency: Network partitions don't break the system
  • Regional autonomy: Each region operates independently
  • Auditability: Full state history in Kafka
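The idempotency property can be illustrated with a small sketch. The `reconcile` function and the dictionary-shaped state are hypothetical stand-ins for illustration, not Senren's actual implementation; the point is only that converging on a complete desired state gives the same result no matter how often it is repeated.

```python
def reconcile(current: dict, desired: dict) -> dict:
    """Return the state obtained by making `current` match `desired`."""
    result = dict(current)
    for name in list(result):
        if name not in desired:
            del result[name]      # no longer desired: remove
    for name, spec in desired.items():
        result[name] = spec       # create or overwrite with the desired spec
    return result


# Hypothetical resource names and specs, for illustration only.
desired = {"cache": {"type": "redis", "replicas": 3}}
state = {"old-db": {"type": "redis", "replicas": 1}}

# Applying the same desired state repeatedly converges once and then stays put.
for _ in range(100):
    state = reconcile(state, desired)
```

A command-based design behaves differently: replaying a "create" command against a resource that already exists either fails or produces a duplicate, so the sender has to track which commands were already delivered.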

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                     Control Plane                            │
│  ┌──────────────┐      ┌───────────┐      ┌──────────────┐ │
│  │ Python SDK   │─────▶│ gRPC API  │─────▶│ PostgreSQL   │ │
│  │ (your code)  │      │           │      │ (state store)│ │
│  └──────────────┘      └───────────┘      └──────────────┘ │
│                              │                               │
│                              ▼                               │
│                       ┌─────────────┐                        │
│                       │   Kafka     │                        │
│                       │ (messaging) │                        │
└───────────────────────┴─────────────┴────────────────────────┘
                ┌───────────────┼───────────────┐
                ▼               ▼               ▼
      ┌─────────────┐   ┌─────────────┐ ┌─────────────┐
      │AWS us-east-1│   │GCP us-cent1 │ │GCP eu-west1 │
      │Regional     │   │Regional     │ │Regional     │
      │Plane        │   │Plane        │ │Plane        │
      │             │   │             │ │             │
      │  ┌────────┐ │   │  ┌────────┐ │ │  ┌────────┐ │
      │  │K8s API │ │   │  │K8s API │ │ │  │K8s API │ │
      │  └────────┘ │   │  └────────┘ │ │  └────────┘ │
      │  ┌────────┐ │   │  ┌────────┐ │ │  ┌────────┐ │
      │  │Operator│ │   │  │Operator│ │ │  │Operator│ │
      │  └────────┘ │   │  └────────┘ │ │  └────────┘ │
      └─────────────┘   └─────────────┘ └─────────────┘

Components

1. Control Plane

The control plane is the single source of truth for desired state.

Responsibilities:

  • Receives gRPC requests from Python SDK
  • Stores desired state in PostgreSQL
  • Publishes complete RegionalState to Kafka (compacted topic)
  • Consumes status updates from regional planes

Key characteristics:

  • Single deployment (HA for production)
  • Stateful (PostgreSQL database)
  • Cloud-agnostic (runs anywhere)
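As a rough sketch of what "publishes complete RegionalState" could look like, the snippet below groups desired databases by region and produces one message per region. The JSON encoding, the field names (`name`, `region`, `spec`), and the `build_state_messages` helper are assumptions made for illustration; the real wire format is not described here.

```python
import json
from collections import defaultdict


def build_state_messages(databases):
    """Group desired databases by region into one complete message per region."""
    by_region = defaultdict(list)
    for db in databases:
        by_region[db["region"]].append({"name": db["name"], "spec": db["spec"]})
    # One (key, value) pair per region. The value always carries the region's
    # full desired state, never a delta.
    return {
        region: json.dumps({"region": region, "databases": dbs}, sort_keys=True).encode()
        for region, dbs in by_region.items()
    }


# Hypothetical input; region keys follow the provider:region shape.
messages = build_state_messages([
    {"name": "sessions", "region": "aws:us-east-1", "spec": {"replicas": 3}},
    {"name": "cache", "region": "aws:us-east-1", "spec": {"replicas": 1}},
    {"name": "cache", "region": "gcp:us-central1", "spec": {"replicas": 1}},
])
```

Keying each message by region is what lets a compacted topic retain exactly one current state per regional plane.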

2. Regional Planes

Each cloud provider + region combination has a regional plane.

Responsibilities:

  • Consumes RegionalState from Kafka
  • Lists current Kubernetes CRDs
  • Computes diff between desired and current state
  • Directly applies CRD changes to Kubernetes API
  • Polls CRD status every 5 seconds
  • Publishes status to Kafka via outbox pattern

Key characteristics:

  • Stateless (can restart and reconcile instantly)
  • Autonomous (operates independently if Kafka is available)
  • One per cloud/region/cluster combination

Example deployments:

  • aws-us-east-1-prod → AWS us-east-1 production cluster
  • aws-us-east-1-shadow → AWS us-east-1 shadow traffic cluster
  • gcp-us-central1-prod → GCP us-central1 production cluster

3. Controllers

Kubernetes operators that provision actual infrastructure.

Responsibilities:

  • Watch CRDs (RedisDatabase, etc.)
  • Provision infrastructure (StatefulSets, Services)
  • Update CRD .status with runtime info (ready, host, port)

Key characteristics:

  • One per resource type (Redis controller, Aerospike controller, etc.)
  • Standard Kubernetes controller pattern
  • Deployed in each regional cluster

Critical insight: Without controllers, CRDs exist but nothing actually runs. Controllers bridge CRDs to actual infrastructure.
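To make the controller's role concrete, here is a simplified, in-memory sketch of one reconcile pass for a Redis resource. It does not talk to Kubernetes. The `reconcile_redis` function, the `replicas` and `port` spec fields, and the `ready_replicas` input are assumptions for illustration; only the `.status` fields (ready, host, port) come from the description above.

```python
def reconcile_redis(crd: dict, ready_replicas: int) -> tuple[list[dict], dict]:
    """Return the child objects a controller would want and the status to report."""
    meta = crd["metadata"]
    name = meta["name"]
    namespace = meta.get("namespace", "default")
    replicas = crd["spec"].get("replicas", 1)
    port = crd["spec"].get("port", 6379)  # 6379 is the default Redis port

    # Child objects the controller would create or update for this CRD.
    children = [
        {"kind": "StatefulSet", "name": name, "namespace": namespace, "replicas": replicas},
        {"kind": "Service", "name": name, "namespace": namespace, "port": port},
    ]
    # Runtime info written back to the CRD's .status.
    status = {
        "ready": ready_replicas >= replicas,
        # Conventional in-cluster DNS name of a Service (default cluster domain).
        "host": f"{name}.{namespace}.svc.cluster.local",
        "port": port,
    }
    return children, status


# Hypothetical CRD instance, for illustration only.
crd = {"metadata": {"name": "cache", "namespace": "prod"}, "spec": {"replicas": 3}}
children, status = reconcile_redis(crd, ready_replicas=3)
```

A real operator would run this logic in response to watch events and would observe `ready_replicas` from the StatefulSet rather than receive it as an argument.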

Data Flow

Applying State (Control → Regional)

  1. Client sends gRPC request

    client.apply_state(databases=[...])
    

  2. Control plane stores in PostgreSQL

    • Inserts desired state into state table
    • Creates outbox entry for Kafka

  3. Outbox processor publishes to Kafka

    • Topic: {prefix}-state (compacted)
    • Key: region (e.g., aws:us-east-1)
    • Value: Complete RegionalState for that region

  4. Regional plane consumes Kafka message

    • Receives complete desired state
    • Lists current CRDs from Kubernetes
    • Computes diff: desired - current

  5. Regional plane applies to Kubernetes

    • Creates new CRDs
    • Updates modified CRDs
    • Deletes removed CRDs
    • Direct Kubernetes API calls (no outbox for this!)

  6. Controller provisions infrastructure

    • Watches CRD creation/updates
    • Creates StatefulSet + Service
    • Updates CRD .status.ready = true
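The diff in steps 4 and 5 can be sketched as a comparison of two name-to-spec mappings. The `compute_diff` helper and the dictionary representation of CRDs are hypothetical; they show the create/update/delete split, not Senren's internal data structures.

```python
def compute_diff(desired: dict, current: dict):
    """Split the work into CRDs to create, update, and delete."""
    to_create = {name: spec for name, spec in desired.items() if name not in current}
    to_update = {
        name: spec
        for name, spec in desired.items()
        if name in current and current[name] != spec
    }
    to_delete = sorted(name for name in current if name not in desired)
    return to_create, to_update, to_delete


# Hypothetical desired state (from Kafka) and current state (listed from Kubernetes).
desired = {"cache": {"replicas": 3}, "sessions": {"replicas": 1}}
current = {"cache": {"replicas": 1}, "legacy": {"replicas": 1}}

to_create, to_update, to_delete = compute_diff(desired, current)
```

Because the diff is recomputed from a fresh listing each time, running it again after the changes have been applied yields three empty results.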

Status Reporting (Regional → Control)

  1. Regional plane polls CRD status

    • Every 5 seconds
    • Reads .status from Kubernetes API

  2. Regional plane publishes via outbox

    • Inserts into regional PostgreSQL outbox
    • Outbox processor → Kafka
    • Topic: {prefix}-status

  3. Control plane consumes status

    • Updates status in PostgreSQL
    • Available via gRPC GetStatus API
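A minimal sketch of the polling side of this flow follows. The `poll_status` function, its callable parameters, and the message shape are assumptions for illustration; `read_status` stands in for a Kubernetes API read and `enqueue` for an outbox insert.

```python
import time


def poll_status(read_status, enqueue, interval_s=5.0, iterations=None, sleep=time.sleep):
    """Read CRD status on a fixed interval and hand each result to the outbox."""
    done = 0
    while iterations is None or done < iterations:
        for name, status in read_status().items():
            enqueue({"resource": name, "status": status})
        done += 1
        if iterations is None or done < iterations:
            sleep(interval_s)  # wait before the next poll


# Usage with in-memory stand-ins; sleeping is stubbed out so the example is instant.
outbox = []
poll_status(
    read_status=lambda: {"cache": {"ready": True}},
    enqueue=outbox.append,
    iterations=2,
    sleep=lambda seconds: None,
)
```

Passing `sleep` and `iterations` as parameters keeps the loop testable without waiting for real time to pass.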

Key Design Decisions

State-based, not event-based

Why? Event-based systems require tracking event history to reconstruct state. With state-based sync:

  • Regional planes don't need to track history
  • Can restart and immediately reconcile
  • Kafka topic compaction keeps only latest state (space-efficient)
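The effect of compaction on a topic keyed by region can be modelled in a few lines. This is a simplified model, not Kafka itself: real compaction runs asynchronously in the background, so older records may remain visible for a while. The `compact` function and the sample records are illustrative assumptions.

```python
def compact(log):
    """Keep only the most recent value for each key, as a compacted topic eventually does."""
    latest = {}
    for key, value in log:
        latest[key] = value  # later records replace earlier ones with the same key
    return latest


# Hypothetical records: (key, complete state for that region), in publish order.
log = [
    ("aws:us-east-1", {"databases": ["cache"]}),
    ("gcp:us-central1", {"databases": ["cache"]}),
    ("aws:us-east-1", {"databases": ["cache", "sessions"]}),
]

snapshot = compact(log)
```

A regional plane that restarts only needs the surviving record for its own key, which is why it can reconcile without replaying history.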

Kafka for messaging

Why not direct database replication?

  • Kafka provides natural pub/sub (1 control → N regional planes)
  • Audit trail of all state changes
  • Decouples control and regional planes
  • Regional planes can be offline and catch up

Outbox pattern

Why? Ensures transactional consistency:

  • Control plane: DB write + Kafka publish must be atomic
  • Regional plane: Status update + Kafka publish must be atomic
  • Outbox pattern guarantees no lost messages

Exception: Regional plane → Kubernetes API doesn't use an outbox, because applying the diff is idempotent: the regional plane lists the current CRDs before each pass, so repeating a reconciliation converges to the same result.
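The outbox pattern can be sketched with SQLite standing in for PostgreSQL. The table layout, the `save_state` and `drain_outbox` helpers, and the `example-state` topic name are assumptions for illustration; the essential part is that the state row and the outbox row are written in one transaction, and publishing happens afterwards.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for PostgreSQL here
conn.executescript("""
    CREATE TABLE state (region TEXT PRIMARY KEY, payload TEXT NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        topic TEXT NOT NULL,
        msg_key TEXT NOT NULL,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
""")


def save_state(conn, region, state):
    """Write desired state and its outbox entry atomically."""
    payload = json.dumps(state, sort_keys=True)
    with conn:  # one transaction: both rows commit, or neither does
        conn.execute("INSERT OR REPLACE INTO state (region, payload) VALUES (?, ?)",
                     (region, payload))
        conn.execute("INSERT INTO outbox (topic, msg_key, payload) VALUES (?, ?, ?)",
                     ("example-state", region, payload))


def drain_outbox(conn, publish):
    """Publish pending rows in order; a row is marked only after publish succeeds."""
    rows = conn.execute(
        "SELECT id, topic, msg_key, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, msg_key, payload in rows:
        publish(topic, msg_key, payload)  # if this raises, the row stays pending
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


sent = []  # in-memory stand-in for a Kafka producer
save_state(conn, "aws:us-east-1", {"databases": ["cache"]})
published = drain_outbox(conn, lambda topic, key, payload: sent.append((topic, key, payload)))
```

If the process crashes after publishing but before marking the row, the message is sent again on the next drain. That gives at-least-once delivery, and a duplicate is harmless when every message carries complete state.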

Regional autonomy

Why? Centralized orchestration doesn't scale:

  • Each regional plane operates independently
  • No central bottleneck
  • Network partition doesn't break regional operations
  • Can add new regions without control plane changes

Timing Expectations

Based on test-event-flow.md:

  1. Client → Control plane: Immediate (gRPC)
  2. Control plane → Kafka: ~100ms (outbox processing)
  3. Kafka → Regional plane: ~1s (consumer poll interval)
  4. Regional plane → Kubernetes: Immediate
  5. Controller → Infrastructure: ~5-10s (StatefulSet creation)
  6. CRD status update: Immediate (controller watch)
  7. Status → Kafka: ~100ms (outbox processing)
  8. Kafka → Control plane: ~1s (consumer poll interval)

Total latency (end-to-end): ~10-15 seconds from apply_state() to infrastructure ready.
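As a rough check on that figure, the stages above can be summed directly. Treating "Immediate" as 0 seconds is an assumption, and the stage labels are shortened for readability.

```python
# (stage, low estimate in seconds, high estimate in seconds)
stages = [
    ("client -> control plane (gRPC)", 0.0, 0.0),
    ("control plane -> Kafka (outbox)", 0.1, 0.1),
    ("Kafka -> regional plane (poll)", 1.0, 1.0),
    ("regional plane -> Kubernetes", 0.0, 0.0),
    ("controller -> infrastructure", 5.0, 10.0),
    ("CRD status update", 0.0, 0.0),
    ("status -> Kafka (outbox)", 0.1, 0.1),
    ("Kafka -> control plane (poll)", 1.0, 1.0),
]

low = round(sum(lo for _, lo, _ in stages), 1)
high = round(sum(hi for _, _, hi in stages), 1)

# The 5-second status poll is not itemized above; it can add up to one interval.
status_poll_interval_s = 5.0
high_with_poll = round(high + status_poll_interval_s, 1)

print(f"listed stages: {low}-{high} s, with status poll: up to {high_with_poll} s")
```

The itemized stages alone account for roughly 7 to 12 seconds. The wait for the next status poll plausibly covers the rest of the quoted ~10-15 second range.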

Next Steps