
The Middleware Stack: Auth, Rate Limiting, and Idempotency

How 0fee.dev handles API key auth, Redis-based rate limiting, and idempotency keys to prevent duplicate payments. By Juste A. Gnimavo and Claude.

Thales & Claude | March 25, 2026 · 10 min read · 0fee
Tags: middleware, authentication, rate-limiting, idempotency, security

Every API request to 0fee.dev passes through a middleware stack before reaching a route handler. This stack handles three concerns: verifying that the caller is authorized, ensuring they are not exceeding their rate limit, and preventing duplicate payments when network issues cause retries. These are not optional features for a payment platform -- they are the foundation of trust between the platform and its merchants.

API Key Authentication

Key Format and Scopes

0fee.dev uses prefix-based API keys that encode environment and type information directly in the key string:

| Prefix | Type | Environment | Usage |
| --- | --- | --- | --- |
| `sk_live_` | Secret key | Production | Server-side API calls |
| `sk_sand_` | Secret key | Sandbox | Server-side testing |
| `pk_live_` | Publishable key | Production | Client-side (checkout widget) |
| `pk_sand_` | Publishable key | Sandbox | Client-side testing |

The prefix convention was borrowed from Stripe, and for good reason: it allows developers to instantly distinguish between key types. A pk_ key visible in JavaScript source code is not a security incident -- it is designed for client-side use with limited permissions.
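Key generation is not shown in this article, but the lookup logic later in this section implies a plausible shape: prefix plus random secret, with only a SHA-256 hash persisted. A sketch under those assumptions (the function name and token format are ours, not the actual 0fee.dev code):

```python
# Hypothetical key-generation sketch -- names and token format assumed.
import hashlib
import secrets

def generate_api_key(key_type: str, environment: str) -> tuple[str, str]:
    """Generate a raw API key and the SHA-256 hash stored in the database.

    Only the hash is persisted; the raw key is shown to the merchant once.
    """
    prefix = {
        ("secret", "production"): "sk_live_",
        ("secret", "sandbox"): "sk_sand_",
        ("publishable", "production"): "pk_live_",
        ("publishable", "sandbox"): "pk_sand_",
    }[(key_type, environment)]
    raw_key = prefix + secrets.token_urlsafe(32)
    key_hash = hashlib.sha256(raw_key.encode()).hexdigest()
    return raw_key, key_hash
```

Storing only the hash means a database leak does not expose usable keys; the lookup in `authenticate_api_key` below hashes the presented key and matches against this column.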

Scope Enforcement

Each API key has a set of scopes that determine which operations it can perform:

```python
# middleware/auth.py
import hashlib
import json

from fastapi import HTTPException, Request

from db import get_db  # project-local DB helper (module path assumed)

KEY_SCOPES = {
    "sk_": ["payments", "checkout", "apps", "webhooks", "customers"],
    "pk_": ["checkout", "payments:read"],
}

async def authenticate_api_key(
    request: Request,
) -> dict:
    """
    Extract and validate API key from Authorization header.

    Returns:
        Dict with app_id, environment, scopes, and key_type.
    """
    auth_header = request.headers.get("Authorization", "")

    if not auth_header.startswith("Bearer "):
        raise HTTPException(
            status_code=401,
            detail="Missing or invalid Authorization header"
        )

    api_key = auth_header[7:]  # Strip "Bearer "

    # Determine key type and environment from prefix
    if api_key.startswith("sk_live_"):
        key_type = "secret"
        environment = "production"
    elif api_key.startswith("sk_sand_") or api_key.startswith("sk_test_"):
        key_type = "secret"
        environment = "sandbox"
    elif api_key.startswith("pk_live_"):
        key_type = "publishable"
        environment = "production"
    elif api_key.startswith("pk_sand_") or api_key.startswith("pk_test_"):
        key_type = "publishable"
        environment = "sandbox"
    else:
        raise HTTPException(
            status_code=401,
            detail="Invalid API key format"
        )

    # Hash the key and look it up in the database
    key_hash = hashlib.sha256(api_key.encode()).hexdigest()

    with get_db() as conn:
        key_row = conn.execute(
            """
            SELECT ak.id, ak.app_id, ak.scopes, ak.is_active,
                   a.user_id, a.name as app_name, a.is_active as app_active
            FROM api_keys ak
            JOIN apps a ON ak.app_id = a.id
            WHERE ak.key_hash = ?
            """,
            (key_hash,)
        ).fetchone()

    if not key_row:
        raise HTTPException(status_code=401, detail="Invalid API key")

    if not key_row["is_active"]:
        raise HTTPException(status_code=401, detail="API key has been revoked")

    if not key_row["app_active"]:
        raise HTTPException(status_code=401, detail="Application is inactive")

    return {
        "app_id": key_row["app_id"],
        "user_id": key_row["user_id"],
        "environment": environment,
        "key_type": key_type,
        "scopes": json.loads(key_row["scopes"]),
        "app_name": key_row["app_name"],
    }
```

Session Token Auth for Dashboard

Dashboard users do not use API keys. Instead, they authenticate with email/password and receive a session token stored in DragonflyDB:

```python
import json

from fastapi import Request

async def authenticate_session(request: Request) -> dict | None:
    """
    Authenticate a dashboard session from cookie or header.

    Returns:
        User dict if authenticated, None otherwise.
    """
    session_token = (
        request.cookies.get("zerofee_session")
        or request.headers.get("X-Session-Token")
    )

    if not session_token:
        return None

    # Look up session in DragonflyDB
    from cache import cache_client
    session_data = await cache_client.get(f"session:{session_token}")

    if not session_data:
        return None

    return json.loads(session_data)
```

The dual authentication system means API endpoints can be accessed by either mechanism. Route handlers check for API key auth first, then fall back to session auth:

```python
from fastapi import HTTPException, Request

async def get_auth_context(request: Request) -> dict:
    """Get authentication context from either API key or session."""
    # Try API key first
    auth_header = request.headers.get("Authorization", "")
    if auth_header.startswith("Bearer ") and (
        auth_header[7:].startswith("sk_") or
        auth_header[7:].startswith("pk_")
    ):
        return await authenticate_api_key(request)

    # Try session
    session = await authenticate_session(request)
    if session:
        return session

    raise HTTPException(status_code=401, detail="Authentication required")
```

Billing Suspension Check

After authentication, the middleware checks whether the merchant's account is suspended due to unpaid invoices:

```python
from fastapi import HTTPException

from db import get_db  # project-local DB helper (module path assumed)

async def check_billing_status(app_id: str):
    """
    Check if the app has unpaid invoices that trigger suspension.

    Raises:
        HTTPException(402 Payment Required) if the account is suspended.
    """
    with get_db() as conn:
        unpaid = conn.execute(
            """
            SELECT COUNT(*) as count FROM invoices
            WHERE app_id = ?
              AND status = 'overdue'
              AND due_date < date('now', '-7 days')
            """,
            (app_id,)
        ).fetchone()

    if unpaid and unpaid["count"] > 0:
        raise HTTPException(
            status_code=402,
            detail={
                "error": "payment_required",
                "message": "Your account has overdue invoices. "
                           "Please settle your balance to continue "
                           "processing payments.",
                "dashboard_url": "https://dashboard.0fee.dev/billing",
            }
        )
```

The 402 status code -- Payment Required -- is rarely used in practice, but it is semantically perfect for this case. The merchant's API integration receives a clear signal that the issue is billing, not authentication or authorization.

Redis-Based Rate Limiting

The Sliding Window Algorithm

Rate limiting uses a sliding window algorithm implemented in DragonflyDB. Each API key gets a configurable request budget per time window:

```python
# middleware/rate_limit.py
import time

from fastapi import HTTPException

from cache import cache_client

# Rate limits per key type
RATE_LIMITS = {
    "secret": {"requests": 1000, "window": 60},      # 1000/min
    "publishable": {"requests": 100, "window": 60},  # 100/min
    "session": {"requests": 500, "window": 60},      # 500/min
}

async def check_rate_limit(
    identifier: str,
    key_type: str = "secret"
) -> dict:
    """
    Check and enforce rate limit for an API key.

    Args:
        identifier: The API key prefix or session token
        key_type: "secret", "publishable", or "session"

    Returns:
        Dict with limit, remaining, and reset fields.

    Raises:
        HTTPException(429) if rate limit exceeded.
    """
    limits = RATE_LIMITS.get(key_type, RATE_LIMITS["secret"])
    max_requests = limits["requests"]
    window_seconds = limits["window"]

    cache_key = f"ratelimit:{identifier}"
    now = time.time()

    try:
        # Use Redis sorted set for sliding window
        pipe = cache_client.pipeline()

        # Remove entries older than the window
        pipe.zremrangebyscore(cache_key, 0, now - window_seconds)

        # Count entries in the current window
        pipe.zcard(cache_key)

        # Add the current request
        pipe.zadd(cache_key, {str(now): now})

        # Set TTL on the key
        pipe.expire(cache_key, window_seconds)

        results = await pipe.execute()
        current_count = results[1]

        remaining = max(0, max_requests - current_count - 1)
        reset_time = int(now + window_seconds)

        if current_count >= max_requests:
            raise HTTPException(
                status_code=429,
                detail="Rate limit exceeded",
                headers={
                    "X-RateLimit-Limit": str(max_requests),
                    "X-RateLimit-Remaining": "0",
                    "X-RateLimit-Reset": str(reset_time),
                    "Retry-After": str(window_seconds),
                },
            )

        return {
            "limit": max_requests,
            "remaining": remaining,
            "reset": reset_time,
        }

    except ConnectionError:
        # DragonflyDB is down -- fail open
        return {
            "limit": max_requests,
            "remaining": max_requests,
            "reset": int(now + window_seconds),
        }
```

Graceful Degradation

The most important design decision in the rate limiter is the except ConnectionError block at the bottom. When DragonflyDB is unavailable -- whether due to a restart, network issue, or crash -- the rate limiter fails open. Requests pass through unthrottled.

This is a deliberate trade-off. For a payment platform, a temporarily missing rate limiter is far less harmful than blocking legitimate payment requests. If DragonflyDB is down for 30 seconds, some extra requests might get through. If the rate limiter blocks payments because it cannot reach the cache, merchants lose revenue.
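The fail-open behaviour can be factored into a small helper so every cache-backed check degrades the same way. A sketch under our own naming (the `fail_open` decorator is illustrative, not part of the 0fee.dev codebase, which inlines this logic):

```python
# Illustrative fail-open helper -- the production code inlines this pattern.
import functools

def fail_open(default):
    """Return `default` instead of raising when the cache is unreachable."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except ConnectionError:
                return default
        return wrapper
    return decorator

@fail_open(default={"limit": 1000, "remaining": 1000})
def check_limit_or_fail(identifier: str) -> dict:
    # Simulate the cache being unreachable
    raise ConnectionError("cache down")
```

The key property is that the default is permissive: a cache outage widens the limit rather than closing the API.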

Rate Limit Headers

Every API response includes rate limit headers, following the standard convention:

X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1702234567

When the limit is exceeded, the response includes a Retry-After header:

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1702234567
Retry-After: 60

{
    "error": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Please retry after 60 seconds."
}
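On the merchant side, a well-behaved integration reads these headers and waits out a 429 instead of hammering the API. A minimal client-side sketch, under our own assumptions (the `request_with_backoff` helper and its `(status, headers, body)` transport shape are illustrative, not an official 0fee.dev SDK):

```python
# Hypothetical client-side retry helper -- not part of any 0fee.dev SDK.
import time

def request_with_backoff(send, max_attempts: int = 3):
    """Call `send()` and, on a 429, sleep for Retry-After before retrying.

    `send` is a zero-argument callable returning (status_code, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        # Honour the server's Retry-After hint, defaulting to 1 second
        wait = int(headers.get("Retry-After", "1"))
        if attempt < max_attempts - 1:
            time.sleep(wait)
    return status, body
```

Respecting `Retry-After` instead of retrying immediately keeps a throttled client from spending its fresh window on requests that were already doomed.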

Idempotency Key Handling

Why Idempotency Matters for Payments

Consider this scenario: a merchant's server sends a payment request to 0fee.dev. The request succeeds, and 5,000 XOF is charged to the customer's Orange Money account. But the HTTP response is lost due to a network timeout. The merchant's server, not knowing the payment succeeded, retries the request. Without idempotency handling, the customer is charged twice.

Idempotency keys prevent this. The merchant includes an Idempotency-Key header with a unique identifier (typically a UUID). If the same key is sent twice, the second request returns the cached response from the first -- no new payment is created.
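The guarantee can be modelled with a toy in-memory store (the real implementation, shown in the next section, uses DragonflyDB with a TTL; `create_payment` here is a simplified stand-in for the actual route):

```python
# Toy model of idempotency-key semantics -- in-memory, no TTL.
import uuid

_responses: dict[str, dict] = {}

def create_payment(idempotency_key: str, amount: int) -> dict:
    """Charge once per idempotency key; replay the cached response after."""
    if idempotency_key in _responses:
        return _responses[idempotency_key]
    response = {"payment_id": str(uuid.uuid4()), "amount": amount}
    _responses[idempotency_key] = response
    return response

key = str(uuid.uuid4())
first = create_payment(key, 5000)
retry = create_payment(key, 5000)  # network retry with the same key
assert first == retry  # same payment_id -- no duplicate charge
```

A retry with the same key is indistinguishable from the first response, so the customer is charged exactly once no matter how many times the request is resent.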

Implementation

```python
# middleware/idempotency.py
import json

from fastapi import Request

from cache import cache_client

IDEMPOTENCY_TTL = 86400  # 24 hours

async def check_idempotency(
    request: Request,
    app_id: str
) -> dict | None:
    """
    Check if this request has been seen before.

    Returns:
        Cached response if duplicate, None if new request.
    """
    idempotency_key = request.headers.get("Idempotency-Key")

    if not idempotency_key:
        return None  # No idempotency key -- process normally

    cache_key = f"idempotency:{app_id}:{idempotency_key}"

    try:
        cached = await cache_client.get(cache_key)

        if cached:
            return json.loads(cached)

    except ConnectionError:
        pass  # Cache unavailable -- process normally

    return None
```

```python
async def store_idempotency_response(
    request: Request,
    app_id: str,
    response_data: dict
):
    """
    Store the response for an idempotent request.
    """
    idempotency_key = request.headers.get("Idempotency-Key")

    if not idempotency_key:
        return

    cache_key = f"idempotency:{app_id}:{idempotency_key}"

    try:
        await cache_client.setex(
            cache_key,
            IDEMPOTENCY_TTL,
            json.dumps(response_data)
        )
    except ConnectionError:
        pass  # Best effort -- cache unavailable
```

Usage in Payment Routes

The idempotency middleware wraps the payment creation endpoint:

```python
@router.post("/v1/payments")
async def create_payment(
    request: Request,
    data: PaymentInitiate,
    auth: dict = Depends(get_auth_context)
):
    # Check idempotency first
    cached_response = await check_idempotency(request, auth["app_id"])
    if cached_response:
        return cached_response

    # Check rate limit
    rate_info = await check_rate_limit(
        auth["app_id"], auth["key_type"]
    )

    # Check billing status
    await check_billing_status(auth["app_id"])

    # Process the payment
    result = await process_payment(data, auth)

    # Store idempotency response
    await store_idempotency_response(
        request, auth["app_id"], result
    )

    return result
```

Edge Cases

Several edge cases required careful handling:

  1. Same key, different parameters. If a merchant sends the same idempotency key with different request bodies, the system returns the original response. Some implementations reject mismatched requests; 0fee.dev chose to return the cached response because the most common cause of mismatched retries is serialization differences, not intentional abuse.
  2. Failed requests. If the original request failed (e.g., invalid payment method), the idempotency response stores the error. Retrying with the same key returns the same error. The merchant must use a new idempotency key to retry with corrected parameters.
  3. TTL expiration. Idempotency keys expire after 24 hours. After that, the same key can be reused. This prevents the cache from growing indefinitely while providing a generous window for retries.
  4. Cache unavailability. Like rate limiting, idempotency handling fails open when DragonflyDB is unavailable. This means duplicate payments are theoretically possible during cache outages. The webhook delivery system and reconciliation tasks serve as secondary safeguards.

The Full Middleware Chain

When a request arrives at 0fee.dev, it passes through this chain:

Request arrives
    |
    v
[1] Extract API key or session token
    |
    v
[2] Validate credentials (DB lookup)
    |
    v
[3] Check billing suspension
    |-- 402 Payment Required? -> Return error
    |
    v
[4] Check idempotency key
    |-- Duplicate? -> Return cached response
    |
    v
[5] Check rate limit
    |-- Exceeded? -> Return 429
    |
    v
[6] Route handler (business logic)
    |
    v
[7] Store idempotency response
    |
    v
[8] Add rate limit headers to response
    |
    v
Response sent

Each step can short-circuit the chain. A revoked API key never reaches the rate limiter. A duplicate request never reaches the business logic. A suspended account never processes a payment. The ordering is intentional: authentication is the cheapest check (a hash plus a single indexed lookup), so it comes first. Business logic is the most expensive, so it comes last.
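The short-circuit behaviour can be sketched as a chain of checks, each of which may abort the request (`ChainError` and `run_chain` are illustrative names of ours; the production stack composes these checks as FastAPI dependencies, as in the route handler above):

```python
# Illustrative short-circuiting chain -- not the actual 0fee.dev stack.
class ChainError(Exception):
    def __init__(self, status: int, detail: str):
        self.status, self.detail = status, detail

def run_chain(request: dict, checks: list, handler) -> tuple[int, object]:
    """Run each check in order; the first failure ends the request."""
    try:
        for check in checks:
            check(request)  # may raise ChainError
        return 200, handler(request)
    except ChainError as exc:
        return exc.status, exc.detail

def require_key(req: dict):
    if not req.get("api_key"):
        raise ChainError(401, "Authentication required")

status, body = run_chain({}, [require_key], lambda r: "ok")
assert status == 401  # the handler was never invoked
```

Because the handler is only reached after every check passes, adding a new guard is a one-line change to the list rather than a rewrite of the route.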

This middleware stack is invisible to merchants -- they send requests and receive responses. But it is the mechanism that ensures 0fee.dev is secure, fair, and reliable. Without it, the API would be a direct pipe to payment providers with no guardrails.


This article is part of the "How We Built 0fee.dev" series. 0fee.dev is a payment orchestrator covering 53+ providers across 200+ countries, built by Juste A. GNIMAVO and Claude from Abidjan with zero human engineers. Follow the series for the complete build story.
