Case Study: Payments Platform Re-platform

High-level architecture: clients → API Gateway → gRPC services on Kubernetes
Grafana dashboard showing error rates, latency, and saturation
OpenAPI contract and generated SDKs in a mono-repo layout
CI/CD pipeline stages: lint, test, security scan, build, deploy, canary

From monolith REST to Go + gRPC with a REST gateway

A fintech client’s Node.js monolith was hitting latency and reliability limits during peak settlement windows. RDG Engineering led a re-platform to a service-oriented, contract-first API using Go and gRPC with a REST gateway for web/mobile.

  • Throughput increase
  • P95 latency: -63%
  • SLO achieved: 99.95%
Problem
  • Batch spikes caused queue backlogs and timeouts during settlements.
  • Coupled modules made deployments risky; rollbacks were slow.
  • Limited visibility: logs only, no traces; alert fatigue from noisy rules.
Approach
  • Contract-first: OpenAPI and Protobuf definitions as the source of truth; SDKs generated for web, Android, and iOS.
  • Service boundaries: Payment Intent, Capture, Refund, and Reconciliation separated with clear ownership and SLOs.
  • Delivery: GitHub Actions → OPA policy gates → progressive delivery (canary/auto-rollback) on Kubernetes.
  • Observability: OpenTelemetry traces, RED (rate, errors, duration) and saturation metrics in Prometheus; alerts limited to actionable conditions.
  • Security by design: per-service IAM, mTLS between services, SBOM + image signing, secrets via Vault.
Stack

Languages: Go, TypeScript

Protocols: gRPC (internal), REST (public), async events (Kafka)

Infra: Kubernetes, Helm, Terraform, GitOps

Obs: OpenTelemetry, Prometheus, Tempo/Jaeger, Loki, Grafana

Result: The peak-hour error budget stayed green for three consecutive quarters, and deployment frequency increased from weekly to multiple times per day.

Project information

  • Category: Platform & APIs
  • Client: Pinstripe Pay (fictional)
  • Project date: Jan–Jun 2025
  • Scope: Re-platform, SRE, DevSecOps
  • Public docs: API gateway spec