I'm a senior engineer at a financial services firm, and we're in the early stages of migrating a critical, monolithic risk calculation engine to a cloud-native architecture. The current system is a massive C++ application that runs on-premises, and while it's incredibly fast for batch processing, it's inflexible, expensive to scale, and a nightmare to deploy updates to. The business wants to move to a microservices model on AWS to improve agility and enable real-time risk analytics. However, we're facing a major dilemma: the core calculation algorithms are highly sensitive to latency and require tight coupling between data ingestion, transformation, and computation steps. Initial prototypes using event-driven, fully decoupled services have introduced unacceptable overhead, adding hundreds of milliseconds to calculations that need to complete in under fifty. The team is now considering a hybrid approach—keeping a tightly integrated "compute core" as a single, scalable service while breaking apart the supporting data pipelines and UI layers. I'm concerned this might just recreate a distributed monolith with all its complexities. For architects who have modernized similar high-performance, low-latency systems, how did you approach the decomposition? Did you find that strict microservice boundaries were incompatible with your performance requirements, and if so, what patterns did you use to isolate domains without sacrificing speed? How did you validate the performance of your new architecture before committing to a full rewrite?
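You’re not alone. A pragmatic path is the hybrid one you describe: keep the compute core tightly integrated, expose it through a single fast boundary, and migrate the other pieces behind that boundary incrementally using the strangler pattern. Give the boundary an explicit latency budget, a fixed slice of your 50 ms target, so that any overhead it adds is measured against a number rather than noticed after the fact.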
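As a rough illustration of what an explicit budget could look like, here is a minimal Python sketch. The stage names and per-stage numbers are invented for the example; only the 50 ms total comes from your description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    budget_ms: float


# Hypothetical split of the 50 ms target; the real split has to come from measurement.
STAGES = [
    Stage("boundary_call", 5.0),
    Stage("ingest_and_transform", 10.0),
    Stage("compute_core", 30.0),
]


def headroom_ms(stages, total_ms):
    """Return unallocated budget; raise if the stages over-allocate the total."""
    allocated = sum(s.budget_ms for s in stages)
    if allocated > total_ms:
        raise ValueError(f"allocated {allocated} ms exceeds total {total_ms} ms")
    return total_ms - allocated


def over_budget(stages, measured_ms):
    """Return the names of stages whose measured latency exceeds their budget."""
    return [s.name for s in stages if measured_ms.get(s.name, 0.0) > s.budget_ms]


if __name__ == "__main__":
    print(headroom_ms(STAGES, 50.0))  # 5.0 ms left unallocated
    # Made-up measurements for one request, in milliseconds.
    print(over_budget(STAGES, {"boundary_call": 7.5, "compute_core": 28.0}))
```

The point of the design is that a regression shows up as a named stage exceeding its slice, which is easier to act on than a single end-to-end number.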
Treat it as two paths: the compute core remains the hot path, while data ingestion, transformation, and the UI live behind a lean, high-throughput API boundary (think gRPC or a similarly fast REST gateway). Keep hot data co-located with the core in the same region and AZ, and migrate one pain point at a time to avoid both a big-bang rewrite and a distributed monolith.
Validation needs to go beyond chaos engineering: build a latency model for the hot path, run deterministic failure replay, and use shadow deployments in which the new logic runs in parallel with the old one on the same inputs so the outputs can be compared. End-to-end benchmarks with realistic workloads are essential, as are invariant checks (ordering, exactly-once processing, idempotence).
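To make the shadow-deployment idea concrete, here is a minimal sketch of the comparison step. The function name, the tolerance, and the stand-in calculation functions are assumptions for illustration and do not come from any particular framework. In practice the candidate would be invoked off the hot path so it cannot delay the legacy result.

```python
import math


def shadow_compare(inputs, legacy, candidate, rel_tol=1e-9):
    """Run both implementations on each input and collect disagreements.

    The legacy result is treated as the source of truth. A candidate
    exception is recorded as a mismatch instead of being propagated.
    """
    mismatches = []
    for x in inputs:
        expected = legacy(x)
        try:
            actual = candidate(x)
        except Exception as exc:  # the shadow path must never break the caller
            mismatches.append((x, expected, repr(exc)))
            continue
        if not math.isclose(expected, actual, rel_tol=rel_tol):
            mismatches.append((x, expected, actual))
    return mismatches


if __name__ == "__main__":
    # Stand-in functions; a real run would replay captured production inputs.
    def legacy_risk(notional):
        return notional * 2

    def candidate_risk(notional):
        return notional * 2 if notional != 3 else 7

    print(shadow_compare([1, 2, 3], legacy_risk, candidate_risk))
```

An empty result over a representative replay set is the signal to start shifting real traffic; a non-empty one gives you the exact inputs to debug.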
Key metrics to lock in early: end-to-end latency percentiles (p95/p99), tail latency under peak load, throughput, CPU, memory, and GC impact, cross-service latency, data locality, and rollout KPIs such as time-to-detect and rollback frequency. Tie every decision to a concrete SLO and a clear rollback plan.
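Since tail percentiles drive most of these decisions, it helps to agree on how they are computed. Here is a minimal sketch using the nearest-rank definition, which is one of several common definitions (monitoring tools may interpolate differently). The function names are hypothetical.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_slo(samples, p, limit_ms):
    """True if the p-th percentile latency is within the limit."""
    return percentile(samples, p) <= limit_ms


if __name__ == "__main__":
    latencies_ms = list(range(1, 101))  # synthetic data: 1 ms .. 100 ms
    print(percentile(latencies_ms, 95))      # 95
    print(percentile(latencies_ms, 99))      # 99
    print(meets_slo(latencies_ms, 99, 50))   # False
```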
Patterns and pitfalls: keep cross-service calls lean and well-scoped so the boundary does not become a new bottleneck or the seed of a distributed monolith. Use the strangler pattern to migrate the most painful pieces first, maintain a single source of truth for critical state, and make retries idempotent with a safe rollback path. The end state is a two-tier architecture: a tightly integrated compute core plus a separate data/UX layer behind a minimal boundary.
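For the idempotent-retry point, here is a minimal sketch, assuming each request carries a caller-supplied idempotency key. The class and function names are hypothetical, and the in-memory dict stands in for whatever shared, durable store you would actually use.

```python
class IdempotentExecutor:
    """Remember results by idempotency key so a repeated request is not re-executed."""

    def __init__(self):
        self._results = {}  # assumption: in-memory only; not shared across processes

    def execute(self, key, operation):
        if key in self._results:
            return self._results[key]  # replay the stored result
        result = operation()           # a failure here stores nothing, so a retry re-runs it
        self._results[key] = result
        return result


def call_with_retries(executor, key, operation, attempts=3):
    """Retry on ConnectionError, always reusing the same idempotency key."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return executor.execute(key, operation)
        except ConnectionError as exc:
            last_error = exc
    raise last_error


if __name__ == "__main__":
    executed = []

    def book_trade():
        executed.append("booked")
        return "trade-ok"

    ex = IdempotentExecutor()
    print(call_with_retries(ex, "req-1", book_trade))  # runs the operation
    print(call_with_retries(ex, "req-1", book_trade))  # served from the stored result
    print(len(executed))                               # 1
```

Because the key, not the call, identifies the work, a client that times out and retries cannot cause the same state change twice.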
If you want, share rough numbers (latency budgets, data volumes, region constraints, and whether the stack is Kubernetes, EC2, or serverless) and I’ll sketch a concrete 4–6 week validation plan and an architecture diagram for the boundary between the compute core and the rest.