Decomposing a latency-sensitive risk engine into cloud-native microservices
#1
I'm a senior engineer at a financial services firm, and we're in the early stages of migrating a critical, monolithic risk calculation engine to a cloud-native architecture. The current system is a massive C++ application that runs on-premises, and while it's incredibly fast for batch processing, it's inflexible, expensive to scale, and a nightmare to deploy updates to. The business wants to move to a microservices model on AWS to improve agility and enable real-time risk analytics.

However, we're facing a major dilemma: the core calculation algorithms are highly sensitive to latency and require tight coupling between data ingestion, transformation, and computation steps. Initial prototypes using event-driven, fully decoupled services have introduced unacceptable overhead, adding hundreds of milliseconds to calculations that need to complete in under fifty milliseconds. The team is now considering a hybrid approach: keeping a tightly integrated "compute core" as a single, scalable service while breaking apart the supporting data pipelines and UI layers. I'm concerned this might just recreate a distributed monolith with all its complexities.

For architects who have modernized similar high-performance, low-latency systems, how did you approach the decomposition? Did you find that strict microservice boundaries were incompatible with your performance requirements, and if so, what patterns did you use to isolate domains without sacrificing speed? How did you validate the performance of your new architecture before committing to a full rewrite?
#2
Hybrid path makes sense—keep the compute core tightly coupled, then layer a lean, high‑throughput boundary for the rest. Roll out behind a feature flag and run a 4–6 week pilot on representative workloads before broad migration.
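To make the flag-gated pilot concrete, here's a minimal sketch of percentage-based routing with a fail-back to the proven path. The flag name, the percentage source, and both compute functions are placeholders for your own plumbing, not a specific library's API (in practice the flag would live in something like AWS AppConfig or LaunchDarkly).

```python
# Minimal sketch of percentage-based routing behind a feature flag.
# RISK_CORE_V2_PERCENT, legacy_compute(), and new_compute_core() are
# placeholders, not real APIs.
import os
import random

ROLLOUT_PERCENT = float(os.getenv("RISK_CORE_V2_PERCENT", "0"))  # 0 = legacy only

def legacy_compute(request):
    return {"source": "legacy", "value": 0.0}   # stand-in for the monolith call

def new_compute_core(request):
    return {"source": "core-v2", "value": 0.0}  # stand-in for the new core service

def calculate_risk(request):
    """Route a fixed fraction of traffic to the new core during the pilot."""
    if random.uniform(0.0, 100.0) < ROLLOUT_PERCENT:
        try:
            return new_compute_core(request)
        except Exception:
            # Fail back to the proven path while the pilot runs.
            return legacy_compute(request)
    return legacy_compute(request)
```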
#3
We did something similar: keep the core in a single service (optimized compute) with a fast API boundary (gRPC or high‑throughput REST), then gradually peel data ingestion and UI behind that boundary using a strangler pattern. Data locality (same region/AZ) helped a lot to keep latency predictable.
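To make the boundary concrete, here's a hedged sketch of the client side if you go with gRPC: put an explicit deadline on every hot-path call so a latency miss fails loudly instead of silently eating the budget. The RiskEngine service, the CalcRequest message, the generated modules, and the endpoint are all hypothetical stand-ins for whatever your .proto ends up defining; only the grpc calls themselves are real.

```python
# Sketch of a gRPC client call with an explicit hot-path deadline.
# risk_engine_pb2 / risk_engine_pb2_grpc are hypothetical modules generated
# from an assumed RiskEngine .proto.
import grpc
import risk_engine_pb2
import risk_engine_pb2_grpc

HOT_PATH_DEADLINE_S = 0.050  # the 50 ms budget from the original post

# Reuse one channel and stub; per-call channel setup would eat the budget.
_channel = grpc.insecure_channel("risk-core.internal:50051")
_stub = risk_engine_pb2_grpc.RiskEngineStub(_channel)

def calculate(position_batch):
    request = risk_engine_pb2.CalcRequest(positions=position_batch)
    try:
        # The deadline turns a silent latency miss into an explicit error.
        return _stub.Calculate(request, timeout=HOT_PATH_DEADLINE_S)
    except grpc.RpcError as err:
        if err.code() == grpc.StatusCode.DEADLINE_EXCEEDED:
            raise TimeoutError("risk calc exceeded the 50 ms budget") from err
        raise
```

A long-lived channel and a same-region/AZ endpoint do most of the work of keeping the tail predictable.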
#4
Key metrics to lock in early: end‑to‑end latency percentiles on the hot path, tail latency, throughput, and cross‑boundary latency. Also track invariants (ordering, exactly‑once, idempotence) and a clear target SLO with a plan to roll back if you miss it.
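A minimal sketch of how that percentile gate could look in a benchmark harness; the thresholds below are illustrative (only the 50 ms figure comes from the original post).

```python
# Compute hot-path latency percentiles from per-request timings and check
# them against the SLO. Thresholds are illustrative placeholders.
import statistics

SLO = {"p50_ms": 20.0, "p99_ms": 50.0}

def percentile(samples, pct):
    """Nearest-rank style percentile; good enough for a gate check."""
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, round(pct / 100.0 * (len(ordered) - 1)))
    return ordered[idx]

def evaluate(latencies_ms):
    report = {
        "p50_ms": percentile(latencies_ms, 50),
        "p99_ms": percentile(latencies_ms, 99),
        "p99.9_ms": percentile(latencies_ms, 99.9),
        "mean_ms": statistics.fmean(latencies_ms),
    }
    report["slo_met"] = all(report[k] <= SLO[k] for k in SLO)
    return report
```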
#5
Testing approaches beyond chaos: deterministic replay of failure sequences, latency modeling, and shadow deployments where the new path runs alongside the old one and outcomes are compared. End‑to‑end benchmarks with realistic workloads are essential, plus runtime invariant checks.
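A minimal sketch of the shadow comparison, assuming each request yields a single numeric result (adapt the comparison for vectors or structured outputs); old_path/new_path and the tolerance are placeholders.

```python
# Serve from the old path, mirror the same request to the new path, and log
# latency plus any result divergence. The new path can never affect callers.
import logging
import math
import time

log = logging.getLogger("shadow")
TOLERANCE = 1e-9  # acceptable numeric drift between the two engines

def _timed(fn, request):
    start = time.perf_counter()
    result = fn(request)
    return result, (time.perf_counter() - start) * 1000.0

def shadow_compare(request, old_path, new_path):
    old_result, old_ms = _timed(old_path, request)
    try:
        new_result, new_ms = _timed(new_path, request)
        if not math.isclose(new_result, old_result, rel_tol=TOLERANCE):
            log.warning("divergence: old=%s new=%s", old_result, new_result)
        log.info("latency old=%.2fms new=%.2fms", old_ms, new_ms)
    except Exception:
        log.exception("shadow path failed")  # never break the live response
    return old_result  # callers only ever see the proven path
```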
#6
Pitfalls to watch: don’t morph into a distributed monolith by chaining too many cross‑service calls on the hot path; keep the API surface small and the hot path co‑located; use the strangler approach to migrate one hot pain point at a time; ensure idempotent retries and a clean rollback path.
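On the idempotent-retry point, a minimal sketch: the caller pins one idempotency key across all attempts so the compute side can de-duplicate, and retries stay bounded so they can't blow the latency budget. submit() and the backoff numbers are placeholders for your own client and tuning.

```python
# Bounded, idempotent retries across the service boundary.
import time
import uuid

def call_with_retries(submit, payload, attempts=3, backoff_s=0.005):
    key = str(uuid.uuid4())  # same key for every attempt of this request
    last_err = None
    for attempt in range(attempts):
        try:
            # The server de-duplicates on idempotency_key, so a retry after a
            # timed-out-but-completed call cannot double-apply the work.
            return submit(payload, idempotency_key=key)
        except TimeoutError as err:  # retry only transient failures
            last_err = err
            time.sleep(backoff_s * (2 ** attempt))
    raise last_err
```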
#7
Rollout strategy: define gate reviews with concrete targets, a clear rollback plan, and staged rollouts (lab → staging → limited production). If the fast path consistently comes in under the SLO, the fallback becomes a formality; even so, iterate on the boundary first and broaden the rollout gradually.
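A sketch of what an automated gate check per stage could look like; the metric names and limits are illustrative, not recommendations.

```python
# Promote to the next stage only when measured numbers clear every target,
# otherwise report which gates failed and roll back.
GATES = {
    "p99_latency_ms": 50.0,          # from the original 50 ms budget
    "error_rate": 0.001,             # illustrative
    "result_divergence_rate": 0.0,   # shadow comparison must agree
}

def gate_decision(measured: dict) -> str:
    failures = [name for name, limit in GATES.items()
                if measured.get(name, float("inf")) > limit]
    if failures:
        return "rollback: " + ", ".join(failures)
    return "promote"

# e.g. gate_decision({"p99_latency_ms": 43.0, "error_rate": 0.0002,
#                     "result_divergence_rate": 0.0}) -> "promote"
```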
#8
If you’re up for it, share rough numbers (hot-path latency target, data volumes, region constraints, and whether you’ll use K8s, EC2, or serverless). I can sketch a concrete 6–8 week validation plan and a diagram for the boundary between core compute and the rest.

