Decomposing a latency-sensitive risk engine with a compute core
#1
I'm a senior engineer at a financial services firm, and we're in the early stages of migrating a critical, monolithic risk calculation engine to a cloud-native architecture. The current system is a massive C++ application that runs on-premises, and while it's incredibly fast for batch processing, it's inflexible, expensive to scale, and a nightmare to deploy updates to. The business wants to move to a microservices model on AWS to improve agility and enable real-time risk analytics.

However, we're facing a major dilemma: the core calculation algorithms are highly sensitive to latency and require tight coupling between data ingestion, transformation, and computation steps. Initial prototypes using event-driven, fully decoupled services have introduced unacceptable overhead, adding hundreds of milliseconds to calculations that need to complete in under fifty. The team is now considering a hybrid approach: keeping a tightly integrated "compute core" as a single, scalable service while breaking apart the supporting data pipelines and UI layers. I'm concerned this might just recreate a distributed monolith with all its complexities.

For architects who have modernized similar high-performance, low-latency systems, how did you approach the decomposition? Did you find that strict microservice boundaries were incompatible with your performance requirements, and if so, what patterns did you use to isolate domains without sacrificing speed? How did you validate the performance of your new architecture before committing to a full rewrite?
#2
You're not alone. In latency‑sensitive finance work, the hybrid pattern tends to pay off: keep the compute core as the hot path and wrap everything else in a lean, high‑throughput boundary. Then migrate slice by slice (strangler fig pattern) so you never rewrite the whole thing at once. Two things are crucial: keep hot data in the same region and AZ to minimize cross‑zone hops, and set explicit latency budgets for the boundary with an automatic flip to the safe path if you drift past them.
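To make "flip to the safe path" concrete, here's a minimal sketch of a per-call latency budget guard. The names (callBoundary, callLegacyCore, the 50 ms figure) are hypothetical placeholders; in a real setup the boundary call would also carry its own RPC deadline, and you'd emit a metric on every fallback.

```cpp
// Minimal sketch: enforce a latency budget on the new boundary call and
// fall back to the legacy in-process path when it is exceeded.
#include <chrono>
#include <cstdio>
#include <optional>

struct RiskResult { double exposure; };

// Placeholder implementations so the sketch compiles and runs.
std::optional<RiskResult> callBoundary(int /*portfolioId*/) { return RiskResult{42.0}; }
RiskResult callLegacyCore(int /*portfolioId*/) { return RiskResult{42.0}; }

RiskResult computeWithBudget(int portfolioId, std::chrono::milliseconds budget) {
    const auto start = std::chrono::steady_clock::now();
    auto result = callBoundary(portfolioId);  // deadline-bounded RPC in practice
    const auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start);

    if (result && elapsed <= budget) return *result;  // boundary met the budget
    // Budget breached or call failed: record it and take the safe path.
    return callLegacyCore(portfolioId);
}

int main() {
    RiskResult r = computeWithBudget(1, std::chrono::milliseconds(50));
    std::printf("exposure: %.2f\n", r.exposure);
}
```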
#3
From my experience, a six‑step approach helps: map the hot data paths, draft a minimal API for the compute core, build a thin gateway (gRPC works well here), roll it out behind a feature flag, run a 4–6 week pilot with representative workloads, then compare end‑to‑end latency and SLO adherence against the baseline before deciding. Keep the boundary simple and avoid cascading calls. If boundary latency blows the budget, revert to the safe path automatically.
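One way to wire "behind a feature flag" to "revert automatically" is a latency-guarded flag. This is an illustrative sketch, not any particular library's API; in practice the flag would be backed by your config service and every trip would page someone.

```cpp
// Sketch: route to the new path while it is healthy, trip back to the safe
// path after repeated breaches of the latency budget. Thresholds are made up.
#include <atomic>
#include <chrono>
#include <cstdio>

class LatencyGuardedFlag {
public:
    LatencyGuardedFlag(std::chrono::milliseconds budget, int maxConsecutiveBreaches)
        : budget_(budget), maxBreaches_(maxConsecutiveBreaches) {}

    bool useNewPath() const { return enabled_.load(); }

    // Record one observed end-to-end latency; disable the new path after
    // maxBreaches_ consecutive breaches of the budget.
    void record(std::chrono::milliseconds observed) {
        if (observed > budget_) {
            if (++breaches_ >= maxBreaches_) enabled_.store(false);
        } else {
            breaches_.store(0);  // a healthy sample resets the counter
        }
    }

private:
    std::chrono::milliseconds budget_;
    int maxBreaches_;
    std::atomic<int> breaches_{0};
    std::atomic<bool> enabled_{true};
};

int main() {
    LatencyGuardedFlag flag(std::chrono::milliseconds(50), 3);
    // Simulate three slow calls in a row; the flag should trip off.
    for (int i = 0; i < 3; ++i) flag.record(std::chrono::milliseconds(120));
    std::printf("use new path: %s\n", flag.useNewPath() ? "yes" : "no");
}
```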
#4
Be careful about turning the boundary into a new bottleneck. The goal is to separate the non‑hot data infrastructure and UI, not to create a second monolith. Design for bounded contexts, guarantee data locality, and keep a single source of truth for the core state. Use idempotent retries, avoid cross‑service side effects, and implement rollback and backpressure for when the boundary can't meet its latency budgets. Also plan for observability: traces, metrics, and logs across the boundary layer.
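On the idempotent-retries point, a small sketch with bounded attempts and a client-generated idempotency key. The key format and the callable are placeholders; the server side still has to deduplicate on the key, and backoff is kept tiny so retries fit inside a sub‑50 ms budget.

```cpp
// Sketch: bounded retries that stay idempotent because every attempt reuses
// the same client-generated key, letting the server deduplicate.
#include <chrono>
#include <cstdio>
#include <functional>
#include <optional>
#include <string>
#include <thread>

struct BoundaryReply { double value; };

std::optional<BoundaryReply> sendWithRetry(
    const std::string& idempotencyKey,
    const std::function<std::optional<BoundaryReply>(const std::string&)>& send,
    int maxAttempts = 3) {
    auto backoff = std::chrono::milliseconds(2);
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        if (auto reply = send(idempotencyKey)) return reply;  // same key every time
        if (attempt < maxAttempts) {
            std::this_thread::sleep_for(backoff);
            backoff *= 2;  // exponential, but bounded by maxAttempts
        }
    }
    return std::nullopt;  // caller decides: fall back to the safe path or fail loudly
}

int main() {
    int calls = 0;
    auto flaky = [&](const std::string&) -> std::optional<BoundaryReply> {
        if (++calls < 3) return std::nullopt;  // fail the first two attempts
        return BoundaryReply{1.23};
    };
    auto reply = sendWithRetry("portfolio-42-rev-7", flaky);
    std::printf("attempts=%d value=%.2f\n", calls, reply ? reply->value : -1.0);
}
```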
#5
Two quick questions to tailor this: what's your boundary latency budget (target p95/p99), and are you planning cross‑region reads or writes? Are you leaning toward Kubernetes, EC2, or serverless for the boundary? Sharing rough numbers helps craft a more concrete plan.
#6
Testing beyond chaos experiments is essential: do latency modeling, deterministic replay, shadow deployments, and end‑to‑end benchmarks with realistic workloads. Instrument invariants like exactly‑once processing and ordering to catch drift early.
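A toy sketch of what a deterministic-replay check can look like: feed the same recorded inputs to both engines, assert the ordering and exactly-once invariants, and flag numeric drift beyond a tolerance. The Tick type and the two engine functions are placeholders for the real legacy and candidate implementations.

```cpp
// Sketch: shadow-comparison harness over a recorded input stream.
#include <cmath>
#include <cstdio>
#include <unordered_set>
#include <vector>

struct Tick { long seq; double price; };

// Placeholder engines so the sketch runs end to end.
double legacyEngine(const Tick& t)    { return t.price * 1.0001; }
double candidateEngine(const Tick& t) { return t.price * 1.0001; }

bool replayAndCompare(const std::vector<Tick>& recorded, double tolerance) {
    std::unordered_set<long> seen;
    long lastSeq = -1;
    for (const auto& tick : recorded) {
        // Invariants: each sequence number is processed once, in order.
        if (!seen.insert(tick.seq).second || tick.seq <= lastSeq) {
            std::printf("invariant violated at seq %ld\n", tick.seq);
            return false;
        }
        lastSeq = tick.seq;

        // Compare legacy vs candidate output on the identical input.
        double oldVal = legacyEngine(tick);
        double newVal = candidateEngine(tick);
        if (std::fabs(oldVal - newVal) > tolerance) {
            std::printf("drift at seq %ld: %f vs %f\n", tick.seq, oldVal, newVal);
            return false;
        }
    }
    return true;
}

int main() {
    std::vector<Tick> recorded = {{1, 100.0}, {2, 100.5}, {3, 99.8}};
    std::printf("replay %s\n", replayAndCompare(recorded, 1e-9) ? "passed" : "failed");
}
```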
#7
One more tip: celebrate small wins and keep the old monolith alive until you have confidence in the new path. A kill switch and a staged rollout with a clear rollback plan go a long way toward earning the team's trust.

