MultiHub Forum

Full Version: Balancing microservices and a compute core for latency-sensitive risk
I'm a senior engineer at a financial services firm, and we're in the early stages of migrating a critical, monolithic risk calculation engine to a cloud-native architecture. The current system is a massive C++ application that runs on-premises, and while it's incredibly fast for batch processing, it's inflexible, expensive to scale, and a nightmare to deploy updates to. The business wants to move to a microservices model on AWS to improve agility and enable real-time risk analytics. However, we're facing a major dilemma: the core calculation algorithms are highly sensitive to latency and require tight coupling between data ingestion, transformation, and computation steps. Initial prototypes using event-driven, fully decoupled services have introduced unacceptable overhead, adding hundreds of milliseconds to calculations that need to complete in under fifty. The team is now considering a hybrid approach—keeping a tightly integrated "compute core" as a single, scalable service while breaking apart the supporting data pipelines and UI layers. I'm concerned this might just recreate a distributed monolith with all its complexities. For architects who have modernized similar high-performance, low-latency systems, how did you approach the decomposition? Did you find that strict microservice boundaries were incompatible with your performance requirements, and if so, what patterns did you use to isolate domains without sacrificing speed? How did you validate the performance of your new architecture before committing to a full rewrite?
You’re not alone. On latency-sensitive finance workloads, the pattern we found most tenable is a hybrid: keep a tightly integrated “compute core” as a single service, but expose it through a thin, low-latency facade layer so you can progressively decompose the surrounding data ingestion, transformation, and analytics into microservices without adding network hops to the core compute path. Concretely:
- Keep the essential compute engine as a single, high-performance service (same language, optimized data structures, and pinned CPU and memory on-prem or a dedicated VM in the cloud).
- Add a thin, high‑throughput API boundary (gRPC or high‑performance REST) so other components can feed it without crossing into the compute logic.
- Decompose non‑critical data pipelines and UI/UX layers into separate services behind the same boundary, using a strangler‑pattern approach so you can migrate features one by one without a big rewrite.
- Ensure data locality where possible: keep hot data and the compute core in the same region and, if needed, in the same AWS Availability Zone or even on the same VM, to minimize cross-zone latency.
- Implement a clean fallback path: if cross‑service latency spikes, gracefully route to the conservative path (or batch mode) without corrupting state.
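To make the fallback bullet a bit more concrete, here is a minimal Python sketch of a latency-budget guard, not what we actually ran in production. The names `price_with_fallback`, `fast_path`, and `conservative_path` are hypothetical, and the 50 ms budget is taken from the target in the original question. It assumes the fast path is side-effect free, so a late result can simply be discarded without corrupting state.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Assumption: 50 ms budget, taken from the target in the original question.
BUDGET_SECONDS = 0.050

# Small shared pool for fast-path calls (the size is an arbitrary choice).
_pool = ThreadPoolExecutor(max_workers=4)


def price_with_fallback(request, fast_path, conservative_path,
                        budget=BUDGET_SECONDS):
    """Return (result, route), where route is "fast" or "fallback".

    fast_path and conservative_path are hypothetical callables that take
    the request. fast_path must be side-effect free: if it overruns the
    budget its late result is ignored, so nothing it does can be allowed
    to mutate shared state.
    """
    future = _pool.submit(fast_path, request)
    try:
        # Wait at most `budget` seconds for the low-latency path.
        return future.result(timeout=budget), "fast"
    except FutureTimeout:
        # Best effort only: cancel() cannot interrupt a call that is
        # already running, it only prevents a queued one from starting.
        future.cancel()
        return conservative_path(request), "fallback"


if __name__ == "__main__":
    def slow_core(x):
        time.sleep(0.2)  # simulate a cross-service latency spike
        return x * 2

    print(price_with_fallback(21, lambda x: x * 2, lambda x: None))
    print(price_with_fallback(21, slow_core, lambda x: None))
```

Errors raised by the fast path propagate to the caller here rather than triggering the fallback; whether a failure should also route to the conservative path is a policy decision for your risk team. In a real C++ core you would more likely enforce the budget with gRPC deadlines at the API boundary than with a thread pool.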