I'm a senior engineer at a financial services firm, and we're in the early stages of migrating a critical, monolithic risk calculation engine to a cloud-native architecture. The current system is a massive C++ application that runs on-premises, and while it's incredibly fast for batch processing, it's inflexible, expensive to scale, and a nightmare to deploy updates to. The business wants to move to a microservices model on AWS to improve agility and enable real-time risk analytics.

However, we're facing a major dilemma: the core calculation algorithms are highly sensitive to latency and require tight coupling between data ingestion, transformation, and computation steps. Initial prototypes using event-driven, fully decoupled services have introduced unacceptable overhead, adding hundreds of milliseconds to calculations that need to complete in under fifty. The team is now considering a hybrid approach—keeping a tightly integrated "compute core" as a single, scalable service while breaking apart the supporting data pipelines and UI layers. I'm concerned this might just recreate a distributed monolith with all its complexities.

For architects who have modernized similar high-performance, low-latency systems, how did you approach the decomposition? Did you find that strict microservice boundaries were incompatible with your performance requirements, and if so, what patterns did you use to isolate domains without sacrificing speed? How did you validate the performance of your new architecture before committing to a full rewrite?
You're not alone. The pragmatic path is usually a hybrid: keep the compute core tightly coupled as the hot path, and wrap everything else behind a lean, high-throughput boundary. Run the migration strangler-fig style, in thin slices rather than a big-bang rewrite, and preserve data locality by co-locating hot data with the compute core in the same region/AZ.
A six-step pattern that's worked for me:
1) Map the hot data paths and set a latency budget for each step.
2) Define a stable API surface for the compute core.
3) Add a thin gateway (gRPC) in front of the core.
4) Ship it behind a feature flag.
5) Run a 4–6 week pilot with realistic workloads.
6) Compare end-to-end latency and SLOs against the current baseline and decide (see the measurement sketch below).
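Steps 1 and 6 are where these migrations usually go sideways, so it helps to make the budgets executable. Below is a minimal sketch, standard library only, with hypothetical step names and budgets; in the pilot, the placeholder work inside the timing loop would be replaced by the real call you're evaluating (in-process, gRPC hop, or queue hop).

```cpp
// Sketch: per-step latency budgets checked against p95/p99 (hypothetical names).
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

class LatencyRecorder {
public:
    void record(const std::string& step, std::chrono::microseconds elapsed) {
        samples_[step].push_back(elapsed.count());
    }

    // Approximate nearest-rank percentile over the recorded samples for one step.
    long long percentile(const std::string& step, double pct) {
        auto& v = samples_[step];
        if (v.empty()) return 0;
        std::sort(v.begin(), v.end());
        size_t idx = static_cast<size_t>(pct / 100.0 * v.size());
        if (idx >= v.size()) idx = v.size() - 1;
        return v[idx];
    }

    void report(const std::map<std::string, long long>& budgets_us) {
        for (const auto& kv : budgets_us) {
            long long p95 = percentile(kv.first, 95.0);
            long long p99 = percentile(kv.first, 99.0);
            std::printf("%-10s p95=%6lldus p99=%6lldus budget=%6lldus %s\n",
                        kv.first.c_str(), p95, p99, kv.second,
                        p99 <= kv.second ? "OK" : "OVER BUDGET");
        }
    }

private:
    std::map<std::string, std::vector<long long>> samples_;
};

int main() {
    LatencyRecorder rec;
    // Hypothetical per-step budgets in microseconds; the sum must stay
    // under the 50 ms end-to-end target described above.
    std::map<std::string, long long> budgets_us = {
        {"ingest", 5000}, {"transform", 10000}, {"compute", 30000}};

    for (int run = 0; run < 1000; ++run) {
        for (const auto& kv : budgets_us) {
            auto start = std::chrono::steady_clock::now();
            // Placeholder work; in a real pilot this wraps the candidate call.
            volatile long sink = 0;
            for (long i = 0; i < 10000; ++i) sink = sink + i;
            auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
                std::chrono::steady_clock::now() - start);
            rec.record(kv.first, elapsed);
        }
    }
    rec.report(budgets_us);
    return 0;
}
```

The useful part is the habit, not the harness: every candidate boundary gets the same budget table, so the step-6 decision is a comparison of numbers rather than a debate.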
Be careful not to recreate a distributed monolith in the process. Keep cross-service calls lean, keep the boundary stateless (or nearly so), make the core the single source of truth, and design for idempotent retries. Use bounded contexts and data locality; co-locate compute and hot data where possible. A two-tier shape often works: the compute core plus a lean data/UX layer behind the boundary.
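To make the idempotent-retry point concrete, here is a rough sketch with hypothetical types: the boundary deduplicates by a client-supplied request id, while the core remains the single source of truth. In production the dedup cache would live in a shared store (e.g. Redis or DynamoDB), not in memory, and concurrent duplicates may both reach the core, which is acceptable when the calculation is deterministic.

```cpp
// Sketch: idempotent request handling at the boundary (hypothetical types).
#include <mutex>
#include <string>
#include <unordered_map>

struct RiskRequest {
    std::string request_id;   // client-generated idempotency key
    std::string portfolio_id;
};

struct RiskResult {
    double value_at_risk;
};

// Stand-in for the tightly coupled compute core (the real one is the C++ engine).
RiskResult compute_core(const RiskRequest& req) {
    return RiskResult{42.0};
}

class IdempotentBoundary {
public:
    RiskResult handle(const RiskRequest& req) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            auto it = completed_.find(req.request_id);
            if (it != completed_.end()) {
                return it->second;  // retry of an already-processed request
            }
        }
        RiskResult result = compute_core(req);  // core is the single source of truth
        std::lock_guard<std::mutex> lock(mu_);
        completed_.emplace(req.request_id, result);
        return result;
    }

private:
    std::mutex mu_;
    std::unordered_map<std::string, RiskResult> completed_;  // in-memory for the sketch only
};
```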
One caveat: strict microservice boundaries can be genuinely incompatible with ultra-low latency. Consider a shared-kernel approach: the compute core remains the single source of truth, accessed through a fast in-process or shared-memory interface, with a separate boundary for the data/UX layers. That way you avoid paying a network hop (and serialization) for every step of the hot path.
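One way to express the shared kernel, sketched here with hypothetical names: the core stays a library behind a stable abstract interface, the latency-critical path links it in-process, and a remote adapter (e.g. gRPC-backed) can be slotted in later for the paths that can afford the hop, without touching callers.

```cpp
// Sketch: shared-kernel compute core behind a stable in-process interface.
#include <cstdio>
#include <memory>
#include <vector>

struct Position {
    double notional;
    double delta;
};

// Stable API surface for the compute core (step 2 in the list above).
class RiskEngine {
public:
    virtual ~RiskEngine() = default;
    virtual double value_at_risk(const std::vector<Position>& book) const = 0;
};

// In-process implementation: wraps the existing C++ engine directly, so the
// hot path pays a virtual call, not serialization plus a network round trip.
class InProcessRiskEngine : public RiskEngine {
public:
    double value_at_risk(const std::vector<Position>& book) const override {
        double exposure = 0.0;
        for (const auto& p : book) exposure += p.notional * p.delta;
        return exposure * 0.01;  // placeholder for the real calculation
    }
};

// The data/UX boundary depends only on the interface; swapping in a remote
// adapter later is a wiring change, not a caller change.
double price_request(const RiskEngine& engine, const std::vector<Position>& book) {
    return engine.value_at_risk(book);
}

int main() {
    auto engine = std::make_unique<InProcessRiskEngine>();
    std::vector<Position> book = {{1'000'000.0, 0.5}, {250'000.0, -0.2}};
    std::printf("VaR: %.2f\n", price_request(*engine, book));
    return 0;
}
```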
A few clarifying questions to tailor the advice: what is your per-step latency budget at the boundary (p95/p99)? Does the hot path stay within a single region, or is there a cross-region hop? Do you need to match on-prem latency in the cloud, or is there some headroom? What throughput (TPS) do you expect to scale to, and what is the data volume?
Happy to draft a 4–6 week validation plan with metrics and a boundary diagram based on your numbers. It can include gate reviews, a rollback plan, and success criteria to keep stakeholders aligned.