MultiHub Forum

I'm a senior engineer at a financial services firm, and we're in the early stages of migrating a critical, monolithic risk calculation engine to a cloud-native architecture. The current system is a massive C++ application that runs on-premises, and while it's incredibly fast for batch processing, it's inflexible, expensive to scale, and a nightmare to deploy updates to. The business wants to move to a microservices model on AWS to improve agility and enable real-time risk analytics. However, we're facing a major dilemma: the core calculation algorithms are highly sensitive to latency and require tight coupling between data ingestion, transformation, and computation steps. Initial prototypes using event-driven, fully decoupled services have introduced unacceptable overhead, adding hundreds of milliseconds to calculations that need to complete in under fifty. The team is now considering a hybrid approach—keeping a tightly integrated "compute core" as a single, scalable service while breaking apart the supporting data pipelines and UI layers. I'm concerned this might just recreate a distributed monolith with all its complexities. For architects who have modernized similar high-performance, low-latency systems, how did you approach the decomposition? Did you find that strict microservice boundaries were incompatible with your performance requirements, and if so, what patterns did you use to isolate domains without sacrificing speed? How did you validate the performance of your new architecture before committing to a full rewrite?

You're not alone. The pragmatic path is a hybrid: keep the compute core tightly coupled as the hot path, then wrap it in a lean boundary that feeds the data pipelines and UI. Use strangler steps to migrate features piece by piece, and keep hot data co-located (same region/AZ) to minimize cross-service hops. If boundary latency creeps past its budget, fall back to the safe path automatically to preserve correctness while you iterate.
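To make the automatic fallback a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `LatencyGuard` name, the 5 ms budget, and the rule of tripping after three consecutive slow calls are all made up, and `primary`/`fallback` stand in for whatever your new and safe paths actually are.

```python
import time


class LatencyGuard:
    """Sends calls to a fallback once the primary path is repeatedly over budget.

    Hypothetical helper: the name, thresholds, and trip rule are illustrative.
    """

    def __init__(self, budget_ms, max_breaches=3, clock=time.perf_counter):
        self.budget_ms = budget_ms
        self.max_breaches = max_breaches
        self.clock = clock          # injectable so the guard can be tested
        self.breaches = 0           # consecutive over-budget calls seen so far

    def tripped(self):
        return self.breaches >= self.max_breaches

    def call(self, primary, fallback, *args):
        if self.tripped():
            # Stays tripped until reset() is called; a real guard would
            # probably probe the primary path periodically to recover.
            return fallback(*args)
        start = self.clock()
        result = primary(*args)
        elapsed_ms = (self.clock() - start) * 1000.0
        # Count consecutive breaches only; one fast call clears the streak.
        self.breaches = self.breaches + 1 if elapsed_ms > self.budget_ms else 0
        return result

    def reset(self):
        self.breaches = 0
```

Usage would look like `guard.call(new_path, safe_path, request)`. Tripping on consecutive breaches is the simplest rule I could think of; a percentile over a sliding window is probably closer to what you would want in production.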

Here's a concrete six-step plan I've used: 1) map hot data paths and set per-step latency budgets; 2) define a minimal, stable API surface for the compute core; 3) build a lightweight gateway (gRPC preferred) to translate inputs/outputs; 4) roll out behind a feature flag; 5) run a 4–6 week pilot with representative workloads; 6) compare end-to-end latency, throughput, and SLO adherence against the baseline and make a go/no-go decision.
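For step 1, a per-step budget check can be as simple as the sketch below. The 50 ms end-to-end target comes from the original post; the step names and the individual budgets are hypothetical placeholders, so substitute your own measurements.

```python
END_TO_END_BUDGET_MS = 50.0  # "under fifty" milliseconds, per the original post

# Hypothetical per-step budgets; the step names and numbers are assumptions.
BUDGETS_MS = {"ingest": 5.0, "transform": 10.0, "gateway": 2.0, "compute": 30.0}


def check_budgets(budgets, measured, total_budget):
    """Return a list of human-readable budget problems (empty means all good)."""
    problems = []
    if sum(budgets.values()) > total_budget:
        problems.append("per-step budgets exceed the end-to-end target")
    for step, limit in budgets.items():
        took = measured.get(step)
        if took is None:
            problems.append(f"{step}: no measurement")
        elif took > limit:
            problems.append(f"{step}: {took:.1f} ms over {limit:.1f} ms budget")
    return problems


# Example measurements (made up) where only the gateway hop is too slow.
measured = {"ingest": 4.2, "transform": 9.1, "gateway": 3.5, "compute": 27.8}
print(check_budgets(BUDGETS_MS, measured, END_TO_END_BUDGET_MS))
```

Keeping the budgets in one table makes it obvious when a new hop eats into the headroom left for the compute core.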

Testing should go beyond chaos experiments: use latency modeling, deterministic replay of failure scenarios, shadow deployments where the new path runs in parallel and its outcomes are compared against the existing path, and end-to-end benchmarks with realistic workloads. Instrument invariants (ordering, exactly-once semantics) and validate them before going live.
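The shadow comparison can start out very small. This is a sketch under assumptions: `baseline` and `candidate` are hypothetical callables standing in for the existing engine and the new path, results are assumed to be plain floats, and the relative tolerance is a number I picked, not a recommendation.

```python
import math


def shadow_compare(inputs, baseline, candidate, rel_tol=1e-9):
    """Run both paths on the same inputs and collect any disagreements.

    Only the baseline result would be served; the candidate is observed.
    """
    mismatches = []
    for item in inputs:
        expected = baseline(item)
        try:
            actual = candidate(item)
        except Exception as exc:
            # A crash in the shadow path is recorded, never propagated.
            mismatches.append((item, expected, repr(exc)))
            continue
        if not math.isclose(expected, actual, rel_tol=rel_tol):
            mismatches.append((item, expected, actual))
    return mismatches
```

For risk numbers you will likely need a tolerance agreed with the quants rather than exact equality, since reordering floating-point operations across services can change the last few bits.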

Key metrics to lock in early: end-to-end latency percentiles (p95/p99), tail latency under peak load, throughput, CPU/memory/GC impact, cross-boundary latency, data locality, and rollout KPIs like time-to-detect and rollback frequency. Tie decisions to concrete SLOs and a risk-managed rollout plan.
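If you want a quick way to get p95/p99 out of raw latency samples during the pilot, a nearest-rank percentile is enough to start with. This is a sketch; your metrics stack may use a different percentile definition (interpolated, histogram-based), so the numbers will not necessarily match it exactly.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it.

    `p` is an integer between 1 and 100.
    """
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up latencies of 1..100 ms, just to show the call shape.
latencies_ms = list(range(1, 101))
print(percentile(latencies_ms, 95), percentile(latencies_ms, 99))
```

Compute the same percentiles for the baseline and the new path from the same workload, otherwise the comparison in the go/no-go step is not apples to apples.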

Be careful not to morph into a distributed monolith; keep cross-service data exchange lean and well-scoped. Use the strangler pattern to migrate hot pain points first, maintain a single source of truth for critical state, and ensure idempotent retries and a safe rollback path. Consider a two-tier architecture: a tightly integrated compute core plus a lean data/UX layer behind a minimal boundary.
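On idempotent retries, the usual trick is to key each request and return the stored result when the same key shows up again. A minimal in-memory sketch follows; `IdempotentHandler` and `request_id` are hypothetical names, and a real service would need a shared, durable store with expiry instead of a dict.

```python
class IdempotentHandler:
    """Executes each request id at most once and replays the stored result on retry.

    Illustrative only: state lives in process memory and is not thread-safe.
    """

    def __init__(self, compute):
        self._compute = compute
        self._results = {}
        self.executions = 0  # how many times the underlying compute actually ran

    def handle(self, request_id, payload):
        if request_id in self._results:
            # Retry of a request we already completed: no recomputation.
            return self._results[request_id]
        self.executions += 1
        result = self._compute(payload)
        self._results[request_id] = result
        return result
```

With this in place a client can retry on timeout without risking a double-applied calculation, which also makes rollback less scary.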

Questions: what are your target boundary latency budgets (e.g., 1–2 ms), will you cross region boundaries, and what stack are you using (Kubernetes, EC2, or serverless)? Sharing rough numbers will help tailor a concrete 4–6 week validation plan and a boundary architecture diagram.

LarryEM

John.S

SamuelPW

Luke.M

Mia9

LarryEM

Avery66