MultiHub Forum

Full Version: Where do you draw the line between useful forecast metrics and math (MAE, RMSE)?
I’ve been working on a forecasting project at my job, and I keep hitting a wall when it comes to choosing the right error metrics to actually trust. My team argues about MAE vs. RMSE, but it feels like we’re missing something when the model looks good on paper but feels off in reality. How do you all navigate picking metrics that actually tell you if your forecast is useful, not just mathematically tidy?
Metric choice should start with the cost of being wrong. MAE and RMSE are helpful, but they’re only proxies. Map your forecast errors to business impact and pick a metric that penalizes errors the same way the business does. If costs are asymmetric, consider an asymmetric loss or quantile-based measures, and report several metrics rather than a single number.
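To make the asymmetric-cost point concrete, here is a minimal sketch of the quantile (pinball) loss. The function name, the sample numbers, and the choice of q = 0.9 are illustrative assumptions, not anything from the poster's project. With q = 0.9, under-forecasting costs nine times as much as over-forecasting by the same amount.

```python
def pinball_loss(actuals, forecasts, q):
    """Mean pinball (quantile) loss for a target quantile q in (0, 1)."""
    if not 0 < q < 1:
        raise ValueError("q must be strictly between 0 and 1")
    total = 0.0
    for y, f in zip(actuals, forecasts):
        diff = y - f  # positive when the forecast was too low
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(actuals)


# Hypothetical data: both forecasts miss by 10 units everywhere (same MAE).
actuals = [100, 120, 90, 110]
too_low = [90, 110, 80, 100]
too_high = [110, 130, 100, 120]

loss_low = pinball_loss(actuals, too_low, q=0.9)    # roughly 9
loss_high = pinball_loss(actuals, too_high, q=0.9)  # roughly 1
print(loss_low, loss_high)
```

Both forecasts would score identically on MAE and RMSE, so this kind of loss only helps if the business really does care more about one direction of error.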
I want to see the error distribution, not just the average. RMSE is dominated by large errors because it squares them; MAE is more robust but hides where the big misses land. Plot residuals, check bias by bin, and examine forecast intervals. Have you looked at how often errors breach your tolerance band?
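A rough sketch of those checks, using only the standard library. The error series, the bin labels, and the tolerance of 5 are made up for illustration, and errors are assumed to be forecast minus actual.

```python
import math
from collections import defaultdict


def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def breach_rate(errors, tolerance):
    """Share of periods where the absolute error exceeds the tolerance band."""
    return sum(1 for e in errors if abs(e) > tolerance) / len(errors)


def bias_by_bin(errors, bins):
    """Mean signed error per bin (e.g. weekday vs weekend)."""
    grouped = defaultdict(list)
    for e, b in zip(errors, bins):
        grouped[b].append(e)
    return {b: sum(v) / len(v) for b, v in grouped.items()}


# Two hypothetical error series with the same MAE but very different shapes.
steady = [2, -2, 2, -2, 2, -2, 2, -2]
spiky = [0, 0, 0, 0, 0, 0, 0, 16]
bins = ["weekday"] * 6 + ["weekend"] * 2

for name, errs in [("steady", steady), ("spiky", spiky)]:
    print(name, mae(errs), rmse(errs), breach_rate(errs, 5), bias_by_bin(errs, bins))
```

Here MAE cannot tell the two series apart, RMSE flags the spiky one, and the per-bin bias shows that its miss is concentrated in one bin.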
From a business lens, forecast usefulness is about decisions, not numbers. Run a backtest where you feed forecasts into a planner, such as inventory or staffing, and measure outcomes such as stockouts, capacity, fill rate, and service level. If those outcomes look good, the forecast is doing what you want, even if MAE is 2.1 and RMSE is 3.4.
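As a toy version of that backtest, the sketch below runs forecasts through a deliberately simple, hypothetical inventory rule: stock exactly the forecast plus a safety buffer each period, with no carry-over. A real planner will be more involved; the point is only that the scoring happens on outcomes.

```python
def simulate_inventory(demand, forecasts, safety_stock=0):
    """Stock forecast + safety_stock each period; report decision outcomes."""
    stockout_periods = 0
    served = 0
    for d, f in zip(demand, forecasts):
        stock = max(f + safety_stock, 0)
        served += min(d, stock)       # cannot serve more than is on hand
        if d > stock:
            stockout_periods += 1
    return {
        "stockout_periods": stockout_periods,
        "fill_rate": served / sum(demand),
    }


# Hypothetical data: both forecasts have an MAE of 5.
demand = [100, 120, 90, 110]
low_forecast = [95, 115, 85, 105]
high_forecast = [105, 125, 95, 115]

low_result = simulate_inventory(demand, low_forecast)
high_result = simulate_inventory(demand, high_forecast)
print(low_result)
print(high_result)
```

With identical MAE, the low forecast stocks out in every period while the high one never does, which is the kind of difference a point-error metric alone would not surface.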
Try probabilistic forecasting and proper scoring rules. A single point metric misses the uncertainty your users care about. CRPS or interval coverage can tell you whether your forecast intervals are trustworthy, not just whether the mean is close. Does the team tolerate probabilistic outputs, or just a single point?
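A small sketch of both ideas, assuming the model can emit interval bounds and a set of sampled forecasts per period. The CRPS here uses the standard sample-based estimate, mean |x - y| minus half the mean pairwise |x - x'|; the function names and numbers are illustrative only.

```python
def interval_coverage(actuals, lowers, uppers):
    """Fraction of actuals that fall inside their forecast interval."""
    hits = sum(1 for y, lo, hi in zip(actuals, lowers, uppers) if lo <= y <= hi)
    return hits / len(actuals)


def crps_from_samples(samples, y):
    """Sample-based CRPS for one observation y (lower is better)."""
    n = len(samples)
    accuracy = sum(abs(x - y) for x in samples) / n
    spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return accuracy - 0.5 * spread


# Hypothetical nominal 80% intervals over five periods.
actuals = [100, 120, 90, 110, 150]
lowers = [90, 105, 80, 100, 110]
uppers = [110, 130, 100, 125, 140]

coverage = interval_coverage(actuals, lowers, uppers)
print(coverage)                           # compare against the nominal level
print(crps_from_samples([0.0, 2.0], 1.0))
```

If empirical coverage sits well below the nominal level, the intervals are too narrow, no matter how good the point metrics look.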
I prefer robust loss options like Huber or quantile loss when the data have outliers or heavy tails. They reduce the lure of tidy RMSEs while keeping interpretability. Also, couple them with rolling cross-validation so your metric isn’t chasing seasonality quirks.
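For reference, a minimal sketch of the Huber loss and an expanding-window (rolling-origin) split generator. The names, the delta of 1.0, and the window sizes are assumptions chosen for illustration; libraries such as scikit-learn offer their own time-series splitters.

```python
def huber(error, delta=1.0):
    """Quadratic for small errors, linear beyond delta."""
    a = abs(error)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def rolling_origin_splits(n, initial, horizon, step=1):
    """Yield (train_indices, test_indices) with an expanding training window."""
    origin = initial
    while origin + horizon <= n:
        yield range(0, origin), range(origin, origin + horizon)
        origin += step


print(huber(0.5), huber(3.0))  # small error is squared, large error grows linearly

# 10 observations, first 6 for the initial fit, 2-step horizon, move 2 at a time.
splits = [(list(tr), list(te)) for tr, te in rolling_origin_splits(10, 6, 2, step=2)]
for train_idx, test_idx in splits:
    print(len(train_idx), test_idx)
```

Averaging the chosen loss over all folds, rather than one holdout, makes it harder for a single seasonal stretch to flatter the model.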
A framing note: sometimes the best metric is the one that forces you to confront what you actually want to prevent. If you want early warnings rather than perfect accuracy, tilt the objective toward alerting performance or error-rate thresholds. It might flip how you build the model, rather than just which numbers you report.
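One way to score alerting performance, sketched under the assumption that an "event" means the actual value exceeding a threshold and an "alert" means the forecast exceeding it. The threshold and data are hypothetical.

```python
def alert_scores(actuals, forecasts, threshold):
    """Precision and recall of threshold-exceedance alerts."""
    tp = fp = fn = 0
    for y, f in zip(actuals, forecasts):
        event = y > threshold
        alerted = f > threshold
        if alerted and event:
            tp += 1
        elif alerted and not event:
            fp += 1   # false alarm
        elif event and not alerted:
            fn += 1   # missed event
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"precision": precision, "recall": recall}


actuals = [50, 120, 80, 130, 60]
forecasts = [55, 110, 105, 90, 40]
scores = alert_scores(actuals, forecasts, threshold=100)
print(scores)
```

A forecast with a mediocre MAE can still score well here if its misses stay on the right side of the threshold, and the reverse is also possible.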