I'm working on a machine learning project and implemented k-fold cross validation to get a better sense of my model's performance. The variance across the different folds is pretty high, though. Does that mean my model is unstable, or is my dataset just too small for this method to be reliable?
High variance across folds does not by itself mean the model is broken. More often it means the performance estimate is noisy, which is expected when each validation fold holds only a handful of samples. Use stratified k-fold (for classification) and repeat it several times with different shuffles, then report the distribution of scores (median and IQR, or a CI) rather than a single number.
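As a rough sketch of the repeated, stratified setup, assuming scikit-learn and a classification task: the synthetic dataset, the logistic regression model, and the 5x10 fold configuration are placeholders, not recommendations for your data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic stand-in for a small dataset (an assumption, not your data).
X, y = make_classification(n_samples=120, n_features=10, n_informative=4, random_state=0)

# 5 stratified folds, repeated 10 times with different shuffles -> 50 scores.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

# Summarize the distribution instead of quoting one number.
median = np.median(scores)
q1, q3 = np.percentile(scores, [25, 75])
print(f"accuracy: median={median:.3f}, IQR=[{q1:.3f}, {q3:.3f}], n={len(scores)}")
```

A wide IQR here points at a noisy estimate; it does not by itself say whether the model or the sample size is responsible.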
Double check that no information leaks between folds. Fit the scaler, and any feature selection step, on the training portion of each fold only, then apply the fitted transform to the validation portion. Never fit them on the whole dataset.
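One common way to keep preprocessing inside each fold, assuming scikit-learn, is to wrap every step in a pipeline. The data, the choice of `SelectKBest` with `k=5`, and the model below are illustrative assumptions only.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder data (an assumption, not your dataset).
X, y = make_classification(n_samples=120, n_features=20, n_informative=4, random_state=0)

# The scaler and feature selector live inside the pipeline, so
# cross_val_score refits them on the training portion of every fold.
leak_free = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=5),
    LogisticRegression(max_iter=1000),
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = cross_val_score(leak_free, X, y, cv=cv)
print(fold_scores.round(3))
```

If your current code calls `fit_transform` on the full dataset before splitting, moving that step into a pipeline like this is the fix.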
Consider simplifying the model or adding regularization if variance is due to overfitting. A more complex model on tiny data tends to swing a lot between folds.
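A quick way to probe whether overfitting drives the spread is to rerun the same cross validation at a few regularization strengths and compare the standard deviation of the scores. This is only a sketch on synthetic data, assuming scikit-learn; the `C` values are arbitrary examples.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Small sample with many features: a setting where overfitting is plausible.
X, y = make_classification(n_samples=80, n_features=30, n_informative=4, random_state=0)
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0)

# In scikit-learn's LogisticRegression, smaller C means stronger L2 regularization.
results = {}
for C in (100.0, 1.0, 0.01):
    scores = cross_val_score(LogisticRegression(C=C, max_iter=5000), X, y, cv=cv)
    results[C] = (scores.mean(), scores.std())
    print(f"C={C:<6} mean={scores.mean():.3f} std={scores.std():.3f}")
```

If the spread shrinks noticeably as regularization increases without the mean dropping much, overfitting is a likely contributor.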
Look at the class distribution and how it shows up in each fold. If one fold has many rare cases and another has few, the scores will differ for that reason alone. Stratified folds help; you can also try a different number of folds, keeping in mind that more folds means smaller validation sets and noisier per-fold scores.
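To see how rare cases land in each fold, you can count them directly. The sketch below assumes scikit-learn and uses made-up labels (90 negatives, 10 positives) purely for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical imbalanced labels: 90 negatives, 10 positives.
rng = np.random.default_rng(0)
y = rng.permutation(np.array([0] * 90 + [1] * 10))
X = np.zeros((len(y), 1))  # features are irrelevant for inspecting the split

def positives_per_fold(splitter):
    """Count positive labels in each validation fold."""
    return [int(y[val_idx].sum()) for _, val_idx in splitter.split(X, y)]

plain = positives_per_fold(KFold(n_splits=5, shuffle=True, random_state=0))
stratified = positives_per_fold(StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
print("KFold:          ", plain)
print("StratifiedKFold:", stratified)
```

With these labels the stratified splitter puts 2 positives in every fold, while the plain splitter's counts depend on the shuffle.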
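Quantify the uncertainty: compute a 95% CI for the mean score across folds, or use a bootstrap approach. Fold scores are not independent because the training sets overlap, so treat either interval as approximate. This helps distinguish true signal from noise.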
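A minimal percentile-bootstrap sketch for the mean score, using only the standard library. The scores below are hypothetical placeholders, and `bootstrap_ci` is an illustrative helper rather than a library function.

```python
import random
import statistics

# Placeholder fold scores (hypothetical values; substitute your own).
fold_scores = [0.71, 0.84, 0.65, 0.90, 0.78]

def bootstrap_ci(values, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

low, high = bootstrap_ci(fold_scores)
print(f"mean={statistics.fmean(fold_scores):.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

With only a handful of fold scores the interval is coarse; feeding in the scores from repeated cross validation gives the bootstrap more to work with, though the correlation between folds still makes the result approximate.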
Even if data is limited, set aside a portion (roughly the size of one or two folds) as a hold-out test set before running cross validation, so you have an independent check on final performance. Use cross validation for model selection, and evaluate the final model once on that untouched test set.
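The split between model selection and final evaluation might look like the sketch below, assuming scikit-learn. The 20% test size, the `C` grid, and the synthetic data are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic placeholder data (an assumption, not your dataset).
X, y = make_classification(n_samples=200, n_features=10, n_informative=4, random_state=0)

# Set the test set aside first; it plays no part in model selection.
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Cross validation on the development portion picks the hyperparameter.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)
search.fit(X_dev, y_dev)

# One final evaluation of the refit best model on the untouched test set.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, f"cv={search.best_score_:.3f}", f"test={test_accuracy:.3f}")
```

A large gap between the cross-validated score and the test score would suggest the selection process itself overfit the development data.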
If you want, tell me the size of your dataset, the task type (classification or regression), the model, and how many folds you used; I can suggest a tailored plan and a quick checklist for debugging.