Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework
Best AI papers explained - A podcast by Enoch H. Kang - Fridays

This paper proposes a new method for optimizing the data mixtures used to train large language models (LLMs). Traditional approaches often rely on costly trial and error or on deterministic extrapolations that ignore uncertainty, which limits their effectiveness and transferability. The authors introduce a multi-fidelity, multi-scale Bayesian optimization framework that treats data curation as a sequential decision-making process, in which the data mixture, model scale, and training duration are chosen adaptively to balance training cost against expected performance gains. The framework uses a probabilistic model to represent performance uncertainty explicitly, so that cheaper, smaller-scale experiments can inform decisions about larger, more costly training runs. Empirical results show that even simple implementations of this approach can substantially accelerate the search for optimal data mixtures compared to existing methods.
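
The description stays high level, so the following is only a rough sketch of the kind of loop it describes: cost-aware, multi-fidelity Bayesian optimization over mixture weights, using a Gaussian process surrogate over (mixture, fidelity) and an expected-improvement-per-cost acquisition. The data sources, fidelity levels, cost model, toy objective, and acquisition rule are all assumptions made for illustration, not the authors' actual implementation.

```python
# Illustrative sketch only: a cost-aware, multi-fidelity Bayesian optimization loop
# over data-mixture weights. The surrogate, acquisition rule, fidelity levels, cost
# model, and toy objective are assumptions for illustration, not the paper's method.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
N_SOURCES = 3                       # hypothetical data sources, e.g. web / code / books
FIDELITIES = (0.25, 0.5, 1.0)       # assumed relative training scales
COSTS = (1.0, 4.0, 16.0)            # assumed cost of one run at each scale

def toy_val_loss(mix, fidelity):
    """Stand-in for an expensive training run (lower is better).
    The best mixture drifts slightly with fidelity to mimic scale effects."""
    target = np.array([0.5, 0.3, 0.2]) + 0.05 * (1.0 - fidelity)
    noise = rng.normal(0.0, 0.02 / fidelity)        # cheaper runs are noisier
    return float(np.sum((np.asarray(mix) - target) ** 2) + 0.1 * (1.0 - fidelity) + noise)

def sample_mixtures(n):
    """Candidate mixture weights drawn uniformly from the probability simplex."""
    return rng.dirichlet(np.ones(N_SOURCES), size=n)

# Seed the surrogate with a handful of cheap, low-fidelity runs.
X, y = [], []
for mix in sample_mixtures(8):
    X.append(np.append(mix, FIDELITIES[0]))
    y.append(toy_val_loss(mix, FIDELITIES[0]))

# A single GP over (mixture weights, fidelity), so low-fidelity observations
# inform predictions at higher fidelities.
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for step in range(20):
    gp.fit(np.array(X), np.array(y))
    best = min(y)

    # Score (mixture, fidelity) candidates by expected improvement per unit cost,
    # so the loop escalates to larger scales only when it looks worthwhile.
    cand = sample_mixtures(256)
    best_score, best_x = -np.inf, None
    for f, cost in zip(FIDELITIES, COSTS):
        Xc = np.hstack([cand, np.full((len(cand), 1), f)])
        mu, sigma = gp.predict(Xc, return_std=True)
        sigma = np.maximum(sigma, 1e-9)
        z = (best - mu) / sigma                     # minimizing: improvement = best - mu
        ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
        i = int(np.argmax(ei / cost))
        if ei[i] / cost > best_score:
            best_score, best_x = ei[i] / cost, Xc[i]

    X.append(best_x)
    y.append(toy_val_loss(best_x[:N_SOURCES], best_x[-1]))

i_best = int(np.argmin(y))
print("best mixture:", np.round(X[i_best][:N_SOURCES], 3),
      "fidelity:", X[i_best][-1], "loss:", round(y[i_best], 4))
```

In this sketch, dividing expected improvement by an assumed per-fidelity cost is what makes the loop prefer cheap small-scale runs until the surrogate suggests a large run is worth the expense, which mirrors the cost/benefit trade-off the episode describes.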