OpenAI has expanded the scope of AI customisation with the debut of reinforcement fine-tuning (RFT) for its o1 models on the second day of its ‘12 Days of OpenAI’ livestream series. The technique goes beyond traditional fine-tuning: instead of simply replicating patterns from training data, RFT-trained models learn to reason through problems.
By employing reinforcement learning, OpenAI aims to let organisations build expert-level AI for complex tasks in law, healthcare, finance, and beyond. The approach trains models to handle domain-specific tasks with minimal data, sometimes as few as 12 examples.
RFT uses reference answers to evaluate and refine model outputs, improving reasoning and accuracy on expert-level tasks. OpenAI demonstrated the technique by fine-tuning the o1-mini model, allowing it to predict genetic diseases more accurately than the base model.
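To make the grading idea concrete, here is a minimal, self-contained sketch of the loop the announcement describes: sample an answer, score it against a reference answer, and nudge the policy toward higher-scoring outputs. This is a toy illustration, not OpenAI's training stack or API; the preference table, grader, prompt, and gene names are all illustrative assumptions.

```python
import math
import random
from collections import defaultdict

# Toy "policy": a table of preference scores over candidate answers per prompt.
# (In real RFT the policy is the language model itself.)
policy = defaultdict(lambda: defaultdict(float))

def grade(output: str, reference: str) -> float:
    """Grader: score a model output against the reference answer.
    Exact match earns 1.0; otherwise partial credit for token overlap."""
    if output == reference:
        return 1.0
    out_tokens, ref_tokens = set(output.split()), set(reference.split())
    return len(out_tokens & ref_tokens) / max(len(ref_tokens), 1)

def sample_answer(prompt: str, candidates: list[str]) -> str:
    """Sample a candidate answer with softmax over preference scores."""
    scores = [policy[prompt][c] for c in candidates]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    return random.choices(candidates, weights=weights)[0]

def rft_step(prompt: str, candidates: list[str], reference: str, lr: float = 0.5):
    """One reinforcement step: reward answers the grader scores highly."""
    answer = sample_answer(prompt, candidates)
    reward = grade(answer, reference)
    # Average grade over candidates serves as a simple baseline.
    baseline = sum(grade(c, reference) for c in candidates) / len(candidates)
    policy[prompt][answer] += lr * (reward - baseline)

# A small set of graded examples; RFT is pitched as working with minimal data.
dataset = [("Which gene is linked to symptom X?", ["FBN1", "BRCA2", "CFTR"], "FBN1")]

for _ in range(50):
    for prompt, candidates, reference in dataset:
        rft_step(prompt, candidates, reference)

prompt, candidates, reference = dataset[0]
print(sample_answer(prompt, candidates), "| reference:", reference)
```

After a few dozen steps the policy concentrates on the answer the grader rewards, which is the same reward-against-reference structure the livestream described, scaled down to a toy setting.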
Redefining Model Fine-Tuning
Unlike traditional fine-tuning, RFT focuses on teaching models to think and reason through problems, as Mark Chen, …