OpenAI has expanded the scope of AI customisation with the debut of reinforcement fine-tuning (RFT) for its o1 models on the second day of its ‘12 Days of OpenAI’ livestream series. The technique goes beyond traditional fine-tuning: instead of simply replicating patterns from training data, RFT-trained models learn to reason through problems.
By employing reinforcement learning, OpenAI aims to let organisations build expert-level AI for complex tasks in law, healthcare, finance, and beyond. The approach trains models to handle domain-specific tasks with minimal data, sometimes as few as 12 examples.
RFT uses reference answers to evaluate and refine model outputs, improving reasoning and accuracy on expert-level tasks. OpenAI demonstrated the technique by fine-tuning the o1-mini model, allowing it to predict genetic diseases more accurately than the base model.
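To make the grading idea concrete, here is a minimal, self-contained sketch of the loop the announcement describes: sample an answer, score it against a reference answer, and nudge the policy toward higher-scoring outputs. This is a toy illustration, not OpenAI's training stack or API; the preference table, grader, prompt, and gene names are all illustrative assumptions.

```python
import math
import random
from collections import defaultdict

# Toy "policy": a table of preference scores over candidate answers per prompt.
# (In real RFT the policy is the language model itself.)
policy = defaultdict(lambda: defaultdict(float))

def grade(output: str, reference: str) -> float:
    """Grader: score a model output against the reference answer.
    Exact match earns 1.0; otherwise partial credit for token overlap."""
    if output == reference:
        return 1.0
    out_tokens, ref_tokens = set(output.split()), set(reference.split())
    return len(out_tokens & ref_tokens) / max(len(ref_tokens), 1)

def sample_answer(prompt: str, candidates: list[str]) -> str:
    """Sample a candidate answer with softmax over preference scores."""
    scores = [policy[prompt][c] for c in candidates]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    return random.choices(candidates, weights=weights)[0]

def rft_step(prompt: str, candidates: list[str], reference: str, lr: float = 0.5):
    """One reinforcement step: reward answers the grader scores highly."""
    answer = sample_answer(prompt, candidates)
    reward = grade(answer, reference)
    # Average grade over candidates serves as a simple baseline.
    baseline = sum(grade(c, reference) for c in candidates) / len(candidates)
    policy[prompt][answer] += lr * (reward - baseline)

# A small set of graded examples; RFT is pitched as working with minimal data.
dataset = [("Which gene is linked to symptom X?", ["FBN1", "BRCA2", "CFTR"], "FBN1")]

for _ in range(50):
    for prompt, candidates, reference in dataset:
        rft_step(prompt, candidates, reference)

prompt, candidates, reference = dataset[0]
print(sample_answer(prompt, candidates), "| reference:", reference)
```

After a few dozen steps the policy concentrates on the answer the grader rewards, which is the same reward-against-reference structure the livestream described, scaled down to a toy setting.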
Redefining Model Fine-Tuning
Unlike traditional fine-tuning, RFT focuses on teaching models to think and reason through problems, as Mark Chen, …