AI Summary
Amazon launches the Nova Forge SDK, a tool that simplifies the customization of large language models (LLMs) by removing the burden of dependency management and infrastructure configuration. The SDK supports the full customization spectrum, from supervised fine-tuning (SFT) to reinforcement fine-tuning (RFT), building on Amazon SageMaker AI and Amazon Nova Forge. A case study on classifying 60,000 Stack Overflow questions into three quality categories illustrates how the SDK makes it possible to train and deploy customized models with datasets starting at 3,500 examples.
With a wide array of Nova customization offerings, the journey to customization, and transitioning between platforms, has traditionally been intricate, requiring technical expertise, infrastructure setup, and considerable time investment. This disconnect between potential and practical application is precisely what we aimed to address. The Nova Forge SDK makes large language model (LLM) customization accessible, empowering teams to harness the full potential of language models without the challenges of dependency management, image selection, and recipe configuration. We view customization as a continuum within the scaling ladder; therefore, the Nova Forge SDK supports all customization options, ranging from adaptations based on Amazon SageMaker AI to deep customization using Amazon Nova Forge capabilities.

In the last post, we introduced the Nova Forge SDK and how to get started with it, along with the prerequisites and setup instructions. In this post, we walk you through the process of using the Nova Forge SDK to train an Amazon Nova model using Amazon SageMaker AI Training Jobs. We evaluate the model's baseline performance on a Stack Overflow dataset, use supervised fine-tuning (SFT) to refine its performance, and then apply reinforcement fine-tuning (RFT) on the customized model to further improve response quality. After each type of fine-tuning, we evaluate the model to show its improvement across the customization process. Finally, we deploy the customized model to an Amazon SageMaker AI inference endpoint.

Next, let's look at the benefits of the Nova Forge SDK through a real-world scenario: automatic classification of Stack Overflow questions into three well-defined categories (HQ, LQ_EDIT, LQ_CLOSE).

Case study: classify a given question into the correct class

Stack Overflow has thousands of questions, varying greatly in quality.
Automatically classifying question quality helps moderators prioritize their efforts and guide users to improve their posts. This solution demonstrates how to use the Amazon Nova Forge SDK to build an automated quality classifier that can distinguish between high-quality posts, low-quality posts requiring edits, and posts that should be closed.

We use the Stack Overflow Question Quality dataset containing 60,000 questions from 2016-2020, classified into three categories:

* HQ (High Quality): Well-written posts without edits
* LQ_EDIT (Low Quality – Edited): Posts with negative scores and multiple community edits, but that remain open
* LQ_CLOSE (Low Quality – Closed): Posts closed by the community without edits

For our experiments, we randomly sampled 4,700 questions and split them as follows:

| Split | Samples | Percentage | Purpose |
|---|---|---|---|
| Training (SFT) | 3,500 | ~75% | Supervised fine-tuning |
| Evaluation | 500 | ~10% | Baseline and post-training evaluation |
| RFT | 700 (+3,500 from SFT) | ~15% | Reinforcement fine-tuning |

For RFT, we augmented the 700 RFT-specific samples with all 3,500 SFT samples (total: 4,200 samples) to prevent catastrophic forgetting of supervised capabilities while learning from reinforcement signals.

The experiment consists of four main stages: baseline evaluation to measure out-of-the-box performance, supervised fine-tuning (SFT) to teach domain-specific patterns, reinforcement fine-tuning (RFT) on the SFT checkpoint to optimize for specific quality metrics, and finally deployment to Amazon SageMaker AI. Each fine-tuning stage builds upon the previous one, with measurable improvements at every step.

We used a common system prompt for all the datasets:

This is a stack overflow question from 2016-2020 and it can be classified into three categories:
* HQ: High-quality posts without a single edit.
* LQ_EDIT: Low-quality posts with a negative score, and multiple community edits. However, they remain open after those changes.
* LQ_CLOSE: Low-quality posts that were closed by the community without a single edit.
You are a technical assistant who will classify the question from users into any of the above three categories. Respond with only the category name: HQ, LQ_EDIT, or LQ_CLOSE. **Do not add any explanation, just give the category as output**.

Stage 1: Establish baseline performance

Before fine-tuning, we establish a baseline by evaluating the pre-trained Nova 2.0 model on our evaluation set. This gives us a concrete baseline for measuring future improvements. Baseline evaluation is critical because it helps you understand the model's out-of-the-box capabilities, identify performance gaps, set measurable improvement goals, and validate that fine-tuning is necessary.

Install the SDK

You can install the SDK with a simple pip command:

pip install amzn-nova-forge

Import the key modules:

from amzn_nova_forge import (
    NovaModelCustomizer,
    SMTJRuntimeManager,
    TrainingMethod,
    EvaluationTask,
    CSVDatasetLoader,
    Model,
)

Prepare evaluation data

The Amazon Nova Forge SDK provides powerful data loading utilities that handle validation and transformation automatically. We begin by loading our evaluation dataset.
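To make the sampling, splitting, and prompt formatting described above concrete, here is a minimal plain-Python sketch. It deliberately does not use the SDK's `CSVDatasetLoader`; the column names `Title`, `Body`, and `Y` follow the Kaggle dataset's schema and are assumptions on our part, and the record layout is illustrative rather than the SDK's required format.

```python
import random

# Abbreviated version of the common system prompt used for all datasets.
SYSTEM_PROMPT = (
    "This is a stack overflow question from 2016-2020 and it can be classified "
    "into three categories: HQ, LQ_EDIT, LQ_CLOSE. Respond with only the "
    "category name: HQ, LQ_EDIT, or LQ_CLOSE."
)

def split_rows(rows, seed=42):
    """Randomly sample 4,700 rows and split into SFT / evaluation / RFT subsets."""
    rng = random.Random(seed)
    sample = rng.sample(rows, 4700)
    sft, eval_split, rft = sample[:3500], sample[3500:4000], sample[4000:]
    # The RFT set is augmented with all SFT samples (700 + 3,500 = 4,200)
    # to guard against catastrophic forgetting of supervised capabilities.
    rft_augmented = rft + sft
    return sft, eval_split, rft_augmented

def to_eval_record(row):
    """Format one dataset row as a system/user message pair plus the gold label."""
    return {
        "system": SYSTEM_PROMPT,
        "user": f"{row['Title']}\n\n{row['Body']}",
        "label": row["Y"],  # one of HQ, LQ_EDIT, LQ_CLOSE
    }
```

In practice you would first read the dataset CSV into `rows` (for example with `csv.DictReader`) before calling `split_rows`.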
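Once predictions have been collected for the evaluation set, the baseline described in Stage 1 can be summarized with a simple accuracy computation. This helper is purely illustrative and independent of the SDK's `EvaluationTask`; it only assumes that predictions and gold labels are available as parallel lists.

```python
from collections import Counter

LABELS = ("HQ", "LQ_EDIT", "LQ_CLOSE")

def classification_report(predictions, labels):
    """Compute overall and per-class accuracy for the three quality classes."""
    assert len(predictions) == len(labels)
    correct = sum(p == y for p, y in zip(predictions, labels))
    totals = Counter(labels)                                   # gold-label counts
    hits = Counter(y for p, y in zip(predictions, labels) if p == y)
    per_class = {
        c: (hits[c] / totals[c] if totals[c] else float("nan")) for c in LABELS
    }
    return {"accuracy": correct / len(labels), "per_class": per_class}
```

Running the same report after baseline evaluation, after SFT, and after RFT gives directly comparable numbers for tracking improvement across the customization process.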