Weave and Models integration demo
This notebook shows how to use W&B Weave together with W&B Models. Specifically, this example considers two different teams.
- The Model Team: the model building team fine-tunes a new Chat Model (Llama 3.2) and saves it to the registry using W&B Models.
- The App Team: the app development team retrieves the Chat Model to create and evaluate a new RAG chatbot using W&B Weave.
Find the public workspace for both W&B Models and W&B Weave here.

The workflow covers the following steps:
- Instrument the RAG app code with W&B Weave
- Fine-tune an LLM (such as Llama 3.2, but you can replace it with any other LLM) and track it with W&B Models
- Log the fine-tuned model to the W&B Registry
- Implement the RAG app with the new fine-tuned model and evaluate the app with W&B Weave
- Once satisfied with the results, save a reference to the updated RAG app in the W&B Registry
Note:
The `RagModel` referenced below is a top-level `weave.Model` that you can consider a complete RAG app. It contains a `ChatModel`, a vector database, and a prompt. The `ChatModel` is itself another `weave.Model` containing the code to download an artifact from the W&B Registry, and it can be swapped out to support any other chat model as part of the `RagModel`. For more details, see the complete model on Weave.
1. Setup
First, install `weave` and `wandb`, then log in with an API key. You can create and view your API keys at https://wandb.ai/settings.
2. Make `ChatModel` based on Artifact
Retrieve the fine-tuned chat model from the Registry and create a `weave.Model` from it to plug directly into the `RagModel` in the next step. It takes the same parameters as the existing `ChatModel`; only `init` and `predict` change.
The model team fine-tuned different Llama-3.2 models using the `unsloth` library to make inference faster. Hence, use the special `unsloth.FastLanguageModel` or `peft.AutoPeftModelForCausalLM` models with adapters to load the model once it is downloaded from the Registry. Copy the loading code from the "Use" tab in the Registry and paste it into `model_post_init`.
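A condensed sketch of what such a model could look like follows. The class name, field names, and registry path are illustrative, and the `unsloth` loading code stands in for whatever the "Use" tab provides:

```python
from typing import Any

import wandb
import weave
from unsloth import FastLanguageModel


class FinetunedChatModel(weave.Model):
    # "chat_model" holds the Registry artifact path copied from the "Use" tab,
    # e.g. "<entity>/wandb-registry-<collection>/<model-name>:<alias>" (placeholder).
    chat_model: str
    max_new_tokens: int = 256
    _model: Any = None
    _tokenizer: Any = None

    def model_post_init(self, __context: Any) -> None:
        # use_artifact inside a run records the download in the model lineage.
        run = wandb.init(project="<project>", job_type="download_model")
        artifact = run.use_artifact(self.chat_model)
        model_path = artifact.download()

        # Load the adapter-based model with unsloth for fast inference.
        self._model, self._tokenizer = FastLanguageModel.from_pretrained(
            model_name=model_path,
            max_seq_length=2048,
            load_in_4bit=True,
        )
        FastLanguageModel.for_inference(self._model)

    @weave.op()
    def predict(self, query: str) -> str:
        inputs = self._tokenizer(query, return_tensors="pt").to(self._model.device)
        output = self._model.generate(**inputs, max_new_tokens=self.max_new_tokens)
        return self._tokenizer.decode(output[0], skip_special_tokens=True)
```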
Now create a new model with a specific link from the registry:
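For example (the path is a placeholder; copy the real one from the Registry's "Use" tab):

```python
new_chat_model = FinetunedChatModel(
    chat_model="<entity>/wandb-registry-<collection>/<model-name>:<alias>",
)
```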
And finally run the evaluation asynchronously:
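For instance, assuming an evaluation object already published in the Weave project (the ref URI is a placeholder):

```python
# Fetch the existing evaluation by its Weave ref and run it on the new model.
evaluation = weave.ref("weave:///<entity>/<project>/object/Evaluation:<digest>").get()

# Top-level await works in a notebook; in a script, wrap this in asyncio.run(...).
await evaluation.evaluate(new_chat_model)
```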
3. Integrate new `ChatModel` version into `RagModel`
Building a RAG app from a fine-tuned chat model can provide several advantages, particularly in enhancing the performance and versatility of conversational AI systems.
Now retrieve the `RagModel` from the existing Weave project (you can fetch the weave ref for the current `RagModel` from the Use tab) and exchange the `ChatModel` for the new one. There is no need to change or re-create any of the other components (vector database, prompts, etc.)!
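A sketch of the swap, assuming the `RagModel` exposes its chat model as a `chat_model` attribute (the ref URI is a placeholder):

```python
# Fetch the current RagModel from the project by its Weave ref.
rag_model = weave.ref("weave:///<entity>/<project>/object/RagModel:<digest>").get()

# Swap in the fine-tuned chat model; the vector database and prompts stay untouched.
rag_model.chat_model = new_chat_model

# Publish the updated RagModel as a new version in Weave.
weave.publish(rag_model)
```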

4. Run new `weave.Evaluation` connecting to the existing Models run
Finally, evaluate the new `RagModel` on the existing `weave.Evaluation`. To make the integration as easy as possible, include the following changes (a sketch follows the lists).

From a Models perspective:
- Getting the model from the Registry creates a new `wandb.run`, which is part of the E2E lineage of the chat model.
- Add the trace ID (with the current eval ID) to the run config so that the model team can click the link to go to the corresponding Weave page.

From a Weave perspective:
- Save the artifact / registry link as input to the `ChatModel` (that is, to the `RagModel`).
- Save the `run.id` as an extra column in the traces with `weave.attributes`.
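A sketch of what the connected evaluation call could look like; the project name is a placeholder and the config key is illustrative:

```python
run = wandb.init(project="<project>", job_type="rag-evaluation")

# Tag every trace from this evaluation with the Models run id.
with weave.attributes({"wandb-run-id": run.id}):
    # .call() returns the summary plus the Call object, so the eval trace
    # can be linked back to the Models run.
    summary, call = await evaluation.evaluate.call(evaluation, rag_model)

# Store a pointer to the Weave eval trace in the run config so the model
# team can click through from the Models UI.
run.config.update({"weave-eval-call-id": call.id})
run.finish()
```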
5. Save the new RAG model on the Registry
To share the new RAG model effectively, push it to the Registry as a reference artifact, adding the Weave version as an alias.
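A sketch under the assumption that the updated `RagModel` was published as above; the registry target path is a placeholder:

```python
# Publish (or re-publish) the RagModel to get its Weave ref and version digest.
ref = weave.publish(rag_model)

with wandb.init(project="<project>") as run:
    artifact = wandb.Artifact(
        name="RagModel",
        type="model",
        description="Reference to the RagModel object in Weave",
        metadata={"weave_uri": ref.uri()},
    )
    # Track the Weave object as a reference rather than copying any files.
    artifact.add_reference(ref.uri(), checksum=False)

    # Use the Weave object version as the artifact alias, then link it
    # into the Registry collection (placeholder path).
    run.log_artifact(artifact, aliases=[ref.digest])
    run.link_artifact(artifact, target_path="<entity>/wandb-registry-<collection>/RagModel")
```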