How model choice affects your Tool Calling in Rust
Motivation
Local
I am a fan of running things locally.
Previously, I had been running a box with 12 onboard SATA drives (there is a pic here).
Recently I built a k3s cluster for my local cabinet, which I really need to spend more time with.
I’ve been testing various frameworks, focusing on those that will work with Ollama locally (or perhaps with GGUF files directly).
My end goal is to host an Nvidia DGX Spark in my cabinet and have everything local connect to it. My Framework laptop works well for smaller models using ROCm, but the Spark should let me run larger models. You may be wondering why I chose the Nvidia box instead of the Framework Desktop: purely because I wanted Nvidia/CUDA. I wanted it for my laptop as well, but it is not available yet. Some models don’t work well on AMD GPUs.
Software
I am mostly interested in a multi-agent system. Ideally, one that can run 24x7. This makes local control even more important because I don’t want to pay the cost of it running all the time in the cloud.
I plan on using it to assist with some dev tasks, research, home automation, whatever.
To that end, I have been testing different agent systems.
Junie does a really good job of writing up a list of tasks to complete. Unfortunately, if it hits any errors completing those tasks, it gets a bit stuck; it’s not yet at the point of being autonomous.
Goose works really well. It’s open source and written in Rust. The biggest problem I have with Goose is that every time I connect it to Ollama, the tool calls just come back as raw JSON rather than doing any work. It works fine with Gemini (other than rate limits), but that’s online.
So today we are looking at RIG. It is open source, written in Rust, and has a lot of examples; we will focus on just one of them.
RIG Example
The RIG example we are going to look at today is ollama_streaming_with_tools.rs.
use anyhow::Result;
use rig::streaming::stream_to_stdout;
use rig::{completion::ToolDefinition, providers, streaming::StreamingPrompt, tool::Tool};
use serde::{Deserialize, Serialize};
use serde_json::json;

#[derive(Deserialize)]
struct OperationArgs {
    x: i32,
    y: i32,
}

#[derive(Debug, thiserror::Error)]
#[error("Math error")]
struct MathError;

#[derive(Deserialize, Serialize)]
struct Adder;

impl Tool for Adder {
    const NAME: &'static str = "add";

    type Error = MathError;
    type Args = OperationArgs;
    type Output = i32;

    async fn definition(&self, _prompt: String) -> ToolDefinition {
        ToolDefinition {
            name: "add".to_string(),
            description: "Add x and y together".to_string(),
            parameters: json!({
                "type": "object",
                "properties": {
                    "x": {
                        "type": "number",
                        "description": "The first number to add"
                    },
                    "y": {
                        "type": "number",
                        "description": "The second number to add"
                    }
                },
                "required": ["x", "y"]
            }),
        }
    }

    async fn call(&self, args: Self::Args) -> Result<Self::Output, Self::Error> {
        let result = args.x + args.y;
        Ok(result)
    }
}

#[derive(Deserialize, Serialize)]
struct Subtract;

impl Tool for Subtract {
    const NAME: &'static str = "subtract";

    type Error = MathError;
    type Args = OperationArgs;
    type Output = i32;

    async fn definition(&self, _prompt: String) -> ToolDefinition {
        serde_json::from_value(json!({
            "name": "subtract",
            "description": "Subtract y from x (i.e.: x - y)",
            "parameters": {
                "type": "object",
                "properties": {
                    "x": {
                        "type": "number",
                        "description": "The number to subtract from"
                    },
                    "y": {
                        "type": "number",
                        "description": "The number to subtract"
                    }
                },
                "required": ["x", "y"]
            }
        }))
        .expect("Tool Definition")
    }

    async fn call(&self, args: Self::Args) -> Result<Self::Output, Self::Error> {
        let result = args.x - args.y;
        Ok(result)
    }
}

#[tokio::main]
async fn main() -> Result<(), anyhow::Error> {
    tracing_subscriber::fmt().init();

    // Create agent with a single context prompt and two tools
    let calculator_agent = providers::ollama::Client::new()
        .agent("llama3.2")
        .preamble(
            "You are a calculator here to help the user perform arithmetic
            operations. Use the tools provided to answer the user's question.
            make your answer long, so we can test the streaming functionality,
            like 20 words",
        )
        .max_tokens(1024)
        .tool(Adder)
        .tool(Subtract)
        .build();

    println!("Calculate 2 - 5");

    let mut stream = calculator_agent.stream_prompt("Calculate 2 - 5").await?;
    stream_to_stdout(calculator_agent, &mut stream).await?;

    Ok(())
}
As you can see, the crux of the app is asking the LLM to use a Subtract tool. In theory, it should take 2 - 5 and return -3.
What do we actually see?
When I run it locally:
Calculate 2 - 5
Response: 2025-04-06T20:48:03.629415Z INFO rig: Calling tool subtract with args:
"{\"x\":\"2\",\"y\":\"5\"}"
Error: ToolCallError: JsonError: invalid type: string "2", expected i32 at line 1 column 8
So what are we seeing here? We told Ollama that 2 and 5 were numbers (i32 specifically), and they were given back to RIG as strings.
Is Ollama or RIG at fault here?
Diagnosis
Let’s try calling Ollama directly without RIG in the mix.
llama3.2:1b
curl -X POST http://localhost:11434/api/chat -H "Content-Type: application/json" -d '{
"model": "llama3.2:1b",
"messages": [
{"role": "system", "content": "You are a math assistant. Use the `subtract` tool for arithmetic."},
{"role": "user", "content": "Calculate 5 minus 3."}
],
"tools": [
{
"name": "subtract",
"description": "Subtract two numbers.",
"parameters": {
"type": "object",
"properties": {
"a": { "type": "integer", "format": "i32" },
"b": { "type": "integer", "format": "i32" }
}
}
}
]
}'
{"model":"llama3.2:1b","created_at":"2025-04-06T19:12:25.886010847Z","message":{"role":"assistant","content":"{\"type\":\"numeric\", \"function\":\"subtract\", \"parameters\": {\"a\": \"5\", \"b\": \"3\"}}"},"done_reason":"stop","done":true,"total_duration":374119253,"load_duration":13042547,"prompt_eval_count":151,"prompt_eval_duration":117861289,"eval_count":26,"eval_duration":241922157}
When called directly with cURL, we can see that the returned parameters are JSON strings, not numbers.
Let’s try another model.
qwen2.5:latest
curl -X POST http://localhost:11434/api/chat -H "Content-Type: application/json" -d '{
"model": "qwen2.5:latest",
"messages": [
{"role": "system", "content": "You are a math assistant. Use the `subtract` tool for arithmetic."},
{"role": "user", "content": "Calculate 5 minus 3."}
],
"tools": [
{
"name": "subtract",
"description": "Subtract two numbers.",
"parameters": {
"type": "object",
"properties": {
"a": { "type": "integer", "format": "i32" },
"b": { "type": "integer", "format": "i32" }
}
}
}
]
}'
{"model":"qwen2.5:latest","created_at":"2025-04-06T19:24:35.087296724Z","message":{"role":"assistant","content":"","tool_calls":[{"function":{"name":"subtract","arguments":{"minuend":5,"subtrahend":3}}}]},"done":false}
{"model":"qwen2.5:latest","created_at":"2025-04-06T19:24:35.137474283Z","message":{"role":"assistant","content":""},"done_reason":"stop","done":true,"total_duration":12999461211,"load_duration":12002129024,"prompt_eval_count":139,"prompt_eval_duration":42131461,"eval_count":30,"eval_duration":949711483}
Well, this is arguably worse: Qwen returned actual numbers, but it changed the names of our parameters to minuend and subtrahend.
gemma3:4b
curl -X POST http://localhost:11434/api/chat -H "Content-Type: application/json" -d '{
"model": "gemma3:4b",
"messages": [
{"role": "system", "content": "You are a math assistant. Use the `subtract` tool for arithmetic."},
{"role": "user", "content": "Calculate 5 minus 3."}
],
"tools": [
{
"name": "subtract",
"description": "Subtract two numbers.",
"parameters": {
"type": "object",
"properties": {
"a": { "type": "integer", "format": "i32" },
"b": { "type": "integer", "format": "i32" }
}
}
}
]
}'
{"error":"registry.ollama.ai/library/gemma3:4b does not support tools"}
Oops, Gemma doesn’t even support tool calling. Let’s try another.
michaelneale/deepseek-r1-goose:latest
DeepSeek will not run on the FW16 / ROCm without core dumping. This version was recommended by Goose specifically for tool calling.
curl -X POST http://localhost:11434/api/chat -H "Content-Type: application/json" -d '{
"model": "michaelneale/deepseek-r1-goose:latest",
"messages": [
{"role": "system", "content": "You are a math assistant. Use the `subtract` tool for arithmetic."},
{"role": "user", "content": "Calculate 5 minus 3."}
],
"tools": [
{
"name": "subtract",
"description": "Subtract two numbers.",
"parameters": {
"type": "object",
"properties": {
"a": { "type": "integer", "format": "i32" },
"b": { "type": "integer", "format": "i32" }
}
}
}
]
}'
{"model":"michaelneale/deepseek-r1-goose:latest","created_at":"2025-04-06T19:23:02.932426378Z","message":{"role":"assistant","content":"","tool_calls":[{"function":{"name":"subtract","arguments":{"a":5,"b":3}}}]},"done":false}
{"model":"michaelneale/deepseek-r1-goose:latest","created_at":"2025-04-06T19:23:05.558074544Z","message":{"role":"assistant","content":""},"done_reason":"stop","done":true,"total_duration":54412886209,"load_duration":20797813483,"prompt_eval_count":158,"prompt_eval_duration":718908948,"eval_count":394,"eval_duration":32893678846}
This one produced the correct response but was very slow.
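We don't have to eyeball "slow": the Ollama response reports eval_count (generated tokens) and eval_duration (nanoseconds), so generation speed is just their ratio. A quick sketch using the numbers from the response above:

```rust
fn main() {
    // Tokens per second = eval_count / (eval_duration in seconds).
    // eval_duration is reported by Ollama in nanoseconds.
    let tok_per_sec =
        |eval_count: f64, eval_duration_ns: f64| eval_count / (eval_duration_ns / 1e9);

    // The deepseek-r1-goose run above: 394 tokens over ~32.9 s.
    println!("{:.1} tok/s", tok_per_sec(394.0, 32_893_678_846.0)); // ~12 tok/s
}
```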
llama3.1:8b-instruct-q4_0
curl -X POST http://localhost:11434/api/chat -H "Content-Type: application/json" -d '{
"model": "llama3.1:8b-instruct-q4_0",
"messages": [
{"role": "system", "content": "You are a math assistant. Use the `subtract` tool for arithmetic."},
{"role": "user", "content": "Calculate 5 minus 3."}
],
"tools": [
{
"name": "subtract",
"description": "Subtract two numbers.",
"parameters": {
"type": "object",
"properties": {
"a": { "type": "integer", "format": "i32" },
"b": { "type": "integer", "format": "i32" }
}
}
}
]
}'
{"model":"llama3.1:8b-instruct-q4_0","created_at":"2025-04-06T19:26:24.56003022Z","message":{"role":"assistant","content":"","tool_calls":[{"function":{"name":"subtract","arguments":{"a":5,"b":3}}}]},"done":false}
{"model":"llama3.1:8b-instruct-q4_0","created_at":"2025-04-06T19:26:24.584434326Z","message":{"role":"assistant","content":""},"done_reason":"stop","done":true,"total_duration":12760865597,"load_duration":12037184762,"prompt_eval_count":154,"prompt_eval_duration":41198083,"eval_count":19,"eval_duration":681068100}
This one produced the correct response and is fast. Let’s use this one.
Updating the Example
Switching the agent line in the example: .agent("llama3.1:8b-instruct-q4_0")
Calculate 2 - 5
Response: 2025-04-06T20:55:47.720048Z INFO rig: Calling tool subtract with args:
"{\"x\":2,\"y\":5}"
Result: -3
This is great! It works!
Caveat
Let’s just try doing cargo run a few more times without any code changes. I ran it 10 more times:
- 9 times it got the right result
- 1 time it got 3 instead of -3

It should be noted that I thought to try this because I did see it respond with strings instead of numbers once during testing.
Conclusions
We were able to diagnose why it wasn’t working and fix it.
Not all models are equal, even if they say they support tool calling.
We should follow best coding practices and code defensively against arguments arriving as the wrong type.
We could run the prompt more than once and pick the most frequent answer; however, this becomes problematic if the tool does things like filesystem changes or filing taxes.
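For a pure, read-only tool like this calculator, that majority-vote idea is simple to sketch (the `majority_vote` helper is hypothetical, and `runs` stands in for the values returned by repeated agent runs):

```rust
use std::collections::HashMap;

// Hypothetical helper: tally repeated results and keep the most frequent one.
fn majority_vote(results: &[i32]) -> Option<i32> {
    let mut counts: HashMap<i32, usize> = HashMap::new();
    for &r in results {
        *counts.entry(r).or_insert(0) += 1;
    }
    counts.into_iter().max_by_key(|&(_, n)| n).map(|(v, _)| v)
}

fn main() {
    // The spread observed in the Caveat section: nine -3s and one 3.
    let runs = [-3, -3, -3, -3, 3, -3, -3, -3, -3, -3];
    println!("{:?}", majority_vote(&runs)); // Some(-3)
}
```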
I think the next steps will be:
- Try to set the temperature to 1
- Modify the prompt to encourage the correct answer
- Maybe a custom model, model template or training