Python Function Calling: How to Give LLMs Access to Real-World Tools

https://ift.tt/ur2iGqK

You've probably noticed that LLMs are remarkably good at reasoning — but on their own, they can't check today's weather, query your database, or look up a customer record. They generate text. Function calling is what bridges that gap.

Function calling (also called tool calling) lets you connect an LLM to real Python functions. The model decides when a function is needed, tells your application which one to call and with what arguments, and then uses the result to compose its final response. By the end of this article, you'll understand exactly how that loop works and have a complete, runnable Python example you can build from.

What Is Function Calling?
How Does the LLM Decide to Call a Function?
A Complete Python Example, Step by Step
Handling Multiple Function Calls
Common Mistakes and How to Fix Them
When to Use Function Calling
FAQ

What Is Function Calling?

Function calling is a pattern where an LLM returns structured output — specifically, a JSON object describing a function name and arguments — instead of a text answer. Your application reads that output, executes the actual function, and sends the result back to the model. The model then uses that result to write its final response to the user.

The key thing to understand is that the LLM never runs your code directly. It reasons about what needs to happen and returns a description of the action it wants taken. Your application is the one doing the actual work.
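To make that division of labor concrete, here is a minimal sketch that involves no API call at all. The `model_output` string is a hand-written stand-in for the kind of structured description a model might return (an assumption for illustration, not real API output), and `get_weather` is a placeholder function. The application parses the description and runs the function itself.

```python
import json

def get_weather(city: str) -> str:
    # Placeholder for a real lookup (hypothetical data)
    return f"Sunny in {city}"

# Hand-written stand-in for the model's structured output
model_output = '{"name": "get_weather", "arguments": {"city": "Tokyo"}}'

# The model only described the action; the application parses it...
request = json.loads(model_output)

# ...and the application, not the model, executes the function
result = get_weather(**request["arguments"])
print(request["name"], "->", result)
```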

That request-and-response loop looks like this:

[Figure: Python Function Calling Steps]

The terms "function calling" and "tool calling" refer to the same thing. Newer API documentation (including OpenAI's) tends to use "tool calling," but you'll see both used interchangeably.

How Does the LLM Decide to Call a Function?

When you make an API request, you pass the model a list of tool definitions alongside the user's message. Each definition describes a function: its name, what it does, and what arguments it takes (using JSON Schema).

The model reads the user's message and the tool definitions together. If it determines that the user's request requires an action your tools can handle, it returns a tool call rather than a text response. If the user's message can be answered from the model's training knowledge alone, it responds normally.

Here's what a single tool definition looks like in Python:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather conditions for a given city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city name, e.g. 'London' or 'Tokyo'"
                    }
                },
                "required": ["city"],
                "additionalProperties": False
            },
            "strict": True
        }
    }
]

The description field is the most important part for the model's decision-making. It uses your description to understand when this function is appropriate. Vague descriptions lead to inconsistent behavior — a description like "Gets weather" is much less reliable than "Get the current weather conditions for a given city." Be specific about what the function does and when it should be used.
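One way to keep definitions consistent is to build them with a small helper. The sketch below is an illustration only: `make_tool` is a hypothetical helper, not part of the OpenAI SDK, and its five-word minimum for descriptions is an arbitrary threshold chosen to reject something like "Gets weather". It produces the same dictionary shape as the definition above.

```python
def make_tool(name: str, description: str, properties: dict) -> dict:
    """Build one tool definition in the shape shown above (hypothetical helper)."""
    # Arbitrary guard against very short, vague descriptions
    if len(description.split()) < 5:
        raise ValueError(f"Description for {name!r} is too vague: {description!r}")
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                # With strict mode, every property is listed as required
                "required": list(properties),
                "additionalProperties": False,
            },
            "strict": True,
        },
    }

tools = [
    make_tool(
        "get_weather",
        "Get the current weather conditions for a given city.",
        {"city": {"type": "string", "description": "The city name, e.g. 'London' or 'Tokyo'"}},
    )
]
```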

Two response fields tell you what happened. When the model wants to call a function, the response's finish_reason will be "tool_calls" and the message will contain a tool_calls list. When the model has enough information to answer directly (including after receiving your function results), finish_reason will be "stop" and the message will contain plain text in content.
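The branching this implies can be sketched without calling the API. In the example below, the `SimpleNamespace` objects are hand-built stand-ins that only mimic the attribute layout of the SDK's response objects, and `describe_choice` is a hypothetical helper name.

```python
from types import SimpleNamespace

def describe_choice(choice) -> str:
    """Summarize what the model did, based on finish_reason."""
    if choice.finish_reason == "tool_calls":
        names = [call.function.name for call in choice.message.tool_calls]
        return "model requested: " + ", ".join(names)
    if choice.finish_reason == "stop":
        return "model answered: " + choice.message.content
    # Other values exist, e.g. "length" when output hits the token limit
    return "unhandled finish_reason: " + choice.finish_reason

# Stand-in for a response where the model asks for a function
tool_choice = SimpleNamespace(
    finish_reason="tool_calls",
    message=SimpleNamespace(
        content=None,
        tool_calls=[
            SimpleNamespace(
                function=SimpleNamespace(name="get_weather", arguments='{"city":"London"}')
            )
        ],
    ),
)

# Stand-in for a response where the model answers directly
text_choice = SimpleNamespace(
    finish_reason="stop",
    message=SimpleNamespace(content="Hello!", tool_calls=None),
)

print(describe_choice(tool_choice))
print(describe_choice(text_choice))
```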

A Complete Python Function Calling Example, Step by Step

The following example walks through the full function calling loop using the OpenAI Python SDK. We'll use a mock weather function — labeled clearly as mock data — so you can run this with just an OpenAI API key.

Install the SDK first if you haven't already:

pip install openai

Then set your API key as an environment variable:

export OPENAI_API_KEY="your-api-key-here"

Step 1: Define the Tool and the Mock Function

import json
from openai import OpenAI

client = OpenAI()

# The tool definition — this is what you pass to the model
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather conditions for a given city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city name, e.g. 'London' or 'Tokyo'"
                    }
                },
                "required": ["city"],
                "additionalProperties": False
            },
            "strict": True
        }
    }
]

# Mock function — in a real app, this would call a weather API
def get_weather(city: str) -> str:
    mock_data = {
        "London": "Partly cloudy, 15°C",
        "Tokyo": "Sunny, 22°C",
        "New York": "Rainy, 10°C"
    }
    return mock_data.get(city, f"No weather data available for {city}")

Note: This post uses the Chat Completions API, which still works, but OpenAI now recommends the newer Responses API for new projects. The conceptual loop (request → tool call → execute → result → final response) is the same in both APIs, but the request and response shapes differ.

Step 2: Send the First Request

user_message = "What's the weather like in London right now?"

messages = [
    {"role": "user", "content": user_message}
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=tools
)

print(response.choices[0].finish_reason)
# tool_calls

print(response.choices[0].message.tool_calls)
# [ChatCompletionMessageToolCall(id='call_abc123', function=Function(arguments='{"city":"London"}', name='get_weather'), type='function')]

The model returned finish_reason: "tool_calls" rather than a text answer. It recognized that answering the question requires live data it doesn't have, and it's asking your application to fetch it.

Notice that arguments is a JSON-encoded string, not a Python dict. You'll need to parse it before using it.
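Because the arguments arrive as text, parsing can fail or yield something other than an object. A defensive parser might look like the sketch below; `parse_arguments` is a hypothetical helper name, and raising `ValueError` is one possible policy rather than the only reasonable one.

```python
import json

def parse_arguments(raw: str) -> dict:
    """Parse a tool call's JSON argument string into a dict."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model returned invalid JSON arguments: {raw!r}") from exc
    # Tool arguments should decode to a JSON object, not a list or scalar
    if not isinstance(parsed, dict):
        raise ValueError(f"Expected a JSON object, got {type(parsed).__name__}")
    return parsed

print(parse_arguments('{"city":"London"}'))
```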

Step 3: Execute the Function

# Get the tool call from the response
tool_call = response.choices[0].message.tool_calls[0]

# Parse the arguments — they come as a JSON string
arguments = json.loads(tool_call.function.arguments)
city = arguments["city"]

# Call your actual function
result = get_weather(city)
print(result)
# Partly cloudy, 15°C

Step 4: Send the Result Back

This next step is a common ‘gotcha’ that can confuse people new to tool calling. You need to extend the messages list so that it contains the full conversation history: the original user message, the assistant's tool call message, and a new tool role message containing your function's result.

# Append the assistant's tool call message to the history
messages.append(response.choices[0].message)

# Append the function result as a "tool" role message
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": result
})

After step 4, your messages list looks like this:

Role	Content
`user`	"What's the weather like in London right now?"
`assistant`	(tool call: `get_weather` with `{"city": "London"}`)
`tool`	"Partly cloudy, 15°C"

The tool_call_id in your result message must match the id from the original tool call. This is how the model tracks which result belongs to which request — it matters especially when you have multiple function calls in a single turn.
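A quick consistency check can catch a missing or mismatched id before the next request is sent. The sketch below assumes the history is stored as plain dictionaries (for example, after converting the SDK's assistant message with its `model_dump()` method); `unanswered_tool_calls` is a hypothetical helper name.

```python
def unanswered_tool_calls(messages: list[dict]) -> set[str]:
    """Return ids of tool calls that have no matching tool result message."""
    requested, answered = set(), set()
    for message in messages:
        if message.get("role") == "assistant":
            # tool_calls may be missing or None on plain text replies
            for call in message.get("tool_calls") or []:
                requested.add(call["id"])
        elif message.get("role") == "tool":
            answered.add(message["tool_call_id"])
    return requested - answered

# Hand-written history: two calls requested, only one answered
history = [
    {"role": "user", "content": "Weather in London and Tokyo?"},
    {"role": "assistant", "content": None, "tool_calls": [{"id": "call_1"}, {"id": "call_2"}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "Partly cloudy, 15°C"},
]
print(unanswered_tool_calls(history))
```

An empty set means every requested call has a result in the history.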

Step 5: Get the Final Response

final_response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=tools
)

print(final_response.choices[0].finish_reason)
# stop

print(final_response.choices[0].message.content)
# The weather in London right now is partly cloudy with a temperature of 15°C.

This time, finish_reason is "stop" — the model has everything it needs and returned a natural-language response.

Here's the complete working script:

import json
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather conditions for a given city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city name, e.g. 'London' or 'Tokyo'"
                    }
                },
                "required": ["city"],
                "additionalProperties": False
            },
            "strict": True
        }
    }
]

def get_weather(city: str) -> str:
    # Mock data — replace this with a real weather API call
    mock_data = {
        "London": "Partly cloudy, 15°C",
        "Tokyo": "Sunny, 22°C",
        "New York": "Rainy, 10°C"
    }
    return mock_data.get(city, f"No weather data available for {city}")

# Step 1: Initial request
messages = [{"role": "user", "content": "What's the weather like in London right now?"}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=tools
)

# Step 2: Check if the model wants to call a function
if response.choices[0].finish_reason == "tool_calls":
    tool_call = response.choices[0].message.tool_calls[0]

    # Step 3: Execute the function
    arguments = json.loads(tool_call.function.arguments)
    result = get_weather(arguments["city"])

    # Step 4: Add the assistant message and result to history
    messages.append(response.choices[0].message)
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": result
    })

    # Step 5: Get the final response
    final_response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        tools=tools
    )
    print(final_response.choices[0].message.content)
else:
    # No tool call needed — the model answered directly
    print(response.choices[0].message.content)

Handling Multiple Function Calls

The model can request more than one function in a single turn. This is called parallel tool calling, and it happens when the model determines that multiple independent lookups are needed to answer the user's question — for example, if someone asks for the weather in both London and Tokyo at the same time.

Because tool_calls is always a list, you should loop over it rather than assuming there's exactly one call:

if response.choices[0].finish_reason == "tool_calls":
    # Append the assistant's message first
    messages.append(response.choices[0].message)

    # Then loop over all tool calls and execute each one
    for tool_call in response.choices[0].message.tool_calls:
        arguments = json.loads(tool_call.function.arguments)
        result = get_weather(arguments["city"])

        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": result
        })

If you need the model to call functions one at a time rather than in parallel (for example, when each function call depends on the result of the previous one), set parallel_tool_calls=False in your API request. This ensures the model issues at most one tool call per turn.
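When calls depend on each other, a single round trip is not enough: the application has to keep sending results back until the model stops asking for tools. The sketch below shows one way to structure that loop. It takes the request function as a parameter so it can be exercised offline; `run_tool_loop`, `fake_create`, and the `max_turns` cap are illustrative assumptions, and `fake_create` is a scripted stand-in that only mimics the attribute layout of the SDK's responses.

```python
import json
from types import SimpleNamespace

def run_tool_loop(create, messages, available, max_turns=5):
    """Call `create` repeatedly until the model stops requesting tools."""
    for _ in range(max_turns):
        choice = create(messages=messages).choices[0]
        if choice.finish_reason != "tool_calls":
            return choice.message.content
        # Record the assistant's request, then answer every tool call in it
        messages.append(choice.message)
        for call in choice.message.tool_calls:
            args = json.loads(call.function.arguments)
            result = available[call.function.name](**args)
            messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    raise RuntimeError("Model kept requesting tools; giving up")

def fake_create(messages):
    # Scripted stand-in: request a tool first, answer once a result is present
    if any(isinstance(m, dict) and m.get("role") == "tool" for m in messages):
        msg = SimpleNamespace(content="London: Partly cloudy, 15°C", tool_calls=None)
        return SimpleNamespace(choices=[SimpleNamespace(finish_reason="stop", message=msg)])
    call = SimpleNamespace(
        id="call_1",
        function=SimpleNamespace(name="get_weather", arguments='{"city": "London"}'),
    )
    msg = SimpleNamespace(content=None, tool_calls=[call])
    return SimpleNamespace(choices=[SimpleNamespace(finish_reason="tool_calls", message=msg)])

def get_weather(city: str) -> str:
    return "Partly cloudy, 15°C"  # mock data

history = [{"role": "user", "content": "What's the weather in London?"}]
answer = run_tool_loop(fake_create, history, {"get_weather": get_weather})
print(answer)
```

With the real SDK, `create` could be something like `functools.partial(client.chat.completions.create, model="gpt-4o-mini", tools=tools)`. The turn cap is a guard against a model that never stops requesting tools.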

Common Function Calling Mistakes and How to Fix Them

Most function calling bugs fall into a small set of patterns. If something isn't working, check this table first.

Mistake	What goes wrong	Fix
Treating `arguments` as a dict	`TypeError` (or `AttributeError`) when accessing fields	Always parse with `json.loads(tool_call.function.arguments)`
Skipping the assistant message in history	Model loses context; may repeat the tool call	Append `response.choices[0].message` before the tool result
Vague function descriptions	Model calls the wrong function or never calls it	Write descriptions that specify exactly when the function should be used
Not handling `finish_reason: "stop"` on first call	Code crashes if no tool call was made	Always check `finish_reason` before accessing `tool_calls`
Not validating the function name	Crashes if the model hallucinates a nonexistent function name	Check `tool_call.function.name` against a known list before calling

The last mistake is worth extra attention if you're using open-source models. Models like Llama, running locally via Ollama, are more likely to hallucinate function names that don't exist. Validating the function name before calling it prevents hard-to-debug crashes.
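A simple way to validate names is to look them up in a registry of known functions. The sketch below is one possible approach: `AVAILABLE_FUNCTIONS` and `execute_tool_call` are hypothetical names, and returning an error string as the tool result (so the model can see what went wrong) is a design choice rather than a requirement.

```python
import json

def get_weather(city: str) -> str:
    return f"Weather for {city}"  # placeholder result

# Registry of the only functions the application is willing to run
AVAILABLE_FUNCTIONS = {"get_weather": get_weather}

def execute_tool_call(name: str, raw_arguments: str) -> str:
    """Run a known function, or report an unknown name instead of crashing."""
    func = AVAILABLE_FUNCTIONS.get(name)
    if func is None:
        return f"Error: unknown function {name!r}. Available: {sorted(AVAILABLE_FUNCTIONS)}"
    return func(**json.loads(raw_arguments))

print(execute_tool_call("get_weather", '{"city": "Tokyo"}'))
print(execute_tool_call("get_forecast", "{}"))
```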

When to Use Function Calling

Function calling is the right pattern for a specific set of problems. It's not the right tool for everything.

Use it when:

You need real-time or dynamic data the model can't have in its training data (weather, stock prices, live inventory, user-specific records)
You want the model to trigger actions in your system (create a calendar event, send a message, update a database row)
You're building a natural-language interface over an existing API or set of services
You want the model to coordinate multiple data sources to answer a single question

Skip it when:

The model can answer from its training knowledge without external data
You just want structured JSON output from a plain text prompt — for that, use structured outputs (the response_format parameter), which is simpler and doesn't require a function loop
You're building a simple chatbot where the model doesn't need to take actions

The structured outputs comparison comes up constantly in practice. If you just want the model to format its response as JSON matching a schema, you don't need function calling. Function calling is for cases where the model needs to fetch or act on information your application controls.

What You Can Build From Here

Function calling transforms an LLM from a text generator into an active participant in your application. Once you understand the loop — define tools, receive a tool call, execute the function, return the result, get the final response — you can apply it to almost any situation where your users need answers from systems outside the model's training data.

The pattern we covered here is the foundation for more complex setups: chaining multiple function calls, coordinating between multiple agents, or building natural-language interfaces over your own APIs. The best way to internalize it is to swap out the mock weather function for something from your own work.

If you want to go further with Python and LLM development, Dataquest's Python for Data Engineering and AI fundamentals paths cover the programming foundations you'll need to build production-grade pipelines and applications. Start with the basics and build toward the tools you want to work with.

FAQ

What's the difference between function calling and tool calling?

They're the same thing. "Function calling" was the original term introduced in 2023. OpenAI's newer APIs use "tool calling" to reflect that the pattern can extend beyond functions to other types of tools. Most developers use both terms interchangeably.

Can I use function calling with open-source models like Llama?

Yes — many open-source models support tool calling, including Llama and Mistral. You can run them locally with Ollama using a compatible API. The mechanics are similar to the OpenAI implementation, but open-source models are more likely to hallucinate function names, so validating the function name before executing it is especially important.

What's the difference between function calling and RAG?

Retrieval-Augmented Generation (RAG) pulls relevant documents into the model's context before it generates a response. Function calling lets the model request specific data or actions during a conversation. The two patterns can be used together: a function call could trigger a vector search, and the results feed back into the conversation.

How many functions can I pass to the model at once?

There's no hard limit, but OpenAI's own guidance recommends keeping the initially available tool count below 20 for best accuracy. More tools mean more tokens used and more potential for the model to choose the wrong one. If you need a large tool surface, look into tool search, which lets the model load tools on demand rather than receiving all of them upfront.

Does function calling work with streaming?

Yes. When streaming is enabled, you receive choices[0].delta.tool_calls[i].function.arguments events that contain partial argument JSON, which you accumulate into the full arguments string. The overall flow is the same — you still execute the function and send the result back. See the OpenAI streaming documentation for implementation details.
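The accumulation step can be sketched offline. In the example below, `fake_chunk` builds hand-made stand-ins that mimic the attribute layout of streamed chunks (an assumption for illustration, not real API output), and `accumulate_tool_calls` is a hypothetical helper that merges fragments by their `index`.

```python
from types import SimpleNamespace

def accumulate_tool_calls(chunks) -> dict:
    """Merge streamed tool call fragments into complete calls, keyed by index."""
    calls = {}
    for chunk in chunks:
        if not chunk.choices:
            continue
        for delta in chunk.choices[0].delta.tool_calls or []:
            entry = calls.setdefault(delta.index, {"id": None, "name": "", "arguments": ""})
            if delta.id:
                entry["id"] = delta.id
            if delta.function and delta.function.name:
                entry["name"] += delta.function.name
            if delta.function and delta.function.arguments:
                # Argument JSON arrives in pieces; concatenate them in order
                entry["arguments"] += delta.function.arguments
    return calls

def fake_chunk(index, call_id=None, name=None, arguments=None):
    # Hand-built stand-in for one streamed chunk
    function = SimpleNamespace(name=name, arguments=arguments)
    delta = SimpleNamespace(tool_calls=[SimpleNamespace(index=index, id=call_id, function=function)])
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

chunks = [
    fake_chunk(0, call_id="call_1", name="get_weather", arguments=""),
    fake_chunk(0, arguments='{"city":'),
    fake_chunk(0, arguments='"London"}'),
]
calls = accumulate_tool_calls(chunks)
print(calls[0]["arguments"])
```

Once the stream ends, each accumulated `arguments` string can be parsed with `json.loads` exactly as in the non-streaming flow.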

What's the difference between function calling and structured outputs?

Function calling is for when the model needs to request external data or trigger an action. Structured outputs (the response_format parameter) are for when you want the model to format its answer in a specific JSON schema — no external calls needed. If you just want the model to return a JSON object instead of prose, use structured outputs. Reserve function calling for when real-world interaction is actually required.

from Dataquest https://ift.tt/4Ciu2oD
via RiYo Analytics
