The video mentions using the Pydantic library to define data models for structured output.
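As a minimal sketch of that idea (using Pydantic v2; the field names here are illustrative, not necessarily the ones used in the video):

from pydantic import BaseModel

class CalendarEvent(BaseModel):
    # Target schema for the LLM's structured output.
    name: str
    date: str
    participants: list[str]

# Validate and parse a JSON string returned by the LLM into a typed object.
raw_json = '{"name": "Team sync", "date": "2024-06-07", "participants": ["Alice", "Bob"]}'
event = CalendarEvent.model_validate_json(raw_json)
print(event.name, event.participants)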
In the "tool use" example, the application handles the direct call to the external API. The LLM's role is to identify that a tool needs to be called and to provide the specific parameters (like latitude and longitude for a weather API) required for that tool. The Python script then takes these parameters and makes the actual API call.
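As a rough illustration of that division of labor (a sketch only; the weather service here is assumed to be the free Open-Meteo forecast endpoint, not necessarily the one used in the video):

import requests

def get_weather(latitude: float, longitude: float) -> dict:
    # The LLM only supplies latitude and longitude; this function makes the real HTTP call.
    response = requests.get(
        "https://api.open-meteo.com/v1/forecast",  # assumed public weather API
        params={
            "latitude": latitude,
            "longitude": longitude,
            "current": "temperature_2m,wind_speed_10m",
        },
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["current"]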
This video is a beginner's course on building AI agents using pure Python and directly interacting with large language model (LLM) APIs, such as OpenAI. It demonstrates how to leverage fundamental building blocks like basic API calls, structured output, and tool usage to create effective AI systems without relying on external frameworks. The course also covers workflow patterns like prompt chaining, routing, and parallelization, offering practical examples and insights for developers.
Yes, the video demonstrates several other functionalities of AI agents beyond just generating text, such as producing structured output with Pydantic models, calling external tools like a weather API, and orchestrating workflow patterns like prompt chaining, routing, and parallelization.
In the context of the video, the LLM response itself doesn't directly call a function. Instead, the LLM is configured to output a specific format that indicates which function should be called and with what arguments.
Here's how it works and what triggers the function call:
The LLM is configured (via its tools definition, response_format, or similar settings) to output a structured response that signifies a tool call. This response typically includes:

The name of the tool (function) it wants to call.
The arguments it has determined for that function, formatted according to the tool's definition.
A finish_reason that indicates tool_calls. This is the primary trigger.

Your application then:

Reads the tool name and arguments provided by the LLM.
Uses conditional logic (e.g., if tool_name == "get_weather":) to match the LLM's request to the actual Python function defined in your code.
Calls that function with the arguments that the LLM extracted.

So, the LLM generates a structured request to call a function, and your application's logic acts upon that request to actually execute the function. The "trigger" is the application detecting the specific tool-call format in the LLM's response.
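For orientation, a tool-call response has roughly this shape (shown here in the structure used by the OpenAI Chat Completions API; the values are purely illustrative):

llm_response = {
    "choices": [{
        "finish_reason": "tool_calls",  # the primary trigger
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_abc123",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    # arguments arrive as a JSON string, not a dict
                    "arguments": '{"latitude": 52.52, "longitude": 13.41}',
                },
            }],
        },
    }]
}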
You're touching on a crucial aspect of advanced LLM usage: handling more complex reasoning processes like Chain-of-Thought (CoT) alongside structured outputs. The key is to design your prompts and parsing logic to accommodate these intermediate thinking steps.
Here's how you can approach handling "thinking models" or CoT outputs before a final structured output:
Prompt Engineering for CoT + Structured Output:
You are an AI assistant tasked with extracting event details.
First, think step-by-step to determine if the user's request describes a calendar event.
Show your reasoning.
Once you have completed your reasoning and confirmed it's a calendar event, extract the following information and provide it as a JSON object with keys: "name", "date", "participants".
If it's not a calendar event, simply state "Not a calendar event."
User request: [user_input_here]
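One way to send that prompt (a sketch assuming the official openai Python client; COT_EXTRACTION_PROMPT is a placeholder holding the system prompt above, and the model name is illustrative):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

completion = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system", "content": COT_EXTRACTION_PROMPT},
        {"role": "user", "content": "Lunch with Sarah and Tom next Friday at noon."},
    ],
)
print(completion.choices[0].message.content)  # reasoning text followed by JSON, or "Not a calendar event."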
LLM Configuration for "Function Calling" or Tool Use:
You still define your tools (functions) as before. The LLM's goal is still to identify if a tool call is appropriate.

Handling the LLM's Response: This is where your application logic becomes more sophisticated:
Scenario A: LLM provides the structured tool call directly (most common with function calling)
The LLM's response contains a tool_calls object, specifying the function name and arguments. Your application parses this as described previously. The CoT is usually internal to the LLM's generation process and not directly exposed in the structured output intended for function calls.

Scenario B: LLM provides CoT and then structured output (less common for direct function calls, more for general structured data)

The LLM first writes out its reasoning as free text and then emits a JSON object (starting with {) or the specific structured format you requested. Your application has to locate that trailing structured portion and parse it separately from the reasoning (see the parsing sketch after these scenarios).

Scenario C: Hybrid Approach (using tool use for the final step)

The LLM reasons internally and then responds with a tool_calls object. If you want to also capture the CoT, you might need to prompt the model to include its reasoning explicitly, for example as an additional argument of the tool or in a separate preliminary response.
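For Scenario B, a minimal parsing sketch (assuming the prompt shown earlier, i.e., free-text reasoning followed by a single JSON object; extract_trailing_json is a hypothetical helper):

import json

def extract_trailing_json(text: str):
    # Pull out the JSON object that follows the model's free-text reasoning.
    # Assumes the reasoning itself contains no curly braces; returns None if parsing fails.
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end == -1 or end <= start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None

reply = (
    "The request mentions a meeting with Alice on 2024-06-07, so it is a calendar event.\n"
    '{"name": "Meeting with Alice", "date": "2024-06-07", "participants": ["Alice"]}'
)
print(extract_trailing_json(reply))  # parsed dict, or None for "Not a calendar event." replies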
What Triggers a Function Call?
The "trigger" for a function call, especially when using LLM APIs with built-in function calling capabilities, is the LLM's output being structured to indicate a tool call. Specifically, the API response often contains a field (like finish_reason: "tool_calls") and a tool_calls array detailing the function name and arguments.
If you're not using a dedicated function-calling API feature and are just parsing a general JSON output after CoT, the trigger is your application's code: it detects the structured portion of the response, parses it, and decides which of your functions to call with the extracted values.
In essence, the LLM generates a description of what needs to be done (including reasoning and desired final output/action), and your application code interprets that description and executes the corresponding actions or function calls.
Parsing the LLM's output to call a function typically involves these steps, especially when using APIs designed for function calling or tool use:
Receive the LLM Response: Your application receives a response object (often JSON) from the LLM API.
Check for Tool Calls: The first crucial step is to determine if the LLM decided to call a tool. Many LLM APIs indicate this in the response. For example, OpenAI's API has a finish_reason field that might be set to "tool_calls". You'll need to check this field.
Access the tool_calls Array: If the finish_reason indicates tool calls, you'll then access the specific part of the response that contains the list of tool calls. In OpenAI's API, this is typically under a key like message['tool_calls'].
Iterate Through Tool Calls (if multiple): A single LLM response might contain instructions to call multiple tools. You'll likely iterate through each item in the tool_calls array.
Extract Function Name: For each tool call, extract the intended function's name. This is usually found under a key like function['name'].
Extract Function Arguments: Also extract the arguments the LLM determined for that function. These are typically found under function['arguments']. This is often a JSON string that needs to be parsed further.
Parse the Arguments String: Since the arguments are usually provided as a JSON string, you'll need to use a JSON parser (like Python's built-in json module) to convert this string into a Python dictionary or object.
import json

arguments_string = tool_call['function']['arguments']
try:
    arguments_dict = json.loads(arguments_string)
except json.JSONDecodeError as e:
    print(f"Error decoding arguments JSON: {e}")
    # Handle the error, perhaps by logging or skipping this call
    continue
Map Function Name to Actual Python Function: You'll need a way to map the function name provided by the LLM (e.g., "get_weather") to the actual Python function defined in your codebase (e.g., def get_weather(latitude: float, longitude: float): ...). A common way to do this is:
available_tools = {
    "get_weather": get_weather_function,
    "schedule_event": schedule_event_function,
    # ... other tools
}
Alternatively, you can use if/elif/else statements based on the tool_name.

Call the Python Function with Parsed Arguments: Once you have the Python function object and the parsed arguments dictionary, you can call the function. You'll need to ensure the arguments from the dictionary match the expected parameters of your Python function. Python's **kwargs can be very helpful here.
tool_name = tool_call['function']['name']

if tool_name in available_tools:
    function_to_call = available_tools[tool_name]
    try:
        # Safely call the function with arguments
        tool_response = function_to_call(**arguments_dict)
        # Now you have the result from your tool
    except Exception as e:
        print(f"Error calling function {tool_name}: {e}")
        # Handle execution errors
else:
    print(f"Unknown tool: {tool_name}")
    # Handle cases where the LLM requested a tool your app doesn't recognize
Handle the Tool's Output: The tool_response you get back from calling your Python function is then typically sent back to the LLM in a subsequent API call to allow the LLM to formulate a final user-facing response.
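A minimal sketch of that round trip (assuming the official openai Python client, plus the messages list, tools definition, tool_call, and tool_response from the previous steps):

# Record the assistant's tool-call turn and your tool's result,
# then ask the LLM for the final user-facing answer.
messages.append(llm_response["choices"][0]["message"])  # the assistant message containing tool_calls
messages.append({
    "role": "tool",
    "tool_call_id": tool_call["id"],  # links the result back to the specific call
    "content": str(tool_response),
})

final = client.chat.completions.create(
    model="gpt-4o",   # placeholder model name
    messages=messages,
    tools=tools,      # same tool definitions as the first call
)
print(final.choices[0].message.content)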
Simplified Example Flow (Python):
import json

# Assume 'llm_response' is the JSON object from the API
# Assume 'available_tools' is a dictionary mapping tool names to functions
# Assume 'get_weather' and 'schedule_event' are your Python functions

if llm_response.get("choices")[0].get("finish_reason") == "tool_calls":
    for tool_call in llm_response["choices"][0]["message"]["tool_calls"]:
        function_name = tool_call["function"]["name"]
        arguments_string = tool_call["function"]["arguments"]

        try:
            arguments = json.loads(arguments_string)
        except json.JSONDecodeError:
            print(f"Failed to parse arguments for tool: {function_name}")
            continue  # Skip to the next tool call

        if function_name in available_tools:
            function_to_execute = available_tools[function_name]
            try:
                # Call the actual Python function with the parsed arguments
                tool_result = function_to_execute(**arguments)
                print(f"Successfully called {function_name} with result: {tool_result}")
                # In a real app, you'd append this result back to the conversation history
                # to send back to the LLM for a final response.
            except TypeError as e:
                print(f"Argument mismatch for function {function_name}: {e}")
            except Exception as e:
                print(f"Error executing function {function_name}: {e}")
        else:
            print(f"LLM requested unknown tool: {function_name}")
This process ensures that the LLM's intent to use a tool is correctly interpreted by your application, and the appropriate Python function is invoked with the right parameters.
To chain more functions, concatenate their results, and pass them to the LLM for further processing or a final response, you'll build upon the principles of handling single tool calls. The core idea is to create a loop or a sequence that manages multiple function calls and their outputs.
Here's a breakdown of how to chain functions and concatenate results for the LLM:
Define All Functions (Tools):
Build your tool registry (e.g., the available_tools dictionary) so that it includes all these functions, mapping their LLM-recognizable names to your Python function objects.

Handle Multiple Tool Calls in the LLM Response:
The LLM may return several tool_calls in a single response. Your parsing logic needs to handle this by iterating through the tool_calls array as shown previously.

Sequential Function Calling with LLM Orchestration:
Your application parses the tool_calls and executes the first function. You then append a message containing the tool_result from its execution to the conversation history. This is crucial for the LLM to understand what happened.
# Example structure to append to message history
message_history.append({
    "role": "tool",
    "tool_call_id": tool_call["id"],  # Keep track of which call this answers
    "name": function_name,
    "content": str(tool_result)  # Convert result to string
})
Send the updated message_history (including the tool result) back to the LLM. Ask it: "Based on the previous steps and the results, what should I do next?" The LLM now sees the tool_result and may decide to call the next tool in the chain, ask for clarification, or generate the final response.
Concatenating Results for the Prompt (for the next LLM call):
Every time you append a tool_result to the message history, you're effectively concatenating the results. The LLM receives the full context: the original user request, its own earlier tool-call messages, and every tool result appended so far. The content of the "tool" role message is how the results are concatenated and fed back into the LLM's "prompt" (which is technically the entire conversation history).

Example: Chaining Two Functions
Let's say you have get_user_location(user_id) and get_weather(latitude, longitude).
You define both tools: get_user_location and get_weather.
The LLM's first response contains tool_calls for get_user_location with user_id="Bob".
Your application parses the tool_calls for get_user_location and calls get_user_location(user_id="Bob"). Let's say it returns {"latitude": 34.0522, "longitude": -118.2437}.
You append this result to the message history:

message_history.append({
    "role": "tool",
    "tool_call_id": "call_abc123",  # ID from LLM response
    "name": "get_user_location",
    "content": json.dumps({"latitude": 34.0522, "longitude": -118.2437})
})
You call the LLM again; the updated message_history is sent.
The LLM now decides to call get_weather using the latitude and longitude it just received. It might return a tool_calls for get_weather(latitude=34.0522, longitude=-118.2437).
Your application parses the tool_calls for get_weather and calls get_weather(latitude=34.0522, longitude=-118.2437). Let's say it returns {"temperature": 75, "unit": "Fahrenheit", "description": "Sunny"}.
You append this result to the message history as well:

message_history.append({
    "role": "tool",
    "tool_call_id": "call_def456",  # ID from LLM response
    "name": "get_weather",
    "content": json.dumps({"temperature": 75, "unit": "Fahrenheit", "description": "Sunny"})
})
You call the LLM one final time; the full message_history is sent again. With no further tools to call, the LLM can now formulate the final user-facing response.

Key Considerations:
State management: maintain the conversation state (message_history) throughout the chain.
Termination: stop looping once the LLM's finish_reason is not tool_calls (e.g., stop).
Robust parsing: use try-except blocks when parsing JSON arguments, as LLM output can sometimes be malformed.

By iteratively calling the LLM and feeding back the results of executed functions, you can orchestrate complex workflows that chain multiple functions together.
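Putting it together, a compact orchestration loop could look like this (a sketch, assuming the official openai Python client plus the tools definitions and available_tools registry from earlier; the model name and turn limit are placeholders):

import json
from openai import OpenAI

client = OpenAI()

def run_agent(messages, tools, available_tools, model="gpt-4o", max_turns=5):
    # Call the LLM repeatedly, executing any requested tools, until it stops asking for them.
    for _ in range(max_turns):
        completion = client.chat.completions.create(model=model, messages=messages, tools=tools)
        choice = completion.choices[0]

        if choice.finish_reason != "tool_calls":
            return choice.message.content  # final user-facing answer

        messages.append(choice.message)  # record the assistant's tool-call turn
        for tool_call in choice.message.tool_calls:
            name = tool_call.function.name
            try:
                arguments = json.loads(tool_call.function.arguments)
                result = available_tools[name](**arguments)
            except (json.JSONDecodeError, KeyError, TypeError) as e:
                result = f"Error running {name}: {e}"
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": str(result),
            })
    return "Stopped after reaching the turn limit."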