Parallel tool calling in Azure OpenAI
We have learned to perform single- and multi-tool calling with the Azure OpenAI API for chat completions. This part of the series on Azure OpenAI describes the parallel tool calling feature and how to implement it. Parallel tool calling lets the model request several tool invocations in a single response, so you can execute the tools together, collect all of their results, and make fewer calls to the LLM, which improves overall performance.
In a previous example on retrieving the weather at a given location, we examined how to iterate over the tool calls in the LLM response and append each tool result to the conversation history before making the next LLM API call.
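Here is a minimal sketch of that pattern, assuming the official `openai` Python SDK; the endpoint, API key, and deployment name are placeholders, and `get_weather` is stubbed out for illustration:

```python
import json

from openai import AzureOpenAI

# Placeholders: substitute your own resource endpoint, key, and deployment.
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-api-key>",
    api_version="2024-06-01",
)
DEPLOYMENT = "gpt-4o"

def get_weather(location: str) -> str:
    """Stub for the weather tool from the earlier post."""
    return json.dumps({"location": location, "temperature_c": 25})

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather like in Bengaluru?"}]

response = client.chat.completions.create(
    model=DEPLOYMENT, messages=messages, tools=tools
)
assistant_message = response.choices[0].message
messages.append(assistant_message)

for tool_call in assistant_message.tool_calls:
    args = json.loads(tool_call.function.arguments)
    # Append the tool result to the conversation history.
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": get_weather(args["location"]),
    })
    # The final response is requested inside the loop, so every
    # tool result triggers its own LLM API call.
    final_response = client.chat.completions.create(
        model=DEPLOYMENT, messages=messages
    )
    print(final_response.choices[0].message.content)
```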
In this method, we request the final response from the LLM inside the loop, so the model consumes each tool result immediately. If we change the prompt from “What’s the weather like in Bengaluru?” to “What’s the weather like in Bengaluru, London, and Austin?”, the above logic runs sequentially and makes three LLM API calls, one per tool result. A trivial change to this function reduces the number of LLM API calls.
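Again as a sketch under the same assumptions (the client, `tools` definition, and `get_weather` stub from the snippet above), the revised version appends every tool result first and requests the final response only once:

```python
messages = [{
    "role": "user",
    "content": "What's the weather like in Bengaluru, London, and Austin?",
}]

response = client.chat.completions.create(
    model=DEPLOYMENT, messages=messages, tools=tools
)
assistant_message = response.choices[0].message
messages.append(assistant_message)

# Execute every tool call and append each result to the history first...
for tool_call in assistant_message.tool_calls:
    args = json.loads(tool_call.function.arguments)
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": get_weather(args["location"]),
    })

# ...then make a single, consolidated LLM API call with all three results.
final_response = client.chat.completions.create(
    model=DEPLOYMENT, messages=messages
)
print(final_response.choices[0].message.content)
```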
As you can see above, we moved the final-response block out of the for loop. This sends one consolidated API call to the LLM containing all three results retrieved from the get_weather tool.
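The loop above still executes the tool calls one after another. Since the tool calls returned in a single response are independent of each other, a tool that performs real network I/O could also be run concurrently, which is where the parallel-execution benefit mentioned earlier comes in. Here is a sketch of that idea (not from the original example) using Python's standard thread pool:

```python
from concurrent.futures import ThreadPoolExecutor

def run_tool(tool_call):
    """Execute one tool call and package the result as a tool message."""
    args = json.loads(tool_call.function.arguments)
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": get_weather(args["location"]),
    }

with ThreadPoolExecutor() as pool:
    # map() runs the tool calls concurrently and preserves their order.
    messages.extend(pool.map(run_tool, assistant_message.tool_calls))
```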
That’s it for this short post. We will continue exploring Azure OpenAI in upcoming articles in this series.