Using the chat completion API in Azure OpenAI
So far in this series, we have looked at the Azure OpenAI completion API, which generates a response for a given prompt. This is a legacy API, and using the chat completion API is recommended instead. With the chat completion API, we can build conversational chatbots and similar applications. This article examines how to use the Azure OpenAI chat completion API. In the earlier articles, we used the client.completions.create() function to generate a response. To build a conversation with the LLM, we need to use the client.chat.completions.create() function from the openai library.
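A minimal sketch of such an example is shown below. The environment variable names, API version, and deployment name (gpt-4o here) are assumptions for illustration; adjust them to match your own Azure OpenAI resource.

import os

from openai import AzureOpenAI
from rich.console import Console
from rich.markdown import Markdown

# Assumed environment variable names and API version; use the values
# configured for your Azure OpenAI resource.
client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-02-01",
)

messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant. When the user asks for a "
                   "comparison between two things, respond in a tabular format.",
    },
    {
        "role": "user",
        "content": "Compare PowerShell and Bash.",
    },
]

response = client.chat.completions.create(
    model="gpt-4o",  # your deployment name may differ
    messages=messages,
)

# Render the Markdown response as formatted console output
console = Console()
md = Markdown(response.choices[0].message.content)
console.print(md)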
This example uses the rich.console and rich.markdown modules from the rich library to convert the Markdown response from the LLM into formatted output for the console. Make sure you add rich to requirements.txt and install the dependencies.
The client.chat.completions.create() function is similar to the one we used earlier; however, it has many more parameters than the simple completion API. We are most interested in the messages parameter, which is a list of messages. Each message in the list is a dictionary containing keys such as role and content. The value of the role key can be developer, system, user, assistant, tool, or function. The developer and system roles are interchangeable and are used to specify the system prompt: a message that tells the LLM how it should respond. The assistant role indicates a message from the LLM, and the user role tags the user-supplied prompt. We shall look at the tool and function roles in a later article.
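As an illustration, a multi-turn conversation is represented by a messages list along these lines; the content below is hypothetical and only shows how the roles fit together:

# Hypothetical conversation history illustrating the roles
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Azure OpenAI?"},
    {"role": "assistant", "content": "Azure OpenAI is a managed Azure service that ..."},
    {"role": "user", "content": "How is it different from OpenAI's own API?"},
]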
Coming back to our example, we have two messages: a system message that tells the LLM how to interpret and respond to the user prompt, and the user prompt itself. The system prompt is important because it can give the model a specific task or instruction. In this example, we ask the LLM to return the response in a tabular format whenever the user asks for a comparison between two things. We will learn more about this when discussing prompt engineering.
Once a response is generated, we can pass response.choices[0].message.content to the Markdown() function to transform the text into a format that can be rendered on the console. Finally, we print the formatted output with console.print(md).
You can continue this conversation by adding the response from the LLM as an assistant message and then supplying a new user prompt. We do not have a user interface to interact with the model as a conversation, but we can implement a simple loop to “chat” with the model at the command line.
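A minimal sketch of such a loop follows. It assumes the client object and deployment name from the earlier example; only the rich imports are repeated here.

from rich.console import Console
from rich.markdown import Markdown

# client is the AzureOpenAI client created in the earlier example
console = Console()

messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant. When the user asks for a "
                   "comparison between two things, respond in a tabular format.",
    },
]

while True:
    user_prompt = input("You: ")
    if user_prompt.strip().lower() == "exit":
        break

    # Add the user prompt to the conversation history
    messages.append({"role": "user", "content": user_prompt})

    response = client.chat.completions.create(
        model="gpt-4o",  # your deployment name may differ
        messages=messages,
    )
    assistant_message = response.choices[0].message.content

    # Add the LLM response to the history so the model keeps the full context
    messages.append({"role": "assistant", "content": assistant_message})

    console.print(Markdown(assistant_message))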
This updated example is not very different from what we have already tried. It simply keeps prompting the user for a message until the user types exit. Both the user_prompt and the response from the LLM, assistant_message, are added to the conversation history. The conversation history is provided to the LLM as messages; therefore, the LLM always has the complete chat context.
In general, LLMs have a knowledge cut-off date. This means the LLM has been trained only on data available up to a certain date and does not know the most recent or real-time information, which typically results in hallucinations when it is asked about such topics.
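For instance, a hypothetical prompt like the one below asks about something after the model's cut-off date; the model may return a confident but fabricated answer rather than saying it does not know. This continues from the chat loop example above.

# Asking about a recent event that falls after the model's knowledge cut-off
messages.append({
    "role": "user",
    "content": "Who won yesterday's cricket match between India and Australia?",
})

response = client.chat.completions.create(model="gpt-4o", messages=messages)
print(response.choices[0].message.content)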
One way to address this issue is to supply the LLM with up-to-date knowledge, either through Retrieval Augmented Generation (RAG) or by using tools/functions. We shall look at tool/function calling in the next article.