With the growing popularity of AI search optimization, analyzing the behavior of large language models (LLMs) has become a necessity. Insights into how AI models aggregate data from sources, which websites they cite, and what patterns they follow can be very useful for building AI search optimization strategies.
The problem is finding a way to track LLM response data at scale. Working with popular LLMs directly through their UIs can be tedious, as you need to scrape and structure the response data from each conversation manually. Besides, some features, like advanced web search or reasoning, are available only with pricey subscriptions.
Fortunately, with the new LLM Responses endpoints of the DataForSEO AI Optimization API, you can effortlessly and cost-efficiently track the responses of popular LLMs. Moreover, you will get all the response data in a structured format, perfect for in-depth analysis.
Let’s take a closer look at the LLM Responses endpoints.
Explaining the LLM Responses endpoints
The LLM responses endpoints are designed to retrieve structured responses directly from the four most popular LLMs – ChatGPT, Claude, Gemini, and Perplexity. The endpoints feature an extensive set of input parameters that allow you to send highly customized requests to LLMs and receive tailored responses.
For instance, you can specify:
- Instructions for the AI behavior to define the AI’s role and tone;
- The conversation history in the form of a message chain for additional LLM context;
- The specific large language model version;
- The temperature and top_p parameters to control the randomness and diversity of the AI’s response;
- The number of output tokens to control the response size;
- Whether the model should use web search when generating a response.
With such robust request customization options, you can easily test different scenarios with LLMs and tailor their behavior to your research needs.
Another important feature of the LLM Responses endpoints is the transparent, pay-as-you-go pricing system. The price of using each of the LLM Responses endpoints consists of the base price per request plus the cost of tokens and features (like web search) used by the selected AI model. That means you pay only for what you use, scaling your spending up or down as needed without costly subscription commitments. To learn more about how we calculate the price for using LLM Responses endpoints, see this Help Center article.
Now, let’s see how to use the LLM Responses endpoints step by step.
How to use the LLM Responses endpoints
The LLM Responses API includes separate endpoints for the ChatGPT, Claude, Gemini, and Perplexity models. The general structure of requests and responses for these endpoints is the same; the main difference lies in the use of model-specific request parameters.
In this example, we’ll use the ChatGPT LLM Responses Live endpoint.
1. Call the ChatGPT LLM Responses endpoint:
POST: https://api.dataforseo.com/v3/ai_optimization/chat_gpt/llm_responses/live
2. Write a prompt for the AI model in the user_prompt field. The prompt size must not exceed 500 characters.
3. Specify the LLM version with the model_name parameter. If you specify the model family name (e.g., gpt-4.1-mini), the system will use the latest model version from the specified family.
To get a list of all available models, call the dedicated LLM Responses Models endpoint (available for ChatGPT, Claude, Gemini, and Perplexity).
Example:
"model_name": "gpt-4.1-mini-2025-04-14"
or "gpt-4.1-mini"
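To retrieve the list of available model versions programmatically, you could call the Models endpoint as in the minimal Python sketch below. It assumes the requests library, Basic authentication with your DataForSEO credentials, and that the Models endpoint is a GET request following the same path pattern as the live endpoint:

import requests

# List the model versions available for ChatGPT. Assumes Basic
# authentication and a GET request following the same path pattern
# as the live endpoint; replace the placeholder credentials.
url = "https://api.dataforseo.com/v3/ai_optimization/chat_gpt/llm_responses/models"
response = requests.get(url, auth=("your_login", "your_password"))
print(response.json())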
4. If necessary, limit the token usage by the AI model with the max_output_tokens parameter. For ChatGPT non-reasoning models, the minimum limit you can set is 16 tokens, and the maximum is 2048. Reasoning models (model names starting with "o") have minimum and maximum token limits of 1024 and 4096, respectively. Note that if the model uses web_search to generate a response, the output token count may exceed the specified limit.
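As an illustration, a small helper like the one below could keep max_output_tokens within these bounds before sending a task. The function and the "o"-prefix check are simplifications for this sketch, not part of the API:

# Illustrative only: clamp a requested max_output_tokens value to the
# limits described above. Treating every model whose name starts with
# "o" as a reasoning model is a simplification for this sketch.
def clamp_output_tokens(model_name: str, requested: int) -> int:
    if model_name.startswith("o"):
        low, high = 1024, 4096   # ChatGPT reasoning models
    else:
        low, high = 16, 2048     # ChatGPT non-reasoning models
    return max(low, min(requested, high))

print(clamp_output_tokens("gpt-4.1", 5000))  # -> 2048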
5. Use the temperature parameter to adjust the randomness of the AI response. You can specify a float value between 0 and 2 for ChatGPT non-reasoning models. The ChatGPT reasoning models don’t support this parameter.
6. Set the top_p parameter to limit token selection and control the diversity of the response. For ChatGPT non-reasoning models, this parameter can take a float value between 0 and 1. The ChatGPT reasoning models don’t support this parameter either.
7. If you want the AI model to access current web information, set the web_search parameter to true. For ChatGPT models, you can use the web_search_country_iso_code and web_search_city parameters to focus the web search on a specific location.
Note that not all LLM versions support web_search. You can check which model versions support it by calling the dedicated LLM Responses Models endpoint.
Besides, the parameters related to web search work differently depending on the AI model. For example, the web_search_city parameter is available only for ChatGPT and Claude models, while in the Perplexity Sonar models, web search is enabled by default. For more details, refer to the documentation.
8. Additionally, you can enable the force_web_search parameter if you use web_search. With the force_web_search parameter set, the AI will always attempt to use web search when generating a response. You can use this parameter only with ChatGPT and Claude models.
9. Provide instructions for AI behavior in the system_message field. Note that the system message must not be more than 500 characters long.
10. You can also add conversation history with the message_chain array. In the array, you can include up to 10 message objects representing previous conversation turns. Each message object must contain two fields: role (specifying either "user" or "ai") and message (containing the message content with a maximum of 500 characters).
The request may be structured like this:
[
{
"system_message": "communicate as if we are in a business meeting",
"message_chain": [
{
"role": "user",
"message": "Hello, what's up?"
},
{
"role": "ai",
"message": "Hello! I’m doing well, thank you. How can I assist you today? Are there any specific topics or projects you’d like to discuss in our meeting?"
}
],
"max_output_tokens": 200,
"web_search_country_iso_code": "FR",
"web_search_city": "Paris",
"model_name": "gpt-4.1",
"temperature": 0.8,
"top_p": 0.75,
"web_search": true,
"force_web_search": true,
"user_prompt": "provide information on how relevant the amusement park business is in France now"
}
]
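For reference, here is how you could send the request above in Python. This is a minimal sketch assuming the requests library and Basic authentication with your DataForSEO credentials; the message_chain from the example is omitted for brevity:

import requests

# Send the request above to the ChatGPT LLM Responses Live endpoint.
# Assumes Basic authentication; replace the placeholder credentials.
url = "https://api.dataforseo.com/v3/ai_optimization/chat_gpt/llm_responses/live"
payload = [{
    "model_name": "gpt-4.1",
    "system_message": "communicate as if we are in a business meeting",
    "user_prompt": "provide information on how relevant the amusement park business is in France now",
    "max_output_tokens": 200,
    "temperature": 0.8,
    "top_p": 0.75,
    "web_search": True,
    "force_web_search": True,
    "web_search_country_iso_code": "FR",
    "web_search_city": "Paris",
}]
response = requests.post(url, json=payload, auth=("your_login", "your_password"))
print(response.json())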
The response will be returned as follows:
{
"version": "0.1.20250724",
"status_code": 20000,
"status_message": "Ok.",
"time": "4.2327 sec.",
"cost": 0.037976,
"tasks_count": 1,
"tasks_error": 0,
"tasks": [
{
"id": "07281622-1535-0612-0000-0b23a880232c",
"status_code": 20000,
"status_message": "Ok.",
"time": "4.1055 sec.",
"cost": 0.037976,
"result_count": 1,
"path": [
"v3",
"ai_optimization",
"chat_gpt",
"llm_responses",
"live"
],
"data": {
"api": "ai_optimization",
"function": "llm_responses",
"se": "chat_gpt",
"system_message": "communicate as if we are in a business meeting",
"message_chain": [
{
"role": "user",
"message": "Hello, what's up?"
},
{
"role": "ai",
"message": "Hello! I’m doing well, thank you. How can I assist you today? Are there any specific topics or projects you’d like to discuss in our meeting?"
}
],
"max_output_tokens": 200,
"web_search_country_iso_code": "FR",
"web_search_city": "Paris",
"model_name": "gpt-4.1",
"temperature": 0.8,
"top_p": 0.75,
"web_search": true,
"force_web_search": true,
"user_prompt": "provide information on how relevant the amusement park business is in France now"
},
"result": [
{
"model_name": "gpt-4.1-2025-04-14",
"input_tokens": 368,
"output_tokens": 205,
"web_search": true,
"money_spent": 0.037376,
"datetime": "2025-07-28 16:23:01 +00:00",
"items": [
{
"type": "message",
"sections": [
{
"type": "text",
"text": "The amusement park industry in France remains a significant and growing sector within the country's tourism landscape. In 2024, the market generated approximately USD 3.25 billion in revenue and is projected to reach USD 4.27 billion by 2030, reflecting a compound annual growth rate (CAGR) of 4.4% from 2025 to 2030. ([grandviewresearch.com](https://www.grandviewresearch.com/horizon/outlook/amusement-parks-market/france?utm_source=openai))\n\nFrance hosts some of Europe's most visited theme parks. Disneyland Paris, for instance, attracted over 10.4 million visitors in 2023, maintaining its position as the most visited amusement park in Europe. ([scribd.com](https://www.scribd.com/document/840177818/AECOM-Theme-Index-2023?utm_source=openai)) Similarly, Parc Astérix welcomed approximately 2.8 million visitors in the same year, marking a 7 ",
"annotations": [
{
"title": "France Amusement Parks Market Size & Outlook, 2030",
"url": "https://www.grandviewresearch.com/horizon/outlook/amusement-parks-market/france?utm_source=openai"
},
{
"title": "AECOM-Theme-Index-2023 | PDF | Amusement Park | Disneyland",
"url": "https://www.scribd.com/document/840177818/AECOM-Theme-Index-2023?utm_source=openai"
}
]
}
]
}
]
}
]
}
]
}
In the response, you can find the input_tokens and output_tokens fields, which indicate how many input and output tokens were processed to send the chat message and generate the response. In addition, the money_spent field displays the price charged by the AI provider for processing tokens.
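For example, in the response above, the total task cost of 0.037976 appears to break down into the 0.037376 money_spent charged by the AI provider plus a 0.0006 base request price; see the Help Center article mentioned earlier for the exact calculation.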
The AI response data is located in the items array, which contains the sections array with different parts of the AI response.
The sections array has the following elements:
- the text field, which holds the AI-generated text content;
- the annotations array, which contains data about all sources used to generate the response. The source data includes the title of a quoted source and the source url.
It is important to note that you can receive source annotations only when the web_search parameter is set to true. However, if the AI attempts to retrieve web information but does not find relevant results, the annotations array will be empty.
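To put the structure together, here is a minimal Python sketch of extracting the generated text and its source annotations from the parsed response; data is assumed to hold the JSON shown above:

# A minimal sketch of walking the response structure shown above.
# data = response.json()  # from the request sketch earlier
for task in data["tasks"]:
    for result in task["result"]:
        print("Model:", result["model_name"])
        print("Tokens:", result["input_tokens"], "in /", result["output_tokens"], "out")
        for item in result["items"]:
            for section in item.get("sections", []):
                print(section.get("text", ""))
                # annotations may be empty if the web search
                # found no relevant results
                for source in section.get("annotations") or []:
                    print(f'- {source["title"]}: {source["url"]}')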
As you can see, the LLM Responses endpoints open up a new way to work with AI models and retrieve AI response data at scale. With the advanced request customization options and structured data output, you can track and analyze AI responses programmatically – all while maintaining control over costs and model behavior.