How to Scrape Google Search Results with Python Using DataForSEO SERP API

Google frequently updates its SERP layout, which often breaks self-hosted scrapers by changing block structures or class names. Developers then spend real time updating selectors, which makes in-house Google search scrapers expensive to maintain. The cost is not just proxies and captcha-solving services; it is mostly engineering hours.

There is a way to get the same structured data without owning that maintenance burden. In this guide, we will walk through scraping Google search results with Python using our SERP API: setting up credentials, choosing the right endpoints, making a live request, parsing the JSON, and scaling to high volume. Every example uses the Advanced endpoints and runs once you substitute your own API login and password.

Why use our SERP API instead of a hand-rolled scraper

A DIY scraper has three recurring costs, and only two of them are obvious. Proxies you can budget for. Captcha-solving you can outsource. The third cost is re-parsing markup after Google changes the page, and that one never ends, because each update can silently break your selectors.

Our SERP API removes that whole layer. DataForSEO maintains the scraping infrastructure, proxy rotation, and parsers, and returns the results to you in structured JSON. You send a keyword, a location, and a language; you get back ranked results in a predictable schema. When Google changes a layout, the parser that gets updated is ours, not yours.

The schema is also wider than most homegrown scrapers bother to handle. Beyond the ten blue links, the Advanced response captures the elements that increasingly define a modern SERP: Featured Snippets, People Also Ask, Local Packs, AI Overviews, and more. You can see the full set of supported elements on our SERP features page. For a customer-facing feature, that coverage matters. It is the difference between a rank tracker that reports a #1 position and one that knows the #1 result sits below an AI Overview. With that out of the way, let’s get the credentials in place.

Setting up — credentials and the endpoints you’ll use

Authentication uses HTTP Basic auth. You take your login and password from the dashboard, and every request carries them, so there is no token management to deal with. The examples below use our RestClient helper, a small file you can download from the docs that handles the authentication header and JSON encoding for you. If you would rather see the raw request flow first, our guide to making your first SERP API call covers it step by step.
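If you want to see what RestClient does with your credentials, the standard-library sketch below builds the same kind of request by hand without sending it. It is illustrative only: the host https://api.dataforseo.com and a JSON array of task objects as the request body are assumptions based on the documented examples, and build_post is a hypothetical helper, not part of RestClient.

```python
import base64
import json
import urllib.request

# Assumption: the API host used throughout the DataForSEO v3 examples.
API_BASE = "https://api.dataforseo.com"


def basic_auth_header(login: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header (RFC 7617)."""
    token = base64.b64encode(f"{login}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_post(path: str, tasks: list, login: str, password: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request.

    Hypothetical helper: it mirrors what RestClient handles for you.
    """
    body = json.dumps(tasks).encode("utf-8")
    return urllib.request.Request(
        API_BASE + path,
        data=body,
        headers={
            "Authorization": basic_auth_header(login, password),
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_post(
        "/v3/serp/google/organic/live/advanced",
        [{"keyword": "best serp api", "location_code": 2840, "language_code": "en"}],
        "login",
        "password",
    )
    print(req.get_method(), req.full_url)
    print(req.get_header("Authorization"))
    # To send it: with urllib.request.urlopen(req) as resp: data = json.load(resp)
```

Every request carries the same header, which is why there is no token to refresh; sending the request is a matter of passing it to urllib.request.urlopen.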

For Google Organic results, you will work with two retrieval paths, and we will use the Advanced function for both. The Advanced function returns the complete set of SERP elements, whereas Regular and HTML provide less structure to work with. The first path is the Live method: a single real-time call that returns the data in one response. The second is the Standard method, a two-step task_post then task_get/advanced flow built for volume. You can send up to 2,000 API calls per minute, and a Standard POST can carry up to 100 tasks at once. The trade-offs between the two are covered in the Google Organic SERP overview.

One parameter is worth introducing now, because it controls both how many results you get and what you pay: max_crawl_pages. It sets how many SERP pages we crawl, and each page contains roughly 10 organic results, so max_crawl_pages=2 returns about the top 20. It works alongside the depth parameter, and billing is per page crawled, so it doubles as a cost control. You ask for exactly the depth your feature needs and no more.
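Because billing follows pages crawled, it is worth deriving max_crawl_pages from the number of results a feature needs. The sketch below assumes exactly 10 organic results per page, which the text above describes only as approximate, so treat its output as an estimate; pages_for and pages_billed are illustrative names.

```python
import math

RESULTS_PER_PAGE = 10  # approximate: a SERP page holds roughly 10 organic results


def pages_for(target_results: int, per_page: int = RESULTS_PER_PAGE) -> int:
    """Smallest max_crawl_pages value expected to cover target_results."""
    if target_results < 1:
        raise ValueError("target_results must be at least 1")
    return math.ceil(target_results / per_page)


def pages_billed(keyword_count: int, target_results: int) -> int:
    """Estimated pages crawled (and billed) for a batch of keywords."""
    return keyword_count * pages_for(target_results)


if __name__ == "__main__":
    print(pages_for(20))          # 2, matching max_crawl_pages=2 for the top ~20
    print(pages_for(25))          # 3, because results 21-25 need a third page crawled
    print(pages_billed(500, 20))  # 1000 pages for 500 keywords at the top ~20
```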

For the Standard method, you also choose how you learn when a task finishes. With a pingback, we send a notification to a URL you specify, and you then fetch the data with task_get. With a postback, we send the final result directly to your endpoint. Both are explained in our pingbacks and postbacks article, and we will use them in the scaling section.
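On the receiving side, a pingback endpoint only has to read the task id and hand it to whatever calls task_get. The sketch below is a minimal standard-library receiver. It assumes the notification arrives as a GET request to the pingback_url you registered, with $id and $tag already replaced by real values (for example /ready?id=$id&tag=$tag); the /ready path, the ready_ids queue, and the function names are hypothetical.

```python
from http.server import BaseHTTPRequestHandler
from queue import Queue
from urllib.parse import parse_qs, urlsplit

TASK_GET_ADVANCED = "/v3/serp/google/organic/task_get/advanced/"

# (task_id, tag) pairs waiting for a worker to call task_get.
ready_ids = Queue()


def parse_pingback(path):
    """Extract id and tag from a pingback request path such as /ready?id=...&tag=..."""
    query = parse_qs(urlsplit(path).query)
    task_id = query.get("id", [None])[0]
    tag = query.get("tag", [None])[0]
    if not task_id:
        raise ValueError("pingback is missing the id parameter")
    return task_id, tag


def task_get_path(task_id):
    """Path for fetching the Advanced result of a finished task."""
    return TASK_GET_ADVANCED + task_id


class PingbackHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        try:
            task_id, tag = parse_pingback(self.path)
        except ValueError:
            self.send_response(400)
            self.end_headers()
            return
        # Acknowledge quickly; the slower task_get call happens in a worker.
        ready_ids.put((task_id, tag))
        self.send_response(200)
        self.end_headers()
```

To run it, serve the handler with http.server.HTTPServer(("", 8080), PingbackHandler).serve_forever() and have a worker thread pop pairs from ready_ids and call client.get(task_get_path(task_id)). Keeping the handler fast and moving task_get to a worker means a burst of notifications does not tie up the endpoint.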

Scraping Google search results with Python — the Live Advanced call

The Live Advanced endpoint is the fastest way to see real data. It takes a single task and returns the SERP in a single response, making it the right starting point before you wrap anything in a loop or queue. The snippet below requests the top results for one keyword in the United States.

from client import RestClient
# RestClient ships in the helper file linked in our docs:
# https://cdn.dataforseo.com/v3/examples/python/python_Client.zip
client = RestClient("login", "password")

post_data = dict()
# live/advanced accepts a single task per call
post_data[len(post_data)] = dict(
    language_code="en",
    location_code=2840,    # United States; pull the full list from serp/google/locations
    keyword="best serp api",
    max_crawl_pages=2,     # crawl 2 SERP pages (~10 organic results per page)
)

response = client.post("/v3/serp/google/organic/live/advanced", post_data)
if response["status_code"] == 20000:
    print(response)
else:
    print("error. Code: %d Message: %s" % (response["status_code"], response["status_message"]))

Endpoint used: serp/google/organic/live/advanced. See the Live Advanced docs. The status_code == 20000 check is the convention worth keeping in production: a 20000 means the call succeeded, and anything else points you to the message you will want to log. Now that the data is in hand, the next job is reading it.
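The same check is worth applying one level down, because each entry in tasks carries its own status_code (visible in the response structure in the next section), and a request can succeed while an individual task fails. A minimal sketch, assuming 20000 marks success at both levels for Live calls; raise_for_status and failed_tasks are hypothetical helper names, and ok_codes lets you widen the accepted set if another endpoint's docs list other success codes.

```python
def raise_for_status(response):
    """Raise if the request as a whole failed (top-level status_code)."""
    if response.get("status_code") != 20000:
        raise RuntimeError(
            "request failed. Code: %s Message: %s"
            % (response.get("status_code"), response.get("status_message"))
        )


def failed_tasks(response, ok_codes=(20000,)):
    """Return (id, status_code, status_message) for each task outside ok_codes."""
    raise_for_status(response)
    failures = []
    for task in response.get("tasks") or []:
        if task.get("status_code") not in ok_codes:
            failures.append(
                (task.get("id"), task.get("status_code"), task.get("status_message"))
            )
    return failures
```

Logging each tuple from failed_tasks(response) gives you the per-task code and message, which is usually more useful for debugging than the top-level status alone.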

Parsing the JSON response — pulling the fields you actually need

A live response nests the SERP a few levels deep. The path is consistent, though, so once you know it, the parsing is short. Each call returns a tasks array; each task holds a result array; each result holds the items array, and items is where the SERP elements live. Here is what the structure looks like, with the repetitive arrays and feature objects collapsed so the shape stays readable:

{
  "version": "0.1.x",
  "status_code": 20000,
  "status_message": "Ok.",
  "tasks_count": 1,
  "tasks": [
    {
      "id": "01011234-...-abcdef",
      "status_code": 20000,
      "result": [
        {
          "keyword": "best serp api",
          "type": "organic",
          "se_domain": "google.com",
          "location_code": 2840,
          "language_code": "en",
          "check_url": "https://www.google.com/search?q=best+serp+api&...",
          "item_types": [ "organic", "people_also_ask", "ai_overview", "..." ],
          "se_results_count": 1040000000,
          "items_count": 17,
          "items": [
            {
              "type": "organic",
              "rank_group": 1,
              "rank_absolute": 1,
              "domain": "example.com",
              "title": "...",
              "url": "https://example.com/...",
              "description": "..."
            },
            { "type": "people_also_ask", "items": [ "..." ] },
            { "type": "ai_overview", "items": [ "..." ] }
          ]
        }
      ]
    }
  ]
}

The items array mixes element types, so in practice, you filter by type to get the elements you care about. The loop below walks down to items and pulls the organic results (rank, title, and URL), which is the core of most rank-tracking and SERP-monitoring features.

if response["status_code"] == 20000:
    # tasks[] -> result[] -> items[] is the standard path to the SERP elements
    task = response["tasks"][0]
    # a task can fail even when the request succeeds, and result/items can be null
    if task["status_code"] == 20000:
        for result in task["result"] or []:
            for item in result["items"] or []:
                # items[] mixes element types - organic, paid, featured_snippet, ai_overview, etc.
                if item["type"] == "organic":
                    print(item["rank_absolute"], item["title"], item["url"])

The rank_absolute field is the one to lean on. It is the position among all elements in the SERP, so it stays accurate even when AI Overviews or featured snippets push the organic results down the page. To monitor a different element, swap the type you filter on; the access pattern does not change. That consistency is what makes the response easy to build on, and it is also what lets you move from one keyword to thousands without rewriting your parser. Which brings us to volume.
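Since only the type changes, the access pattern can live in a few small helpers. The sketch below walks every successful task rather than just tasks[0], returns the elements of any type, and lists which elements rank above the first organic result (the AI Overview case described earlier). It assumes every item carries type and rank_absolute, as the organic items in the sample response do; the helper names are illustrative.

```python
def iter_results(response):
    """Yield each result object from tasks that succeeded."""
    for task in response.get("tasks") or []:
        if task.get("status_code") != 20000:
            continue
        for result in task.get("result") or []:
            yield result


def items_of_type(response, item_type):
    """All SERP elements of one type, e.g. "organic" or "people_also_ask"."""
    return [
        item
        for result in iter_results(response)
        for item in result.get("items") or []
        if item.get("type") == item_type
    ]


def elements_above_first_organic(result):
    """Types of the elements ranked above the first organic result in one SERP.

    Returns every element's type if the SERP has no organic result.
    """
    above = []
    # Assumption: every item has rank_absolute, so sorting gives page order.
    for item in sorted(result.get("items") or [], key=lambda i: i["rank_absolute"]):
        if item.get("type") == "organic":
            break
        above.append(item.get("type"))
    return above
```

For example, looping over iter_results(response) and printing result["keyword"] next to elements_above_first_organic(result) shows something like ['ai_overview'] for keywords where an AI Overview holds the top slot.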

Scaling up — the Standard task-based method

The Live method is convenient, but for large batches, the Standard method is the workhorse. You post tasks, we process them in our queue, and you collect the results when they are ready. Because a single POST can carry up to 100 tasks, you can scrape thousands of keywords without keeping a connection open for each one.

from client import RestClient
client = RestClient("login", "password")

# Step 1 - queue the task (task_post accepts up to 100 tasks per call)
post_data = dict()
post_data[len(post_data)] = dict(
    language_code="en",
    location_code=2840,
    keyword="best serp api",
    max_crawl_pages=2,
    # pingback_url: we send a notification here once the task is ready;
    # $id/$tag are swapped for the real values, then you call task_get to pull the data
    # pingback_url="https://your-domain.com/ready?id=$id&tag=$tag",
    #
    # postback_url: instead of polling, we push the finished result straight here;
    # postback_data sets the delivery format and is required when postback_url is used
    # postback_url="https://your-domain.com/handler?id=$id",
    # postback_data="advanced",
)
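post = client.post("/v3/serp/google/organic/task_post", post_data)
if post["status_code"] != 20000:
    raise SystemExit("error. Code: %d Message: %s" % (post["status_code"], post["status_message"]))
task_id = post["tasks"][0]["id"]

# Step 2 - once the task is ready, pull the Advanced result by its id.
# Run this from your pingback handler or after tasks_ready lists task_id;
# calling it immediately after task_post may return no data yet.
response = client.get("/v3/serp/google/organic/task_get/advanced/" + task_id)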
if response["status_code"] == 20000:
    print(response)
else:
    print("error. Code: %d Message: %s" % (response["status_code"], response["status_message"]))

Endpoints used: serp/google/organic/task_post and serp/google/organic/task_get/advanced (see the docs for each). The commented-out parameters are where pingback and postback come in. With a pingback, we notify you the moment a task is done, so you are not guessing at the delay before step two; you still call task_get to retrieve the data. With a postback, you skip step two entirely: we push the result to your endpoint as soon as it is ready, which is the cleaner fit for an event-driven pipeline. In practice, postback scales better because it removes both the waiting and the extra task_get request from your side of the integration. If neither is set, the tasks_ready endpoint lists the IDs of finished tasks, so you fetch only what is actually available.
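When you rely on tasks_ready instead of callbacks, collection can be a loop that asks which tasks have finished and fetches only those. A minimal sketch, assuming client is the RestClient from the examples above (anything with a get(path) method returning parsed JSON works), that the ready list lives at /v3/serp/google/organic/tasks_ready, and that each entry in its result array carries the task id; collect_ready and wait_for_all are hypothetical helpers, and the 30-second interval is an arbitrary choice.

```python
import time

TASKS_READY = "/v3/serp/google/organic/tasks_ready"
TASK_GET_ADVANCED = "/v3/serp/google/organic/task_get/advanced/"


def collect_ready(client, pending_ids):
    """Fetch Advanced results for pending tasks that tasks_ready reports as done.

    pending_ids is a set; collected ids are removed from it.
    """
    ready = client.get(TASKS_READY)
    if ready.get("status_code") != 20000:
        raise RuntimeError("tasks_ready failed: %s" % ready.get("status_message"))
    finished = []
    for task in ready.get("tasks") or []:
        for entry in task.get("result") or []:
            # The ready list may include tasks queued by other jobs; keep only ours.
            if entry.get("id") in pending_ids:
                finished.append(entry["id"])
    results = {}
    for task_id in finished:
        results[task_id] = client.get(TASK_GET_ADVANCED + task_id)
        pending_ids.discard(task_id)
    return results


def wait_for_all(client, pending_ids, interval_seconds=30.0, max_polls=120, sleep=time.sleep):
    """Poll tasks_ready until every pending task is collected or max_polls is reached."""
    collected = {}
    for _ in range(max_polls):
        collected.update(collect_ready(client, pending_ids))
        if not pending_ids:
            break
        sleep(interval_seconds)
    return collected
```

The max_polls cap keeps a stuck task from looping forever; any ids still in pending_ids afterwards are the ones to investigate or re-post.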

Common mistakes when scraping Google with Python

Most of the trouble teams hit with a new integration falls into a handful of patterns. Here are the ones worth knowing before they cost you a debugging session.

  1. Authentication errors. The most common first-run failure is a malformed Basic auth header or credentials that do not match those on the dashboard. If your calls come back unauthorized, start here. Our guide to authenticating with the SERP API covers the setup and includes working code.
  2. Polling task_get too early. With the Standard method, calling task_get before the task finishes returns no useful data. Use tasks_ready, a pingback, or a postback to know when the data is actually there, rather than retrying on a timer.
  3. Letting parsed markup go stale. When Google changes a SERP layout, a result you already pulled can end up with gaps. You do not have to re-run the whole task. Our automatic re-parse markup feature reprocesses the stored page using the current parser, so you can recover missing fields without paying for a fresh crawl.

  4. Ignoring the rate limits. The ceiling is 2,000 API calls per minute, with up to 100 tasks per Standard POST. Batching tasks into Standard calls, rather than firing one Live request per keyword, is what keeps a high-volume job inside those limits.
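Both limits can be respected on the client side. The sketch below groups keywords into task_post payloads of at most 100 tasks and spaces calls with a sliding-window limiter set to 2,000 calls per minute. build_batches and SlidingWindowLimiter are hypothetical names, the limiter only counts calls made by this process, and the tag field is used here to carry each keyword's index so results can be mapped back (it is the value that fills $tag in a pingback URL).

```python
import time
from collections import deque

MAX_TASKS_PER_POST = 100
MAX_CALLS_PER_MINUTE = 2000


def build_batches(keywords, location_code=2840, language_code="en", max_crawl_pages=2):
    """Split keywords into task_post payloads of at most 100 tasks each."""
    tasks = [
        dict(
            keyword=keyword,
            location_code=location_code,
            language_code=language_code,
            max_crawl_pages=max_crawl_pages,
            tag=str(index),  # index into keywords, used to map results back
        )
        for index, keyword in enumerate(keywords)
    ]
    return [tasks[i:i + MAX_TASKS_PER_POST] for i in range(0, len(tasks), MAX_TASKS_PER_POST)]


class SlidingWindowLimiter:
    """Block until a call fits within max_calls per window_seconds."""

    def __init__(self, max_calls=MAX_CALLS_PER_MINUTE, window_seconds=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of calls inside the current window

    def acquire(self):
        while True:
            now = self.clock()
            # Forget calls that have aged out of the window.
            while self.calls and now - self.calls[0] >= self.window_seconds:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Wait until the oldest call leaves the window, then re-check.
            self.sleep(self.window_seconds - (now - self.calls[0]))
```

A driver loop would call limiter.acquire() before each client.post("/v3/serp/google/organic/task_post", batch). At the stated ceilings, that allows up to 200,000 queued tasks per minute (2,000 calls × 100 tasks).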

Conclusion

Scraping Google search results with Python does not necessarily require owning a scraper. The hard part of the DIY approach is not the first version. It is the maintenance: rewriting the parser every time Google changes the page. By moving the collection layer to our SERP API, you trade that recurring engineering cost for a single integration: HTTP Basic auth, an Advanced endpoint, and structured JSON that stays stable through layout changes. Whether you use the Live method for real-time lookups or the Standard method with pingbacks and postbacks for volume, the access pattern is the same, and your developers stay focused on the product rather than the plumbing.

Learn more about our SERP API.
