Help Center

  • How to find broken (4xx/5xx) pages?

    A broken page is a web page that responds with a 4xx or 5xx status code. Such pages usually show no meaningful content apart from an error message, so they don’t provide the content a website visitor expected to see.

    Once a person hits a broken page, they are unlikely to stay on the website, let alone return to it in the future.

    All in all, broken pages should be fixed. But before that, you need to find them.

    To find broken pages with the DataForSEO On-Page API, you first need to register and get the API key that you will use for authentication.

    Learn more about Authentication in our docs >>

    Once you’re done with that, you can get a list of broken pages in a few simple steps.

    1 Set a task with a POST request to https://api.dataforseo.com/v3/on_page/task_post.

    For example:

    [
      {
        "target": "dataforseo.com",
        "max_crawl_pages": 10,
        "tag": "some_string_123",
        "pingback_url": "https://your-server.com/pingscript?id=$id&tag=$tag"
      }
    ]
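
As a rough illustration, the request above could be prepared in Python along these lines. This is only a sketch under stated assumptions: the helper names `build_task_request` and `first_task_id` are hypothetical, the credentials are placeholders sent via HTTP Basic authentication, and the response is assumed to carry a top-level "tasks" list whose entries include an "id" field. The optional "pingback_url" is left out for brevity.

```python
import base64
import json
import urllib.request

API_URL = "https://api.dataforseo.com/v3/on_page/task_post"


def build_task_request(login, password, target, max_crawl_pages=10, tag=None):
    """Build (but do not send) the POST request that sets an On-Page task."""
    task = {"target": target, "max_crawl_pages": max_crawl_pages}
    if tag is not None:
        task["tag"] = tag
    # Assumption: HTTP Basic auth built from placeholder credentials.
    token = base64.b64encode(f"{login}:{password}".encode()).decode()
    return urllib.request.Request(
        API_URL,
        data=json.dumps([task]).encode("utf-8"),  # the API expects a JSON array of tasks
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def first_task_id(response_body):
    """Return the ID of the first task in a decoded response body.

    Assumption: the body has a top-level "tasks" list whose entries carry "id".
    """
    return response_body["tasks"][0]["id"]
```

To send the request, pass it to `urllib.request.urlopen` and decode the body with `json.load` before calling `first_task_id`.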
    

    2 Use the task’s ID you received in the response from the API server to make a GET request to the Summary endpoint: https://api.dataforseo.com/v3/on_page/summary/$id

    Once you receive a response, go to the “checks” array and find the “is_broken” counter. If this field shows a value greater than 0, our crawler found the indicated number of broken pages on the target website.
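
The check described above might be sketched as follows. The function name `broken_page_count` is hypothetical, and the nesting used here ("tasks", then "result", then "page_metrics", then "checks") is an assumption about the response layout, so adjust the path to match the response you actually receive.

```python
def broken_page_count(summary_body):
    """Return the "is_broken" counter from a decoded Summary response.

    Assumption: the counter is located at
    tasks[0] -> result[0] -> page_metrics -> checks -> is_broken.
    Returns 0 when the result or the counter is missing.
    """
    results = summary_body["tasks"][0].get("result") or []
    if not results:
        return 0
    checks = results[0].get("page_metrics", {}).get("checks", {})
    return checks.get("is_broken", 0)
```

A return value greater than 0 means it is worth requesting the list of broken pages in the next step.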

    3 To extract the list of broken pages, use the task’s ID to make a POST request to the Pages endpoint: https://api.dataforseo.com/v3/on_page/pages. Note that you should apply the filter as in the example below:

    [
      {
        "id": "07281559-0695-0216-0000-c269be8b7592",
        "filters": [
          [
            "resource_type",
            "=",
            "html"
          ],
          "and",
          [
            [
              "checks.is_broken",
              "=",
              true
            ],
            "or",
            [
              "checks.is_4xx",
              "=",
              true
            ],
            "or",
            [
              "checks.is_5xx",
              "=",
              true
            ]
          ]
        ]
      }
    ]
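
If you build the request body in code, the same filter can be written as nested Python lists. The sketch below only assembles the payload shown above; the function name `build_broken_pages_payload` is hypothetical.

```python
import json


def build_broken_pages_payload(task_id):
    """Build the Pages request body that keeps only broken HTML pages."""
    # A page counts as broken if any of the three checks is true.
    broken = [
        ["checks.is_broken", "=", True],
        "or",
        ["checks.is_4xx", "=", True],
        "or",
        ["checks.is_5xx", "=", True],
    ]
    filters = [["resource_type", "=", "html"], "and", broken]
    return [{"id": task_id, "filters": filters}]


# Serialize the payload for the POST request body.
body = json.dumps(build_broken_pages_payload("07281559-0695-0216-0000-c269be8b7592"))
```

Serializing with `json.dumps` turns Python's `True` into JSON `true`, so the body matches the example above.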
    

    Once you receive the results, you can find the URLs of the broken pages in the “url” fields. The “status_code” fields show the status code that each of these pages returned.
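
Reading those two fields out of a decoded response could look roughly like this. The function name `broken_pages` is hypothetical, and the sketch assumes each page sits in an "items" list nested under "tasks" and "result"; only the "url" and "status_code" field names come from the description above.

```python
def broken_pages(pages_body):
    """Collect (url, status_code) pairs from a decoded Pages response.

    Assumption: pages are listed at tasks[] -> result[] -> items[].
    Missing or null levels are treated as empty.
    """
    pages = []
    for task in pages_body.get("tasks") or []:
        for result in task.get("result") or []:
            for item in result.get("items") or []:
                pages.append((item["url"], item["status_code"]))
    return pages
```

Each returned pair can then be logged or exported, for example as one row per broken page in a CSV report.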