DataForSEO

Why is a website not scanned by the On-Page API? Where can I check why it was not crawled?

In some cases, our crawler is unable to scan websites you specify in the target field of the On-Page API Task POST array. This may happen for the following reasons:

Note that you pay only for the pages that were scanned successfully. If a page wasn’t crawled for one of the listed reasons, you will get a refund for it.

However, note that pages returning 4xx and 5xx status codes are still scannable, and we spend resources to crawl them. Therefore, crawled pages that returned 4xx or 5xx errors are not refunded. The same applies when the crawler is blocked by the Cloudflare filter: the API server returns the invalid_page_status_code (4xx) error, and you are charged for scanning one page.

Where to check the reason your website wasn’t crawled

To find out why your website wasn’t crawled, you should call the Summary endpoint using the ID of an unsuccessful task.

GET https://api.dataforseo.com/v3/on_page/summary/$id
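As a minimal sketch, the request above could be sent with the Python standard library as follows. It assumes that the API accepts HTTP Basic authentication with your DataForSEO login and password; the helper names summary_url, build_summary_request and fetch_summary are hypothetical.

```python
import base64
import json
import urllib.request

# Base URL from the endpoint above; the task ID ($id) is appended to it.
SUMMARY_ENDPOINT = "https://api.dataforseo.com/v3/on_page/summary/"


def summary_url(task_id: str) -> str:
    """Build the Summary endpoint URL for a given On-Page task ID."""
    return SUMMARY_ENDPOINT + task_id


def build_summary_request(task_id: str, login: str, password: str) -> urllib.request.Request:
    """Prepare a GET request with an HTTP Basic Authorization header (assumed auth scheme)."""
    token = base64.b64encode(f"{login}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        summary_url(task_id),
        method="GET",
        headers={"Authorization": f"Basic {token}"},
    )


def fetch_summary(task_id: str, login: str, password: str) -> dict:
    """Send the request and decode the JSON response body into a dict."""
    request = build_summary_request(task_id, login, password)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, fetch_summary("YOUR_TASK_ID", "YOUR_LOGIN", "YOUR_PASSWORD") would return the decoded response; the task ID and credentials are placeholders for your own values.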

The response from the API server contains the extended_crawl_status field, which indicates why the crawler was unable to scan a particular website. It can take the following values:

When the bot is blocked by Cloudflare, extended_crawl_status returns the invalid_page_status_code value, just as it does for 4xx and 5xx errors. To confirm that your site wasn’t scanned because of Cloudflare restrictions, check the server field in the API response: it should contain the cloudflare string.
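Both checks can be sketched in Python, operating on the decoded Summary response. This assumes the response nests its data as tasks[0].result[0] and that the server string sits under domain_info; those paths and the helper names are assumptions, while extended_crawl_status, invalid_page_status_code, server and cloudflare come from the description above.

```python
from typing import Any, Optional


def first_result(summary: dict[str, Any]) -> dict[str, Any]:
    """Return the first result object, assuming the layout tasks[0].result[0]."""
    tasks = summary.get("tasks") or []
    if not tasks:
        return {}
    results = tasks[0].get("result") or []
    return results[0] if results else {}


def crawl_failure_reason(summary: dict[str, Any]) -> Optional[str]:
    """Read extended_crawl_status, or None if the field is absent."""
    return first_result(summary).get("extended_crawl_status")


def blocked_by_cloudflare(summary: dict[str, Any]) -> bool:
    """True when the status is invalid_page_status_code and the server mentions Cloudflare."""
    result = first_result(summary)
    # Assumption: the server string is stored under domain_info.server.
    server = (result.get("domain_info") or {}).get("server") or ""
    return (
        result.get("extended_crawl_status") == "invalid_page_status_code"
        and "cloudflare" in server.lower()
    )
```

Checking the server string together with the status code matters because invalid_page_status_code alone cannot distinguish a Cloudflare block from an ordinary 4xx or 5xx response.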
