How to get the HTML of a page?
HTML data can help you build advanced website audit solutions.
To receive the raw HTML code of a page with DataForSEO OnPage API, add the store_raw_html field to the On-Page API Task POST body and set it to true.
Example:
[ { "target": "dataforseo.com", "max_crawl_pages": 10, "store_raw_html": "true" } ]
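The same Task POST request could be sent from Python roughly as follows. This is a minimal sketch, not an official client: the endpoint URL, the HTTP Basic authentication, and the helper names (build_task_payload, build_request, post_task) are assumptions for illustration, and the store_raw_html value simply mirrors the example above.

```python
import base64
import json
import urllib.request

# Assumed URL of the On-Page API Task POST endpoint; verify it against the API docs.
TASK_POST_URL = "https://api.dataforseo.com/v3/on_page/task_post"


def build_task_payload(target, max_crawl_pages=10):
    """Return a Task POST body that asks the crawler to store raw HTML."""
    return [{
        "target": target,
        "max_crawl_pages": max_crawl_pages,
        "store_raw_html": "true",  # value copied from the example above
    }]


def build_request(url, payload, login, password):
    """Build a JSON POST request with HTTP Basic auth (assumed auth scheme)."""
    token = base64.b64encode(f"{login}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def post_task(target, login, password):
    """Create the crawl task and return its ID. Performs a network call."""
    request = build_request(TASK_POST_URL, build_task_payload(target), login, password)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumes the task ID sits at tasks[0].id, as in the response shown in this article.
    return body["tasks"][0]["id"]
```

Calling post_task("dataforseo.com", login, password) with your own credentials would return the task ID to use in the next step.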
Once the task is created, copy its ID and call the Raw HTML endpoint.
In the request body, specify the ID of your task and the URL of the page whose HTML you want to receive.
Example:
[ { "id": "09161530-2692-0216-0000-e429a13680de", "url": "https://dataforseo.com/apis" } ]
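As a rough illustration, the Raw HTML request could be built in Python like this. The path v3/on_page/raw_html comes from the "path" field of the response in this article; the host name, the HTTP Basic authentication, and the function names are assumptions, so treat this as a sketch rather than a reference implementation.

```python
import base64
import json
import urllib.request

# Path taken from the response's "path" field; the host is an assumption.
RAW_HTML_URL = "https://api.dataforseo.com/v3/on_page/raw_html"


def build_raw_html_payload(task_id, page_url):
    """Return the request body: the task ID plus the URL of the page to fetch."""
    return [{"id": task_id, "url": page_url}]


def build_raw_html_request(task_id, page_url, login, password):
    """Build a JSON POST request with HTTP Basic auth (assumed auth scheme)."""
    token = base64.b64encode(f"{login}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        RAW_HTML_URL,
        data=json.dumps(build_raw_html_payload(task_id, page_url)).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_raw_html_response(task_id, page_url, login, password):
    """Send the request and return the decoded JSON response. Performs a network call."""
    request = build_raw_html_request(task_id, page_url, login, password)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```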
Once the crawl is finished, the Raw HTML endpoint will provide you with HTML data of the page you specified in the url field.
{
  "version": "0.1.20210917",
  "status_code": 20000,
  "status_message": "Ok.",
  "time": "0.1204 sec.",
  "cost": 0,
  "tasks_count": 1,
  "tasks_error": 0,
  "tasks": [
    {
      "id": "10051429-2806-0216-0000-74f7a98536e8",
      "status_code": 20000,
      "status_message": "Ok.",
      "time": "0.0581 sec.",
      "cost": 0,
      "result_count": 1,
      "path": [ "v3", "on_page", "raw_html" ],
      "data": {
        "api": "on_page",
        "function": "raw_html",
        "url": "https://dataforseo.com/apis/serp-api",
        "target": "dataforseo.com",
        "max_crawl_pages": 10,
        "store_raw_html": "true"
      },
      "result": [
        {
          "crawl_progress": "finished",
          "crawl_status": {
            "max_crawl_pages": 10,
            "pages_in_queue": 0,
            "pages_crawled": 10
          },
          "items_count": 1,
          "items": {
            "html": "<!DOCTYPE html></html>"
          }
        }
      ]
    }
  ]
}
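To pull the HTML out of such a response, a small helper along these lines may be enough. It assumes the response has already been decoded into a Python dict and follows the field layout shown above; extract_html is a hypothetical name, and treating 20000 as the only success code is inferred from this single example.

```python
def extract_html(response):
    """Return the page HTML, or None if the crawl has not finished yet."""
    # 20000 is the status code shown in the example response ("Ok.").
    if response.get("status_code") != 20000:
        raise RuntimeError(f"Request failed: {response.get('status_message')}")

    tasks = response.get("tasks") or []
    if not tasks:
        raise RuntimeError("Response contains no tasks")

    task = tasks[0]
    if task.get("status_code") != 20000:
        raise RuntimeError(f"Task failed: {task.get('status_message')}")

    results = task.get("result") or []
    if not results:
        return None

    result = results[0]
    # HTML is only expected once crawl_progress reports "finished".
    if result.get("crawl_progress") != "finished":
        return None

    items = result.get("items") or {}
    return items.get("html")
```

If the function returns None, the crawl is probably still running, and the Raw HTML request can be repeated later.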