You know that feeling when you spend a significant amount of time and money to build an outstanding website, but then realize the volume of traffic falls short of expectations?
Or when competitors outrank the seemingly perfect website even though it has higher authority and better content?
The answer to both questions lies right on the surface: on-page SEO is one of the most important, but often neglected, ways to achieve better rankings. According to the SEMrush study of ranking factors, on-page parameters are something we all need to pay close attention to in 2018.
Admittedly, there are dozens of useful tools that check whether a website is optimized and follows the industry’s best practices. But what happens when you have a few dozen customer sites and want to build a full profile of the on-page factors influencing their rankings?
DataForSEO is here to help. Our On-Page API provides structured SEO data that has helped hundreds of agencies and software providers build custom tools for running thorough website audits.
On-page API. How does it work?
With On-Page API you get a wide variety of on-page data, which you can use to eliminate hidden website errors and, consequently, boost rankings.
You can also set up filtration parameters to crawl only those pages that match them. For example, it would be wise first to get broken pages, and then check if there are any links pointing at them. What’s more, you can specify the start and end points of crawling, along with a crawling depth.
All our APIs are REST-based, meaning that data exchange is made via the HTTP protocol. The response is structured in JSON or XML format.
If you want to know more about how APIs work, check our blog post Introduction to APIs for SEO Software.
So, how exactly can the DataForSEO On-Page API help to improve website health? What parameters are involved, and what results do you end up with? In the following paragraphs, we will answer these questions and walk through a few examples to help you figure out the best way to use the On-Page API.
Example #1. Identify links pointing at broken pages
Let’s try to find broken pages on the rankactive.com website. In the following example, we’ll be using On-Page API and Postman to manage API calls and get the necessary data. Our ready-made Postman examples make it a lot easier to make calls and receive responses.
1 First things first, we need to find out if there are any broken pages on the website. To do that, you need to specify the domain name (or a certain page from which the crawling process will start), the crawling depth, and the number of pages you want to crawl. You can also specify custom robots.txt settings and run the crawling process in the “merge” mode. Alternatively, you can change the robots mode to “override” if you want our crawler to ignore the robots.txt settings. This option will stand you in good stead if you can’t access the robots.txt settings of the website you are about to crawl, or if crawling of the site is blocked (according to some research, 22% of websites are using robots.txt to hide content from crawlers).
Visit the Docs to learn more about the additional properties of the crawler.
Request Sample { "data": [ { "site":"rankactive.com", "crawl_max_pages":1000, "robots_mode":"override" } ] } Response Sample { "status": "ok", "results_time": "0.1029 sec.", "results_count": 1, "results": { "104574": { "post_id": 636555129, "post_site": "rankactive.com/blog", "task_id": 682051347, "status": "ok" } } }
Don’t forget to copy the unique task identifier (task_id) from the response. You’ll need it later to get the results.
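As a rough illustration of this first step, the Python sketch below builds the same request body as the sample and pulls the task_id values out of a parsed response. The helper names build_crawl_task and extract_task_ids are hypothetical, and no HTTP call is made: the endpoint and authentication details should be taken from the documentation.

```python
import json


def build_crawl_task(site, crawl_max_pages=1000, robots_mode="override"):
    """Return the JSON body for a crawl task, mirroring the request sample."""
    task = {
        "site": site,
        "crawl_max_pages": crawl_max_pages,
        "robots_mode": robots_mode,
    }
    return json.dumps({"data": [task]})


def extract_task_ids(response):
    """Collect task_id values from a parsed task-creation response.

    In the response sample, "results" is an object keyed by an
    identifier, so we iterate over its values rather than a list.
    """
    if response.get("status") != "ok":
        raise ValueError("unexpected response status: %r" % response.get("status"))
    return [
        item["task_id"]
        for item in response.get("results", {}).values()
        if item.get("status") == "ok"
    ]


# The response sample from above, parsed into a dict.
sample_response = {
    "status": "ok",
    "results_time": "0.1029 sec.",
    "results_count": 1,
    "results": {
        "104574": {
            "post_id": 636555129,
            "post_site": "rankactive.com/blog",
            "task_id": 682051347,
            "status": "ok",
        }
    },
}

body = build_crawl_task("rankactive.com")
task_ids = extract_task_ids(sample_response)
print(body)
print(task_ids)
```

Storing the returned identifiers right away saves you from re-posting the task later just to look them up.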
2 The processing time depends on the number of pages sent for crawling. It normally takes no more than a couple of hours to crawl a website for more than 60 on-page parameters. You can set up the pingback_url option to have the completed task result sent to a particular URL, or send a GET request manually.
Request Sample https://api.dataforseo.com/v2/op_tasks_get/682051347 Response Sample { "status": "ok", "results_time": "0.2136 sec.", "results_count": 1, "results": [ { "post_id": "0", "post_site": "rankactive.com ", "task_id": 636555129, "string_search_containment": "", "crawl_max_pages": 1000, "crawl_start": "2018-05-23 16:30:23.038056+03", "crawl_end": "2018-05-23 17:37:05.943334+03", "status": "crawled", "summary": [ { "absent_doctype": 0, "absent_encoding_meta_tag": 0, "absent_h1_tags": 11, "canonical_another": 8, "canonical_recursive": 0, "cms": "wordpress 4.9.5", "compression_disabled": 0, "content_invalid_rate": 127, "content_invalid_size": 4, "content_readability_bad": 13, "crawl_end": "2018-05-23T14:36:55.744+00:00", "crawl_start": "2018-05-23T13:30:23.199+00:00", "deprecated_html_tags": 8, "domain": "rankactive.com", "duplicate_meta_descriptions": 0, "duplicate_meta_tags": 0, "duplicate_titles": 12, "favicon_invalid": 0, "have_robots": true, "have_sitemap": true, "images_invalid_alt": 22, "images_invalid_title": 204, "ip": "104.24.96.1", "links_broken": 2, "links_external": 4235, "links_internal": 8697, "meta_description_empty": 164, "meta_description_inappropriate": 1, "meta_keywords_empty": 204, "meta_keywords_inappropriate": 0, "pages_broken": 1, "pages_http": 0, "pages_https": 217, "pages_invalid_size": 0, "pages_non_www": 217, "pages_total": 217, "pages_with_flash": 0, "pages_with_frame": 52, "pages_with_lorem_ipsum": 0, "pages_www": 0, "response_code_1xx": 0, "response_code_2xx": 212, "response_code_3xx": 4, "response_code_4xx": 1, "response_code_5xx": 0, "response_code_other": 0, "seo_friendly_url": 203, "seo_non_friendly_url": 1, "server": "cloudflare", "ssl": true, "ssl_certificate_expiration": "2018-11-27T23:59:59+00:00", "ssl_certificate_hash_algorithm": "sha256ECDSA", "ssl_certificate_issuer": "CN=COMODO ECC Domain Validation Secure Server CA 2, O=COMODO CA Limited, L=Salford, S=Greater Manchester, C=GB", "ssl_certificate_subject": 
"CN=sni163789.cloudflaressl.com, OU=PositiveSSL Multi-Domain, OU=Domain Control Validated", "ssl_certificate_valid": true, "ssl_certificate_x509_version": 3, "string_containment_check": 0, "test_canonicalization": 403, "test_directory_browsing": false, "test_server_signature": false, "test_trash_page": 404, "time_load_high": 0, "time_waiting_high": 17, "title_duplicate_tag": 0, "title_empty": 0, "title_inappropriate": 3, "title_long": 75, "title_short": 1, "www": false } ] } ] }
The response above indicates there’s only one broken page with a 404 code.
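To show how such a summary might be read programmatically, here is a minimal Python sketch. It builds the results URL in the same form as the request sample and keeps only the counters that are above zero. The function names and the WATCHED list are assumptions chosen for illustration, and the sample dict is a trimmed copy of the response above.

```python
API_BASE = "https://api.dataforseo.com/v2"


def task_summary_url(task_id):
    """Build the results URL in the same form as the request sample."""
    return f"{API_BASE}/op_tasks_get/{task_id}"


def nonzero_issue_counts(response, keys):
    """Return {key: count} for the watched summary counters above zero."""
    summary = response["results"][0]["summary"][0]
    return {key: summary[key] for key in keys if summary.get(key, 0) > 0}


# Counters we care about in this walkthrough (a hypothetical selection).
WATCHED = (
    "pages_broken",
    "links_broken",
    "response_code_4xx",
    "response_code_5xx",
    "duplicate_titles",
    "duplicate_meta_descriptions",
)

# Trimmed copy of the response sample above.
summary_response = {
    "status": "ok",
    "results": [
        {
            "status": "crawled",
            "summary": [
                {
                    "pages_broken": 1,
                    "links_broken": 2,
                    "response_code_4xx": 1,
                    "response_code_5xx": 0,
                    "duplicate_titles": 12,
                    "duplicate_meta_descriptions": 0,
                }
            ],
        }
    ],
}

url = task_summary_url(682051347)
issues = nonzero_issue_counts(summary_response, WATCHED)
print(url)
print(issues)
```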
3 Now we can finally find out which page is broken and get its URL. In the response array, you’ll see not only the broken page’s web address but also other critical on-page parameters, including canonicals, meta tags, the number of referring links, etc.
Request Sample https://api.dataforseo.com/v2/op_tasks_get_broken_pages/682051347 Response Sample { "status": "ok", "results_time": "0.0747 sec.", "results_count": 5, "results": [ { "post_id": "0", "post_site": "rankactive.com ", "task_id": 636555129, "string_search_containment": "", "crawl_max_pages": 1000, "crawl_start": "2018-05-23 16:30:23.038056+03", "crawl_end": "2018-05-23 17:37:05.943334+03", "status": "crawled", "broken_pages": [ { "address_full": "https://rankactive.com/advert", "address_relative": "/advert", "canonical_another": false, "canonical_page": null, "canonical_page_recursive": "", "content_charset": 0, "content_count_words": 0, "content_encoding": "none", "content_readability_ari": 0, "content_readability_coleman_liau": 0, "content_readability_dale_chall": 0, "content_readability_flesh_kincaid": 0, "content_readability_smog": 0, "crawl_depth": 2, "crawl_end": "2018-05-23T14:00:47+00:00", "crawled": true, "deprecated_html_tags": [], "duplicate_meta_tags": [], "favicon": "", "h1_count": 0, "h2_count": 0, "h3_count": 0, "have_deprecated_tags": false, "have_doctype": false, "have_enc_meta_tag": false, "have_flash": false, "have_frame": false, "have_lorem_ipsum": false, "have_meta_description_duplicates": false, "have_page_duplicates": false, "have_recursive_canonical": false, "have_title_duplicates": false, "images_count": 0, "images_invalid_alt": 0, "images_invalid_title": 0, "links_broken": 0, "links_external": 0, "links_internal": 0, "links_referring": 2, "meta_description": null, "meta_description_consistency": -1, "meta_description_length": 0, "meta_keywords": "", "meta_keywords_consistency": -1, "page_allowed": true, "page_redirect": null, "page_size": 0, "plain_text_rate": 0, "plain_text_size": 0, "response_code": 404, "seo_friendly_url": false, "seo_friendly_url_characters_check": false, "seo_friendly_url_dynamic_check": false, "seo_friendly_url_keywords_check": false, "seo_friendly_url_relative_length_check": false, "ssl": true, 
"ssl_handshake_time": 6, "string_containment_check": false, "time_connection": 5, "time_download": 0, "time_sending_request": 0, "time_total_load": 153, "time_waiting": 142, "title": null, "title_consistency": -1, "title_duplicate_tag": false, "title_length": 0, "www": false }, ...
As you might’ve noticed, there are only two links referring to the broken page. Now it would be useful to know the URLs of the pages they come from.
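If you process this response in code, a small helper can list the broken pages that still have incoming links, so you know which ones to investigate first. The Python sketch below is one possible way to do it. The function name is hypothetical, and the sample dict is a trimmed copy of the response above.

```python
def broken_pages_with_referrers(response):
    """Return (relative URL, response code, referring links) tuples,
    sorted so the pages with the most referring links come first."""
    pages = []
    for result in response.get("results", []):
        for page in result.get("broken_pages", []):
            if page.get("links_referring", 0) > 0:
                pages.append(
                    (
                        page["address_relative"],
                        page["response_code"],
                        page["links_referring"],
                    )
                )
    return sorted(pages, key=lambda item: item[2], reverse=True)


# Trimmed copy of the broken-pages response sample.
broken_response = {
    "status": "ok",
    "results": [
        {
            "status": "crawled",
            "broken_pages": [
                {
                    "address_full": "https://rankactive.com/advert",
                    "address_relative": "/advert",
                    "links_referring": 2,
                    "response_code": 404,
                }
            ],
        }
    ],
}

to_fix = broken_pages_with_referrers(broken_response)
print(to_fix)
```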
4 We used the Get Links To Page command to get the pages referring to the broken URL. In the results enclosed below, we can see that it’s essentially a single page: the first referring URL is the blog post, and the second is its snippet on the tenth page of the blog.
Request Sample https://api.dataforseo.com/v2/op_tasks_get_links_to/682051347/'/advert' Response Sample { "status": "ok", "results_time": "0.0751 sec.", "results_count": 2, "results": [ { "post_id": "0", "post_site": "rankactive.com", "task_id": 682051347, "string_search_containment": "", "crawl_max_pages": 1000, "crawl_start": "2018-05-30 11:35:47.695667+03", "crawl_end": "2018-05-30 12:44:19.762859+03", "status": "crawled", "links_to": [ { "alt": null, "anchor": "wish", "link_from": "https://rankactive.com/blog/affiliate-program-launched-earn-up-to-25-from-all-referral-payments", "link_to": "https://rankactive.com/advert", "nofollow": false, "page_from": "/blog/affiliate-program-launched-earn-up-to-25-from-all-referral-payments", "page_to": "/advert", "relative": true, "ssl_from_use": true, "ssl_to_use": true, "state": "broken", "text_post": "(social networks, blogs, forums etc).", "text_pre": "promo-codes and URLs. You can distribute them wherever you", "type": "href", "www_from_use": false, "www_to_use": false }, { "alt": null, "anchor": "wish", "link_from": "https://rankactive.com/blog/page/10", "link_to": "https://rankactive.com/advert", "nofollow": false, "page_from": "/blog/page/10", "page_to": "/advert", "relative": true, "ssl_from_use": true, "ssl_to_use": true, "state": "broken", "text_post": "(social networks, blogs, forums etc).", "text_pre": "promo-codes and URLs. You can distribute them wherever you", "type": "href", "www_from_use": false, "www_to_use": false } ] } ] }
Now that we’ve found all 404 pages and identified which URLs are pointing at them, it won’t be difficult to fix the broken links and eliminate the troubled page.
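The last step of this example can be sketched in Python as follows. The URL builder mirrors the request sample, including the single quotes around the page path (whether they are strictly required is not stated here, so treat that as an assumption). The helper names are hypothetical, and the sample dict is a trimmed copy of the response above.

```python
API_BASE = "https://api.dataforseo.com/v2"


def links_to_url(task_id, relative_path):
    """Build the request URL in the form shown in the request sample."""
    return f"{API_BASE}/op_tasks_get_links_to/{task_id}/'{relative_path}'"


def referring_pages(response, state="broken"):
    """Return the unique pages that link to the target with the given state."""
    pages = set()
    for result in response.get("results", []):
        for link in result.get("links_to", []):
            if link.get("state") == state:
                pages.add(link["page_from"])
    return sorted(pages)


# Trimmed copy of the links_to response sample.
links_response = {
    "status": "ok",
    "results": [
        {
            "links_to": [
                {
                    "anchor": "wish",
                    "page_from": "/blog/affiliate-program-launched-earn-up-to-25-from-all-referral-payments",
                    "page_to": "/advert",
                    "state": "broken",
                    "type": "href",
                },
                {
                    "anchor": "wish",
                    "page_from": "/blog/page/10",
                    "page_to": "/advert",
                    "state": "broken",
                    "type": "href",
                },
            ]
        }
    ],
}

request_url = links_to_url(682051347, "/advert")
sources = referring_pages(links_response)
print(request_url)
print(sources)
```

Deduplicating by page_from gives you a short to-do list of pages to edit, even when a page contains several links to the same broken URL.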
Example #2. Identify duplicate pages
According to research by Raven Tools, 29% of pages on the web have duplicate content. While there’s no evidence of Google penalizing websites over duplicate content, it may (and in some cases certainly will) affect rankings.
In fact, the term “duplicate content” can mean anything from a quote you took from a research paper to a product description copied from the manufacturer’s website. Or even a title you borrowed from one of your own website’s pages.
Our On-Page API helps to find the most obvious, but nonetheless the most critical, duplicates: those that are very likely to appear on the website. The DataForSEO crawler compares all website pages against each other and determines duplicate content based on three main parameters: “title,” “description,” and “page.” Note that the system uses the “title” parameter by default.
Let’s find out what duplicate pages On-Page API can identify on our example website and then figure out possible ways to fix them.
1 First of all, we need to find out which pages of the website have duplicates. Our system lets you filter pages by specified parameters, such as meta tags, broken links, 404 pages, etc. Thanks to that option, we can easily find pages that have duplicates.
Request Sample { "data": [ { "task_id":682051347, "filters": [ ["have_page_duplicates","=",true] ] } ] } Response Sample { "status": "ok", "results_time": "0.0988 sec.", "results_count": 59, "results": [ { "post_id": "0", "post_site": "rankactive.com", "task_id": 682051347, "string_search_containment": "", "crawl_max_pages": 1000, "crawl_start": "2018-05-30 11:35:47.695667+03", "crawl_end": "2018-05-30 12:44:19.762859+03", "status": "crawled", "pages": [ { "address_full": "https://rankactive.com/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-pages", "address_relative": "/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-pages", "canonical_another": false, "canonical_page": "/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-pages", "canonical_page_recursive": "", "content_charset": 65001, "content_count_words": 94, "content_encoding": "gzip", "content_readability_ari": 8.47098351, "content_readability_coleman_liau": 10.0685244, "content_readability_dale_chall": 7.22771168, "content_readability_flesh_kincaid": 55.829937, "content_readability_smog": 17.410965, "crawl_depth": 2, "crawl_end": "2018-05-30T08:45:45+00:00", "crawled": true, "deprecated_html_tags": [], "duplicate_meta_tags": [], "favicon": "/wp-content/uploads/2018/03/cropped-favicon-180x180.png", "h1_count": 2, "h2_count": 0, "h3_count": 1, "have_deprecated_tags": false, "have_doctype": true, "have_enc_meta_tag": true, "have_flash": false, "have_frame": true, "have_lorem_ipsum": false, "have_meta_description_duplicates": false, "have_page_duplicates": true, "have_recursive_canonical": false, "have_title_duplicates": false, "images_count": 2, "images_invalid_alt": 0, "images_invalid_title": 2, "links_broken": 0, "links_external": 19, "links_internal": 40, "links_referring": 6, "meta_description": "", "meta_description_consistency": -1, "meta_description_length": 0, "meta_keywords": "", "meta_keywords_consistency": -1, "page_allowed": true, "page_redirect": null, 
"page_size": 47072, "plain_text_rate": 0.0290896725, "plain_text_size": 1368, "response_code": 200, "seo_friendly_url": true, "seo_friendly_url_characters_check": true, "seo_friendly_url_dynamic_check": true, "seo_friendly_url_keywords_check": true, "seo_friendly_url_relative_length_check": true, "ssl": true, "ssl_handshake_time": 7, "string_containment_check": false, "time_connection": 12, "time_download": 0, "time_sending_request": 0, "time_total_load": 496, "time_waiting": 477, "title": "How to set a trigger to be notified on changes of ranked pages | RankActive All in One SEO Platform", "title_consistency": 0.8888889, "title_duplicate_tag": false, "title_length": 99, "www": false }, ...
If you take a look at the response example above, you’ll see a page that has been flagged as having a duplicate (have_page_duplicates is true). However, we have yet to find out which page it duplicates.
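For illustration, the filter request from this step could be assembled in Python roughly like this. Each filter is a (field, operator, value) triple, as in the request sample. The helper names are hypothetical, the endpoint is deliberately left out, and the sample dict is a trimmed copy of the response above.

```python
import json


def build_filter_request(task_id, *filters):
    """Return the JSON body for a filtered pages request.

    Each filter is a (field, operator, value) triple, matching the
    structure of the request sample.
    """
    payload = {
        "data": [
            {
                "task_id": task_id,
                "filters": [list(triple) for triple in filters],
            }
        ]
    }
    return json.dumps(payload)


def flagged_pages(response):
    """Return the relative URLs of all pages in a filtered response."""
    return [
        page["address_relative"]
        for result in response.get("results", [])
        for page in result.get("pages", [])
    ]


# Trimmed copy of the filtered response sample.
filtered_response = {
    "status": "ok",
    "results": [
        {
            "pages": [
                {
                    "address_relative": "/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-pages",
                    "have_page_duplicates": True,
                }
            ]
        }
    ],
}

filter_body = build_filter_request(682051347, ("have_page_duplicates", "=", True))
flagged = flagged_pages(filtered_response)
print(filter_body)
print(flagged)
```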
2 That’s exactly the case when the Get Duplicate Pages option of the On-Page API comes in handy. To find duplicates of a page, specify its relative URL along with a task ID.
Request Sample https://api.dataforseo.com/v2/op_tasks_get_duplicates/682051347/'/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-pages' Response Sample { "status": "ok", "results_time": "0.1231 sec.", "results_count": 1, "results": [ { "post_id": "0", "post_site": "rankactive.com", "task_id": 682051347, "string_search_containment": "", "crawl_max_pages": 1000, "crawl_start": "2018-05-30 11:35:47.695667+03", "crawl_end": "2018-05-30 12:44:19.762859+03", "status": "crawled", "duplicates": [ { "accumulator": "/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-pages", "pages": [ { "accumulator": "/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-pages", "address_full": "https://rankactive.com/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-keywords", "address_relative": "/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-keywords", "title": "How to set a trigger to be notified on changes of ranked keywords | RankActive All in One SEO Platform" } ] } ] } ] } ...
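A response like the one above can be turned into a simple mapping from each checked page to its duplicates. The Python sketch below assumes the structure of the sample (an "accumulator" naming the checked page and a "pages" list holding its duplicates). The function name is hypothetical, and the sample dict is a trimmed copy of the response.

```python
def duplicate_groups(response):
    """Map each checked page (the accumulator) to its duplicate pages."""
    groups = {}
    for result in response.get("results", []):
        for group in result.get("duplicates", []):
            checked_page = group["accumulator"]
            copies = [page["address_relative"] for page in group.get("pages", [])]
            groups.setdefault(checked_page, []).extend(copies)
    return groups


# Trimmed copy of the duplicates response sample.
duplicates_response = {
    "status": "ok",
    "results": [
        {
            "duplicates": [
                {
                    "accumulator": "/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-pages",
                    "pages": [
                        {
                            "accumulator": "/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-pages",
                            "address_relative": "/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-keywords",
                        }
                    ],
                }
            ]
        }
    ],
}

groups = duplicate_groups(duplicates_response)
for checked_page, copies in groups.items():
    print(checked_page, "->", copies)
```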
3 Now that we have two pages with similar content, we need to address the issue and decide which of them has the lower priority. The decision is based on the amount of traffic each of the pages receives. Ultimately, there are only two solutions: put a canonical tag on the low-priority page, or remove it. While the first one is pretty obvious, the latter requires the removal of all links pointing at the “problematic” page. You can easily find referring links with the Get Links To Page option of On-Page API.
Request Sample https://api.dataforseo.com/v2/op_tasks_get_links_to/682051347/'/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-keywords' Response Sample { "status": "ok", "results_time": "0.1108 sec.", "results_count": 6, "results": [ { "post_id": "0", "post_site": "rankactive.com", "task_id": 682051347, "string_search_containment": "", "crawl_max_pages": 1000, "crawl_start": "2018-05-30 11:35:47.695667+03", "crawl_end": "2018-05-30 12:44:19.762859+03", "status": "crawled", "links_to": [ { "alt": null, "anchor": "Leave a comment", "link_from": "https://rankactive.com/blog/category/video-tutorials", "link_to": "https://rankactive.com/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-keywords", "nofollow": false, "page_from": "/blog/category/video-tutorials", "page_to": "/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-keywords", "relative": true, "ssl_from_use": true, "ssl_to_use": true, "state": "alive", "text_post": "", "text_pre": "", "type": "href", "www_from_use": false, "www_to_use": false }, { "alt": null, "anchor": null, "link_from": "https://rankactive.com/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-keywords", "link_to": "https://rankactive.com/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-keywords", "nofollow": false, "page_from": "/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-keywords", "page_to": "/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-keywords", "relative": true, "ssl_from_use": true, "ssl_to_use": true, "state": "alive", "text_post": null, "text_pre": null, "type": "canonical", "www_from_use": false, "www_to_use": false }, { "alt": null, "anchor": "Leave a comment", "link_from": "https://rankactive.com/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-keywords", "link_to": "https://rankactive.com/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-keywords", "nofollow": false, "page_from": 
"/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-keywords", "page_to": "/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-keywords", "relative": true, "ssl_from_use": true, "ssl_to_use": true, "state": "alive", "text_post": "", "text_pre": "", "type": "href", "www_from_use": false, "www_to_use": false }, { "alt": null, "anchor": "", "link_from": "https://rankactive.com/blog/page/4", "link_to": "https://rankactive.com/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-keywords", "nofollow": false, "page_from": "/blog/page/4", "page_to": "/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-keywords", "relative": true, "ssl_from_use": true, "ssl_to_use": true, "state": "alive", "text_post": "", "text_pre": "", "type": "href", "www_from_use": false, "www_to_use": false }, { "alt": null, "anchor": "28 February 2017", "link_from": "https://rankactive.com/blog/page/4", "link_to": "https://rankactive.com/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-keywords", "nofollow": false, "page_from": "/blog/page/4", "page_to": "/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-keywords", "relative": true, "ssl_from_use": true, "ssl_to_use": true, "state": "alive", "text_post": "", "text_pre": "", "type": "href", "www_from_use": false, "www_to_use": false }, { "alt": null, "anchor": "Leave a comment", "link_from": "https://rankactive.com/blog/page/4", "link_to": "https://rankactive.com/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-keywords", "nofollow": false, "page_from": "/blog/page/4", "page_to": "/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-keywords", "relative": true, "ssl_from_use": true, "ssl_to_use": true, "state": "alive", "text_post": "", "text_pre": "", "type": "href", "www_from_use": false, "www_to_use": false } ] } ] } ...
Once we’ve found the links pointing to the duplicate page, we can quickly remove them and erase the page without leaving a single 404 URL.
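Not every entry in the response needs manual cleanup: the page’s own canonical tag and its links to itself disappear together with the page. The Python sketch below keeps only the href links coming from other pages. This rule is our reading of the sample rather than a documented behavior, the function name is hypothetical, and the sample list is a trimmed copy of the links_to array above.

```python
def links_to_clean_up(links_to):
    """Return (page_from, anchor) pairs for href links from other pages."""
    cleanup = []
    for link in links_to:
        if link["type"] != "href":
            continue  # skip non-href entries such as the canonical tag
        if link["page_from"] == link["page_to"]:
            continue  # self-links vanish when the page is removed
        cleanup.append((link["page_from"], link["anchor"]))
    return cleanup


TARGET = "/blog/how-to-set-a-trigger-to-be-notified-on-changes-of-ranked-keywords"

# Trimmed copy of the six links from the response sample.
links_sample = [
    {"page_from": "/blog/category/video-tutorials", "page_to": TARGET,
     "type": "href", "anchor": "Leave a comment"},
    {"page_from": TARGET, "page_to": TARGET,
     "type": "canonical", "anchor": None},
    {"page_from": TARGET, "page_to": TARGET,
     "type": "href", "anchor": "Leave a comment"},
    {"page_from": "/blog/page/4", "page_to": TARGET,
     "type": "href", "anchor": ""},
    {"page_from": "/blog/page/4", "page_to": TARGET,
     "type": "href", "anchor": "28 February 2017"},
    {"page_from": "/blog/page/4", "page_to": TARGET,
     "type": "href", "anchor": "Leave a comment"},
]

cleanup = links_to_clean_up(links_sample)
pages_to_edit = sorted({page for page, _ in cleanup})
print(len(cleanup))
print(pages_to_edit)
```

In this sample, four links on two pages remain to be removed before the duplicate can be deleted safely.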
Final Thoughts
On-Page API has everything you need to build a robust tool for website analysis, which you can use yourself or sell to your clients. With dozens of available on-page parameters, extensive documentation, and affordable pricing, DataForSEO is the best API solution you’ll ever come across.
What’s more, our friendly 24/7 customer support team is ready to assist you with API integration.