
What is the difference between primary and secondary content in OnPage Content Parsing API?

The Content Parsing endpoint of OnPage API lets you get structured content data from any page on the web.

If you have read the documentation for this endpoint, you may have noticed that the parsed content is divided into “primary” and “secondary.” Each page contains "primary_content" and "secondary_content" sections, and the content of each section is in turn broken into "primary_content" and "secondary_content" arrays.

In this Help Center article, we will walk through the algorithm DataForSEO uses to determine whether a given piece of content is “primary” or “secondary” on the page.

First, every piece of content is assigned a score based on two criteria, which are checked for each word:

  1. Length of the word. Does it contain more than one character?
  2. The composition of the word. Do letters constitute more than half of the content?

In a nutshell, the number of words that match the above criteria determines the score of the content. The score for anchors is calculated following the same logic.
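
The word-level check can be sketched in a few lines of Python. This is only an illustration and not DataForSEO's actual implementation: the function names are hypothetical, and it assumes that words are separated by whitespace, that "letters" means alphabetic characters, that the "more than half" rule applies to each word, and that the score equals the count of matching words.

```python
def is_scored_word(word: str) -> bool:
    """Return True when a word meets both criteria (assumed interpretation)."""
    # Criterion 1: the word contains more than one character.
    if len(word) <= 1:
        return False
    # Criterion 2: letters make up more than half of the word.
    letters = sum(ch.isalpha() for ch in word)
    return letters * 2 > len(word)


def content_score(text: str) -> int:
    """Count the whitespace-separated words that meet both criteria."""
    return sum(is_scored_word(word) for word in text.split())


# "(883" contains no letters, so it is the only word that is not counted.
print(content_score("Read the data sheet (883 KB)"))  # 5
```

Under the same assumptions, the anchor score would be obtained by calling `content_score` on the anchor text alone.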

The threshold that separates “primary” from “secondary” content is a score of 3. Content with a score greater than 3 whose anchors take up less than 30% of the total volume is deemed “primary”; content that doesn’t meet both conditions is considered “secondary.”
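
The decision rule can be written as a short Python sketch. It is a hedged illustration rather than the actual algorithm: `classify`, `SCORE_THRESHOLD`, and `MAX_ANCHOR_SHARE` are hypothetical names, and the sketch assumes that the share of anchors in the "total volume" is the anchor score divided by the content score.

```python
SCORE_THRESHOLD = 3      # from the article: the score must be greater than 3
MAX_ANCHOR_SHARE = 0.30  # from the article: anchors must stay below 30%


def classify(content_score: int, anchor_score: int) -> str:
    """Label a piece of content as 'primary' or 'secondary'."""
    # Content at or below the threshold is secondary regardless of anchors.
    if content_score <= SCORE_THRESHOLD:
        return "secondary"
    # Assumption: anchor share is the ratio of the two scores.
    anchor_share = anchor_score / content_score
    return "primary" if anchor_share < MAX_ANCHOR_SHARE else "secondary"
```

With this rule, a long sentence with no anchor text is classified as primary, while a piece whose text consists entirely of anchor text is classified as secondary no matter how high its score is.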

Here’s an example of the "primary_content":

{
  "text": "Federal Bank creates an API banking system to better integrate with other organizations and ecosystems",
  "url": null
}

As you can see, it contains no anchor text and consists entirely of letters.

Now, let’s take a look at an example of the "secondary_content":

{
  "text": "Read the data sheet (883 KB)",
  "url": "https://www.ibm.com/downloads/cas/WJX036AD"
}

In this case, the entire text is the anchor text of the download link, so anchors far exceed the 30% limit and the content falls into the "secondary_content" category.
