What is Duplicate Content?

Duplicate content is content from a web page or website that is reproduced identically, or almost identically, elsewhere on the Web. It poses a problem for search engine optimization (SEO) because search engines detect duplicated pages and filter them out of their results. When the duplication is judged deliberately manipulative, they may also demote the sites involved.

In practice, duplicate content arises when content is copied and pasted from one URL to another.

The copied material can be a single paragraph of text, or text together with other elements across an entire page. When such content is republished on another URL, with or without slight modifications, it is considered duplicate content, and it is the search engine that makes this “judgement”. There are two types of duplicate content.

The first concerns duplicate pages within the same site, on different URLs. This happens either because a separate desktop and mobile version of the site was created, or because of a technical or webmaster error. In this case, the content is perfectly identical. It often happens with online stores and their product pages.

The second concerns duplicate pages on different sites. It may result from the redistribution of RSS feeds, from near-identical descriptions of similar products, or simply from plagiarism. It’s a phenomenon that website owners really fear. Sometimes, however, duplicate content is necessary and is deliberately allowed to persist. In that case, all you have to do is point Google’s crawler to the source content with the rel=canonical tag, and the page designated as the original will generally be the one indexed. Google treats this tag as a strong hint rather than a strict rule.
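To check which URL a page declares as its original, you can read its canonical link element (for example `<link rel="canonical" href="https://example.com/product">`) from the HTML. The following is a minimal sketch that uses only Python's standard `html.parser` module. The `find_canonical` helper and the example.com URLs are hypothetical, and the sketch only reports what the page declares, not which version Google actually chooses to index.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> element it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; values are kept as written.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens, so split before checking.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "canonical" in rel_tokens and href:
            self.canonical = href.strip()


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in an HTML document, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical
```

For instance, calling `find_canonical` on a product variant page that contains `<link rel="canonical" href="https://example.com/product">` returns `"https://example.com/product"`, the URL the page asks search engines to treat as the original.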

Duplicate content is detrimental to a page’s search engine ranking

The first thing to remember is that, except in the most severe cases, duplicate content does not prevent Google from indexing the pages concerned. Google simply avoids taking the same content into account several times, so that a site does not rank higher merely by repeating it.

Pages classified as duplicate content can, however, lose positions in the SERPs or even be filtered out of the search results altogether. An original page may also be pushed into the background in favor of the site that copied it when that site’s PageRank is higher. Beyond the two types of duplicate content described above, the following cases can also be distinguished.

The first concerns strictly identical pages. Here, generally only one version is indexed, typically the one with the highest PageRank.

The second concerns similar pages that are differentiated by their Title and Description tags. Here, all the pages are indexed, but those not considered to be the original do not appear in the SERPs unless you click “repeat the search with the omitted results included”.
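The difference between these two cases can be illustrated with a small comparison. This sketch assumes you already have each page's title, meta description and main body text; the `Page` class and the `classify_pair` function are hypothetical. It only tests normalized text for exact equality, so pages whose bodies are merely similar would need a similarity measure instead, and it does not reproduce how Google actually groups pages.

```python
import re
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical container for the parts of a page being compared."""
    url: str
    title: str
    description: str
    body: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def classify_pair(a: Page, b: Page) -> str:
    """Label a pair of pages according to the two cases described above."""
    same_body = normalize(a.body) == normalize(b.body)
    same_tags = (
        normalize(a.title) == normalize(b.title)
        and normalize(a.description) == normalize(b.description)
    )
    if same_body and same_tags:
        return "strictly identical"
    if same_body:
        return "same content, different title/description"
    return "different content"
```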

How do you detect duplicate content?

There are several tools you can use to detect duplicate content on your website. The simplest is Google Search itself: type a sentence from your content into the search bar, in quotation marks to search for that exact phrase, and see whether the same text appears on several pages of your website or on other sites. You can also use tools such as Copyscape, which searches the web for pages that copy yours.
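To spot near-identical pages in bulk, one common textbook technique (not necessarily what Google or Copyscape use) is to split each text into overlapping word sequences, or "shingles", and compare the resulting sets with the Jaccard similarity: the number of shared shingles divided by the total number of distinct shingles. In this sketch the shingle length `k = 3` and the `0.8` threshold are arbitrary assumptions, and `pages` is a hypothetical mapping from URL to the text extracted from that page.

```python
import re
from itertools import combinations
from typing import Iterator


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word sequences ("shingles") in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts become a single, shorter shingle (or nothing).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all distinct shingles (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(
    pages: dict[str, str], threshold: float = 0.8, k: int = 3
) -> Iterator[tuple[str, str, float]]:
    """Yield (url_a, url_b, score) for every page pair at or above the threshold."""
    sets = {url: shingles(text, k) for url, text in pages.items()}
    for (url_a, set_a), (url_b, set_b) in combinations(sets.items(), 2):
        score = jaccard(set_a, set_b)
        if score >= threshold:
            yield url_a, url_b, score
```

A score of 1.0 means two pages share every shingle. Lowering the threshold catches more lightly rewritten copies at the cost of more false positives, and comparing every pair only scales to a modest number of pages.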

How can I avoid duplicate content?

The best way to avoid duplicate content is to create unique and original content for each page of your website. Make sure you don’t copy and paste content from one page to another, and create unique titles and descriptions for each page. If you need to use the same content on several pages, make sure you rewrite it using keyword variations.
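Checking that titles and descriptions are unique can be automated once you have extracted them from your pages. The following is a minimal sketch that assumes a hypothetical `pages` dictionary mapping each URL to its `title` and `description`. The comparison ignores case and extra whitespace, which is a design choice for this example rather than a search-engine rule.

```python
from collections import defaultdict


def find_duplicate_tags(pages: dict[str, dict[str, str]]) -> dict[str, dict[str, list[str]]]:
    """Group URLs that share the same title or meta description.

    Returns {"title": {text: [urls]}, "description": {text: [urls]}},
    keeping only texts that are used by more than one URL.
    """
    report = {}
    for field in ("title", "description"):
        groups = defaultdict(list)
        for url, tags in pages.items():
            # Normalize case and whitespace before comparing.
            text = " ".join(tags.get(field, "").split()).lower()
            if text:
                groups[text].append(url)
        report[field] = {text: urls for text, urls in groups.items() if len(urls) > 1}
    return report
```

Any entry in the report points to pages whose title or description should be rewritten so that each one is unique.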