Canonical tags are used to fix duplicate content issues. This guide explores the different types of duplicate content issues that a website can generate. It also walks through how to use Screaming Frog to identify duplicate content, and it concludes by explaining what the indexing reports in Google Search Console mean and how to resolve them.
If you’re new to SEO and learning about canonical tags, this guide will help you understand how they can play a pivotal role in getting content indexed and ranking in search engines.
What is a Canonical Tag?
Canonical tags are HTML elements that tell search engines which version of a webpage should be indexed. They also help resolve duplicate content issues. It’s important to understand that Google treats this tag as a signal rather than a command, which means it may decide to ignore the canonical tag.
Canonical tags are most commonly used when there are multiple versions of a page, for example URLs that generate dynamic content, print-friendly versions, or URLs with session-tracking parameters.
By specifying the canonical URL, webmasters help search engines understand which URL they should index.
Why use a Canonical Tag?
As already mentioned, canonical tags are used to resolve duplicate content issues.
To use a canonical tag effectively, you must audit your website and find instances where different URLs generate duplicate content, or a high percentage of similar content. You can then add a canonical tag to the <head></head> section of the HTML that links to the URL you want Google to index.
There are many ways to identify duplicate content, but the most efficient are checking the Page Indexing report in Google Search Console or crawling your website with a tool like Screaming Frog.
How to Write a Canonical Tag
The first step is to understand the syntax of the canonical tag. It uses a link element with two attributes:
- Link element: The link element specifies a relationship between the current webpage and an external resource.
- rel="canonical" attribute: This indicates that the linked URL is the canonical version.
- href="{URL}" attribute: This is the URL of the canonical version of the current webpage.
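To make the syntax concrete, the Python sketch below builds the tag as a string and places it just before the closing </head>. It is only an illustration: the function names `canonical_tag` and `insert_into_head` are hypothetical, and most sites would set the tag through their CMS or page template instead. The absolute-URL check reflects Google's recommendation to use absolute rather than relative URLs in rel="canonical".

```python
from html import escape
from urllib.parse import urlsplit


def canonical_tag(url: str) -> str:
    """Return a <link rel="canonical"> element pointing at an absolute URL."""
    parts = urlsplit(url)
    # Google recommends absolute URLs in rel="canonical", so reject relative ones.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {url!r}")
    # escape() turns characters such as & and " into entities so the href attribute stays valid.
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


def insert_into_head(html_doc: str, url: str) -> str:
    """Hypothetical helper: place the canonical tag just before </head>."""
    index = html_doc.lower().find("</head>")
    if index == -1:
        raise ValueError("document has no </head> to insert into")
    return html_doc[:index] + canonical_tag(url) + html_doc[index:]


if __name__ == "__main__":
    # Prints: <link rel="canonical" href="https://www.website.com/buy/nike-air-jordan/">
    print(canonical_tag("https://www.website.com/buy/nike-air-jordan/"))
```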

What are the best practice implementations?
- Self-Referential Canonical: The canonical page should carry a canonical tag that points to its own URL. For example, if URLs A, B, and C are duplicates and A is the canonical version, B and C point to A, and placing a tag pointing to A on URL A itself is considered best practice.
- Near-Duplicates: Canonicalization isn’t restricted to exact duplicates; it can extend to near-duplicates with similar content.
- Mixed Signals: Mixed signals can be confusing for search engines. Avoid scenarios where page A points to page B while page B points to page A. Also, avoid chaining canonical tags (A → B, B → C, C → D).
- Cross-Domain Duplicates: If the same content is found on multiple sites managed by the same company or brand, a cross-domain canonical tag is a practical way to consolidate the duplicates onto one preferred URL.
Google Search Central describes other scenarios in which a canonical tag should be used. Be sure to read that documentation if you want more information straight from the horse’s mouth.
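The mixed-signal and chaining problems described above can be checked mechanically. Assuming you have a mapping from each crawled URL to the canonical URL declared on it (for example, exported from a crawler; the dictionary and the function name here are hypothetical), this Python sketch follows the tags and labels each path as fine, a chain, or a loop.

```python
from __future__ import annotations


def resolve_canonical(start: str, canonicals: dict[str, str]) -> tuple[list[str], str]:
    """Follow canonical tags from `start` and classify the result.

    `canonicals` maps each URL to the canonical URL declared on it.
    Verdicts: "ok" (no tag, self-referential, or a single hop),
    "chain" (more than one hop), "loop" (the tags point back to a visited URL).
    """
    path = [start]
    seen = {start}
    current = start
    while True:
        target = canonicals.get(current)
        if target is None or target == current:
            # No tag or a self-referential tag: the path stops here.
            return path, ("ok" if len(path) <= 2 else "chain")
        path.append(target)
        if target in seen:
            return path, "loop"
        seen.add(target)
        current = target


if __name__ == "__main__":
    # Hypothetical crawl data: URL -> canonical URL found on that page.
    site = {
        "https://www.website.com/a": "https://www.website.com/a",  # self-referential
        "https://www.website.com/b": "https://www.website.com/a",  # one hop: fine
        "https://www.website.com/c": "https://www.website.com/b",  # chain C -> B -> A
        "https://www.website.com/x": "https://www.website.com/y",
        "https://www.website.com/y": "https://www.website.com/x",  # loop X <-> Y
    }
    for url in site:
        path, verdict = resolve_canonical(url, site)
        if verdict != "ok":
            print(verdict, " -> ".join(path))
```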
How to Audit Canonical Tags
It’s important to implement the canonical tag syntax correctly so that Google understands it and is more likely to respect it when crawling your website. Here are some methods for checking how canonical tags have been implemented:
- Manual Inspection: Right-click on the page, select “View page source”, press CTRL+F, search for “canonical”, and check whether the canonical tag is nested within the <head> section.

- Browser Developer Tools: Right-click a webpage, select “Inspect,” and navigate to the canonical link element found in the <head> of the HTML source code.

- SEO Crawling Tools: Crawl a website using a tool such as Screaming Frog or Moz to review the canonical tag on each page.
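The manual check can also be scripted. The sketch below uses Python's standard-library html.parser to find rel="canonical" links in a saved copy of a page's HTML and reports missing tags, multiple tags, tags outside <head>, and relative URLs. It assumes you already have the raw HTML; because it does not run JavaScript, it will not see canonical tags injected client-side (the browser's Inspect panel does show those). The class and function names are illustrative only.

```python
from __future__ import annotations

from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Record every rel="canonical" link and whether it appeared inside <head>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_head = False
        self.found: list[tuple[str, bool]] = []  # (href, inside_head)

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing <link ... />
        # tags are also routed here by the default handle_startendtag.
        if tag == "head":
            self.in_head = True
        elif tag == "link":
            values = dict(attrs)
            rel = (values.get("rel") or "").lower().split()
            if "canonical" in rel:
                self.found.append((values.get("href") or "", self.in_head))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def audit_canonical(html_doc: str) -> list[str]:
    """Return a list of problems with a page's canonical tag (empty if none)."""
    finder = CanonicalFinder()
    finder.feed(html_doc)
    finder.close()
    problems = []
    if not finder.found:
        problems.append("no canonical tag found")
    elif len(finder.found) > 1:
        # Several canonical tags on one page send mixed signals.
        problems.append(f"{len(finder.found)} canonical tags found, expected one")
    for href, inside_head in finder.found:
        if not inside_head:
            problems.append(f"canonical {href!r} is outside <head>")
        if not href.startswith(("http://", "https://")):
            problems.append(f"canonical {href!r} is not an absolute URL")
    return problems
```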
How to use Screaming Frog to Find Duplicate Content
- Tick ‘Near Duplicates’ in ‘Configuration > Content > Duplicates’.
Make sure you configure the ‘Near Duplicate Similarity Threshold’ so the Screaming Frog crawler can quantify what percentage of a crawled page duplicates another.
- Define the HTML tags, classes, or IDs for the content areas of your website’s page templates in ‘Configuration > Content > Area’.
The most efficient way to identify the classes to exclude or include is by inspecting the HTML in Google Chrome: right-click on a page and then click ‘Inspect’.
Highlight the elements to exclude, such as <header>, <footer>, etc. In some cases it may be easier to specify the elements to include instead, e.g. <main>.
- Make sure the ‘Crawl Analysis’ configuration has ‘Content > Near Duplicates’ ticked so that the ‘Near Duplicates’ filter can be populated.
- Crawl the website by clicking ‘Start’ and wait until it has finished.
- When the crawl is finished, click ‘Crawl Analysis’ and then ‘Start’.
- You can view duplicate page URLs in the ‘Content’ tab.
You will notice several filters in the top-left drop-down menu, but the only ones you need to focus on at this stage are ‘Exact Duplicates’ and ‘Near Duplicates’. These show which page URLs on your website have duplicate content.
- Once you click on a page URL, you can see which URLs duplicate it by clicking the ‘Duplicate Details’ tab at the bottom.
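To get a feel for what a similarity percentage means, the Python sketch below compares the main-content text of pages using Jaccard similarity over three-word shingles. This is a simplified stand-in, not Screaming Frog’s own algorithm, so its percentages will not match the SEO Spider’s figures; the 90% threshold and any page text you feed it are assumptions. Passing only the main content text mirrors the purpose of the ‘Configuration > Content > Area’ setting.

```python
from __future__ import annotations

import re
from collections.abc import Iterator


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Split text into overlapping lowercase word n-grams ("shingles")."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a: str, text_b: str, size: int = 3) -> float:
    """Jaccard similarity of two texts' shingle sets, as a percentage."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 100.0  # two empty pages are treated as identical
    return 100.0 * len(a & b) / len(a | b)


def near_duplicates(pages: dict[str, str], threshold: float = 90.0) -> Iterator[tuple[str, str, float]]:
    """Yield (url_a, url_b, score) for every pair at or above `threshold` percent."""
    urls = sorted(pages)
    for i, url_a in enumerate(urls):
        for url_b in urls[i + 1:]:
            score = similarity(pages[url_a], pages[url_b])
            if score >= threshold:
                yield url_a, url_b, round(score, 1)
```

Comparing every pair grows quadratically with the number of pages, which is fine for a small site; tools that handle large sites typically use approximate techniques such as MinHash instead.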
What are the Most Common Instances of Duplicate Content?
- HTTP vs HTTPS
http://www.website.com/ vs https://www.website.com/
- WWW vs non-WWW
https://www.website.com/ vs https://website.com/
- Index files vs root domain
https://www.website.com/ vs https://www.website.com/index.html
- Session IDs and tracking parameters
https://www.website.com/ vs https://www.website.com/?traffic-source=social
- Dynamic content
https://www.website.com/buy/nike-air-jordan/ vs https://www.website.com/buy/nike-air-jordan/?sort-by=price-descending
- Print URLs
https://www.website.com/buy/nike-air-jordan/ vs https://www.website.com/buy/nike-air-jordan/?print=true
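When auditing a crawl, it can help to group URL variants like the ones listed above under a single normalized form. The Python sketch below does this for those patterns. It assumes the HTTPS, www, no-index-file version is preferred and that the query parameters in the hypothetical `NON_CONTENT_PARAMS` set never change page content; adjust both for your own site. The normalized URL is only a candidate for the canonical; the tag itself still has to be added to the pages.

```python
from __future__ import annotations

from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters that only change tracking, sorting, or presentation.
NON_CONTENT_PARAMS = {"traffic-source", "sort-by", "print"}
INDEX_FILES = ("index.html", "index.htm", "index.php")


def normalize(url: str) -> str:
    """Map a URL variant onto one preferred form: https, www, no index file."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host  # assumes www is preferred and there are no other subdomains
    path = parts.path or "/"
    for index_file in INDEX_FILES:
        if path.endswith("/" + index_file):
            path = path[: -len(index_file)]  # keep the trailing slash
            break
    kept = [(key, value) for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in NON_CONTENT_PARAMS]
    return urlunsplit(("https", host, path, urlencode(kept), ""))


def group_duplicates(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs by normalized form, keeping only groups with more than one member."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}
```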
Google Search Console Page Indexing Report
In Google Search Console, there is a ‘Page Indexing’ section that reports on reasons why URLs have not been indexed when Google crawled your website.
Although there could be many reasons why Google does not index a page when crawling your website, this section focuses on reasons that relate to duplicate content and how canonical tags can resolve them.
What Does an ‘Alternate Page with Proper Canonical Tag’ Mean?
“Alternate Page with Proper Canonical Tag” is not always an issue! This status lists URLs that are not indexed because they contain a canonical tag pointing to a different URL, which Google has accepted as the canonical. It is worth reviewing the URLs in this report to confirm that they really should point to another page.
How to Fix an ‘Alternate Page with Proper Canonical Tag’
If an alternate page with a proper canonical tag is not performing as expected, consider the following steps:
- Review Canonical URLs:
Ensure that the canonical URL specified is accurate and points to the desired URL.
- Content Consistency:
Verify that the content across alternate pages aligns with the canonical version to avoid confusion.
- Indexing Issues:
Check for any indexing issues that might affect the visibility of the canonical URL, such as conflicting robots meta tags.
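The robots check can be sketched as a small Python function. It assumes you have already collected (for example, from a crawl export) a page’s URL, its canonical URL, and the robots meta content of both the page and the canonical target; the function and parameter names are hypothetical. It flags two conflicts: a page that combines noindex with a canonical tag, and a canonical that points to a page which is itself noindex (the robots value "none" implies noindex).

```python
from __future__ import annotations

# In robots meta tags, "none" is shorthand for "noindex, nofollow".
NOINDEX_DIRECTIVES = {"noindex", "none"}


def parse_robots(content: str | None) -> set[str]:
    """Split a robots meta content value such as "noindex, follow" into directives."""
    return {part.strip().lower() for part in (content or "").split(",") if part.strip()}


def canonical_conflicts(page_url: str, canonical_url: str | None,
                        page_robots: str | None = None,
                        target_robots: str | None = None) -> list[str]:
    """Return warnings where the canonical tag and robots meta tags contradict each other."""
    if not canonical_url:
        return ["no canonical URL specified"]
    warnings = []
    if parse_robots(page_robots) & NOINDEX_DIRECTIVES:
        # noindex and rel="canonical" give conflicting instructions, so avoid combining them.
        warnings.append("page combines noindex with a canonical tag: mixed signals")
    if canonical_url != page_url and parse_robots(target_robots) & NOINDEX_DIRECTIVES:
        # Consolidating onto a URL that cannot be indexed defeats the purpose.
        warnings.append(f"canonical target {canonical_url} is set to noindex")
    return warnings
```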
What Does ‘Duplicate Without User-Selected Canonical’ Mean?
“Duplicate without user-selected canonical” in Google Search Console indicates that Google considers the page a duplicate of another page, but the page does not specify a canonical URL. This leaves the choice to Google, which may select a different version than the one you would prefer.
How to Fix ‘Duplicate Without User-Selected Canonical’
Add a canonical tag to each duplicate page, guiding search engines to the preferred page that should appear in search results.
What Does ‘Duplicate, Google Chose Different Canonical than User’ Mean?
“Duplicate, Google chose different canonical than user” in Google Search Console indicates that, despite your specified canonical tag, Google has selected a different URL as the canonical and indexed that one instead. This discrepancy may result from an incorrect canonical tag link or from issues with the user-selected canonical page.
How to Fix ‘Duplicate, Google Chose Different Canonical than User’
Investigate why Google deviated from the canonical tag. Correct or add the canonical tag if it’s inaccurate or missing. If the user-selected canonical page has additional issues, resolve those. Here are some good questions to ask while investigating:
- Does the page specified in the canonical tag target the same search intent?
- What is the percentage of duplicate content?
- What are the calls to action on each page?
- What page would the user have the best experience with?
- Is the page specified in the canonical tag built only for ranking purposes?