If you’re new to SEO and learning about canonical tags, I have written this guide to help you understand how canonical tags can play a pivotal role in getting content indexed and ranking in search engines.
Canonical tags are used to fix duplicate content issues. This guide explores the technical website issues that can cause duplicate content. It also provides a step-by-step guide to using Screaming Frog to identify duplicate content, and explains which indexing issues Google Search Console reports on and how they can be resolved by implementing a canonical tag.
What is a Canonical Tag?
Canonical tags are HTML elements that tell search engines which version of a webpage should be included in their index and rank for relevant keywords. They also help resolve duplicate content and indexing issues.
These tags most frequently come into play when there are multiple versions of a page with similar or duplicate content. Common examples are dynamic URLs, print-friendly URLs, and URLs with session ID parameters.
By specifying the canonical page, webmasters guide search engines in understanding what content to index and rank.
Why use a Canonical Tag?
As mentioned previously, canonical tags are used to eliminate similar or duplicate content issues.
To use a canonical tag effectively, find instances of duplicate or highly similar content and add a canonical tag in the HTML of each duplicate page, pointing to the preferred URL.
The best way to find instances of duplicate or highly similar content is to run a crawl of your website using a tool such as Screaming Frog.
How to Write a Canonical Tag
A canonical tag must follow the correct HTML syntax. Make sure you use these:
- Link Element: Use the link element to specify a relationship between the current document and an external resource.
- Rel Attribute: Set the rel attribute to “canonical” to indicate that the specified URL is the canonical version.
- Href Attribute: Populate the href attribute with the URL of the preferred canonical version.
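The three parts above combine into a single element that sits in the <head> of the page. As a minimal sketch, the hypothetical helper below (canonical_tag is my own name, not part of any library) assembles the tag from a preferred URL and escapes it so it is safe to place in an HTML attribute. The example URL is one of the placeholder addresses used later in this guide.

```python
from html import escape


def canonical_tag(preferred_url: str) -> str:
    """Return a canonical link element for the given preferred URL."""
    # rel="canonical" marks the href as the preferred version of the page.
    # escape(..., quote=True) protects characters such as & and " in the URL.
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'


print(canonical_tag("https://www.website.com/buy/nike-air-jordan/"))
# prints: <link rel="canonical" href="https://www.website.com/buy/nike-air-jordan/">
```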
What are the best practice implementations?
Handling duplicate content can be tricky, but here are some best practice considerations when using the canonical tag:
- Self-Referential Canonical: The canonical page should carry a canonical tag that points to its own URL. For example, if URLs A, B, and C are duplicates, and A is the canonical version, placing the tag pointing to A on URL A is considered best practice.
- Near-Duplicates: Canonicalization isn’t restricted to exact duplicates; it can extend to near-duplicates with similar content.
For example, an e-commerce site may target different search queries with several pages for the same product:
https://www.ebay.com/b/adidas-Yeezy-350/15709/bn_7118853686
https://www.ebay.com/b/Yeezy-350/15709/bn_7117744221
- Mixed Signals: Mixed signals can confuse search engines. Avoid scenarios where page A points to page B while page B points to page A. Also, avoid chaining canonical tags (A -> B, B -> C, C -> D).
- Cross-Domain Duplicates: If the same content is found on multiple sites that are managed by the same company, or brand, using the canonical tag across domains is practical and removes duplicate content issues.
See Google Search Central for examples of issues where a canonical tag is suggested.
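The mixed-signal cases described above (loops and chains) can be spotted automatically once you have a mapping of each URL to the URL in its canonical tag, for example from a crawl export. The sketch below is only an illustration: the function name follow_canonicals and the dictionary format are my own assumptions, not the output format of any particular tool.

```python
def follow_canonicals(canonicals: dict[str, str], start: str) -> tuple[list[str], bool]:
    """Follow canonical tags from `start` and return (path, loop_detected).

    `canonicals` maps a URL to the URL named in its canonical tag
    (a hypothetical structure, e.g. built from a crawl export).
    """
    path = [start]
    seen = {start}
    current = start
    while True:
        target = canonicals.get(current)
        # Stop at a page with no tag or with a self-referential tag.
        if target is None or target == current:
            return path, False
        # A target we have already visited means the tags form a loop.
        if target in seen:
            return path + [target], True
        path.append(target)
        seen.add(target)
        current = target


tags = {"A": "B", "B": "C", "C": "D", "D": "D"}
path, loop = follow_canonicals(tags, "A")
print(path, loop)  # ['A', 'B', 'C', 'D'] False
```

A path longer than two entries means the tags are chained, so the starting page should point directly at the final URL instead. A loop means no page in the group is clearly preferred.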
How to Check Canonical Tags
The correct implementation of canonical tags is vital for effective page indexing in search engines. Other than running a crawl, you can check them using these other methods:
- Manual Inspection: Right-click on the page, select “view page source”, hit CTRL+F, type canonical, and check to see if the canonical tag is within the <head> section.
- Browser Developer Tools: Right-click on the webpage, select “Inspect,” and navigate to the canonical link element found in the <head> of the HTML source code.
- Online SEO Tools: Crawl a website using Screaming Frog or Moz to review the canonical tag on each page. Check out the links for more information.
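The manual inspection above can also be scripted. As a rough sketch using only Python's standard library, the hypothetical CanonicalFinder class below scans a page's HTML source, collects every canonical link element, and records whether each one was found inside the <head> section. The sample page is made up for illustration.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect canonical link elements and note whether each is inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.canonicals = []  # list of (href, found_inside_head)

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link":
            attr = dict(attrs)
            # rel may hold several space-separated values, in any letter case.
            rel_values = (attr.get("rel") or "").lower().split()
            if "canonical" in rel_values:
                self.canonicals.append((attr.get("href"), self.in_head))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_canonicals(html_source: str):
    finder = CanonicalFinder()
    finder.feed(html_source)
    return finder.canonicals


page = (
    "<html><head><title>Example</title>"
    '<link rel="canonical" href="https://www.website.com/">'
    "</head><body><p>Hello</p></body></html>"
)
print(find_canonicals(page))  # [('https://www.website.com/', True)]
```

An empty list means the page has no canonical tag, more than one entry means there are conflicting tags, and a False flag means the tag sits outside the <head>.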
How to use Screaming Frog to Find Duplicate Content
- Tick ‘Near Duplicates’ in ‘Configuration > Content > Duplicates’.
Make sure you configure the ‘Near Duplicate Similarity Threshold’ so that the Screaming Frog crawler can quantify what percentage of a crawled page duplicates another.
- Define the HTML tags, classes, or IDs for the content areas of the page templates on your website in ‘Configuration > Content > Area’.
The best way I have found to do this is to open each page template in Chrome, right-click, and select ‘Inspect’. Then, I highlight the HTML element of the main content to see which tag, class, or ID it uses.
How easy this task is depends on how the website’s HTML is built. A quicker alternative is to highlight the HTML elements of the header, main navigation, sidebar (if there is one), and footer, so that they can be excluded instead.
- Make sure the ‘Crawl Analysis’ configuration has ‘Content > Near Duplicates’ ticked so that the ‘Near Duplicates’ filter is populated.
- Crawl the website by clicking ‘Start’ and then wait until it is finished.
- When the crawl is finished, click ‘Crawl Analysis’ and then ‘Start’.
- Now let’s analyse some data. First, you can view duplicate page URLs in the ‘Content’ tab.
You will notice several filters in the top-left drop-down menu, but the only ones you need to focus on at this stage are ‘Exact Duplicates’ and ‘Near Duplicates’. There you will find out which page URLs on your website have duplicate content.
- Once you click on a page URL, you can also find out which URLs duplicate it by clicking the ‘Duplicate Details’ tab at the bottom.
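To make the idea of a similarity threshold more concrete, here is a small sketch of one common way to score how similar two pieces of page text are: split each text into overlapping three-word groups and measure how many groups they share. This is an illustration only and is not how Screaming Frog calculates its figure. The function names and the 90% threshold are my own assumptions.

```python
def shingles(text: str, size: int = 3) -> set:
    """Split text into overlapping groups of `size` consecutive words."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity_percent(text_a: str, text_b: str, size: int = 3) -> float:
    """Shared word groups as a percentage of all word groups in either text."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 100.0  # two empty texts are treated as identical
    return 100.0 * len(a & b) / len(a | b)


def is_near_duplicate(text_a: str, text_b: str, threshold: float = 90.0) -> bool:
    # threshold is an assumed example value, not a recommendation
    return similarity_percent(text_a, text_b) >= threshold


print(similarity_percent("buy nike air jordan today", "buy nike air jordan today"))  # 100.0
```

Only the main content of each page should be compared, which is why the content area step earlier in this walkthrough matters: shared headers, navigation, and footers would otherwise inflate the score.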
What are the Most Common Instances of Duplicate Content?
Here’s a list of the most common causes of duplicate content. I strongly advise using a canonical tag in each of these cases.
- HTTP vs HTTPS
http://www.website.com/ vs https://www.website.com
- WWW vs non-WWW
https://www.website.com/ vs https://website.com
- Index files vs root domain
https://www.website.com/ vs https://www.website.com/index.html
- Session IDs
https://www.website.com/ vs https://www.website.com/?traffic-source=social
- Dynamic content
https://www.website.com/buy/nike-air-jordan/ vs https://www.website.com/buy/nike-air-jordan/?sort-by=price-decending
- Print URLs
https://www.website.com/buy/nike-air-jordan/ vs https://www.website.com/buy/nike-air-jordan/?print=true
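Each pair above collapses to a single preferred URL, and that mapping can be sketched in a few lines. The hypothetical preferred_url function below assumes the HTTPS, www version is the preferred one, that index files resolve to their folder, and that every query parameter can be dropped. Those are assumptions for this example only: on a real site some parameters change the content and must be kept.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed index file names; adjust to match the site.
INDEX_FILES = ("index.html", "index.htm", "index.php")


def preferred_url(url: str) -> str:
    """Map a URL variant to the assumed preferred (canonical) URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Assumption: the www version of the site is the preferred one.
    if not host.startswith("www."):
        host = "www." + host
    path = parts.path or "/"
    # Treat /index.html and similar files as the folder itself.
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break
    # Assumption: HTTPS is preferred, and query strings and fragments
    # (tracking, sorting, print parameters) never change the main content.
    return urlunsplit(("https", host, path, "", ""))


print(preferred_url("https://www.website.com/buy/nike-air-jordan/?print=true"))
# prints: https://www.website.com/buy/nike-air-jordan/
```

The result is the value that would go in the href attribute of the canonical tag on each variant.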
Google Search Console Page Indexing Report
In Google Search Console, there is a ‘Page Indexing’ section that reports on reasons why pages have not been indexed when Google crawled your website.
Now, although there are many reasons why Google does not index pages when crawling your website, I am going to focus on the ones that relate to duplicate content and how canonical tags are used.
What Does an ‘Alternate Page with Proper Canonical Tag’ Mean?
“Alternate Page with Proper Canonical Tag” is not always an issue! It reports the URLs that are not indexed because they contain a canonical tag pointing to a different URL. It is best practice to review the URLs in this report to make sure that they truly need a canonical tag and should not be indexed.
How to Fix an ‘Alternate Page with Proper Canonical Tag’
If an alternate page with a proper canonical tag is not performing as expected or if issues arise, consider the following steps:
- Review Canonical URLs:
Ensure that the canonical URL specified is accurate and points to the desired URL.
- Content Consistency:
Verify that the content across alternate pages aligns with the canonical version to avoid confusion.
- Indexing Issues:
Check for any indexing issues that might be affecting the visibility of the canonical URL.
What does ‘Duplicate Without User-Selected Canonical’ Mean?
“Duplicate without user-selected canonical” in Google Search Console indicates that there is a duplicate page without a canonical tag. This lack of specification leaves Google uncertain about which page to index, potentially leading to arbitrary choices among the duplicates.
How to Fix ‘Duplicate Without User-Selected Canonical’
Add a canonical tag to each duplicate page, guiding search engines to the preferred page that should appear in search results.
What does ‘Duplicate, Google Chose Different Canonical than User’ Mean?
“Duplicate, Google chose different canonical than user” in Google Search Console indicates that, despite your specified canonical tag, Google has indexed a different page. This discrepancy may result from an incorrect or missing canonical tag, or issues with the user-selected canonical page.
How to Fix ‘Duplicate, Google Chose Different Canonical than User’
Investigate why Google deviated from the canonical tag. Correct or add the canonical tag if it’s inaccurate or missing. If the user-selected canonical page has additional issues, resolve those. Here are some good questions to ask while investigating:
- Does the page specified in the canonical tag target the same search intent?
- What is the percentage of duplicate content?
- What are the calls to action on each page?
- What page would the user have the best experience with?
- Is the page specified in the canonical tag built only for ranking purposes?