Canonical Tag Definition
A canonical tag is a way of telling search engines that a specific URL represents the master copy of a page. It prevents the problems caused by identical or near-identical content appearing at multiple URLs. In practice, the tag tells the search engine which version of a URL the site owner wants to appear in the search results.
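For illustration, the tag itself is a single link element placed in the head of the page. The following is a minimal sketch in Python that builds such an element; the helper name canonical_link and the example.com URL are assumptions made for this example, not part of any library.

```python
from html import escape


def canonical_link(url: str) -> str:
    """Build the <link> element that marks `url` as the canonical version.

    `canonical_link` is a hypothetical helper written for this example.
    """
    # escape(..., quote=True) makes the URL safe inside a double-quoted attribute.
    return f'<link rel="canonical" href="{escape(url, quote=True)}" />'


# Hypothetical URL; the printed element would go inside the page's <head>.
print(canonical_link("https://www.example.com/"))
```

Every duplicate version of the page would carry the same element, each pointing at the one preferred URL.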
A Little More on Canonical Tag
Duplicate content is a complicated subject, but when search engines crawl many URLs with identical or very similar content, several SEO problems can arise. First, if crawlers have to sift through too much duplicate content, they may miss some of a site's unique content. Second, large-scale duplication may dilute a site's ability to rank. Finally, even when the content does rank, search engines may pick the wrong URL as the original. Canonicalization helps keep duplicate content under control.
The problem with URLs
Most people think of a page as a concept, such as "the homepage." For search engines, each unique URL represents a separate page. For example, search engines may be able to reach the same homepage through several different URLs: with or without "www", over HTTP or HTTPS, or with an index file name appended.
To a person, all these URLs lead to the same page, but to a search engine, each URL is a unique page. And this is only a small sample of the variations that one might encounter. Modern content management systems and dynamic, code-driven websites make the problem worse. Many sites add tags automatically, allow multiple URL paths to the same content, and append URL parameters for searches, sorts, currency options, and so on.
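To make the point concrete, here is a rough sketch that collapses several URL variants into one form. The domain, the list of ignored parameters, the index file names, and the choice of "https" plus "www" as the preferred form are all assumptions for this example; a real site would choose its own rules.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters change presentation only, not content.
IGNORED_PARAMS = {"sort", "currency", "sessionid"}
# Assumption: these file names serve the same content as the bare directory.
INDEX_FILES = ("index.html", "index.php")


def normalize(url: str) -> str:
    """Map a URL variant onto one preferred form (https + www, by assumption)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host
    path = parts.path or "/"
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]  # keep the trailing slash
    # Drop presentation-only parameters, keep the rest in their original order.
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS]
    return urlunsplit(("https", host, path, urlencode(kept), ""))


variants = [
    "http://example.com",
    "https://www.example.com/index.html",
    "http://WWW.Example.com/?sort=price",
    "https://example.com/index.php",
]
print({normalize(u) for u in variants})  # {'https://www.example.com/'}
```

Four URLs that a crawler would treat as four pages reduce to a single address, which is the address a canonical tag on each variant would name.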
Canonical tag best practices
Some essential points to consider when using canonical tags are as follows:
- Canonical tags can be self-referential. It is fine for a canonical tag to point to the current URL. If URLs X, Y, and Z are duplicates and X is the canonical version, it is okay to put the tag pointing to X on URL X itself.
- Proactively canonicalize your homepage. Since homepage duplicates are very common and people may link to a homepage in numerous ways, it is advisable to put a canonical tag on the homepage template to prevent problems.
- Spot-check your dynamic canonical tags. Bad code can cause a site to write a different canonical tag for every version of a URL, which defeats the purpose of the tag, so it is worth spot-checking URLs regularly.
- Avoid mixed signals. If the signals conflict, for example when page X names page Y as canonical and page Y names page X, the search engines may ignore the canonical tag or misinterpret it. It is also not advisable to chain canonical tags, where X points to Y and Y points to Z. Clear signals keep the search engines from being forced into bad choices.
- Be careful canonicalizing near-duplicates. Near-duplicates can be canonicalized, but this should be done with caution. It is generally fine to use canonical tags for very similar pages. Keep in mind, however, that the non-canonical versions of the page may not be eligible to rank, and if the pages are too different, the search engine may decide to ignore the tag.
- Canonicalize cross-domain duplicates. If one controls both sites, the tag can be used across domains. For example, a publishing company that publishes the same article on numerous websites can use the canonical tag to focus ranking on a single site. Note that canonicalization generally keeps the non-canonical sites from ranking for that content.
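The spot-check and mixed-signal advice above can be partly automated. The following is a minimal sketch, not a complete audit tool: it uses Python's standard html.parser to pull canonical tags out of page markup and flags conflicting tags and chains. The class and function names, the sample URLs, and the in-memory pages dictionary (standing in for fetched pages) are all hypothetical, and relative href values are not resolved.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and attr.get("href"):
            self.hrefs.append(attr["href"])


def find_canonicals(html_text):
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.hrefs


def check_page(url, pages):
    """Return a list of problems for `url`; `pages` maps URL -> HTML text."""
    hrefs = find_canonicals(pages[url])
    if not hrefs:
        return ["no canonical tag"]
    problems = []
    if len(set(hrefs)) > 1:
        problems.append("conflicting canonical tags")
    target = hrefs[0]
    if target != url and target in pages:
        # A chain (or a loop) exists if the target names yet another URL.
        onward = find_canonicals(pages[target])
        if onward and onward[0] != target:
            problems.append("canonical chain: target points elsewhere")
    return problems


# Hypothetical pages: X is self-referential, Y points to X, Z points to Y.
pages = {
    "https://www.example.com/x": '<head><link rel="canonical" href="https://www.example.com/x"></head>',
    "https://www.example.com/y": '<head><link rel="canonical" href="https://www.example.com/x"></head>',
    "https://www.example.com/z": '<head><link rel="canonical" href="https://www.example.com/y"></head>',
}
for page_url in pages:
    print(page_url, check_page(page_url, pages))
```

In this sample, X and Y come back clean, while Z is flagged because its target Y in turn points to X. A loop in which X and Y name each other would be flagged by the same check.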