XML Sitemap, Robots.txt, and Hreflang Tags

by Brian Toomey, JB Analytics CEO

Sitemap & Robots.txt

What are sitemaps and robots.txt?

Two key tools for helping search engines understand and properly index your website are an XML sitemap file and a robots.txt file.

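  • XML sitemap: An XML file listing the URLs on your site that you want search engines to crawl and index, giving them an overview of your content and how it is organized.
  • Robots.txt: A plain-text file at the root of your domain (/robots.txt) that tells crawlers which folders or files they should not crawl.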

Why they matter

In combination, both files should give search engines a comprehensive and accurate picture of what content you want indexed and how it is organized.

What to do

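In short: use a sitemap that lists what you want indexed and leaves out what you don't want indexed. In your robots.txt file, also disallow the folders or files you don't want crawled. Keep in mind that robots.txt controls crawling rather than indexing, so a disallowed URL can still appear in search results if other pages link to it.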
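To check how a crawler would read such rules, Python's standard-library urllib.robotparser can evaluate a robots.txt file. The file below is a hypothetical example for yoursite.com that disallows a date-based archive folder and a search path and points to the sitemap; the paths are assumptions for illustration. The parser's matching is simpler than Google's (for example, it does not expand * wildcards inside paths), so treat it as a sanity check rather than a definitive answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for yoursite.com: keep crawlers out of the
# date-based archive folder and internal search results, and tell them
# where the sitemap lives.
ROBOTS_TXT = """\
User-agent: *
Disallow: /2010/
Disallow: /search

Sitemap: https://www.yoursite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from a string instead of fetching


def is_crawlable(url: str, agent: str = "*") -> bool:
    """Return True if the parsed rules let `agent` fetch `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "https://www.yoursite.com/about/",
        "https://www.yoursite.com/2010/05/01/",
        "https://www.yoursite.com/search?q=shoes",
    ):
        print(url, "allowed" if is_crawlable(url) else "blocked")
    print("Sitemaps:", parser.site_maps())
```

With these rules, the /about/ page is allowed while the archive and search URLs are blocked, and site_maps() returns the sitemap URL declared in the file.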

Additional technical items:

  • Content: We recommend listing all, and only, resolvable HTML pages, excluding parameterized URLs and archive pages.
    • Date-based archive pages such as the following have little value for organic search:
      • https://www.yoursite.com/2010/05/01/
      • https://www.yoursite.com/2010/05/07/
      • https://www.yoursite.com/2010/05/14/
  • Hosting: Host sitemaps on a domain you have verified with the search engines.
  • Size: Each sitemap should be no larger than 50 MB uncompressed and should list no more than 50,000 URLs.
  • Placement: Place the sitemap either at /sitemap.xml or at another location referenced by a Sitemap: line in the robots.txt file.
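The content and size guidelines above can be sketched with Python's standard library. This is a minimal example under stated assumptions: should_list treats any URL with a query string as parameterized and matches date archives with a hypothetical /YYYY/MM/DD/ pattern, and build_sitemap refuses output that breaks the 50,000-URL or 50 MB limits instead of splitting it.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000              # URLs per sitemap
MAX_BYTES = 50 * 1024 * 1024   # 50 MB, measured uncompressed

# Hypothetical pattern for date-based archive paths such as /2010/05/01/
DATE_ARCHIVE = re.compile(r"^/\d{4}/\d{2}/\d{2}/?$")


def should_list(url: str) -> bool:
    """Keep plain page URLs; drop parameterized and date-archive URLs."""
    parts = urlsplit(url)
    if parts.query:  # any query string counts as parameterized here
        return False
    return DATE_ARCHIVE.match(parts.path) is None


def build_sitemap(urls: list[str]) -> bytes:
    """Serialize the URLs that pass should_list() as a <urlset> document."""
    kept = [u for u in urls if should_list(u)]
    if len(kept) > MAX_URLS:
        raise ValueError(f"{len(kept)} URLs exceeds {MAX_URLS}; split the sitemap")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in kept:
        # ElementTree escapes characters such as & in the URL text.
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = u
    data = ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
    if len(data) > MAX_BYTES:
        raise ValueError("sitemap exceeds 50 MB uncompressed; split it")
    return data


if __name__ == "__main__":
    print(build_sitemap([
        "https://www.yoursite.com/",
        "https://www.yoursite.com/about/",
        "https://www.yoursite.com/2010/05/01/",
        "https://www.yoursite.com/shop?sort=price",
    ]).decode("utf-8"))
```

Only the homepage and /about/ survive the filter in this example; the dated archive and the parameterized shop URL are left out.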

Multiple sitemaps can be listed in a sitemap index file (for example, at /sitemap_index.xml) so that search engines can discover all of them from a single file.
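A sitemap index can be generated in the same way. The sketch below splits a long URL list into chunks of at most 50,000 URLs and writes a <sitemapindex> that lists one child sitemap per chunk; the file-naming scheme (sitemap-1.xml, sitemap-2.xml, ...) and the base URL are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # URLs per sitemap file


def split_urls(urls: list[str], size: int = MAX_URLS) -> list[list[str]]:
    """Break the URL list into consecutive chunks of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(sitemap_locs: list[str]) -> bytes:
    """Serialize a <sitemapindex> that points to each child sitemap."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_locs:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = loc
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


def plan_sitemaps(urls: list[str], base: str = "https://www.yoursite.com"):
    """Map each hypothetical child-sitemap URL to its URLs, plus the index XML."""
    files = {
        f"{base}/sitemap-{n}.xml": chunk
        for n, chunk in enumerate(split_urls(urls), start=1)
    }
    return files, build_index(list(files))


if __name__ == "__main__":
    pages = [f"https://www.yoursite.com/page-{i}/" for i in range(120_000)]
    files, index_xml = plan_sitemaps(pages)
    for loc, chunk in files.items():
        print(loc, len(chunk))
```

Each chunk would then be serialized as its own <urlset> file at the matching URL, and the index file referenced from robots.txt or submitted to the search engines.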


Localized content & Hreflang tags

What it is

Hreflang tags show the relationships between pages on the same topic for different regions or languages.

The following example shows search engines where the equivalent homepages in different languages are located when both are hosted on the same domain.

  • On the German-language homepage (for illustration, https://yoursite.com/de/):
    • <link rel="alternate" href="https://yoursite.com/" hreflang="en-us" />
  • On the US English-language homepage (https://yoursite.com/):
    • <link rel="alternate" href="https://yoursite.com/de/" hreflang="de-de" />

Hreflang tags can also be used across different domains. For example:

  • On the German-language homepage (https://yoursite.de/):
    • <link rel="alternate" href="https://yoursite.com/" hreflang="en-us" />
  • On the US English-language homepage (https://yoursite.com/):
    • <link rel="alternate" href="https://yoursite.de/" hreflang="de-de" />
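Rather than writing these tags by hand, a site can generate them from a single table of equivalent pages. This Python sketch assumes a hypothetical ALTERNATES mapping built from the cross-domain example above. It emits the same full set of tags for every page in the group, so each page also references itself, which is what Google's hreflang guidance asks for.

```python
from html import escape

# Hypothetical group of equivalent homepages, keyed by hreflang code
# (mirrors the cross-domain example: US English on .com, German on .de).
ALTERNATES = {
    "en-us": "https://yoursite.com/",
    "de-de": "https://yoursite.de/",
}


def hreflang_tags(alternates: dict[str, str]) -> str:
    """Build the <link rel="alternate"> tags for the <head> of every page in the group."""
    return "\n".join(
        f'<link rel="alternate" href="{escape(url)}" hreflang="{escape(code)}" />'
        for code, url in sorted(alternates.items())  # stable, alphabetical order
    )


if __name__ == "__main__":
    # The same block goes on both homepages, so each also lists itself.
    print(hreflang_tags(ALTERNATES))
```

Generating the tags from one shared table keeps every page in the group in sync when a new language version is added.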

Why it matters

Setting hreflang tags helps search engines serve the right content to users in the right place. It also helps them understand that what might look like near-duplicate content is in fact aimed at audiences in different locations.

What to do

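  • When hosting content in more than one language, use hreflang tags to denote the relationship between equivalent pages, whether they sit on a single domain or across several domains.
  • Make the annotations reciprocal: each language version should list itself and every other version. If page A points to page B but B does not point back, search engines may ignore the tags.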
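Hreflang tags that are not reciprocated may be ignored, so it can be worth checking for missing return links. The function below is a minimal sketch: it assumes you have already collected, for each page URL, the hreflang-to-URL pairs found on that page (how you crawl or extract them is outside its scope), and it reports every link that is not returned.

```python
def missing_return_links(
    annotations: dict[str, dict[str, str]],
) -> list[tuple[str, str]]:
    """Return (source, target) pairs where source lists target as an
    alternate but target does not list source back.

    `annotations` maps each page URL to the hreflang -> URL pairs on that page.
    Pages absent from `annotations` are treated as having no links.
    """
    problems = []
    for source, alternates in annotations.items():
        for target in alternates.values():
            if target == source:
                continue  # a self-reference needs no return link
            back_links = annotations.get(target, {}).values()
            if source not in back_links:
                problems.append((source, target))
    return sorted(problems)


if __name__ == "__main__":
    # Hypothetical crawl result: the German page forgot its en-us link.
    pages = {
        "https://yoursite.com/": {
            "en-us": "https://yoursite.com/",
            "de-de": "https://yoursite.de/",
        },
        "https://yoursite.de/": {
            "de-de": "https://yoursite.de/",
        },
    }
    print(missing_return_links(pages))
```

Here the US homepage points to the German one, but the German page never links back, so that pair is reported.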


 
