Thursday, January 1, 2009

XML Sitemaps

An XML sitemap tells search engine crawlers which pages and files your site contains. The format was developed by Google and later adopted by MSN and Yahoo.

The following is a sample sitemap:
<?xml version="1.0" encoding="UTF-8" ?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.mysite.com/file1.html</loc>
    <lastmod>2008-12-30</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://www.mysite.com/file2.html</loc>
    <lastmod>2008-12-30</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>

The sitemap XML file has a urlset root node in the namespace http://www.sitemaps.org/schemas/sitemap/0.9 . The urlset contains one url node per entry, and each url node must contain at least a loc node giving the URL of the resource. The optional child nodes of url are lastmod, the date the file was last modified (in the W3C Datetime format; use YYYY-MM-DD to omit the time); changefreq, how often the page is expected to change (valid values are always, hourly, daily, weekly, monthly, yearly and never); and priority, the priority of this URL relative to the other URLs on your site (a value from 0.0 to 1.0).

You can either generate a sitemap file with a tool or write it by hand. XML-Sitemaps.com is one of several sites that offer sitemap generation services.
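
If you would rather generate the file yourself, here is a minimal sketch in Python that uses only the standard library. The function name build_sitemap and the dictionary-based entry format are my own assumptions for illustration, not part of the sitemap protocol; the element names and allowed values are the ones described above.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly",
                     "monthly", "yearly", "never"}


def build_sitemap(entries):
    """Return sitemap XML as a string (hypothetical helper).

    Each entry is a dict with a required 'loc' key and optional
    'lastmod', 'changefreq' and 'priority' keys.
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # loc is the only required child of url
        ET.SubElement(url, "loc").text = entry["loc"]
        if "lastmod" in entry:
            ET.SubElement(url, "lastmod").text = entry["lastmod"]
        if "changefreq" in entry:
            if entry["changefreq"] not in CHANGEFREQ_VALUES:
                raise ValueError("invalid changefreq: %r" % entry["changefreq"])
            ET.SubElement(url, "changefreq").text = entry["changefreq"]
        if "priority" in entry:
            priority = float(entry["priority"])
            if not 0.0 <= priority <= 1.0:
                raise ValueError("priority must be between 0.0 and 1.0")
            ET.SubElement(url, "priority").text = "%.1f" % priority
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        {"loc": "http://www.mysite.com/file1.html", "lastmod": "2008-12-30",
         "changefreq": "daily", "priority": 0.8},
        {"loc": "http://www.mysite.com/file2.html"},
    ]))
```

ElementTree escapes special characters such as & in the URLs when it serializes the tree, which is easy to forget when writing the file by hand.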

Initially, the sitemap had to be stored at the root of the site, but you can now specify its location in the robots.txt file. Each sitemap file can reference only one domain.
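
The robots.txt entry is a single Sitemap: line containing the full URL of the sitemap. As a rough illustration, the sketch below embeds a sample robots.txt and reads the sitemap location back with Python's standard urllib.robotparser module (its site_maps() method needs Python 3.8 or later). The /sitemaps/sitemap.xml path and the sitemap_locations helper are assumptions made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt; the sitemap path is hypothetical.
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: http://www.mysite.com/sitemaps/sitemap.xml
"""


def sitemap_locations(robots_txt):
    """Return the sitemap URLs declared in a robots.txt string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present
    return parser.site_maps() or []


if __name__ == "__main__":
    for location in sitemap_locations(ROBOTS_TXT):
        print(location)
```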
