XML sitemaps for SEO
An XML sitemap serves as a blueprint of a website. It's simple to generate and can be submitted to search engines through tools such as Google Webmaster Tools (now Google Search Console). While the concept of a sitemap is fairly straightforward on the surface, using your sitemap to its full potential can be more complex. A sitemap is not just a blueprint for a search engine, it's also a blueprint for you. A sitemap should list the exact pages in your website, no more and no less, and the pages within your website should match the XML sitemap pages exactly. If you have old URLs or pages which are no longer linked within your site but are still indexed on Google or similar, you should include these. Anything that is indexed by a search engine, or that you want indexed, should be part of your sitemap. When you combine a sitemap with other SEO concepts you can avoid duplicate content problems.
XML sitemap structure
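<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc><priority>1.0</priority></url>
  <!-- another page -->
  <url><loc>http://www.example.com/about</loc></url>
</urlset>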
A basic sitemap of web pages has the structure shown above. If you are familiar with HTML you will see it has the same nested tag structure as an HTML page. The first line is the XML declaration, which gives the XML version and the document encoding. The group of URLs within the sitemap is contained within a <urlset> tag. When the urlset tag is opened we attach the xmlns attribute to specify the sitemap schema. Each individual page has a <url> tag which wraps the page information. You can use just a <loc> tag to specify the page URL and nothing else if you want. The other tags, such as <lastmod>, <changefreq> and <priority>, are optional.
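If you generate your sitemap from code rather than by hand, the same structure can be built with an XML library so the nesting is always well formed. This is a minimal sketch in Python (not the PHP used elsewhere in this article). The `build_sitemap` function, the page dictionaries and the example.com URLs are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Build a <urlset> document from a list of page dicts.

    Each dict needs a "loc" key; "lastmod", "changefreq" and
    "priority" are optional and only written when present.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Placeholder pages on an example domain (hypothetical data)
example_pages = [
    {"loc": "http://www.example.com/", "priority": "1.0"},
    {"loc": "http://www.example.com/about"},
]
sitemap_xml = build_sitemap(example_pages)
```

The second page carries only a `loc`, which matches the point above that every other tag is optional.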
Multiple XML sitemaps
A single sitemap file is limited to 50,000 URLs (and 50MB uncompressed), and if you need to include more you must split it across multiple sitemap files, which can then be listed together in a sitemap index file. If you ever come across this you may want to break the sitemap up by specific categories and build a few different sitemaps. This has the added benefit of allowing you to see the count of pages for each category within your site and to spot pages that are listed but should not be.
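Splitting a long URL list and tying the pieces together with a sitemap index file (a `<sitemapindex>` document that lists each sitemap's location) could look roughly like this. It is a Python sketch under stated assumptions: the per-file limit is passed in as `max_urls` rather than hard coded, and the `sitemap-N.xml` file names and example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def chunk_urls(urls, max_urls):
    """Split a list of URLs into consecutive chunks of at most max_urls."""
    return [urls[i:i + max_urls] for i in range(0, len(urls), max_urls)]

def build_sitemap_index(sitemap_urls):
    """Build a <sitemapindex> document pointing at each sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = sitemap_url
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Tiny limit used only so the example is easy to follow
urls = [f"http://www.example.com/page-{n}" for n in range(5)]
chunks = chunk_urls(urls, max_urls=2)
index_xml = build_sitemap_index(
    f"http://www.example.com/sitemap-{n}.xml" for n in range(1, len(chunks) + 1)
)
```

To split by category instead of by position, you could call `chunk_urls` once per category list, which also gives you the per-category page counts.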
Sitemap change frequency
Change frequency may give an indication of how often pages within a website should be crawled. You can use values such as hourly, daily, weekly, monthly or yearly. How much these values influence crawling is debatable. Search algorithms will undoubtedly use their own calculations to work out how content changes within a website. This is one of the core parts of how a search engine calculates how pages rank. It's probably worth including this information within your sitemap but, as always, make sure it's accurate. Setting everything to hourly won't make a search engine re-crawl your website every hour, but you may benefit somewhat if you keep the sitemap change frequency accurate.
Sitemap modified time
It's always good to give an accurate modified time for each page within your sitemap. Don't go down the route of marking everything as modified today or similar. The modified time should be taken from the HTML page or template file. If your site is dynamic and works off a database, use the last time the database record was modified. Platforms like WordPress have plugins for building sitemaps, but in my experience they can be very hit and miss when it comes to accurately generating pages. PHP makes finding the last modified time of a file very easy using filemtime. This function returns a UNIX timestamp which can be converted to the date format required by an XML sitemap (W3C Datetime, for example YYYY-MM-DD).
$modified = date("Y-m-d", filemtime($filename));
You can also include the hours, minutes and seconds at which a page was last modified, but unless you are updating your website with huge amounts of content every single day and need it indexed extremely quickly, you probably don't need to.
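For comparison, here is a rough Python counterpart to the filemtime approach, covering both the date-only form and the form with hours, minutes and seconds. The function names are made up for this sketch, and it assumes you want timestamps expressed in UTC, whereas PHP's date() uses the server's configured timezone.

```python
import os
from datetime import datetime, timezone

def format_lastmod(timestamp, with_time=False):
    """Convert a UNIX timestamp to a sitemap <lastmod> value (UTC)."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    if with_time:
        # Full W3C Datetime form with an explicit UTC offset
        return moment.strftime("%Y-%m-%dT%H:%M:%S+00:00")
    return moment.strftime("%Y-%m-%d")

def file_lastmod(path, with_time=False):
    """Return the <lastmod> value for a file based on its modification time."""
    return format_lastmod(os.path.getmtime(path), with_time)
```

For a database-driven site you could pass the record's last-modified timestamp straight to `format_lastmod` instead of reading a file.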
Sitemap priority
Sitemap priority is a very simple way to give an indication of the relative importance of your site pages. Starting to calculate sitemap priority can be very easy. Priority should range from 0.0 to 1.0, with 1.0 being the most important and pages lower down in your website getting lower priorities. If you give false priorities in your sitemap you won't be doing your website any good. There is no point just listing everything as 1.0 priority. You can give priorities to 2 decimal places, for example 0.95. Your website home page should generally be the only page with 1.0 priority. It's your home page and everything else should branch off it in a tree structure. If we take, for example, an ecommerce store that sells products, we may have this simple structure and priority.
- Home page – 1.0
- Product category pages – 0.9
- Individual product pages – 0.8
In this structure we may have various categories of products branching off our home page. These categories could be divided by product brand etc. If we have pages with products divided this way we will usually have a link to each individual product on these pages. The flow of the website will bring you through these pages and the priority of each should reflect their position in your site tree.
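One way to derive priorities like these automatically is to drop the priority by a fixed step for each level below the home page. The Python sketch below assumes that the number of path segments in a URL mirrors the page's depth in the site tree, which will not hold for every site. The `priority_for` function and the example URLs are hypothetical.

```python
from urllib.parse import urlparse

def priority_for(url, step_tenths=1):
    """Return a priority string that falls by step_tenths/10 per path level.

    The home page (no path segments) gets "1.0"; the value never drops below "0.0".
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    # Work in whole tenths to avoid floating point rounding surprises
    tenths = max(0, 10 - step_tenths * len(segments))
    return f"{tenths / 10:.1f}"

home = priority_for("http://www.example.com/")
category = priority_for("http://www.example.com/shoes/")
product = priority_for("http://www.example.com/shoes/red-trainers")
```

With the default step this gives the home page, a category page and a product page the same values as the list above.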
Be very careful when you're using special characters in your sitemap URLs. Certain characters must be entity-escaped: the ampersand (&), single quote ('), double quote ("), greater than (>) and less than (<). The big one you will come across if you use a lot of URL parameters is the ampersand. This character separates URL parameters and should be written as &amp; in the sitemap.
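If you build the sitemap by concatenating strings, escaping can be handled by a small helper. This Python sketch uses the standard library's `xml.sax.saxutils.escape`; the `escape_loc` wrapper and the example URLs are assumptions for illustration.

```python
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; quotes are added explicitly
EXTRA_ENTITIES = {"'": "&apos;", '"': "&quot;"}

def escape_loc(url):
    """Entity-escape a URL so it is safe to place inside a <loc> tag."""
    return escape(url, EXTRA_ENTITIES)

escaped = escape_loc("http://www.example.com/view?page=1&size=10")
```

Only escape raw URLs. Running an already escaped URL through the helper again would turn `&amp;` into `&amp;amp;`. An XML library that builds the document for you will normally do this escaping itself.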
Counting your sitemap pages
If you analyse your sitemap and look at the page count, you can compare this count against what search engines have indexed. Google Webmaster Tools can also give you a count of sitemap pages vs indexed pages. If you see that a search engine has far more or far fewer pages indexed than the count of pages in your sitemap, you should try to determine exactly what and where these pages are. If you run Google Chrome, this handy search operator can be typed into the URL bar to return the pages indexed on Google for a given site.
site:ddmseo.com // shows all pages indexed on google for ddmseo.com
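To get the sitemap side of that comparison, you can count the `<loc>` entries in the file itself. The Python sketch below parses a small sample held in memory. The sample data and the `sitemap_locs` function are made up for illustration, and in practice you would read the bytes of your real sitemap file.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_locs(xml_bytes):
    """Return every <loc> value listed under <url> in a sitemap document."""
    root = ET.fromstring(xml_bytes)
    return [(loc.text or "").strip() for loc in root.findall("sm:url/sm:loc", NS)]

# Sample sitemap with one accidental duplicate entry (hypothetical data)
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/about</loc></url>
  <url><loc>http://www.example.com/about</loc></url>
</urlset>"""

locs = sitemap_locs(SAMPLE)
total_entries = len(locs)
unique_pages = len(set(locs))
```

A gap between `total_entries` and `unique_pages` points to duplicate entries, which is worth fixing before comparing against the indexed count.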