Generate a sitemap for your site
I have been ignoring the sitemap for a long time. It is nice and easy for WordPress-powered sites since there are plugins for that, but for other sites I have to write a script to do it. Finally I decided to add it.
Google has a nice sitemap generation tool here. It is written in Python, but don’t be scared away if you don’t know Python. I don’t, and it still took me only about 2 hours of work to set it up (including the other scripts that I had to write). Here is what I did:
1. Wrote a PHP script to generate multiple URL files. They are basically text files which have URLs in them. It is one of the ways to feed the Python script to generate the file sitemap.xml. There are other ways too, but this one is the most suitable choice for my situation. The “feed”, among others, can be configured through a config.xml file and the Python script will pick it up. So there is no need to mess with the Python code at all, at least at this point.
2. First I ran my own URL generation script, and then the Py fella. The script can be run from the command line, and at the end it will automagically ping Google to notify it of the updated sitemap.xml. Oh, did I mention you have to have Python installed on your server? I really don’t see why it wouldn’t be there on any of the Unix cousins, but it doesn’t hurt to verify.
3. I manually submitted the sitemap.xml to Yahoo and Live search. If you haven’t signed up your site there, you’ll need to log in (assuming you have an account) and add your site.
4. After some staring at my Unix console I started thinking, “since the Python script can notify Google, why not the other two?” So I dug a little bit into the Python script. Although I didn’t know the language, it didn’t take me long to find “NOTIFICATION_SITES”. It is an array that holds the list of sites to notify. This is what you can change it to in order to make it ping Yahoo and MSN/Live as well (while they are still two companies).
NOTIFICATION_SITES = [
('http', 'www.google.com', 'webmasters/sitemaps/ping', {}, '', 'sitemap'),
('http', 'webmaster.live.com', 'ping.aspx', {}, '', 'siteMap'),
('http', 'search.yahooapis.com', 'SiteExplorerService/V1/ping', {}, '', 'sitemap')
]
5. Combine these command lines into a shell script and stick it into cron.
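For the curious, here is a rough sketch of how an entry in NOTIFICATION_SITES from step 4 could be turned into a ping URL. This is my own illustration in modern Python, not the code from Google's script. It assumes the six fields are scheme, host, path, extra query parameters, fragment, and the name of the query parameter that carries the sitemap address. The function name build_ping_url and the example.com address are made up for the example.

```python
from urllib.parse import urlencode, urlunsplit

# Same entries as in step 4. Assumed field order:
# (scheme, host, path, extra query parameters, fragment, sitemap parameter name)
NOTIFICATION_SITES = [
    ('http', 'www.google.com', 'webmasters/sitemaps/ping', {}, '', 'sitemap'),
    ('http', 'webmaster.live.com', 'ping.aspx', {}, '', 'siteMap'),
    ('http', 'search.yahooapis.com', 'SiteExplorerService/V1/ping', {}, '', 'sitemap'),
]


def build_ping_url(site, sitemap_url):
    """Build the URL to request in order to announce sitemap_url to one site.

    Hypothetical helper for illustration only.
    """
    scheme, host, path, extra_params, fragment, sitemap_param = site
    # Copy the extra parameters so the shared tuple is never modified.
    params = dict(extra_params)
    params[sitemap_param] = sitemap_url
    # urlencode percent-escapes the sitemap address so it is safe in a query string.
    return urlunsplit((scheme, host, path, urlencode(params), fragment))


if __name__ == '__main__':
    # Placeholder address; substitute the real location of your sitemap.xml.
    for site in NOTIFICATION_SITES:
        print(build_ping_url(site, 'http://www.example.com/sitemap.xml'))
```

The last field is why Live gets 'siteMap' with a capital M while the other two get 'sitemap': each service expects its own parameter name, and the rest of the URL is built the same way.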
So there you have it. Now I am just waiting to see the flood of traffic that will be brought in by using a sitemap.