Mar 06

We recently had a customer whose website had some old files with .asp file names. The files didn't actually contain any ASP code, but they all had the .asp extension. Because he also had some Flash content, we decided we didn't want to rename all his files to .html.

Within cPanel you can add new MIME types. This will allow Apache to serve the .asp files as normal text/html.

Within your control panel, go down to the section called “Advanced”. Within that section there is an icon called “MIME Types”; click the icon.
At this point there are two boxes to fill out.
The boxes should look like this:
MIME type: text/html
Extensions: asp .asp

Once this is done, your .asp files should be served as normal HTML.
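
If you want to confirm the change took effect, you can look at the Content-Type header the server returns for one of the .asp pages. Here is a minimal sketch in Python 3 (not part of the original setup, just one way to check). The function names content_type and served_as_html are my own, and the page URL in the note below is hypothetical.

```python
import urllib.request


def content_type(url):
    """Return the Content-Type header the server sends for url (HEAD request)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers.get("Content-Type", "")


def served_as_html(header_value):
    """True if a Content-Type value such as 'text/html; charset=UTF-8' is HTML."""
    # Drop any parameters after ";" and compare the bare media type.
    media_type = header_value.split(";")[0].strip().lower()
    return media_type == "text/html"
```

For example, served_as_html(content_type("http://www.mysite.com/index.asp")) should return True once the MIME type is in place, where index.asp stands in for any of your .asp pages.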

Feb 27

For Google to index the pages of your website, the Google crawler first needs to know how to find them. The best way to tell Google how to find your pages is to submit a sitemap. You will need shell access to your server and Python 2.2 or greater installed for this script to work.

Start by downloading sitemap_gen-1.5.tar.gz from http://code.google.com/p/sitemap-generators/downloads/list. Next, unzip and untar the file, then cd to the directory created by the untar command. The directory name should be something like sitemap_gen_1.53. Once in the directory you will need to create the file mysite_config.xml. Google gives you a sample file in this directory. Here is the config file I made:

<?xml version="1.0" encoding="UTF-8"?>
 <site
  base_url="http://www.mysite.com/"
  store_into="/home/me/public_html/sitemap.xml"
  verbose="1"
  sitemap_type="web"
 >
   <directory  
   path="/home/me/public_html"
   url="http://www.mysite.com" 
   default_file="index.html"/>

<filter action="drop" type="wildcard" pattern="*/TEST/*" />  
<filter action="drop" type="wildcard" pattern="*/backup/*" />  
<filter action="drop" type="wildcard" pattern="*/.*" />  
<filter action="drop" type="wildcard" pattern="*/*.tar" />  
<filter action="drop" type="wildcard" pattern="*/blank/*" />  
 </site>
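
To get a feel for what the filter lines above will drop, here is a rough sketch in Python 3. It assumes the wildcard patterns behave like case-sensitive shell-style globs matched against the full URL, which is what Python's fnmatch module does; I have not confirmed that sitemap_gen.py matches in exactly this way, and the URLs are made up for illustration.

```python
from fnmatch import fnmatchcase

# Drop patterns copied from the config file above.
DROP_PATTERNS = ["*/TEST/*", "*/backup/*", "*/.*", "*/*.tar", "*/blank/*"]


def is_dropped(url, patterns=DROP_PATTERNS):
    """Return True if any wildcard pattern matches the whole URL."""
    return any(fnmatchcase(url, pattern) for pattern in patterns)


# Hypothetical URLs, not taken from a real site.
for url in [
    "http://www.mysite.com/index.html",
    "http://www.mysite.com/backup/old.html",
    "http://www.mysite.com/.htaccess",
    "http://www.mysite.com/files/site.tar",
]:
    print("drop" if is_dropped(url) else "keep", url)
```

Under these assumptions only index.html is kept. Note that the match is case-sensitive, so “*/TEST/*” would not catch a directory named test.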

Notice I used “filter action=drop” to skip files I do not want to submit to Google. The patterns here are wildcards; you can use regular expressions instead by setting type="regexp". Now let's run the script to make the sitemap.
python sitemap_gen.py --config=mysite_config.xml --testing
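
Once the test run has finished, one quick way to review the result is to print every URL in the generated sitemap. This is a minimal Python 3 sketch; the function name sitemap_urls is my own, and it ignores the XML namespace so it should not matter which sitemap schema version the generator wrote.

```python
import xml.etree.ElementTree as ET


def sitemap_urls(path):
    """Return the text of every <loc> element in a sitemap file."""
    urls = []
    for element in ET.parse(path).iter():
        # Strip any "{namespace}" prefix so the schema version does not matter.
        tag = element.tag.rsplit("}", 1)[-1]
        if tag == "loc" and element.text:
            urls.append(element.text.strip())
    return urls
```

Calling print("\n".join(sitemap_urls("/home/me/public_html/sitemap.xml"))) gives one URL per line, which is easier to scan than the raw XML.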
Now have a look at sitemap.xml. It should be located at /home/me/public_html/sitemap.xml, as we specified in the config file. Check that the sitemap lists all the pages you would like to submit to Google. If you need to make changes to your config file, rerun sitemap_gen.py and review sitemap.xml again until everything is correct. Notice we are running sitemap_gen.py with --testing. Always use --testing until you are ready to submit your sitemap to Google. Then run
python sitemap_gen.py --config=mysite_config.xml
This will submit your sitemap to Google.
You can also resubmit your sitemap using an HTTP request to Google. Here is my HTTP request to resubmit my sitemap to Google.
www.google.com/webmasters/tools/ping?sitemap=http://www.mysite.com/sitemap.xml
Before we submit the request we must URL-encode the sitemap address that follows “sitemap=”. So my HTTP request now looks like
www.google.com/webmasters/tools/ping?sitemap=http%3A%2F%2Fwww.mysite.com%2Fsitemap.xml
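If you would rather not encode the address by hand, Python's urllib.parse.quote can do it. A small Python 3 sketch follows; the ping_url helper and the PING_ENDPOINT name are my own.

```python
from urllib.parse import quote

PING_ENDPOINT = "http://www.google.com/webmasters/tools/ping"


def ping_url(sitemap_url):
    """Build the ping request with the sitemap URL percent-encoded."""
    # safe="" also encodes "/", which quote() would otherwise leave alone.
    return PING_ENDPOINT + "?sitemap=" + quote(sitemap_url, safe="")


print(ping_url("http://www.mysite.com/sitemap.xml"))
```

The printed line should match the encoded request shown above, with http:// in front.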
Now issue the HTTP request with curl or wget. Quote the URL so the shell does not try to interpret the “?”.
wget 'http://www.google.com/webmasters/tools/ping?sitemap=http%3A%2F%2Fwww.mysite.com%2Fsitemap.xml'
Lastly, add your sitemap to your robots.txt file.
Sitemap: http://www.mysite.com/sitemap.xml
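To double check that robots.txt really declares the sitemap, Python's urllib.robotparser can read the file back. This is a sketch assuming Python 3.8 or newer (where site_maps() is available); declared_sitemaps is a name I made up, and the robots.txt text you pass in is whatever your own file contains.

```python
from urllib.robotparser import RobotFileParser


def declared_sitemaps(robots_txt):
    """Return the sitemap URLs listed in the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []
```

Passing it the contents of your robots.txt should give back a list containing your sitemap address; an empty list means the line is missing or misspelled.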
You have now told Google how to find pages on your site that Google might not have normally found.