Your site on the Web – How sitemaps affect SEO
Robots.txt and Sitemap.xml
After analyzing the domain name, general design, and URL format, I look at a potential client’s robots.txt and sitemap. This is helpful because it starts to give you an idea of how much (or little) the developers of the site cared about SEO. A robots.txt file is a very basic step webmasters can take to work with search engines. The text file, which should be located in the root directory of the website (http://www.example.com/robots.txt), is based on an informal protocol for telling search engines which directories and files they are allowed to access and which they are not. The inclusion of this file gives you a rough hint of whether or not the developers of the given site made SEO a priority.
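To make the allow/disallow idea concrete, here is a minimal sketch using Python's standard-library urllib.robotparser. The robots.txt contents, the /admin/ and /tmp/ paths, and the is_allowed helper are my own assumptions for illustration, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the blocked paths are made up for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /tmp/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/admin/login.html",
        "http://www.example.com/mammal/dogs/english-springer-spaniel.html",
    ):
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, "ExampleBot", url))
```

With this sample file, the /admin/ URL is reported as disallowed and the spaniel page as allowed. Keep in mind that robots.txt is advisory: well-behaved crawlers honor it, but nothing forces them to.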
Bit.ly is a very popular URL shortening service. Due to its connections
with Twitter.com, it is quickly becoming one of the most linked websites on
the Web. One reason for this is its flexibility. It has a feature where users
can pick their own URL. For example, when linking to my website I might
choose http://bit.ly/TimeStopping. Unfortunately, Bit.ly forgot to block
certain URLs, and someone was able to create a shortened URL for
http://bit.ly/robots.txt. This opened up the possibility for that person to
control how robots were allowed to crawl Bit.ly. Oops! This is a great
example of why knowing even the basics of SEO is essential for web-based
business owners.
After taking a quick glance at the robots.txt file, SEO professionals tend to look at the default location for a sitemap (http://www.example.com/sitemap.xml). When I do this, I don’t spend a lot of time analyzing it (that comes later, if the owners of that website become clients); instead, I skim through it to see if I can glean any information about the setup of the site.
Often, it will quickly show me if the website has information hierarchy issues. Specifically, I am looking at how the URLs relate to each other, and the sitemap gives a quick idea of how the website forms its URLs. A good example of information hierarchy would be http://www.example.com/mammal/dogs/english-springer-spaniel.html, whereas a bad example would be http://www.example.com/node?type=6&kind=7. URLs like the second one are a sign that a website has information hierarchy issues, because search engines can’t extract any semantic value from them.
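The skim described above can be roughed out in a few lines of Python. This is only a sketch: the sample sitemap is hypothetical, and looks_hierarchical is a crude heuristic of my own (no query string and at least two path segments), not a rule any search engine publishes.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# XML namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap holding the two example URLs from the text.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/mammal/dogs/english-springer-spaniel.html</loc></url>
  <url><loc>http://www.example.com/node?type=6&amp;kind=7</loc></url>
</urlset>
"""


def sitemap_urls(xml_text: str) -> list[str]:
    """Extract every <loc> value from a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def looks_hierarchical(url: str) -> bool:
    """Rough heuristic: descriptive nested path and no query string."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    return not parts.query and len(segments) >= 2


if __name__ == "__main__":
    for url in sitemap_urls(SAMPLE_SITEMAP):
        label = "ok" if looks_hierarchical(url) else "check"
        print(f"{label:5} {url}")
```

On the sample, the spaniel URL is labeled "ok" and the node?type=6&kind=7 URL is labeled "check". A real audit would still need a human to judge whether the path words actually describe the content.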
When viewing a website from the 100-foot level, be sure to take the following actions:
Decide if the domain name is appropriate for the given site based on the criteria outlined above.
Based on your initial reaction, decide if the graphical design of the website is appropriate.
Check for the common canonicalization errors.
Check to see if a robots.txt exists and get an idea of how important SEO was to the website developers.
If inclined, check to see if a sitemap.xml file exists, and if it does, skim through it to get an idea of how the search engines might see the hierarchy of the website.
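The last two checks in this list can be partly automated. The sketch below only derives the default locations from any page URL and asks a caller-supplied function whether each one exists. The names default_seo_file_urls and audit, and the exists callback, are hypothetical; how you actually fetch the URLs (and how politely) is left to you.

```python
from typing import Callable
from urllib.parse import urljoin


def default_seo_file_urls(page_url: str) -> dict[str, str]:
    """Map each file name to its default location at the site root."""
    # A leading slash makes urljoin resolve against the root of the host.
    return {
        "robots.txt": urljoin(page_url, "/robots.txt"),
        "sitemap.xml": urljoin(page_url, "/sitemap.xml"),
    }


def audit(page_url: str, exists: Callable[[str], bool]) -> dict[str, bool]:
    """Report which default files exist, using a caller-supplied check."""
    return {
        name: exists(url)
        for name, url in default_seo_file_urls(page_url).items()
    }


if __name__ == "__main__":
    # Stand-in for a real HTTP check: pretend only robots.txt is present.
    def fake_exists(url: str) -> bool:
        return url.endswith("/robots.txt")

    page = "http://www.example.com/mammal/dogs/english-springer-spaniel.html"
    print(default_seo_file_urls(page))
    print(audit(page, fake_exists))
```

Passing the existence check in as a function keeps the sketch free of network calls, so you can try it with a stand-in first. Note that a sitemap does not have to live at /sitemap.xml; a site can also announce its location with a Sitemap: line in robots.txt.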
If all else fails, keep making awesome, relevant posts and keep building those links into your sitemap! Penguin is a different beast altogether from Panda…