
A few months ago we announced two new features for MSNBot to reduce the burden of crawling on your website. These were part of a series of improvements we’re making to our crawler during the Spring to increase the freshness and breadth of content in our index. As part of these latest improvements, you may notice an increase in the amount of traffic from MSNBot over the next couple of weeks. If you notice any issues with MSNBot, please drop us a note on our Crawling Feedback & Discussion Forum so we can investigate.

This is a great time to take a look at your robots.txt file (and meta tags) to make sure that you are not inadvertently blocking robots from content on your site you may want indexed. Also, if you feel that MSNBot is crawling your site too frequently, you can use the crawl delay directive in robots.txt. Please refer to the MSNBot support page for more information. Here are a few recommended settings:

Slow (wait 5 seconds between each request)

Crawl-delay: 5

Really Slow (wait 10 seconds between each request)

Crawl-delay: 10

Note that setting the crawl delay reduces the load on your servers, but it also increases the amount of time it will take MSNBot to index your website (proportional to the length of the delay), and may make it more difficult for your customers to find your site on Live Search.
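
To make the directive concrete, here is a minimal sketch in Python that reads a sample robots.txt with the standard library's urllib.robotparser and estimates how long a crawl would take at a given delay. The robots.txt contents, the www.example.com URL and the 10,000-page figure are assumptions made up for illustration, and the estimate simply multiplies pages by delay; it does not describe how MSNBot actually schedules requests.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The Crawl-delay line has to sit inside a
# User-agent group for this parser to pick it up.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 5
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def estimated_crawl_hours(page_count, delay_seconds):
    """Rough lower bound: one request per page, spaced by the delay."""
    return page_count * delay_seconds / 3600


rules = load_rules(ROBOTS_TXT)
delay = rules.crawl_delay("msnbot")
# Also confirm the file is not inadvertently blocking the crawler.
allowed = rules.can_fetch("msnbot", "http://www.example.com/products/")

print("delay:", delay, "allowed:", allowed)
print("hours for 10,000 pages:", round(estimated_crawl_hours(10_000, delay), 1))
```

Doubling the delay doubles the estimate, which is the proportional trade-off described above.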

Another great way to reduce the impact of MSNBot on your website is to enable HTTP Conditional GET and HTTP Compression as outlined in our prior blog post.

-- Nathan Buggia, Live Search Webmaster Center


Webmasters, you may already have heard, but our friends on the adCenter team launched a new community portal today in an effort to assist and enable advertisers. We want to encourage you, whether you are currently using adCenter or not, to visit.  Features of the site include:

  • Product/Service specific blogs
  • Categorized User Forums
  • Multimedia Distribution including video interviews, audio podcasts and training videos
  • User profiles

The community team at adCenter wants the community to be a place for two-way communication between Microsoft adCenter and the advertiser community. If you are an adCenter Advertiser, using adCenter Analytics or developing through the adCenter API, www.adCenterCommunity.com will be the one place to visit for all adCenter updates, news, tips, tricks and best practices.

In the coming weeks and months, you'll get information, updates and assistance from Microsoft employees, but also from each other as some of the most knowledgeable users of adCenter's offerings and services are the customers!  Be sure to check out the user forums as they grow into a robust resource on everything adCenter and more. Enjoy!

-- Jeremiah Andrick, Live Search Webmaster Team

Today we're pleased to announce an update to the Sitemaps Protocol, in collaboration with Google and Yahoo! This update should help many new sites adopt the protocol by increasing our flexibility on where sitemaps are hosted.

Essentially, the change allows a webmaster to store their sitemap files just about anywhere, using a reference in the Robots.txt file to establish a trusted relationship between the sitemap file and the domain or folder.

Here's how it works: Say you run a web site like MSN.com, which has a bunch of subdomains like health.msn.com, travel.msn.com and moneycentral.msn.com. And, due to a technical requirement, you would like to host all of your sitemaps in one location like sitemaps.msn.com. Until now the protocol did not support this scenario: each sitemap would have needed to be hosted directly under the domain it described. This update introduces support for this scenario, with the requirement that you include a reference to the sitemap in your Robots.txt file. For example, moneycentral.msn.com/robots.txt would need to include this line:

Sitemap: http://sitemaps.msn.com/index_moneycentral.msn.com.xml

The catch is that all the URLs in the sitemap file need to be within the same domain as the robots.txt file (i.e. moneycentral.msn.com/* in this example). Note that this applies equally to sitemap index files and to compressed files.
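
The same-domain rule can be checked before you publish a sitemap. The following is a rough sketch in Python, not our crawler's validation logic: the function names, the sample robots.txt and the sample sitemap entries are hypothetical, and it compares host names exactly, which is one reasonable reading of "within the same domain".

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace used by sitemap files under the Sitemaps Protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_references(robots_txt):
    """Return the URLs named on 'Sitemap:' lines of a robots.txt body."""
    refs = []
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            refs.append(value.strip())
    return refs


def urls_outside_host(sitemap_xml, robots_host):
    """List <loc> URLs whose host differs from the host serving robots.txt."""
    root = ET.fromstring(sitemap_xml)
    outside = []
    for loc in root.iter(SITEMAP_NS + "loc"):
        url = (loc.text or "").strip()
        if urlsplit(url).hostname != robots_host.lower():
            outside.append(url)
    return outside


# Hypothetical robots.txt served from moneycentral.msn.com.
ROBOTS_TXT = """\
User-agent: *
Disallow:
Sitemap: http://sitemaps.msn.com/index_moneycentral.msn.com.xml
"""

# Hypothetical sitemap content; the second entry breaks the rule.
SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://moneycentral.msn.com/home.asp</loc></url>
  <url><loc>http://health.msn.com/</loc></url>
</urlset>
"""

print(sitemap_references(ROBOTS_TXT))
print(urls_outside_host(SITEMAP_XML, "moneycentral.msn.com"))
```

Any URL reported by urls_outside_host belongs in the sitemap referenced from that other host's own robots.txt instead.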

Here are a few other useful notes about our implementation:

  • We support multiple "Sitemap:" references in your robots.txt files
  • We recommend you limit the size of your robots.txt file to less than 1 MB
  • If multiple sitemaps for a domain include the same URL with conflicting metadata (e.g. priority or change frequency), we will disregard the metadata and just look at the URL.
  • Individual sitemap files should never be larger than 10 MB when uncompressed. This includes all sitemap file formats: XML, RSS and Text.
  • You can upload your sitemap in our Webmaster Tools
  • You can ping us with updates to your sitemap using our Ping URL:
    http://webmaster.live.com/ping.aspx?siteMap=[Your sitemap URL]
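
As a small illustration of the last two notes, this Python sketch builds the ping URL for a sitemap and checks a file against the 10 MB limit. The helper names are hypothetical, percent-encoding the sitemap URL is an assumption (it is the usual practice for query-string values), and treating 10 MB as 10 × 1024 × 1024 bytes is also an assumption. Nothing here sends a request.

```python
from urllib.parse import quote

PING_ENDPOINT = "http://webmaster.live.com/ping.aspx"

# Assumption: "10 MB" is interpreted as 10 * 1024 * 1024 bytes.
MAX_SITEMAP_BYTES = 10 * 1024 * 1024


def build_ping_url(sitemap_url):
    """Return the ping URL with the sitemap URL percent-encoded."""
    return PING_ENDPOINT + "?siteMap=" + quote(sitemap_url, safe="")


def within_size_limit(uncompressed_sitemap):
    """True if the uncompressed sitemap bytes fit under the limit."""
    return len(uncompressed_sitemap) <= MAX_SITEMAP_BYTES


ping_url = build_ping_url("http://sitemaps.msn.com/index_moneycentral.msn.com.xml")
print(ping_url)
print(within_size_limit(b"<urlset></urlset>"))
```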

This change comes directly from feedback we received from webmasters. Thank you for helping us improve our product! If you have any additional feedback or questions, please check out our Sitemap Discussion forum.

-- Fabrice Canel, Program Manager, Live Search Crawler


In just hours, we will be heading to sunny Santa Clara, California. If you're going to be attending SMX West, be sure to come see us at one of the Live Search or adCenter sessions, booths or at the Search Bowl. Nathan and I, along with lots of other folks from Redmond, will be making the rounds and enjoying a little bit of sunshine (hopefully!).

SMX West Keynote Panel

  • Brad Goldberg, Keynote
    Generation Next: Search In The Coming Decade

Search Session Speakers & Panelists

  • Raju Malhotra (Panel Speaker)
    Search 3.0: The Blended Search Revolution
  • Henry Hall (Q&A Speaker)
    Search 3.0: Video, Images & Blended Results
  • Kevin Hagwell (Q&A Speaker)
    Search 3.0: Local Search & Blended Results
  • Paul Dillon (Q&A Speaker)
    Search 3.0: Online Retail & Blended Results
  • Sean Lyndersay (Panel Speaker)
    Search 4.0: Will The Social Graph Change Search?
  • Nathan Buggia (Q&A Speaker)
    SEO 2.0 For Web 2.0 Sites
    Search Engineers Q&A
    Linking Q & A

adCenter Sessions

  • Mary Berk (Q&A Speaker) 
    Decrypting Quality Scores
  • Natala Menezes (Panel Speaker)
    Search Ads & Behavioral Targeting
  • Christopher Plambeck (Panel Speaker)
    Paid Search Roundtable

Hope to see you there! 

-- Jeremiah Andrick, Live Search Webmaster Team

Today we're pleased to announce several improvements in the crawler for Live Search that should significantly improve the efficiency with which we crawl and index your web sites. We are always looking for ways to help webmasters, and we hope these features take us a few more steps in the right direction.

  • HTTP Compression: HTTP compression allows faster transmission time by compressing static files and application responses, reducing network load between your servers and our crawler. We support the most common compression methods: gzip and deflate as defined by RFC 2616 (see sections 14.11 and 14.39). Compression is currently supported by all major browsers and search engines. Use this online tool to check your server for HTTP compression support.

    The following links provide configuration information for IIS, and Apache.

  • Conditional Get: We support conditional get as defined by RFC 2616 (Section 14.25); generally, we will not download the page unless it has changed since the last time we crawled it. As per the standard, our crawler will include the "If-Modified-Since" header with the time of the last download in the GET request and, when available, the "If-None-Match" header with the ETag value. If the content hasn't changed, the web server will respond with a 304 HTTP response.

    To check if your site already supports the "If-Modified-Since" HTTP header, you can use this online tool to check your server for HTTP Conditional Get support. Alternatively, you can check using Fiddler for Internet Explorer, or Live Headers for Firefox. Each of these tools allows you to create a custom GET request and send it to your server. You'll want to make sure that your request includes the "If-Modified-Since" header like the following simplified sample:

    GET /sa/3_12_0_163076/webmaster/webmaster_layout.css HTTP/1.1
    Host: webmaster.live.com
    If-Modified-Since: Tue, 22 Jan 2008 01:28:49 GMT

    You should receive a server response similar to the following simplified sample:

    HTTP/1.x 304 Not Modified

    Check out MSDN for more information on using Fiddler for performance tuning.

    If you have not yet configured conditional get on your site, we strongly encourage you to do so. It can significantly reduce server load, because most browsers and crawlers already support this feature, and it can be configured on common web servers (e.g. IIS, Apache).
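
For readers who generate pages from application code rather than static files, the two features above can be sketched in a few lines of Python. This is an illustrative sketch only, not how IIS or Apache implement them: the respond function and its lower-case header dictionary are hypothetical, last_modified is assumed to be a UTC datetime, and ETag / "If-None-Match" handling is left out to keep it short.

```python
import gzip
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(request_headers, body, last_modified):
    """Return (status, headers, payload) for a GET request.

    request_headers: dict keyed by lower-case header name (an assumption
    of this sketch). body: bytes. last_modified: datetime in UTC.
    """
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    since = request_headers.get("if-modified-since")
    if since:
        try:
            # HTTP dates have one-second resolution, so drop microseconds.
            if last_modified.replace(microsecond=0) <= parsedate_to_datetime(since):
                return 304, headers, b""
        except (TypeError, ValueError):
            pass  # Unusable date: fall through and send the full response.

    if "gzip" in request_headers.get("accept-encoding", "").lower():
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return 200, headers, body


stylesheet = b"body { margin: 0; }\n" * 200
modified = datetime(2008, 1, 22, 1, 28, 49, tzinfo=timezone.utc)

# A repeat visit that sends the date of the previous download.
status, _, payload = respond(
    {"if-modified-since": "Tue, 22 Jan 2008 01:28:49 GMT"}, stylesheet, modified
)
print(status, len(payload))

# A first visit from a client that accepts gzip.
status, headers, payload = respond({"accept-encoding": "gzip, deflate"}, stylesheet, modified)
print(status, headers["Content-Encoding"], len(stylesheet), "->", len(payload))
```

The first call skips the body entirely, and the second sends it compressed, which is where the savings in server load and bandwidth come from.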

In addition to these two features there are many more improvements in performance that should help further optimize our crawling. As a result, we've also upgraded our user agent to reflect the changes; it is now "msnbot/1.1". If you think you are experiencing any issues with MSNBot, or have any questions about the updates, please use our Crawler Feedback & Discussion forum.

-- Fabrice Canel, Live Search Crawling Team

That's a good question, and you've come to the right place. If your site is not performing in Live Search or you are not being indexed by Live Search, here are some steps you can take to change course and improve your rank. First and foremost, ranking at Live Search is free, and you can't pay to boost your website’s relevance ranking. We have a completely automated ranking process which takes into account a lot of different factors. These factors include web page content, the number and quality of websites that link to your pages, and the relevance of your website’s content to query terms. So let’s work through the possible issues and what you can do…

1. Have you built great content?

I’m sure you hear this all the time, but this is always the first item you should consider when thinking about SEO, because it is the primary influencer of all the other factors. Quality content has a long shelf life, and will accrue many quality backlinks over its lifetime.

Top factors in creating great content

  • Make it unique – make sure your content gets noticed by ensuring that it isn’t just one of a million similar articles. Find a way to make it different by focusing on a different subject, taking a different perspective, or making it entertaining.

  • Use the customer’s language – many times customers will use different words and phrases than you to describe your product or what they are looking for. For example, for a long time Microsoft’s official web site for Visual Basic did not rank well for the term VB, because our internal branding guidelines required that we always refer to the product by its full name. So customers were searching for “VB”, and none of our pages used that term.

  • Know what keywords to use – make sure that you use keywords that are important to your company or site within your pages. For example, if your business is located in a specific town, make sure you include the name of that town in your site, along with common words that describe what you do. Be careful not to get carried away using too many keywords.

  • Write good HTML – as you write content, make sure that you are using HTML tags appropriately, so that search engines can more easily understand your content. For example, make sure your important keywords show up in title tags, header tags and anchor text. And put descriptive text in the alt attributes on your images.
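
A quick way to review the markup points above is to pull out the title, the headings and any images without alt text and read them as a search engine would. Here is a minimal sketch using Python's standard html.parser; the MarkupAudit class and the sample page are hypothetical, and it only looks at h1 to h3 headings.

```python
from html.parser import HTMLParser


class MarkupAudit(HTMLParser):
    """Collect the title, h1-h3 headings, and images lacking alt text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []
        self.images_missing_alt = []
        self._capture = None  # tag whose text is currently being collected

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title" or tag in ("h1", "h2", "h3"):
            self._capture = tag
            if tag != "title":
                self.headings.append("")
        elif tag == "img" and not (attributes.get("alt") or "").strip():
            self.images_missing_alt.append(attributes.get("src", ""))

    def handle_endtag(self, tag):
        if tag == self._capture:
            self._capture = None

    def handle_data(self, data):
        if self._capture == "title":
            self.title += data
        elif self._capture:
            self.headings[-1] += data


# Hypothetical page: note the second image has no alt text.
PAGE = """<html><head><title>Visual Basic (VB) Developer Center</title></head>
<body><h1>Learn VB</h1>
<img src="logo.png" alt="Visual Basic logo">
<img src="banner.png">
<a href="/downloads">Download Visual Basic</a></body></html>"""

audit = MarkupAudit()
audit.feed(PAGE)
print("title:", audit.title)
print("headings:", audit.headings)
print("images missing alt:", audit.images_missing_alt)
```

If the words your customers search for do not appear in the title and headings this prints, that is a good place to start editing.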

Articles on building great content

Examples of great content

  • Digital Photo Review – they seem to know more about every digital camera than even the original manufacturers of those cameras. And they have all those hi-res photos to make the shutterbugs drool…
  • PinchMySalt.com – a site that is well known for tasty recipes and beautiful photographs.
  • BentoYum.com – Have a great product you want to sell? Why not create a blog showing potential customers all the different things they could do with it?
  • Amazon.com – they do a great job supplementing “the same old product descriptions” with some of the best user generated content on the web.

Creating a lot of good content is hard and takes a lot of time. But, you don’t have to do it all at once. The first thing I would recommend is to look around your office/ business and see if you don’t already have some great content lying around that you could put on the web. Do it. Then look for ways that you can incorporate building good content into your existing business routines.

2. Do you have 10 high quality sites linking to you?

Okay, so 10 isn’t really a magic number, but the more links you have from high quality, related websites, the better your site is going to be indexed and ranked by Live Search. Use our Webmaster Tools to see who’s linking to your site.

Ideas for generating high quality inbound links

  • Start a blog – and write content that will give people a reason to link to your website

  • Join a reputable industry association – they will often list their members and provide links to their websites. Examples could be a local rotary club, or a professional association like the American Medical Association.

  • Get involved with your community – participating with your community through blogs, forums and other online resources may give you legitimate reasons to provide links to your site.

  • Talk to a reporter – is there a potential story around your business? Or do you have helpful tips about your business people might be interested in? Pitch a story to a reporter or journalist, and they might give you a link.

  • Press Releases – if your company has a significant event, consider doing a press release through a site like http://prweb.com.

  • Suppliers and partners – ask your business partners if they would add a section to their website describing your partnership, with a link to your website. Or, if your suppliers have a website, perhaps they have a page where they recommend local distributors of their products.

  • Evangelize your site in the real world – with business cards, magnets, USB keys and other fun collectables

The process of building up these high quality links can take time, and we hear from many webmasters who have tried to speed things up by purchasing links, or participating in linking schemes. We recommend webmasters be very careful with these, as they can often end up hurting your ranking in the long run by providing you with only low quality links that are often associated with spammy sites. If a link isn’t adding significant value to a website’s user, then it is most likely a low quality link.

3. Could your website be too advanced for a robot to understand?

Okay, so you’ve got great content, and tons of backlinks but you’re still not getting the results you’re looking for? It is possible that Live Search is having technical problems crawling your website. Here are a few common issues you should investigate:

  • Heavy use of Flash, AJAX, Images or Silverlight – if you’re using any of these Rich Internet Application (RIA) technologies extensively on your website, then your site may be too advanced for a robot to understand. We recommend building the structure and content of your site in HTML, and then using these RIA technologies to spice up the user experience. That way you get cutting edge, web 2.0 user experiences, and search engines can still crawl your site and send you lots of traffic.

  • JavaScript navigation – any URL that is constructed using JavaScript will not be visible to a search engine, so those pages might not get indexed. (Note that ASP.NET does this extensively with its postback infrastructure. Use that feature sparingly.)

  • Robots.txt file – a surprisingly high percentage of Robots.txt files are misconfigured to inadvertently block Live Search or other search engines from crawling their sites. You can use our Robots.txt Validation tool inside our Webmaster Center to check your file and see if it is okay.

  • Frames – search engines can sometimes have a difficult time understanding and crawling frames in HTML. We recommend that you not use them.

A good way to test for this is to look at your website with Flash, Silverlight, JavaScript and Images turned off, for example by using the developer toolbar in Firefox to turn all of those options off before you surf your website. This is how robots view your website, and it is a good bet that if you find it difficult to navigate your website under these conditions, then so will each search engine’s robots.
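
You can approximate the same test in code by listing only the links that exist in the raw HTML. The sketch below, in Python, is a deliberately simple stand-in for a robot and not a description of MSNBot: the StaticLinkExtractor class, the www.example.com base URL and the sample page are all hypothetical. It never runs scripts, so links that only exist after JavaScript executes are not found.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class StaticLinkExtractor(HTMLParser):
    """Collect plain <a href> links; scripts are never executed."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # A javascript: href has no URL for a crawler to follow.
        if href and not href.lower().startswith("javascript:"):
            self.links.append(urljoin(self.base_url, href))


def crawlable_links(html, base_url):
    """Return absolute URLs of the links visible without JavaScript."""
    extractor = StaticLinkExtractor(base_url)
    extractor.feed(html)
    return extractor.links


# Hypothetical navigation: only the first entry is a plain HTML link.
PAGE = """<a href="/products/">Products</a>
<a href="javascript:__doPostBack('ctl00$nav','about')">About</a>
<span onclick="location.href='/contact/'">Contact</span>
<script>var url = "/" + "hidden" + "/"; window.location = url;</script>"""

print(crawlable_links(PAGE, "http://www.example.com/"))
```

Of the four destinations a visitor could reach on this sample page, only /products/ is discoverable here; the postback link, the onclick handler and the script-built URL are all invisible.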

4. Have you submitted a sitemap?

Sitemaps help us ensure that we’ve discovered all the pages on your website. This is especially important if you have a new website that might not have a lot of other sites linking to it yet. You can read more about sitemaps at http://sitemaps.org.  

5. Still no luck, now what?!?

The last question you’ll need to ask yourself is if you might have been using any "aggressive" marketing tactics (aka Spamming). The best way to check is to log into the Webmaster Tools, verify your site, and check your dashboard to see if we’re blocking any pages in your website. If so, after you have addressed the issue, you can use the form to request reinclusion into the Live Search index.

If you’ve been through all 5 steps, and everything looks good (except your results on Live Search) then you should contact us on the Webmaster Center Forums, or if it is a sensitive issue, use our private Feedback form. We’ll do our best to research and resolve your issue in a timely manner. But to set expectations, there are tens of millions of websites and only 3 of us at the moment – so we may not be able to reply to every request as quickly as you would like.

And to all you experienced SEO professionals out there, please leave your favorite tips and examples of good content in the comments below for our readers!

-- Jeremiah Andrick, Live Search Webmaster Team


Some of you may have noticed Google advertisements showing up in Live Search results today. We have identified and resolved the issue, and you should see these disappear over the next couple days.


The issue stems from the way Live Search handles content disallowed by the Robots.txt file. We regularly check the robots.txt file of a site to ensure that we don't index and cache pages excluded by the webmaster. However, if we do find a link elsewhere on the web pointing to a page excluded by the robots.txt file, we may include the link and the anchor text in our index if we think it might be valuable to our users. Yesterday we accidentally began including the links from the ads of Google AdSense customers. The issue has been fixed, and you should see the results disappear from our search results over the next couple of days.

We'd like to thank Search Engine Land and several customers for contacting us earlier today. Your feedback is much appreciated and helped us quickly identify and resolve this issue.

-- Nathan Buggia, Live Search Webmaster Center

Since the inception of the Live Search team a few years ago, we've been maniacally focused on one thing: relevance. We use the term relevance to mean that Live Search finds the answer to your question better than any other source - whether your question is best answered by a website, or a real-time traffic map. And with our recent Fall update our relevancy has improved dramatically - to the point where we think we've got the best product in some areas, and a highly competitive product in others. (Try it out, I dare you: images, maps, mobile, web, 1-800-225-5411).

One of the biggest challenges with relevancy is how to distinguish legitimate information from various forms of search spam. This is one area that we've made especially good progress in over the last 8 months through a suite of tools that helps us detect, evaluate and manage spam. One of these tools is an extension to MSNBot, giving us an additional way to detect cloaking. (It should be noted that not all cloaking is spam related and we do our best to take this into account; however, we still don't recommend cloaking in any situation).

The goal of the tool was simple; however, there have been some well-documented shortcomings in our implementation that have impacted the reporting metrics of some websites. We have been listening to the feedback over the past couple of months and continuing to optimize the tool to eliminate these issues:

  • AdSense/Overture reporting - Initially there was a bug in our crawler that caused it to download all content on your page, including ad blocks. We have since fixed this issue by blocking requests to Google and Overture to preserve the integrity of your reporting.

  • Distort site statistics with unfilterable bot traffic - Webmasters have also reported a high level of traffic coming from this bot, in some cases high enough to impact their logs in a statistically significant way. We have been continuing to optimize the crawler and most webmasters should notice the referrer traffic dropping to almost nothing over the next month.

  • Pollute HTTP logs with inappropriate terms - Another unfortunate issue is that we were using a common list of keywords for our testing that was not site specific. We have tuned this list and you should no longer see any keywords used that are not related to the content of your site.

  • Microsoft isn't responding to questions -  Webmasters who encountered these problems and reported them to Microsoft have not been able to get a satisfactory or timely response. We have created a forum specifically to answer your questions and comments. For sensitive issues, please use our feedback form to contact us privately.

Hopefully webmasters have also noticed these issues disappearing. If you are still experiencing any issues, please contact us before you block MSNBot, to see if we can address the issue.

We're inspired by the relevancy improvements we've made this Fall, and have much more in the works for Spring. Please keep us in mind as you do your searching, and let us know how you think we can make search better.

Thank you,

-- Nathan Buggia, Live Search Webmaster Team


If you're going to be attending either SES Chicago or Webmaster World Las Vegas, then stop by and check out one of the Live Search / adCenter sessions, booths or exciting social events. I will be at the freezing cold (but intellectually stimulating) SES Chicago, while Jeremiah (PM Webmaster Tools) and Martina (Dev Manager Webmaster Tools) will be basking in the warm glow of the Las Vegas slot machines (aka the one-armed bandits).

Search Engine Strategies

  • Brad Goldberg, Orion Panel: Universal, Blended and Vertical Search
  • Brian Boland, Are Paid Links Evil?
  • Ziya Genceren, Online Maps: Plotting the Direction of Local Search
  • James Colborn, Personalization, User Data & Search
  • Christopher Plambeck, Search Engines on Click Fraud
  • Booth: adCenter / Live Search Webmaster Center

Webmaster World 2007

 Hope to see you there!

-- Nathan Buggia, Live Search Webmaster Team

We are glad you found your way to this blog. We, the Live Search Webmaster Team, have created this blog to help keep you informed on progress as we develop tools to assist you as webmasters. We will also be sharing from time to time on how to keep your site performing well.

We want this blog to be a place for two-way communication between Live Search and the webmaster community because we understand that SEOs and webmasters need this kind of information and the tools we are building to keep their sites performing well.

We know we have a lot of work still to do and we would like to get your feedback on the tools as well as suggestions for future tools.

If you are passionate about your site or you just want to find out what progress we are making on tools and statistics sign up for our RSS feed. So welcome and we look forward to hearing from you soon!

And don’t forget to sign up for the Live Search Webmaster Center today!

Jeremiah Andrick, Program Manager – Live Webmaster Team

If you’ll be attending the inaugural Search Marketing Expo in London November 15-16, stop by and meet the Live Search Webmaster Team. Nathan and Martina will be at the Live Search booth in the exhibit hall to meet you, answer your questions and help you sign up to the webmaster center.

Also Nathan will be speaking at a panel Friday on Dealing with the Penalty Box.  Along with Nathan, the panel also includes a number of other really great industry influencers.  The panel will focus on what to do when you’ve had a site hit the search engine penalty box. Nathan will be there to talk through how to avoid the penalty box, along with how to be re-included.

The session should have time for lots of questions and we hope you will join the discussion on penalties and how communication with us might be improved. We hope you can make it to that event, but for those that can’t we will post some of the discussion and feedback following the event.

If you will be at SMX London, we hope you’ll make a point to stop by.

Jeremiah Andrick, Program Manager – Live Webmaster Team


Today we’re happy to take the “private” label off our Webmaster Tools beta, and open it up to all webmasters and SEO professionals. This is a first step by Live Search to make it easier to ensure your site is getting the best crawling, indexing, and representation in Live Search results possible. Our goal is that it will lead to better traffic for your site, and better search results for our customers.

In addition to the tools, we’re also launching:

  • Webmaster Blog – to let you know what’s going on and provide you with the technical information you’re looking for from Live.
  • Live Webmaster Center – a convenient portal, consolidating all of Microsoft’s information and services for webmasters in one easy-to-remember URL: http://webmaster.live.com

So, what should you do with all of this news? Easy:

  1. Make sure there aren’t any problems with MSNBot and your site (if there are, let us know) – sign into the tools, verify your site and ensure that Live Search is indexing your pages, that we haven’t identified any of your content as SPAM, and that there are not any issues with your Robots.txt file. If any pages have been blocked for being spammy, fix the issue, then follow the instructions to complete a re-inclusion request.
  2. Let us know about your sitemap to get the best crawl – enter the URL of your sitemap file, or sitemap index file, into the tool so you know we have access to all of your site’s content. This will help us find all that really deep content on your site that might not be well linked.
  3. Add our blog to your RSS reader – we have lots more features in the works, so add our blog to your RSS reader so you know what’s been shipped.

Nathan Buggia, Lead Program Manager – Live Webmaster Team

 