Configuring Heartbeat parameters under Virtual Server

Virtual Server provides a service that notifies the user when a virtual machine is not responding.  This is called the virtual machine heartbeat.  There are two situations in which a virtual machine might not send its heartbeat.  One is that the virtual machine has crashed and no programs are running any longer.  The other is that another program on the virtual machine is using all of the CPU resources and not leaving enough CPU time for our code to send a heartbeat message.

Because of these two scenarios, we are cautious about telling the user that the virtual machine has stopped sending heartbeats.

By default we will send one heartbeat every 6 seconds.  If we miss a heartbeat, we slow down and send one heartbeat every 10 seconds.  We will then only declare the virtual machine 'dead' if we have not received a heartbeat in 120 seconds (12 missed heartbeats at 10 seconds each).  Depending on your virtual machine configuration, you may wish to change these parameters to make us more or less sensitive to the state of the virtual machine heartbeat.  You can do this by editing the virtual machine's .VMC file and finding the following section:

<integration>
   <microsoft>
      ...
      <heartbeat>
          <failure_attempts type="integer">12</failure_attempts>
          <failure_interval type="integer">10</failure_interval>
          <rate type="integer">10</rate>
          <time type="integer">60</time>
      </heartbeat>
      ...
   </microsoft>
</integration>

Failure_attempts specifies how many heartbeats should be missed before we fire the 'heartbeat stopped' event.  Failure_interval specifies how long (in seconds) we should wait between heartbeats once an initial failure has been detected.  Time specifies the standard interval (in seconds) in which to sample heartbeats, and Rate specifies how many heartbeats should be received in the interval defined by Time.  With the defaults above, that is 10 heartbeats every 60 seconds, or one every 6 seconds.
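
To check what a given set of values works out to, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the helper name heartbeat_timings is made up for this example, the fragment is the default section shown above with the elided parts left out, and the "declared dead" figure assumes it is simply failure_attempts multiplied by failure_interval, which matches the 120-second default.

```python
import xml.etree.ElementTree as ET

# Default <heartbeat> section from the post; the parts elided with "..." are left out.
VMC_FRAGMENT = """
<integration>
   <microsoft>
      <heartbeat>
          <failure_attempts type="integer">12</failure_attempts>
          <failure_interval type="integer">10</failure_interval>
          <rate type="integer">10</rate>
          <time type="integer">60</time>
      </heartbeat>
   </microsoft>
</integration>
"""

def heartbeat_timings(xml_text):
    """Hypothetical helper: return (normal_interval_s, failure_window_s)."""
    root = ET.fromstring(xml_text)
    # Find the first <heartbeat> element anywhere under the root.
    heartbeat = next(root.iter("heartbeat"))
    values = {child.tag: int(child.text) for child in heartbeat}
    # Rate heartbeats are expected in every Time seconds.
    normal_interval = values["time"] / values["rate"]
    # Assumption: the VM is reported dead after failure_attempts misses,
    # spaced failure_interval seconds apart.
    failure_window = values["failure_attempts"] * values["failure_interval"]
    return normal_interval, failure_window

normal, window = heartbeat_timings(VMC_FRAGMENT)
print(f"normal interval: {normal:g} s, declared dead after about {window} s")
```

Lowering failure_attempts or failure_interval in the fragment shortens the second number, which makes the heartbeat check more sensitive; raising them makes it more tolerant of a busy guest.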

Cheers,
Ben

  • > By default we will send one heartbeat every
    > 6 seconds.

    Do you mean 60?
  • No - we do it 10 times every 60 seconds - so once every 6 seconds.

    Cheers,
    Ben
  • Then this part of the XML file:
    <rate type="integer">10</rate>
    <time type="integer">60</time>
    means 10 times in 60 seconds, so that part is consistent.

    But then I'm confused about the other part of your original posting.  If you think you might be missing a heartbeat, then you drop back to check less frequently instead of checking more frequently?  Of course if the guest is really overloaded then it's going to be late in replying, but I still don't see the purpose in reducing the attempts to contact.

    It's not like the original version of Ethernet, or wireless LANs and other stuff like that, where there are other senders competing with you and you have to agree that every contending sender will have to drop back.
  • Hi Norman,

    Yes - the default configuration is to slow down the heartbeat when we detect a failure.  This is to give the virtual machine the best chance of not being reported 'dead' by mistake.

    Cheers,
    Ben