At PDC 2009 we announced the roadmap and release details for Microsoft project code named "Velocity" as part of Windows Server AppFabric. To download the latest bits and read more details, please check out the following links.
On-Premises or in the Cloud, One Consistent Composite Application Experience for Developers
http://www.microsoft.com/presspass/features/2009/nov09/11-17PDCAppFabric.mspx?rss_fdn=Top%20Stories
Windows Server AppFabric
http://msdn.microsoft.com/en-us/windowsserver/ee695849.aspx
Thanks
Microsoft project codenamed "Velocity" Team
Windows PowerShell lets you call .NET APIs, the same ones you would use from C# code, directly from PowerShell scripts. This offers a powerful way to “script” cache operations such as puts and gets. You can use it to prime the cache as part of an automated setup, or to perform other management tasks such as exploring the cache. This post shows some examples of how to invoke the "Velocity" APIs directly from PowerShell scripts. Note that I will use "product" and "Velocity" interchangeably.
Let us walk through it step by step. Windows PowerShell lets you create .NET objects with the new-object cmdlet, like this:
[System.Random]$rand =new-object System.Random
Before creating any object, make sure that the assembly for its class has been loaded by PowerShell.
You can list the assemblies that PowerShell has loaded by running this at the PowerShell prompt:
[System.Threading.Thread]::GetDomain().GetAssemblies()
By default, PowerShell loads a bunch of assemblies for you so that you can start working with the base types, and some common functions. If you want to use the functionality in an assembly that is not already loaded, you will have to load that assembly yourself.
You can load an assembly by running something like this at the PowerShell prompt:
# Load() accepts an assembly display name; LoadFrom() expects a file path instead
[Reflection.Assembly]::Load("CacheBaseLibrary, Version=1.0.0.0, " `
+ "Culture=neutral, PublicKeyToken=b77a5c561934e089")
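To check for one particular assembly instead of reading the whole list, the loaded assemblies can be filtered by name. This is a minimal sketch: the function name Test-AssemblyLoaded is hypothetical, and it only uses standard .NET reflection calls, nothing specific to "Velocity".

```powershell
# Hypothetical helper: true when an assembly with this simple name is already loaded.
function Test-AssemblyLoaded {
    param([string]$Name)
    $loaded = [System.AppDomain]::CurrentDomain.GetAssemblies() |
        Where-Object { $_.GetName().Name -eq $Name }
    [bool]$loaded
}

Test-AssemblyLoaded 'System.Management.Automation'   # the PowerShell engine itself
Test-AssemblyLoaded 'CacheBaseLibrary'               # loaded only after an explicit load
```

If the second call returns False, the assembly still has to be loaded before any of its types can be used with new-object.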
It is a good idea to use the PowerShell-based cache administration tool, which is installed as part of the product installation. Let us now see how to create a DataCacheFactory object in a PowerShell script.
# Describe one cache host: machine name, port, and service name
[Microsoft.Data.Caching.DataCacheServerEndpoint]$server = new-object Microsoft.Data.Caching.DataCacheServerEndpoint("machineName", 22233, "DistributedCacheService")
# The factory takes an array of endpoints, so create an array of one element
[Microsoft.Data.Caching.DataCacheServerEndpoint[]]$serverarray = new-object Microsoft.Data.Caching.DataCacheServerEndpoint[] 1
$serverarray[0]=$server
[Microsoft.Data.Caching.DataCacheFactory]$cacheFactory = new-object Microsoft.Data.Caching.DataCacheFactory($serverarray, $true, $false)
Once we have created this DataCacheFactory, we can use it just as we would in any C# client; that is, you can call GetCache, GetDefaultCache, and so on. In the following code I create a cache with a random name and then obtain it from the DataCacheFactory created above by calling GetCache. The script then creates a region and puts a key and value in it. Finally, a Get call retrieves the value for the key we inserted, and we test that value for correctness.
[System.random]$rand =new-object System.random
$cacheName = $rand.next(100,134)
#Creating a Named cache in product cluster
New-Cache -CacheName $cacheName -NotExpirable true -Eviction LRU -TTL 1400 -Secondaries 0 -NotificationsEnabled true -Verbose
start-sleep -seconds 60
[Microsoft.Data.Caching.DataCache]$cache= $cacheFactory.GetCache($cacheName)
$regionName = $rand.next(22,127)
$cache.CreateRegion($regionName, $true)
start-sleep -milliseconds 15000
$key=$rand.next(45,292)
$value=$rand.next(83,120)
$cache.put($key, $value, $regionName)
$item = $cache.get($key, $regionName)
# Testing the value we got is what we put
if ($value -eq $item)
{
write-output("Test Passed")
}
The snippet above shows that PowerShell helps in several ways. First, you can mix cache administration cmdlets with client code. Second, it is a script, so you get the full power of scripting. Let us see how a cache administrator can solve some problems that he or she may face.
1) Starting a product cluster, creating a cache, creating some regions, and filling in some data before handing the cache over for other use.
# Starting Cluster
Start-CacheCluster
start-sleep -seconds 15
$errRegionAlreadyExists = [Microsoft.Data.Caching.DataCacheErrorCode]::RegionAlreadyExists
# Random Class I will use it to create random values for cachename, region name, keys and values
[System.random]$rand =new-object System.random
$cacheName = $rand.next(1,999)
#Creating a new cache using a cache administration cmdlet
write-output ($cacheName )
New-Cache -CacheName $cacheName -NotExpirable true -Eviction LRU -TTL 1400 -Secondaries 0 -NotificationsEnabled true -Verbose
start-sleep -seconds 60
# Product exception handling block; this handles all product exceptions in this script
trap [Microsoft.Data.Caching.DataCacheException]
{
write-error $("TRAPPED: " + $_.Exception.Message);
write-error $("TRAPPED: " + $_.Exception.ErrorCode)
if($_.Exception.ErrorCode -eq $errRegionAlreadyExists)
{
$regionName = $rand.next(1,200)
$cache.CreateRegion($regionName, $true)
}
if($_.Exception.ErrorCode -eq 12011)
{
exit
}
continue;
}
# Creating Cache Factory with one DataCacheServerEndpoint
[Microsoft.Data.Caching.DataCacheServerEndpoint]$server =
new-object Microsoft.Data.Caching.DataCacheServerEndpoint("machinename", 22233, "DistributedCacheService")
# The expression "Microsoft.Data.Caching.DataCacheServerEndpoint[] 1" creates an array of one element
[Microsoft.Data.Caching.DataCacheServerEndpoint[]]$serverarray = new-object Microsoft.Data.Caching.DataCacheServerEndpoint[] 1
$serverarray[0]=$server
[Microsoft.Data.Caching.DataCacheFactory] $cacheFactory = new-object Microsoft.Data.Caching.DataCacheFactory($serverarray, $true, $false)
# This block handles any exception in this script that is not a product exception
trap [Exception]
{
write-error $("TRAPPED: " + $_.Exception.GetType().FullName);
write-error $("TRAPPED: " + $_.Exception.Message);
write-error $("TRAPPED: " + $_.Exception.StackTrace);
write-error $("TRAPPED: " + $_.Exception.Source);
write-error $("TRAPPED: " + $_.Exception.TargetSite);
continue;
}
# Getting the cache object in here which we created earlier in this script.
[Microsoft.Data.Caching.DataCache]$cache= $cacheFactory.GetCache($cacheName)
$regionName = $rand.next(22,127)
# Creating region with a random name, which is in this case a integer
$cache.CreateRegion($regionName, $true)
start-sleep -milliseconds 15000
$key=$rand.next(4,292)
$value=$rand.next(8,120)
$cache.add($key, $value, $regionName)
# Putting data in the Region we created above
for ($i=1;$i -le 100;$i+=1)
{
$key=$rand.next(4,292)
$value=$rand.next(8,120)
$cache.put($key, $value, $regionName)
}
The code above shows how product exceptions are handled with trap. You can also write the handling in a try-catch style in a PowerShell script:
Function TryCatch() {
&{#Try Block
throw (new-object Microsoft.Data.Caching.DataCacheException)
}
#Catch Block
trap [Microsoft.Data.Caching.DataCacheException] {
Write-Host "$($_.Exception)"
Write-Output "Failure!"
continue
}
}
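On PowerShell 2.0 and later, the same pattern can also be sketched with the built-in try/catch statement. The example below is only an illustration: it uses System.InvalidOperationException as a stand-in because the "Velocity" assemblies may not be loaded, and the function name Invoke-Guarded is hypothetical.

```powershell
# Hypothetical helper: runs a script block and reports failures of one exception type.
function Invoke-Guarded {
    param([scriptblock]$Action)
    try {
        & $Action
    }
    catch [System.InvalidOperationException] {
        # Stand-in for a product exception such as DataCacheException
        "Failure: " + $_.Exception.Message
    }
}

Invoke-Guarded { throw (New-Object System.InvalidOperationException "boom") }
```

Exceptions of other types are not caught by this handler and propagate to the caller, much like an exception that no trap block matches.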
References:
http://blogs.msdn.com/powershell/archive/2006/04/25/583273.aspx
http://www.microsoft.com/windowsserver2003/technologies/management/powershell/default.mspx
http://blogs.msdn.com/wriju/archive/2008/07/01/how-to-find-public-key-token-for-a-net-dll-or-assembly.aspx
Thanks,
Abhijeet Bhattacharya
(Microsoft project code named “Velocity” team)
How to control ETW sessions and collect traces
In this section I will describe how to work with an ETW sink. ETW stands for “Event Tracing for Windows”. It is a high-speed tracing facility provided by the OS. It uses a buffering and logging mechanism implemented in the OS kernel. The buffers used are written to the disk. The user can later retrieve these events in a human-readable format. Please follow these steps to subscribe to the Microsoft project code named “Velocity” ETW sink.
· Before first use, the logging metadata has to be registered on the machine. The ‘Logs’ folder (under the installation folder: Microsoft Distributed Cache\V1.0\Logs) contains the following files:
Ø Provider.man - manifest file for events for Microsoft Windows Vista and above.
Ø Provider.mof - manifest file for events for downlevel (Microsoft Windows XP, Microsoft Windows Server 2K3).
Ø ProviderUninstall.mof - manifest file to uninstall “Velocity” provider from WMI repository.
Ø ProviderGUID.txt - text file which contains the GUID for “Velocity” session provider and is used internally.
Moving/deleting these files can break the functionality and is not supported.
From command prompt:
Ø Downlevel users would have to do “mofcomp Provider.mof”
Ø Users having Microsoft Windows Vista and above would have to do “wevtutil im Provider.man” (this requires elevation).
· For an ETW sink, logs would go to ETW session and can be retrieved by command line utilities. Sessions can also be controlled in the same way.
· The utilities required are ‘tracelog.exe’ and ‘tracerpt.exe’. These are standard Windows utilities and are publicly available.
· Start collecting logs: first, an ETW session has to be started with a definite log level. Logs generated would be pumped to it. The log level can be changed dynamically.
tracelog -start <sessionName> -f <logFile> -guid ProviderGUID.txt -level <level>
o sessionName: name of the ETW session.
o logFile: file in which the logs would go to.
o level: the desired Log level.
§ 2 - Error
§ 3 - Warning
§ 4 - Information
§ 5 - Verbose
o To disable logging, log level should be given as 1.
· Change log level: it can be changed dynamically by
tracelog -enable <sessionName> -guid ProviderGUID.txt -level <level>
· Stop collecting logs: now, the ETW session needs to be stopped.
tracelog -stop <sessionName>
· Trace dump can be viewed by : tracerpt <logFile> -y
Ø Downlevel users would get a .csv file.
Ø Users having Microsoft Windows Vista would get an .xml file. To get a .csv file, the switch ‘-of CSV’ would have to be added. Command then would be “tracerpt <logFile> -of CSV -y”.
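The level numbers are easy to mix up, so a script can map names to numbers before calling tracelog. The following is a minimal sketch that only builds the command line shown above. The helper name Get-TraceStartCommand, the label "Disabled" for level 1, and the session and file names in the usage line are assumptions made for illustration.

```powershell
# Log levels as listed above; 1 disables logging ("Disabled" is our own label).
$TraceLevels = @{ Disabled = 1; Error = 2; Warning = 3; Information = 4; Verbose = 5 }

# Hypothetical helper: builds the tracelog command line for starting a session.
function Get-TraceStartCommand {
    param([string]$SessionName, [string]$LogFile, [string]$Level)
    if (-not $TraceLevels.ContainsKey($Level)) {
        throw "Unknown log level '$Level'"
    }
    "tracelog -start $SessionName -f $LogFile -guid ProviderGUID.txt -level $($TraceLevels[$Level])"
}

Get-TraceStartCommand -SessionName VelocityTrace -LogFile velocity.etl -Level Warning
```

The returned string can be reviewed first and then run from the ‘Logs’ folder, where ProviderGUID.txt lives.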
This was all about ETW sink on a cache host. However, if you want this functionality on the client side, you need to copy Provider.man, Provider.mof, ProviderUninstall.mof and ProviderGUID.txt from the ‘Logs’ folder (under the installation folder: Microsoft Distributed Cache\V1.0\Logs) onto the client machine before executing the steps mentioned above.
Thanks,
Amit Kumar Yadav
(Microsoft project code named “Velocity” team)
How to use “Velocity” logging framework
Microsoft project code named "Velocity" provides the ability to trace events on the cache client and cache host. These events are captured by enabling log sinks. A sink represents a valid destination to which logs or events can be emitted. “Velocity” supports three log sinks:
· Console sink - emits events to the console.
· File sink - emits events to a text file.
· ETW sink - emits events to the OS buffers which are written to the disk.
See related sections in the documentation for more details:
Log Sink Settings: http://msdn.microsoft.com/en-us/library/dd187434.aspx
How to: Set Log Sink Levels (XML): http://msdn.microsoft.com/en-us/library/dd169209.aspx
How to: Set Log Sink Levels (Code): http://msdn.microsoft.com/en-us/library/dd187351.aspx
How to change log file generation format
By default, log files emitted by the file sink are generated hourly and use the time format ‘dd-hh’. Hence you see log files named with the current date and hour, and a new file is generated every hour. For example, DCacheTrace[1371]13-2, DCacheTrace[1371]13-3, DCacheTrace[1371]13-4 and so on.
To increase or decrease the granularity of log file generation, you need to change this time format. For instance:
· If you need log files per minute, you would want to change it to ‘dd-hh-mm’.
· For a 24-hour format, change it to ‘dd-HH-mm’.
· For AM/PM display, change it to ‘dd-hh-mm tt’.
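To preview what a given format produces, you can try it on a fixed timestamp in PowerShell. This sketch assumes the sink interprets the value as a standard .NET custom date and time format string, which the specifiers above suggest but which is not confirmed here; the exact file names the product generates may differ.

```powershell
# A fixed timestamp (13th day of the month, 2:05 PM) so the output is reproducible.
$stamp = New-Object System.DateTime 2009, 7, 13, 14, 5, 0
$culture = [System.Globalization.CultureInfo]::InvariantCulture

# Print each candidate format next to the text it produces.
foreach ($format in 'dd-hh', 'dd-hh-mm', 'dd-HH-mm', 'dd-hh-mm tt') {
    "{0,-12} -> {1}" -f $format, $stamp.ToString($format, $culture)
}
```

In .NET format strings, ‘hh’ is the 12-hour clock and ‘HH’ the 24-hour clock, so the two forms only differ for afternoon hours.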
Now I will list out the steps required to change the time format. Note that these settings are applicable to a single cache host since changes are done locally.
· Open DistributedCache.exe.config (located in the installation folder).
· Go to <fabric> <section name=”logging” path=”” > tag.
· Add the following entry in “sinks” collection.
<customType className="System.Data.Fabric.Common.EventLogger,FabricCommon" sinkName="System.Data.Fabric.Common.FileEventSink, FabricCommon" sinkParam="C:\Program Files\Microsoft Distributed Cache\V1.0\Logs\DCacheTrace[$]/dd-hh" defaultLevel="0" />
· The path in ‘sinkParam’ should be taken from ‘location’ attribute in the <log> tag.
· The portion after ‘/’ in ‘sinkParam’ represents the time format. Change it as per your requirements.
· Log level can be specified in ‘defaultLevel’. This represents the same property as ‘logLevel’ in the <log> tag.
· Note that these settings override the ones defined in the <log> tag.
For more details, refer to http://msdn.microsoft.com/en-us/library/dd187434.aspx
In the next post, I will talk about the ETW sink.
Thanks,
Amit Kumar Yadav
(Microsoft project code named “Velocity” team)
How to ensure you have correct client libraries
This section is more important for people who are upgrading from CTP2 to CTP3. Since Microsoft project code named “Velocity” bits are not backward compatible, while upgrading the cache hosts, “Velocity” clients have to be upgraded as well. There are four client libraries:
1. ClientLibrary.dll
2. CacheBaseLibrary.dll
3. CASBase.dll
4. FabricCommon.dll
A client application only needs references to ClientLibrary.dll and CacheBaseLibrary.dll, so it can happen that only these two libraries get updated while the client still has the older CASBase.dll and FabricCommon.dll. In that case the user runs into issues (several people have reported this). Troubleshooting is made harder by the fact that the file and product versions of the latter two libraries are 1.0.0.0 in both CTP2 and CTP3. Hence you need to make sure that the correct set of assemblies has been copied, by checking that their ‘Date modified’ timestamp is the same in both locations (copied from, copied to).
Next time we will discuss “Velocity” logging framework usage.
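The timestamp check can be scripted. The following is a minimal sketch, not a supported tool: the function name Get-MismatchedLibrary and its parameters are hypothetical, and it compares only the last-write time of the four client libraries in two folders.

```powershell
# The four client libraries named above.
$ClientLibraries = 'ClientLibrary.dll', 'CacheBaseLibrary.dll', 'CASBase.dll', 'FabricCommon.dll'

# Hypothetical helper: lists libraries that are missing or whose 'Date modified' differs.
function Get-MismatchedLibrary {
    param([string]$SourceDir, [string]$TargetDir)
    foreach ($name in $ClientLibraries) {
        $source = Join-Path $SourceDir $name
        $target = Join-Path $TargetDir $name
        if (-not (Test-Path $source) -or -not (Test-Path $target)) {
            $name   # missing on one side
        }
        elseif ((Get-Item $source).LastWriteTimeUtc -ne (Get-Item $target).LastWriteTimeUtc) {
            $name   # timestamps differ, so the copy may come from another build
        }
    }
}
```

An empty result means all four files exist in both folders with matching timestamps; any name returned is worth copying again.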
Thanks,
Amit Kumar Yadav
(Microsoft project code named “Velocity” team)
Hi,
I hope all of you are having a great time caching your data with Microsoft project code named “Velocity”. This post is the first in a series on practices and guidelines to follow in order to make the best use of “Velocity” and to troubleshoot common issues you may face. I also recommend that you see the "Velocity" readme file and the product help file for other known issues. You can find the code_name_velocity_readme.txt file on the "Velocity" download page.
How to work with common “Velocity” exceptions
The occurrence of any undesired behavior in “Velocity” cache requests results in DataCacheException being thrown to the client application. So the application using “Velocity” should be capable of catching them, recovering and taking appropriate actions, for instance, retrying a retry-able error or falling back to the persistent data store.
A “Velocity” client application should check the exception’s ErrorCode field. The error codes are explained in detail in the documentation. Some of them, such as DataCacheErrorCode.KeyDoesNotExist and DataCacheErrorCode.RegionDoesNotExist, point to logical issues and represent definite failures. Others, however, can occur because the cluster is temporarily re-aligning itself, for example when a cache host comes up or goes down, or when a named cache is created or deleted. These are explained here and should be handled by the application.
1. DataCacheErrorCode.RetryLater : this error code represents transient incapability of the cluster to serve a request. Application can retry the request.
For instance, this would be thrown when the cluster is undergoing a configuration change or there are not enough cache hosts in the cluster to perform a Put/Add.
In the next release, this error code will be accompanied by an error sub-status that gives the exact reason for the failure, for example, primary cache copy not found or secondary cache copy not found to update.
2. DataCacheErrorCode.Timeout : this error code is thrown when the client doesn’t receive a response from the server within a specified timeout. For instance, this would be thrown when there is a glitch in underlying communication between client and the server.
However, there is a possibility that the request may have been processed by the server, so application logic should take care of that while retrying.
To clarify a bit more, suppose client does
cache.CreateRegion("Foo", false);
and gets a Timeout error code. Trying again may result in an exception with code DataCacheErrorCode.RegionAlreadyExists.
If the deployment mode is chosen as ‘Simple’, there is one more error code which the application should take care of - DataCacheErrorCode.CacheServerUnavailable. This is thrown when a “Velocity” client is not able to reach a particular cache host. This may be possible if that server is down or there is some network related failure.
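The handling described above can be sketched as a small retry wrapper. This is an illustration under stated assumptions, not the product API: New-CacheError is a hypothetical stand-in for DataCacheException that carries the error code name in Exception.Data, the names Invoke-WithRetry and BenignAfterTimeout are made up for this sketch, and try/catch requires PowerShell 2.0 or later.

```powershell
# Hypothetical stand-in for DataCacheException: the error code travels in Exception.Data.
function New-CacheError {
    param([string]$ErrorCode)
    $ex = New-Object System.Exception "Cache error: $ErrorCode"
    $ex.Data['ErrorCode'] = $ErrorCode
    $ex
}

# Runs $Operation, retrying only the transient codes described above.
function Invoke-WithRetry {
    param(
        [scriptblock]$Operation,
        [string[]]$RetryableCodes = @('RetryLater', 'Timeout'),
        [string[]]$BenignAfterTimeout = @(),
        [int]$MaxAttempts = 3,
        [int]$DelayMilliseconds = 200
    )
    $sawTimeout = $false
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            return & $Operation
        }
        catch {
            $code = [string]$_.Exception.Data['ErrorCode']
            # A timed-out request may still have been processed by the server.
            if ($sawTimeout -and $BenignAfterTimeout -contains $code) { return }
            # Definite failures and exhausted retries go back to the caller.
            if ($attempt -ge $MaxAttempts -or $RetryableCodes -notcontains $code) { throw }
            if ($code -eq 'Timeout') { $sawTimeout = $true }
            Start-Sleep -Milliseconds $DelayMilliseconds
        }
    }
}
```

For the CreateRegion case, passing 'RegionAlreadyExists' as BenignAfterTimeout treats that code as success once a Timeout has been seen. With the real client, the script block would wrap the cache call and the catch block would read the exception's ErrorCode field instead.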
Next in the series is ‘How to ensure you have correct client libraries’.
Thanks,
Amit Kumar Yadav
(Microsoft project code named “Velocity” team)
We are planning to add performance counters in the upcoming release, CTP4. The performance counters will be available for one or both of the following categories:
· Host - A category for single-instance performance counters pertaining to a single cache host. Host counters track information such as total active connections and total client requests for a single cache host.
· Cache - A category for multiple-instance performance counters pertaining to all cache hosts. Each instance of a cache counter corresponds to a separate named cache in the cache cluster.
| Counter Name | Description | Applicable Category |
| --- | --- | --- |
| Total data size (MB) | Total size of the cached data in Velocity. It does not include cache overhead. | Host, Cache |
| Total cache misses | Total number of requests that could not find the key in the cache since the start of the cache service. This indicates how efficiently the cache is being used. | Host, Cache |
| Cache miss ratio | Ratio of the number of requests that could not find the key to the total number of requests since the start of the cache service. This indicates how efficiently the cache is being used. | Host, Cache |
| Total Get requests | Number of Get requests received from all clients since the start of the cache service. | Host, Cache |
| Get miss ratio | Ratio of the number of Get requests that could not find the key to the total number of Get requests. This indicates how efficiently the cache is being used. | Host, Cache |
| Total write operations | Number of write requests since the start of the cache service. Write requests include Put, Add, Remove, ResetObjectTimeout, GetAndLock, PutAndLock, and Unlock. | Host, Cache |
| Total active connections | Number of active connections on the cache host. | Host |
| Total client requests | Total number of requests received from the Velocity client. It includes all of the API calls. | Host |
| Total requests served | Number of requests served and responses sent by the cache host since the start of the cache service. This provides a rough estimate of the throughput of the cache host. | Host |
| Average response time (milliseconds) | Average response time to service a Velocity client API request on the cache host. The average is taken over all the requests received by the cache host since the start of the cache service. | Host |
| Total expired Object | Number of expired objects since the start of the cache service. | Host |
| Total memory evicted (MB) | Amount of memory freed by the eviction procedure on the cache host since the start of the cache service. | Host, Cache |
| Average age of evicted object (seconds) | Average age of evicted objects. This is a measure of the efficiency of the eviction procedure. | Host, Cache |
| Total eviction run | Number of eviction runs since the start of the cache service. | Host |
| Total evicted Objects | Number of evicted objects since the start of the cache service. | Host |
| Total exceptions | Number of Velocity exceptions thrown by the cache host since the start of the cache service. | Host |
| Total retry exception | Total number of retry operation exceptions thrown by the cache host since the start of the cache service. | Host |
| Total notification poll requests | Total number of poll requests received by the cache host since the start of the cache service. | Host, Cache |
| Total GetAndLock requests | Total number of GetAndLock requests received by the cache host since the start of the cache service. | Host, Cache |
| Total successful GetAndLock requests | Number of successful GetAndLock requests since the start of the cache service. | Host, Cache |
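As a worked example of the ratio counters, the sketch below computes a miss ratio from two totals. It only illustrates the definition given in the table: the function name Get-MissRatio is hypothetical, and returning 0 when there have been no requests is an assumption, not documented counter behavior.

```powershell
# Hypothetical helper: ratio of requests that missed to all requests, 0 when idle.
function Get-MissRatio {
    param([long]$Misses, [long]$TotalRequests)
    if ($TotalRequests -le 0) { return 0.0 }
    [double]$Misses / $TotalRequests
}

Get-MissRatio -Misses 25 -TotalRequests 100
```

A lower ratio means that more requests are being answered from the cache.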
We look forward to your feedback on the identified counters and to your suggestions for new counters.
Sharique Muhammed
(Microsoft project code named "Velocity" Team)
As Velocity moves ahead toward the next release milestone, here is an update: we are planning to release CTP 4 around mid-September 2009. Some of the features planned for CTP 4 are Velocity setup/configuration changes, Velocity Performance Monitor counters, enhancements to security, and more stability and reliability since CTP3.
In the coming days we will share more details on these features to get your valuable feedback and comments.
Velocity Team
Aaron has published a Velocity article in MSDN Magazine. It also includes sample code for the cool Northwind demos that I used in my recent talks.
Check it out at - http://msdn.microsoft.com/en-us/magazine/dd861287.aspx
Sometimes you see this error in Get calls, and it can freak people out. There are a few reasons why this error may occur. Internally we have a 15-second timeout for calls (either Get or Put). So if we are not able to satisfy the request within that time, we time out and throw this error to the user.
This can be fine-tuned using the DataCacheFactory.Timeout property to make it higher or lower. In typical scenarios, you should not hit this error. However, there are cases where you will get it:
* The specific machine that the client is routing the request to has just gone down. We try to establish a TCP connection twice before deciding to refresh our routing table. The TCP connection open timeout is 15 seconds. (The minimum Send timeout is 10 seconds.) These limits are not exposed. Since we retry twice for a connection, it is possible that it takes as long as 40 seconds before we raise a complaint. So in that timeframe, incoming calls would start timing out at 15-second intervals.
The lease intervals are maintained between machines, and they are 3-minute leases with a 1.5-minute update. So if the machine has not responded within 1.5 minutes, the neighbours will suspect something and try to establish a connection. If the machine is running but the process is down, the connection will be refused instantly. If the machine itself is down, the TCP timeout applies, and then an arbitration process is started to kick the machine out of the cluster. So all in all, it could take about 2 minutes before the server side decides that a machine is down and reconfigures the cluster.
* Machine is saturated: here the connection would be slow and the Put/Get might start timing out. Nothing much can be done other than adding new machines or reducing the load. Post V1, we also have automated load balancing that will shift some of the load around in these scenarios. But typically the distribution of load is good enough that, if you have uniformly sized objects and load, you would not get into this scenario unless your servers are not sized properly.
* HA is needed and machine is down: this is the most common problem that we see. People install Velocity with HA on, either on a single machine or on two machines of which one is down. In either case, if we don't have a place to write a backup copy to, we throw this error. In V1, we will fix it so that we raise a different error (Secondary not available, or some such thing), so that you can do something other than retrying.
One piece of the diagnosability work happening for the V1 release is to make this error occur only when it is truly a transient problem.
Anshul ran some more numbers to see the max throughput and latency that we have observed with CTP3. Here are some numbers to share.
Note that these are the max throughput numbers and the latencies observed at those values. If you reduce the throughput, the latency goes down. For example, in the Read 2k case, if the CPUs are not maxed out, you can get latencies of about 0.6 ms or so.
| Operation | Nodes | Throughput (max) | Throughput (avg) | Latency in milliseconds (avg) | Comments |
| --- | --- | --- | --- | --- | --- |
| Read operation (2k) | 1 node | 30000 | 28600 | 1.5 | Observed at 53 clients and 12 agents |
| Read operation (2k) | 2 node | 58500 | 55500 | 1.6 | Observed at 100 clients and 17 agents |
| Read operation (2k) ** | 3 node | 64000 | 59000 | 0.6 | Observed at 150 clients and 17 agents |
| Read operation (4k) | 1 node | 21500 | 20900 | 2.3 | Observed at 60 clients and 12 agents |
| Read operation (4k) | 2 node | 40100 | 39300 | 2.3 | Observed at 100 clients and 17 agents |
| Read operation (4k) *** | 3 node | 50100 | 46300 | 1.0 | Observed at 150 clients and 17 agents |
| Read operation (20k) **** | 1 node | 5450 | 5400 | 3.1 | Observed at 30 clients and 12 agents |
| Read operation (20k) **** | 2 node | 10600 | 10100 | 3.2 | Observed at 50 clients and 17 agents |
| Read operation (20k) **** | 3 node | 15000 | 14700 | 2.9 | Observed at 50 clients and 17 agents |
** With all 17 dual-core clients (the hardware I have), the CPUs of the 3 quad-core servers were not saturated. The numbers are at around 65% CPU utilization.
*** With the 17 dual-core clients, the CPUs of the 3-node server were not saturated. The numbers are at around 75% CPU utilization.
**** In all 20K numbers, the network was saturated.
Server configuration: Intel(R) Core(TM) 2 Quad CPU Q9300 / 8 GB RAM / 64-bit Win 2k3 / 1 Gbps network.
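One way to read these numbers is to compare each multi-node result with ideal linear scaling from the single-node result. The sketch below does this for the 2k read rows. The metric and the function name Get-ScalingEfficiency are our own illustration, not part of the original measurements.

```powershell
# Max throughput for 2k reads, copied from the table above (1, 2 and 3 nodes).
$maxThroughput = @{ 1 = 30000; 2 = 58500; 3 = 64000 }

# Hypothetical metric: measured throughput divided by ideal linear scaling.
function Get-ScalingEfficiency {
    param([int]$Nodes, [double]$Throughput, [double]$SingleNodeThroughput)
    $Throughput / ($Nodes * $SingleNodeThroughput)
}

foreach ($nodes in 1, 2, 3) {
    $efficiency = Get-ScalingEfficiency $nodes $maxThroughput[$nodes] $maxThroughput[1]
    "{0} node(s): {1:P1} of linear scaling" -f $nodes, $efficiency
}
```

The 3-node figure should be read together with the footnote for that row: the servers were not saturated there, so the lower value reflects the available client hardware and not a limit of the cluster.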
Thanks
Murali