
What to do when your 32-bit .NET Framework 2.0 app won't run on x64

I just ran into this problem trying to run Log Parser Lizard (a .NET Framework 2.0 app that uses 32-bit DLLs) on my x64 machine. It kept throwing BadImageFormatExceptions and crashing. A little research identified the problem: the app is marked to run on Any CPU, but it uses 32-bit DLLs. On a 64-bit OS an Any CPU app runs as a 64-bit process, and a 64-bit process cannot load 32-bit DLLs.

Simple enough to fix, if I could just figure out how to force it to run as a 32-bit app. I found the answer here. In a nutshell, you can use the CorFlags.exe utility (for example, corflags YourApp.exe /32BIT+) to force the app to run as a 32-bit application.
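If you want to confirm which files in an app's folder are the native 32-bit DLLs before reaching for CorFlags, one quick check is the Machine field of each file's PE header. The sketch below is a minimal reader written for illustration (the name `pe_machine` is mine, not part of any tool). It only inspects the COFF header, so it reliably flags native 32-bit DLLs; note that a managed Any CPU assembly and a managed x86-only assembly both report x86 here, because that distinction lives in the CLR header flags that CorFlags.exe displays and changes.

```python
import struct

# Machine field values defined by the Microsoft PE/COFF specification.
MACHINE_TYPES = {
    0x014C: "x86 (32-bit)",
    0x8664: "x64 (64-bit)",
    0x0200: "Itanium (64-bit)",
}


def pe_machine(image: bytes) -> str:
    """Return the target architecture recorded in a PE file's COFF header."""
    if image[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ header")
    # The DOS header stores the offset of the PE signature (e_lfanew) at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image: PE signature not found")
    # The COFF header follows the 4-byte signature; its first field is Machine.
    (machine,) = struct.unpack_from("<H", image, pe_offset + 4)
    return MACHINE_TYPES.get(machine, f"unknown (0x{machine:04X})")
```

To use it, read each DLL in binary mode and pass the bytes in; anything reporting x86 that the app loads is a candidate for the BadImageFormatException.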

Posted by steveshe | 1 Comments

When you know the solution the problem is a lot easier to find…

Yeah, that title sounds a little crazy, but let me explain. The phenomenon I am referring to is that once you have identified the solution to a seemingly complex problem, the problem is revealed to be simple. In my case the problem was really poor performance on the console of a very powerful box. I am currently using an 8-processor, 16GB machine running Windows Server 2008 x64 as my primary “workstation”.

The main problem was that it had pretty severe video performance issues. It is also worth noting that this is one of several of these machines we have scattered around and nobody else seemed to be having these problems.

Here is a quick summary of the weird behaviors I was seeing:

  1. One of the three monitors attached to this machine would go black and then repaint every time I mapped or unmapped a network drive. No kidding.
  2. Unlocking the workstation, even immediately after locking it, resulted in a 50-120 second delay while it repainted my screen and generally sat there looking lame.
  3. My VX-600 webcam, which I use to chat with my 3-year-old son when I am going to be at work late at night, was totally unusable. Any time I started the camera the whole machine slowed to a complete crawl.
  4. Scrolling in the applications I use, like IE, Word, Excel, WinDbg, and Visual Studio, was jerky and very slow.
  5. Dragging a window from one monitor to the others was jerky and sometimes would just stall the machine for 10-20 seconds.
  6. Enabling the Aero desktop made the whole thing unusable.

I searched everywhere for problems with Windows Server 2008 performance on the physical console. Why the physical console? Because when I use Remote Desktop to connect to this machine, it runs just fine.

Based on all of these symptoms, I figured it was the fault of the stock video card in the machine, which uses a cheap NVIDIA Quadro GPU. So I went out and got a new video card, making for a matched set of GeForce 8600 SE cards in this machine. This configuration now matches my neighbor’s machine, which doesn’t have these problems. In the end this change helped somewhat, but performance was still pretty poor. I tried updating my video drivers; no help there either. Time to try some other stuff.

The next thing I did was switch my processor scheduling setting from “Background services” to “Programs”. This seemed to produce a slight improvement but not enough to make the machine usable from the console.

Next I shut down my Hyper-V VMs. I considered that perhaps they were chewing up resources that my machine desperately needed, despite the fact that I had 7GB-10GB of RAM free and CPU utilization was averaging 5%. That had no effect at all, but since the VMs made every reboot take longer and I wasn’t using them at the moment, I left them turned off.

Since it wasn’t any of the obvious stuff, next I went to the trusty ole Reliability and Performance Monitor. I created a new counter set to track the Memory, Processor, Processes and Threads objects. Surely if I have a performance problem on the physical console it will show up in one of those.

I started the log, locked my machine, waited 2 minutes and then unlocked it. It took about 55 seconds to repaint all the screens and get control of the apps. When I cracked open the log I could see that there were a few of my helper apps like Communicator, SnagIt and UltraMon taking a pretty big chunk of CPU for most of that unlock time.

I used MSConfig.exe to disable all of the autorun things I had loaded and rebooted. Performance was better. At this point I assumed that the performance I was seeing was as good as it was going to get, so I set about determining which of the autorun apps was killing what little performance I could get out of this machine.

I proceeded to enable the apps one at a time, reboot, and test the performance of locking and unlocking the machine. In the end I removed the webcam software: it was the only thing left out once everything else was loaded again, and the machine still ran better than it had before.

After using the machine for an hour or so I locked it and came back ten minutes later. When I unlocked the machine it took more than a minute for me to get control of the desktop and even then it was pretty sluggish. Back to square one.

During all of this I had been talking with my peers to see what other ideas they might have. One of them suggested that maybe I had a desktop heap problem. I have run out of desktop heap in the past and know what that looks like, so I didn’t really think that was it, but I was desperate by now, so I doubled the amount of desktop heap for the interactive session to 40MB. No help there either; same behavior.
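For anyone who wants to try the same desktop heap change: as I understand it, the interactive desktop heap is the second number of the `SharedSection=` parameter inside the `Windows` value under `HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems`, expressed in KB. The sketch below only rewrites that string (reading and writing the registry is left out), and the sample value used to exercise it is an assumption modeled on a typical x64 setting, not a copy of my machine's.

```python
def double_interactive_heap(windows_value: str) -> str:
    """Double the interactive desktop heap (second SharedSection number, in KB).

    Operates only on the text of the SubSystems 'Windows' registry value.
    """
    parts = windows_value.split(" ")
    for i, part in enumerate(parts):
        if part.startswith("SharedSection="):
            sizes = part[len("SharedSection="):].split(",")
            # sizes[0]: system-wide shared heap, sizes[1]: interactive desktop heap,
            # sizes[2] (if present): non-interactive desktop heap
            sizes[1] = str(int(sizes[1]) * 2)
            parts[i] = "SharedSection=" + ",".join(sizes)
    # Splitting and joining on single spaces preserves the original spacing.
    return " ".join(parts)
```

With a value containing `SharedSection=1024,20480,768`, the result contains `SharedSection=1024,40960,768`, i.e. 20MB raised to 40MB.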

At this point I had pretty much resigned myself to using this machine in its current state until I had time to pave it and reinstall everything. A few minutes later one of my peers who has the exact same configuration was showing me how his machine performed. It was awesome! Unlocking his machine was pretty much instantaneous! What in the @#$%&*%$# is wrong with mine?

It was then that he saved me by commenting that at one point he had installed Hyper-V and had some performance problems, not as bad as mine, but he had some, and he fixed them by uninstalling Hyper-V…

Now I’m thinking “Hyper-V can’t be my problem, I don’t even have any VMs running”…and then it dawned on me. Hyper-V uses a hypervisor that wedges itself in between the OS and the hardware. What could possibly go wrong there?

Needless to say, 5 minutes later, after I had uninstalled Hyper-V and rebooted, my machine was as blisteringly fast as my co-worker’s!

Now that I knew what the solution was, I did some quick searches for Hyper-V video performance problems. I immediately found KB 961661. For reference, here is the cause section of the KB article:

This issue occurs when a device driver or other kernel mode component makes frequent memory allocations by using the PAGE_WRITECOMBINE protection flag set while the hypervisor is running. When the kernel memory manager allocates memory by using the WRITECOMBINE attribute, the kernel memory manager must flush the Translation Lookaside Buffer (TLB) and the cache for the specific page. However, when the Hyper-V role is enabled, the TLB is virtualized by the hypervisor. Therefore, every TLB flush sends an intercept into the hypervisor. This intercept instructs the hypervisor to flush the virtual TLB. This is an expensive operation that introduces a fixed overhead cost to virtualization. Usually, this is an infrequent event in supported virtualization scenarios. However, some video graphics drivers may cause this operation to occur very frequently during certain operations. This significantly magnifies the overhead in the hypervisor.

This is why I say, when you know the solution the problem is a lot easier to find.

Posted by steveshe | 0 Comments

RECAP: Thou shalt not touch my databases!

Apparently some people, who shall remain nameless, still haven't gotten the memo that clearly states that if you modify the WSS/MOSS databases in almost any way your farm will be rendered unsupported. What memo am I referring to you ask? I am referring to KB841057. This KB article represents our best efforts to spell out the kinds of changes that are and are not allowed to our databases.

I'll try to add a little clarity from the perspective of someone who is often put in the position of telling customers that they have violated the rules laid out by this article. Before I begin let me state for the record that these statements do not establish the official position of Microsoft Corporation. Recommendations and advice are those of the author or people and organizations the author trusts. The contents are provided "AS IS" with no warranties and confer no rights.

Let's take a look at a couple of key points of this article and see if we can make them a little clearer. When the article says:

The products that are listed in the "Applies to" section were tested by using the existing structure and were approved for release based on that structure. Unless Microsoft protocol documentation is followed precisely, Microsoft cannot reliably predict the effect to the typical operation of these products when parties other than Microsoft support change the database or run stored procedures. Parties other than Microsoft support would include, but not be limited to, changes that are made by customers, by third-party vendors, or by consultants.

I read this to say, "When developing custom solutions using our protocol documentation, unless you do things exactly as we do them, there is a very, very good chance that bad things will happen. We may make exceptions for specific updates to specific datasets at specific times for our support personnel. Support personnel are those persons employed by the Microsoft Commercial Office Support Systems group with the following titles: Support Engineer, Support Escalation Engineer, Escalation Engineer. If you do not work in the Microsoft Commercial Office Support Systems group AND possess one of the aforementioned titles, then you are not permitted to directly modify our databases."

Having said all of that, I can tell you from lots of experience that we in the SharePoint support team very, very rarely make direct modifications to the database. Even when it does happen, it is only when the problem was introduced by the product itself and after all other means of correcting the problem have been exhausted.

The article goes on to describe some, but not all, of the things that are not permitted:

Examples of such database changes include, but are not limited to, the following:

  • Adding database triggers
  • Adding new indexes or changing existing indexes within tables
  • Adding, changing, or deleting any primary or foreign key relationships
  • Changing or deleting existing stored procedures
  • Adding new stored procedures
  • Adding, changing, or deleting any data in any table of any of the databases for the products that are listed in the "Applies to" section unless Microsoft protocol documentation is followed exactly
  • Adding, changing, or deleting any columns in any table of any of the databases for the products that are listed in the "Applies to" section
  • Making any modification to the database schema
  • Adding tables to any of the databases for the products that are listed in the "Applies to" section
  • Changing the database collation

I think this list is pretty clear, and while we specifically state that this is not an exhaustive list, I'm pretty hard-pressed to come up with something I might want to do to my database that is not covered by it. In short: do not touch the databases for the products associated with this article!

The article goes on to describe in pretty solid detail what happens if you decide to violate these prohibitions so I will not bother with interpreting them here.

Posted by steveshe | 0 Comments

The Challenges of Script Debugging Using VS2008

I recently worked on a problem that required me to debug the script for the HTML Editor control in WSS. In the past I have always used the Microsoft Script Debugger and just selected "Break on next statement". In this case that was not really getting me where I needed to be, so I decided to try to do it with VS2008.

Unfortunately this presented a new set of problems: while I knew where the code I wanted to debug was, I still didn't have a good way to catch it when it executed. A quick bit of web searching turned up this blog post that briefly mentioned the "debugger;" statement. Not being a JavaScript guy, I was unaware of this command. It simply invokes the debugger when it executes. How very handy :)

I then edited my .js file in the \Layouts\1033 folder and added the line "debugger;" at the beginning of the function I wanted to work on. I then cleared the IE temporary internet files folder so I would easily get the new version of the .js file. I went through the steps to reproduce the problem and it opened Windows Script Debugger. This was not what I wanted. I wanted Visual Studio 2008 so I could use all the fancy features like hovering over variables and such.

I speculated that the problem might be that when I installed the script debugger it stole the JIT settings from VS2008 so I uninstalled it and tried again. No joy. OK, it may have stolen them and not put them back when it was removed.

It occurred to me that VS2008 might just not be configured for JIT debugging of scripts. I checked my VS2008 debugging settings in the UI and found that I didn't have the Script setting enabled, so I turned off the other two and enabled it:

[Screenshot: Visual Studio 2008 Just-In-Time debugging settings with only Script enabled]

A quick test verified that this did not fix my problem.

At this point one of my EE cronies (Rob Anderson) mentioned that he gets the JIT debugger popup when he does script debugging, and I don't. This reminded me that historically the AeDebug key controlled this behavior. We checked his registry settings against mine and discovered that the likely reason was that I was missing the "Auto" value. Adding it fixed my problem, and I am now happily JIT debugging JavaScript in Visual Studio 2008!

 

Here are the proper settings for this to work on an x86 machine:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug
"Auto"="1"
"Debugger"="\"D:\\WINDOWS\\system32\\vsjitdebugger.exe\" -p %ld -e %ld"
"UserDebuggerHotKey"=dword:00000000

In my case, on an x64 machine, these settings needed to be transposed to the 32-bit (WOW64) registry location:

HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Windows NT\CurrentVersion\AeDebug 

Other than the location, the settings are the same.
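If you need to apply these values on more than one machine, generating an importable .reg file can be less error-prone than editing by hand. Here is a minimal sketch, assuming you pass in the system root (my machine used `D:\WINDOWS`) and that the x64 case uses the same Wow6432Node path shown above; `aedebug_reg` is an illustrative name. Keep in mind that .reg syntax needs the `Windows Registry Editor Version 5.00` header, square brackets around the key, and escaped backslashes and quotes inside string data.

```python
def aedebug_reg(system_root: str = r"C:\WINDOWS", wow64: bool = False) -> str:
    """Build .reg file text containing the AeDebug values shown above."""
    key = r"HKEY_LOCAL_MACHINE\SOFTWARE"
    if wow64:
        # 32-bit view of HKLM\SOFTWARE on an x64 machine
        key += r"\Wow6432Node"
    key += r"\Microsoft\Windows NT\CurrentVersion\AeDebug"
    debugger = '"' + system_root + r'\system32\vsjitdebugger.exe" -p %ld -e %ld'
    # .reg string data must escape backslashes first, then embedded quotes.
    escaped = debugger.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        '"Auto"="1"',
        f'"Debugger"="{escaped}"',
        '"UserDebuggerHotKey"=dword:00000000',
        "",
    ]
    return "\r\n".join(lines)
```

If you save the result from Python, `open(path, "w", encoding="utf-16", newline="")` keeps the CRLF line endings from being translated a second time.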

Posted by steveshe | 0 Comments

How do I tune the MOSS Object Cache for performance and economy?

Tuning the size of the MOSS Object Cache is done via the Max Cache Size setting on the Site Settings > Object Cache Settings page. It is important to recognize that this maximum cache size setting is a limit and not a static value. For example, just because the maximum cache size for the Object Cache is configured to 100MB does not mean that it will consume 100MB of memory at startup. It simply means that if the cache were to exceed 100MB, it would be compacted to reduce its memory consumption to a level below that maximum value.

 

For purposes of this discussion we will sort cache compaction rates into three categories: unacceptable, acceptable and optimal. A cache compaction rate of more than 6 per hour should be considered unacceptable. A rate of between 2 and 6 compactions per hour is acceptable, and between 0 and 1 cache compactions per hour is optimal. You should only target the optimal level of cache compactions if you have a sufficient amount of physical memory installed on your servers to achieve this goal and still have enough free physical memory remaining on the server to support other critical system operations.
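The three categories above map directly onto a small classifier. This is just a sketch of those thresholds; the only decision that is mine is treating fractional averages below 2 (say, 1.5 per hour) as optimal, since the ranges above are stated in whole compactions.

```python
def classify_compaction_rate(compactions_per_hour: float) -> str:
    """Bucket an hourly Object Cache compaction rate using the thresholds above."""
    if compactions_per_hour > 6:
        return "unacceptable"
    if compactions_per_hour >= 2:
        return "acceptable"
    # 0-1 per hour is optimal; fractional averages below 2 are treated the same way.
    return "optimal"
```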

 

You should regularly monitor for cache flushes because they have an extremely expensive impact on cache performance: a cache flush results in the ejection of all cache contents. Cache flushes are triggered by the creation, deletion or moving of a web. They can be monitored via the "SharePoint Publishing Cache/Total number of cache flushes" counter in Performance Monitor. If you are seeing cache flushes throughout the production day, you should reconsider how you manage webs during those hours.
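To spot flushes during the production day, you can diff consecutive samples of the flush counter. The sketch below assumes you already have (timestamp, value) pairs, for example from a CSV export of the Performance Monitor log, that the counter is cumulative, and that the production day runs from 8:00 to 18:00; the function name and those hours are placeholders to adjust.

```python
from datetime import datetime


def flushes_in_production_hours(
    samples: list[tuple[datetime, float]], start_hour: int = 8, end_hour: int = 18
) -> list[tuple[datetime, float]]:
    """Return (timestamp, new_flushes) for each interval in which the cumulative
    "Total number of cache flushes" counter rose during production hours.

    samples: (timestamp, counter_value) pairs in time order.
    """
    hits = []
    for (_, previous), (stamp, current) in zip(samples, samples[1:]):
        # A decrease usually means the counter reset (e.g. process restart); skip it.
        if current > previous and start_hour <= stamp.hour < end_hour:
            hits.append((stamp, current - previous))
    return hits
```

An empty result over a typical business week is what you are hoping for.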

 

Based on these considerations we have developed the following guidance for how to tune the Object Cache size for acceptable performance using the "SharePoint Publishing Cache / Total number of cache compactions" performance counter.

 

We feel this level of cache performance will meet an economical customer's performance needs without unduly sacrificing limited memory resources. The steps to achieve this are fairly straightforward and must be applied iteratively until the desired level of performance is achieved. The recommended steps are:

1.       Start with the default cache settings of 100MB on the site collection.

2.       Capture at least 8 hours' worth of performance data from the WFEs while the system is under a typical load, using the "SharePoint Publishing Cache/Total number of cache compactions" counter. Since we are interested in tracking compactions per hour, it is acceptable to capture this data at 1-minute intervals.

3.       After analysis of the data, if we are exceeding the threshold for acceptable cache compactions we will need to add an additional 50MB to the Maximum Cache Size value and run the test again.

4.       We should continue this process until such time as we have achieved an acceptable cache compaction rate.
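The analysis in steps 2 and 3 can be sketched as follows, under two assumptions of mine: the compaction counter is cumulative (so each hour's compactions are the difference between samples 60 minutes apart), and a run is judged by its worst hour rather than its average. The function names are illustrative.

```python
def hourly_compactions(cumulative: list[int], samples_per_hour: int = 60) -> list[int]:
    """Compactions in each full hour, from a cumulative counter sampled at a fixed interval.

    cumulative: "Total number of cache compactions" values, oldest first,
    sampled every minute by default (the interval suggested in step 2).
    """
    return [
        cumulative[start + samples_per_hour] - cumulative[start]
        for start in range(0, len(cumulative) - samples_per_hour, samples_per_hour)
    ]


def next_max_cache_size(current_mb: int, hourly: list[int],
                        acceptable_limit: int = 6, step_mb: int = 50) -> int:
    """Step 3: add 50MB if any hour exceeded the acceptable compaction rate."""
    if hourly and max(hourly) > acceptable_limit:
        return current_mb + step_mb
    return current_mb
```

Each pass through steps 2 and 3 feeds a fresh 8-hour capture into these two functions; when `next_max_cache_size` returns the current value unchanged, you have reached the acceptable rate of step 4.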

 

Posted by steveshe | 1 Comments

New Database Maintenance whitepaper released

We have just released a whitepaper that describes the recommended maintenance strategies for the databases that host content and configuration settings for SharePoint Products and Technologies. You can get it here. It covers:

·         Checking database integrity.

·         Defragmenting indexes by either reorganizing them or rebuilding them.

·         Setting the fill factor for a server.

·         Shrinking databases to recover unused disk space.

Posted by steveshe | 1 Comments

Welcome Roger Lamb to the blog space

For his first post he has provided a bunch of examples of dispose patterns for SharePoint that will be really helpful for anyone trying to write custom web parts for SharePoint and wanting to avoid all the myriad ways you can leak objects:

SharePoint 2007 and WSS 3.0 Dispose Patterns by Example

Posted by steveshe | 0 Comments

Overlapped Recycling And SharePoint: What Are The 64-bit Settings?

We strongly recommend that our customers move to 64-bit servers for MOSS and WSSv3 unless there is some significant reason that prevents it. It is also worth noting that according to our recent documentation this will definitely be the last version of SharePoint to run on 32-bit hardware. This brings up the question of whether or not you need to configure overlapped recycling on 64-bit servers.

While we have quite a few customers already on 64-bit servers I have not seen a lot of problems that would be addressed by these settings but I have seen a couple. This tells me that there is still value in configuring these settings on 64-bit servers. Unfortunately, I don’t have any solid data to make recommendations on what those settings should be.

That means the best I can offer is some thoughts on how I would figure it out. I would start by enabling a scheduled recycle each day that happens about 30 minutes before my first users start work. This makes good sense, as I consider a daily recycle to be good housekeeping. As mentioned in a previous post, recycling the worker process cleans out fragmentation and results in a more orderly workspace.

Next, I would get Performance Monitor data for my servers for a period of two weeks. These two weeks would ideally contain a few days of my highest user loads. It is also important that during these two weeks I’m not already experiencing problems like crashes or memory allocation related errors in the ULS or Application Event Logs.

While I can’t tell you exactly which days in your business are the most intensive, I can give an example to guide you. Many companies function on a monthly cycle where their peak loads occur at the end of each month. In this type of business I would gather my data for the last week of the current month and the first week of the next month. This would probably give me a solid representation of my heaviest loads and my lightest loads.

In my scenario I’m going to guess that 30% growth over the max I see in my Performance Monitor data is about the most I want to allow before I recycle my worker process. I will be using that number in the calculations below.

Once I have my data I would then determine the maximum size that the worker process ever reached over that two week period in terms of Process/Private Bytes and Process/Virtual Bytes. I would also gather the Memory/Available MBytes value from those same times where I saw the maximum values.

Once I had these values, I would first determine whether I have enough free physical memory to enable overlapped recycling. To do this I would compare the maximum Private Bytes value to the Available MBytes value, and if Available MBytes was not at least equal to ((Private Bytes value * 1.3) + 300MB), I would not enable overlapped recycling until I had added more physical memory to the system.
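The memory check above is simple arithmetic, but it is easy to fumble when you are staring at Performance Monitor output. Here is a minimal sketch of the same rule, with the 30% growth and 300MB headroom as defaults you can change; the function name is mine, and I round the grown value up to a whole MB to stay on the conservative side.

```python
def overlapped_recycling_memory_check(max_private_mb: int, available_mb: int,
                                      growth_pct: int = 30,
                                      headroom_mb: int = 300) -> tuple[bool, int]:
    """Return (ok, required_mb) for: Available MBytes >= Private Bytes * 1.3 + 300MB."""
    # Integer ceiling of max_private_mb * (100 + growth_pct) / 100, avoiding float rounding.
    grown_mb = -(-max_private_mb * (100 + growth_pct) // 100)
    required_mb = grown_mb + headroom_mb
    return available_mb >= required_mb, required_mb
```

With a 2000MB Private Bytes peak, you would need at least 2900MB available at the same point in time.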

Assuming I have enough free physical memory I would then use those maximum observed values plus my 30% growth factor to come up with the settings for the “Maximum Memory Used” and “Maximum Virtual Bytes” recycle settings. Here is an example:

Maximum Observed Private Bytes 2000MB
Maximum Observed Virtual Bytes 4000MB

Using these numbers I would configure the IIS Application Pool using the following settings:

Maximum Memory Used Value 2600MB
Maximum Virtual Bytes Value 5200MB
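The example values above are just the observed maxima plus the 30% growth allowance. A small sketch of that calculation, using integer math so the results come out as whole megabytes; the function name and dictionary keys are illustrative labels, not IIS property names.

```python
def recycle_limits(max_private_mb: int, max_virtual_mb: int,
                   growth_pct: int = 30) -> dict[str, int]:
    """Apply the growth allowance to the observed maxima (whole MB)."""
    def grow(mb: int) -> int:
        return mb * (100 + growth_pct) // 100

    return {
        "Maximum Memory Used (MB)": grow(max_private_mb),
        "Maximum Virtual Bytes (MB)": grow(max_virtual_mb),
    }
```

Calling `recycle_limits(2000, 4000)` reproduces the 2600MB and 5200MB values above.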

UPDATE 6/19/2008 

I thought it was worth coming back to this post and updating it. As of this date we have not seen any significant problems in this area with customers running on 64-bit platforms. Most of the customers I am aware of on 64-bit platforms are not using memory-based recycling settings. I do want to say that I would still recommend a scheduled nightly recycle, because it will help reduce any possibility of problems caused by fragmentation.

 

 

Posted by steveshe | 1 Comments

Overlapped Recycling And SharePoint: Tracking Recycle Events

As part of your efforts to properly configure your server for overlapped recycling, you will obviously need to know when a recycle takes place. It also makes sense that you would want to know about ALL recycle events, not just overlapped ones. To this end you will need to configure your IIS Application Pool to record all of these events in the Event Logs. Here is how to do this:

1.     Click Start, click Run, type cmd, and then click OK.

2.     Change to the directory where Adsutil.vbs is located. The following is the default directory location: %SYSTEMDRIVE%\Inetpub\AdminScripts

3.     Type the following command:

 

cscript adsutil.vbs Set w3svc/AppPools/[YourAppPoolName]/LogEventOnRecycle 255

In the command above, replace [YourAppPoolName] with the actual name of the application pool upon which you want to enable the events.

 

Note:

If your application pool name has a space in it, for example, “SharePoint - 80”, you must include double quotes around the metabase path in the command. Here is an example:

cscript adsutil.vbs Set "w3svc/AppPools/SharePoint - 80/LogEventOnRecycle" 255
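If you script this across several application pools, the quoting rule is easy to get wrong. Below is a small sketch that builds the command line and quotes the metabase path only when it contains a space. The bit values that add up to 255 are reproduced from my memory of the LogEventOnRecycle metabase reference, so double-check them there before relying on individual bits; the function and constant names are mine.

```python
# LogEventOnRecycle bit flags as listed in the IIS 6.0 metabase reference
# (reproduced from memory here; verify against that page).
RECYCLE_EVENT_FLAGS = {
    "Time": 1,
    "Requests": 2,
    "Schedule": 4,
    "Memory": 8,
    "IsapiUnhealthy": 16,
    "OnDemand": 32,
    "ConfigChange": 64,
    "PrivateMemory": 128,
}
ALL_RECYCLE_EVENTS = sum(RECYCLE_EVENT_FLAGS.values())  # 255


def log_event_on_recycle_command(app_pool: str, flags: int = ALL_RECYCLE_EVENTS) -> str:
    """Build the adsutil.vbs command line, quoting the metabase path if it has spaces."""
    path = f"w3svc/AppPools/{app_pool}/LogEventOnRecycle"
    if " " in path:
        path = f'"{path}"'
    return f"cscript adsutil.vbs Set {path} {flags}"
```

For "SharePoint - 80" this produces exactly the quoted command shown above.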

 

For those among you who are curious as to the details of what this all means, here are some links to explain:

How to modify Application Pool Recycling events in IIS 6.0
LogEventOnRecycle Metabase Property (IIS 6.0)

Posted by steveshe | 1 Comments

Overlapped Recycling And SharePoint: A Hidden Benefit

There is a hidden benefit to configuring the overlapped recycling values. I was lucky enough to gain the collaboration of Thomas Marquardt on a particularly nasty performance case, and he let me in on this little secret. Well, it’s not really a secret; it’s just so obscure that if you don’t know about it, and where to research it, it might as well be.

 

The secret is this: the ASP.NET Cache references the value entered into the “memory used” setting to govern how aggressively it tries to reduce the size of the cache. In a nutshell, if you don’t set the “memory used” value, the ASP.NET Cache will assume that when your process reaches 800MB of Private Bytes it is running out of memory, and it will begin to induce garbage collections in the managed heap to free up some space. The further past 800MB you go, the more aggressive it gets. As memory pressure increases, this will show up in a Performance Monitor trace as a sharp increase in the .NET CLR Memory/% Time in GC and .NET CLR Memory/# Induced GC counters for your worker process.

 

Note:

In his post Thomas mentions a very important code fix that was released as a hotfix prior to .NET 2.0 SP1 and is included in .NET 2.0 SP1. It adjusts how aggressively the ASP.NET Cache responds to low-memory conditions. Anyone running SharePoint should have the hotfix or the service pack installed. The KB article for the hotfix is: KB93876.

 

I won’t try to restate the details of how this works here because it is complicated and I would probably get it wrong. I also won’t try to simplify it, because Thomas has already done a very good job of mapping it out; you may just have to read it a few times to really understand it completely. You can find Thomas’ words here.

Posted by steveshe | 0 Comments

Overlapped Recycling And SharePoint: Configuring The Shutdown Timeout

The previous post was so short, I thought today I’d give you two posts. The shutdown timeout value determines how long IIS will allow the old worker process to finish the in-flight requests that were active at the time of the recycle event before it forcibly terminates the process and drops those requests. To be very clear here, if the application pool completes all of its outstanding requests before the timeout expires, then it will terminate at that time. If the timeout is hit before the requests are complete, IIS will forcibly terminate the worker process and the users will eventually receive a timeout error.

The default for the shutdown timeout setting is 90 seconds. This seems like a long time for most web applications; however, it may not be enough for a SharePoint server. Consider this scenario: you have configured your server to allow file uploads of up to 50MB. You have users who telecommute, and even some on other continents from time to time. Both represent opportunities for very long-running operations. When you throw these into the overlapped recycle scenario, you will probably end up exceeding that 90-second timeout value by a wide margin.
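To put a rough number on "a wide margin", here is a back-of-the-envelope sketch of how long a 50MB upload takes over a slow link. The 512 kbps figure is a hypothetical telecommuter upload speed, and the calculation ignores protocol overhead and assumes the user's link is the bottleneck, so real uploads will be somewhat slower still.

```python
def upload_seconds(size_mb: float, link_kbps: float) -> float:
    """Time to move size_mb megabytes over a link of link_kbps kilobits per second."""
    bits = size_mb * 1024 * 1024 * 8
    return bits / (link_kbps * 1000)


needed = upload_seconds(50, 512)  # hypothetical 512 kbps upstream link
for timeout in (90, 300):
    print(f"50MB at 512 kbps needs ~{needed:.0f}s; exceeds {timeout}s timeout: {needed > timeout}")
```

At roughly 819 seconds, such an upload would outlast even a 300-second timeout, which is why the event log check described next is worth doing.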

This is why you need to carefully configure the shutdown timeout when using overlapped recycling. Setting it too low results in intentionally dropping user requests, which is very harmful to the user experience. Setting this value too high results in a worker process that could potentially hang around too long.

How long is too long? Depending on how tight you are on free physical memory you could end up impacting overall server performance by keeping this large worker process around while other demands for server memory cause you to start paging.  Assuming you have followed the memory guidance from the previous post, Overlapped Recycling And SharePoint: What To Watch Out For, then you are probably relatively safe from this problem. That really only leaves the user impact to be considered.

In our documentation we recommend that you set this value at 300 seconds. This is probably going to meet the needs of most environments. The way to know whether you haven’t allowed enough time for all the longer-running requests to complete is to monitor the Application Event Log. If you see warning event messages like the one below, then you may want to increase the timeout further:

Event Type: Warning
Event Source: W3SVC
Event Category: None
Event ID: 1013
Date: 12/1/2007
Time: 13:07:21 PM
User: N/A
Computer: <ComputerName>
Description: A process serving application pool 'SharePoint - 80' exceeded time limits during shut down. The process id was '<xxxx>'.

 

 

In the end, it is entirely up to you how long you are willing to have that extra worker process hanging around.

Stop back by tomorrow, when I will give you my thoughts on: Overlapped Recycling And SharePoint: What Are The 64-bit Settings?

Posted by steveshe | 1 Comments

Overlapped Recycling And SharePoint: What To Watch Out For

The next section of the documentation on IIS Process Recycling that needs a little SharePoint context added is the section on “Considerations When Recycling Applications”. I believe this section may put a lot of people off of using overlapped recycling. Here is what it says:

When applications are recycled, it is possible for session state to be lost. During an overlapped recycle, the occurrence of multi-instancing is also a possibility.

Loss of session state: Many IIS applications depend on the ability to store state. IIS 6.0 can cause state to be lost if it automatically shuts down a worker process that has timed out due to idle processing, or if it restarts a worker process during recycling.

Occurrence of multi-instancing: In multi-instancing, two or more instances of a process run simultaneously. Depending on how the application pool is configured, it is possible for multiple instances of a worker process to run, each possibly loading and running the same application code. The occurrence of an overlapped recycle is an example of multi-instancing, as is a Web garden in which two or more processes serve the application pool regardless of the recycling settings.

If your application cannot run in a multi-instance environment, you must configure only one worker process for an application pool (which is the default value), and disable the overlapped recycling feature if application pool recycling is being used.

WOW! That is some scary-sounding stuff, especially if you don’t even understand what “session state” and “multi-instancing” are.

If I didn’t know that SharePoint explicitly supports and recommends overlapped recycling, I’d be a bit concerned about enabling this feature. Luckily for me, we recently published a set of recommended settings for configuring overlapped recycling on IIS Application Pools hosting SharePoint; you can find it here.

There is another thing you should be aware of when configuring overlapped recycling that they failed to mention in that documentation. You need to be absolutely sure you have enough free physical memory available to support multiple instances of the worker process.

For example, if you are running SharePoint on a server with only 1GB of physical memory (the minimum), then you probably don’t want to enable overlapped recycling. The reason is that when a recycle event occurs, IIS is going to create a new worker process alongside your existing one. If your existing worker process has a working set of 650MB and you only have 180MB of free physical memory, then the creation of that second worker process is going to cause a significant amount of paging and greatly impact performance. You should use Performance Monitor to get a solid understanding of the memory requirements for your worker process. You can then use that information to determine if overlapped recycling is a viable choice for you. If you do not have enough resources with your current configuration, I would suggest that it’s worth the effort and expense to upgrade your servers so you can take advantage of this great feature.

In the next post I will be talking about: Overlapped Recycling And SharePoint: Configuring The Shutdown Timeout

Posted by steveshe | 1 Comments

The documentation update has been completed

The original version of Planning and Deploying Service Pack 1 for Microsoft Office SharePoint Server 2007 in a Multi-server Environment contained an error. It stated that the recommended value for the overlapped recycling setting for Virtual Memory Used was 1300MB. This was incorrect. The document has been updated to reflect the correct value: 1700MB.
Posted by steveshe

Overlapped Recycling And SharePoint: Scheduled Recycling

As I have mentioned in previous posts, scheduled recycles are a good thing. You should view them as a basic process housekeeping task, kind of like washing dishes or doing laundry. Scheduled recycles are beneficial primarily because they ward off problems caused by both heap fragmentation and virtual memory fragmentation. For those of you who don’t know what fragmentation is, I’ll attempt to describe it here. First, I’ll give you an analogy:

At my restaurant I have a huge parking lot with 200 parking spaces. In the morning my parking lot is completely unfragmented, because each night all the customers take their cars to another lot or to their homes, so the lot starts each day empty.

Once I open my restaurant, people start parking in various parking spaces. This is not a problem because I have lots of spaces and most people are just picking up takeout food for breakfast, so they don’t stay very long. This means most of the cars are parked near the front of the lot and there are a lot of free spaces in large clumps. My parking lot is slightly fragmented.

As we near lunch time the lot starts to really fill up and cars are parking further out in the lot and some are staying a lot longer than others. This creates clumps of cars all over the lot with only a few single spaces in between the clumps. My parking lot is becoming heavily fragmented. This is still not a problem as long as I have at least one parking spot for each car. In other words, I can still fulfill all of the allocation requests.

Later in the afternoon a tour bus shows up with people who want to eat lunch at my restaurant. A tour bus needs at least 8 parking spaces side by side to park. My parking lot has 48 spaces free, but there are no groups of more than 6 spaces side by side. My parking lot is too fragmented to meet this request.

The bus driver is denied a parking space, and not because my parking lot is too small for a bus. I even have enough total parking spaces to park a bus; they are just not in a contiguous block.
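The tour bus situation can be modeled in a few lines. This Python sketch treats the lot as a list of spaces (False = free, True = taken) and asks whether a vehicle needing N side-by-side spaces fits. The post only gives the totals (200 spaces, 48 free, no free group larger than 6), so the exact layout below, 8 groups of 6 free spaces each followed by 19 taken spaces, is a hypothetical arrangement chosen to match those totals; the function names are also illustrative.

```python
def largest_free_run(lot: list[bool]) -> int:
    """Longest stretch of adjacent free spaces (False = free, True = taken)."""
    best = current = 0
    for taken in lot:
        current = 0 if taken else current + 1
        best = max(best, current)
    return best


def can_park(lot: list[bool], spaces_needed: int) -> bool:
    """A vehicle needing N side-by-side spaces fits only if a free run >= N exists."""
    return largest_free_run(lot) >= spaces_needed


# Hypothetical afternoon layout: 8 groups of 6 free spaces, each followed by
# 19 taken spaces, for 200 spaces in total.
lot = ([False] * 6 + [True] * 19) * 8

print(len(lot), lot.count(False))  # 200 48
print(largest_free_run(lot))       # 6
print(can_park(lot, 1))            # True: a car still fits
print(can_park(lot, 8))            # False: the tour bus does not
```

Note that the total free count (48) never enters the decision; only the largest contiguous run does, which is exactly why the bus is turned away.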

Technically, we’re all done with the analogy. But if you really want to have some fun, add in some abandoned cars (memory leaks) and people who scrape or bump the other cars while pulling in or out and then just quietly drive away (memory corruption).

To make the mental connection back to memory fragmentation simply replace the parking lot with your worker process’s virtual address space and replace the cars and the bus with memory allocations.

Now that you understand conceptually how memory gets fragmented, the following statements should make perfect sense to you. Fragmentation is a term generally used to describe the condition where the amount of a given resource as a whole is sufficient for a request to be satisfied, but there is no single block of that resource large enough to accommodate the entire request contiguously. This is true of memory and also of disk drives. Lots of people are familiar with disk fragmentation, so I’ll explain the difference.

What separates the memory problem from the disk problem is that when a large enough contiguous block of disk space cannot be allocated, the file system just breaks the data into chunks and spreads it around, linking each chunk to the next so they form a chain of chunks. This type of fragmentation has a performance impact as the heads are forced to fly back and forth across the disk to hunt down the chunks, but everything still works.

The same is not true for memory. As a general rule, most programming languages that support direct memory addressing assume that memory allocations are contiguous. This means that if I can’t get a single contiguous chunk of memory of the size I requested then the only valid response is to deny my request. In the .NET world that response comes in the form of an OutOfMemoryException.
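The contrast between the two allocation strategies can be sketched side by side. The Python below is a toy model, not how NTFS or the Windows heap manager are actually implemented: `allocate_contiguous` behaves like a memory allocator (one contiguous block or an outright failure, with Python’s `MemoryError` standing in for .NET’s OutOfMemoryException), while `allocate_chained` behaves like a file system that scatters chunks wherever free space exists. Both function names are hypothetical.

```python
def free_runs(space: list[bool]) -> list[tuple[int, int]]:
    """Return (start, length) for each run of free slots (False = free)."""
    runs, start = [], None
    for i, used in enumerate(space + [True]):  # sentinel closes a trailing run
        if not used and start is None:
            start = i
        elif used and start is not None:
            runs.append((start, i - start))
            start = None
    return runs


def allocate_contiguous(space: list[bool], size: int) -> int:
    """Memory-style first-fit: one contiguous block, or the request is denied."""
    for start, length in free_runs(space):
        if length >= size:
            for i in range(start, start + size):
                space[i] = True
            return start
    raise MemoryError(f"no contiguous block of {size} slots")


def allocate_chained(space: list[bool], size: int) -> list[int]:
    """Disk-style: take free slots wherever they are and chain them together."""
    free = [i for i, used in enumerate(space) if not used]
    if len(free) < size:
        raise OSError("not enough free space")
    chunks = free[:size]
    for i in chunks:
        space[i] = True
    return chunks


fragmented = [False, True] * 8   # 8 free slots, no two of them adjacent
try:
    allocate_contiguous(list(fragmented), 4)
    contiguous_ok = True
except MemoryError:
    contiguous_ok = False
print(contiguous_ok)                           # False
print(allocate_chained(list(fragmented), 4))   # [0, 2, 4, 6]
print(allocate_contiguous([False] * 16, 4))    # 0: a clean space works fine
```

The chained allocator succeeds at the cost of the caller having to follow a list of scattered pieces (the disk heads flying back and forth); the contiguous allocator has no such fallback.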

SharePoint lets you know when this happens by logging memory allocation failures in the ULS logs and the Application Event Logs, and occasionally even by reporting them to the client browser. If you believe you are getting these errors as a result of fragmentation, you should use Performance Monitor to examine the amount of Virtual Bytes being consumed by your worker process at the time of the error. If you are experiencing memory allocation failures when there is a significant amount of free memory available in the virtual address space, say more than 300MB, you MAY be suffering from fragmentation and should engage Microsoft Support for assistance. For the do-it-yourselfer, check out Yun Jin’s blog for a jump start.
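The 300MB rule of thumb turns into a quick calculation once you have the Process/Virtual Bytes value recorded at the time of the failure. The Python sketch below assumes a 32-bit worker process with the default 2GB user-mode address space (no /3GB switch and not large-address-aware); if your servers differ, change `USER_VA_SPACE_MB`. The helper names are hypothetical, and the result is only a hint that fragmentation is worth investigating, not a diagnosis.

```python
MB = 1024 * 1024

# Assumption: a 32-bit worker process with the default 2GB user-mode address
# space (no /3GB switch, not large-address-aware). Adjust for your servers.
USER_VA_SPACE_MB = 2048

# Rule of thumb from the text: failures with "more than 300MB" free MAY mean
# fragmentation rather than a genuinely full address space.
FRAGMENTATION_HINT_MB = 300


def free_virtual_mb(virtual_bytes: int, user_va_mb: int = USER_VA_SPACE_MB) -> float:
    """Address space not yet reserved, given the Process/Virtual Bytes counter."""
    return user_va_mb - virtual_bytes / MB


def may_be_fragmented(virtual_bytes_at_failure: int,
                      user_va_mb: int = USER_VA_SPACE_MB) -> bool:
    """True if an allocation failed while plenty of address space was free."""
    return free_virtual_mb(virtual_bytes_at_failure, user_va_mb) > FRAGMENTATION_HINT_MB


print(may_be_fragmented(1536 * MB))  # True: 512MB free, worth investigating
print(may_be_fragmented(1900 * MB))  # False: the address space is nearly full
```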

Note:

There are some operations in SharePoint that require vast amounts of memory to complete. Examples include using STSADM to back up large site collections, where the manifest for the backup may require several hundred megabytes, or using the Download a Copy feature with very large files. These operations may fail even when there is very little fragmentation present, simply because they require such large allocations. If you experience these types of failures, you may wish to wait until off-peak hours, recycle the worker process and then immediately attempt them again to take advantage of the pristine address space created by the recycle.

By recycling the worker process periodically, we clear out all of those little allocations that are breaking up the address space and start fresh. While fragmentation will eventually creep into the address space again, if I recycle the worker process at regular intervals it will probably not get bad enough to cause me problems.
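To make the effect of a recycle concrete, here is a deliberately crude Python simulation. The "workload" simply allocates or frees one unit at a random spot on each step, which is far simpler than real heap behavior; `ADDRESS_SPACE`, `churn` and `recycle` are hypothetical names, and the unit size is arbitrary. What it illustrates is the point made above: after a long run the largest free block shrinks, and a recycle hands the new worker process an untouched address space.

```python
import random

ADDRESS_SPACE = 1000  # hypothetical size, in arbitrary allocation units


def largest_free_block(space: list[bool]) -> int:
    """Longest run of free units (False = free, True = allocated)."""
    best = run = 0
    for used in space:
        run = 0 if used else run + 1
        best = max(best, run)
    return best


def churn(space: list[bool], steps: int, rng: random.Random) -> None:
    """Toy workload: each step allocates or frees one unit at a random spot."""
    for _ in range(steps):
        i = rng.randrange(len(space))
        space[i] = not space[i]


def recycle() -> list[bool]:
    """A recycle starts a new worker process with an untouched address space."""
    return [False] * ADDRESS_SPACE


rng = random.Random(42)            # fixed seed so runs are repeatable
space = recycle()
churn(space, 5000, rng)
fragmented_best = largest_free_block(space)
space = recycle()
fresh_best = largest_free_block(space)
print(fragmented_best, fresh_best)
```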

Check back later when we'll talk about: Overlapped Recycling And SharePoint: What To Watch Out For

Posted by steveshe

Overlapped Recycling And SharePoint: Memory Based Recycling

In previous posts I have already mentioned the values for Maximum Memory Used and Maximum Virtual Memory that we recommended in our recent whitepaper, but I want to address them directly for a minute. Let’s start by acknowledging that there is no lack of controversy over the proper values for these settings. Some teams within Microsoft believe that the Maximum Memory Used setting should never exceed 800MB and that the Maximum Virtual Memory setting should never exceed 1500MB. These are great conservative numbers, and if you can service your required user load with them without constantly recycling the worker process, then I would say that you should use them. They represent a “safe harbor” setting: with them, you will probably never see SharePoint logging allocation failures in your ULS or Application Event Logs.

Unfortunately, many of our customers are trying to get the last bit of life out of their 32-bit servers while moving from the less resource-intensive world of SPS2003 and WSSv2 to the more feature-rich and resource-intensive world of MOSS and WSSv3. This means that they are often willing to take a slightly more aggressive approach to memory management to buy time until their hardware leases expire or until they have completed testing on their new 64-bit hardware. So, while there may be some theoretical or anecdotal controversy around using the 1000MB/1700MB values (Maximum Memory Used/Maximum Virtual Memory), we have used them successfully in some very memory-intensive environments, and I feel comfortable that they will work for the vast majority of customers running SharePoint.
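If you manage many application pools, it can help to check each one’s settings against the two sets of numbers discussed above. The Python sketch below encodes only the values quoted in this post (800MB/1500MB as the conservative “safe harbor” and 1000MB/1700MB as the whitepaper values); the class, the function names and the profile labels are hypothetical, and reading the actual settings out of IIS is left to you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecycleProfile:
    name: str
    max_memory_used_mb: int     # the "Maximum Memory Used" setting
    max_virtual_memory_mb: int  # the "Maximum Virtual Memory" setting


# Values quoted in the post; the profile names are hypothetical labels.
SAFE_HARBOR = RecycleProfile("safe harbor", 800, 1500)
WHITEPAPER = RecycleProfile("whitepaper", 1000, 1700)


def fits(profile: RecycleProfile, used_mb: int, virtual_mb: int) -> bool:
    """True if both settings are at or below the profile's limits."""
    return (used_mb <= profile.max_memory_used_mb
            and virtual_mb <= profile.max_virtual_memory_mb)


def classify(used_mb: int, virtual_mb: int) -> str:
    """Name the most conservative published profile a pair of settings fits."""
    for profile in (SAFE_HARBOR, WHITEPAPER):
        if fits(profile, used_mb, virtual_mb):
            return profile.name
    return "beyond the published values: test and monitor closely"


print(classify(800, 1500))    # safe harbor
print(classify(1000, 1700))   # whitepaper
print(classify(1100, 1700))   # beyond the published values: test and monitor closely
```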

It is possible that in some environments you may even be able to push the Maximum Memory Used setting a little beyond 1000MB, but I would strongly recommend that you test it thoroughly and aggressively monitor your ULS logs and Application Event Logs for memory-related errors to ensure that you are not causing problems for your users. While I’m on the subject, whichever of the values above you choose, I would recommend that you regularly review your Application Event Logs and ULS logs to ensure that your system is healthy and stable; this is just good administrative practice.

There is also a hidden value in configuring the Maximum Memory Used setting. I was lucky enough to collaborate with Thomas Marquardt on a particularly nasty performance case, and he let me in on this little secret. Well, it’s not really a secret; it’s just so obscure that if you don’t know about it, and where to research it, then it might as well be.

The secret is this: the ASP.NET Cache references the value entered in the Maximum Memory Used setting to determine how aggressively it governs the cache. In a nutshell, if you don’t set the Maximum Memory Used value, the ASP.NET Cache will assume that it is running low on memory when your process reaches 60% of physical memory or 800MB of Private Bytes. It will then begin to induce garbage collections in the managed heap to free some memory, and the further past 800MB you go, the more aggressive it gets. As memory pressure increases, this shows up in a Performance Monitor trace as a sharp increase in the .NET CLR Memory/% Time in GC and .NET CLR Memory/# Induced GC counters for your worker process.
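As a rough mental model of that behavior, the Python below computes the Private Bytes level at which the cache would start to feel pressure. It is a simplification of the description above, not the ASP.NET source, and it makes two labeled assumptions: that a configured Maximum Memory Used value replaces the defaults outright, and that "60% of physical memory or 800MB" means whichever limit is reached first. Thomas’ post is the authority on the real rules; the function and constant names here are hypothetical.

```python
from typing import Optional

DEFAULT_PRIVATE_BYTES_MB = 800.0   # default quoted in the text
DEFAULT_PHYSICAL_FRACTION = 0.60   # default quoted in the text


def cache_pressure_threshold_mb(physical_memory_mb: float,
                                max_memory_used_mb: Optional[float] = None) -> float:
    """Private Bytes level at which the ASP.NET Cache starts to feel pressure.

    Simplified model, not the ASP.NET source:
    - if Maximum Memory Used is set, assume the cache keys off that value;
    - otherwise assume whichever default limit is reached first applies.
    """
    if max_memory_used_mb is not None:
        return float(max_memory_used_mb)
    return min(DEFAULT_PHYSICAL_FRACTION * physical_memory_mb,
               DEFAULT_PRIVATE_BYTES_MB)


print(cache_pressure_threshold_mb(4096))        # 800.0
print(cache_pressure_threshold_mb(4096, 1000))  # 1000.0
print(cache_pressure_threshold_mb(1024))        # 614.4 (60% of 1GB)
```

Under this model, a 4GB server running with the default settings starts inducing collections at 800MB of Private Bytes, well below the 1000MB whitepaper value, which is why setting Maximum Memory Used can change cache behavior so noticeably.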

I won’t try to restate the details of how this works here because it is complicated and I would probably get it wrong. I also won’t try to simplify it, because Thomas has already done a very good job of mapping it out; you may just have to read it a few times to understand it completely. You can find Thomas’ words here.

In his post Thomas mentions a very important code change that was released as a hot fix prior to .NET 2.0 SP1 and later included in .NET 2.0 SP1. It adjusts how aggressively the ASP.NET Cache responds to low memory conditions. Anyone running SharePoint should have either the hot fix or the service pack installed. The KB article for the hot fix is KB93876. In the right scenarios, configuring the Maximum Memory Used setting and applying this hot fix can provide dramatic improvements in performance.
Posted by steveshe