One of the updates in FIM 2010 RC1 Update 1, as David points out, is the ability to ensure that Requests and Workflows (WFs) created on a specific FIM Service Partition are executed only on the Service Partition they were created on. FIM Service Partitions matter only if you run two or more FIM Services connected to the same FIMService DB.
Background
To put this change in context, it helps to understand how Request processing and the WF host work in RC1.
Requests enter our system through the web service and are handled internally by the Request Processor. Any Workflows a Request generates are added to a WF queue, which is processed on a polite first come, first served basis.
This is fine for end users alone, but administrative tasks such as Sync exports or "run on policy update" can generate hundreds of thousands of Requests, and your end user Request with its associated Workflow(s) could then be at the end of a very long line before its Workflows execute. You would not want an end user who is creating a distribution group to wait behind 100k WFs kicked off by an export you just finished running from Sync.
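To make the "very long line" concrete, here is a minimal Python sketch comparing a single first come, first served queue with one queue per middle tier. It models only the queueing idea; the "admin"/"user" tags and item names are hypothetical and are not FIM's internal data structures.

```python
from collections import deque


def items_ahead_of_first_user_wf(queue):
    """Count queued items ahead of the first end-user WF in a FIFO queue."""
    for position, (source, _name) in enumerate(queue):
        if source == "user":
            return position
    raise ValueError("no end-user WF in the queue")


# Hypothetical scenario: a Sync export enqueues 100,000 admin WFs just before
# an end user asks to create a distribution group.
shared_queue = deque(("admin", f"sync-export-wf-{i}") for i in range(100_000))
shared_queue.append(("user", "create-distribution-group"))
print(items_ahead_of_first_user_wf(shared_queue))  # 100000

# The same work split across two middle tiers, each with its own queue.
queues_by_tier = {"admin": deque(), "user": deque()}
for source, name in shared_queue:
    queues_by_tier[source].append((source, name))
print(items_ahead_of_first_user_wf(queues_by_tier["user"]))  # 0
```

With a shared queue the end user's WF waits behind every admin WF; with a queue per tier it is first in its own line, which is the property the rest of this post is about preserving.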
Solution
The solution is the ability to set up separate middle tiers, where one or more process Sync\Admin Requests & the others handle your end user Requests and WFs.
In RC1 this was possible in principle, because each Web Service has its own WF queue and puts the WFs it generates into that queue.
The problem in RC1 was that when a FIMService starts, it can resume any Request or WF that was previously persisted to the FIMService DB (the four cases below). Therefore a Request or WF from the “Admin” portal can be resumed by the “User” portal and vice versa. At that point you no longer control your queue size and can’t protect your “User” Requests from being queued up behind Admin Requests.
Requests and WFs save their execution state in the following cases:
- Delay Activity – If the Workflow has a delay activity it persists its state to the FIMService DB while waiting for the delay to expire
- Service Stopping – When the FIMService is stopped, incomplete Workflows are unloaded & persisted to the FIMService DB
- Request Processing - Requests save their current state to the FIMService DB as they move through the processing pipeline
- Action WF Policy Enforcement – Some Action WF policy enforcement is executed asynchronously and persists the Request and/or WF to the FIMService DB, where it is later resumed by a FIM Service
In RC1 Update 1 we resolved this by introducing the concept of Service Partitions. A Service Partition is simply a unique way of identifying one or more servers that share the processing of Requests and WFs in the system.
In the above diagram I have created the following Service Partitions:
- FIMPortal.Contoso.com – Single Web Service instance for handling end user portal Requests
- FIMPassword.Contoso.com – NLB Web Services for handling password registration/reset Requests
- FIMAdminPortal.Contoso.com – For handling Sync & Admin Requests
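The difference between RC1 and RC1 Update 1 can be pictured as a filter applied when a FIM Service starts and looks for persisted work. The following Python sketch is a conceptual model only, assuming each persisted item records the partition it was created on; `PersistedItem` and `resumable_on_start` are hypothetical names, not FIM APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersistedItem:
    """Hypothetical stand-in for a Request or WF whose state is saved in the FIMService DB."""
    item_id: str
    kind: str       # "Request" or "Workflow"
    partition: str  # Service Partition the item was created on


def resumable_on_start(items, my_partition, honor_partitions=True):
    """Return the persisted items a starting FIM Service instance would pick up.

    honor_partitions=False mimics RC1 (any instance can resume anything);
    True mimics RC1 Update 1 (only items from the instance's own partition).
    Partition names are compared case-insensitively, as DNS host names are.
    """
    if not honor_partitions:
        return list(items)
    mine = my_partition.casefold()
    return [item for item in items if item.partition.casefold() == mine]


persisted = [
    PersistedItem("r1", "Request", "FIMAdminPortal.Contoso.com"),
    PersistedItem("w1", "Workflow", "FIMPortal.Contoso.com"),
    PersistedItem("w2", "Workflow", "FIMPassword.Contoso.com"),
]

rc1 = resumable_on_start(persisted, "FIMPortal.contoso.com", honor_partitions=False)
update1 = resumable_on_start(persisted, "FIMPortal.contoso.com")
print([item.item_id for item in rc1])      # ['r1', 'w1', 'w2']
print([item.item_id for item in update1])  # ['w1']
```

In this model both NLB password servers carry the FIMPassword.Contoso.com partition name, so either of them would resume `w2`, which is the intended behavior for a partition shared by several servers.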
You also need to decide which of these servers should process mail from Exchange & generate Requests. Those servers need to have polling enabled, as shown in the next section.
Steps to Configure
Configuring multiple middle tiers in your environment is pretty straightforward. Here are the steps to set up the above topology.
- Install FIM Service & Portal on your Admin portal environment
  - If you don’t want this service processing Requests made through Outlook to Exchange, ensure the polling checkbox is unchecked.
  - Set the FIM Service Server address (aka externalHostname) to the FQDN of your Admin web service (e.g. FIMAdminPortal.Contoso.com)
- Configure the FIM MA
  - Point the FIM MA at the FQDN of the Admin portal's web service
- Install FIM Service & Portal for your End User Portal
  - Set up this service to process mail from Exchange (assuming you are using Exchange 2007)
  - Set which email account you want this service to process mail from
    - Note: Each Service Partition that has polling enabled should have its own email account, to ensure it only processes mail from its own account
  - Set the FIM Service Server address (aka externalHostname) to the FQDN of your “User” web service (e.g. FIMPortal.Contoso.com)
  - For this & every other instance that will use the same database, choose to reuse the database.
- Install FIM Service & Portal on your Password web service machines (you can install just the FIM Service if you don’t need the portal)
  - Set the FIM Service Server address (aka externalHostname) to the FQDN of your “Password” web service (e.g. FIMPassword.Contoso.com)
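The note about giving each polling partition its own email account is easy to get wrong once you have several tiers. As a planning aid, here is a hypothetical Python check you could run over your own inventory of partitions; the dictionary shape and the mailbox address are assumptions for illustration, not a FIM configuration format.

```python
from collections import defaultdict


def shared_polling_mailboxes(partitions):
    """Return {mailbox: [partitions]} for mailboxes polled by more than one Service Partition.

    `partitions` maps a Service Partition name to a dict with the hypothetical keys
    "polling" (bool) and "mailbox" (str or None).
    """
    owners = defaultdict(set)
    for partition, settings in partitions.items():
        mailbox = settings.get("mailbox")
        if settings.get("polling") and mailbox:
            # Email addresses are compared case-insensitively.
            owners[mailbox.casefold()].add(partition)
    return {mailbox: sorted(names) for mailbox, names in owners.items() if len(names) > 1}


topology = {
    "FIMAdminPortal.Contoso.com": {"polling": False, "mailbox": None},
    "FIMPortal.Contoso.com": {"polling": True, "mailbox": "fimportal@contoso.com"},
    "FIMPassword.Contoso.com": {"polling": False, "mailbox": None},
}
print(shared_polling_mailboxes(topology))  # {}
```

An empty result means no two polling partitions share an account; any entry returned names the mailbox and the partitions that would compete for its mail.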
You now have a guarantee that FIMPortal will only ever execute Requests & WFs created by end users in the Portal, FIMAdminPortal will only ever execute those created through it or by Sync, & Requests & WFs in the FIMPassword partition can execute on either of the two NLB password servers.
Note: To take full advantage of this topology, you will also need to configure your clients to send password reset Requests, Outlook plug-in mails, & Portal links to the right servers. I will cover that in my next post.
FIM 2010 RC1 Update 1 is now making its way out via Microsoft Update. This is a cumulative release with many fixes across the product. Of particular interest from a performance perspective are some changes we have made to significantly improve performance in the password reset scenario.
I highly recommend you pick up RC1 Update 1 and try it out.
Scale
For the discussion of scale, the first question is the number of users & groups you expect to have in your system. These numbers are fairly easy to determine with a few queries of Active Directory or whatever directory store you use. Because many companies currently do not allow end users to manage groups, you first need to decide whether you are going to let your users create & manage groups, and if so, how that will affect your scale as the number of groups grows. In our case we are using 150k+ users & 400k+ groups in our testing.
The next thing to consider is how your configuration will affect your scale. Are you going to deploy calculated groups? If so, what types & how many will you use? Are you going to manage custom object types in your environment, such as computers?
Lastly, your policy objects have some impact on the scale of your deployment. In a cross-forest environment you may be provisioning FSPs into Sync. Using codeless provisioning will increase the number of Expected Rule Entries (EREs) & Detected Rule Entries (DREs) in your system; this can be roughly estimated as the number of Sync Rules for an object type multiplied by the count of objects of that type.
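The rule of thumb in the previous paragraph is simple arithmetic, sketched here in Python. The object counts are the test-scale figures quoted above; the number of Sync Rules per object type is a hypothetical placeholder you would replace with your own.

```python
def estimate_rule_entries(sync_rules_per_type, object_counts):
    """Rough ERE/DRE estimate per object type: Sync Rules for the type * objects of that type."""
    return {
        object_type: rule_count * object_counts.get(object_type, 0)
        for object_type, rule_count in sync_rules_per_type.items()
    }


# Object counts match the testing scale above; the Sync Rule counts are hypothetical.
estimate = estimate_rule_entries(
    sync_rules_per_type={"person": 2, "group": 1},
    object_counts={"person": 150_000, "group": 400_000},
)
print(estimate)                # {'person': 300000, 'group': 400000}
print(sum(estimate.values()))  # 700000
```

Even a couple of Sync Rules per object type can add hundreds of thousands of objects, which is why this belongs in your scale estimate and not only in your policy design.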
Load
Now that your system is configured & populated, you are ready to start rolling out your deployment. How are you going to know that you have the right hardware in place to sustain it?
In our case we started by estimating the expected usage of many of the key operations in the system for a given number of users.
- How often do we expect a request to join or leave a group?
- How often do we expect a user to create a static or dynamic group?
- What is the breakdown of security group usage vs. distribution list usage?
- How often do we expect a request to register for password reset or reset a password?
- How many computers have the FIM client installed? – At logon, the client periodically checks whether registration is required.
- Will users primarily use the Portal or the Outlook client for group management operations? – This will determine how much work is done by the Mail Listener as it polls Exchange & processes mails
- What is the usage pattern of your company? Do you expect peaks in the morning when users log in?
We estimated each of these, some based on hard data from another tool we had already deployed & some by educated guess, and from those estimates we created what we call a load profile.
Using the load profile, we created load tests in VSTS to simulate our deployment. VSTS is a great tool for performance testing; it is geared toward product development but can also be used to test a product deployment.
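One way to picture a load profile is as a list of operations, each with an estimated rate per user and the share of its daily volume that lands in the busiest hour. The Python sketch below turns such a profile into expected peak-hour counts. The structure and every rate in it are hypothetical placeholders, not our measured profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    name: str
    per_user_per_day: float  # estimated average occurrences per user per day
    peak_hour_share: float   # estimated fraction of a day's volume in the busiest hour


def peak_hour_load(users, operations):
    """Expected count of each operation during the busiest hour of the day."""
    return {op.name: users * op.per_user_per_day * op.peak_hour_share for op in operations}


# Every rate below is a hypothetical placeholder, not a measured value.
profile = [
    Operation("join or leave group", per_user_per_day=0.2, peak_hour_share=0.25),
    Operation("reset password", per_user_per_day=0.01, peak_hour_share=0.3),
]
for name, count in peak_hour_load(150_000, profile).items():
    print(f"{name}: ~{count:.0f} per peak hour")
```

The peak-hour share is what captures questions like "do we expect peaks in the morning when users log in?"; the resulting per-hour targets are the kind of numbers a load test can then be driven toward.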
How are you going to evaluate the readiness of your deployment to meet your scale & load requirements?
Policy Objects are the heart of your deployment, where you implement the business logic you need. They are also influenced by the scenarios (credential management, group management, user management, etc.) that you deploy. As such, no two deployments will be exactly the same, but to give you a data point I would like to cover a breakdown of what we are currently using in our deployment.
RC1 Out of Box vs. Sample Deployment
| Object Type | RC1 Out of Box (OOB) | Deployed |
|---|---|---|
| Sets | 71 | 96 |
| MPRs | 51 | 98 |
| Workflows | 12 | 58 |
| Sync Rules | 0 | 80 |
| Domain Configuration | 1 | 14 |
| Email Templates | 13 | 24 |
So how many objects do you plan to have in your deployment? How should you think about these policy objects in relation to performance?
Things to consider:
- Sets – Every operation in the system must be evaluated for its impact on a given Set. Some are simple transitions, such as someone’s building changing, but others have cascading effects, such as a manager change indirectly affecting other objects.
- MPRs – MPRs have two primary uses: granting rights & triggering WFs. As you build these out, you may find you increase your number of Sets to capture the various states you expect objects to transition into & out of. Because these transitions can then trigger WFs & additional work in the system, you need to be aware of this.
In RC0 we found a problem with our ability to scale with the number of MPRs in the system & have since done significant work to improve this. For example, beyond the OOB MPRs listed above, we have added 400 additional MPRs. Similarly, we are testing other core system object types to ensure we can meet your needs in deployment.
In my next post I will cover scale & load, what scale we are currently using & some questions to think about for load.
My goal with this post is to give you some data on the hardware we use internally for our testing, which you could use to help inform your own hardware needs. This is the hardware we use to evaluate the product for release & to confirm that it can support the workloads needed in MSIT. This hardware is not configured or meant for production, only for testing. For example, our hard drives are configured in RAID 0 to get the most out of fewer drives, whereas you would likely set yours up for redundancy, in which case you could add drives to get similar results.
As in my previous post about our topologies, there are two basic classes of hardware we use in our testing, which I will refer to as a standard machine & a performance machine.
| Component | Standard Machine | Performance Machine |
|---|---|---|
| CPU | 1 × quad-core Core 2 Q6600, 2.4 GHz | 2 × quad-core Xeon E5410, 2.33 GHz |
| Memory | 4 GB | 32 GB |
| Hard Drive | Single hard drive | 8 × 136 GB 10k RPM drives |
On the performance machine, the 8 hard drives are currently allocated as follows. Your situation will be different, but we have standardized our machines on a single drive configuration for both Sync & FIMService so that we can use any machine for either, depending on our testing needs.
| Drive | Drive Setup | Purpose |
|---|---|---|
| C | 1 × 136 GB 10k RPM drive | OS/Applications |
| E | 1 × 136 GB 10k RPM drive | SQL logs (.ldf files) |
| F | 6 × 136 GB 10k RPM drives (RAID 0) | SQL data files (.mdf files) |
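The F drive above trades redundancy for throughput. As a back-of-the-envelope sketch of what "adding drives" means if you want redundancy instead, the Python below compares RAID 0 with RAID 10. It assumes, as a simplification, that write throughput scales with the number of independent striped members each write is spread across; real arrays also depend on controller cache, stripe size, and read patterns.

```python
def raid_layout(level, drives, drive_gb):
    """Usable capacity and a rough write-throughput proxy for RAID 0 or RAID 10."""
    if level == 0:
        # Striping only: all capacity is usable and every drive takes a share of each write.
        return {"usable_gb": drives * drive_gb, "write_spindles": drives}
    if level == 10:
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        # Striped mirror pairs: each write lands on both drives of a pair.
        pairs = drives // 2
        return {"usable_gb": pairs * drive_gb, "write_spindles": pairs}
    raise ValueError(f"unsupported RAID level: {level}")


print(raid_layout(0, 6, 136))   # {'usable_gb': 816, 'write_spindles': 6}
print(raid_layout(10, 6, 136))  # {'usable_gb': 408, 'write_spindles': 3}
print(raid_layout(10, 12, 136)) # {'usable_gb': 816, 'write_spindles': 6}
```

Under these assumptions, matching the six-drive RAID 0 data volume with a redundant RAID 10 array takes roughly twice as many drives.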
In each of our performance rigs we have 2 performance machines & then as many standard machines as the topology requires. In our standard topology we use standard machines for the client & for the other test machines that run Visual Studio Team System (VSTS) to generate our load.
So far, the performance machines have always been allocated to SQL Server, since that is where the primary horsepower is needed for both Sync & FIMService. Each of these components uses resources in a different manner.
Synchronization Service
For the Synchronization Service (formerly MIIS), the place we have observed the most bang for your buck is the disk subsystem. Increasing your disk throughput will improve the performance of Sync the most; if you monitor Sync's disk IO, most of the time you will see the disk under the heaviest utilization.
Resource Utilization of Synchronization Service
Notice that the white line at the top is our SQL Server drive.
FIM Service
The FIM Service relies on SQL Server for a large amount of its processing; if you ran SQL Profiler on the system, you would observe some very large queries being executed. SQL queries handle many of the core services of our product, including query evaluation, rights enforcement, & set transitions. As such, for the FIMService DB, more CPU power will give you the best increase in performance.
Resource Utilization of FIMService
From the above you can see that we occasionally have spikes in memory usage & on the F drive, which corresponds to the SQL data file drive in this rig. Furthermore, in this case the FIMService & SQL Server are on the same machine, so we can compare the CPU utilization of both processes to see where the processing power is used most.
FIMService vs. SQL Server CPU Utilization

In the chart above, the highlighted black line is the CPU usage of SQL Server & the light blue line is the CPU usage of Microsoft.ResourceManagementService. This shows that, at least at this point in time, the primary consumer of your CPU, & therefore where you should invest in CPU power, is your SQL Server.
In my next post I will discuss policy objects & how we are thinking about them in our performance testing.
Customers have very different deployment needs for the product, each driven by their own business requirements. For our part, we have taken a two-pronged approach to testing the performance of FIM 2010 in different topologies. We typically test in what I will refer to as our Standard topology & then do additional testing in what I will refer to as our NLB topology.
Standard Topology
This is the standard configuration we generally test in. We look for bottlenecks & may adjust it over time if we find we need to use a different configuration.
NLB Topology
This is our more complex topology intended to give us insight into how the product performs both functionally & performance wise in a more complex deployment.
What topology are you planning to use for your deployment? Are there any specific situations you need to handle?
In my next post, I will talk about the hardware we are using in these topologies & cover what we are seeing for resource utilization & bottlenecks.
One of the areas I have been focused on recently has been testing the performance of our product. With the release of RC1, I thought I would start off with some insight into what you can expect with the product.
A key problem in figuring out the performance of a given product is the number of variables that affect the results you observe. Many customers have a simple question: can your product support my company of size X? While on the surface this is simple, a large number of variables play into the answer. For FIM 2010 there are a few key pieces of information needed to evaluate that answer.
- Topology - What topology do you plan to deploy the product in? Will SQL Server be on the same box as the FIMService? Will you be using a Network Load Balancer?
- Hardware - What hardware are you running on each piece of your topology? What are the CPU, memory, disk, & network? How are your drives configured? How is SQL configured to store your files?
- Policy Objects - Policy objects are a key component of FIM 2010. These include Sets, Management Policy Rules, Schema, Workflows, Sync Rules, etc. Depending on how you configure these, there will be additional work that the product must do & that will impact your performance.
- Scale - Typically scale is talked about in terms of the number of users, but in the case of FIM you also need to think about the other object types in the system depending on the solution you are deploying. How many groups & of what type? Do you have calculated groups? Do you have custom object types you are managing like computers?
- Load - How do you expect your system to be used? How often do you expect someone to create a group? What load do you expect from your Password Reset deployment? Do you expect users to use the Portal or the Outlook Add-in for Office 2007 more?
How you answer each of these questions will likely affect the performance of the product. This is a classic testing problem: we face a matrix of variables & need a way to answer these questions. My goal here is not to give you a definitive answer for your specific case, but to share how we have approached our testing, which can then inform your deployment.
For our own internal testing we have used feedback from our customers, & most specifically MSIT, to build a model of how our product will be deployed & perform. We have since been expanding this model to see how changes to some of these variables affect our performance.
In my next few posts I will discuss how we have approached each of these items, both for planning our testing & our eventual deployment.
Today FIM 2010 RC1 has been released. There have been a lot of improvements since RC0 which we believe you will enjoy, so download it, try it, & let us know your feedback.
http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=4bb3f16b-27f8-4c1d-922f-2c7b522d9ad6
After about 7 years at Microsoft, I have decided to join the ranks of bloggers. As such, I should start with a quick introduction.
I have been working at Microsoft for the past 7 years as a tester in what was originally Microsoft Metadirectory Services. In that time I have worked on shipping Microsoft Identity Integration Server 2003 (MIIS), Identity Lifecycle Manager 2007 (ILM), & most recently Forefront Identity Manager 2010 (FIM). As part of FIM 2010 I have been the test lead responsible for several areas of our product, including Schema, Sets, Management Policy Rules, Mail Services, Web Services, FIM MA, & Performance.
Over the course of these releases I have had the opportunity to learn from our customers & help improve the quality of our product. As a tester I have found the information gained from our customers invaluable in our ability to test & improve the product. As an example of how your feedback has helped us: if you are familiar with the MIIS 2003 Resource Kit 2, it includes two tools, MIIS Dynamic Help & MIIS Provisioning Assistant, that I developed based directly on feedback from our customers.
Additionally, I have found there is a wealth of knowledge about the product gained over the course of product development as we debug, investigate & fix issues in the product. My goal is to continue that two way communication where we continue to respond to your feedback to improve the product & we also help share information on how to debug, investigate, & implement our products.
If you have questions or topics of interest, please feel free to let me know & I will see what I can do to address them.