As you know from my recent posts, I've been hanging out at 1,200 concurrent Subscribers trying to boost performance. When I first hit the 1,200 Subscriber mark, I was able to change and replicate ~13 million rows per hour.  I was happy with the scalability, but the performance was no better than what I achieved with 600 concurrent Subscribers.  Rather than push the scalability envelope out to 1,800 or 2,400 Subscribers, I decided to tweak, poke and prod my portable data center until I could get better performance at the 1,200 level.  I succeeded with the architecture you see below:

[Diagram: architecture for 1,200 concurrent Subscribers]

Sometimes more is more.  Sometimes less is more.

Knowing that the ISAPI DLL running on IIS is the biggest bottleneck in the system, I decided to scale out to 6 IIS servers in addition to my separate SQL Publisher and SQL Distributor servers.  The 2 SQL Servers have 8 cores and 16 GB of RAM while the 6 IIS servers have 2 cores and 2 GB of RAM.  Each IIS server would accommodate 200 concurrent clients.  In the last week of December 2007, I throttled back the MAX_THREADS_PER_POOL registry setting on the IIS servers from the default of 20 to just 3 and ran my test harness.  This resulted in the changing and replicating of ~15 million rows per hour; a boost of 2 million rows per hour over my previous test.  Using fewer threads on each IIS box meant lower memory and CPU utilization across the board.  Instead of being overwhelmed with lots of threads trying to perform work all at the same time, SQL Server got to chill out and thus processed each sync much faster.  This was great news, so I pushed the fewer-threads experiment even further.  I executed my test harness with 2 threads and then just 1 thread per IIS server.  Using just 1 thread resulted in the changing and replicating of ~18 million rows per hour; a 3 million row per hour boost over using 3 threads per IIS server.
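To put the thread throttling in perspective, here is a rough back-of-the-envelope sketch in Python.  It assumes one thread pool per IIS server and that each pool thread services one sync at a time; neither assumption comes from the test harness itself, and the function name is made up for illustration.  Under those assumptions it shows how many syncs could be hitting SQL Server at once for each MAX_THREADS_PER_POOL value tried.

```python
# Rough model of how many syncs can be in flight against SQL Server at once.
# Assumptions (not confirmed by the post): one thread pool per IIS server,
# and each pool thread services exactly one sync request at a time.

SUBSCRIBERS = 1200   # concurrent Subscribers in the test
IIS_SERVERS = 6      # load-balanced IIS boxes


def concurrency(threads_per_pool: int) -> dict:
    """Return per-server load and the cap on simultaneous syncs."""
    per_server = SUBSCRIBERS // IIS_SERVERS
    return {
        "subscribers_per_server": per_server,
        "max_syncs_in_flight": IIS_SERVERS * threads_per_pool,
        "subscribers_per_thread": per_server / threads_per_pool,
    }


if __name__ == "__main__":
    # 20 is the default setting; 3, 2 and 1 are the throttled values tested.
    for threads in (20, 3, 2, 1):
        print(f"{threads:>2} threads/pool -> {concurrency(threads)}")
```

Under this simplified model the default setting lets up to 120 syncs pile onto SQL Server together, while 3, 2 and 1 threads cap that at 18, 12 and 6, with the remaining Subscribers waiting their turn at the IIS tier.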

At ~21 million row changes per hour, 2 threads per IIS server is the sweet spot!

  • Rows changed:  5,826 per second | 349,600 per minute | 20,976,000 per hour | 503,424,000 per day
  • Bytes per row: 116
  • Data replicated:  2.3 GB per hour  |  55 GB per day
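The figures above all follow from the hourly row count and the row size.  The short Python sketch below redoes that arithmetic; the helper names are invented for illustration, and it assumes "GB" here means binary gigabytes (2^30 bytes), which is a guess based on the numbers lining up rather than something stated in the test notes.

```python
# Re-derive the throughput figures from the hourly row count and row size.
# Assumption: "GB" is read as binary gigabytes (2**30 bytes).

ROWS_PER_HOUR = 20_976_000   # rows changed and replicated per hour
BYTES_PER_ROW = 116          # bytes per row


def row_rates(rows_per_hour: int) -> dict:
    """Convert an hourly row count to per-second, per-minute and per-day rates."""
    return {
        "per_second": rows_per_hour / 3600,
        "per_minute": rows_per_hour / 60,
        "per_day": rows_per_hour * 24,
    }


def to_gib(n_bytes: float) -> float:
    """Convert bytes to binary gigabytes."""
    return n_bytes / 2**30


if __name__ == "__main__":
    rates = row_rates(ROWS_PER_HOUR)
    bytes_per_hour = ROWS_PER_HOUR * BYTES_PER_ROW
    print(f"rows/second: {rates['per_second']:,.0f}")
    print(f"rows/minute: {rates['per_minute']:,.0f}")
    print(f"rows/day:    {rates['per_day']:,}")
    print(f"data/hour:   {to_gib(bytes_per_hour):.2f} GB")
    print(f"data/day:    {to_gib(bytes_per_hour * 24):.1f} GB")
```

The per-minute and per-day row counts reproduce exactly.  The data volumes come out slightly under the rounded figures quoted above, which is expected since the daily number was evidently derived from the already-rounded hourly one.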

The longest and average sync times dropped significantly over the first results I got with 1,200 concurrent Subscribers:

  • Longest sync time: 14 minutes
  • Shortest sync time: .6 seconds
  • Average sync time: 3 minutes, 38 seconds

The IIS servers didn't break a sweat:

  • IIS1: CPU: 11%  |  Mem: 172 MB  |  Network Utilization: .89%
    • Disk I/O: OS: .3%, ISAPI: 3.8%
  • IIS2: CPU: 8%  |  Mem: 167 MB  |  Network Utilization: .91%
    • Disk I/O: OS: .2%, ISAPI: 3%
  • IIS3: CPU: 6%  |  Mem: 171 MB  |  Network Utilization: .82%
    • Disk I/O: OS: .2%, ISAPI: 2.8%
  • IIS4: CPU: 7%  |  Mem: 171 MB  |  Network Utilization: .71%
    • Disk I/O: OS: .5%, ISAPI: 3.4%
  • IIS5: CPU: 6%  |  Mem: 152 MB  |  Network Utilization: .92%
    • Disk I/O: OS: .3%, ISAPI: 2.3%
  • IIS6: CPU: 8%  |  Mem: 151 MB  |  Network Utilization: 1%
    • Disk I/O: OS: .3%, ISAPI: 2.6%

The CPU on the SQL Publisher was finally well-utilized (after dozens of tests that never went higher than 35%), and the disk holding the transaction log on the SQL Distributor was pegged (which means it could use some RAID 0 or 10 medicine).

  • SQL Distributor: CPU: 9%  | Mem: 2.32 GB  |  Network Utilization: .64%
    • Disk I/O: OS: .7%, SQL: 1%, DB: 16.8%, LOG: 100%, Snapshot Share: 1%
  • SQL Publisher: CPU: 74%  |  Mem: 4.19 GB  |  Network Utilization: 4%
    • Disk I/O: OS: 1.1%, SQL: 13.7%, DB: 1%, LOG: 22.6%

I'm very pleased with these results as they represent the kind of scalability and performance that our clients are looking for when they're considering building and rolling out a mobile line of business application.  As usual, the low memory and CPU utilization on the IIS servers will lead architects to think that using 6 load-balanced boxes is wasteful and that they should be consolidated.  I've been down that path, and the place I've arrived at today tells me that the ISAPI DLLs are exhausted long before you can detect any strain on the IIS server.  That being said, the use of fewer threads means that I don't need the memory and CPU power I once thought I needed.  Lower-end IIS servers could be purchased, or perhaps consolidation could happen by deploying them as virtual images inside Hyper-V on Longhorn or Virtual Server on Windows Server 2003.  Definitely something worth looking at.

In the near term, you should expect to see me push the scalability envelope to the 1,800 and/or 2,400 concurrent Subscriber level in an effort to see what it takes to saturate a single SQL Server box.  Along the way, I will take a look at virtualization options to see how well they work out.  Lastly, you'll see me pursue "Republishing" architectures with SQL Server in an effort to make Mobile Merge Replication scalable enough to support hundreds of thousands or millions of Windows Mobile devices.  Only then could you consider using this technology for large-scale consumer applications with a national or global reach.  Remember, Windows Mobile 6 comes with a built-in content synchronization engine called SQL Server Compact 3.1.  When you start thinking big, you realize that we could use this technology to push intelligent advertising to devices or build the next global social networking platform designed for people on the go.

See you at TechReady 6!

- Rob