Holy Toledo!  The growth in data on our dogfood server just isn't slowing down.  The file count (in particular) is really rocketing.  We're now over 33 million (yes, million) files and I don't see any evidence that it's going to slow down soon.  A while ago, I predicted that we'd hit 100 million this year - I'm now starting to wonder if that was conservative.

Following up from last time, we've spec'd out and ordered a new SAN to better handle the I/O load we're putting on the system.  Here are the stats:

  • 43 disks
  • RAID 0+1
  • 4 disk shelves are needed to hold these 43 disks.
  • 31x300GB disks for data
  • 12x146GB disks for Log and TempDB
  • It's arranged as 15 paired 300GB data drives (and 1 spare), 4 paired 146GB TempDB drives and 2 paired 146GB Log drives.  As a conservative estimate, with each disk providing 125 IOPS (I/Os per second), the 30 data disks should give us a capacity of 3,750 IOPS on the data drive.  Currently we are peaking around 500 IOPS in production.  This should give us ample room to grow, especially since we are constantly making improvements that reduce the number of I/Os we do per operation.

    One thing to note about this configuration is that our TempDB drive set is much bigger than Log (most importantly in terms of spindle count, not GB).  We are using 8 drives for TempDB and 4 for Log.  This is because TFS is a pretty heavy user of TempDB.  On our production system, we are seeing TempDB peak at about 200 IOPS and Log at about 100 IOPS.
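The capacity math above can be sketched in a few lines of Python. This is only a back-of-the-envelope check, not a sizing tool: it counts every spindle in every pair, the same way the 3,750 figure for the data drive is derived, and it assumes the 125 IOPS per-disk estimate also applies to the 146GB disks, which the post does not state explicitly. The names `drive_sets` and `capacity_iops` are made up for illustration.

```python
# Conservative per-disk estimate quoted in the post. Applying it to the
# 146GB TempDB and Log disks as well is an assumption.
IOPS_PER_DISK = 125

# name: (mirrored pairs, observed production peak IOPS), figures from the post
drive_sets = {
    "Data": (15, 500),
    "TempDB": (4, 200),
    "Log": (2, 100),
}


def capacity_iops(pairs, iops_per_disk=IOPS_PER_DISK):
    """Estimated IOPS for a drive set, counting both disks of every pair."""
    return pairs * 2 * iops_per_disk


for name, (pairs, peak) in drive_sets.items():
    cap = capacity_iops(pairs)
    print(f"{name}: {cap} IOPS capacity, {cap / peak:.1f}x the current peak of {peak}")
```

Under those assumptions the data drive comes out at 3,750 IOPS, about 7.5 times the current peak, while TempDB and Log each come out at roughly 5 times their current peaks.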

    As I said, we continue to work on performance.  In the past few weeks, we've made the following improvements:

    • improved the performance of merge by 2X or more (we should see large merges drop from ~30 minutes to ~10-15 minutes).
    • made improvements to the version control History operation.  We’ve seen some queries take up to an hour, and they frequently take up to 10 minutes.  We’ve gotten them down to about 5 minutes and if we can make one more change we’re looking at, we should be able to get them down to 10-20 seconds.
    • are still investigating why syncing the work item tracking cache frequently takes a minute or more, resulting in long pauses in the UI.
    • resolved a problem with the NIC settings on the NC proxy server that caused downloads to be 10X slower.

    And on to the actual stats...  As I mentioned, they are growing, growing, growing.  But they would be even bigger except that we did a bunch of housekeeping this month.  We deleted some old projects that are no longer in use, along with hundreds of workspaces and thousands of shelvesets that had not been used in over 6 months.  This effort reduced some tables by thousands of rows and others by many millions.

    Users
    Recent users: 747 (up 68)
    Users with assigned work items: 1,958 (up 300)
    Version control users: 1,404 (up 90)

    Work items
    Work items: 103,832 (down 2,000)
    Areas & Iterations: 6,188 (up 100)
    Work item versions: 817,086 (up 45,000)
    Attached files: 32,839 (up 4,300)
    Queries: 10,299 (up 300)

    Version control
    Files/Folders: 33,580,679/5,646,201 (up 14M/3M)
    LocalVersion: 128.8M (up 18M)
    Total compressed file sizes: 243.8G (up 50G)
    Workspaces: 2,174 (down 530)
    Shelvesets: 2,952 (down 1,100)
    Checkins: 100,300 (up 19,000)
    Pending changes: 876,649 (up 480,000)

    Commands (last 7 days)
    Work Item queries: 335,821 (up 85,000)
    Work Item updates: 40,212 (up 3,500)
    Work Item opens: 193,782 (up 36,000)
    Gets: 25,066 (up 11,000)
    Downloads: 10.9M (up 6.5M)
    Checkins: 2,130 (down 1,600)
    Uploads: 212,493 (up 201,000)
    Shelves: 248 (down 200)

    Cheers,

    Brian