Cindy Gross: Small Bites of Big Data, Small Data, All Data

Small Bites of Big Data, Small Data, All Data for Hadoop, SQL Server, Hive, Distributed Systems, Scale Out...

  • Blog Post: Hadoop Likes Big Files

    One of the frequently overlooked yet essential best practices for Hadoop is to prefer fewer, bigger files over more, smaller files. How small is too small and how many is too many? How do you stitch together all those small Internet of Things files into files "big enough" for Hadoop to process...
  • Blog Post: Understanding WASB and Hadoop Storage in Azure

    Yesterday we learned Why WASB Makes Hadoop on Azure So Very Cool. Now let's dive deeper into Windows Azure storage and WASB. I'll answer some of the common questions I get when people first try to understand how WASB is similar to and different from HDFS. What is HDFS? The Hadoop Distributed File System...
  • Blog Post: Why WASB Makes Hadoop on Azure So Very Cool

    Data. It’s all about the data. We want to make more data-driven decisions. We want to keep more data so we can make better decisions. We want that data stored cheaply, easily accessible, and quickly ingested. Hadoop promises to help with all those things. However, when you deal with Hadoop on-premises...
  • Blog Post: Windows storport enhancement to help troubleshoot IO issues

    For Windows Server 2008 and Windows Server 2008 R2 you can download a Windows storport enhancement (packaged as a hotfix). This enhancement can lead to faster root-cause analysis for slow IO issues. Once you apply this Windows hotfix you can use Event Tracing for Windows (ETW) via perfmon or xperf to capture more...
  • Blog Post: What do those "IO requests taking longer than 15 seconds" messages on my SQL box mean?

    You may sometimes see stuck/stalled IO messages on one or more of your SQL Server boxes. It is important to understand what these messages mean, so here is some background information. Here is the message you may see in the SQL error log: SQL Server has encountered xxx occurrence...
  • Blog Post: Compilation of SQL Server TempDB IO Best Practices

    It is important to optimize TempDB for good performance. In particular, I am focusing on how to allocate files. TempDB is a unique database in several ways. The ones most relevant to this discussion are: it is often one of the busiest databases on an instance. This means the performance...
  • Blog Post: SQL Server with NetApp SAN

    If you are planning to use NetApp as the SAN for your SQL Server instance(s), take a look at these documents in addition to the normal SQL Server IO planning best practices documents. TR-3779: Sizing best practice guide (http://media.netapp.com/documents/tr-3779.pdf). TR-3696: This is for the storage...
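
The "Hadoop Likes Big Files" excerpt asks how to stitch many small Internet of Things files into files big enough for Hadoop. One simple approach, sketched below under stated assumptions, is to walk the small files in order (for example by timestamp) and group them into batches whose combined size stays near a target, then concatenate each batch. The 128 MB target (the default HDFS block size in Hadoop 2.x and later) and the helper names `plan_batches` and `merge_batch` are hypothetical choices for illustration, and byte-level concatenation only makes sense for line-oriented text such as CSV or JSON lines.

```python
import shutil
from typing import List, Tuple

# Hypothetical target size: 128 MB, the default HDFS block size in Hadoop 2.x+.
TARGET_BYTES = 128 * 1024 * 1024


def plan_batches(files: List[Tuple[str, int]], target: int = TARGET_BYTES) -> List[List[str]]:
    """Group (name, size_in_bytes) pairs into batches of roughly `target` bytes.

    Files are visited in the order given. A new batch starts whenever adding
    the next file would push the current batch past the target; a single file
    larger than the target ends up in a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in files:
        if current and current_size + size > target:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


def merge_batch(paths: List[str], out_path: str) -> None:
    """Concatenate the files in one batch byte-for-byte into out_path.

    Assumes line-oriented text where every input file ends with a newline;
    binary or self-describing formats (Avro, Parquet, ORC) need format-aware tools.
    """
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
```

For example, with a 100-byte target, files of 40, 50, and 30 bytes become two batches: the first two files together, then the third alone. Planning first and merging second keeps the sizing logic testable without touching any storage.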
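
The WASB posts contrast WASB with HDFS. One visible difference is addressing: instead of `hdfs://namenode/path`, WASB paths take the form `wasb[s]://<container>@<account>.blob.core.windows.net/<path>`, and on HDInsight `wasb:///<path>` refers to the cluster's default container. The sketch below splits such a URI into its parts. It assumes the public Azure endpoint suffix `.blob.core.windows.net`; the function and type names are hypothetical.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse

BLOB_SUFFIX = ".blob.core.windows.net"  # assumption: public Azure endpoint only


class WasbLocation(NamedTuple):
    secure: bool              # True for wasbs:// (TLS), False for wasb://
    container: Optional[str]  # None means "the cluster's default container"
    account: Optional[str]    # storage account name, None for the default container
    path: str                 # blob path inside the container


def parse_wasb_uri(uri: str) -> WasbLocation:
    """Split a wasb:// or wasbs:// URI into container, account, and path."""
    parts = urlparse(uri)
    if parts.scheme not in ("wasb", "wasbs"):
        raise ValueError(f"not a WASB URI: {uri!r}")
    secure = parts.scheme == "wasbs"
    if not parts.netloc:
        # wasb:///some/path resolves against the default container.
        return WasbLocation(secure, None, None, parts.path)
    host = parts.hostname or ""
    if not parts.username or not host.endswith(BLOB_SUFFIX):
        raise ValueError(f"expected container@account{BLOB_SUFFIX}: {uri!r}")
    account = host[: -len(BLOB_SUFFIX)]
    return WasbLocation(secure, parts.username, account, parts.path)
```

Here `urlparse` puts the container in the "user" position of the URI, which is why `parts.username` carries the container name and `parts.hostname` carries the account endpoint.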
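
The stalled IO post quotes the SQL error log message "SQL Server has encountered xxx occurrence...". When that message appears repeatedly, totaling the reported occurrences per file shows which files (and therefore which drives or LUNs) are slowest. The sketch below assumes the message continues as "occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [...]"; the wording after the file name varies between versions, so the pattern stops there. The function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed message shape (the text after the file name varies by version):
# "SQL Server has encountered 3 occurrence(s) of I/O requests taking longer than
#  15 seconds to complete on file [D:\Data\mydb.mdf] in database [mydb] (5). ..."
STALLED_IO = re.compile(
    r"has encountered (\d+) occurrence\(s\) of I/O requests taking longer than "
    r"\d+ seconds to complete on file \[([^\]]+)\]"
)


def stalled_io_by_file(lines: Iterable[str]) -> Counter:
    """Sum the reported stalled-IO occurrences per file across error log lines."""
    totals: Counter = Counter()
    for line in lines:
        match = STALLED_IO.search(line)
        if match:
            totals[match.group(2)] += int(match.group(1))
    return totals
```

Feeding it the lines of an exported error log gives a per-file total that can be compared against the storage layout.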
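
The TempDB post focuses on how to allocate files. One widely cited starting point from Microsoft guidance (not necessarily the exact recommendation in the post) is one data file per logical processor up to 8; beyond 8 processors, start at 8 and add files in groups of 4 only if allocation contention persists. Files should be equally sized, because SQL Server's proportional fill favors files with more free space. The sketch encodes that rule of thumb; the function names and the `extra_groups_of_4` parameter are hypothetical.

```python
from typing import List


def tempdb_data_files(logical_cpus: int, extra_groups_of_4: int = 0) -> int:
    """Rule-of-thumb TempDB data file count.

    Up to 8 logical CPUs: one file per CPU. Above 8: start at 8 and add
    groups of 4 (only if contention persists), never exceeding the CPU count.
    """
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be >= 1")
    if logical_cpus <= 8:
        return logical_cpus
    return min(8 + 4 * extra_groups_of_4, logical_cpus)


def equal_file_sizes_mb(total_mb: int, files: int) -> List[int]:
    """Split a TempDB data budget (MB) into equally sized files.

    Any remainder from integer division is left unallocated so every file
    stays the same size for proportional fill.
    """
    if files < 1:
        raise ValueError("files must be >= 1")
    return [total_mb // files] * files
```

For a 16-CPU server this starts at 8 files; an 8 GB budget then becomes eight 1024 MB files.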