SQL Remote Blob Storage Team Blog

Remote Blob Storage (RBS) is a library API set that is designed to move storage of large binary data (BLOBs) from Microsoft SQL Server to external storage solutions.

  • SQL Server Remote BLOB Store and FILESTREAM feature comparison

     

    With the recent refresh of the RBS Feature Pack a brief comparison between RBS and the SQL Server FILESTREAM feature may be useful.

     

    FILESTREAM and RBS are complementary approaches for storage and management of BLOBs and we see them both continuing to evolve.

    ·         FILESTREAM provides a storage option that allows storage, efficient streaming and integrated management of large BLOBs in a SQL database by utilizing the underlying NTFS file system for BLOB storage/streaming. It offers fully transactional access and compatible operations as varbinary(max).

    ·         RBS is a set of  standardized APIs that allow storage/retrieval of BLOBs outside of your main SQL database where a dedicated BLOB store is desirable for various reasons. This uses a provider model for plugging in any dedicated BLOB store that implements these RBS APIs.

    o   We have also built an out-of-the-box RBS FILESTREAM provider that allows a deployment to use a SQL Database (local or remote) as a dedicated BLOB store. This provider utilizes the FILESTREAM as the BLOB storage mechanism and ties the two technologies together.

     

    Both FILESTREAM and RBS will have continued investment over future releases of SQL Server. These are SQL Server features that application developers should feel comfortable taking dependencies on.

  • Remote Blob Storage November CTP Refresh Available

    The SQL Server 2008 R2 November Community Technology Preview Feature Pack is available for download now. This includes a refresh of the Remote Blob Storage package.

     

    Significant enhancements have been made in this release for performance and integration with SharePoint 2010. These changes include implementing the IDisposable interface in the SqlRemoteBlobCollectionManager and SqlRemoteBlobContext classes. This allows RBS to manage the lifetime of resources used by these objects and reuse them through a pooling mechanism.

    While this is not needed to ensure correctness, in order to take full advantage of the performance changes existing applications must be modified to use the IDisposable pattern:

     

    SqlRemoteBlobContext context = new SqlRemoteBlobContext(conn);

    // RBS operations

     

    Should be replaced with

     

    using (SqlRemoteBlobContext context = new SqlRemoteBlobContext(conn))

    {

        // RBS operations

    }

     

    Note that this release contains breaking changes preventing access to data stored in FILESTREAM blob stores using previous CTP releases. Uninstalling previous versions and removing the FILESTREAM blob store data prior to installing this one is recommended for configurations using FILESTREAM stores.

     

    SQL Server 2008 R2 November CTP Feature Pack link:

    http://www.microsoft.com/downloads/details.aspx?familyid=020EE0D5-BCE4-4A45-9D64-B0C49C8831E5&displaylang=en

     

    Install and configure Remote BLOB Storage on SharePoint Foundation 2010:

    http://technet.microsoft.com/en-us/library/ee663474(office.14).aspx

  • SQL Server 2008 R2 Remote Blob Store August CTP Available

    Updated RBS Packages are available as part of the SQL Server 2008 R2 August CTP release. This includes the new default FILESTREAM RBS provider along with many other enhancements and bugfixes.

    This provider is the first officially supported RBS Provider released by Microsoft and begins to bridge the gap between RBS and Filestream, allowing applications to code against the RBS API and use Filestream as the blob store. This is the default provider for RBS and will enable fast developer startup when investigating the SQL / RBS stack.

    In addition to the new provider, several features were added to the core RBS client libraries. These include asynchronous provider requests, deferred deletion of collections, maintainer optimizations and support for provider specific consistency checks.

    Administrators and developers can look forward to significant performance enhancements when these are made available in future CTP releases.

     

  • Exceptions to be Thrown by BlobStore Implementations

    I recently got a question on what Exceptions a BlobStore implementation should throw and this blog post answers that question. BlobStore implementations should throw either BlobStoreException or other exceptions.

     

    BlobStoreException is meant to be used for conditions that RBS client library expects and knows how to handle (either now or in the future). For this reason, there is a defined list of cases where BlobStoreException should be thrown. This is the list of values of the BlobStoreExceptionCode enum. If a condition falls in this list, then a BlobStoreException must be thrown. If a condition does not fall into this list, then a different exception must be thrown.

     

    So the next question is: which exception should a BlobStore implementation throw? .NET Framework guidelines say that as far as possible, we should throw system exceptions. This helps with easier handling of the error, possibly by higher level applications or by admins. In the case of RBS, the client library wraps any non-BlobStoreException Exceptions in another exception (RemoteBlobStoreException with ExceptionCode = BlobStoreUnhandledException) and sets the InnerException to the original exception thrown by the BlobStore. So applications will need to catch and handle this new exception. If LogLevel is configured to Error or higher, the log will also contain details of the exception as well.

     

    In general, it would be very rare to have a condition not covered by the list in BlobStoreExceptionCode. One example where it is not covered is present in the sample provider. The sample FileStoreLibrary throws non-BlobStoreExceptions such as ArgumentNullException. In this case, if this exception gets thrown, it indicates a bug in RBS client code, and it should not be hidden by a BlobStoreException.

     

    If other components/dlls called by a blob store provider can throw exceptions such as OutOfMemoryException, the BlobStore implementation should be careful about letting it bubble up directly. Often times the correct thing to do is to throw a BlobStoreException with BlobStoreExceptionCode set to OperationFailedAuthoritative. Remember, the list of conditions indicated by BlobStoreExceptionCode are the ones that RBS client library knows how to handle, and it always takes precedence.

     

    - Pradeep.

  • Provider Specific Configuration Keys

    When developing a provider it's possible to create provider specific configuration keys that administrators can use to change parameters for your provider. These must be defined at setup time in the machine.config registration of the provider otherwise any attempt by the application to create a ConfigItem object and pass it to the RBS client library will fail. This can be specified as follows:

    <Provider name="file">
        ...
        <
    ProviderSpecificConfigKey name="pool_capacity" format="Number"/>
    </
    Provider>

    Formats allowed are Number, Name, Boolean, Binary and Duration. When registering the blob store with the RBS instance, the stored procedure mssqlrbs.rbs_sp_add_blob_store is used. This takes an XML configuration object, where specific values for these custom configuration options can be specified.


    <
    blob_store_config>
        <
    common>
        ...

        </
    common>
        <
    extended>
            <
    config_item_list>
                <
    config_item key="pool_capacity" value="7" />
            </
    config_item_list>
        </
    extended>
    </
    blob_store_config>

    Correct registration of the provider and installation using the administrative stored procedures can be done using the InstallProvider.exe tool (full source and binaries available) from http://www.codeplex.com/sqlrbs.

    - mike 

  • RBS BlobId Columns in Application Tables - Registering and Indexing

    I recently got a question on RBS BlobId columns needing to have an index. This blog post expands on the answer. Feel free to send us any questions using either direct email, comments on this blog or the "Discussions" tab on the RBS Codeplex site.

     

    Registering and Unregistering

     

    An application needs to register its RBS BlobId columns by calling the stored proc rbs_sp_register_column, once for each BlobId column. This is a required step for all RBS applications and RBS behavior is undefined if this is not done. Similarly, an RBS BlobId column can be un-registered by calling rbs_sp_unregister_column. This stored proc should be called before dropping/renaming a registered column or dropping/renaming the table containing a registered column. This must be done even if the intent is to create another table/column with the same name soon after dropping them. The view rbs_columns shows the list of currently registered columns.

     

    Indexing

     

    RBS BlobId columns in application tables should be indexed - for each BlobId column there should be a separate index with that column as the only index key column. If such an index is not present for a column, the stored proc rbs_sp_register_column gives a warning (raises error with level 10): Column <ColumnName> in Table <SchemaName.TableName> is not indexed.

     

    This index is needed for garbage collection to work efficiently. The Reference Scan phase of garbage collection does a diff: (BlobIds RBS knows about) EXCEPT (BlobIds referenced by application tables). The resulting list of Blobs are the ones that were deleted by the application and should garbage collected. Since the number of blobs can be very large, this diff is done for a few blobs at a time. This translates to multiple range queries. For such a query to be efficient, we need indices on all BlobId columns.

     

    Note that garbage collection will still work even if the index is not present, but it will be terribly slow on large tables. So, it is highly recommended that an index is created on RBS BlobId columns.

     

    - Pradeep

     

  • RBS Garbage Collection Settings and Rationale

    I recently got a question on RBS Garbage Collection settings and their usage. So I decided to write this blog post describing the different parts of GC and the associated settings.

     

    RBS Maintainer does garbage collection (GC) in 3 phases:

    1.       Reference Scan (RS) - Look through the application tables and find blobs that are no longer referenced by the application. The list of registered RBS columns is used for this purpose. BlobIds must not be stored in any place other than the registered columns. The blobs that are no longer referenced by the application are marked to be deleted.

    2.       Delete Propagation (DP) - Blobs marked for deletion are actually deleted from the blob store. There is a gap between when the blobs get marked for deletion and when they are actually deleted. This gap duration can be configured using the "garbage_collection_time_window" config item and defaults to 30 days. The reason for having this GC time window is to allow restoring old backups of the RBS database. Backups as old as the time window (e.g. 30 days) can be restored and all blobs that were referenced by that database are guaranteed to be present in the blob store. If we had deleted the blobs immediately, restoring an old backup will lead to dangling pointers (some blobs referenced by the application are not present in the blob store). Having this gap ensures that if an old backup of the database is restored, blobs that were referenced by the application at the time the backup was taken (but deleted by the application later) are still present in the blob store. For this reason, this config item must be set to the SLA time period for backup/restore.

    3.       Orphan Cleanup (OC) - All blobs in the blob store are enumerated and we compute the list of blobs that are present in the blob store but are not known to RBS. These blobs are "orphans" and can be caused due to aborted transactions, application misbehavior or other failures. Orphan blobs created before the GC time window are deleted from the blob store.

     

    In addition to the GC time window setting, there are 2 more config items related to GC: "delete_scan_period" is the time period for running one scan of RS and DP phases of GC. After one pass of RS and DP is completed, attempting to run them again within this time period will just skip RS and DP and do nothing. Similarly, "orphan_scan_period" is the period for the OC phase of GC. These are also 30 days by default.

     

    These settings can be set by calling rbs_sp_set_config_value. The format for these config items is: 'days n' where n is a positive number. For testing purposes, it can also be set to sub-day durations using the format 'time hh:mm:ss'. A smalldatetime field is used internally, so the precision of this setting is 1 minute. It can also be set to 0 using 'time 00:00:00'.

     

    The actual work of GC is done by the RBS Maintainer application. The maintainer is a console application that takes command line parameters such as the connection string to the database and the phases of GC to execute. This can be run from any machine that has access to the DB and the blob store(s). It can also be run from multiple machines simultaneously. You can schedule it using your favorite scheduler e.g. Windows Task Scheduler.

     

    Maintainer also takes an optional parameter to limit the amount of time it is run. Here is a scenario showing how this time limit can be used in conjunction with the scan_period settings above for a GC schedule:

    ·         Maintenance window in production environment is 2 hours every day from 2 AM to 4 AM.

    ·         A complete GC pass takes 6-10 hours to run.

    ·         Customer wants to run one GC pass every week (7 days).

    Solution:

    ·         Schedule a Maintainer.exe task to run every day at 2 AM, and include the command line option "-TimeLimit 120" - this will stop maintainer after running for 2 hours, even if GC pass is not complete. The time limit is specified in minutes.

    ·         On the database, set the RBS config values of delete_scan_period and orphan_scan_period to 7 days.

     

    This way, Maintainer.exe will run on Mon-Thu (assuming 7 hours run time) and then do nothing on Fri-Sun instead of starting a new GC scan. Next Monday, a new GC scan is started.

     

    For testing purposes, you can do the following to verify that blobs are getting deleted from the blob store:

    1.       Set garbage_collection_time_window and delete_scan_period to 'time 00:00:00'

    2.       Delete BlobIds from the application table

    3.       Run Maintainer.exe, specifying RS and DP phases

    Blobs should get deleted from the blob store at this point.

     

    - Pradeep

  • RBS Introductory Post from Jose Barreto

    Jose Barreto, a member of the Storage Solutions Division Team here at Microsoft has put together a great post introducing RBS.

    http://blogs.technet.com/josebda/archive/2008/07/29/sql-server-remote-blob-store-rbs-available-as-part-of-the-feature-pack-rc0-for-microsoft-sql-server-2008.aspx

     - mike

  • BlobId vs BlobReference, Collections

    Pradeep just sent out a good description of the differences between an RBS BlobId and BlobReference token, and a quick description of the current state of the collection architecture.

    Both a BlobId and BlobReference are generated by RBS. The ID from the provider (called StoreBlobId) is not exposed to the application directly. The BlobId is about 20-32 bytes and needs to be stored by the application in a registered RBS column (of type varbinary(64)). This is required (storing BlobId); if it is not stored, GC (garbage collection) will delete the blob. While reading a blob, the application can specify either the BlobId or BlobReference. The BlobReference can be retrieved from the database using the TSQL function mssqlrbs.rbs_fn_get_blob_reference(blob_id). The BlobReference contains all the data needed to locate the blob in the blob store and is typically larger than the BlobId. If only the BlobId is specified to read a blob, RBS will internally go to the database and retrieve the BlobReference and then proceed to read the blob data. So, the advantage of using the BlobReference instead of directly using the BlobId is that it avoids the extra network round-trip to the database.

    Collections are logical groupings of blobs and are also the unit of migration (from one DB to another). Migration is not yet implemented, but will be implemented in the next version of RBS. Those applications that want to be able to migrate a set of blobs from one DB to another (perhaps for load-balancing reasons) need to create collections and keep track of them and use them when creating RBS blobs. For other applications that don't need the complexity, using the default collection (CollectionId = 0) should be enough.

    - mike

  • RBS Sample Application Released

    The latest release on the RBS CodePlex sample site (http://www.codeplex.com/sqlrbs) contains a sample application demonstrating the various ways to store and fetch blobs using the RBS interface. This includes basic store / fetch, specifying collections / blob stores for store calls, use of the push and pull streaming APIs, and an example that uses the System.IO.Stream APIs to do asynchronous reads from the blob store.

    These samples have been designed to demonstrate best practices for using RBS, including correct Transaction usage, SQL connection information and BlobReference / BlobId optimizations. They require a database and client that has correctly installed RBS and installed a provider to run.

    Feel free to leave comments or file CodePlex issues if you have problems or need assistance.

     - mike

     

  • Remote Blob Storage RC0 Release

    Hi, and welcome to the Remote Blob Storage team blog. This first post is to announce our client release as part of SQL Server RC0's Feature Pack downloads, as well as point people to the RBS Samples link on CodePlex.

    Remote Blob Storage (RBS) is a library API set that is designed to move storage of large binary data (BLOBs) from Microsoft SQL Server to external storage solutions.

    With RBS, blob data is stored in external storage solutions such as Content Addressable Stores (CAS), commodity hardware with data integrity and fault-tolerance systems or mega service storage solutions. A reference to the blob is stored in the database. An application stores and accesses blob data by calling into RBS client library. RBS manages the lifecycle of the blob by doing garbage collection as and when needed.

    RBS gives applications the ability to use rich relational capabilities of SQL Server for their structured data along with capabilities of dedicated storage solutions for their unstructured data in a transactionally consistent manner. Many applications try to do this today on their own, attempting to tie blob data and metadata together with varying degrees of success. With RBS, this functionality is included in the SQL Server platform along with additional services around it. This is useful for applications such as Sharepoint and other document management software, archival systems etc.

    RBS is designed as an add-on that can be applied to Microsoft SQL Server 2008 and above. It uses auxiliary tables, stored procedures and a managed client library to provide its services. A reference to the blob (provided by the Blob Store) is stored in RBS auxiliary tables and an RBS Blob ID is generated. Applications store this RBS Blob ID in a column in application tables, however this is not a new data-type; it is just a simple varbinary(64).

    ISVs and storage solution vendors can create their own RBS Provider Library to allow the use of custom stores with applications written against the RBS API set. This will be explained in further detail in future blog posts - for now you can download a sample provider from the RBS Samples link on CodePlex.

    Current public issues with the RBS release are also tracked at the CodePlex RBS Work Item list.

    - mike


© 2009 Microsoft Corporation. All rights reserved. Terms of Use  |  Trademarks  |  Privacy Statement
Microsoft
Page view tracker