A couple years ago I wrote a prior post on how a simple VBS script can be used to extract a document from a SharePoint 2003/WSS 2.0 database (Recover Documents from SharePoint 2003 Database). After seeing the traffic routing to the post and with the adoption rate of MOSS 2007 it looks like an update is necessary.
This approach obviates the need to restore a content database into a MOSS 2007 environment if the intention is to extract a few critical documents.
A few disclaimers are necessary. This script goes directly against a MOSS 2007 content database which is generally discouraged. Any code using the database directly will not be supported by MS Product Support Services. Instead, the SharePoint and WSS APIs are the way to go. If the database is modified directly rather than through the published SharePoint and WSS APIs Product Support Services cannot properly troubleshoot any unexpected issues. This script reads from the database so we're safe from errant modifications. However, the data structure the script queries could change in a future service pack.
With that out of the way, let's proceed.
The script queries the dbo.AllDocs table which contains the application documents and retrieves the most current version based on the document name which is then streamed out as binary data to a file:
server = "[SERVERNAME]"contentDatabase = "[CONTENTDATABASE]"leaf = "[LEAFNODE]"outputPath = "[OUTPUTPATH]"
ExtractDoc server, contentDatabase, leaf, outputPath
Sub ExtractDoc(server, contentDatabase, leaf, outputPath)
Dim conStr, selectStr
conStr = "Provider=SQLOLEDB;data Source=" + server + ";Initial Catalog=" + contentDatabase + ";Trusted_Connection=yes"
selectStr = "SELECT dbo.AllDocStreams.Content FROM dbo.AllDocs " selectStr = selectStr + "INNER JOIN dbo.AllDocStreams " selectStr = selectStr + " ON dbo.AllDocs.ID= dbo.AllDocStreams.ID " selectStr = selectStr + " AND dbo.AllDocs.Level = dbo.AllDocStreams.Level " selectStr = selectStr + " where LeafName='" + leaf +"' AND IsCurrentVersion=1"
Set cn = CreateObject("ADODB.Connection") Set rs = CreateObject("ADODB.Recordset") cn.Open conStr Set rs = cn.Execute(selectStr) Set mstream = CreateObject("ADODB.Stream") mstream.Type = 1 mstream.Open mstream.Write rs.Fields("Content").Value mstream.SaveToFile outputPath, 2 rs.Close cn.Close End Sub
Copy this code into Notepad and replace [SERVERNAME], [CONTENTDATABASE], [LEAFNODE] and [OUTPUTPATH] with appropriate values. Save this file as a VBS script and execute from the command line as:
The SQL Query is a bit more complicated than the SharePoint 2003 version. It joins the dbo.AllDocs table with the dbo.AllDocStreams table which actually contains the blob Content field. There is also a dbo.AllDocVersions table, however, with versioning enabled this table does not appear to be updated as new versions are added. With each new version a new row is added to both the dbo.AllDocs and dbo.AllDocStreams tables. Conveniently, there is an IsCurrentVersion boolean field in the dbo.AllDocs table. The join between the dbo.AllDocs and dbo.AllDocStreams is done between the mutual uniqueidentifier ID fields and a Level field which appears to increment with each new version.
The LeafNode is the name of the file to retrieve. This sample script assumes that the document is in the root of the containing document library. If it were in a subdirectory an addtional DirName would need to be used in the query and passed as a parameter.
NOTE: This was tested with a MOSS 2007 content database. This was not tested with a WSS 3.0 content database, however, I expect the schema is the same.
This is certainly not something to use in a production environment where automated document retrieval must be a repeatable and reliable process. But it is a quick and dirty means of extracting a document from a restored database that spares you from the overhead of restoring a SharePoint/WSS environment.
This was written using a simple VBS script so that production support folks can use it easily without having to compile a .NET assembly.
How to recover SharePoint document once deleted from recycle bin - He talks about using textcopy, which is part of the SQL Server 2000 Resource Kit, to perform the same task. In addition he has a number of screenshots and additional instructions which may prove helpful. The blog entry mentions that, although textcopy is not officially supported for use with SQL Server 2005, it does work. The kit is available for download if you are a MSDN subscriber. Otherwise, it's included in the book,
PingBack from http://blogs.msdn.com/field_notes/archive/2006/09/17/recover-documents-from-sharepoint-database.aspx
Excellent timing, i spent some time yesterday trying to sort this out. Works perfectly thank you.
With your tool I finally restore a file that was overwrite (not deleted) by a user with other different file.
First I restore the previous backup in a test sharepoint environment, only the content database, and then use your tool to extract it from the restored database.
Perfect!!! very helpfully.
Thanks a lot.
In the case of an overwritten file in a document library it sounds like versioning is enabled. You may have been able to recover the document by downloading the specific version rather than through the use of this script. Nonetheless, I'm glad it worked for you.
I recently threw together a little tool I thought might be helpful to those who find your script helpful. It is instead a .NET 3.5 Windows Application version of the script you have above. It is completely untested in any environment other than my own and am not entirely sure yet if it will do the trick for anyone else, but, if you don't mind being an alpha tester of sorts, you're welcome to play with it. There is a link to the application on my website, or you can go directly to http://www.jonmedel.com/MOSSFileGrabber.rar
Please let me know if this tool is helpful to anyone! Jon
Fantastic, thank you! I needed to extract one particular version of a document that MOSS was seeing as corrupt.
Maybe this will help someone else.
I started with a SQL query to find the correct version:
The sql query that I used to identify the doc and version:
SELECT dbo.AllDocs.TimeLastModified fields , dbo.AllDocs.DirName, dbo.AllDocs.LeafName, dbo.AllDocs.Version
INNER JOIN dbo.AllDocStreams
ON dbo.AllDocs.ID = dbo.AllDocStreams.ID
AND dbo.AllDocs.Level = dbo.AllDocStreams.Level
where dbo.AllDocs.LeafName='my file name aka 'LEAFNODE'
I then replaced 'AND IsCurrentVersion=1' with 'dbo.AllDocs.Version=[version number from table]' and successfully extracted the particular version of the file that I needed.
Thanks so much for this excellent utility! Saved me a bunch of hassle!
The script is extremely helpful and I'm sure I will use it - however my current problem is that the file I want to recover has an ' in the name. The script errors out with a syntax error even though i've encased it with "". can anyone tell me how to get around this issue?
When I run this I get the following error..
Error: Write to file failed.
The target path is C:\temp.
Any suggestions would be greatly appreciated.
Sorry for the late response. By now, you've probably figured this out. The output path needs to be a path to a file, not a just a directory.
I have tried to run the script but keep on getting this message:"Microsoft OLE DB Provider for SQL Server: Invalid object name 'dbo.AllDocStreams'.
Any help would be appreciated.
As this is my first time trying to recover document from a share point that crashed.