
[Experience Sharing] Inside SharePoint: Creating an External Storage Solution for SharePoint (reposted)

Posted on 2015-9-28 09:56:00
Inside SharePoint: Creating an External Storage Solution for SharePoint
Pav Cherny

Code download available at: ChernySharePoint2009_06.exe (2,006 KB)

Contents
Internal Binary Storage
External Binary Storage
Building an Unmanaged EBS Provider
Building a Managed EBS Provider
Registering an EBS Provider in SharePoint
Implementing Garbage Collection
Conclusion
Microsoft estimates that as much as 80 percent of the data stored in Microsoft Windows SharePoint Services (WSS) 3.0 and Microsoft Office SharePoint Server (MOSS) 2007 content databases is non-relational binary large object (BLOB) data. Examples include Microsoft Office Word documents, Microsoft Office Excel spreadsheets, and Microsoft Office PowerPoint presentations. Only 20 percent is relational metadata, which implies a suboptimal use of Microsoft SQL Server resources at the database back end. SharePoint does not take advantage of the features for unstructured data introduced in SQL Server 2008, such as the FILESTREAM attribute or the Remote BLOB Storage API. Instead, it provides its own options to increase the storage efficiency and manageability of massive data volumes.
Specifically, SharePoint includes an external binary storage provider API, ISPExternalBinaryProvider. Microsoft first published it as a hotfix in May 2007 and later incorporated it into Service Pack 1. The ISPExternalBinaryProvider API is separate from the Remote BLOB Storage API. Third-party vendors can use this API to integrate SharePoint with advanced storage solutions, such as content-addressable storage (CAS) systems. You can also use it to keep SharePoint BLOB data on a central file server outside the content databases, if you want to build a custom solution that increases storage efficiency and scalability in a SharePoint farm. Keep in mind, however, that this API is specific to WSS 3.0 and MOSS 2007. It will change in the next SharePoint release, so you will have to update your provider.
In this column, I discuss how to extend the SharePoint storage architecture by using the ISPExternalBinaryProvider API. The topics include advantages and disadvantages, implementation details, performance considerations, and garbage collection. I also discuss a 64-bit compatibility issue in Microsoft Visual Studio that can cause SharePoint to fail to load managed ISPExternalBinaryProvider components, even when the interface implementation is correct. Where appropriate, I refer to the ISPExternalBinaryProvider documentation in the WSS 3.0 SDK. Another reference worth mentioning is Kyle Tillman's blog.
Kyle does a great job explaining how he overcame the implementation hurdles in managed code. However, neither the WSS 3.0 SDK nor Kyle's blog post includes a Visual Studio sample project. For that reason, I decided to provide ISPExternalBinaryProvider samples in both unmanaged and managed code in this column's companion material. These samples are meant to help you get started if you are interested in integrating external storage solutions with SharePoint. Remember, though, that they are untested and not ready for production use.
                                         
Internal Binary Storage                                         
By default, SharePoint stores BLOB data in the Content column of the AllDocStreams table in the content database. The obvious advantage of this approach is straightforward transactional consistency between relational data and the associated non-relational file contents. For example, it is not complicated to insert the metadata of a Word document into a content database along with its unstructured content. Nor is it complicated to associate metadata with the corresponding unstructured content in select, update, or delete operations. The main disadvantage of the default approach, however, is an inefficient use of storage resources. Despite an I/O subsystem optimized for high performance, the SQL Server storage engine is not a file-server replacement.
A SQL Server database consists of transaction log and data files, as illustrated in Figure 1. To ensure reliable transactional behavior, SQL Server first writes all transaction records to the log file. Only then does it flush the corresponding data in 8KB pages to the data file on disk. Depending on the selected recovery model, this can require more than twice the BLOB size in storage capacity until you perform a backup and truncate the transaction log.

Moreover, SQL Server does not store unstructured SharePoint content directly in data pages. Instead, it uses a separate collection of text/image pages and stores only a 16-byte text pointer to the BLOB's root node in the data row. Text/image pages are organized in a balanced tree, yet there is only one collection of text/image pages per table. For the AllDocStreams table, this means that the content of all files is spread across the same text/image page collection. A single text/image page can hold data fragments from multiple BLOBs, or it may hold intermediate nodes for BLOBs larger than 32KB.
Figure 1 Default SharePoint BLOB storage in SQL Server
Let's not dive too deeply into SQL Server internals, though. The point is that reading unstructured content takes several steps. SQL Server must go through the data row to get the text pointer, and then through the BLOB's root node and possibly additional intermediate nodes to locate all data fragments. These fragments can be spread across any number of text/image pages. SQL Server performs I/O at the page level, so it must load each of those pages into memory in full to get all the data blocks. These complexities impair file-streaming performance compared with direct access through the file system.

SQL Server also imposes a hard size limit of 2GB on SharePoint, because this is the maximum capacity of the image data type (2^31 - 1 bytes). The Content column of the AllDocStreams table is an image column, so you cannot store files larger than 2GB in a SharePoint content database.
                                         
External Binary Storage                                         
The ISPExternalBinaryProvider API offers a clever alternative to internal BLOB storage in SharePoint content databases. It is a straightforward COM interface with only two methods, StoreBinary and RetrieveBinary, which you can use to implement an External Binary Storage (EBS) provider. For architecture details, see the topic "Architecture of External BLOB Storage" in the WSS 3.0 SDK.
SharePoint loads your EBS provider when you set the ExternalBinaryStoreClassId property of the local SPFarm object (SPFarm.Local.ExternalBinaryStoreClassId) to the provider's COM class identifier (CLSID). SharePoint then calls the provider's StoreBinary method whenever you submit BLOB data, for example when you upload a file to a document library. The EBS provider has two options:

- Store the BLOB in its associated external storage system and return a corresponding BLOB identifier (BLOB ID) to SharePoint.
- Set the pfAccepted parameter of the StoreBinary method to false to indicate that it did not handle the BLOB.

In the second case, SharePoint stores the BLOB in the content database as usual. If the EBS provider accepts the BLOB, SharePoint inserts only the BLOB ID into the Content column of the AllDocStreams table, as indicated in Figure 2. The BLOB ID can be any value that enables the EBS provider to locate the content in the external storage system. Examples are a filename, a file path, a globally unique identifier (GUID), or a content digest. The sample providers in the companion material, for instance, use GUIDs as filenames for reliable identification of BLOBs on a file server.
Figure 2 Storing a SharePoint BLOB in an external storage system
SharePoint also keeps track of externally stored files by setting the highest bit of their DocFlags value to 1. DocFlags is a column of the AllDocs table. When a user requests to download an externally stored file, SharePoint checks DocFlags and passes the Content value from the AllDocStreams table to the RetrieveBinary method of the EBS provider. In response, the EBS provider must retrieve the indicated BLOB from the external storage system. It then returns the binary content to SharePoint in the form of a COM object that implements the ILockBytes interface. Note that SharePoint does not call the RetrieveBinary method for BLOBs stored directly in the content database.
Note also that the storage and retrieval processes are transparent to the user, as long as the user doesn't attempt to bypass SharePoint. This transparency has several consequences:

- You don't need to replace built-in Web Parts with custom versions that tie metadata in a list to a document stored externally.
- Productivity applications, such as Microsoft Office, don't need to know how to store metadata in one place and the document in another.
- Search does not need to process metadata separately from documents.

Moreover, users must go through SharePoint to access externally stored BLOB data, which is one of my favorite advantages of the EBS provider architecture. A user who bypasses SharePoint and accesses a content database directly through a SQL Server connection ends up downloading BLOB IDs instead of actual file contents, as illustrated in Figure 3. You can verify this behavior by deploying the SQL Download Web Part in a test environment. (I used this Web Part in the April 2009 column to demonstrate how to bypass SharePoint AD RMS protection.) Furthermore, users don't need, and should not have, access permissions to the external BLOB store. Only SharePoint security accounts require access, because SharePoint calls the EBS provider methods in the security context of the site's application pool account.
Figure 3 The EBS provider can be a roadblock to bypassing SharePoint permissions for file downloads
Keep in mind, however, that EBS providers also have drawbacks. Maintaining integrity between the metadata in the SharePoint farm's content databases and the external BLOB store is complex. For a good discussion of the pros and cons, see the topic "Operational Limits and Trade-Off Analysis" in the WSS 3.0 SDK. Make sure you read this very important topic before implementing an EBS provider in a SharePoint environment.
                                         
Building an Unmanaged EBS Provider                                         
Now let's tackle the challenges of building EBS providers. The ISPExternalBinaryProvider interface is well documented in the WSS 3.0 SDK under "The BLOB Access Interface: ISPExternalBinaryProvider." However, it seems Microsoft forgot to cover the details of the EBS provider itself. After all, we are not just consuming the interface of an existing COM server. We are tasked with building that COM server ourselves and implementing the ISPExternalBinaryProvider interface. Most importantly, the WSS 3.0 SDK does not mention which type of COM server we are supposed to build or which threading model it requires.

A classic COM server can run out-of-process or in-process. It can support the single-threaded apartment (STA) model, the multithreaded apartment (MTA) model, both, or the free-threaded model. For the EBS provider to work properly, build a thread-safe in-process COM server with the threading model "Both," which supports STAs as well as the MTA.
You also need to think about which programming language to use. This choice matters because the ISPExternalBinaryProvider interface is the lowest-level API of SharePoint, so performance issues can affect the entire SharePoint farm. For this reason, I recommend a language that enables you to build small and fast COM objects, such as Visual C++ with the Active Template Library (ATL). ATL provides helpful C++ classes that simplify the development of thread-safe COM servers in unmanaged code with the correct level of threading support.
Visual Studio also includes a variety of ATL wizards. To create the provider:

1. Create an ATL project and select Dynamic-link library (DLL) as the server type.
2. Copy the ISPExternalBinaryProvider interface definition from the WSS 3.0 SDK into the interface definition language (IDL) file of your ATL project.
3. Add a new class for an ATL Simple Object, selecting "Both" as the threading model and no aggregation.
4. Right-click the new class, point to Add, click Implement Interface, and select ISPExternalBinaryProvider.

That's it! The Implement Interface Wizard performs all the necessary plumbing, so you can focus on implementing the StoreBinary and RetrieveBinary methods.
And don't let unmanaged C++ code intimidate you. If you analyze the SampleStore.cpp file in the companion material, you can see that the StoreBinary and RetrieveBinary implementations are relatively straightforward.

The sample StoreBinary method builds a file path from a StorePath registry value, the Site ID passed in from SharePoint, and a GUID generated for the BLOB. It then uses the Win32 WriteFile function to save the binary data obtained from the ILockBytes instance.

The sample RetrieveBinary method builds the file path from the same StorePath registry value, the Site ID, and the BLOB ID passed in from SharePoint. It then uses the Win32 ReadFile function to retrieve the unstructured data. The EBS provider copies this data into a new ILockBytes instance and passes that instance back to SharePoint. Figure 4 illustrates how the EBS provider constructs the file path.
Figure 4 Constructing file paths for StoreBinary and RetrieveBinary operations in the sample EBS providers
                                         
Building a Managed EBS Provider                                         
Of course, SharePoint developers might prefer familiar managed languages for building EBS providers. Building a managed EBS provider is not necessarily less complicated than building an unmanaged one, however, because of the complexity of COM interoperability. Keep in mind that an unmanaged application can load only one version of the common language runtime (CLR). Your code must therefore work with the same version of the CLR that the rest of SharePoint uses; otherwise, you might see unexpected behavior. You also still have to deal with unmanaged interfaces and the corresponding marshaling of parameters and buffers. Just compare SampleStore.cpp with SampleStore.cs in the companion material: a managed language brings no gains in code structure or programming simplicity.
Moreover, be aware of 64-bit compatibility issues if you develop managed EBS providers on the x64 platform. Figure 5 shows a typical error that results from invalid COM registration settings on a development computer. Suppose you enable the Register for COM Interop check box in the project properties in Visual Studio 2005 or Visual Studio 2008. You'll end up with COM registration settings for your provider under HKEY_CLASSES_ROOT\Wow6432Node\CLSID\<ProviderCLSID>, because Visual Studio uses the 32-bit version of the Assembly Registration Tool (Regasm.exe) even on the x64 platform.
Figure 5 Due to invalid COM registration settings, a managed EBS provider could not be loaded
However, the 64-bit version of SharePoint cannot load a 32-bit COM server registered under the Wow6432Node key. You must therefore register your managed EBS provider manually with the 64-bit version of Regasm.exe, located in the %WINDIR%\Microsoft.NET\Framework64\v2.0.50727 directory. For example, the command "%WINDIR%\Microsoft.NET\Framework64\v2.0.50727\Regasm.exe" ManagedProvider.dll creates the required registry settings for the managed sample provider under HKEY_CLASSES_ROOT\CLSID\<ProviderCLSID>. Alternatively, you can create a Setup program and mark the EBS provider for automatic COM registration.
Remember also that managed EBS providers come with significantly more overhead and performance penalties than their unmanaged ATL counterparts. You can see this by comparing the COM registration settings in the registry. As the InProcServer32 key reveals, the COM runtime loads unmanaged EBS provider DLLs directly. Managed EBS providers, by contrast, rely on Mscoree.dll, the core engine of the CLR, as the in-process server. For managed providers, the COM runtime therefore loads the CLR first. The CLR then loads the EBS provider assembly registered under the Assembly value and creates a COM Callable Wrapper (CCW) proxy. This proxy handles the interaction between the unmanaged SharePoint client (Owssvr.dll) and the managed EBS provider.
Keep in mind that the unmanaged SharePoint client does not interact directly with your managed provider. The CCW marshals parameters, calls the managed methods, and handles HRESULTs. This indirection is especially apparent in the different return types of managed and unmanaged methods. Unmanaged methods return HRESULTs to indicate success or failure, whereas managed methods are supposed to have the void return type. So don't return explicit HRESULTs in managed code. Instead, raise system or user-defined exceptions in response to error conditions. If a managed method completes without an exception, the CCW automatically returns S_OK to the unmanaged client.
On the other hand, if a managed method raises an exception, the CCW maps error codes and messages to HRESULTs and error information. The CCW implements various error-handling interfaces for this purpose, such as ISupportErrorInfo and IErrorInfo, but SharePoint does not take advantage of these interfaces. EBS providers must implement their own error reporting through the Windows event log, SharePoint diagnostic logs, trace files, or other means. SharePoint expects only two HRESULT values: S_OK for success and E_FAIL for any error. You can use the Marshal.ThrowExceptionForHR method to return E_FAIL to SharePoint, as demonstrated in SampleStore.cs.
                                         
Registering an EBS Provider in SharePoint                                         
Easily the most confusing section on ISPExternalBinaryProvider in the WSS 3.0 SDK is the topic "Installing and Configuring Your BLOB Provider." At the time of this writing, this section contained misleading information and errors. Even the Windows PowerShell commands were incorrect. If you assign the EBS provider to $yourProviderConfig and afterwards use $providerConfig.ProviderCLSID, don't be surprised when you receive an error stating that $providerConfig doesn't exist. Of course, you won't even reach this point, because the Active and ProviderCLSID properties aren't part of the ISPExternalBinaryProvider interface. These mysterious properties belong to a dual interface that is not covered in the documentation. Just for fun, I implemented a sample version in both unmanaged and managed code, but your ISPExternalBinaryProvider implementation does not require these proprietary properties at all.
The ProviderCLSID property might be handy, but you can find the CLSID in other places too. It is in the registry if you search for the ProgID, such as UnmanagedProvider.SampleStore or ManagedProvider.SampleStore, and it also appears in the code files SampleStore.rgs and SampleStore.cs. As mentioned earlier, setting the ExternalBinaryStoreClassId property of the local SPFarm object to the CLSID registers the EBS provider. Setting the same property to an empty GUID ("00000000-0000-0000-0000-000000000000") removes the EBS provider registration. Don't forget to call the SPFarm object's Update method to save the changes in the configuration database, and then restart Internet Information Services (IIS). The following code listing illustrates how to accomplish these tasks in Windows PowerShell:
         
         
                                         
# Load the SharePoint object model (Out-Null suppresses the assembly info output)
[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SharePoint') | Out-Null
$farm = [Microsoft.SharePoint.Administration.SPFarm]::Local

# Registering the CLSID of an EBS provider
$farm.ExternalBinaryStoreClassId = [Guid]"C4A543C2-B7DB-419F-8C79-68B8842EC005"
$farm.Update()
IISRESET

# Removing the EBS provider registration (empty GUID)
$farm.ExternalBinaryStoreClassId = [Guid]::Empty
$farm.Update()
IISRESET
                                                
                                         
Implementing Garbage Collection                                         
Another section in the WSS 3.0 SDK featuring mysterious components and critical code snippets is titled "Implementing Lazy Garbage Collection." At the time of this writing, this section referenced another mysterious Utility class with DirFromSiteId and FileFromBlobid methods. It also assigned the results of Directory.GetFiles to a FileInfo array, which is incorrect. But let's not be too demanding of WSS 3.0 documentation quality. The DirFromSiteId and FileFromBlobid helper methods reveal their purpose through their names. The FileInfo array is easily fixed in either of two ways:

- Replace it with a string array, because Directory.GetFiles returns file paths as strings.
- Replace the Directory.GetFiles method with a call to the GetFiles method of a DirectoryInfo object, which returns FileInfo objects.

The Garbage Collector sample program in the companion material uses the DirectoryInfo approach and follows the suggested sequence of steps for garbage collection.
The Garbage Collector sample deviates from the SDK explanations in one important respect: the handling of timing conditions. This is a critical issue, because timing conditions can cause valid files to be misidentified as orphaned and deleted during garbage collection. Figure 6 illustrates the approach that the WSS 3.0 SDK recommends:

1. Enumerate all BLOB files in the EBS store to build a BLOB list.
2. Remove from this list every reference that is still in the content database, as indicated by the site's ExternalBinaryIds collection.
3. Treat the remaining references as orphaned files that can be deleted.
                                                                                                      
Figure 6 Misidentification of a valid BLOB as orphaned due to a timing condition
However, the EBS provider must, of course, finish writing the BLOB data before it can return a BLOB ID to SharePoint. Depending on network bandwidth and other conditions, I/O performance can fluctuate. The provider might therefore create a new BLOB file, which then appears in your BLOB list, but finish writing the data only after you have read the ExternalBinaryIds collection. In that case, the BLOB ID is not yet in the collection. The reference to the new BLOB then remains in the orphaned BLOB list, and if you purge the orphaned BLOBs at this point, you accidentally delete a valid content item and lose data! To avoid this problem, the sample Garbage Collector checks the file creation time and adds to the BLOB list only those items that are more than one hour old.
                                         
Conclusion                                         
By integrating an external storage solution with SharePoint, you can increase the storage efficiency, system performance, and scalability of a SharePoint farm. Another advantage is that users are forced to go through SharePoint to access unstructured content. Pulling data out of the content databases through direct SQL Server connections yields only binary BLOB identifiers instead of the actual files. However, EBS providers also have drawbacks, because maintaining integrity between the metadata in the SharePoint farm's content databases and the external BLOB store is complex.
To integrate SharePoint with an external storage solution, you must build an EBS provider. This is a COM server that implements the ISPExternalBinaryProvider interface with its StoreBinary and RetrieveBinary methods. You can create unmanaged and managed EBS providers, but be aware of performance and compatibility issues if you decide to use managed code. Also keep in mind that the ISPExternalBinaryProvider interface does not include a DeleteBinary method. You must remove orphaned BLOBs explicitly through lazy garbage collection, and you must be careful to avoid timing conditions that can lead to the accidental deletion of valid BLOB items.
                                         
Pav Cherny is an IT expert and author specializing in Microsoft technologies for collaboration and unified communication. His publications include white papers, product manuals, and books with a focus on IT operations and system administration. Pav is President of Biblioso Corporation, a company that specializes in managed documentation and localization services.
Pav  Cherny                                         
                                         
                                                   Code download available at:                                                    ChernySharePoint2009_06.exe                                                   (2,006 KB)                                                   
                                                                                 
                                                   

    Contents  
                                                                Internal  Binary Storage                                                            
                                                             External  Binary Storage                                                            
                                                             Building  an Unmanaged EBS Provider                                                            
                                                             Building  a Managed EBS Provider                                                            
                                                             Registering  an EBS Provider in SharePoint                                                            
                                                             Implementing  Garbage Collection                                                            
                                                             Conclusion                                                            
                                                                                                                                       
Microsoft estimates that as much as 80 percent  of the data stored in Microsoft Windows SharePoint Services (&#8201;WSS&#8201;) 3.0  and Microsoft Office SharePoint Server (MOSS) 2007 content databases is  non-relational binary large object (&#8201;BLOB) data, such as Microsoft  Office Word documents, Microsoft Office Excel spreadsheets, and  Microsoft Office PowerPoint presentations. Only 20 percent is relational  metadata, which implies a suboptimal use of Microsoft SQL Server  resources at the database backend. SharePoint does not take advantage of  recent SQL Server innovations for unstructured data introduced in SQL  Server 2008, such as the FILESTREAM attribute or Remote BLOB Storage  API, but provides its own options to increase the storage efficiency and  manageability of massive data volumes.                                          
Specifically, SharePoint includes an external  binary storage provider API, ISPExternalBinaryProvider, which Microsoft  first published as a hotfix in May 2007 and incorporated later into  Service Pack 1. The ISPExternalBinaryProvider API is separate from the  Remote BLOB Storage API. Third-party vendors can use this API to  integrate SharePoint with advanced storage solutions, such as  content-addressable storage (CAS) systems. You can also use this API to  maintain SharePoint BLOB data on a central file server outside of  content databases if you want to build a custom solution to increase  storage efficiency and scalability in a SharePoint farm. Keep in mind,  however, that this API is specific to WSS 3.0 and MOSS 2007. It will  change in the next SharePoint release, which means that you will have to  update your provider.                                         
In this column, I discuss how to extend the SharePoint storage architecture using the ISPExternalBinaryProvider API. The topics include advantages and disadvantages, implementation details, performance considerations, and garbage collection. I also discuss a 64-bit compatibility issue in Microsoft Visual Studio that can prevent SharePoint from loading managed ISPExternalBinaryProvider components despite a correct interface implementation. Where appropriate, I refer to the ISPExternalBinaryProvider documentation in the WSS 3.0 SDK. Another reference worth mentioning is Kyle Tillman's blog.
Kyle does a great job explaining how he overcame the implementation hurdles in managed code. However, neither the WSS 3.0 SDK nor Kyle's blog post includes a Visual Studio sample project. I therefore provide ISPExternalBinaryProvider samples in both unmanaged and managed code in this column's companion material. The purpose of these samples is to help you get started if you are interested in integrating external storage solutions with SharePoint. Remember, though, that these samples are untested and not ready for production use.
                                         
Internal Binary Storage                                         
By default, SharePoint stores BLOB data in the Content column of the AllDocStreams table in the content database. The obvious advantage of this approach is straightforward transactional consistency between relational data and the associated non-relational file contents. For example, it is easy to insert the metadata of a Word document together with its unstructured content into a content database. It is just as easy to keep metadata and content associated in select, update, and delete operations. The most obvious disadvantage of the default approach, however, is an inefficient use of storage resources. Despite an I/O subsystem optimized for high performance, the SQL Server storage engine is not exactly a file-server replacement.
A SQL Server database consists of transaction log and data files, as illustrated in Figure 1. To ensure reliable transactional behavior, SQL Server first writes all transaction records to the log file before it flushes the corresponding data in 8KB pages to the data file on disk. Depending on the selected recovery model, this requires more than twice the BLOB size in storage capacity until the transaction log is backed up and truncated. Moreover, SQL Server does not store unstructured SharePoint content directly in data pages. Instead, it uses a separate collection of text/image pages and stores only a 16-byte text pointer to the BLOB's root node in the data row. Text/image pages are organized in a balanced tree, yet there is only one collection of text/image pages for each table. For the AllDocStreams table, this means that the content of all files is spread across the same text/image page collection. A single text/image page can hold data fragments from multiple BLOBs, or it may hold intermediate nodes for BLOBs larger than 32KB.
                                                                                                      
Figure 1 Default SharePoint BLOB  storage in SQL Server                                                                                 
Let's not dive too deeply into SQL Server internals, though. The point is that to read unstructured content, SQL Server must take several steps to locate all data fragments:

- It goes through the data row to get the text pointer.
- It then goes through the BLOB's root node and possibly additional intermediate nodes.

These fragments can be spread across any number of text/image pages. SQL Server must load each of these pages into memory in full, because it performs I/O operations at the page level. These complexities impair file-streaming performance in comparison to direct access through the file system. SQL Server also imposes a hard size limit of 2GB on SharePoint, because this is the maximum capacity of the image data type. The Content column of the AllDocStreams table is an image column, so you cannot store files larger than 2GB in a SharePoint content database.
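To make the 2GB ceiling concrete, the image data type holds at most 2^31 - 1 bytes (2,147,483,647 bytes), which is 2 GiB minus one byte. The following Python sketch is a hypothetical pre-upload check. It is not part of SharePoint or the companion samples; it simply tests a file size against that limit.

```python
# Hypothetical helper: does a file fit into an image column such as
# AllDocStreams.Content? The image data type stores at most 2^31 - 1 bytes.
IMAGE_MAX_BYTES = 2**31 - 1  # 2,147,483,647 bytes


def fits_in_image_column(size_bytes: int) -> bool:
    """Return True if a BLOB of size_bytes can be stored in an image column."""
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    return size_bytes <= IMAGE_MAX_BYTES


if __name__ == "__main__":
    for size in (10 * 1024**2, 2 * 1024**3 - 1, 2 * 1024**3):
        print(f"{size:>13,} bytes -> fits: {fits_in_image_column(size)}")
```

A file of exactly 2 GiB (2,147,483,648 bytes) is already one byte too large.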
                                         
External Binary Storage                                         
The ISPExternalBinaryProvider API offers a  clever alternative to internal BLOB storage in SharePoint content  databases. It is a straightforward COM interface with only two methods  (StoreBinary and RetrieveBinary), which you can use to implement an  External Binary Storage (EBS) provider. For architecture details, see  the topic "Architecture  of External BLOB Storage" in the WSS 3.0 SDK.                                         
SharePoint loads your EBS provider when you set the ExternalBinaryStoreClassId property of the local SPFarm object (SPFarm.Local.ExternalBinaryStoreClassId) to the provider's COM class identifier (CLSID). SharePoint then calls the provider's StoreBinary method whenever you submit BLOB data, such as when you upload a file to a document library. The EBS provider has two options:

- It can store the BLOB in its associated external storage system and return a corresponding BLOB identifier (BLOB ID) to SharePoint.
- It can set the pfAccepted parameter of the StoreBinary method to false to indicate that it did not handle the BLOB.

In the latter case, SharePoint stores the BLOB in the content database as usual. If the EBS provider accepted the BLOB, on the other hand, SharePoint inserts only the BLOB ID into the Content column of the AllDocStreams table, as indicated in Figure 2. The BLOB ID can be any value that enables the EBS provider to locate the content in the external storage system. Examples include a filename, a file path, a globally unique identifier (GUID), or a content digest. The sample providers included in the companion material, for instance, use GUIDs as filenames for reliable identification of BLOBs on a file server.
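The accept-or-decline contract can be modeled in a few lines. This Python sketch is an analogy, not the COM interface: the class, method names, and size threshold are hypothetical. StoredDocument merely stands for the Content value plus the DocFlags marker that SharePoint records. The provider declines small BLOBs so that they stay in the content database, a policy invented for illustration. It accepts everything else under a fresh GUID, as the sample providers do.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


class InMemoryProvider:
    """Hypothetical stand-in for an EBS provider's StoreBinary logic."""

    def __init__(self, min_external_bytes: int = 0):
        # Illustrative policy: BLOBs smaller than this stay in the content DB.
        self.min_external_bytes = min_external_bytes
        self.blobs: Dict[Tuple[uuid.UUID, str], bytes] = {}

    def store_binary(self, site_id: uuid.UUID, data: bytes) -> Tuple[bool, Optional[str]]:
        """Return (accepted, blob_id), mirroring pfAccepted and the BLOB ID."""
        if len(data) < self.min_external_bytes:
            return False, None  # like setting pfAccepted to false
        blob_id = str(uuid.uuid4())  # a GUID as BLOB ID, as in the samples
        self.blobs[(site_id, blob_id)] = data
        return True, blob_id


@dataclass
class StoredDocument:
    content: bytes  # what lands in AllDocStreams.Content
    external: bool  # models the highest DocFlags bit


def save_document(provider: InMemoryProvider, site_id: uuid.UUID, data: bytes) -> StoredDocument:
    """Simplified model of SharePoint's side of a StoreBinary call."""
    accepted, blob_id = provider.store_binary(site_id, data)
    if accepted:
        # Only the BLOB ID is written to the content database.
        return StoredDocument(content=blob_id.encode("ascii"), external=True)
    return StoredDocument(content=data, external=False)  # stored inline as usual
```

Because the BLOB ID is opaque to SharePoint, you could swap the GUID for a file path or content digest without changing save_document.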
                                                                                                      
Figure 2 Storing a SharePoint  BLOB in an external storage system                                                                                 
SharePoint also keeps track of externally stored files by setting the highest bit of their DocFlags value to 1. DocFlags is a column of the AllDocs table. When a user requests to download an externally stored file, SharePoint checks DocFlags and passes the Content value from the AllDocStreams table to the RetrieveBinary method of the EBS provider. In response to the RetrieveBinary call, the EBS provider must retrieve the indicated BLOB from the external storage system. It must then return the binary content to SharePoint in the form of a COM object that implements the ILockBytes interface. Note that SharePoint does not call the RetrieveBinary method for BLOBs stored directly in the content database.
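The download path can be sketched the same way. In this hypothetical Python model, bytes stand in for the ILockBytes object. It also assumes that "highest DocFlags bit" means bit 31 of a 32-bit value; the article does not state the width. The point it shows is that the provider is consulted only when that bit is set.

```python
import uuid
from typing import Dict, Tuple

# Assumption: DocFlags is treated as a 32-bit value whose highest bit marks
# externally stored content. Python's & also works on a negative (signed) value.
DOCFLAGS_EXTERNAL = 1 << 31


class CountingProvider:
    """Hypothetical provider that records how often RetrieveBinary is called."""

    def __init__(self, blobs: Dict[Tuple[uuid.UUID, str], bytes]):
        self.blobs = blobs
        self.calls = 0

    def retrieve_binary(self, site_id: uuid.UUID, blob_id: str) -> bytes:
        self.calls += 1
        return self.blobs[(site_id, blob_id)]


def open_document(doc_flags: int, content: bytes, provider: CountingProvider,
                  site_id: uuid.UUID) -> bytes:
    """Simplified model of SharePoint's download path."""
    if doc_flags & DOCFLAGS_EXTERNAL:
        # Content holds only the BLOB ID; ask the provider for the bytes.
        return provider.retrieve_binary(site_id, content.decode("ascii"))
    return content  # inline BLOB: the provider is never called
```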
Note also that the storage and retrieval processes are transparent to the user as long as the user doesn't attempt to bypass SharePoint. This has several consequences:

- You don't need to replace built-in Web Parts with custom versions that tie metadata in a list to a document stored externally.
- Productivity applications, such as Microsoft Office, don't need to know how to store metadata in one place and the document in another.
- Search does not need to process metadata separately from documents.

Moreover, users must go through SharePoint to access externally stored BLOB data, and this is one of my favorite advantages of the EBS provider architecture. A user who bypasses SharePoint and accesses a content database directly through a SQL Server connection ends up downloading BLOB IDs instead of actual file contents, as illustrated in Figure 3. You can verify this behavior by deploying the SQL Download Web Part in a test environment. I used this Web Part in the April 2009 column to demonstrate how to bypass SharePoint AD RMS protection. Furthermore, users don't need (and should not have) access permissions to the external BLOB store. Only SharePoint security accounts require access, because SharePoint calls the EBS provider methods in the security context of the site's application pool account.
                                                                                                      
Figure 3 The EBS provider can be  a roadblock to bypassing SharePoint permissions for file downloads                                                                                 
Keep in mind, however, that EBS providers also  have drawbacks due to the complexity of maintaining integrity between  metadata in the SharePoint farm's content databases and the external  BLOB store. For a good discussion of pros and cons, check out the topic "Operational  Limits and Trade-Off Analysis" in the WSS 3.0 SDK. Make sure you  read this very important topic before implementing an EBS provider in a  SharePoint environment.                                         
                                         
Building an Unmanaged EBS Provider                                         
Now let's tackle the challenges of building EBS providers. The ISPExternalBinaryProvider interface is well documented in the WSS 3.0 SDK under "The BLOB Access Interface: ISPExternalBinaryProvider." However, it seems Microsoft forgot to cover the EBS provider details. After all, we are not just consuming the interface of an existing COM server. We are tasked with building that COM server ourselves and implementing the ISPExternalBinaryProvider interface. Most importantly, the WSS 3.0 SDK fails to mention the type of COM server we are supposed to build and the required threading model. A classic COM server can run out-of-process or in-process. An in-process server declares its threading model in the registry as one of the following values:

- Apartment, for the single-threaded apartment (STA) model.
- Free, for the multithreaded apartment (MTA) model.
- Both, for STAs and the MTA.
- Neutral.

For the EBS provider to work properly, make sure you build a thread-safe in-process COM server that supports the threading model "Both" for STAs and the MTA.
You also need to think about which programming language to use. This is important because the ISPExternalBinaryProvider interface is the lowest-level API of SharePoint, so performance issues in your provider can affect the entire SharePoint farm. For this reason, I recommend using a language that enables you to build small and fast COM objects, such as Visual C++ with the Active Template Library (ATL). ATL provides helpful C++ classes that simplify the development of thread-safe COM servers in unmanaged code with the correct level of threading support.
Visual Studio also includes a variety of ATL wizards. Just create an ATL project and select Dynamic-link library (DLL) for the server type. Copy the ISPExternalBinaryProvider interface definition from the WSS 3.0 SDK into the interface definition language (IDL) file of your ATL project. Then add a new class for an ATL Simple Object, selecting "Both" as the threading model and no aggregation. Finally, right-click the new class, point to Add, click Implement Interface, and select ISPExternalBinaryProvider. That's it! The Implement Interface Wizard performs all the necessary plumbing, so you can focus on implementing the StoreBinary and RetrieveBinary methods.
And don't let unmanaged C++ code intimidate  you. If you analyze the SampleStore.cpp file in the companion material,  you can see that the StoreBinary and RetrieveBinary implementations are  relatively straightforward. Essentially, the sample StoreBinary method  constructs a file path based on a StorePath registry value, the Site ID  passed in from SharePoint, and a GUID generated for the BLOB, and then  uses the Win32 WriteFile function to save the binary data obtained from  the ILockBytes instance. The sample RetrieveBinary method, on the other  hand, constructs the file path based on the same StorePath registry  value, the Site ID, and the BLOB ID passed in from SharePoint, and then  uses the Win32 ReadFile function to retrieve the unstructured data,  which the EBS provider copies into a new ILockBytes instance that it  then passes back to SharePoint. Figure 4 illustrates  how the EBS provider constructs the file path.                                         
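The same path logic looks like this as a Python sketch. It is not a port of SampleStore.cpp and rests on three assumptions:

- A plain directory stands in for the StorePath registry value.
- Bytes stand in for ILockBytes.
- BLOB IDs are GUID strings.

The GUID check on every path is an extra defensive step added here, so that a malformed BLOB ID cannot point outside the store directory.

```python
import uuid
from pathlib import Path


class FileBlobStore:
    """Sketch of <StorePath>/<SiteId>/<BlobId> storage, as in Figure 4."""

    def __init__(self, store_path):
        self.store_path = Path(store_path)  # stands in for the StorePath registry value

    def _blob_path(self, site_id: uuid.UUID, blob_id: str) -> Path:
        # Normalizing through uuid.UUID rejects anything that is not a GUID
        # (for example "..") before it becomes part of a file path.
        return self.store_path / str(uuid.UUID(str(site_id))) / str(uuid.UUID(blob_id))

    def store_binary(self, site_id: uuid.UUID, data: bytes) -> str:
        """Write the BLOB under a new GUID and return that GUID as the BLOB ID."""
        blob_id = str(uuid.uuid4())
        path = self._blob_path(site_id, blob_id)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)  # counterpart of the Win32 WriteFile call
        return blob_id

    def retrieve_binary(self, site_id: uuid.UUID, blob_id: str) -> bytes:
        """Read the BLOB back; raises FileNotFoundError if it is missing."""
        return self._blob_path(site_id, blob_id).read_bytes()  # counterpart of ReadFile
```

Keeping one subdirectory per site, as the samples do, also gives garbage collection a natural unit of work: one directory per site collection.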
                                                                                                      
Figure 4 Constructing file paths  for StoreBinary and RetrieveBinary operations in the sample EBS  providers                                                                                 
                                         
Building a Managed EBS Provider                                         
Of course, SharePoint developers might prefer familiar managed languages for building EBS providers. Building a managed EBS provider, however, is not necessarily less complicated than building an unmanaged one, due to the complexity of COM interoperability. Keep in mind that an application written in unmanaged code can load only one version of the common language runtime (CLR). Your code therefore needs to work with the same version of the CLR that the rest of SharePoint is using; otherwise, you might end up with unexpected behavior. Also, you still must deal with unmanaged interfaces and the corresponding marshalling of parameters and buffers. Just compare SampleStore.cpp with SampleStore.cs in the companion material. Using a managed language brings no gains in terms of code structure or programming simplicity.
Moreover, be aware of 64-bit compatibility  issues if you develop managed EBS providers on the x64 platform. Figure  5 shows a typical error that results from invalid COM  registration settings on a development computer. If you enable the  Register for COM Interop checkbox in the project properties in Visual  Studio 2005 or Visual Studio 2008, you'll end up with COM registration  settings for your provider in the registry under  HKEY_CLASSES_ROOT\Wow6432Node\CLSID\<ProviderCLSID>. Visual Studio  uses the 32-bit version of the Assembly Registration Tool (Regasm.exe)  even on the x64 platform.                                          
                                                                                                      
Figure 5 Due to invalid COM  registration settings, a managed EBS provider could not be loaded                                                                                 
However, the 64-bit version of SharePoint  cannot load a 32-bit COM server registered under the Wow6432Node, so you  must manually register your managed EBS provider by using the 64-bit  Regasm.exe version, located in the  %WINDIR%\Microsoft.NET\Framework64\v2.0.50727 directory. For example,  the command "%WINDIR%\Microsoft.NET\Framework64\v2.0.50727\Regasm.exe"  ManagedProvider.dll creates the required registry settings for the  managed sample provider under  HKEY_CLASSES_ROOT\CLSID\<ProviderCLSID>. Another approach is to  create a Setup program and mark the EBS provider for automatic COM  registration.                                         
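A small Python helper can spell out both locations so you know which one to check. It only builds strings; ntpath handles Windows-style paths on any operating system. The helper names are hypothetical, while the Regasm.exe path and the two registry locations are the ones given above. CLSID subkeys are written with braces, which the helper adds.

```python
import ntpath

REGASM64_RELATIVE = r"Microsoft.NET\Framework64\v2.0.50727\Regasm.exe"


def regasm64_command(windir: str, assembly: str) -> str:
    """Build the 64-bit Regasm.exe command line for a managed EBS provider."""
    exe = ntpath.join(windir, REGASM64_RELATIVE)
    return f'"{exe}" {assembly}'


def clsid_key(clsid: str, wow64: bool) -> str:
    """Registry key where a COM class registration ends up.

    wow64=True is where the 32-bit Regasm.exe (used by Visual Studio's
    Register for COM Interop option) writes; 64-bit SharePoint does not look there.
    """
    root = r"HKEY_CLASSES_ROOT\Wow6432Node\CLSID" if wow64 else r"HKEY_CLASSES_ROOT\CLSID"
    return root + "\\{" + clsid.strip("{}").upper() + "}"
```

If the provider's CLSID exists only under the Wow6432Node key on an x64 server, you are looking at the registration problem shown in Figure 5.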
Remember also that managed EBS providers come with significantly more overhead and performance penalties than their unmanaged ATL counterparts. You can see this if you compare the COM registration settings in the registry. As the InprocServer32 key reveals, the COM runtime loads unmanaged EBS provider DLLs directly. Managed EBS providers, by contrast, rely on Mscoree.dll, the CLR's startup shim, as the in-process server. For managed providers, the COM runtime therefore performs three steps:

- It loads the CLR.
- The CLR loads the EBS provider assembly named in the Assembly registry value.
- The CLR creates a COM Callable Wrapper (CCW) proxy to handle the interaction between the unmanaged SharePoint client (Owssvr.dll) and the managed EBS provider.
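You can tell the two registration styles apart from the (Default) value of the InprocServer32 key alone. This hypothetical Python helper encodes that rule. Regasm-registered classes name mscoree.dll, often without a path, whereas an ATL server names its own DLL.

```python
import ntpath


def is_managed_inproc_server(inproc_default: str) -> bool:
    """True if an InprocServer32 (Default) value points at the CLR shim.

    Managed (Regasm-registered) classes name mscoree.dll, often without
    a path; unmanaged ATL servers name their own DLL directly.
    """
    dll = ntpath.basename(inproc_default.strip().strip('"'))
    return dll.lower() == "mscoree.dll"
```

On the server itself, you would read the value from HKEY_CLASSES_ROOT\CLSID\{ProviderCLSID}\InprocServer32, for example with Registry Editor, and pass it in.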
Keep in mind that the unmanaged SharePoint client does not interact directly with your managed provider. It's the CCW that marshals parameters, calls the managed methods, and handles HRESULTs. This indirection is especially apparent in the different return types of managed and unmanaged methods. Unmanaged methods return HRESULTs to indicate success or failure, whereas managed methods are supposed to have the void return type. So don't return explicit HRESULTs in managed code; instead, raise system or user-defined exceptions in response to error conditions. If a managed method completes without an exception, the CCW automatically returns S_OK to the unmanaged client.
If a managed method raises an exception, on the other hand, the CCW maps error codes and messages to HRESULTs and error information. The CCW implements various error-handling interfaces for this purpose, such as ISupportErrorInfo and IErrorInfo, but SharePoint does not take advantage of these interfaces. EBS providers must therefore implement their own error reporting through the Windows event log, SharePoint diagnostic logs, trace files, or other means. SharePoint only expects the HRESULT values S_OK for success and E_FAIL for any error. To return E_FAIL to SharePoint, you can pass E_FAIL to the Marshal.ThrowExceptionForHR method. That method throws an exception carrying that HRESULT, which the CCW hands back to SharePoint, as demonstrated in SampleStore.cs.
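From SharePoint's point of view, the error contract boils down to "S_OK or E_FAIL, with the details logged by the provider." This Python sketch models that outcome. It is an analogy, not the CCW itself, and call_for_sharepoint is a hypothetical name. The two HRESULT constants carry their standard values.

```python
import logging

S_OK = 0x00000000
E_FAIL = 0x80004005  # "unspecified failure"

log = logging.getLogger("ebs_provider")


def call_for_sharepoint(method, *args) -> int:
    """Run a provider method and collapse the outcome to S_OK or E_FAIL.

    The details of a failure go to the provider's own log, because
    SharePoint does not read IErrorInfo and only distinguishes S_OK/E_FAIL.
    """
    try:
        method(*args)
    except Exception:  # any failure becomes E_FAIL
        log.exception("EBS provider call %s failed", getattr(method, "__name__", method))
        return E_FAIL
    return S_OK
```

The design point is that the log entry, not the HRESULT, is where an administrator will find out why a file could not be stored or retrieved.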
                                         
Registering an EBS Provider in SharePoint                                         
Easily the most confusing section on  ISPExternalBinaryProvider in the WSS 3.0 SDK is the topic "Installing  and Configuring Your BLOB Provider." At the time of this writing,  this section was filled with misleading information and errors. Even the  Windows PowerShell commands were incorrect. If you assign the EBS  provider to $yourProviderConfig and afterwards use  $providerConfig.ProviderCLSID, don't be surprised when you receive an  error stating that $providerConfig doesn't exist. Of course, you won't  even reach this point because the Active and ProviderCLSID properties  aren't part of the ISPExternalBinaryProvider interface. These mysterious  properties belong to a dual interface that is not covered in the  documentation. Just for fun, I implemented a sample version in both  unmanaged and managed code, but your ISPExternalBinaryProvider  implementation does not require these proprietary properties at all.                                         
The ProviderCLSID property might be handy, but the CLSID is also available elsewhere. You can find it in the registry by searching for the ProgID, such as UnmanagedProvider.SampleStore or ManagedProvider.SampleStore. You can also find the CLSIDs in the code files SampleStore.rgs and SampleStore.cs. As mentioned earlier, setting the ExternalBinaryStoreClassId property of the local SPFarm object to the CLSID registers the EBS provider. Setting the same property to an empty GUID ("00000000-0000-0000-0000-000000000000") removes the EBS provider registration. Don't forget to call the SPFarm object's Update method to save the changes in the configuration database, and then restart Internet Information Services (IIS). The following code listing illustrates how to accomplish these tasks in Windows PowerShell:
         
         
                                         
# Load the SharePoint object model ([void] suppresses the assembly output) and get the local farm
[void][System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SharePoint')
$farm = [Microsoft.SharePoint.Administration.SPFarm]::Local
# Registering the CLSID of an EBS provider
$farm.ExternalBinaryStoreClassId = [Guid]'C4A543C2-B7DB-419F-8C79-68B8842EC005'
$farm.Update()   # persist the change in the configuration database
IISRESET
# Removing the EBS provider registration (empty GUID 00000000-0000-0000-0000-000000000000)
$farm.ExternalBinaryStoreClassId = [Guid]::Empty
$farm.Update()
IISRESET
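If you look up the CLSID by ProgID, as described above, the registry returns it in braces. This Python sketch uses hypothetical helper names. It normalizes such a value into the bare form used in the listing and produces the empty GUID that removes the registration. clsid_from_progid works only on Windows, where the CLSID subkey of a ProgID key (for example, UnmanagedProvider.SampleStore) holds the class ID.

```python
import uuid

EMPTY_CLSID = str(uuid.UUID(int=0)).upper()  # value that removes the EBS registration


def normalize_clsid(value: str) -> str:
    """Turn '{c4a5...}' (as stored in the registry) into the bare upper-case form."""
    return str(uuid.UUID(value.strip())).upper()


def clsid_from_progid(progid: str) -> str:
    """Windows only: read HKEY_CLASSES_ROOT\\<ProgID>\\CLSID and normalize it."""
    import winreg  # not available on other platforms
    return normalize_clsid(winreg.QueryValue(winreg.HKEY_CLASSES_ROOT, progid + "\\CLSID"))
```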
                                                
                                         
Implementing Garbage Collection                                         
Another section in the WSS 3.0 SDK featuring mysterious components and critical code snippets is titled "Implementing Lazy Garbage Collection." At the time of this writing, this section contained references to another mysterious Utility class with DirFromSiteId and FileFromBlobid methods. It also contained an incorrect assignment of Directory.GetFiles results to a FileInfo array. But let's not be too demanding of WSS 3.0 documentation quality. The DirFromSiteId and FileFromBlobid helper methods reveal their purpose through their names. Directory.GetFiles returns file paths as strings, so the incorrect FileInfo array is easily replaced with a string array. Alternatively, you can replace the Directory.GetFiles method with a call to the GetFiles method of a DirectoryInfo object, which does return FileInfo objects. The Garbage Collector sample program in the companion material uses the DirectoryInfo approach and follows the suggested sequence of steps for garbage collection.
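The enumeration step and the SDK's basic orphan test can be sketched in Python as follows. dir_from_site_id and file_from_blob_id are hypothetical analogs of the SDK's DirFromSiteId and FileFromBlobid, assuming the <StorePath>\<SiteId>\<BlobId> layout of the samples. referenced_ids stands in for the site's ExternalBinaryIds collection. IDs are compared case-insensitively because GUID text can differ in case.

```python
from pathlib import Path
from typing import Iterable, Set


def dir_from_site_id(store_path, site_id) -> Path:
    """Hypothetical analog of the SDK's DirFromSiteId: <StorePath>/<SiteId>."""
    return Path(store_path) / str(site_id)


def file_from_blob_id(store_path, site_id, blob_id) -> Path:
    """Hypothetical analog of FileFromBlobid: <StorePath>/<SiteId>/<BlobId>."""
    return dir_from_site_id(store_path, site_id) / str(blob_id)


def stored_blob_ids(store_path, site_id) -> Set[str]:
    """Enumerate the BLOB files of one site (like DirectoryInfo.GetFiles)."""
    site_dir = dir_from_site_id(store_path, site_id)
    if not site_dir.is_dir():
        return set()
    return {p.name.lower() for p in site_dir.iterdir() if p.is_file()}


def naive_orphans(store_path, site_id, referenced_ids: Iterable[str]) -> Set[str]:
    """The SDK's basic idea: files on disk minus IDs still in the content DB.

    Not safe on its own, because of the timing condition discussed in this section.
    """
    return stored_blob_ids(store_path, site_id) - {b.lower() for b in referenced_ids}
```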
An important deviation of the Garbage Collector sample from the SDK explanations concerns the handling of timing conditions. This is a critical issue, because timing conditions can lead to the misidentification and deletion of valid files during garbage collection. Take a look at Figure 6, which illustrates the approach the WSS 3.0 SDK recommends for determining orphaned files. First, enumerate all BLOB files in the EBS store. Then remove from this BLOB list all references that are still in the content database, as indicated by the site's ExternalBinaryIds collection. The references remaining in the BLOB list are supposed to indicate orphaned files that should be deleted.
                                                                                                      
Figure 6 Misidentification of a  valid BLOB as orphaned due to a timing condition                                                                                 
However, the EBS provider must, of course, finish writing the BLOB data before it can return a BLOB ID to SharePoint. Depending on network bandwidth and other conditions, I/O performance can fluctuate. So there is a chance that the EBS provider creates a new BLOB file, which then appears in your BLOB list, but finishes writing the BLOB data only after you have retrieved the ExternalBinaryIds collection. In that case, its BLOB ID is not yet present in that collection. The reference to the new BLOB therefore remains in the orphaned BLOB list. If you purge the orphaned BLOBs at this point, you accidentally delete a valid content item and lose data! To avoid this problem, the sample Garbage Collector checks the file creation time and adds only those items to the BLOB list that are more than one hour old.
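The age filter is easy to express as a pure function, which also makes the timing condition testable. In this Python sketch, find_orphans is a hypothetical name. blob_created maps each BLOB file name to its creation time in seconds; on Windows, os.stat() reports file creation time (st_birthtime in Python 3.12 and later). The one-hour margin comes from the sample Garbage Collector.

```python
from typing import Iterable, Mapping, Set

MIN_AGE_SECONDS = 60 * 60  # one hour, the margin used by the sample Garbage Collector


def find_orphans(blob_created: Mapping[str, float], referenced_ids: Iterable[str],
                 now: float, min_age: float = MIN_AGE_SECONDS) -> Set[str]:
    """Return BLOB IDs that are safe to delete.

    Files younger than min_age are skipped, because the provider may still be
    writing them and SharePoint may not yet have recorded their BLOB IDs.
    """
    referenced = {b.lower() for b in referenced_ids}
    old_enough = {b.lower() for b, created in blob_created.items() if now - created > min_age}
    return old_enough - referenced
```

With min_age set to 0, the function degenerates into the unsafe set difference from Figure 6, and an in-flight BLOB is reported as orphaned. The one-hour margin is a heuristic rather than a guarantee. Choose a value comfortably longer than the slowest upload you expect.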
                                         
Conclusion                                         
By integrating an external storage solution  with SharePoint, you can increase storage efficiency, system  performance, and scalability of a SharePoint farm. Another advantage is  that this forces users to go through SharePoint to access unstructured  contents. Pulling data out of the content databases via direct SQL  Server connections only yields binary BLOB identifiers instead of the  actual files. However, EBS providers also have drawbacks due to the  complexity of maintaining integrity between metadata in the SharePoint  farm's content databases and the external BLOB store.                                         
In order to integrate SharePoint with an  external storage solution, you must build an EBS provider, which is a  COM server that implements the ISPExternalBinaryProvider interface with  its StoreBinary and RetrieveBinary methods. You can create unmanaged and  managed EBS providers, but be aware of performance and compatibility  issues if you decide to use managed code. Also keep in mind that the  ISPExternalBinaryProvider interface does not include a DeleteBinary  method. You must explicitly remove orphaned BLOBs through lazy garbage  collection, and be careful to avoid timing conditions that can lead to  the accidental deletion of valid BLOB items.                                         
                                         
                                                   Pav Cherny  is an IT expert and author specializing in Microsoft technologies for  collaboration and unified communication. His publications include white  papers, product manuals, and books with a focus on IT operations and  system administration. Pav is President of Biblioso Corporation, a  company that specializes in managed documentation and localization  services.
