Reference file system paths using URLs with the wasb scheme.
Tested on both Linux and Windows. Tested at scale.
Azure Blob Storage consists of three parts:
Storage Account: All access is done through a storage account.
Container: A container is a grouping of multiple blobs. A storage account may have multiple containers. In Hadoop, an entire file system hierarchy is stored in a single container. It is also possible to configure multiple containers, effectively presenting multiple file systems that can be referenced using distinct URLs.
Blob: A file of any type and size. In Hadoop, files are stored in blobs. The internal implementation also uses blobs to persist the file system hierarchy and other metadata.
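The three parts map directly onto a wasb URL: the container and the storage account host form the authority, and the blob path follows. The sketch below splits such a URL using only the standard library. The helper name parse_wasb_url and the container name "hbase" are hypothetical; the account name localhbase comes from the configuration step, and the host suffix blob.core.chinacloudapi.cn assumes the Azure China endpoint (the global endpoint is blob.core.windows.net). Because each container gets its own URL authority, two containers in the same account show up as two distinct file systems.

```python
from urllib.parse import urlparse


def parse_wasb_url(url: str) -> tuple[str, str, str]:
    """Hypothetical helper: split a wasb(s) URL into (container, account host, path)."""
    parts = urlparse(url)
    # wasbs is the TLS variant of the same scheme.
    if parts.scheme not in ("wasb", "wasbs"):
        raise ValueError(f"not a wasb URL: {url}")
    # The authority has the form "<container>@<account>.blob.<endpoint suffix>".
    container, sep, host = parts.netloc.partition("@")
    if not sep:
        raise ValueError(f"missing container in: {url}")
    return container, host, parts.path


# Example URL; the container name "hbase" is an assumption for illustration.
container, host, path = parse_wasb_url(
    "wasb://hbase@localhbase.blob.core.chinacloudapi.cn/data/table1")
print(container, host, path)
```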
Configuration:
In the China Azure portal (https://manage.windowsazure.cn), create a blob storage account named localhbase, as shown in the figure below.
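Once the account exists, Hadoop needs its access key, which hadoop-azure reads from a core-site.xml property of the form fs.azure.account.key.<account>.<blob endpoint host>. The sketch below builds that property element with xml.etree. It assumes the Azure China endpoint suffix blob.core.chinacloudapi.cn; the helper name account_key_property and the placeholder key YOUR_ACCESS_KEY are hypothetical, and the real key should come from the portal and be kept out of version control.

```python
import xml.etree.ElementTree as ET


def account_key_property(account: str, key: str,
                         suffix: str = "blob.core.chinacloudapi.cn") -> ET.Element:
    """Hypothetical helper: build a core-site.xml <property> holding the account key."""
    prop = ET.Element("property")
    # Property name pattern: fs.azure.account.key.<account>.<endpoint suffix>
    ET.SubElement(prop, "name").text = f"fs.azure.account.key.{account}.{suffix}"
    ET.SubElement(prop, "value").text = key
    return prop


conf = ET.Element("configuration")
conf.append(account_key_property("localhbase", "YOUR_ACCESS_KEY"))
print(ET.tostring(conf, encoding="unicode"))
```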
The Azure Blob Storage interface for Hadoop supports two kinds of blobs: block blobs and page blobs. Block blobs are the default kind and suit most big-data use cases, such as input data for Hive, Pig, and analytical MapReduce jobs.
Page blob handling in hadoop-azure was introduced to support HBase log files. Page blobs can be written any number of times, whereas a block blob can only be appended to 50,000 times before it runs out of blocks and further writes fail. That limit does not work for HBase logs, so page blob support was added to overcome it.
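To make the append limit concrete, the toy model below assumes that every append (for example, every HBase log sync) commits one new block. It does not talk to Azure; the class BlockBlobModel and the function appends_until_failure are hypothetical, and the real connector may batch several writes into one block, which delays the failure but does not remove it.

```python
MAX_BLOCKS = 50_000  # per-blob block limit cited in the text


class BlockBlobModel:
    """Hypothetical toy model: each append commits exactly one new block."""

    def __init__(self) -> None:
        self.blocks = 0

    def append(self, data: bytes) -> None:
        if self.blocks >= MAX_BLOCKS:
            # Mirrors the behaviour described above: further writes fail.
            raise OSError("block limit reached; write fails")
        self.blocks += 1


def appends_until_failure(blob: BlockBlobModel) -> int:
    """Count successful appends before the model rejects a write."""
    count = 0
    try:
        while True:
            blob.append(b"log-entry")
            count += 1
    except OSError:
        return count


print(appends_until_failure(BlockBlobModel()))
```

A long-lived log that syncs frequently reaches this ceiling, while a page blob accepts overwrites and appends within its size limit without consuming a block count.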
Page blobs can be up to 1 TB in size, larger than the 200 GB maximum for block blobs.
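The roughly 200 GB block blob ceiling follows from the block count limit multiplied by the maximum block size. The arithmetic below assumes the 4 MiB per-block maximum that applied in the Azure service versions of that era (later service versions raised both limits); the constant names are illustrative.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3
MAX_BLOCK_SIZE = 4 * MiB  # assumed historical per-block maximum
MAX_BLOCKS = 50_000       # block count limit from the text


def max_block_blob_bytes(block_size: int = MAX_BLOCK_SIZE,
                         max_blocks: int = MAX_BLOCKS) -> int:
    """Largest block blob possible under the given per-block and block-count limits."""
    return block_size * max_blocks


total = max_block_blob_bytes()
print(total, round(total / GiB, 1))
```

This gives 209,715,200,000 bytes, about 195.3 GiB, which is the figure usually rounded to "200 GB".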
To have the files you create stored as page blobs, set the configuration property fs.azure.page.blob.dir to a comma-separated list of folder names.
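The effect of fs.azure.page.blob.dir can be approximated as a prefix match: a file under one of the listed folders becomes a page blob, and everything else stays a block blob, the default. The sketch below is a hypothetical approximation, not the connector's actual code; the real hadoop-azure matching may also accept fully qualified URIs and differ in edge cases. The helper names and the example folders /hbase/WALs and /hbase/oldWALs (typical HBase log directories) are assumptions.

```python
from posixpath import normpath


def parse_page_blob_dirs(value: str) -> list[str]:
    """Split a comma-separated fs.azure.page.blob.dir value into normalized folders."""
    return [normpath("/" + d.strip().strip("/")) for d in value.split(",") if d.strip()]


def blob_kind(path: str, page_blob_dirs: list[str]) -> str:
    """Return "page" if path lies under a configured folder, else the default "block"."""
    p = normpath("/" + path.lstrip("/"))
    for d in page_blob_dirs:
        # Match whole path components so /hbase/WALsX is not treated as /hbase/WALs.
        if p == d or p.startswith(d + "/"):
            return "page"
    return "block"


dirs = parse_page_blob_dirs("/hbase/WALs,/hbase/oldWALs")
print(blob_kind("/hbase/WALs/rs1/wal.1", dirs))
print(blob_kind("/hbase/data/t1/f1", dirs))
```

The first call prints page and the second prints block, matching the split described above: log folders get page blobs, while table data keeps the default block blobs.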