DW (6): Accessing Azure Blob Storage with PolyBase
Contents: [*]Hadoop connectivity configuration syntax
[*]Configuring the Hadoop connection
[*]Pushdown configuration
[*]Create external tables for Azure blob storage
Hadoop connectivity configuration syntax:
Global configuration settings for PolyBase Hadoop and Azure blob storage connectivity. Syntax:
[*]sp_configure: lists all of the configuration options
[*]Syntax for configuring the Hadoop connectivity option:
[*]
sp_configure [ @configname = ] 'hadoop connectivity',
[ @configvalue = ] { 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 }
[;]
RECONFIGURE
[;]
[*]Option 0: Disable Hadoop connectivity
[*]Option 1: Hortonworks HDP 1.3 on Windows Server
[*]Option 1: Azure blob storage (WASB)
[*]Option 2: Hortonworks HDP 1.3 on Linux
[*]Option 3: Cloudera CDH 4.3 on Linux
[*]Option 4: Hortonworks HDP 2.0 on Windows Server
[*]Option 4: Azure blob storage (WASB)
[*]Option 5: Hortonworks HDP 2.0 on Linux
[*]Option 6: Cloudera 5.1, 5.2, 5.3, 5.4, and 5.5 on Linux
[*]Option 7: Hortonworks 2.1, 2.2, and 2.3 on Linux
[*]Option 7: Hortonworks 2.1, 2.2, and 2.3 on Windows Server
[*]Option 7: Azure blob storage (WASB)
[*]Example code:
[*]List all available configuration options: EXEC sp_configure;
[*]Show the settings of a single configuration option: EXEC sp_configure @configname='hadoop connectivity';
[*]Example syntax for setting the configuration option:
[*]
sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
GO
RECONFIGURE
GO
Configuring the Hadoop connection
[*]In this section's example we connect to an HDInsight 3.2 cluster, which is based on HDP 2.2 and therefore maps to option 7; the mapping is as follows:
[*]
[*] Configure the PolyBase connectivity option as follows:
[*]
sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
GO
RECONFIGURE
GO
[*]From the command line run services.msc and restart the following services (a verification query follows this list):
[*]SQL Server PolyBase Data Movement Service
[*]SQL Server PolyBase Engine
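[*]A minimal sanity check after the restart (sys.configurations is the standard catalog view; nothing else is assumed) to confirm that the new value is actually in effect:
[*]
-- 'value' is the configured value, 'value_in_use' is the value currently in effect
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'hadoop connectivity';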
Pushdown configuration:
[*]To improve query performance, enable pushdown computation to a Hadoop cluster.
[*]Find the file yarn-site.xml in the installation path of SQL Server. Typically, the path is: C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Binn\Polybase\Hadoop\conf
[*]On the Hadoop machine, find the analogous file in the Hadoop configuration directory. In the file, find and copy the value of the configuration key yarn.application.classpath.
[*]On the SQL Server machine, in the yarn-site.xml file, find the yarn.application.classpath property and paste the value from the Hadoop machine into its value element (a sketch of the resulting property element follows the example value below).
[*]
$HADOOP_CONF_DIR,/usr/hdp/current/hadoop-client/*,/usr/hdp/current/hadoop-client/lib/*,/usr/hdp/current/hadoop-hdfs-client/*,/usr/hdp/current/hadoop-hdfs-client/lib/*,/usr/hdp/current/hadoop-yarn-client/*,/usr/hdp/current/hadoop-yarn-client/lib/*
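[*]A minimal sketch of the edited property element in the SQL Server copy of yarn-site.xml, using the example classpath above (the actual value must be the one copied from your Hadoop cluster):
[*]
<!-- yarn-site.xml under ...\Polybase\Hadoop\conf on the SQL Server machine -->
<property>
  <name>yarn.application.classpath</name>
  <value>$HADOOP_CONF_DIR,/usr/hdp/current/hadoop-client/*,/usr/hdp/current/hadoop-client/lib/*,/usr/hdp/current/hadoop-hdfs-client/*,/usr/hdp/current/hadoop-hdfs-client/lib/*,/usr/hdp/current/hadoop-yarn-client/*,/usr/hdp/current/hadoop-yarn-client/lib/*</value>
</property>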
Create external tables for Azure blob storage
[*]The Elastic Database query feature relies on the following DDL statements. Typically, these statements are used once, or only rarely when the schema of your application changes:
[*]
[*] (https://msdn.microsoft.com/library/ms174382.aspx)
[*] (https://msdn.microsoft.com/library/ms189522.aspx)
[*] (https://msdn.microsoft.com/library/mt270260.aspx)
[*] (https://msdn.microsoft.com/library/dn935022.aspx)
[*] (https://msdn.microsoft.com/library/dn935021.aspx)
[*]You can use the following syntax to drop the master key and credentials
[*]
[*]DROP DATABASE SCOPED CREDENTIAL <credential_name>;
[*]DROP MASTER KEY;
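[*]Note that the credential cannot be dropped while an external data source still references it, and the data source cannot be dropped while external tables still use it. A minimal teardown sketch in dependency order (object names are placeholders):
[*]
-- Drop dependent objects first, then the credential and the master key
DROP EXTERNAL TABLE <schema_name>.<table_name>;
DROP EXTERNAL DATA SOURCE <data_source_name>;
DROP DATABASE SCOPED CREDENTIAL <credential_name>;
DROP MASTER KEY;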
[*]Create some simple sample text data, as follows:
[*]
[*]Upload the text file to Azure blob storage, as shown in the figure below:
[*]
[*]Connect remotely to polybase1, open SSMS (note: for this version SSMS must be installed separately), and run the following script:
[*]
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Password01!';

/* Database scoped credential; SECRET holds the storage account access key */
CREATE DATABASE SCOPED CREDENTIAL JNAzureCredit
WITH IDENTITY = 'zhushy', SECRET = 'XXXX';

/* External data source pointing at the Azure blob storage container (Azure China endpoint) */
CREATE EXTERNAL DATA SOURCE JNAzureStorage WITH (
    TYPE = HADOOP,
    LOCATION = 'wasb://<blob_container_name>@<azure_storage_account_name>.blob.core.chinacloudapi.cn',
    CREDENTIAL = JNAzureCredit
);

/* File format for the space-delimited text file */
CREATE EXTERNAL FILE FORMAT TextFileFormat WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ' ', USE_TYPE_DEFAULT = TRUE)
);

/* External table over the uploaded file; replace the placeholder schema, table, and column names */
CREATE EXTERNAL TABLE <schema_name>.<table_name> (
    <column1> varchar(128) NULL,
    <column2> varchar(128) NULL
)
WITH (
    LOCATION = '/zhu/data1.txt',
    DATA_SOURCE = JNAzureStorage,
    FILE_FORMAT = TextFileFormat
);
[*]Verify the data with the following SQL: SELECT * FROM <schema_name>.<table_name>; if the output matches the figure below, everything is OK.
[*]
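[*]Once the external table returns the expected rows, it can be queried like any other table; a minimal sketch (the local table name dbo.LocalData1 is hypothetical) of materializing the blob data into a local table:
[*]
-- Import the blob-backed data into a regular local table; dbo.LocalData1 is a hypothetical name
SELECT *
INTO dbo.LocalData1
FROM <schema_name>.<table_name>;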