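As machine learning (ML) becomes mainstream in the software industry, it is important to understand how it works and where it fits in the development stack. Knowing how to build an ML service for your application lets you spot opportunities for ML in your applications, implement it, and communicate clearly with the ML specialists on your team.
Throughout this series, we will build an ML service that predicts loan approval from credit history, publish it as a web service, and consume that web service from various platforms. Along the way, we will learn about Microsoft Azure ML Studio, an ML tool for building custom ML services. This first part of the series focuses on building a training experiment, covering the basics of Azure ML Studio, and walking through the process of building a predictive model.
We begin by discussing how ML relates to the application stack.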
SELECT
[Loan ID],
[Customer ID],
[Loan Status],
CAST(REPLACE([Current Loan Amount],'99999999',0) AS INT) AS [Loan Amount],
[Term],
CAST(SUBSTR([Credit Score],1,3) AS INT) AS [Credit Score] /* Clean outlier values of credit score */,
REPLACE([Years in current job],'n/a','< 1 year') AS [Years in Current Job] /* Clean n/a values in Years in current job */,
REPLACE([Home Ownership], 'HaveMortgage', 'Home Mortgage') AS [Home Ownership] /* combine home ownership: 'HaveMortgage' and 'Home Mortgage' */,
CAST([Annual Income] AS BIGINT) AS [Annual Income],
REPLACE([Purpose], 'other', 'Other') AS [Purpose] /* combine Purpose: 'other' and 'Other' */,
CAST([Monthly Debt] AS FLOAT) AS [Monthly Debt],
CAST([Years of Credit History] AS FLOAT) AS [Years of Credit History],
CAST(REPLACE([Months since last delinquent],'NA','') AS INT) AS [Months Since Last Delinquent],
CAST([Number of Open Accounts] AS INT) AS [Number of Open Accounts],
CAST([Number of Credit Problems] AS INT) AS [Number of Credit Problems],
CAST([Current Credit Balance] AS BIGINT) AS [Current Credit Balance],
CAST([Maximum Open Credit] AS BIGINT) AS [Maximum Open Credit],
CAST(REPLACE([Bankruptcies],'NA','') AS INT) AS [Bankruptcies],
CAST(REPLACE([Tax Liens],'NA','') AS INT) AS [Tax Liens]
FROM t1;
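To get a feel for what a few of these expressions do, here is a minimal sketch that runs them against a tiny in-memory table using Python's built-in sqlite3 module. It assumes the query follows SQLite semantics (the SUBSTR call and bracketed column names suggest this), and the sample rows are made up for illustration rather than taken from the real dataset.

```python
import sqlite3

# In-memory database holding two made-up rows; every column is stored as text,
# as it would be after importing a raw CSV file.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE t1 ("Current Loan Amount" TEXT, "Credit Score" TEXT, '
    '"Years in current job" TEXT, "Bankruptcies" TEXT)'
)
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?)",
    [
        ("12232", "7280", "n/a", "NA"),         # 4-digit score, missing values
        ("99999999", "741", "10+ years", "1"),  # placeholder loan amount
    ],
)

# A subset of the cleaning expressions from the query above.
rows = conn.execute(
    """
    SELECT
        CAST(REPLACE([Current Loan Amount], '99999999', 0) AS INT),
        CAST(SUBSTR([Credit Score], 1, 3) AS INT),
        REPLACE([Years in current job], 'n/a', '< 1 year'),
        CAST(REPLACE([Bankruptcies], 'NA', '') AS INT)
    FROM t1
    ORDER BY rowid
    """
).fetchall()

for row in rows:
    print(row)
# (12232, 728, '< 1 year', 0)
# (0, 741, '10+ years', 1)
```

Note that under SQLite rules, casting an empty string to INT yields 0 rather than NULL, so a cleaned 'NA' cannot be told apart from a genuine zero after this step. Whether that is acceptable depends on how the model should treat missing values.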
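We have covered some of the basic modules for cleaning data in Azure ML Studio; there are many more that we did not cover. Data cleaning is an important step to understand, because data is the foundation our model depends on. Keep in mind that even if the training phase of the experiment is given a perfect dataset, these cleaning and data-manipulation modules can also be applied to data received from the web endpoint, and used to filter data coming from users or applications.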
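As a rough illustration of applying the same rules to incoming data, the sketch below cleans a single record the way the SQL query cleans a table. The function name clean_record and the idea of receiving each record as a dictionary of raw strings are assumptions made for this example; they are not part of Azure ML Studio, where the cleaning modules inside the experiment would do this work.

```python
def clean_record(record):
    """Apply a few of the cleaning rules to one record (a dict of raw strings).

    Hypothetical helper for illustration; field names follow the SQL query.
    """
    cleaned = dict(record)
    # Outlier credit scores carry an extra digit: keep the first three characters.
    cleaned["Credit Score"] = int(record["Credit Score"][:3])
    # Treat a missing job tenure as less than one year.
    cleaned["Years in current job"] = record["Years in current job"].replace(
        "n/a", "< 1 year"
    )
    # Merge the two spellings of the mortgage category.
    cleaned["Home Ownership"] = record["Home Ownership"].replace(
        "HaveMortgage", "Home Mortgage"
    )
    # Merge 'other' and 'Other' into a single category.
    cleaned["Purpose"] = record["Purpose"].replace("other", "Other")
    return cleaned


# Made-up sample record, as it might arrive from a client application.
sample = {
    "Credit Score": "7280",
    "Years in current job": "n/a",
    "Home Ownership": "HaveMortgage",
    "Purpose": "other",
}
cleaned = clean_record(sample)
print(cleaned)
```

Keeping the same rules for training data and incoming requests means the model sees values in the form it was trained on.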
Step 1: give the experiment a title. In this article it is named "Experiment by Jiahua".
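Step 2: find the uploaded dataset in the left-hand panel. It appears under the name given at upload, which in this article is "UCI German Credit Card Data". Drag the dataset into the workspace in the middle, and a description of the data appears on the right. Once in the workspace, the dataset is shown as a rounded rectangle with a small circle beneath it, called the "output port". Hover over the port and right-click to visualize the data, among other operations. Dragging from the circle connects it to the next data-processing step.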
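Step 3: once the dataset has been added, it needs to be processed accordingly, including data preprocessing, splitting into training and test samples, choosing a machine learning algorithm, and so on. For detailed steps, see the official examples. After these operations, a visual machine learning workflow is complete, as shown in the figure below: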