RandomizedSearchCV vs. GridSearchCV (the difference: how many parameter settings are tried and how they are chosen)
# -*- coding: utf-8 -*-
"""
Created on Tue Aug 09 22:38:37 2016
@author: Administrator
"""
import time
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
# sklearn.grid_search was removed in scikit-learn 0.20; both classes now live in sklearn.model_selection
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV
# Load the data
digits = load_digits()
X, y = digits.data, digits.target
# Base classifier whose hyperparameters are tuned
meta_clf = RandomForestClassifier(n_estimators=20)
# =================================================================
# Parameter settings to sample from
param_dist = {"max_depth": ,
"max_features": ,
"min_samples_split": ,
"min_samples_leaf": ,
"bootstrap": ,
"criterion": ["gini", "entropy"]}
# Run the randomized search (RandomizedSearchCV)
n_iter_search = 20
rs_clf = RandomizedSearchCV(meta_clf, param_distributions=param_dist, n_iter=n_iter_search)
start = time.time()
rs_clf.fit(X, y)
print("RandomizedSearchCV took %.2f seconds for %d candidates parameter settings." % ((time.time() - start), n_iter_search))
# cv_results_ replaces the removed grid_scores_ attribute
print(rs_clf.cv_results_)
# =================================================================
# Parameter grid to search exhaustively
param_grid = {"max_depth": ,
"max_features": ,
"min_samples_split": ,
"min_samples_leaf": ,
"bootstrap": ,
"criterion": ["gini", "entropy"]}
# Run the grid search (GridSearchCV)
gs_clf = GridSearchCV(meta_clf, param_grid=param_grid)
start = time.time()
gs_clf.fit(X, y)
# cv_results_["params"] holds one entry per candidate setting
print("GridSearchCV took %.2f seconds for %d candidate parameter settings." % (time.time() - start, len(gs_clf.cv_results_["params"])))
print(gs_clf.cv_results_)
Output:
RandomizedSearchCV took 8.64 seconds for 20 candidates parameter settings.
GridSearchCV took 83.70 seconds for 216 candidate parameter settings.
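The two candidate counts in the output (20 versus 216) explain most of the roughly tenfold gap in run time. The value lists did not survive in the listing above, so the sketch below uses an assumed grid whose lists are purely illustrative, chosen only so that the number of combinations comes out at 216. It uses `ParameterGrid` and `ParameterSampler` from `sklearn.model_selection` to count candidates without fitting any model.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# ASSUMPTION: illustrative value lists, not the ones from the original script.
param_grid = {
    "max_depth": [3, None],
    "max_features": [1, 3, 10],
    "min_samples_split": [2, 3, 10],
    "min_samples_leaf": [1, 3, 10],
    "bootstrap": [True, False],
    "criterion": ["gini", "entropy"],
}

# A grid search evaluates every combination: 2 * 3 * 3 * 3 * 2 * 2.
n_grid = len(ParameterGrid(param_grid))

# A randomized search evaluates only n_iter sampled combinations.
n_iter_search = 20
sampled = list(ParameterSampler(param_grid, n_iter=n_iter_search, random_state=0))

print(n_grid, len(sampled))  # prints: 216 20
```

Each candidate is fitted once per cross-validation fold, so the total cost of either search grows in proportion to the number of candidates.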
GridSearchCV: Does an exhaustive search over a grid of parameter values.
RandomizedSearchCV: Does a randomized search over hyperparameters.
RandomizedSearchCV implements a "fit" and a "score" method. It also implements "predict", "predict_proba", "decision_function", "transform" and "inverse_transform" if they are implemented in the estimator used.
The parameters of the estimator used to apply these methods are optimized by cross-validated search over parameter settings.
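As a rough illustration of the delegated methods, the sketch below runs a small randomized search and then calls `predict` and `predict_proba` on the search object itself. The search space, `n_iter=5`, `cv=3`, and the fixed `random_state` values are assumptions chosen to keep the run short; they are not taken from the script above.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_digits(return_X_y=True)

# ASSUMPTION: a deliberately small, illustrative search space.
param_dist = {
    "max_depth": [3, None],
    "min_samples_leaf": [1, 3, 10],
    "criterion": ["gini", "entropy"],
}

search = RandomizedSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=0),
    param_distributions=param_dist,
    n_iter=5,        # number of sampled parameter settings
    cv=3,            # 3-fold cross-validation for each setting
    random_state=0,
)
search.fit(X, y)

# Best setting found and its mean cross-validated accuracy.
print(search.best_params_)
print("mean CV accuracy: %.3f" % search.best_score_)

# With the default refit=True, the best setting is retrained on all of X, y
# and the search object forwards predict / predict_proba to that model.
predictions = search.predict(X[:5])
probabilities = search.predict_proba(X[:5])
```

The same attributes and methods are available on a fitted `GridSearchCV`; only the way the candidate settings are generated differs.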
In contrast to GridSearchCV, not all parameter values are tried out, but rather a fixed number of parameter settings is sampled from the specified distributions. The number of parameter settings that are tried is given by n_iter.
If all parameters are presented as a list, sampling without replacement is performed. If at least one parameter is given as a distribution, sampling with replacement is used. It is highly recommended to use continuous distributions for continuous parameters.
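The difference between the two sampling modes can be seen with `ParameterSampler` directly. This is a minimal sketch: the parameter names are real `RandomForestClassifier` arguments, but the value ranges and the helper `as_key` are made up for the illustration, and `scipy.stats` is assumed to be installed.

```python
from scipy.stats import randint, uniform
from sklearn.model_selection import ParameterGrid, ParameterSampler


def as_key(params):
    """Turn a sampled parameter dict into a hashable key (hypothetical helper)."""
    return tuple(sorted(params.items()))


# All parameters given as lists: sampling without replacement,
# so drawing as many settings as the grid holds yields each one exactly once.
list_space = {"max_depth": [3, 5], "bootstrap": [True, False]}
grid_size = len(ParameterGrid(list_space))  # 2 * 2 = 4
from_lists = list(ParameterSampler(list_space, n_iter=grid_size, random_state=0))
distinct_from_lists = {as_key(p) for p in from_lists}

# At least one distribution: sampling with replacement.
# randint(3, 5) only produces 3 or 4, so 10 draws must contain repeats.
dist_space = {"max_depth": randint(3, 5)}
from_dist = list(ParameterSampler(dist_space, n_iter=10, random_state=0))
distinct_from_dist = {as_key(p) for p in from_dist}

# A continuous parameter sampled from a continuous distribution:
# uniform(0.1, 0.9) covers the interval [0.1, 1.0].
cont_space = {"max_features": uniform(0.1, 0.9)}
from_cont = list(ParameterSampler(cont_space, n_iter=5, random_state=0))

print(len(from_lists), len(distinct_from_lists))
print(len(from_dist), len(distinct_from_dist))
print([round(p["max_features"], 3) for p in from_cont])
```

A continuous distribution lets the search try values that a hand-picked list would never contain, which is the reason the documentation recommends it for continuous parameters.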