窝窝插件 posted on 2017-6-22 09:55:55

On RandomizedSearchCV and GridSearchCV (the difference: how the number of parameter settings to try is chosen)

# -*- coding: utf-8 -*-
"""
Created on Tue Aug 09 22:38:37 2016
@author: Administrator
"""
import time
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
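from scipy.stats import randint as sp_randint
# sklearn.grid_search was removed in scikit-learn 0.20; use sklearn.model_selection
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
# Load the data
digits = load_digits()
X, y = digits.data, digits.target
# Base estimator
meta_clf = RandomForestClassifier(n_estimators=20)
# =================================================================
# Set the parameter distributions
# NOTE: the original values were lost from this post; the values below are an
# assumption modeled on scikit-learn's randomized-search example
# (min_samples_split must be >= 2 in current scikit-learn releases)
param_dist = {"max_depth": [3, None],
              "max_features": sp_randint(1, 11),
              "min_samples_split": sp_randint(2, 11),
              "min_samples_leaf": sp_randint(1, 11),
              "bootstrap": [True, False],
              "criterion": ["gini", "entropy"]}
# Run the randomized search (RandomizedSearch)
n_iter_search = 20
rs_clf = RandomizedSearchCV(meta_clf, param_distributions=param_dist, n_iter=n_iter_search)
start = time.time()
rs_clf.fit(X, y)
print("RandomizedSearchCV took %.2f seconds for %d candidates parameter settings." % ((time.time() - start), n_iter_search))
# grid_scores_ was removed in scikit-learn 0.20; cv_results_ holds the per-candidate scores
print(rs_clf.cv_results_["mean_test_score"])
# =================================================================
# Set the parameter grid
# NOTE: values assumed as above; 2 * 3 * 3 * 3 * 2 * 2 = 216 settings,
# matching the 216 candidates in the output below
param_grid = {"max_depth": [3, None],
              "max_features": [1, 3, 10],
              "min_samples_split": [2, 3, 10],
              "min_samples_leaf": [1, 3, 10],
              "bootstrap": [True, False],
              "criterion": ["gini", "entropy"]}
# Run the grid search (GridSearch)
gs_clf = GridSearchCV(meta_clf, param_grid=param_grid)
start = time.time()
gs_clf.fit(X, y)
print("GridSearchCV took %.2f seconds for %d candidate parameter settings." % (time.time() - start, len(gs_clf.cv_results_["params"])))
print(gs_clf.cv_results_["mean_test_score"])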
Output:
RandomizedSearchCV took 8.64 seconds for 20 candidates parameter settings.
GridSearchCV took 83.70 seconds for 216 candidate parameter settings.
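A quick check on these numbers: the grid size is the product of the number of values per parameter, and dividing each reported time by its candidate count gives the cost per setting. This is a minimal stdlib sketch; the grid values are an assumption (the post's actual values were lost), chosen only so that the per-parameter counts (2, 3, 3, 3, 2, 2) reproduce the reported 216 candidates.

```python
from itertools import product
from math import prod

# Hypothetical grid: the post's original values were lost; these assumed values
# have sizes (2, 3, 3, 3, 2, 2), which reproduce the reported 216 candidates.
param_grid = {
    "max_depth": [3, None],
    "max_features": [1, 3, 10],
    "min_samples_split": [2, 3, 10],
    "min_samples_leaf": [1, 3, 10],
    "bootstrap": [True, False],
    "criterion": ["gini", "entropy"],
}

# Grid search evaluates every combination: the Cartesian product of the value lists.
n_grid = prod(len(values) for values in param_grid.values())
n_enumerated = sum(1 for _ in product(*param_grid.values()))

# Timings reported in the post's output.
grid_time, grid_candidates = 83.70, 216
rand_time, rand_candidates = 8.64, 20

print(n_grid, n_enumerated)  # 216 216
print("grid:   %.3f s per setting" % (grid_time / grid_candidates))
print("random: %.3f s per setting" % (rand_time / rand_candidates))
```

Per setting, the two searches cost about the same (roughly 0.39 s versus 0.43 s), so randomized search finishing about 9.7 times faster comes almost entirely from evaluating 20 settings instead of 216.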


GridSearchCV: exhaustive search over a grid of parameters.
RandomizedSearchCV: randomized search over hyperparameters.

RandomizedSearchCV implements a "fit" and a "score" method. It also implements "predict", "predict_proba", "decision_function", "transform" and "inverse_transform" if the underlying estimator implements them.
The parameters of the estimator used to apply these methods are optimized by cross-validated search over parameter settings.
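To make "exhaustive search over a grid" concrete, here is a stdlib-only sketch of what GridSearchCV does conceptually: enumerate every combination, score each one, keep the best. The `toy_cv_score` function is hypothetical, a stand-in for the mean cross-validated score that scikit-learn obtains by fitting the estimator on each fold; it is not part of any library.

```python
from itertools import product


def toy_cv_score(params):
    """Hypothetical stand-in for a mean cross-validation score (higher is better)."""
    depth_bonus = 0.1 if params["max_depth"] is None else 0.0
    return 0.8 + depth_bonus - 0.01 * abs(params["max_features"] - 3)


def exhaustive_search(param_grid, score_fn):
    """Score every combination in the grid; return (best_params, best_score, n_evaluated)."""
    names = list(param_grid)
    best_params, best_score, n_evaluated = None, float("-inf"), 0
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        n_evaluated += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_evaluated


grid = {"max_depth": [3, None], "max_features": [1, 3, 10]}
best, score, n = exhaustive_search(grid, toy_cv_score)
print(best, round(score, 3), n)  # {'max_depth': None, 'max_features': 3} 0.9 6
```

The number of evaluations always equals the grid size, so adding one more value to any parameter multiplies the total cost.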
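In contrast to GridSearchCV, not all parameter values are tried out; instead, a fixed number of parameter settings is sampled from the specified distributions. The number of parameter settings that are tried is given by n_iter. This is why, in the run above, RandomizedSearchCV evaluated only n_iter = 20 settings while GridSearchCV evaluated all 216 combinations in its grid.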
If all parameters are given as lists, sampling without replacement is performed. If at least one parameter is given as a distribution, sampling with replacement is used. Continuous distributions are strongly recommended for continuous parameters.
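The two sampling modes described above can be sketched with the standard library. This is a simplified model, not scikit-learn's ParameterSampler: here a "distribution" is represented as any callable that takes a `random.Random` instance (a hypothetical convention for this sketch; scikit-learn instead expects objects with an `rvs` method, such as `scipy.stats` distributions).

```python
import random
from itertools import product


def sample_params(param_distributions, n_iter, seed=0):
    """Simplified sketch of randomized-search sampling.

    Lists are sampled uniformly; callables taking a random.Random stand in
    for distributions (a hypothetical convention for this sketch).
    """
    rng = random.Random(seed)
    names = list(param_distributions)
    if all(isinstance(v, list) for v in param_distributions.values()):
        # All lists: draw distinct grid points (sampling without replacement).
        # This sketch caps n_iter at the grid size.
        grid = list(product(*(param_distributions[name] for name in names)))
        chosen = rng.sample(grid, min(n_iter, len(grid)))
        return [dict(zip(names, point)) for point in chosen]
    # At least one distribution: draw each setting independently
    # (sampling with replacement), so a setting may repeat.
    samples = []
    for _ in range(n_iter):
        setting = {}
        for name in names:
            spec = param_distributions[name]
            setting[name] = rng.choice(spec) if isinstance(spec, list) else spec(rng)
        samples.append(setting)
    return samples


lists_only = {"max_depth": [3, None], "bootstrap": [True, False]}
with_dist = {"max_depth": [3, None], "min_samples_leaf": lambda rng: rng.randint(1, 10)}

a = sample_params(lists_only, n_iter=10)
b = sample_params(with_dist, n_iter=5)
print(len(a))  # 4: only 2 * 2 distinct settings exist
print(len(b))  # 5
```

With only lists, every returned setting is distinct; once a distribution is involved, the grid is no longer finite, so independent draws are the natural choice.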