evaluate module
The evaluate module defines the evaluate() function and the GridSearch class.
class surprise.evaluate.GridSearch(algo_class, param_grid, measures=[u'rmse', u'mae'], n_jobs=-1, pre_dispatch=u'2*n_jobs', seed=None, verbose=1, joblib_verbose=0)
Warning: Deprecated since version 1.05. Use GridSearchCV instead. This class will be removed in later versions.
The GridSearch class is used to evaluate the performance of an algorithm on various combinations of parameters, and to extract the best combination. It is analogous to GridSearchCV from scikit-learn.
See User Guide for usage.
Parameters:
- algo_class (AlgoBase) – The class object of the algorithm to evaluate.
- param_grid (dict) – Dictionary with algorithm parameters as keys and lists of values as values. All combinations will be evaluated with the desired algorithm. Dict parameters such as sim_options require special treatment, see this note.
- measures (list of string) – The performance measures to compute. Allowed names are function names as defined in the accuracy module. Default is ['rmse', 'mae'].
- n_jobs (int) – The maximum number of algorithm trainings run in parallel.
  - If -1, all CPUs are used.
  - If 1 is given, no parallel computing code is used at all, which is useful for debugging.
  - For n_jobs below -1, (n_cpus + n_jobs + 1) are used. For example, with n_jobs = -2 all CPUs but one are used.
  Default is -1.
- pre_dispatch (int or string) – Controls the number of jobs that get dispatched during parallel execution. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be:
  - None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs.
  - An int, giving the exact number of total jobs that are spawned.
  - A string, giving an expression as a function of n_jobs, as in '2*n_jobs'.
  Default is '2*n_jobs'.
- seed (int) – The value to use as seed for RNG. It will determine how splits are defined. If None, the current time since epoch is used. Default is None.
- verbose (bool) – Level of verbosity. If False, nothing is printed. If True, the mean values of each measure are printed for each parameter combination. Default is True.
- joblib_verbose (int) – Controls the verbosity of joblib: the higher, the more messages.
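The param_grid and n_jobs descriptions above can be illustrated with a small standard-library sketch. It does not call Surprise: expand_grid and effective_workers are hypothetical helpers written for this example, the grid values are illustrative, and the floor of one worker is an assumption rather than something stated in this page.

```python
from itertools import product


def expand_grid(param_grid):
    """Return every combination of a flat parameter grid as a list of dicts."""
    names = sorted(param_grid)  # fixed order so the output is reproducible
    value_lists = [param_grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


def effective_workers(n_jobs, n_cpus):
    """Apply the n_jobs rule from the parameter description.

    Assumption: the result is never allowed to drop below one worker.
    """
    if n_jobs < 0:
        return max(n_cpus + n_jobs + 1, 1)
    return n_jobs


# Illustrative grid: 2 * 2 * 2 = 8 combinations.
grid = {"n_epochs": [5, 10], "lr_all": [0.002, 0.005], "reg_all": [0.4, 0.6]}
combinations = expand_grid(grid)
print(len(combinations))         # 8
print(effective_workers(-1, 4))  # 4: all CPUs
print(effective_workers(-2, 4))  # 3: all CPUs but one
```

Because every combination is trained and evaluated, the size of the grid is the product of the list lengths, which is worth checking before launching a search.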
- cv_results (dict of arrays) – A dict that contains all parameters and accuracy information for each combination. Can be imported into a pandas DataFrame.
- best_estimator (dict of AlgoBase) – Using an accuracy measure as key, get the estimator that gave the best accuracy results for the chosen measure.
- best_score (dict of floats) – Using an accuracy measure as key, get the best score achieved for that measure.
- best_params (dict of dicts) – Using an accuracy measure as key, get the parameters combination that gave the best accuracy results for the chosen measure.
- best_index (dict of ints) – Using an accuracy measure as key, get the index, usable with cv_results, of the combination that achieved the highest accuracy for that measure.
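To make the "accuracy measure as key" convention concrete, the following sketch builds measure-keyed dictionaries from per-combination scores. It is not the library's implementation: the params and scores lists, their layout, and the best_by_measure helper are assumptions for illustration, and the numbers are made up. It also assumes that for error measures such as rmse and mae, the best result is the lowest value.

```python
# Made-up results for three parameter combinations (illustration only).
params = [{"n_epochs": 5}, {"n_epochs": 10}, {"n_epochs": 20}]
scores = [
    {"rmse": 0.98, "mae": 0.78},
    {"rmse": 0.95, "mae": 0.76},
    {"rmse": 0.96, "mae": 0.75},
]


def best_by_measure(params, scores, measures=("rmse", "mae")):
    """Build measure-keyed dicts of the best index, score and parameters."""
    best_index, best_score, best_params = {}, {}, {}
    for measure in measures:
        # Assumption: these measures are errors, so the lowest value wins.
        index = min(range(len(scores)), key=lambda i: scores[i][measure])
        best_index[measure] = index
        best_score[measure] = scores[index][measure]
        best_params[measure] = params[index]
    return best_index, best_score, best_params


best_index, best_score, best_params = best_by_measure(params, scores)
print(best_index)           # {'rmse': 1, 'mae': 2}
print(best_params["rmse"])  # {'n_epochs': 10}
```

Note that different measures can select different combinations, which is why each attribute is a dictionary rather than a single value.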
surprise.evaluate.evaluate(algo, data, measures=[u'rmse', u'mae'], with_dump=False, dump_dir=None, verbose=1)
Warning: Deprecated since version 1.05. Use cross_validate instead. This function will be removed in later versions.
Evaluate the performance of the algorithm on given data.
Depending on the nature of the data parameter, it may or may not perform cross validation.
Parameters:
- algo (AlgoBase) – The algorithm to evaluate.
- data (Dataset) – The dataset on which to evaluate the algorithm.
- measures (list of string) – The performance measures to compute. Allowed names are function names as defined in the accuracy module. Default is ['rmse', 'mae'].
- with_dump (bool) – If True, the predictions and the algorithm will be dumped at each fold for later analysis (see FAQ). The file names will be set as: '<date>-<algorithm name>-<fold number>'. Default is False.
- dump_dir (str) – The directory where the files are dumped. Default is '~/.surprise_data/dumps/', or the folder specified by the 'SURPRISE_DATA_FOLDER' environment variable (see FAQ).
- verbose (int) – Level of verbosity. If 0, nothing is printed. If 1 (default), accuracy measures for each fold are printed, with a final summary. If 2, every prediction is printed.
Returns: A dictionary containing measures as keys and lists as values. Each list contains one entry per fold.
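The shape of the return value described above (measures as keys, one list entry per fold) can be summarized with a short sketch. The perf dictionary below is a stand-in with made-up numbers rather than real output of evaluate(), and summarize is a hypothetical helper.

```python
from statistics import mean

# Stand-in for the returned dictionary: one value per fold for each measure.
# The numbers are made up for illustration.
perf = {
    "rmse": [0.94, 0.96, 0.95],
    "mae": [0.74, 0.76, 0.75],
}


def summarize(perf):
    """Average each measure over its folds."""
    return {measure: mean(values) for measure, values in perf.items()}


summary = summarize(perf)
for measure, value in summary.items():
    print("{}: {:.4f} over {} folds".format(measure, value, len(perf[measure])))
```

Averaging over folds gives a single figure per measure, similar in spirit to the final summary printed when verbose is 1.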