Trainset class
- class surprise.Trainset(ur, ir, n_users, n_items, n_ratings, rating_scale, raw2inner_id_users, raw2inner_id_items)
A trainset contains all the useful data that constitutes a training set.
It is used by the fit() method of every prediction algorithm. You should not try to build such an object on your own; instead, use the Dataset.folds() method or the DatasetAutoFolds.build_full_trainset() method.
Trainsets are different from Datasets. You can think of a Dataset as the raw data, and of Trainsets as higher-level data where useful methods are defined. Also, a Dataset may comprise multiple Trainsets (e.g. when doing cross-validation).
- ur
The users' ratings. This is a dictionary containing lists of tuples of the form (item_inner_id, rating). The keys are user inner ids.
- Type
defaultdict of list
- ir
The items' ratings. This is a dictionary containing lists of tuples of the form (user_inner_id, rating). The keys are item inner ids.
- Type
defaultdict of list
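The layout of ur and ir can be illustrated with a small, self-contained sketch. It uses only the standard library and a hypothetical toy list of ratings that are assumed to be expressed with inner ids already; it is not how Surprise itself builds a trainset.

```python
from collections import defaultdict

# Hypothetical toy ratings, assumed to use inner ids already:
# (user_inner_id, item_inner_id, rating)
ratings = [(0, 0, 4.0), (0, 1, 3.0), (1, 0, 5.0), (2, 1, 1.0)]

ur = defaultdict(list)  # user inner id -> [(item_inner_id, rating), ...]
ir = defaultdict(list)  # item inner id -> [(user_inner_id, rating), ...]

for uid, iid, rating in ratings:
    ur[uid].append((iid, rating))
    ir[iid].append((uid, rating))

print(dict(ur))
print(dict(ir))
```

Both dictionaries hold the same ratings, indexed once by user and once by item, so either view can be looked up directly.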
- n_users
Total number of users \(|U|\).
- n_items
Total number of items \(|I|\).
- n_ratings
Total number of ratings \(|R_{train}|\).
- rating_scale
The minimum and maximum ratings of the rating scale.
- Type
tuple
- global_mean
The mean of all ratings \(\mu\).
- all_ratings()
Generator function to iterate over all ratings.
- Yields
A tuple (uid, iid, rating) where ids are inner ids (see this note).
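As a rough illustration of what iterating over all ratings means, the sketch below walks a ur-style dictionary and yields (uid, iid, rating) triples, then derives the mean of all ratings from them. The standalone function and the toy dictionary are assumptions made for the example, not the library's code.

```python
def all_ratings(ur):
    """Yield (uid, iid, rating) for every rating stored in a ur-style dict."""
    for uid, user_ratings in ur.items():
        for iid, rating in user_ratings:
            yield uid, iid, rating

# Hypothetical toy data: user inner id -> [(item_inner_id, rating), ...]
ur = {0: [(0, 4.0), (1, 3.0)], 1: [(0, 5.0)], 2: [(1, 1.0)]}

triples = list(all_ratings(ur))
# Mean of all ratings, i.e. the quantity described as global_mean.
global_mean = sum(rating for _, _, rating in triples) / len(triples)
print(triples, global_mean)
```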
- build_anti_testset(fill=None)
Return a list of ratings that can be used as a testset in the test() method.
The ratings are all the ratings that are not in the trainset, i.e. all the ratings \(r_{ui}\) where the user \(u\) is known, the item \(i\) is known, but the rating \(r_{ui}\) is not in the trainset. As \(r_{ui}\) is unknown, it is either replaced by the fill value or assumed to be equal to the mean of all ratings, global_mean.
- Parameters
fill (float) – The value to fill unknown ratings. If None, the global mean of all ratings, global_mean, will be used.
- Returns
A list of tuples (uid, iid, fill) where ids are raw ids.
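The anti-testset logic described above can be sketched in plain Python. This is a minimal sketch under stated assumptions: the standalone function and toy dictionaries are hypothetical, and for brevity it keeps inner ids in the output, whereas the documented method returns raw ids.

```python
def build_anti_testset(ur, ir, global_mean, fill=None):
    """Return (uid, iid, fill) for every known user/item pair with no rating."""
    fill = global_mean if fill is None else float(fill)
    anti_testset = []
    for uid, user_ratings in ur.items():
        rated = {iid for iid, _ in user_ratings}  # items this user already rated
        anti_testset.extend((uid, iid, fill) for iid in ir if iid not in rated)
    return anti_testset

# Hypothetical toy data, indexed by inner ids.
ur = {0: [(0, 4.0), (1, 3.0)], 1: [(0, 5.0)], 2: [(1, 1.0)]}
ir = {0: [(0, 4.0), (1, 5.0)], 1: [(0, 3.0), (2, 1.0)]}

default_fill = build_anti_testset(ur, ir, global_mean=3.25)
custom_fill = build_anti_testset(ur, ir, global_mean=3.25, fill=0)
print(default_fill, custom_fill)
```

User 0 has rated every known item, so it contributes nothing; users 1 and 2 each contribute the single item they have not rated.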
- build_testset()
Return a list of ratings that can be used as a testset in the test() method.
The ratings are all the ratings that are in the trainset, i.e. all the ratings returned by the all_ratings() generator. This is useful in cases where you want to test your algorithm on the trainset.
- knows_item(iid)
Indicate if the item is part of the trainset.
An item is part of the trainset if the item was rated at least once.
- Parameters
iid (int) – The (inner) item id. See this note.
- Returns
True if item is part of the trainset, else False.
- knows_user(uid)
Indicate if the user is part of the trainset.
A user is part of the trainset if the user has at least one rating.
- Parameters
uid (int) – The (inner) user id. See this note.
- Returns
True if user is part of the trainset, else False.
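One way to picture these membership checks is as key lookups on the rating dictionaries. The sketch below is an assumption about how such a check could be written, not a statement of the library's implementation. It also shows why a membership test is preferable to indexing when the dictionary is a defaultdict: indexing an unknown key would silently create an empty entry.

```python
from collections import defaultdict

# Hypothetical toy data: one user (inner id 0) with a single rating.
ur = defaultdict(list)
ur[0].append((0, 4.0))

def knows_user(ur, uid):
    """True if uid has at least one rating, without mutating the defaultdict."""
    return uid in ur

known = knows_user(ur, 0)
unknown = knows_user(ur, 7)
size_after_checks = len(ur)  # still 1: the `in` test did not add key 7
print(known, unknown, size_after_checks)
```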
- to_inner_iid(riid)
Convert an item raw id to an inner id.
See this note.
- Parameters
riid (str) – The item raw id.
- Returns
The item inner id.
- Return type
int
- Raises
ValueError – When item is not part of the trainset.
- to_inner_uid(ruid)
Convert a user raw id to an inner id.
See this note.
- Parameters
ruid (str) – The user raw id.
- Returns
The user inner id.
- Return type
int
- Raises
ValueError – When user is not part of the trainset.
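The raw-to-inner conversion can be pictured as a dictionary lookup that raises ValueError for unknown ids. The sketch below assumes a mapping shaped like the raw2inner_id_users constructor argument; the helper name to_inner_id, the sample raw ids, and the way inner ids are assigned (in order of first appearance) are hypothetical choices for illustration.

```python
def to_inner_id(raw2inner, raw_id, kind):
    """Look up an inner id, raising ValueError when the raw id is unknown."""
    try:
        return raw2inner[raw_id]
    except KeyError:
        raise ValueError(f"{kind} {raw_id!r} is not part of the trainset.") from None

# Assign inner ids in order of first appearance (an assumption for this sketch).
raw2inner_id_users = {}
for raw_uid in ["alice", "bob", "alice", "carol"]:
    raw2inner_id_users.setdefault(raw_uid, len(raw2inner_id_users))

carol_inner = to_inner_id(raw2inner_id_users, "carol", "User")

try:
    to_inner_id(raw2inner_id_users, "dave", "User")
    raised = False
except ValueError:
    raised = True
print(raw2inner_id_users, carol_inner, raised)
```

Callers that may receive ids absent from the training data can catch ValueError and fall back to a default, rather than checking membership first.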