Statistic | Value |
---|---|
Ratings | 20,000,263 |
Users | 138,493 |
Items | 26,744 |
Density | 0.540% |
Item Gini | 0.903 |
Start Date | 1995-01-09 11:46:44 |
End Date | 2015-03-31 06:40:02 |
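The density figure in the table can be reproduced from the three counts. This is a minimal sketch, assuming density means the fraction of user-item pairs that have a rating and that each pair is rated at most once; the variable names are ours.

```python
# Counts copied from the summary table.
n_ratings = 20_000_263
n_users = 138_493
n_items = 26_744

# Density: share of cells in the user-item matrix that hold a rating.
# Assumes each (user, item) pair appears at most once in the ratings.
density = n_ratings / (n_users * n_items)
print(f"Density: {density:.3%}")
```

Under these assumptions the printed value matches the 0.540% reported in the table.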
ML20M Data Description
Rating Statistics
Item Statistics
This section describes the distribution of various item statistics from the data set.
Item Popularity
What is the distribution of popularity?
Let’s also look at this as a Lorenz curve, for clarity:
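A Lorenz curve sorts items from least to most popular and plots the cumulative share of ratings against the cumulative share of items; the Gini coefficient is one minus twice the area under that curve. The sketch below is one way to compute both with NumPy. The function names `lorenz_curve` and `gini` are hypothetical helpers, and the trapezoid-based Gini may differ slightly from the estimator behind the 0.903 reported for this data set.

```python
import numpy as np

def lorenz_curve(counts):
    """Return (x, y): cumulative share of items vs. cumulative share of ratings."""
    sorted_counts = np.sort(np.asarray(counts, dtype=float))
    cumulative = np.cumsum(sorted_counts)
    # Prepend 0 so the curve starts at the origin.
    y = np.insert(cumulative / cumulative[-1], 0, 0.0)
    x = np.linspace(0.0, 1.0, len(y))
    return x, y

def gini(counts):
    """Gini coefficient as 1 - 2 * (area under the Lorenz curve)."""
    x, y = lorenz_curve(counts)
    # Trapezoid rule, written out to avoid depending on a NumPy version.
    area = np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0
    return 1.0 - 2.0 * area

# Toy popularity counts (not MovieLens data): one item holds every rating.
toy_counts = [0, 0, 0, 10]
print(gini(toy_counts))
```

Perfectly even popularity gives a Gini of 0, and values near 1 mean a few items collect most of the ratings.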
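We’ll now fit several candidate distributions (Pareto, power law, exponential, and log-normal) to the item popularity counts with distfit, scoring each by its residual sum of squares (RSS); lower is better.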
[distfit] >INFO> fit
[distfit] >INFO> transform
[distfit] >INFO> [pareto ] [0.00 sec] [RSS: 0.000310844] [loc=0.511 scale=0.489]
[distfit] >INFO> [powerlaw] [0.01 sec] [RSS: 0.000901913] [loc=1.000 scale=67309.000]
[distfit] >INFO> [expon ] [0.00 sec] [RSS: 0.00118941] [loc=1.000 scale=746.841]
[distfit] >INFO> [lognorm ] [0.02 sec] [RSS: 0.00101221] [loc=1.000 scale=0.120]
[distfit] >INFO> [pareto ] [0.04 sec] [RSS: 0.000310844] [loc=0.511 scale=0.489]
[distfit] >INFO> [powerlaw] [0.04 sec] [RSS: 0.000901913] [loc=1.000 scale=67309.000]
[distfit] >INFO> [expon ] [0.02 sec] [RSS: 0.00118941] [loc=1.000 scale=746.841]
[distfit] >INFO> [lognorm ] [0.02 sec] [RSS: 0.00101221] [loc=1.000 scale=0.120]
[distfit] >INFO> Compute confidence intervals [parametric]
Summary of fits:
| | name | score | loc | scale | arg |
|---|---|---|---|---|---|
| 0 | pareto | 0.000311 | 0.511316 | 0.488684 | (0.2562068171214625,) |
| 1 | powerlaw | 0.000902 | 1.0 | 67309.0 | (0.07494802059946808,) |
| 2 | lognorm | 0.001012 | 1.0 | 0.119722 | (14.36541422380245,) |
| 3 | expon | 0.001189 | 1.0 | 746.841123 | () |
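To make the RSS score more concrete, here is a rough sketch of the idea using SciPy directly: fit each candidate by maximum likelihood, then sum the squared differences between the histogram density and the fitted PDF. This is our approximation of what the score means, not distfit's own code. The helper `rss_scores`, the bin count, the synthetic data, and fixing `loc` at 0 are all assumptions made for illustration, so the numbers will not match the table.

```python
import numpy as np
from scipy import stats

def rss_scores(values, candidates=("pareto", "expon"), bins=50):
    """Fit each candidate distribution and score it by RSS against the histogram."""
    values = np.asarray(values, dtype=float)
    density, edges = np.histogram(values, bins=bins, density=True)
    centers = (edges[:-1] + edges[1:]) / 2.0
    scores = {}
    for name in candidates:
        dist = getattr(stats, name)
        # Assumption: fix loc at 0 to keep the fit simple and stable.
        params = dist.fit(values, floc=0)
        pdf = dist.pdf(centers, *params)
        scores[name] = float(np.sum((density - pdf) ** 2))
    return scores

# Synthetic heavy-tailed counts, standing in for item popularity.
rng = np.random.default_rng(0)
synthetic_counts = 1.0 + rng.pareto(1.5, size=2000)
print(rss_scores(synthetic_counts))
```

In SciPy's parameterization a distribution takes its shape parameters first, followed by `loc` and `scale`, which appears to be what the `arg`, `loc`, and `scale` columns of the summary table hold.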
Item Average Rating
What is the distribution of average ratings?
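Per-item averages come from grouping the ratings by item. The sketch below uses a small made-up frame in place of the real ratings table; the column names `userId`, `movieId`, and `rating` follow the MovieLens `ratings.csv` layout, which we assume here.

```python
import pandas as pd

# Toy ratings (not MovieLens data), using the assumed ratings.csv column names.
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "movieId": [10, 20, 10, 30, 10],
    "rating": [4.0, 3.0, 5.0, 2.5, 3.0],
})

# One row per item: how many ratings it has and their mean.
item_stats = ratings.groupby("movieId")["rating"].agg(["count", "mean"])
print(item_stats)
```

A histogram of the `mean` column then gives the distribution of item average ratings; items with a very small `count` have noisy averages, so it can help to look at both columns together.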
User Statistics
We now turn to the distribution of various user statistics.
User Average Ratings
How are user averages distributed?
User Activity Level
And what is the distribution of user activity levels (number of ratings per user)?
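Both user statistics, the average rating and the activity level, can be computed in one grouped aggregation. This is a minimal sketch on made-up data; the column names `userId` and `rating` are assumed from the MovieLens `ratings.csv` layout, and `n_ratings` and `mean_rating` are names we chose.

```python
import pandas as pd

# Toy ratings (not MovieLens data).
ratings = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2, 3],
    "rating": [4.0, 3.0, 5.0, 2.0, 3.0, 4.5],
})

# One row per user: activity level and average rating.
user_stats = ratings.groupby("userId")["rating"].agg(
    n_ratings="count",
    mean_rating="mean",
)
print(user_stats)

# How many users sit at each activity level.
activity_distribution = user_stats["n_ratings"].value_counts().sort_index()
print(activity_distribution)
```

Activity counts are typically heavy-tailed, so a log-scaled axis is often the more readable choice when plotting `n_ratings`.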
Ratings over Time
The MovieLens ratings have timestamps, so we’ll also look at a temporal view of the data.
Data Volume
How did the data grow over time?
How many ratings are we getting each month through the life of the data set?
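Both questions reduce to counting ratings per calendar month and then taking a running total. The sketch below assumes the `timestamp` column holds Unix epoch seconds, as in the MovieLens `ratings.csv` file; the toy rows are made up, except that the first timestamp corresponds to the start date reported for this data set.

```python
import pandas as pd

DAY = 86_400             # seconds per day
start = 789_652_004      # 1995-01-09 11:46:44 UTC as epoch seconds

# Toy ratings: two in January 1995, none in February, one in March.
ratings = pd.DataFrame({
    "userId": [1, 2, 1],
    "timestamp": [start, start + 3_600, start + 60 * DAY],
})
ratings["time"] = pd.to_datetime(ratings["timestamp"], unit="s")

# Ratings per month; resampling keeps empty months as zeros.
monthly = ratings.set_index("time").resample("MS").size()

# Growth of the data set: cumulative number of ratings by month.
cumulative = monthly.cumsum()
print(monthly)
print(cumulative)
```

Resampling rather than grouping on the month label keeps months with no ratings in the series, which matters when the result is plotted as a time series.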
User Activity
The number of unique users per month is a good measure of user activity.
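Counting distinct users per month is a small variation on the monthly rating count. This is a sketch on made-up data, again assuming `userId` and epoch-second `timestamp` columns as in the MovieLens `ratings.csv` file.

```python
import pandas as pd

DAY = 86_400             # seconds per day
start = 789_652_004      # 1995-01-09 11:46:44 UTC as epoch seconds

# Toy ratings: users 1 and 2 rate in January, only user 1 returns in March.
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 1],
    "timestamp": [start, start + 60, start + DAY, start + 60 * DAY],
})
ratings["time"] = pd.to_datetime(ratings["timestamp"], unit="s")

# Distinct users who rated at least once in each month.
monthly_users = ratings.set_index("time").resample("MS")["userId"].nunique()
print(monthly_users)
```

A user who rates many times in a month is counted once for that month, so this series tracks how many people were active rather than how much they rated.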
How long do users usually stick around?
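One simple way to measure how long users stick around is the span between each user's first and last rating. The sketch below takes that approach on made-up data; it is an assumption about how tenure could be defined, not necessarily the definition used for this analysis, and the `first`, `last`, and `days_active` names are ours.

```python
import pandas as pd

DAY = 86_400             # seconds per day
start = 789_652_004      # 1995-01-09 11:46:44 UTC as epoch seconds

# Toy ratings: user 1 is active over 60 days, user 2 rates only once.
ratings = pd.DataFrame({
    "userId": [1, 1, 2],
    "timestamp": [start, start + 60 * DAY, start + DAY],
})
ratings["time"] = pd.to_datetime(ratings["timestamp"], unit="s")

# First and last rating time per user, and the whole days between them.
span = ratings.groupby("userId")["time"].agg(first="min", last="max")
span["days_active"] = (span["last"] - span["first"]).dt.days
print(span)
```

Under this definition, users who rate in a single session have a tenure of zero days, so the distribution usually has a large spike at zero alongside a long tail.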