ML1M Data Description

Rating Statistics

Ratings:    999,480
Users:      6,039
Items:      3,705
Density:    4.467%
Item Gini:  0.634
Start Date: 2000-04-25 23:25:58
End Date:   2003-02-28 17:49:50
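As a rough illustration of how two of these figures can be derived, the sketch below computes the density from the counts in the table and defines a Gini coefficient over per-item rating counts. The exact Gini estimator behind the reported 0.634 is not stated here, so the rank-based formula is an assumption, and the function names `density` and `gini` are hypothetical.

```python
def density(n_ratings, n_users, n_items):
    """Fraction of the user-item matrix that holds a rating."""
    return n_ratings / (n_users * n_items)


def gini(counts):
    """Gini coefficient of non-negative counts (0 = perfectly equal)."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending-sorted values, ranks starting at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Counts taken from the table above.
ml1m_density = density(999_480, 6_039, 3_705)
print(f"{ml1m_density:.3%}")  # 4.467%
```

Applied to the list of per-item rating counts, `gini` gives a value between 0 (every item rated equally often) and just under 1 (nearly all ratings on one item).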

Item Statistics

This section describes the distribution of various item statistics from the data set.

Item Popularity

How is item popularity (the number of ratings each item received) distributed?

Let’s also look at this as a Lorenz curve, for clarity:
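For readers who want to reproduce the curve, here is a minimal sketch of how Lorenz curve points could be computed from per-item rating counts. The function name `lorenz_points` and the toy counts are hypothetical and not taken from the data set.

```python
def lorenz_points(counts):
    """Return (cumulative item share, cumulative rating share) pairs."""
    xs = sorted(counts)  # least popular items first
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one item with a positive count")
    points = [(0.0, 0.0)]
    running = 0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points


# Hypothetical popularity counts for four items.
for item_share, rating_share in lorenz_points([4, 1, 2, 1]):
    print(f"{item_share:.2f} {rating_share:.3f}")
```

The further the points sag below the diagonal, the more the ratings are concentrated on a few popular items.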

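We’ll now fit several candidate distributions (Pareto, power law, exponential, and log-normal) to the item popularity counts and compare them by residual sum of squares (RSS).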

[distfit] >INFO> fit
[distfit] >INFO> transform
[distfit] >INFO> [pareto  ] [0.00 sec] [RSS: 2.63488e-06] [loc=-222.014 scale=223.014]
[distfit] >INFO> [powerlaw] [0.00 sec] [RSS: 1.92112e-05] [loc=1.000 scale=3426.000]
[distfit] >INFO> [expon   ] [0.00 sec] [RSS: 1.50276e-05] [loc=1.000 scale=268.765]
[distfit] >INFO> [lognorm ] [0.00 sec] [RSS: 6.32251e-07] [loc=-0.765 scale=95.913]
[distfit] >INFO> Compute confidence intervals [parametric]

Summary of fits, ordered by score (the RSS from the log above, so lower is better and the log-normal fits best):

   name      score     loc          scale       arg
0  lognorm   0.000001  -0.765407    95.91261    (1.7014004296924161,)
1  pareto    0.000003  -222.013965  223.013965  (1.6653227769596757,)
2  expon     0.000015  1.0          268.765182  ()
3  powerlaw  0.000019  1.0          3426.0      (0.20853226007445616,)
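To make the best-fitting row easier to read, the sketch below plugs the lognorm parameters from the summary table into a shifted log-normal CDF. It assumes that loc, scale, and arg follow the scipy.stats.lognorm parameterization (arg is the shape s, and scale is exp(mu)); that convention is an assumption, not something the output above confirms. The helper name `lognorm_cdf` is hypothetical.

```python
import math

# Parameters copied from the lognorm row of the summary table.
S, LOC, SCALE = 1.7014004296924161, -0.765407, 95.91261


def lognorm_cdf(x, s=S, loc=LOC, scale=SCALE):
    """CDF of a shifted log-normal (assumed scipy.stats.lognorm convention)."""
    if x <= loc:
        return 0.0
    z = math.log((x - loc) / scale) / s
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Under this parameterization the median of the fit is loc + scale.
median = LOC + SCALE
print(f"median popularity under the fit: {median:.1f} ratings")
print(f"P(popularity <= 10) = {lognorm_cdf(10):.3f}")
```

If the assumption holds, the fit says that half of the items have roughly 95 ratings or fewer, which is consistent with a long-tailed popularity distribution.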

Item Average Rating

What is the distribution of average ratings?

User Statistics

We now turn to the distribution of various user statistics.

User Average Ratings

How are user averages distributed?

User Activity Level

And what is the distribution of user activity levels (the number of ratings per user)?
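Both user statistics come from a single pass over the ratings. The sketch below is one way to compute each user's rating count and mean rating; the tuple layout `(user_id, item_id, rating)`, the function name `user_stats`, and the toy data are assumptions for illustration.

```python
from collections import defaultdict


def user_stats(ratings):
    """Map each user to (number of ratings, mean rating).

    `ratings` is an iterable of (user_id, item_id, rating) tuples.
    """
    counts = defaultdict(int)
    sums = defaultdict(float)
    for user, _item, rating in ratings:
        counts[user] += 1
        sums[user] += rating
    return {u: (counts[u], sums[u] / counts[u]) for u in counts}


# Hypothetical toy ratings, not taken from ML1M.
toy = [(1, 10, 5), (1, 11, 3), (2, 10, 4)]
stats = user_stats(toy)
print(stats)  # {1: (2, 4.0), 2: (1, 4.0)}
```

Grouping by the item column instead of the user column would give the corresponding item statistics.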

Ratings over Time

The MovieLens ratings have timestamps, so we’ll also look at a temporal view of the data.

Data Volume

How did the data grow over time?

How many ratings arrive each month over the life of the data set?
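A minimal sketch of the monthly aggregation follows. It treats timestamps as Unix seconds and buckets them by calendar month in UTC; the time zone is an assumption, and `month_key`, `monthly_volume`, and `utc_ts` are hypothetical helper names.

```python
from collections import Counter
from datetime import datetime, timezone
from itertools import accumulate


def month_key(ts):
    """Convert a Unix timestamp (seconds) to a 'YYYY-MM' string in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m")


def monthly_volume(timestamps):
    """Return [(month, ratings that month, cumulative ratings)] in date order."""
    per_month = Counter(month_key(ts) for ts in timestamps)
    months = sorted(per_month)
    counts = [per_month[m] for m in months]
    return list(zip(months, counts, accumulate(counts)))


def utc_ts(year, month, day):
    """Build a toy Unix timestamp at midnight UTC."""
    return datetime(year, month, day, tzinfo=timezone.utc).timestamp()


# Hypothetical timestamps, not taken from ML1M.
toy = [utc_ts(2000, 4, 26), utc_ts(2000, 4, 30), utc_ts(2000, 5, 2), utc_ts(2000, 7, 1)]
volume = monthly_volume(toy)
print(volume)
# [('2000-04', 2, 2), ('2000-05', 1, 3), ('2000-07', 1, 4)]
```

The cumulative column answers the growth question and the per-month column answers the volume question. Months with no ratings (June in the toy data) are simply absent, so a plot would need them filled with zeros.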

User Activity

The number of unique users who rate in each month is a good measure of user activity.

How long do users usually stick around?
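Both questions in this section can be answered from (user, timestamp) pairs. The sketch below counts distinct users per month and measures each user's tenure as the gap between their first and last rating. Defining tenure this way, and bucketing months in UTC, are assumptions; the function names and toy events are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def monthly_unique_users(events):
    """Count distinct users per 'YYYY-MM' month (UTC).

    `events` is an iterable of (user_id, unix_timestamp) pairs.
    """
    users = defaultdict(set)
    for user, ts in events:
        month = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m")
        users[month].add(user)
    return {m: len(users[m]) for m in sorted(users)}


def tenure_days(events):
    """Days between each user's first and last rating."""
    first, last = {}, {}
    for user, ts in events:
        first[user] = min(ts, first.get(user, ts))
        last[user] = max(ts, last.get(user, ts))
    return {u: (last[u] - first[u]) / 86_400 for u in first}


def utc_ts(year, month, day):
    """Build a toy Unix timestamp at midnight UTC."""
    return datetime(year, month, day, tzinfo=timezone.utc).timestamp()


# Hypothetical events, not taken from ML1M.
toy = [(1, utc_ts(2000, 4, 26)), (1, utc_ts(2000, 5, 6)),
       (2, utc_ts(2000, 5, 1)), (2, utc_ts(2000, 5, 1))]
active = monthly_unique_users(toy)
tenure = tenure_days(toy)
print(active)  # {'2000-04': 1, '2000-05': 2}
print(tenure)  # {1: 10.0, 2: 0.0}
```

A user who rates everything in one session gets a tenure of zero, so a histogram of these values would likely need a log or bucketed axis to be readable.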