Data Description

Rating Statistics

Ratings 99,831
Users 942
Items 1,681
Density 6.304%
Item Gini 0.629
Start Date 1997-09-20 03:05:10
End Date 1998-04-22 23:10:38
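
The density and Gini figures above can be reproduced from raw counts. The following is a minimal sketch, not necessarily how this report computes them. It assumes density is the number of ratings divided by users times items, and it uses a common rank-based formula for the Gini coefficient over per-item rating counts. The function names are illustrative.

```python
def density(n_ratings, n_users, n_items):
    """Fraction of the user-item matrix that holds a rating."""
    return n_ratings / (n_users * n_items)


def gini(counts):
    """Gini coefficient of non-negative counts (0 means perfectly equal)."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the values in ascending order.
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


# Check against the summary figures listed above.
print(f"{density(99_831, 942, 1_681):.3%}")  # 6.304%
```

Passing the list of per-item rating counts to gini() would give the item Gini. A value near 0 means ratings are spread evenly across items, and a value near 1 means a few items receive most of them.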

Item Statistics

This section describes the distribution of various item statistics from the data set.

Item Popularity

What is the distribution of popularity?

Let’s also look at this as a Lorenz curve, for clarity:
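
For reference, the points of a Lorenz curve can be computed as follows. This is a sketch under the assumption that the input is a list of per-item rating counts. The function name is hypothetical and plotting is left out.

```python
from itertools import accumulate


def lorenz_curve(counts):
    """Return (fraction of items, cumulative fraction of ratings) points.

    Items are ordered from least to most popular, so the curve bows
    below the diagonal more strongly as popularity gets more unequal.
    """
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    frac_items = [i / n for i in range(n + 1)]
    frac_ratings = [0.0] + [c / total for c in accumulate(xs)]
    return frac_items, frac_ratings


x, y = lorenz_curve([4, 1, 2, 1])
print(list(zip(x, y)))
```

Plotting y against x, with the diagonal as the line of perfect equality, gives the usual Lorenz view. The Gini coefficient is twice the area between the diagonal and the curve.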

Item Average Rating

What is the distribution of average ratings?

User Statistics

We now turn to the distribution of various user statistics.

User Average Ratings

How are user averages distributed?

User Activity Level

And what is the distribution of user activity levels (number of ratings per user)?

Ratings over Time

The MovieLens ratings have timestamps, so we’ll also look at a temporal view of the data.

Data Volume

How did the data grow over time?

How many ratings are we getting each month through the life of the data set?
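
Both views (cumulative growth and ratings per month) can be derived from the rating timestamps. The sketch below assumes timestamps are Unix seconds, as in the MovieLens rating files, and buckets them by UTC calendar month. The helper names are illustrative.

```python
from collections import Counter
from datetime import datetime, timezone
from itertools import accumulate


def monthly_counts(timestamps):
    """Number of ratings per calendar month, keyed by 'YYYY-MM' in time order."""
    months = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m")
        for ts in timestamps
    )
    return dict(sorted(months.items()))


def cumulative(counts_by_month):
    """Running total of ratings, in the same month order as the input."""
    return dict(zip(counts_by_month, accumulate(counts_by_month.values())))
```

Months with no ratings do not appear in the result, so a plot that needs a continuous axis would have to fill those in with zeros.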

User Activity

The number of unique users per month is a good measure of user activity.

How long do users usually stick around?
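
One way to compute these two measures is sketched below. It assumes the ratings are available as (user_id, unix_timestamp) pairs and defines a user's tenure as the time between their first and last rating. That definition is an assumption here, not something the report states.

```python
from collections import defaultdict
from datetime import datetime, timezone


def monthly_unique_users(ratings):
    """Count distinct users with at least one rating in each UTC month."""
    users = defaultdict(set)
    for user, ts in ratings:
        month = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m")
        users[month].add(user)
    return {m: len(u) for m, u in sorted(users.items())}


def tenure_days(ratings):
    """Days between each user's first and last rating."""
    first, last = {}, {}
    for user, ts in ratings:
        first[user] = min(ts, first.get(user, ts))
        last[user] = max(ts, last.get(user, ts))
    return {u: (last[u] - first[u]) / 86_400 for u in first}
```

Users who rated everything in a single session get a tenure close to zero, so the tenure distribution is typically best viewed on a log scale or as a histogram with a separate zero bucket.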

Parametric Activity Distributions

Some downstream uses benefit from parametric distributions of user/item activity levels.

Item Activity Distributions

This section models the item popularity distribution with various parametric distributions.

Distribution  Params   Location  Scale    D(KL)  Δ(JS)
Log-normal    s=1.947  0.779     17.847   0.231  0.223
Pareto        b=0.306  0.04      0.952    0.519  0.291
Power law     a=0.166  1         581.083  0.43   0.344
Geometric     p=0.017  0         -        0.309  0.261
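
To illustrate how a row of this table could be produced, the sketch below fits a geometric distribution by maximum likelihood and scores it against the empirical distribution with KL and Jensen-Shannon divergence. The report does not say how its fits or divergences are computed, so the choices here are assumptions: support starting at 1, logarithms in base 2, and model mass beyond the largest observed count lumped into a single tail bucket. All function names are hypothetical.

```python
import math
from collections import Counter


def empirical_pmf(counts):
    """Observed probability of each activity level."""
    n = len(counts)
    return {k: c / n for k, c in Counter(counts).items()}


def fit_geometric(counts):
    """Maximum-likelihood p for a geometric distribution on {1, 2, ...}."""
    return len(counts) / sum(counts)


def geometric_pmf_table(p, k_max):
    """Model probabilities for 1..k_max plus one bucket for the tail beyond k_max."""
    pmf = {k: (1 - p) ** (k - 1) * p for k in range(1, k_max + 1)}
    pmf["tail"] = (1 - p) ** k_max
    return pmf


def kl_divergence(p, q):
    """D_KL(P || Q) in bits; q must be positive wherever p is."""
    return sum(pk * math.log2(pk / q[k]) for k, pk in p.items() if pk > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (symmetric, between 0 and 1)."""
    mix = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in set(p) | set(q)}
    return 0.5 * kl_divergence(p, mix) + 0.5 * kl_divergence(q, mix)


def fit_and_score(counts):
    """Fit a geometric model and return (p, KL, JS) against the data."""
    observed = empirical_pmf(counts)
    p = fit_geometric(counts)
    model = geometric_pmf_table(p, max(counts))
    return p, kl_divergence(observed, model), js_divergence(observed, model)
```

With the summary figures from the start of this report (1,681 items and 99,831 ratings), this estimator gives p = 1,681 / 99,831 ≈ 0.017, which agrees with the Geometric row. The divergence values depend on the log base and on whether a divergence or a distance is reported, so they are not expected to match the table exactly.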

User Activity Distributions

This section does the same for the user activity distribution.

Distribution  Params                       Location  Scale    D(KL)  Δ(JS)
Log-normal    s=1.474                      18.677    38.901   0.314  0.27
Pareto        b=0.830                      0.902     19.055   0.394  0.282
Power law     a=0.239                      19.999    762.609  0.53   0.382
NegBinom      n=1.000, p=0.011             20        -        0.305  0.277
BetaNegBinom  n=1.000, a=3.886, b=253.067  20        -        0.285  0.268
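
A negative binomial with n = 1 is a geometric distribution on {0, 1, ...}, so the NegBinom row can be sanity-checked by hand. The sketch below assumes the fit shifts the support to start at the location value (20) and estimates p by maximum likelihood. Whether the report fits it this way is not stated, and the function name is illustrative.

```python
def fit_shifted_geometric(counts, loc):
    """MLE of p for P(X = loc + k) = (1 - p)**k * p, with k = 0, 1, ..."""
    mean_excess = sum(counts) / len(counts) - loc
    return 1.0 / (1.0 + mean_excess)


# Using only the summary figures: mean user activity is 99,831 / 942 ratings.
p = 1.0 / (1.0 + 99_831 / 942 - 20)
print(round(p, 3))  # 0.011
```

The result agrees with p=0.011 in the table, which is consistent with every user having at least 20 ratings and the excess over 20 being modeled as geometric.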