FlexMF WARP

This page analyzes the hyperparameter tuning results for the FlexMF scorer in implicit-feedback mode with WARP loss.

Parameter Search Space

Parameter	Type	Distribution	Values	Selected
embedding_size	Integer	LogUniform	4 ≤ \(x\) ≤ 512	24
regularization	Float	LogUniform	0.0001 ≤ \(x\) ≤ 10	0.0174
learning_rate	Float	LogUniform	0.001 ≤ \(x\) ≤ 0.1	0.00635
reg_method	Categorical	Uniform	L2, AdamW	AdamW
item_bias	Categorical	Uniform	True, False	True
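
To make the distributions in the table concrete, the following is a minimal standard-library sketch of drawing one configuration from this space. It is not the tuning code used for this page. The helper names `sample_log_uniform` and `sample_config` are hypothetical, and rounding the integer log-uniform draw is an assumption of the sketch.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a float whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_config(rng):
    """Draw one configuration from the search space in the table above."""
    # Integer log-uniform: sample a float, round it, then clamp to the bounds.
    # (Rounding is an assumption of this sketch, not a documented behavior.)
    size = round(sample_log_uniform(rng, 4, 512))
    return {
        "embedding_size": min(512, max(4, size)),
        "regularization": sample_log_uniform(rng, 1e-4, 10),
        "learning_rate": sample_log_uniform(rng, 1e-3, 1e-1),
        "reg_method": rng.choice(["L2", "AdamW"]),
        "item_bias": rng.choice([True, False]),
    }


rng = random.Random(42)  # fixed seed so the draw is repeatable
config = sample_config(rng)
print(config)
```

Sampling on a log scale spreads trials evenly across orders of magnitude, which suits parameters such as the regularization weight whose range here spans five decades.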

Final Result

The search selected the following configuration:

{
    'embedding_size': 24,
    'regularization': 0.01743226405485094,
    'learning_rate': 0.006352784697660642,
    'reg_method': 'AdamW',
    'item_bias': True,
    'epochs': 9
}

With these metrics:

{
    'RBP': 0.19015425062079377,
    'LogRBP': 2.1343202789545064,
    'NDCG': 0.43780242876109193,
    'RecipRank': 0.37138329866684155,
    'TrainTask': '27fe16f1-91fb-4293-9590-b24a8198abb5',
    'TrainTime': None,
    'TrainCPU': None,
    'max_epochs': 50,
    'done': False,
    'training_iteration': 9,
    'trial_id': 'f66f5633',
    'date': '2025-05-05_19-11-42',
    'timestamp': 1746486702,
    'time_this_iter_s': 60.38285946846008,
    'time_total_s': 522.8673887252808,
    'pid': 321186,
    'hostname': 'CCI-ws21',
    'node_ip': '10.248.127.152',
    'config': {
        'embedding_size': 24,
        'regularization': 0.01743226405485094,
        'learning_rate': 0.006352784697660642,
        'reg_method': 'AdamW',
        'item_bias': True,
        'epochs': 9
    },
    'time_since_restore': 522.8673887252808,
    'iterations_since_restore': 9
}
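
The timing fields in this record can be reduced to a per-epoch figure with a little arithmetic. The sketch below assumes that each training iteration corresponds to one epoch, which the matching `epochs` and `training_iteration` values suggest but the record does not state outright.

```python
# Values copied from the result record above.
time_total_s = 522.8673887252808
training_iteration = 9
max_epochs = 50

# Assumption: one training iteration is one epoch.
mean_epoch_s = time_total_s / training_iteration
budget_used = training_iteration / max_epochs

print(f"mean time per epoch: {mean_epoch_s:.1f} s")        # 58.1 s
print(f"fraction of epoch budget used: {budget_used:.0%}")  # 18%
```

Under that assumption, the selected trial reported its best result after 9 of the 50 allowed epochs, at roughly 58 seconds per epoch.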

Parameter Analysis

Embedding Size

The embedding size is the hyperparameter that most affects the model’s fundamental logic, so let’s look at performance as a function of it:

Iteration Completion

How many iterations, on average, did we complete?

How did the metric progress in the best result?

How did the metric progress in the longest results?