This page analyzes the hyperparameter tuning results for the FlexMF scorer in implicit-feedback mode with WARP loss.
## Parameter Search Space

| Parameter | Type | Distribution | Range | Selected Value |
|-----------|------|--------------|-------|----------------|
| embedding_size | Integer | LogUniform | 4 ≤ \(x\) ≤ 512 | 9 |
| regularization | Float | LogUniform | 0.0001 ≤ \(x\) ≤ 10 | 0.0159 |
| learning_rate | Float | LogUniform | 0.001 ≤ \(x\) ≤ 0.1 | 0.00287 |
| reg_method | Categorical | Uniform | L2, AdamW | AdamW |
| item_bias | Categorical | Uniform | True, False | True |
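For reference, a search space like this can be expressed with Ray Tune's sampling API. The following is a sketch reconstructed from the table above, not the project's actual tuning script:

```python
from ray import tune

# Sketch of the search space in the table above (assumed, not the
# actual tuning script). Parameter names follow the table.
space = {
    "embedding_size": tune.lograndint(4, 513),      # log-uniform integer; upper bound is exclusive
    "regularization": tune.loguniform(1e-4, 10.0),  # log-uniform float
    "learning_rate": tune.loguniform(1e-3, 0.1),    # log-uniform float
    "reg_method": tune.choice(["L2", "AdamW"]),     # uniform categorical
    "item_bias": tune.choice([True, False]),        # uniform categorical
}
```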
## Final Result

The search selected the following configuration:

```python
{
    'embedding_size': 9,
    'regularization': 0.015922899971229115,
    'learning_rate': 0.0028666698840842833,
    'reg_method': 'AdamW',
    'item_bias': True,
    'epochs': 18
}
```
With these metrics:

```python
{
    'RBP': 0.16809865233240098,
    'LogRBP': 2.011035714113172,
    'NDCG': 0.4158398961397396,
    'RecipRank': 0.32732768642869414,
    'TrainTask': '03843709-a01b-4955-b95a-ae5a254a8b3c',
    'TrainTime': None,
    'TrainCPU': None,
    'max_epochs': 50,
    'done': False,
    'training_iteration': 18,
    'trial_id': '69a32768',
    'date': '2025-05-07_10-19-31',
    'timestamp': 1746627571,
    'time_this_iter_s': 70.88943409919739,
    'time_total_s': 1716.4671351909637,
    'pid': 284894,
    'hostname': 'CCI-ws21',
    'node_ip': '10.248.127.152',
    'config': {
        'embedding_size': 9,
        'regularization': 0.015922899971229115,
        'learning_rate': 0.0028666698840842833,
        'reg_method': 'AdamW',
        'item_bias': True,
        'epochs': 18
    },
    'time_since_restore': 155.94253373146057,
    'iterations_since_restore': 2
}
```
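The timing fields can be cross-checked with a quick calculation; for example, the average wall-clock time per completed training iteration, using values copied from the result above:

```python
# Values copied from the result dict above.
time_total_s = 1716.4671351909637
training_iteration = 18

# Average wall-clock seconds per completed training iteration.
print(f"{time_total_s / training_iteration:.1f} s/iteration")  # ~95.4 s
```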
## Parameter Analysis

### Embedding Size

The embedding size is the hyperparameter that most affects the model’s fundamental logic, so let’s look at performance as a function of it:
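A view like this can be assembled from the exported trial results. The sketch below assumes the trials were saved as a CSV using Ray Tune's `config/<name>` column convention; the file name and metric column are assumptions, not the code used to render this page:

```python
import pandas as pd

# Hypothetical export of the Ray Tune trial results; each hyperparameter
# appears as a `config/<name>` column alongside the reported metrics.
results = pd.read_csv("trial_results.csv")

# Scatter final NDCG against embedding size, on a log-scaled x axis
# to match the log-uniform search distribution.
ax = results.plot.scatter(x="config/embedding_size", y="NDCG", logx=True)
ax.set_xlabel("Embedding size (log scale)")
ax.set_ylabel("NDCG")
ax.figure.savefig("ndcg-by-embedding-size.png")
```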
### Iteration Completion

How many iterations, on average, did we complete?
How did the metric progress in the best result?
How did the metric progress in the longest-running trials?
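These questions can be answered from the per-iteration progress logs. The sketch below assumes a long-format frame with one row per trial per iteration and `trial_id`, `training_iteration`, and `NDCG` columns (the names follow Ray Tune's reporting fields, but the file and layout are assumptions):

```python
import pandas as pd

# Hypothetical per-iteration progress log: one row per (trial, iteration).
progress = pd.read_csv("trial_progress.csv")

# Average number of iterations completed across all trials.
iters = progress.groupby("trial_id")["training_iteration"].max()
print("mean iterations completed:", iters.mean())

# Metric progression for the trial that achieved the best single NDCG value.
best = progress.loc[progress["NDCG"].idxmax(), "trial_id"]
curve = progress[progress["trial_id"] == best].sort_values("training_iteration")
curve.plot(x="training_iteration", y="NDCG", marker="o")
```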