GSgnnLPEvaluator

class graphstorm.eval.GSgnnLPEvaluator(eval_frequency, eval_metric_list=None, use_early_stop=False, early_stop_burnin_rounds=0, early_stop_rounds=3, early_stop_strategy='average_increase')

Bases: GSgnnBaseEvaluator, GSgnnLPRankingEvalInterface

Evaluator for Link Prediction tasks using “mrr” and/or “hit@k” as metric(s).

A built-in evaluator for Link Prediction tasks that implements the GSgnnLPRankingEvalInterface. It uses “mrr” as the default evaluation metric.
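The sketch below shows what the two metric families measure. It computes MRR and hit@k from 1-based rankings of positive edges, where rank 1 means the positive edge scored highest among its candidates. This is plain Python, not GraphStorm's implementation. The helper names (`mrr`, `hit_at_k`, `compute_metrics`) are hypothetical, and the parsing of “hit_at_k” metric strings is an assumption based on the “hit_at_10” example in the parameter list.

```python
def mrr(rankings):
    """Mean reciprocal rank: average of 1/rank over all positive edges."""
    return sum(1.0 / r for r in rankings) / len(rankings)


def hit_at_k(rankings, k):
    """Fraction of positive edges ranked within the top k candidates."""
    return sum(1 for r in rankings if r <= k) / len(rankings)


def compute_metrics(rankings, metric_list=("mrr",)):
    """Hypothetical dispatcher mapping names such as "hit_at_10" to metric values."""
    scores = {}
    for metric in metric_list:
        if metric == "mrr":
            scores[metric] = mrr(rankings)
        elif metric.startswith("hit_at_"):
            k = int(metric[len("hit_at_"):])  # assumed naming scheme: hit_at_<k>
            scores[metric] = hit_at_k(rankings, k)
        else:
            raise ValueError(f"unsupported metric: {metric}")
    return scores


# Ranks of four positive edges within their candidate lists.
print(compute_metrics([1, 2, 4, 20], ["mrr", "hit_at_10"]))
```

MRR rewards placing positives near the top even when they miss the top k. hit@k only counts whether each positive lands inside the cutoff.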

Parameters

eval_frequency: int: How often, in training iterations, to run evaluation.
eval_metric_list: list of str: Evaluation metric(s) to compute, for example, [“mrr”, “hit_at_10”]. Default: [“mrr”].
use_early_stop: bool: Set to True to enable early stopping. Default: False.
early_stop_burnin_rounds: int: Number of burn-in rounds (evaluations) to run before checking the early stop condition. Default: 0.
early_stop_rounds: int: Number of rounds (evaluations) of validation scores used to decide whether to stop early. Default: 3.
early_stop_strategy: str: The early stop strategy. GraphStorm supports two strategies: 1) “consecutive_increase” and 2) “average_increase”. Default: “average_increase”.
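The two early stop strategies are not defined further on this page. One plausible reading is sketched below as standalone code, not taken from GraphStorm's source. It ignores the first early_stop_burnin_rounds evaluations and keeps a window of the last early_stop_rounds validation scores. Once the window is full, it stops when the new score fails to beat the window average (“average_increase”) or fails to beat every score in the window (“consecutive_increase”). The class name `EarlyStopSketch` is hypothetical. Higher scores are assumed to be better, as they are for MRR and hit@k.

```python
from collections import deque


class EarlyStopSketch:
    """Hypothetical early-stop helper; assumes higher validation scores are better."""

    def __init__(self, burnin_rounds=0, stop_rounds=3, strategy="average_increase"):
        if strategy not in ("average_increase", "consecutive_increase"):
            raise ValueError(f"unknown strategy: {strategy}")
        self.burnin_rounds = burnin_rounds
        self.strategy = strategy
        self.history = deque(maxlen=stop_rounds)  # last `stop_rounds` validation scores
        self.num_evals = 0

    def should_stop(self, val_score):
        self.num_evals += 1
        if self.num_evals <= self.burnin_rounds:
            return False  # burn-in: never stop and do not record the score
        stop = False
        if len(self.history) == self.history.maxlen:
            if self.strategy == "average_increase":
                # Stop if the new score does not beat the average of the window.
                stop = val_score <= sum(self.history) / len(self.history)
            else:
                # Stop if the new score fails to beat every score in the window.
                stop = all(val_score <= s for s in self.history)
        self.history.append(val_score)  # deque drops the oldest score when full
        return stop


stopper = EarlyStopSketch(burnin_rounds=1, stop_rounds=2, strategy="average_increase")
for score in [0.1, 0.2, 0.3, 0.4, 0.3]:
    print(score, stopper.should_stop(score))  # only the last call returns True
```

Under this reading, “average_increase” tolerates a single noisy dip as long as it stays above the recent average. “consecutive_increase” only stops once the new score is no better than all recent scores.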

New in version 0.4.0: The GSgnnLPEvaluator.

evaluate(val_rankings, test_rankings, total_iters, **kwargs)

GSgnnLinkPredictionTrainer and GSgnnLinkPredictionInferrer will call this function to compute validation and test scores.
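The sketch below makes eval_frequency and total_iters concrete by deciding whether a training loop would evaluate at a given iteration. It assumes evaluation fires whenever total_iters is a multiple of eval_frequency. It also assumes, as a separate guess, that evaluation runs at every epoch end. `do_eval_sketch` is a hypothetical helper, not a GraphStorm function.

```python
def do_eval_sketch(total_iters, eval_frequency, epoch_end=False):
    """Return True if evaluation should run at this iteration (hypothetical rule)."""
    if epoch_end:
        return True  # assumption: always evaluate at the end of an epoch
    # A non-positive frequency disables iteration-based evaluation in this sketch.
    return eval_frequency > 0 and total_iters % eval_frequency == 0


# Iterations 1..10 with eval_frequency=4: evaluation fires at 4 and 8.
print([it for it in range(1, 11) if do_eval_sketch(it, 4)])
```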

Parameters

val_rankings: dict of tensors

Rankings of positive scores of validation edges for each edge type in the format of {etype: ranking}.

test_rankings: dict of tensors

Rankings of positive scores of test edges for each edge type in the format of {etype: ranking}.

total_iters: int

The current iteration number.

kwargs:

Keyword arguments to pass downstream to the metric computation.

Currently we support:

val_candidate_sizes: torch.Tensor: A tensor containing the size of each candidate list (positive + negative pairs) for each edge in the validation set. If the tensor has a single element, that value is used as the size of all lists.
test_candidate_sizes: torch.Tensor: A tensor containing the size of each candidate list (positive + negative pairs) for each edge in the test set. If the tensor has a single element, that value is used as the size of all lists.
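In practice, the single-element rule for val_candidate_sizes and test_candidate_sizes broadcasts one size across all candidate lists. The sketch below uses plain Python lists in place of torch.Tensor so it runs without PyTorch. `expand_candidate_sizes` is a hypothetical helper. The rank-bound check is only one illustrative use of the sizes, not a description of what GraphStorm computes with them.

```python
def expand_candidate_sizes(candidate_sizes, num_edges):
    """Return one candidate-list size per evaluated edge.

    A single-element input is used as the size of every list; otherwise the
    input must already contain one entry per edge.
    """
    if len(candidate_sizes) == 1:
        return list(candidate_sizes) * num_edges
    if len(candidate_sizes) != num_edges:
        raise ValueError("need one candidate size per edge, or a single shared size")
    return list(candidate_sizes)


rankings = [1, 3, 7]  # rank of each positive edge within its candidate list
sizes = expand_candidate_sizes([11], len(rankings))  # e.g. 1 positive + 10 negatives each
# A positive edge can never rank below the last position of its own list.
assert all(1 <= r <= s for r, s in zip(rankings, sizes))
print(sizes)
```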

New in version 0.4.0.

Returns

val_score: dict of float: Validation scores in the format of {metric: val_score}. If val_rankings is None, returns {metric: “N/A”}.
test_score: dict of float: Test scores in the format of {metric: test_score}. If test_rankings is None, returns {metric: “N/A”}.
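The sketch below shows how the None handling in the return values plays out. It uses plain Python lists keyed by canonical edge-type tuples instead of tensors and computes only “mrr” to keep the example short. Pooling the rankings of all edge types into a single list before scoring is an assumption. `pooled_mrr` and `evaluate_sketch` are hypothetical names, not the evaluator's methods.

```python
def pooled_mrr(rankings):
    """MRR over the rankings of every edge type combined (assumed pooling)."""
    ranks = [r for etype_ranks in rankings.values() for r in etype_ranks]
    return sum(1.0 / r for r in ranks) / len(ranks)


def evaluate_sketch(val_rankings, test_rankings):
    """Return (val_score, test_score); "N/A" stands in for a missing split."""
    def score(rankings):
        if rankings is None:
            return {"mrr": "N/A"}
        return {"mrr": pooled_mrr(rankings)}

    return score(val_rankings), score(test_rankings)


val = {("user", "clicks", "item"): [1, 2], ("user", "buys", "item"): [4]}
val_score, test_score = evaluate_sketch(val, None)
print(val_score, test_score)
```

Returning a placeholder instead of raising lets a caller evaluate only one split, for example validation during training, without special-casing the other.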

compute_score(rankings: Dict[str, Tensor], train=True, **kwargs)

Compute evaluation score.

Parameters

rankings: dict of tensors

Rankings of positive scores in the format of {etype: ranking}.

train: boolean

Whether the score is computed during model training.

kwargs: dict

Keyword arguments to pass downstream to the metric computation.

Currently we support:

candidate_sizes: dict of tensors: A mapping from edge type to the size of each candidate list (positive + negative pairs). If a tensor has a single element, that value is used as the size of all lists for that edge type.

Returns

return_metrics: dict of float: Evaluation scores in the format of {metric: score}.
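In compute_score, candidate_sizes is keyed by edge type, so the single-element rule applies separately to each edge type. The sketch below uses plain lists instead of tensors. It expands each edge type's sizes, applies an illustrative rank-bound check, and returns MRR pooled over all edge types. `compute_score_sketch` is a hypothetical name. Both the pooling and the use made of the sizes are assumptions, and the `train` flag is accepted but ignored.

```python
def compute_score_sketch(rankings, train=True, candidate_sizes=None):
    """Pooled MRR over all edge types; candidate_sizes is an optional {etype: sizes} map."""
    pooled = []
    for etype, ranks in rankings.items():
        if candidate_sizes is not None and etype in candidate_sizes:
            sizes = candidate_sizes[etype]
            if len(sizes) == 1:
                sizes = sizes * len(ranks)  # single element: shared by every list
            elif len(sizes) != len(ranks):
                raise ValueError(f"size/ranking length mismatch for {etype}")
            # Illustrative sanity check: a rank cannot exceed its list size.
            if any(r > s for r, s in zip(ranks, sizes)):
                raise ValueError(f"rank exceeds candidate list size for {etype}")
        pooled.extend(ranks)
    # `train` is kept only to mirror the documented signature; it is unused here.
    return {"mrr": sum(1.0 / r for r in pooled) / len(pooled)}


rankings = {("user", "clicks", "item"): [1, 2], ("user", "buys", "item"): [4]}
sizes = {("user", "clicks", "item"): [10], ("user", "buys", "item"): [5]}
print(compute_score_sketch(rankings, train=False, candidate_sizes=sizes))
```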