GSgnnEdgePredictionTrainer

class graphstorm.trainer.GSgnnEdgePredictionTrainer(model, topk_model_to_save)

Bases: GSgnnTrainer

Edge prediction trainer.

This class is used to train models for edge prediction tasks, such as edge classification and edge regression.

It makes use of the functions provided by GSgnnTrainer to define two main functions: fit that performs the training for the model that is provided when the object is created, and eval that evaluates a provided model against test and validation data.

Parameters

model: GSgnnEdgeModel: The GNN model for edge prediction.
topk_model_to_save: int: The number of best-performing (top K) models to save.

Example

from graphstorm.dataloading import GSgnnEdgeDataLoader
from graphstorm.dataloading import GSgnnEdgeTrainData
from graphstorm.model import GSgnnEdgeModel
from graphstorm.trainer import GSgnnEdgePredictionTrainer

my_dataset = GSgnnEdgeTrainData(
    "my_graph", "/path/to/part_config", train_etypes="edge_type")
# target_edges_tensor is a placeholder for the tensor of target edge IDs.
target_idx = {"edge_type": target_edges_tensor}
my_data_loader = GSgnnEdgeDataLoader(
    my_dataset, target_idx, fanout=[10], batch_size=1024)
my_model = GSgnnEdgeModel(alpha_l2norm=0.0)

trainer = GSgnnEdgePredictionTrainer(my_model, topk_model_to_save=1)

trainer.fit(my_data_loader, num_epochs=2)

property device: The device associated with the trainer.

eval(model, val_loader, test_loader, use_mini_batch_infer, total_steps, return_proba=True)

Evaluate the model on the validation and test sets.

Parameters

model: PyTorch model: The GNN model.
val_loader: GSgnnEdgeDataLoader: The dataloader for validation data.
test_loader: GSgnnEdgeDataLoader: The dataloader for test data.
use_mini_batch_infer: bool: Whether or not to use mini-batch inference.
total_steps: int: Total number of iterations.
return_proba: bool: Whether to return all the predictions or the maximum prediction.

Returns

float: The validation score.

property evaluator: The evaluator associated with the trainer.

fit(train_loader, num_epochs, val_loader=None, test_loader=None, use_mini_batch_infer=True, save_model_path=None, save_model_frequency=None, save_perf_results_path=None, freeze_input_layer_epochs=0, max_grad_norm=None, grad_norm_type=2.0)

The fit function for edge prediction.

Performs the training for self.model. Iterates over the training batches in train_loader to compute the loss and perform the backward step using self.optimizer. If an evaluator has been assigned to the trainer, evaluation runs at the end of every epoch.

Parameters

train_loader: GSgnnEdgeDataLoader: The mini-batch sampler for training.
num_epochs: int: The maximum number of epochs to train the model.
val_loader: GSgnnEdgeDataLoader: The mini-batch sampler for computing validation scores. The validation scores are used for selecting models.
test_loader: GSgnnEdgeDataLoader: The mini-batch sampler for computing test scores.
use_mini_batch_infer: bool: Whether or not to use mini-batch inference.
save_model_path: str: The path where the model is saved.
save_model_frequency: int: The number of training iterations between model saves.
save_perf_results_path: str: The path of the file where the performance results are saved.
freeze_input_layer_epochs: int: Freeze the input layer for N epochs. This is commonly used when the input layer contains language models. Default: 0, no freeze.
max_grad_norm: float: Clip the gradient norm to at most max_grad_norm to ensure stability. Default: None, no clipping.
grad_norm_type: float: Norm type used for gradient clipping. Default: 2.0.
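
The interaction between max_grad_norm and grad_norm_type can be illustrated with the usual norm-based clipping rule. The following is a minimal sketch in plain Python, not GraphStorm's implementation: clip_by_global_norm is a hypothetical helper, gradients are modeled as a flat list of floats, and it is an assumption that the trainer follows this common convention.

```python
import math

def clip_by_global_norm(grads, max_grad_norm=None, grad_norm_type=2.0):
    """Hypothetical helper: rescale gradients so their p-norm is at most max_grad_norm.

    Returns (clipped_grads, norm_before_clipping).
    """
    if not grads:
        return [], 0.0
    if grad_norm_type == math.inf:
        total_norm = max(abs(g) for g in grads)
    else:
        total_norm = sum(abs(g) ** grad_norm_type for g in grads) ** (1.0 / grad_norm_type)
    # max_grad_norm=None mirrors the documented default: no clipping.
    if max_grad_norm is None or total_norm <= max_grad_norm:
        return list(grads), total_norm
    # Scale every component by the same factor so the direction is preserved.
    scale = max_grad_norm / total_norm
    return [g * scale for g in grads], total_norm

# The L2 norm of [3, 4] is 5, so clipping to 1.0 scales each entry by 0.2.
clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_grad_norm=1.0)
# With the default max_grad_norm=None the gradients pass through unchanged.
untouched, _ = clip_by_global_norm([3.0, 4.0])
```

Because all components are scaled by the same factor, clipping changes the step size but not the update direction.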

get_best_model_path(): Return the path of the best model.

property optimizer: The optimizer associated with the trainer.

remove_saved_model(epoch, i, save_model_path)

Remove a previously saved model, for example because it is no longer among the best K models. This function removes the entire folder.

Parameters

epoch: int: The training epoch number.
i: int: The iteration number within the training epoch.
save_model_path: str: The path where the model is saved.

restore_model(model_path, model_layer_to_load=None)

Restore a GNN model and the optimizer.

Parameters

model_path: str: The path where the model and the optimizer state have been saved.
model_layer_to_load: list of str: List of model layers to load. Supported layers include 'gnn', 'embed', and 'decoder'.

save_model(model, epoch, i, save_model_path): Save the model for a certain iteration in an epoch.

save_topk_models(model, epoch, i, val_score, save_model_path)

Based on the given val_score, decide whether to save the current model trained in the i-th iteration of the epoch-th epoch.

Parameters

model: PyTorch model: The GNN model.
epoch: int: The training epoch number.
i: int: The iteration number within the training epoch.
val_score: dict or None: A dictionary containing scores from the evaluator's validation function. It can be None, which means that there is either no evaluator or no validation is performed. In that case, the score rank is set to 1st so that all models, or the last K models, are saved.
save_model_path: str: The path where the model is saved.
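
The top-K decision described above can be sketched as a ranking check. This is an illustrative simplification, not the library's code: should_save_topk is a hypothetical helper, the validation score is reduced to a single float where higher is better (the documented val_score is a dict), and the None case follows the documented behavior of treating the model as rank 1.

```python
def should_save_topk(val_score, best_scores, topk):
    """Hypothetical helper: decide whether a model ranks within the top `topk`.

    best_scores is a list of the best scores seen so far and is updated in place.
    """
    if topk <= 0:
        return False
    if val_score is None:
        # No evaluator or no validation: rank the model 1st so it is always saved.
        return True
    # Rank is 1 plus the number of stored scores that are strictly better.
    rank = 1 + sum(1 for s in best_scores if s > val_score)
    if rank > topk:
        return False
    best_scores.append(val_score)
    best_scores.sort(reverse=True)
    del best_scores[topk:]  # keep only the K best scores
    return True

best = []
decisions = [should_save_topk(s, best, topk=2) for s in (0.70, 0.80, 0.75, 0.60)]
```

In this run the third score displaces 0.70 from the top two, which is the kind of situation in which a previously saved model would be removed with remove_saved_model.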

setup_device(device)

Set up the device of this trainer.

The CUDA device is set up based on the local rank.

Parameters

device: The device for model training.

setup_evaluator(evaluator)

Set up the evaluator.

If the evaluator has its own task tracker, the evaluator is set up as is. If it has none, the trainer's task tracker is assigned to it. If the trainer has no task tracker either, a new one is created using the given evaluator's evaluation frequency.
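
The fallback order for the task tracker can be sketched with stand-in classes. Everything here is hypothetical and for illustration only: SketchTaskTracker, SketchEvaluator, SketchTrainer, and the attribute names task_tracker and eval_frequency are assumptions, not GraphStorm's actual classes or attributes.

```python
class SketchTaskTracker:
    """Hypothetical stand-in for a task tracker."""
    def __init__(self, eval_frequency):
        self.eval_frequency = eval_frequency

class SketchEvaluator:
    """Hypothetical stand-in for an evaluator."""
    def __init__(self, eval_frequency, task_tracker=None):
        self.eval_frequency = eval_frequency
        self.task_tracker = task_tracker

class SketchTrainer:
    """Hypothetical stand-in showing only the tracker fallback logic."""
    def __init__(self, task_tracker=None):
        self.task_tracker = task_tracker
        self.evaluator = None

    def setup_evaluator(self, evaluator):
        if evaluator.task_tracker is None:
            if self.task_tracker is None:
                # Neither side has a tracker: create one from the evaluator's frequency.
                self.task_tracker = SketchTaskTracker(evaluator.eval_frequency)
            # Share the trainer's tracker with the evaluator.
            evaluator.task_tracker = self.task_tracker
        self.evaluator = evaluator

trainer = SketchTrainer()
evaluator = SketchEvaluator(eval_frequency=100)
trainer.setup_evaluator(evaluator)
```

After the call, the trainer and the evaluator share one tracker. An evaluator that arrives with its own tracker keeps it.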