GSgnnNodePredictionTrainer

class graphstorm.trainer.GSgnnNodePredictionTrainer(model, topk_model_to_save=1)

Bases: GSgnnTrainer

A trainer for node prediction.

This class is used to train models for node prediction tasks, such as node classification and node regression.

It builds on the functions provided by GSgnnTrainer and defines two main functions: fit, which trains the model provided when the object is created, and eval, which evaluates a provided model against the validation and test data.

Parameters

model: GSgnnNodeModel

The GNN model for node prediction.

topk_model_to_save: int

The number of top-performing models to save. Default: 1.

Example

from graphstorm.dataloading import GSgnnNodeDataLoader
from graphstorm.dataset import GSgnnNodeTrainData
from graphstorm.model.node_gnn import GSgnnNodeModel
from graphstorm.trainer import GSgnnNodePredictionTrainer

my_dataset = GSgnnNodeTrainData(
    "my_graph", "/path/to/part_config", "my_node_type")
target_idx = {"my_node_type": target_nodes_tensor}
my_data_loader = GSgnnNodeDataLoader(
    my_dataset, target_idx, fanout=[10], batch_size=1024, device='cpu')
my_model = GSgnnNodeModel(alpha_l2norm=0.0)

trainer = GSgnnNodePredictionTrainer(my_model, topk_model_to_save=1)

trainer.fit(my_data_loader, num_epochs=2)

property device

The device associated with the trainer.

eval(model, val_loader, test_loader, use_mini_batch_infer, total_steps, return_proba=True)

Evaluate the model on the validation and test sets.

Parameters

model: PyTorch model

The GNN model.

val_loader: GSNodeDataLoader

The dataloader for validation data.

test_loader: GSNodeDataLoader

The dataloader for test data.

use_mini_batch_infer: bool

Whether to do mini-batch inference.

total_steps: int

Total number of iterations.

return_proba: bool

Whether to return all the predictions or the maximum prediction.

Returns

float: validation score
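The effect of return_proba can be pictured with a small, library-free sketch. It assumes that "all the predictions" means the full per-class score vector for each node and that "the maximum prediction" means the index of the highest-scoring class; the helper name format_predictions is hypothetical and is not part of GraphStorm.

```python
def format_predictions(scores, return_proba=True):
    """Illustrative only: choose between per-class scores and argmax labels.

    `scores` is a list of per-node lists, one score per class.
    """
    if return_proba:
        # Keep every class score for every node.
        return [list(row) for row in scores]
    # Keep only the index of the highest-scoring class for every node.
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


scores = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
all_scores = format_predictions(scores, return_proba=True)
labels = format_predictions(scores, return_proba=False)
```

Here all_scores keeps both score vectors unchanged, while labels holds one class index per node.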

property evaluator

The evaluator associated with the trainer.

fit(train_loader, num_epochs, val_loader=None, test_loader=None, use_mini_batch_infer=True, save_model_path=None, save_model_frequency=-1, save_perf_results_path=None, freeze_input_layer_epochs=0, max_grad_norm=None, grad_norm_type=2.0)

The fit function for node prediction.

Trains self.model. Iterates over the training batches in train_loader to compute the loss and perform the backward step using self.optimizer. If an evaluator has been assigned to the trainer, evaluation runs at the end of every epoch.
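The control flow described above can be sketched with a toy, library-free loop. This is not GraphStorm's code: the function toy_fit, its lr argument, and the scalar model y = w * x are hypothetical stand-ins chosen so the example runs on its own. It only shows the shape of the loop: iterate over batches, compute a loss gradient, take an optimizer step, and run the optional evaluator once per epoch.

```python
def toy_fit(batches, num_epochs, lr=0.1, evaluator=None):
    """Fit y = w * x with plain SGD; `evaluator` is an optional callable w -> score."""
    w = 0.0
    history = []
    for epoch in range(num_epochs):
        for x, y in batches:
            pred = w * x                      # forward pass
            grad = 2.0 * (pred - y) * x       # gradient of (pred - y) ** 2 w.r.t. w
            w -= lr * grad                    # optimizer step
        if evaluator is not None:
            # Evaluation runs once at the end of every epoch.
            history.append((epoch, evaluator(w)))
    return w, history


batches = [(1.0, 2.0), (2.0, 4.0)]
# The evaluator returns a score where higher is better (negative squared error).
w, history = toy_fit(batches, num_epochs=50, evaluator=lambda w: -(w - 2.0) ** 2)
```

When no evaluator is passed, the loop still trains but records no scores, which mirrors the "if an evaluator has been assigned" condition.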

Parameters

train_loader: GSgnnNodeDataLoader

The mini-batch sampler for training.

num_epochs: int

The maximum number of epochs to train the model.

val_loader: GSgnnNodeDataLoader

The mini-batch sampler for computing validation scores. The validation scores are used for selecting models.

test_loader: GSgnnNodeDataLoader

The mini-batch sampler for computing test scores.

use_mini_batch_infer: bool

Whether or not to use mini-batch inference.

save_model_path: str

The path where the model is saved.

save_model_frequency: int

The number of iterations to train the model before saving it.

save_perf_results_path: str

The path of the file where the performance results are saved.

freeze_input_layer_epochs: int

Freeze the input layer for N epochs. This is commonly used when the input layer contains language models. Default: 0, no freeze.

max_grad_norm: float

Clip the gradient by the max_grad_norm to ensure stability. Default: None, no clip.

grad_norm_type: float

Norm type for gradient clipping. Default: 2.0.
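To make max_grad_norm and grad_norm_type concrete, here is a minimal pure-Python sketch of norm-based gradient clipping. It assumes the trainer clips the way PyTorch's torch.nn.utils.clip_grad_norm_ does (rescaling all gradients when their combined p-norm exceeds the limit), which this page does not confirm. The function clip_grad_norm below is a hypothetical stand-in and handles finite norm types only.

```python
def clip_grad_norm(grads, max_grad_norm=None, grad_norm_type=2.0):
    """Return (clipped_grads, total_norm) for a flat list of gradient values.

    If max_grad_norm is None, or the norm is already within the limit,
    the gradients are returned unchanged.
    """
    p = float(grad_norm_type)
    # Combined p-norm of all gradient entries.
    total_norm = sum(abs(g) ** p for g in grads) ** (1.0 / p)
    if max_grad_norm is None or total_norm <= max_grad_norm:
        return list(grads), total_norm
    # Rescale every entry by the same factor so the new norm equals the limit.
    scale = max_grad_norm / total_norm
    return [g * scale for g in grads], total_norm


clipped, norm_before = clip_grad_norm([3.0, 4.0], max_grad_norm=1.0)
unclipped, _ = clip_grad_norm([3.0, 4.0], max_grad_norm=None)
```

Because every entry is scaled by the same factor, clipping changes the step size but not the direction of the update.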

get_best_model_path()

Return the path of the best model.

property optimizer

The optimizer associated with the trainer.

remove_saved_model(epoch, i, save_model_path)

Remove a previously saved model, for example because it is no longer among the top K performers or for other reasons.

This function will remove the entire folder.

Parameters

epoch: int

The training epoch number.

i: int

The iteration number within the training epoch.

save_model_path: str

The path where the model is saved.
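Removing "the entire folder" can be illustrated with the standard library. The sketch below assumes each checkpoint lives in its own subfolder of save_model_path, identified by epoch and iteration; the folder naming scheme and the helper names checkpoint_dir and remove_checkpoint are hypothetical and are not taken from GraphStorm.

```python
import os
import shutil
import tempfile


def checkpoint_dir(save_model_path, epoch, i):
    # Hypothetical naming scheme, used only for this illustration.
    return os.path.join(save_model_path, f"epoch-{epoch}-iter-{i}")


def remove_checkpoint(save_model_path, epoch, i):
    """Delete the whole checkpoint folder; return True if a folder was removed."""
    path = checkpoint_dir(save_model_path, epoch, i)
    if os.path.isdir(path):
        shutil.rmtree(path)  # removes the folder and everything inside it
        return True
    return False


with tempfile.TemporaryDirectory() as root:
    # Create a fake checkpoint folder with nested content, then remove it.
    os.makedirs(os.path.join(checkpoint_dir(root, 0, 100), "nested"))
    removed = remove_checkpoint(root, 0, 100)
    remaining = os.listdir(root)
```

Since the whole folder is deleted, anything else stored alongside the model files in that folder is lost as well.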

restore_model(model_path, model_layer_to_load=None)

Restore a GNN model and the optimizer.

Parameters

model_path: str

The path where the model and the optimizer state have been saved.

model_layer_to_load: list of str

The list of model layers to load. Supported layers include ‘gnn’, ‘embed’, and ‘decoder’.

save_model(model, epoch, i, save_model_path)

Save the model for a certain iteration in an epoch.

save_topk_models(model, epoch, i, val_score, save_model_path)

Based on the given val_score, decide whether to save the current model, trained at the i-th iteration of the given epoch.

Parameters

model: PyTorch model

The GNN model.

epoch: int

The training epoch number.

i: int

The iteration number within the training epoch.

val_score: dict or None

A dictionary containing scores from the evaluator’s validation function. It can be None, which means that there is no evaluator or that no validation was done. In that case, the score rank is set to 1st so that all models, or the last K models, are saved.

save_model_path: str

The path where the model is saved.
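The decision described above can be sketched as a small bookkeeping function. This is an illustration under stated assumptions, not GraphStorm's implementation: val_score is simplified to a single float where higher is better (the real argument is a dict), a run is assumed to use either scores throughout or None throughout, and the name update_topk is hypothetical.

```python
def update_topk(saved, k, path, val_score=None):
    """Update `saved`, a list of (score, path) tuples, in place.

    Returns (should_save, evicted_path): whether the new model is kept and
    which older checkpoint, if any, should be removed to stay within k.
    """
    if val_score is None:
        # No validation score: always save, and keep only the last k models.
        saved.append((None, path))
        evicted = saved.pop(0)[1] if len(saved) > k else None
        return True, evicted
    if len(saved) < k:
        # Fewer than k models saved so far: keep the new one unconditionally.
        saved.append((val_score, path))
        return True, None
    worst = min(saved, key=lambda item: item[0])
    if val_score <= worst[0]:
        # Not better than the worst saved model: skip saving.
        return False, None
    saved.remove(worst)
    saved.append((val_score, path))
    return True, worst[1]


saved = []
results = [update_topk(saved, 2, p, s)
           for p, s in [("a", 0.5), ("b", 0.7), ("c", 0.6), ("d", 0.1)]]
```

The evicted path returned by the function is the kind of value a caller could pass on to a cleanup step such as remove_saved_model.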

setup_device(device)

Set up the device of this trainer.

The CUDA device is set up based on the local rank.

Parameters

device :

The device for model training.

setup_evaluator(evaluator)

Set the evaluator.