Model Training and Inference CLI Configurations
Launch CLI Arguments
GraphStorm’s model training and inference launch CLIs (both task-specific and task-agnostic) have a set of parameters to configure the system environment for training and inference.
workspace: the folder where the launch command assumes all artifacts are saved. If the file paths given in other parameters are relative paths, the launch command looks for these files in the workspace.
Note
Users need to create the workspace folder beforehand to avoid errors.
part-config: (Required) Path to a file containing graph partition configuration. The graph partition is generated by GraphStorm Partition tools.
Note
Use an absolute path to avoid any path related problems. Otherwise, the file should be in workspace.
ip-config: Path to a file containing IP addresses of instances in a distributed cluster. In the ip config file, each line stores one IP. This configuration is required only for model training and inference on distributed clusters.
Note
Use an absolute path to avoid any path related problems. Otherwise, the file should be in workspace.
num-trainers: The number of trainer processes per machine. Must be > 0.
num-servers: The number of server processes per machine. Must be > 0.
num-samplers: The number of sampler processes per trainer process. Must be >= 0.
num-server-threads: The number of OMP threads in the server process. It should be small if server processes and trainer processes run on the same machine. Must be > 0. By default, it is 1.
ssh-port: SSH port used by the host node to communicate with the other nodes in the cluster.
ssh-username: Optional. When issuing commands (via ssh) to cluster, use the provided username in the ssh command.
graph-format: The format of the graph structure of each partition. The allowed formats are csr, csc and coo. A user can specify multiple formats, separated by “,”. For example, the graph format is “csr,csc”.
extra-envs: Extra environment variables that need to be set. For example, you can set LD_LIBRARY_PATH and NCCL_DEBUG by adding:
--extra-envs LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH NCCL_DEBUG=INFO
do-nid-remap: Do GraphStorm node ID to Raw input node ID remapping for prediction results and node embeddings. Default is True.
Model Training and Inference Configurations
GraphStorm provides dozens of configurable parameters for users to control their training and inference tasks. You can define these parameters in a YAML config file, or define and update them with command line arguments. Specifically, GraphStorm parses the YAML config file first. Then it parses the command line arguments, which overwrite parameters defined in the YAML file or add new parameters.
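The precedence described above can be illustrated with a minimal sketch. Two plain dictionaries stand in for the parsed YAML file and the parsed command line arguments; the function name `merge_configs` is hypothetical and is not part of GraphStorm.

```python
# Hypothetical sketch: plain dicts stand in for the parsed YAML file
# and for the parsed command line arguments.
def merge_configs(yaml_params, cli_params):
    """Return a new dict where CLI values overwrite or extend YAML values."""
    merged = dict(yaml_params)   # start from the values read from YAML
    merged.update(cli_params)    # CLI arguments overwrite or add entries
    return merged


yaml_params = {"backend": "gloo", "num_layers": 2, "hidden_size": 128}
cli_params = {"num_layers": 3, "verbose": True}

merged = merge_configs(yaml_params, cli_params)
print(merged)
```

Here `num_layers` takes the command line value, `verbose` is added, and the remaining YAML values are kept unchanged.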
Yaml File Configurations
cf or yaml-config-file: (Required) Path to the YAML configuration file.
Note
The configurations below can be set either in a YAML configuration file or as arguments of the launch command.
Environment Configurations
backend: PyTorch distributed backend. The suggested backend is gloo. Supported backends include gloo and nccl.
Yaml:
backend: gloo
Argument:
--backend gloo
Default value:
gloo
verbose: Set true to print more execution information.
Yaml:
verbose: false
Argument:
--verbose false
Default value:
false
use_graphbolt: Set true to use the GraphBolt graph representation during training. See https://docs.dgl.ai/stochastic_training/ for more details and Using GraphBolt to speed up training and inference for a complete example.
Yaml:
use_graphbolt: true
Argument:
--use-graphbolt true
Default value:
false
GNN Model Configurations
GraphStorm provides a set of parameters to configure the GNN model structure (input layer, GNN layer, decoder layer, etc.).
model_encoder_type: (Required) The encoder module used to encode graph data. It can be a GNN encoder or a non-GNN encoder. A GNN encoder is composed of an input module, which encodes input node features, and a GNN module. A non-GNN encoder only contains an input module. GraphStorm supports five GNN encoders: rgcn, which uses a relational graph convolutional network as its GNN module; rgat, which uses a relational graph attention network as its GNN module; sage, which uses GraphSage as its GNN module (only works with homogeneous graphs); gat, which uses a graph attention network as its GNN module (only works with homogeneous graphs); and hgt, which uses a heterogeneous graph transformer as its GNN module. GraphStorm supports three non-GNN encoders: lm, which requires that each node type has text features and only text features, and uses a language model, e.g., Bert, to encode these features; mlp, which accepts various types of input node features (text features, floating points and learnable embeddings) and uses an MLP to project these features into the same dimension; and learnable_embed, which initializes every node with a learnable embedding, i.e., node features will be ignored (it can be used to train knowledge graph embeddings).
Yaml:
model_encoder_type: rgcn
Argument:
--model-encoder-type rgcn
Default value: This parameter must be provided by user.
node_feat_name: User defined node feature name. It accepts two formats: a) fname, if all node types have the same node feature name, the corresponding feature name will be fname; b) ntype0:feat0 ntype1:featA …, if different node types have different node feature name(s). By default, for nodes of the same type, their features are first concatenated into a unified tensor, which is then transformed through an MLP layer. In the example below, ntype0 has a node feature named feat0 and ntype1 has two node features named featA and featB. GraphStorm will encode feat0 of ntype0 with an MLP layer as MLP(feat0) and encode featA and featB of ntype1 with another MLP layer as MLP(featA|featB), where | represents a concatenation operation.
Yaml:
node_feat_name:
- "ntype0:feat0"
- "ntype1:featA,featB"
Argument:
--node-feat-name "ntype0:feat0 ntype1:featA,featB"
Default value: If not provided, GraphStorm will not use any node features, even if the graphs have node features attached.
Since 0.5.0, GraphStorm supports using different MLP layers to encode different input node features of the same node. For example, suppose the ntype1 has three features featA, featB and featC, GraphStorm can encode featA and featB with an MLP encoder as MLP(featA|featB) and encode featC feature with another MLP encoder MLP(featC). Here is an example:
Yaml:
node_feat_name:
- "ntype0:feat0"
- "ntype1:featA,featB"
- "ntype1:featC"
Argument:
--node-feat-name "ntype0:feat0 ntype1:featA,featB ntype1:featC"
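To make the argument format concrete, the following minimal sketch parses a node_feat_name argument string into per-node-type feature groups, where each group corresponds to one MLP encoder as described above. The function `parse_node_feat_name` is hypothetical and only illustrates the format; it is not GraphStorm's own parser.

```python
from collections import defaultdict


def parse_node_feat_name(arg):
    """Parse a node_feat_name argument string (illustrative only).

    Returns the shared feature name as a string when a single name
    without ':' is given. Otherwise returns a dict that maps each node
    type to a list of feature groups, one group per 'ntype:feats' entry.
    """
    entries = arg.split()
    if len(entries) == 1 and ":" not in entries[0]:
        return entries[0]
    groups = defaultdict(list)
    for entry in entries:
        # ':' and white space are not allowed inside feature names,
        # so each entry contains exactly one ':'.
        ntype, feats = entry.split(":")
        groups[ntype].append(feats.split(","))
    return dict(groups)


print(parse_node_feat_name("ntype0:feat0 ntype1:featA,featB ntype1:featC"))
```

In this sketch, ntype1 ends up with two groups, `["featA", "featB"]` and `["featC"]`, matching the MLP(featA|featB) and MLP(featC) encoders in the example.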
Note
The character : and white space are not allowed to be used in node feature names. In Yaml format, users need to put each node’s feature in a separate line that starts with a hyphen.
edge_feat_name: User defined edge feature name. It accepts two formats: a) fname, if all edge types have the same feature name, the corresponding feature name will be fname; b) src_ntype1,etype1,dst_ntype1:feat0,… src_ntype2,etype2,dst_ntype2:featA …, if different edge types have different feature name(s). In the example below, the src_ntype1,etype1,dst_ntype1 edge type has two edge features named feat0 and feat1, and the src_ntype2,etype2,dst_ntype2 edge type has one edge feature named featA.
Yaml:
edge_feat_name:
- "src_ntype1,etype1,dst_ntype1:feat0,feat1"
- "src_ntype2,etype2,dst_ntype2:featA"
Argument:
--edge-feat-name src_ntype1,etype1,dst_ntype1:feat0,feat1 src_ntype2,etype2,dst_ntype2:featA
Default value: If not provided, GraphStorm will not use any edge features, even if the graphs have edge features attached.
Note
In version 0.4, the RGCN encoder has been modified to support using edge features during message passing computation. If users would like to use edge features, please set the model_encoder_type to be rgcn. Otherwise, GraphStorm will raise an assertion error, warning that the chosen model encoder does not support edge features yet.
edge_feat_mp_op: The operation used to combine source node embeddings with edge embeddings during GNN message passing computation. Options include concat, add, sub, mul, and div.
The concat operation concatenates source node embeddings with edge embeddings;
the add operation adds edge embeddings to source node embeddings;
the sub operation subtracts edge embeddings from source node embeddings;
the mul operation multiplies source node embeddings by edge embeddings;
the div operation divides source node embeddings by edge embeddings.
Yaml:
edge_feat_mp_op: "add"
Argument:
--edge-feat-mp-op add
Default value: concat.
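As an illustration of the five options, the sketch below applies each operation to a pair of small vectors represented as Python lists. The function `combine` is hypothetical; a real implementation operates on tensors inside the message passing function, and the element-wise interpretation of add, sub, mul and div is an assumption here.

```python
import operator

# Element-wise operations keyed by the option names listed above.
_ELEMENTWISE_OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": operator.truediv,
}


def combine(src_emb, edge_emb, op="concat"):
    """Combine a source node embedding with an edge embedding (sketch)."""
    if op == "concat":
        # Concatenation doubles the dimension when both inputs have equal size.
        return list(src_emb) + list(edge_emb)
    if op not in _ELEMENTWISE_OPS:
        raise ValueError(f"unsupported edge_feat_mp_op: {op}")
    if len(src_emb) != len(edge_emb):
        raise ValueError("element-wise ops need embeddings of equal size")
    fn = _ELEMENTWISE_OPS[op]
    return [fn(s, e) for s, e in zip(src_emb, edge_emb)]


src, edge = [1.0, 2.0], [4.0, 8.0]
for name in ("concat", "add", "sub", "mul", "div"):
    print(name, combine(src, edge, name))
```

Note that concat changes the size of the message, while the other four operations keep it unchanged.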
Note
If the edge_feat_name configuration is not provided, the edge_feat_mp_op configuration will be ignored.
num_layers: Number of GNN layers. Must be an integer larger than 0 if given. By default, it is set to 0, which means no GNN layers.
Yaml:
num_layers: 2
Argument:
--num-layers 2
Default value:
0
hidden_size: (Required) The dimension of hidden GNN layers. Must be an integer larger than 0. Currently, each GNN layer has the same hidden dimension.
Yaml:
hidden_size: 128
Argument:
--hidden-size 128
Default value: This parameter must be provided by user.
use_self_loop: Set true to include the node's own features as a special relation in relational GNN models. Used by the built-in RGCN and RGAT models.
Yaml:
use_self_loop: false
Argument:
--use-self-loop false
Default value:
true
Built-in Model Specific Configurations
RGCN
num_bases: Number of filter weight matrices. num_bases is used to reduce the overall number of parameters of an RGCN model. It allows the weight matrices of different relation types to share parameters. Note: the number of relation types of the graph used in training must be divisible by num_bases. By default, num_bases is set to -1, which means the weight matrices do not share parameters.
Yaml:
num_bases: 2
Argument:
--num-bases 2
Default value:
-1
RGAT
num_heads: Number of attention heads.
Yaml:
num_heads: 8
Argument:
--num-heads 8
Default value:
4
Model Save/Restore Configurations
GraphStorm provides a set of parameters to control how and where to save and restore models.
save_model_path: A path to save GraphStorm model parameters and the corresponding optimizer status. The saved model parameters can be used in inference or model fine-tuning. See restore_model_path for how to retrieve a saved model and restore_optimizer_path for how to retrieve optimizer status.
Yaml:
save_model_path: /model/checkpoint/
Argument:
--save-model-path /model/checkpoint/
Default value: If not provided, models will not be saved.
save_embed_path: A path to save generated node embeddings.
Yaml:
save_embed_path: /model/emb/
Argument:
--save-embed-path /model/emb/
Default value: If not provided, node embeddings will not be saved.
save_model_frequency: The number of iterations between two model saves. By default, GraphStorm will save models at the end of each epoch if save_model_path is provided. A user can set a positive integer, e.g. N, to let GraphStorm save models every N iterations (mini-batches).
Yaml:
save_model_frequency: 1000
Argument:
--save-model-frequency 1000
Default value: -1. GraphStorm will not save models within an epoch.
topk_model_to_save: The number of best-performing GraphStorm models to keep. By default, GraphStorm will keep all the saved models on disk, which can consume a large amount of disk space. Users can set a positive integer, e.g. K, to let GraphStorm only keep the K models with the best performance.
Yaml:
topk_model_to_save: 3
Argument:
--topk-model-to-save 3
Default value: 0. GraphStorm will keep all the saved models on disk.
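The effect of topk_model_to_save can be sketched as a selection over saved checkpoints. The example assumes that a higher validation score is better; the checkpoint names and the function `models_to_keep` are hypothetical and do not reflect how GraphStorm names or tracks checkpoints internally.

```python
def models_to_keep(scores, k):
    """Select which checkpoints stay on disk (illustrative sketch).

    scores: dict mapping a checkpoint name to its validation score,
            assuming that a higher score is better.
    k:      value of topk_model_to_save; 0 keeps every checkpoint.
    """
    if k <= 0:
        return sorted(scores)
    best = sorted(scores, key=scores.get, reverse=True)[:k]
    return sorted(best)


# Hypothetical checkpoints saved every 1000 iterations.
scores = {"iter-1000": 0.71, "iter-2000": 0.78, "iter-3000": 0.75, "iter-4000": 0.69}
print(models_to_keep(scores, 2))
print(models_to_keep(scores, 0))
```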
save_perf_results_path: Folder path to save performance results of model evaluation.
Yaml:
save_perf_results_path: /model/results/
Argument:
--save-perf-results-path /model/results/
Default value:
None
task_tracker: A task tracker used to formalize and report model performance metrics. GraphStorm currently supports two task trackers: sagemaker_task_tracker and tensorboard_task_tracker.
sagemaker_task_tracker prints evaluation metrics in a formatted way so that a user can capture those metrics through SageMaker. (See Monitor and Analyze Training Jobs Using Amazon CloudWatch Metrics for more details.)
tensorboard_task_tracker dumps evaluation metrics in a formatted way that can be loaded by TensorBoard. The default path for storing the TensorBoard logs is ./runs/ under workspace. Users can define their own TensorBoard log directory by setting task_tracker as tensorboard_task_tracker:LOG_PATH, where LOG_PATH will be the TensorBoard log directory. (Note: to use tensorboard_task_tracker, one should install the tensorboard Python package using pip install tensorboard or during graphstorm installation using pip install graphstorm[tensorboard].)
Yaml:
task_tracker: tensorboard_task_tracker:./logs/
Argument:
--task-tracker tensorboard_task_tracker:./logs/
Default value:
sagemaker_task_tracker
restore_model_path: A path where GraphStorm model parameters were saved. For training, if restore_model_path is set, GraphStorm will retrieve the model parameters from restore_model_path instead of initializing the parameters. For inference, restore_model_path must be provided.
Yaml:
restore_model_path: /model/checkpoint/
Argument:
--restore-model-path /model/checkpoint/
Default value: This parameter must be provided if users want to restore a saved model.
restore_model_layers: Specify which GraphStorm neural network layers to load. This argument is useful when a user wants to pre-train a GraphStorm model using link prediction and fine-tune the same model on a node or edge classification/regression task. Currently, three neural network layers are supported, i.e., embed (input layer), gnn and decoder. A user can select one or more layers to load.
Yaml:
restore_model_layers: embed
Argument:
--restore-model-layers embed,gnn
Default value: Load all neural network layers
restore_optimizer_path: A path storing the optimizer status corresponding to the GraphStorm model parameters. This is used when a user wants to fine-tune a model from a pre-trained one.
Yaml:
restore_optimizer_path: /model/checkpoint/optimizer
Argument:
--restore-optimizer-path /model/checkpoint/optimizer
Default value: This parameter must be provided if users want to restore a saved optimizer.
Model Training Hyper-parameters Configurations
GraphStorm provides a set of parameters to control training hyper-parameters.
fanout: The fanouts of GNN layers. The fanouts must be integers larger than 0. The number of fanouts must be equal to num_layers. It accepts two formats: a) “20,10”, which defines the number of neighbors to sample per edge type for each GNN layer, with the i-th element being the fanout for the i-th GNN layer. In the example, the fanout of the 0th GNN layer is 20 and the fanout of the 1st GNN layer is 10. b) "etype2:20@etype3:20@etype1:10,etype2:10@etype3:4@etype1:2", which defines the numbers of neighbors to sample for different edge types for each GNN layer, with the i-th element being the fanout for the i-th GNN layer. In the example, the fanouts of etype2, etype3 and etype1 of the 0th GNN layer are 20, 20 and 10 respectively, and the fanouts of etype2, etype3 and etype1 of the 1st GNN layer are 10, 4 and 2 respectively. Each etype (e.g., etype2) should be a canonical etype in the format of "srcntype/relation/dstntype".
Yaml:
fanout: 10,10
Argument:
--fanout 10,10
Default value: This parameter must be provided by user. However, if --num-layers is set to 0, which means there is no GNN layer, there is no need to specify this configuration.
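The two fanout formats can be illustrated with a small parser. This is a minimal sketch under the format rules stated above (layers separated by commas, per-edge-type entries separated by @); the function `parse_fanout` is hypothetical and is not GraphStorm's own parsing code.

```python
def parse_fanout(value, num_layers=None):
    """Parse a fanout string into one entry per GNN layer (sketch).

    "20,10" gives [20, 10]. The per-edge-type format gives one dict per
    layer that maps an edge type to its fanout.
    """
    layers = []
    for layer in value.split(","):
        if ":" in layer:
            per_etype = {}
            for item in layer.split("@"):
                # Split on the last ':' so the edge type name stays intact.
                etype, num = item.rsplit(":", 1)
                per_etype[etype] = int(num)
            layers.append(per_etype)
        else:
            layers.append(int(layer))
    if num_layers is not None and len(layers) != num_layers:
        raise ValueError("the number of fanouts must be equal to num_layers")
    return layers


print(parse_fanout("20,10", num_layers=2))
print(parse_fanout("etype2:20@etype3:20@etype1:10,etype2:10@etype3:4@etype1:2"))
```

Because canonical edge types in this argument use / as the separator, splitting the string on commas to obtain the layers is safe.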
dropout: Dropout probability. Dropout must be a float value in [0,1). Dropout is applied to every GNN layer.
Yaml:
dropout: 0.5
Argument:
--dropout 0.5
Default value:
0.0
lr: (Required) Learning rate. Learning rate for dense parameters of input encoder, model encoder and decoder.
Yaml:
lr: 0.5
Argument:
--lr 0.5
Default value: This parameter must be provided by user.
max_grad_norm: Gradient clip which limits the magnitude of gradients during training in order to prevent issues like exploding gradients and improve the stability and convergence of the training process.
Yaml:
max_grad_norm: 0.1
Argument:
--max-grad-norm 0.1
Default value: None
grad_norm_type: Type of norm that is used to compute the gradient norm.
Yaml:
grad_norm_type: inf
Argument:
--grad-norm-type 2
Default value: 2.0
num_epochs: Number of training epochs. Must be integer.
Yaml:
num_epochs: 5
Argument:
--num-epochs 5
Default value: 0. By default, GraphStorm only does testing/inference.
batch_size: (Required) Mini-batch size. It defines the batch size of each trainer. The global batch size equals the number of trainers multiplied by batch_size. For example, suppose we have 2 machines each with 8 GPUs and set batch_size to 128. The global batch size will be 2 * 8 * 128 = 2048.
Yaml:
batch_size: 128
Argument:
--batch-size 128
Default value: This parameter must be provided by user.
sparse_optimizer_lr: Learning rate of sparse optimizer. Learning rate for the optimizer corresponding to learnable sparse embeddings.
Yaml:
sparse_optimizer_lr: 0.5
Argument:
--sparse-optimizer-lr 0.5
Default value: same as lr.
use_node_embeddings: Set true to create and use extra learnable node embeddings for nodes. These learnable embeddings will be concatenated with nodes’ own features to form the inputs for model training.
Yaml:
use_node_embeddings: true
Argument:
--use-node-embeddings true
Default value:
false
wd_l2norm: Weight decay used by torch.optim.Adam.
Yaml:
wd_l2norm: 0.1
Argument:
--wd-l2norm 0.1
Default value:
0
alpha_l2norm: Coefficient of the l2 norm of dense parameters. GraphStorm adds a regularization loss, i.e., the l2 norm of dense parameters, to the final loss. It uses alpha_l2norm to re-scale the regularization loss. Specifically, loss = loss + alpha_l2norm * regularization_loss.
Yaml:
alpha_l2norm: 0.00001
Argument:
--alpha-l2norm 0.00001
Default value:
0.0
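The formula loss = loss + alpha_l2norm * regularization_loss can be worked through with plain numbers. The sketch below assumes the regularization term is the Euclidean (l2) norm of all dense parameters flattened into one list; whether the actual implementation uses the norm or its square, and how parameters are grouped, is not stated here, so treat this as an illustration only.

```python
import math


def regularized_loss(task_loss, dense_params, alpha_l2norm):
    """Add a re-scaled l2 norm of the dense parameters to the task loss."""
    # Assumption: the regularization loss is the l2 norm of the parameters.
    regularization_loss = math.sqrt(sum(w * w for w in dense_params))
    return task_loss + alpha_l2norm * regularization_loss


# The l2 norm of [3.0, 4.0] is 5.0, so the result is 1.0 + 0.1 * 5.0.
print(regularized_loss(1.0, [3.0, 4.0], 0.1))
```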
num_ffn_layers_in_input: GraphStorm provides this argument as an option to increase the number of parameters in the input layer. This argument adds an MLP after the input embeddings are computed for each node type, i.e., embeds = MLP(embeds) for each node type in the input layer. It accepts an integer greater than zero. If the value is n, the MLP contains n feedforward neural network layers.
Yaml:
num_ffn_layers_in_input: 1
Argument:
--num-ffn-layers-in-input 1
Default value:
0
num_ffn_layers_in_gnn: GraphStorm provides this argument as an option to increase the number of parameters between GNN layers. This argument adds an MLP at the end of each GNN layer, i.e., h = MLP(h) between GNN layers in a GNN model. If the value is n, the MLP contains n feedforward neural network layers.
Yaml:
num_ffn_layers_in_gnn: 1
Argument:
--num-ffn-layers-in-gnn 1
Default value:
0
num_ffn_layers_in_decoder: GraphStorm provides this argument as an option to increase the number of parameters in the decoder layer. This argument adds an MLP before the last layer of a decoder. If the value is n, the MLP contains n feedforward neural network layers. Please note, it is only effective when the decoder is an MLPEdgeDecoder or an MLPEFeatEdgeDecoder. Support for other decoders will be added later.
Yaml:
num_ffn_layers_in_decoder: 1
Argument:
--num-ffn-layers-in-decoder 1
Default value:
0
input_activate: GraphStorm provides this argument as an option to change the activation function in the input layer. Please note, it only accepts ‘relu’ and ‘none’.
Yaml:
input_activate: relu
Argument:
--input-activate relu
Default value:
none
gnn_norm: GraphStorm provides this argument as an option to define the norm type for GNN layers. Please note, it only accepts ‘batch’ and ‘layer’ for batchnorm and layernorm respectively.
Yaml:
gnn_norm: batch
Argument:
--gnn-norm batch
Default value:
none
Early stop configurations
GraphStorm provides a set of parameters to control early stop of training. By default, GraphStorm finishes training after num_epochs. One can use early stop to exit model training earlier.
Every time evaluation is triggered, GraphStorm checks the early stop criteria. During the first early_stop_burnin_rounds evaluation calls, GraphStorm will not use early stop. After early_stop_burnin_rounds, GraphStorm decides whether to stop early based on the early_stop_strategy. There are two strategies: 1) consecutive_increase, where early stop is triggered if the validation scores have been decreasing for the last early_stop_rounds consecutive steps, and 2) average_increase, where early stop is triggered if the current validation score is lower than the average of the last early_stop_rounds validation scores.
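The two kinds of criteria described above, a comparison against a recent average and a run of consecutive decreases, can be sketched as follows. The functions are named after the check they perform and are hypothetical; the sketch assumes that a higher validation score is better and leaves out the burn-in period.

```python
def below_recent_average(history, current, rounds):
    """True if `current` is lower than the mean of the last `rounds` scores."""
    recent = history[-rounds:]
    if len(recent) < rounds:
        return False  # not enough evaluations yet
    return current < sum(recent) / len(recent)


def consecutive_decrease(history, current, rounds):
    """True if the score decreased at each of the last `rounds` steps."""
    scores = list(history) + [current]
    if len(scores) < rounds + 1:
        return False  # not enough evaluations yet
    tail = scores[-(rounds + 1):]
    return all(later < earlier for earlier, later in zip(tail, tail[1:]))


history = [0.80, 0.82, 0.81]
print(below_recent_average(history, 0.79, rounds=3))
print(consecutive_decrease(history, 0.79, rounds=3))
```

With this history, the current score 0.79 is below the recent average of 0.81, so the average-based check fires, while the consecutive check does not because the score rose from 0.80 to 0.82 within the window.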
early_stop_burnin_rounds: Burn-in period, i.e., the number of evaluation calls to wait before early stop is considered.
Yaml:
early_stop_burnin_rounds: 100
Argument:
--early-stop-burnin-rounds 100
Default value:
0.0
early_stop_rounds: The number of rounds of validation scores used to decide whether to stop early.
Yaml:
early_stop_rounds: 5
Argument:
--early-stop-rounds 5
Default value:
3.
early_stop_strategy: GraphStorm supports two strategies: 1) consecutive_increase and 2) average_increase.
Yaml:
early_stop_strategy: consecutive_increase
Argument:
--early-stop-strategy average_increase
Default value:
average_increase
use_early_stop: Set true to enable early stop.
Yaml:
use_early_stop: true
Argument:
--use-early-stop true
Default value:
false
Model Evaluation Configurations
GraphStorm provides a set of parameters to control model evaluation.
eval_batch_size: Mini-batch size for computing GNN embeddings in evaluation. You can set eval_batch_size larger than batch_size to speed up GNN embedding computation. Note that a larger eval_batch_size will consume more GPU memory.
Yaml:
eval_batch_size: 1024
Argument:
--eval-batch-size 1024
Default value: 10000.
eval_fanout: (Required) The fanout of each GNN layer used in model evaluation and inference. It follows the same format as fanout.
Yaml:
eval_fanout: "10,10"
Argument:
--eval-fanout 10,10
Default value: This parameter must be provided by user. However, if --num-layers is set to 0, which means there is no GNN layer, there is no need to specify this configuration.
use_mini_batch_infer: Set true to do mini-batch inference during evaluation and inference. Set false to do full-graph inference during evaluation and inference. For node classification/regression and edge classification/regression tasks, if the evaluation set or testing set is small, mini-batch inference can be more efficient as it does not waste resources to compute node embeddings for nodes not used during inference. However, if the test set is large or the task is link prediction, full graph inference (set use_mini_batch_infer to false) is preferred, as it avoids recomputing node embeddings during inference.
Yaml:
use_mini_batch_infer: false
Argument:
--use-mini-batch-infer false
Default value:
true
eval_frequency: The frequency of doing evaluation. GraphStorm trainers do evaluation at the end of each epoch. However, for large-scale graphs, training one epoch may take hundreds of thousands of iterations. One may want to do evaluations in the middle of an epoch. When eval_frequency is set, every eval_frequency iterations, the trainer will do evaluation once. The evaluation results can be printed and reported.
Yaml:
eval_frequency: 10000
Argument:
--eval-frequency 10000
Default value: sys.maxsize. The system will not do evaluation in the middle of an epoch.
no_validation: Set true to skip model evaluation (validation) during training.
Yaml:
no_validation: true
Argument:
--no-validation true
Default value:
false
fixed_test_size: Set the number of validation and test data used during link prediction training evaluation. This is useful for reducing the overhead of doing link prediction evaluation when the graph size is large.
Yaml:
fixed_test_size: 100000
Argument:
--fixed-test-size 100000
Default value: None. The full validation and test sets are used.
Language Model Specific Configurations
GraphStorm supports co-training language models with GNN. GraphStorm provides a set of parameters to control language model fine-tuning.
lm_tune_lr: Learning rate for fine-tuning language model.
Yaml:
lm_tune_lr: 0.0001
Argument:
--lm-tune-lr 0.0001
Default value: same as lr
lm_train_nodes: Number of nodes used in LM model fine-tuning.
Yaml:
lm_train_nodes: 10
Argument:
--lm-train-nodes 10
Default value:
0
lm_infer_batch_size: Batch size used in LM model inference.
Yaml:
lm_infer_batch_size: 10
Argument:
--lm-infer-batch-size 10
Default value:
32
freeze_lm_encoder_epochs: The number of epochs used to warm up the GNN model before fine-tuning the LM model.
Yaml:
freeze_lm_encoder_epochs: 1
Argument:
--freeze-lm-encoder-epochs 1
Default value:
0
Task Specific Configurations
GraphStorm supports node classification, node regression, edge classification, edge regression and link prediction tasks. It provides rich task related configurations.
General Configurations
task_type: (Required) Supported task types include node_classification, node_regression, edge_classification, edge_regression, and link_prediction.
Yaml:
task_type: node_classification
Argument:
--task-type node_classification
Default value: This parameter must be provided by user.
eval_metric: Evaluation metrics used during evaluation. The input can be a string specifying the evaluation metric to report or a list of strings specifying a list of evaluation metrics to report. The first evaluation metric in the list is treated as the primary metric and is used to choose the best trained model and for early stopping. Each learning task supports different evaluation metrics:
The supported evaluation metrics of classification tasks include accuracy, precision_recall, roc_auc, f1_score, per_class_f1_score, hit_at_k, precision, recall, fscore_at_beta, recall_at_precision_beta, and precision_at_recall_beta.
- The recall_at_precision_beta and precision_at_recall_beta metrics are only supported for binary classification.
- The k of hit_at_k can be any positive integer, for example hit_at_10 or hit_at_100. The term hit_at_k refers to the number of true positives among the top k predictions with the highest confidence scores. Note that hit_at_k only works with binary classification tasks.
- The beta of fscore_at_beta can be any positive integer or float number, for example fscore_at_2 or fscore_at_0.5. Please make sure that the beta string can be converted to a float number by Python’s float() method.
- The beta of recall_at_precision_beta and precision_at_recall_beta can be a positive number in (0, 1], for example recall_at_precision_0.9, recall_at_precision_1, precision_at_recall_0.8, or precision_at_recall_1.0. Please make sure that the beta string can be converted to a float number by Python’s float() method.
The supported evaluation metrics of regression tasks include rmse, mse and mae.
The supported evaluation metrics of link prediction tasks include mrr, amri and hit_at_k. MRR refers to the Mean Reciprocal Rank, with values between 0 (worst) and 1 (best), and AMRI refers to the Adjusted Mean Rank Index, with values ranging from -1 (worst) to 1 (best). An AMRI value of 0 is equivalent to random guessing or assigning the same score to all edges in the candidate set. For more details on these metrics see Link Prediction Metrics.
Yaml:
eval_metric:
- accuracy
- precision_recall
- hit_at_10
- fscore_at_0.5
- recall_at_precision_0.7
- precision_at_recall_0.8
Argument:
--eval-metric accuracy precision_recall hit_at_10 fscore_at_0.5 recall_at_precision_0.7 precision_at_recall_0.8
Default value:
For classification tasks, the default value is accuracy.
For regression tasks, the default value is rmse.
For link prediction tasks, the default value is mrr.
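For the link prediction metrics, a short worked example may help. The sketch below computes the Mean Reciprocal Rank from the rank of each positive edge among its candidates, and a hit-at-k value as the fraction of positive edges ranked within the top k. These are the commonly used definitions; the function names are hypothetical, and the fraction-based hit-at-k is an assumption about how the link prediction variant is computed.

```python
def mrr(ranks):
    """Mean Reciprocal Rank; `ranks` are 1-based ranks of the positive edges."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hit_at_k(ranks, k):
    """Fraction of positive edges ranked within the top k (assumed definition)."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Three positive edges ranked 1st, 2nd and 4th among their candidates.
ranks = [1, 2, 4]
print(mrr(ranks))
print(hit_at_k(ranks, 2))
```

An MRR of 1 means every positive edge is ranked first, which matches the stated best value.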
gamma: Set the value of the hyperparameter denoted by the symbol gamma. Gamma is used in the following cases: i) focal loss for binary classification, ii) DistMult score function for link prediction, iii) TransE score function for link prediction, iv) RotatE score function for link prediction, and v) shrinkage loss for regression.
Yaml:
gamma: 2.0
Argument:
--gamma 2.0
Default value: 2.0 in focal loss function; 0.2 in shrinkage loss function; 12.0 in distmult, RotatE, and TransE link prediction decoders.
alpha: Set the value of the hyperparameter denoted by the symbol alpha. Alpha is used in the following cases: i/ focal loss for binary classification and ii/ shrinkage loss for regression.
Yaml:
alpha: 0.25
Argument:
--alpha 0.25
Default value: 0.25 in focal loss function; 10.0 in shrinkage loss function.
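To show how gamma and alpha interact in the focal loss case, here is a minimal sketch of the standard binary focal loss for a single example, FL = -alpha_t * (1 - p_t)^gamma * log(p_t), as given in the Focal Loss for Dense Object Detection paper. It is written in plain Python for illustration and is not GraphStorm's implementation; the function name is hypothetical.

```python
import math


def binary_focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss for one example.

    p: predicted probability of the positive class, in (0, 1).
    y: ground-truth label, 0 or 1.
    """
    p_t = p if y == 1 else 1.0 - p                # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha    # class balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A well-classified positive contributes much less than an uncertain one.
print(binary_focal_loss(0.9, 1))
print(binary_focal_loss(0.5, 1))
```

A larger gamma down-weights easy examples more strongly, and alpha balances the positive and negative classes. With gamma set to 0 the expression reduces to a class-weighted cross entropy.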
Classification and Regression Task
label_field: (Required) The field name of labelled data in the graph data. For node classification tasks, GraphStorm uses graph.nodes[target_ntype].data[label_field] to access node labels. For edge classification tasks, GraphStorm uses graph.edges[target_etype].data[label_field] to access edge labels.
Yaml:
label_field: color
Argument:
--label-field color
Default value: This parameter must be provided by user.
num_classes: (Required) The cardinality of labels in a classification task. Used by node classification and edge classification.
Yaml:
num_classes: 10
Argument:
--num-classes 10
Default value: This parameter must be provided by user.
multilabel: If set to true, the task is a multi-label classification task. Used by node classification and edge classification.
Yaml:
multilabel: true
Argument:
--multilabel true
Default value:
false
multilabel_weights: Used to specify a weight of positive examples for each class in a multi-label classification task. This is used together with multilabel. It is fed into torch.nn.BCEWithLogitsLoss as pos_weight. The weights should be in the following format: 0.1,0.2,0.3,0.1,0.0. The n-th field represents the weight of the positive answer for class n. Suppose there are 3 classes and multilabel_weights is set to 0.1,0.2,0.3. Class 0 will have a weight of 0.1, class 1 will have a weight of 0.2 and class 2 will have a weight of 0.3. For more details, see BCEWithLogitsLoss. If not provided, all classes are treated equally.
Yaml:
multilabel_weights: 0.1,0.2,0.3
Argument:
--multilabel-weights 0.1,0.2,0.3
Default value:
None
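The weight string format can be illustrated with a small parser that turns the comma-separated value into one float per class. The function `parse_class_weights` is hypothetical, and the length check against num_classes is an assumption about what a sensible validation would be rather than a documented behavior.

```python
def parse_class_weights(value, num_classes):
    """Turn a string such as "0.1,0.2,0.3" into one weight per class."""
    weights = [float(w) for w in value.split(",")]
    # Assumption: exactly one weight is expected for each class.
    if len(weights) != num_classes:
        raise ValueError(
            f"expected {num_classes} weights but got {len(weights)}"
        )
    return weights


weights = parse_class_weights("0.1,0.2,0.3", num_classes=3)
for class_id, weight in enumerate(weights):
    print(f"class {class_id}: positive weight {weight}")
```

The resulting list is the kind of value that would be converted to a tensor and passed as pos_weight.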
imbalance_class_weights: Used to specify a manual rescaling weight given to each class in a single-label multi-class classification task. It is used in imbalanced label use cases. It is fed into torch.nn.CrossEntropyLoss. Each field represents a weight for a class. Suppose there are 3 classes and imbalance_class_weights is set to 0.1,0.2,0.3. Class 0 will have a weight of 0.1, class 1 will have a weight of 0.2 and class 2 will have a weight of 0.3. If not provided, all classes are treated equally.
Yaml:
imbalance_class_weights: 0.1,0.2,0.3
Argument:
--imbalance-class-weights 0.1,0.2,0.3
Default value:
None
return_proba: In classification inference, this parameter determines whether the output prediction files contain probability estimates for each class or only the most probable class. Set true to return probability estimates and false to return the most probable class.
Yaml:
return_proba: true
Argument:
--return-proba true
Default value:
true
save_prediction_path: Path to save prediction results. This is used in node/edge classification/regression inference.
Yaml:
save_prediction_path: /data/infer-output/predictions/
Argument:
--save-prediction-path /data/infer-output/predictions/
Default value: If not provided, it will be the same as save_embed_path.
class_loss_func: Node/Edge classification loss function. Builtin loss functions include cross_entropy and focal. Setting this to focal will use the focal loss function defined in Focal Loss for Dense Object Detection, which is designed for imbalanced binary classification problems. When using focal loss, you may want to adjust the values of the gamma and alpha loss parameters to best fit your data.
Yaml:
class_loss_func: cross_entropy
Argument:
--class-loss-func focal
Default value:
cross_entropy
Note
Focal loss can only be used for binary classification problems. Currently, focal loss produces predictions with shape (N, 1), where N is the number of target nodes/edges. In v0.5.0 this may be changed to produce predictions with shape (N, 2) to match the cross-entropy loss.
regression_loss_func: Node/Edge regression loss function. Builtin loss functions include mse and shrinkage. Setting this to shrinkage will use the shrinkage loss function defined in Deep Regression Tracking with Shrinkage Loss, which is designed for data imbalance in regression tasks. If shrinkage is set, you may want to adjust the values of the gamma and alpha configurations according to your data.
Yaml:
regression_loss_func: mse
Argument:
--regression-loss-func shrinkage
Default value:
mse
Node Classification/Regression Specific
target_ntype: The node type for prediction.
Yaml:
target_ntype: movie
Argument:
--target-ntype movie
Default value: For heterogeneous input graph, this parameter must be provided by the user. If not provided, GraphStorm will assume the input graph is a homogeneous graph and set target_ntype to “_N”.
infer_all_target_nodes: When set to True, run inference on all nodes in the target node type(s) as defined by target_ntype. This option must be used together with no_validation=True: all-node inference has to run without evaluation, to avoid biased evaluation that includes nodes in the train set.
Yaml:
infer_all_target_nodes: True
Argument:
--infer-all-target-nodes True
Default value: False, inference will run only on the node subset specified by the test mask.
Edge Classification/Regression Specific
target_etype: The list of canonical edge types that will be added as training targets in edge classification/regression tasks, for example --target-etype query,clicks,asin or --target-etype query,clicks,asin query,search,asin. A canonical edge type should be formatted as src_node_type,relation_type,dst_node_type. Currently, GraphStorm only supports single task edge classification/regression, i.e., it only accepts one canonical edge type.
Yaml:
target_etype:
- query,clicks,asin
Argument:
--target-etype query,clicks,asin
Default value: For heterogeneous input graph, this parameter must be provided by the user. If not provided, GraphStorm will assume the input graph is a homogeneous graph and set target_etype to (“_N”, “_E”, “_N”).
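The canonical edge type format is simple to validate before launching a job. The following sketch splits a src_node_type,relation_type,dst_node_type string into a tuple; parse_canonical_etype is a hypothetical helper and not a GraphStorm function.

```python
def parse_canonical_etype(text):
    """Split "src,rel,dst" into a (src, rel, dst) tuple, rejecting malformed input."""
    parts = tuple(field.strip() for field in text.split(","))
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected src_node_type,relation_type,dst_node_type, got {text!r}"
        )
    return parts

target = parse_canonical_etype("query,clicks,asin")
```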
remove_target_edge_type: When set to true, GraphStorm removes target_etype in message passing, i.e., any edge with target_etype will not be sampled during training and inference.
Yaml:
remove_target_edge_type: false
Argument:
--remove-target-edge-type false
Default value:
true
reverse_edge_types_map: A list of reverse edge type info. Each edge type is in the following format: head,relation,reverse_relation,tail. For example: [“query,adds,rev-adds,asin”, “query,clicks,rev-clicks,asin”]. For edge classification/regression tasks, if remove_target_edge_type is set to true and reverse_edge_types_map is provided, GraphStorm will remove both target_etype and the corresponding reverse edge type(s) in message passing. In that case, any edge with target_etype or reverse target_etype will not be sampled during training and inference. For link prediction tasks, if exclude_training_targets is set to true and reverse_edge_types_map is provided, GraphStorm will remove both target edges with train_etype and the corresponding reverse edges with the reverse edge types of train_etype in message passing. In contrast to edge classification/regression tasks, for link prediction tasks, GraphStorm only excludes specific edges instead of all edges with target_etype or reverse target_etype in message passing.
Yaml:
reverse_edge_types_map:
- query,adds,rev-adds,asin
- query,clicks,rev-clicks,asin
Argument:
--reverse-edge-types-map query,adds,rev-adds,asin query,clicks,rev-clicks,asin
Default value:
None
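The head,relation,reverse_relation,tail format can be read as a lookup from an edge type to its reverse. The sketch below builds that lookup and lists the edge types that would be dropped from message passing for a given target. Both function names are hypothetical, and the orientation of the reverse edge type as (tail, reverse_relation, head) is an assumption rather than something stated in this document.

```python
def build_reverse_map(entries):
    """Map (head, relation, tail) to (tail, reverse_relation, head)."""
    mapping = {}
    for entry in entries:
        head, rel, rev_rel, tail = entry.split(",")
        mapping[(head, rel, tail)] = (tail, rev_rel, head)
    return mapping

def etypes_to_exclude(target_etype, reverse_map):
    """Target edge type plus its reverse edge type, when one is registered."""
    excluded = [target_etype]
    if target_etype in reverse_map:
        excluded.append(reverse_map[target_etype])
    return excluded

reverse_map = build_reverse_map(
    ["query,adds,rev-adds,asin", "query,clicks,rev-clicks,asin"]
)
excluded = etypes_to_exclude(("query", "clicks", "asin"), reverse_map)
```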
decoder_type: Type of edge classification or regression decoder. Built-in decoders include DenseBiDecoder and MLPDecoder. DenseBiDecoder implements the bi-linear decoder used in GCMC. MLPDecoder simply applies Multilayer Perceptron layers for prediction.
Yaml:
decoder_type: DenseBiDecoder
Argument:
--decoder-type MLPDecoder
Default value:
DenseBiDecoder
num_decoder_basis: The number of bases for DenseBiDecoder in edge prediction tasks.
Yaml:
num_decoder_basis: 2
Argument:
--num-decoder-basis 2
Default value:
2
decoder_edge_feat: A list of edge features that can be used by a decoder to enhance its performance.
Yaml:
decoder_edge_feat:
- "fname"
Or
decoder_edge_feat:
- query,adds,asin:count,price
Argument:
--decoder-edge-feat fname
or --decoder-edge-feat query,adds,asin:count,price
Default value:
None
Link Prediction Task
train_etype: The list of canonical edge types that will be used as training targets. If not provided, all edge types will be used as training targets. A canonical edge type should be formatted as src_node_type,relation_type,dst_node_type.
Yaml:
train_etype:
- query,clicks,asin
- query,adds,asin
Argument:
--train-etype query,clicks,asin query,adds,asin
Default value:
None
eval_etype: The list of canonical edge types that will be used as evaluation targets. If not provided, all edge types will be used as evaluation targets. In some link prediction use cases, users want to train a model using all edges of a graph but only do link prediction on specific edge type(s) for downstream applications. In such cases, they only care about the model performance on those specific edge types.
Yaml:
eval_etype:
- query,clicks,asin
- query,adds,asin
Argument:
--eval-etype query,clicks,asin query,adds,asin
Default value:
None
exclude_training_targets: If it is set to true, GraphStorm removes the training targets from the GNN computation graph. If true, reverse_edge_types_map MUST be provided.
Yaml:
exclude_training_targets: false
Argument:
--exclude-training-targets false
Default value:
true
train_negative_sampler: The negative sampler used for link prediction training. Built-in samplers include uniform, joint, localuniform, all_etype_uniform and all_etype_joint.
Yaml:
train_negative_sampler: uniform
Argument:
--train-negative-sampler joint
Default value:
uniform
eval_negative_sampler: The negative sampler used for link prediction testing and evaluation. Built-in samplers include uniform, joint, localuniform, all_etype_uniform and all_etype_joint.
Yaml:
eval_negative_sampler: uniform
Argument:
--eval-negative-sampler joint
Default value:
joint
num_negative_edges: Number of negative edges sampled for each positive edge during training.
Yaml:
num_negative_edges: 32
Argument:
--num-negative-edges 32
Default value:
16
num_negative_edges_eval: Number of negative edges sampled for each positive edge in the validation and test set.
Yaml:
num_negative_edges_eval: 1000
Argument:
--num-negative-edges-eval 1000
Default value:
1000
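As a rough illustration of what the sampler names and negative-edge counts mean, the sketch below contrasts the usual reading of uniform sampling (independent negatives per positive edge) with joint sampling (one set of negatives shared by a batch of positive edges). This is a simplified assumption about the general technique: GraphStorm's built-in samplers may group edges differently or filter false negatives, and both function names are hypothetical.

```python
import random

def uniform_negatives(num_pos_edges, num_nodes, k, rng):
    """Draw k candidate destination nodes independently for every positive edge."""
    return [[rng.randrange(num_nodes) for _ in range(k)]
            for _ in range(num_pos_edges)]

def joint_negatives(num_pos_edges, num_nodes, k, rng):
    """Draw k candidate destination nodes once and share them across the batch."""
    shared = [rng.randrange(num_nodes) for _ in range(k)]
    return [list(shared) for _ in range(num_pos_edges)]

rng = random.Random(0)
# 4 positive edges, 100 candidate nodes, k plays the role of num_negative_edges.
uniform = uniform_negatives(4, 100, 3, rng)
joint = joint_negatives(4, 100, 3, rng)
```

Sharing negatives across a batch reduces the number of distinct nodes whose embeddings must be computed, which is why joint-style sampling is often cheaper than uniform sampling for large negative counts.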
lp_decoder_type: Set the decoder type for the loss function in Link Prediction tasks. Currently GraphStorm supports dot_product, distmult, rotate, transe_l1, and transe_l2.
Yaml:
lp_decoder_type: dot_product
Argument:
--lp-decoder-type dot_product
Default value:
distmult
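The decoder names correspond to well-known edge scoring functions. The sketch below writes out the textbook forms of the dot product, DistMult, and TransE scores on plain Python lists. It is not GraphStorm's implementation: the function names are hypothetical, the margin gamma=12.0 is an arbitrary value chosen for the example, and rotate is omitted because it needs complex-valued embeddings.

```python
import math

def dot_product_score(head, tail):
    """Score = sum_i h_i * t_i."""
    return sum(h * t for h, t in zip(head, tail))

def distmult_score(head, rel, tail):
    """Score = sum_i h_i * r_i * t_i, with one relation vector per edge type."""
    return sum(h * r * t for h, r, t in zip(head, rel, tail))

def transe_score(head, rel, tail, gamma=12.0, p=1):
    """Score = gamma - ||h + r - t||_p, with p=1 (transe_l1) or p=2 (transe_l2)."""
    diffs = [abs(h + r - t) for h, r, t in zip(head, rel, tail)]
    dist = sum(diffs) if p == 1 else math.sqrt(sum(d * d for d in diffs))
    return gamma - dist

head, rel, tail = [1.0, 2.0], [0.5, -1.0], [3.0, 1.0]
scores = {
    "dot_product": dot_product_score(head, tail),
    "distmult": distmult_score(head, rel, tail),
    "transe_l1": transe_score(head, rel, tail, p=1),
    "transe_l2": transe_score(head, rel, tail, p=2),
}
```

Note that dot_product ignores the relation, so it cannot distinguish between edge types that connect the same pair of node types.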
lp_loss_func: Link prediction loss function. Builtin loss functions include cross_entropy and contrastive.
Yaml:
lp_loss_func: cross_entropy
Argument:
--lp-loss-func contrastive
Default value:
cross_entropy
adversarial_temperature: Enables adversarial cross entropy loss and sets the adversarial_temperature hyper-parameter. It only works when lp_loss_func is set to cross_entropy. More details can be found in Link Prediction Loss Functions.
Yaml:
adversarial_temperature: 1.0
Argument:
--adversarial-temperature 1.0
Default value: None
lp_edge_weight_for_loss: Edge feature field name for edge weight. The edge weight is used to rescale the positive edge loss for link prediction tasks.
Yaml:
lp_edge_weight_for_loss:
- "weight"
Or
lp_edge_weight_for_loss:
- "ntype0,rel0,ntype1:weight0"
- "ntype0,rel1,ntype1:weight1"
Argument:
--lp-edge-weight-for-loss ntype0,rel0,ntype1:weight0 ntype0,rel1,ntype1:weight1
Default value: None
contrastive_loss_temperature: Temperature of link prediction contrastive loss. This is used to rescale the link prediction positive and negative scores for the loss.
Yaml:
contrastive_loss_temperature: 0.01
Argument:
--contrastive-loss-temperature 0.01
Default value: 1.0
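How a temperature rescales positive and negative scores can be shown with a small InfoNCE-style sketch. The exact form of GraphStorm's contrastive loss is not given here, so treating it as a softmax over one positive and its negatives is an assumption, and contrastive_loss is a hypothetical name.

```python
import math

def contrastive_loss(pos_score, neg_scores, temperature=1.0):
    """Negative log-softmax of the positive score among positive + negatives."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_sum - logits[0]

loss_default = contrastive_loss(0.9, [0.1, 0.2], temperature=1.0)
loss_sharp = contrastive_loss(0.9, [0.1, 0.2], temperature=0.01)
```

A small temperature magnifies score differences, so a positive edge that already outscores its negatives incurs almost no loss, while a misranked one is penalized heavily.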
lp_embed_normalizer: Type of normalization method used to normalize node embeddings in link prediction tasks. Currently GraphStorm only supports l2 normalization (l2_norm).
Yaml:
lp_embed_normalizer: l2_norm
Argument:
--lp-embed-normalizer l2_norm
Default value: None
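For reference, l2 normalization rescales each embedding to unit Euclidean length, which bounds dot-product scores to the range [-1, 1]. A minimal sketch follows; l2_normalize and its eps guard against zero vectors are illustrative choices, not GraphStorm code.

```python
import math

def l2_normalize(vec, eps=1e-12):
    """Divide a vector by its Euclidean norm (guarded against division by zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]

unit = l2_normalize([3.0, 4.0])
```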
train_etypes_negative_dstnode: The list of canonical edge types that have hard negative edges constructed by corrupting destination nodes during training. The format of the configuration is src_type,rel_type0,dst_type:negative_nid_field src_type,rel_type1,dst_type:negative_nid_field for each edge type to use different fields to store the hard negatives, or negative_nid_field for all edge types to use the same field to store the hard negatives.
Yaml:
train_etypes_negative_dstnode:
- src_type,rel_type0,dst_type:negative_nid_field
- src_type,rel_type1,dst_type:negative_nid_field
Or
train_etypes_negative_dstnode: negative_nid_field
Argument:
--train-etypes-negative-dstnode src_type,rel_type0,dst_type:negative_nid_field src_type,rel_type1,dst_type:negative_nid_field
Default value: None
num_train_hard_negatives: Number of hard negatives to sample for each edge type during training. The format of the configuration is src_type,rel_type0,dst_type:num_negatives src_type,rel_type1,dst_type:num_negatives for each edge type to have its own number of hard negatives, or num_negatives for all edge types to have the same number of hard negatives.
Yaml:
num_train_hard_negatives:
- src_type,rel_type0,dst_type:num_negatives
- src_type,rel_type1,dst_type:num_negatives
Or
num_train_hard_negatives: num_negatives
Argument:
--num-train-hard-negatives src_type,rel_type0,dst_type:num_negatives src_type,rel_type1,dst_type:num_negatives
Default value: None
eval_etypes_negative_dstnode: The list of canonical edge types that have hard negative edges constructed by corrupting destination nodes during evaluation. The format of the configuration is src_type,rel_type0,dst_type:negative_nid_field src_type,rel_type1,dst_type:negative_nid_field for each edge type to use different fields to store the hard negatives, or negative_nid_field for all edge types to use the same field to store the hard negatives.
Yaml:
eval_etypes_negative_dstnode:
- src_type,rel_type0,dst_type:negative_nid_field
- src_type,rel_type1,dst_type:negative_nid_field
Or
eval_etypes_negative_dstnode: negative_nid_field
Argument:
--eval-etypes-negative-dstnode src_type,rel_type0,dst_type:negative_nid_field src_type,rel_type1,dst_type:negative_nid_field
Default value: None
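The hard-negative options share one value format: either a single setting for all edge types, or a list of canonical_etype:setting entries. The sketch below shows one way to interpret both forms. parse_per_etype_setting is a hypothetical helper and does not mirror GraphStorm's own argument parsing.

```python
def parse_per_etype_setting(entries, convert=str):
    """Return a bare setting for all edge types, or {(src, rel, dst): setting}."""
    if len(entries) == 1 and ":" not in entries[0]:
        return convert(entries[0])
    parsed = {}
    for entry in entries:
        etype_text, setting = entry.split(":")
        parsed[tuple(etype_text.split(","))] = convert(setting)
    return parsed

# One field name shared by every edge type.
shared_field = parse_per_etype_setting(["negative_nid_field"])
# A different number of hard negatives per edge type.
per_etype_counts = parse_per_etype_setting(
    ["src_type,rel_type0,dst_type:4", "src_type,rel_type1,dst_type:8"],
    convert=int,
)
```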
Distillation Specific Configurations
GraphStorm provides a set of parameters to control GNN distillation.
textual_data_path: The path to load the textual data for distillation. Users need to specify the path of a directory with two sub-directories, one for the train split and one for the val split. In each split, there can be one or more *.parquet files. Find more details in graphstorm/training_scripts/gsgnn_dt/README.md.
Yaml:
textual_data_path: <str>
Argument:
--textual-data-path <str>
Default value: This parameter must be provided by user.
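Because this path has no default, it can help to verify the directory layout before starting a distillation job. The following sketch checks for the train and val sub-directories and at least one *.parquet file in each. The function list_distill_files and the placeholder file name part-0.parquet are hypothetical, and the check does not inspect file contents.

```python
import tempfile
from pathlib import Path

def list_distill_files(root):
    """Return {"train": [...], "val": [...]} with the *.parquet files of each split."""
    found = {}
    for split in ("train", "val"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing sub-directory: {split_dir}")
        files = sorted(split_dir.glob("*.parquet"))
        if not files:
            raise FileNotFoundError(f"no *.parquet files in {split_dir}")
        found[split] = files
    return found

# Build a throwaway directory with empty placeholder files to exercise the check.
with tempfile.TemporaryDirectory() as root:
    for split in ("train", "val"):
        (Path(root) / split).mkdir()
        (Path(root) / split / "part-0.parquet").touch()
    layout = list_distill_files(root)
    file_counts = {split: len(files) for split, files in layout.items()}
```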
max_distill_step: The maximum number of training steps for each node type for distillation.
Yaml:
max_distill_step: 10000
Argument:
--max-distill-step 10000
Default value:
10000
max_seq_len: The maximum sequence length for tokenized textual data.
Yaml:
max_seq_len: 1024
Argument:
--max-seq-len 1024
Default value:
1024