Model Training and Inference on a Single Machine

While the Standalone Mode Quick Start tutorial introduces the basic concepts, commands, and steps for using GraphStorm CLIs on a single machine, this user guide describes their usage on a single machine in more detail. In addition, most of the descriptions in this guide apply directly to model training and inference on distributed clusters.

GraphStorm supports graph machine learning (GML) model training and inference for common GML tasks, including node classification, node regression, edge classification, edge regression, and link prediction. Because the multi-task learning feature released in v0.3 is still experimental, formal documentation for it will be released once the feature matures.

For each task, GraphStorm provides a dedicated CLI for model training and inference. These CLIs share the same command template and some configurations, while each CLI also has its own task-specific configurations. GraphStorm also has a task-agnostic CLI for users to run their customized models.

Task-specific CLI template for model training and inference

GraphStorm model training and inference CLIs look like the commands below.

# Model training
python -m graphstorm.run.TASK_COMMAND \
          --num-trainers 1 \
          --part-config data.json \
          --cf config.yaml \
          --save-model-path model_path/

# Model inference
python -m graphstorm.run.TASK_COMMAND \
          --inference \
          --num-trainers 1 \
          --part-config data.json \
          --cf config.yaml \
          --restore-model-path model_path/ \
          --save-prediction-path pred_path/

In the above two templates, the TASK_COMMAND represents one of the five task-specific commands:

gs_node_classification for node classification tasks;

gs_node_regression for node regression tasks;

gs_edge_classification for edge classification tasks;

gs_edge_regression for edge regression tasks;

gs_link_prediction for link prediction tasks.

These task-specific commands work for both model training and inference. An inference CLI additionally needs the --inference argument, which marks the command as an inference run, and the --restore-model-path argument, which gives the path of the saved model checkpoint.
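The relationship between the two templates can be sketched in Python as a small helper that assembles the argument list. The function name build_task_command and its parameters are hypothetical, introduced only for illustration. The flags are the ones shown in the templates above, and the helper only builds the command without running it.

```python
import shlex

# The five task-specific commands listed above.
TASK_COMMANDS = {
    "gs_node_classification",
    "gs_node_regression",
    "gs_edge_classification",
    "gs_edge_regression",
    "gs_link_prediction",
}


def build_task_command(task, part_config, yaml_config, model_path,
                       num_trainers=1, inference=False, prediction_path=None):
    """Assemble a task-specific CLI as an argument list (hypothetical helper)."""
    if task not in TASK_COMMANDS:
        raise ValueError(f"unknown task command: {task}")
    cmd = ["python", "-m", f"graphstorm.run.{task}"]
    if inference:
        # Inference runs are marked with --inference.
        cmd.append("--inference")
    cmd += ["--num-trainers", str(num_trainers),
            "--part-config", part_config,
            "--cf", yaml_config]
    if inference:
        # Inference restores a saved checkpoint instead of saving one.
        cmd += ["--restore-model-path", model_path]
        if prediction_path is not None:
            cmd += ["--save-prediction-path", prediction_path]
    else:
        cmd += ["--save-model-path", model_path]
    return cmd


train_cmd = build_task_command("gs_node_classification", "data.json",
                               "config.yaml", "model_path/")
infer_cmd = build_task_command("gs_node_classification", "data.json",
                               "config.yaml", "model_path/",
                               inference=True, prediction_path="pred_path/")
print(shlex.join(train_cmd))
print(shlex.join(infer_cmd))
```

On a machine where GraphStorm is installed, the returned list could be passed to subprocess.run to launch the job.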

On a single machine, the --num-trainers argument configures how many GPUs or CPU processes are used. On a GPU machine, the value of --num-trainers should be less than or equal to the total number of available GPUs. On a CPU-only machine, it is advisable to keep the value below the total number of CPU processes to avoid errors.
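As a rough sketch of this rule, the hypothetical helper below clamps a requested trainer count to the limits just described. It assumes that the caller supplies the GPU count (how to detect GPUs is left open) and that "CPU processes" can be approximated by the logical CPU count from os.cpu_count(). Neither assumption comes from GraphStorm itself.

```python
import os


def clamp_num_trainers(requested, num_gpus, num_cpus=None):
    """Return a --num-trainers value within the limits described above.

    num_gpus is supplied by the caller; pass 0 for a CPU-only machine.
    """
    if requested < 1:
        raise ValueError("requested must be at least 1")
    if num_gpus > 0:
        # GPU machine: no more trainers than available GPUs.
        return min(requested, num_gpus)
    if num_cpus is None:
        # Assumption: approximate "CPU processes" by the logical CPU count.
        num_cpus = os.cpu_count() or 1
    # CPU-only machine: stay below the CPU count, but keep at least one trainer.
    return max(1, min(requested, num_cpus - 1))


print(clamp_num_trainers(8, num_gpus=4))              # 4
print(clamp_num_trainers(8, num_gpus=0, num_cpus=6))  # 5
```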

GraphStorm model training and inference CLIs use the --part-config argument to specify the partitioned graph data. Its value should be the path of the *.json file that is generated by the GraphStorm Graph Construction step.
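Before launching a job, it can help to confirm that the value intended for --part-config really points at a readable JSON file. The check below is a minimal sketch. The helper name check_part_config is hypothetical, and it verifies only the file extension, existence, and JSON syntax, not the contents that graph construction writes.

```python
import json
import tempfile
from pathlib import Path


def check_part_config(path):
    """Fail early if the --part-config value is not a readable *.json file."""
    p = Path(path)
    if p.suffix != ".json":
        raise ValueError(f"expected a .json file, got: {p.name}")
    if not p.is_file():
        raise FileNotFoundError(f"partition config not found: {p}")
    with p.open() as f:
        json.load(f)  # raises json.JSONDecodeError if the file is not valid JSON
    return p


# Demonstration with a placeholder file; a real file comes from graph construction.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "data.json"
    demo.write_text("{}")  # placeholder content, not a real partition config
    checked = check_part_config(demo)
    print(checked.name)
```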

While the CLIs can be as simple as the templates above, users can use a YAML file to set a variety of GraphStorm configurations and make full use of the rich functions and features GraphStorm provides. The YAML file is passed to the --cf argument. GraphStorm has a set of example YAML files available for reference.

Note

Users can set CLI configurations either as CLI arguments or in the configuration YAML file. Values set as CLI arguments override the values of the same configurations set in the YAML file.
This guide only explains a few commonly used configurations. For detailed explanations of GraphStorm CLI configurations, please refer to the Model Training and Inference Configurations section.
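The precedence rule in the note can be illustrated with a plain dictionary merge. This is only a sketch of the behavior, not GraphStorm's implementation: the function resolve_config is hypothetical, and the flat dictionaries are a simplification of a real configuration YAML file.

```python
def resolve_config(yaml_values, cli_values):
    """Merge two flat dicts; values given on the command line win."""
    merged = dict(yaml_values)
    # Only arguments actually given on the command line (not None) override.
    merged.update({k: v for k, v in cli_values.items() if v is not None})
    return merged


# Hypothetical values: the same configuration set in both places.
yaml_values = {"save_model_path": "from_yaml/",
               "save_prediction_path": "pred_from_yaml/"}
cli_values = {"save_model_path": "model_path/",
              "save_prediction_path": None}  # not given on the command line

resolved = resolve_config(yaml_values, cli_values)
print(resolved)
```

Here save_model_path takes the command-line value, while save_prediction_path keeps the value from the YAML side because no CLI argument was given for it.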

Task-agnostic CLI for model training and inference

While task-specific CLIs allow users to quickly perform the GML tasks supported by GraphStorm, users may also build their own GNN models as described in the Use Your Own Models tutorial. To run these customized models in the GraphStorm model training and inference pipelines, users can use the task-agnostic CLI as shown in the examples below.

# Model training
python -m graphstorm.run.launch \
          --num-trainers 1 \
          --part-config data.json \
          customized_model.py --save-model-path model_path/ \
                              customized_arguments

# Model inference
python -m graphstorm.run.launch \
          --inference \
          --num-trainers 1 \
          --part-config data.json \
          customized_model.py --restore-model-path model_path/ \
                              --save-prediction-path pred_path/ \
                              customized_arguments

The task-agnostic CLI command (launch) has a template similar to that of the task-specific CLIs, except that it takes the customized model, stored as a .py file, as an argument. If the customized model has its own arguments, they should be placed after the customized model Python file.
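The argument ordering matters here, so a short sketch may help. The hypothetical helper build_launch_command below places the launcher arguments first, then the customized model file, then the model's own arguments, mirroring the templates above. It only assembles the command and does not run it.

```python
import shlex


def build_launch_command(script, part_config, script_args,
                         num_trainers=1, inference=False):
    """Assemble a task-agnostic launch CLI (hypothetical helper)."""
    cmd = ["python", "-m", "graphstorm.run.launch"]
    if inference:
        cmd.append("--inference")
    cmd += ["--num-trainers", str(num_trainers),
            "--part-config", part_config]
    # Everything from the script onward belongs to the customized model.
    cmd.append(script)
    cmd += list(script_args)
    return cmd


launch_train = build_launch_command(
    "customized_model.py", "data.json",
    ["--save-model-path", "model_path/"])
launch_infer = build_launch_command(
    "customized_model.py", "data.json",
    ["--restore-model-path", "model_path/",
     "--save-prediction-path", "pred_path/"],
    inference=True)
print(shlex.join(launch_train))
print(shlex.join(launch_infer))
```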