GSgnnEmbGenInferer

class graphstorm.inference.GSgnnEmbGenInferer(model)

Bases: GSInferrer

Inferrer for embedding generation tasks.

GSgnnEmbGenInferer defines the infer() method, which performs a single task:

  • Generate node embeddings and save to disk.

Parameters

model: GSgnnModel

This should be a model whose class inherits from GSgnnModel. It is recommended to inherit from GSgnnNodeModelBase, GSgnnEdgeModelBase, or GSgnnLinkPredictionModelBase for node, edge, and link prediction models, respectively. These base classes define the interfaces required for each task type.

infer(data, infer_ntypes, save_embed_path, eval_fanout, use_mini_batch_infer=False, node_id_mapping_file=None, save_embed_format='pytorch', infer_batch_size=1024)

Generate node embeddings and save to disk.

Parameters

data: GSgnnData

The GraphStorm dataset.

infer_ntypes: list of str

List of node types for which to compute embeddings, in the format [ntype1, ntype2, …].

save_embed_path: str

The path where the GNN embeddings will be saved.

eval_fanout: list of int

Neighbor sampling fanout of each GNN layer used in evaluation and inference.

use_mini_batch_infer: bool

Whether to use mini-batch for inference. Default: False.

node_id_mapping_file: str

Path to the file that stores the node ID mapping generated by the graph partition algorithm. If None, no node ID mapping is performed. Default: None.

save_embed_format: str

The data format of the saved embeddings. Currently only PyTorch tensors are supported. Default: “pytorch”.

infer_batch_size: int

The batch size used when computing node embeddings with mini-batch inference. Default: 1024.
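The call shape of infer() can be illustrated with a minimal sketch. To stay runnable without GraphStorm installed, it uses a hypothetical stand-in class, StubEmbGenInferer, that copies the documented signature and defaults and simply records the arguments it receives; it is not the library's implementation. The node type names, the output path, and the two-layer fanout are assumptions chosen for illustration. In real use, the inferer would be built as GSgnnEmbGenInferer(model) and given a GSgnnData object.

```python
from typing import Any, Dict, List, Optional


class StubEmbGenInferer:
    """Hypothetical stand-in mirroring the documented infer() signature.

    This is NOT the GraphStorm class. It only echoes back its arguments
    so the parameter names and defaults can be inspected.
    """

    def __init__(self, model: Any) -> None:
        self.model = model

    def infer(self, data: Any, infer_ntypes: List[str], save_embed_path: str,
              eval_fanout: List[int], use_mini_batch_infer: bool = False,
              node_id_mapping_file: Optional[str] = None,
              save_embed_format: str = "pytorch",
              infer_batch_size: int = 1024) -> Dict[str, Any]:
        # The documentation lists "pytorch" as the only supported format.
        if save_embed_format != "pytorch":
            raise ValueError("only the 'pytorch' format is documented")
        return {
            "infer_ntypes": infer_ntypes,
            "save_embed_path": save_embed_path,
            "eval_fanout": eval_fanout,
            "use_mini_batch_infer": use_mini_batch_infer,
            "node_id_mapping_file": node_id_mapping_file,
            "save_embed_format": save_embed_format,
            "infer_batch_size": infer_batch_size,
        }


# Placeholders: a real call needs a GSgnnModel and a GSgnnData object.
inferer = StubEmbGenInferer(model=None)
call = inferer.infer(
    data=None,
    infer_ntypes=["user", "item"],      # assumed node type names
    save_embed_path="/tmp/gs_embeds",   # assumed output location
    eval_fanout=[10, 10],               # one entry per layer, assuming 2 GNN layers
    use_mini_batch_infer=True,
    infer_batch_size=512,
)
print(call)
```

Arguments left out of the call (node_id_mapping_file and save_embed_format here) fall back to the defaults listed above, so no node ID mapping is applied and the embeddings are saved as PyTorch tensors.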