GSgnnEmbGenInferer

class graphstorm.inference.GSgnnEmbGenInferer(model)

Bases: GSInferrer

Inferrer for embedding generation tasks.

GSgnnEmbGenInferer defines the infer() method, which performs a single task:

Generate node embeddings and save to disk.

Parameters

model: GSgnnModel: The model should be of a class that inherits from GSgnnModel. It is suggested to inherit from GSgnnNodeModelBase, GSgnnEdgeModelBase, or GSgnnLinkPredictionModelBase for node, edge, and link prediction models, respectively. These bases define the necessary interfaces for each task type.

infer(data, infer_ntypes, save_embed_path, eval_fanout, use_mini_batch_infer=False, node_id_mapping_file=None, save_embed_format='pytorch', infer_batch_size=1024)

Generate node embeddings and save to disk.

Parameters

data: GSgnnData: The GraphStorm dataset.
infer_ntypes: list of str: List of node types for which to compute embeddings, in the format [ntype1, ntype2, …].
save_embed_path: str: The path where the GNN embeddings will be saved.
eval_fanout: list of int: Neighbor sampling fanout of each GNN layer used in evaluation and inference.
use_mini_batch_infer: bool: Whether to use mini-batch for inference. Default: False.
node_id_mapping_file: str: Path to the file storing the node ID mapping generated by the graph partition algorithm. If None, no node ID mapping is performed. Default: None.
save_embed_format: str: The data format of the saved embeddings. Currently only PyTorch tensors are supported. Default: “pytorch”.
infer_batch_size: int: The inference batch size when computing node embeddings with mini-batch inference. Default: 1024.
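
As an illustration of how the arguments above fit together, the following is a minimal sketch that collects them into a keyword dictionary using the defaults from the infer() signature. The helper name build_infer_kwargs, the validation checks, the node type names, the output path, and the fanout values are all assumptions made for this example and are not part of GraphStorm. The sketch does not import GraphStorm or run inference; it only shows the shape of a call.

```python
from typing import Any, Dict, List, Optional


def build_infer_kwargs(
    infer_ntypes: List[str],
    save_embed_path: str,
    eval_fanout: List[int],
    use_mini_batch_infer: bool = False,
    node_id_mapping_file: Optional[str] = None,
    save_embed_format: str = "pytorch",
    infer_batch_size: int = 1024,
) -> Dict[str, Any]:
    """Hypothetical helper: gather infer() keyword arguments in one place.

    The defaults mirror the documented infer() signature. The checks below
    are this example's own sanity checks, not GraphStorm behavior.
    """
    if not infer_ntypes:
        raise ValueError("infer_ntypes must list at least one node type")
    if save_embed_format != "pytorch":
        # The documentation states that only PyTorch tensors are supported.
        raise ValueError(f"unsupported save_embed_format: {save_embed_format!r}")
    if infer_batch_size <= 0:
        raise ValueError("infer_batch_size must be positive")
    return {
        "infer_ntypes": list(infer_ntypes),
        "save_embed_path": save_embed_path,
        "eval_fanout": list(eval_fanout),
        "use_mini_batch_infer": use_mini_batch_infer,
        "node_id_mapping_file": node_id_mapping_file,
        "save_embed_format": save_embed_format,
        "infer_batch_size": infer_batch_size,
    }


kwargs = build_infer_kwargs(
    infer_ntypes=["user", "item"],      # hypothetical node types
    save_embed_path="/tmp/embeddings",  # hypothetical output location
    eval_fanout=[10, 10],               # hypothetical: one entry per GNN layer
)

# With an inferer and a GSgnnData object already constructed (not shown here),
# the call would then look like:
#     inferer.infer(data, **kwargs)
```

Keeping the arguments in a dictionary makes it easy to log or reuse the same settings across runs; passing them directly as keyword arguments to infer() works equally well.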