GSgnnLinkPredictionDataLoader

class graphstorm.dataloading.GSgnnLinkPredictionDataLoader(dataset, target_idx, fanout, batch_size, num_negative_edges, device='cpu', train_task=True, reverse_edge_types_map=None, exclude_training_targets=False, edge_mask_for_gnn_embeddings='train_mask', construct_feat_ntype=None, construct_feat_fanout=5, edge_dst_negative_field=None, num_hard_negs=None)

Bases: GSgnnLinkPredictionDataLoaderBase

Link prediction minibatch dataloader

GSgnnLinkPredictionDataLoader wraps a GraphStorm edge dataset in an iterable over mini-batches of samples. In each mini-batch, pos_graph and neg_graph are the sampled subgraphs for the positive and negative edges; they are used by GraphStorm Trainers and Inferrers. Given a positive edge, a negative edge is composed of the source node and a negative destination node drawn at random from a uniform distribution.
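The uniform negative construction described above can be illustrated with a minimal pure-Python sketch. This is not GraphStorm's implementation: the helper name uniform_negative_edges, its parameters, and the tuple-based edge representation are hypothetical and chosen only for illustration.

```python
import random


def uniform_negative_edges(pos_edges, num_dst_nodes, num_negative_edges, seed=None):
    """Hypothetical helper: build negative edges for each positive edge.

    Each negative edge keeps the source node of the positive edge and pairs it
    with a destination node ID drawn uniformly from [0, num_dst_nodes).
    """
    rng = random.Random(seed)
    neg_edges = []
    for src, _dst in pos_edges:
        for _ in range(num_negative_edges):
            # A sampled destination may coincide with a true neighbor;
            # this sketch does not filter such false negatives.
            neg_edges.append((src, rng.randrange(num_dst_nodes)))
    return neg_edges


# Two positive edges, two negatives per positive edge.
pos_edges = [(0, 3), (1, 4)]
neg_edges = uniform_negative_edges(pos_edges, num_dst_nodes=5,
                                   num_negative_edges=2, seed=0)
```

In this sketch the number of negative edges is len(pos_edges) * num_negative_edges, which mirrors the role of the num_negative_edges argument.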

Arguments

dataset: GSgnnEdgeData: The GraphStorm edge dataset
target_idx: dict of Tensors: The target edges for prediction
fanout: list of int or dict of list: Neighbor sample fanout. If it’s a dict, it indicates the fanout for each edge type.
batch_size: int: Batch size
num_negative_edges: int: The number of negative edges per positive edge
device: torch.device: The device the trainer is running on.
train_task: bool: Whether the dataloader is used for training.
reverse_edge_types_map: dict: A map from edge types to their reverse edge types
exclude_training_targets: bool: Whether to exclude training edges during neighbor sampling
edge_mask_for_gnn_embeddings: str: The mask that indicates the edges used for computing GNN embeddings. By default, the dataloader uses the edges in the training graph to compute GNN embeddings, which avoids information leakage in link prediction.
construct_feat_ntype: list of str: The node types for which node features need to be constructed.
construct_feat_fanout: int: The fanout used when constructing node features.
edge_dst_negative_field: str or dict of str: The feature field(s) that store the hard negative edges for each edge type.
num_hard_negs: int or dict of int: The number of hard negatives per positive edge for each edge type

Examples

The following example trains a 2-layer GNN for link prediction on a set of positive edges target_idx, on a graph where each node takes messages from 15 neighbors on the first layer and 10 neighbors on the second. It uses 10 negative edges per positive edge.

from graphstorm.dataloading import GSgnnEdgeTrainData
from graphstorm.dataloading import GSgnnLinkPredictionDataLoader
from graphstorm.trainer import GSgnnLinkPredictionTrainer

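# Dataset arguments are elided here.
lp_data = GSgnnEdgeTrainData(...)
# target_idx holds the positive training edges; one fanout entry per GNN layer.
lp_dataloader = GSgnnLinkPredictionDataLoader(lp_data, target_idx, fanout=[15, 10],
                                              num_negative_edges=10, batch_size=128)
# Trainer arguments are elided here.
lp_trainer = GSgnnLinkPredictionTrainer(...)
lp_trainer.fit(lp_dataloader, num_epochs=10)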

__iter__(): Returns an iterator object

__next__()

Return a mini-batch for link prediction.

A link prediction mini-batch contains four objects:

* the input node IDs of the mini-batch,
* the target positive edges for prediction,
* the negative edges for prediction,
* the subgraph blocks for message passing.

Returns

Tensor or dict of Tensors: the input nodes of a mini-batch.
DGLGraph: positive edges.
DGLGraph: negative edges.
list of DGLGraph: subgraph blocks for message passing.
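The four returned objects can be unpacked directly in a loop. The sketch below is illustrative only: count_batches is a hypothetical helper, the list fake_loader is a stand-in for a real dataloader so that the snippet runs without GraphStorm, and the check that there is one block per GNN layer is an assumption, not something stated above.

```python
def count_batches(dataloader, num_layers):
    """Hypothetical helper: iterate over mini-batches and count them."""
    num_batches = 0
    for input_nodes, pos_graph, neg_graph, blocks in dataloader:
        # input_nodes: input node IDs of the mini-batch
        # pos_graph / neg_graph: positive and negative edges for prediction
        # blocks: subgraph blocks for message passing
        # Assumption: one block per GNN layer.
        if len(blocks) != num_layers:
            raise ValueError("expected one block per GNN layer")
        num_batches += 1
    return num_batches


# Stand-in data with the same 4-tuple layout; strings replace DGLGraph objects.
fake_loader = [
    ([0, 1, 2, 3], "pos_graph", "neg_graph", ["block_layer_1", "block_layer_2"]),
    ([4, 5, 6], "pos_graph", "neg_graph", ["block_layer_1", "block_layer_2"]),
]
num_batches = count_batches(fake_loader, num_layers=2)
```

With a real GSgnnLinkPredictionDataLoader, the same loop header would receive a Tensor or dict of Tensors, two DGLGraph objects, and a list of DGLGraph blocks, in that order.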