GSgnnLinkPredictionDataLoader

class graphstorm.dataloading.GSgnnLinkPredictionDataLoader(dataset, target_idx, fanout, batch_size, num_negative_edges, device='cpu', train_task=True, reverse_edge_types_map=None, exclude_training_targets=False, edge_mask_for_gnn_embeddings='train_mask', construct_feat_ntype=None, construct_feat_fanout=5, edge_dst_negative_field=None, num_hard_negs=None)

Bases: GSgnnLinkPredictionDataLoaderBase

Link prediction minibatch dataloader

GSgnnLinkPredictionDataLoader samples a GraphStorm edge dataset into an iterable over mini-batches of samples. In each batch, pos_graph and neg_graph are the sampled subgraphs for positive and negative edges, which are used by GraphStorm Trainers and Inferrers. Given a positive edge, a negative edge is composed of its source node and a random negative destination node drawn from a uniform distribution.
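The uniform negative sampling described above can be illustrated with a minimal, library-free sketch. This is not GraphStorm's implementation: the helper name sample_uniform_negatives and its parameters are hypothetical, and it assumes destination node IDs run from 0 to num_dst_nodes - 1. It only shows how each positive edge's source node is paired with uniformly drawn destinations.

```python
import random


def sample_uniform_negatives(pos_edges, num_dst_nodes, num_negative_edges, seed=None):
    """Pair each positive edge's source with uniformly sampled destinations.

    Hypothetical helper for illustration only; not part of GraphStorm.
    """
    rng = random.Random(seed)
    neg_edges = []
    for src, _dst in pos_edges:
        for _ in range(num_negative_edges):
            # Uniform draw over all candidate destination node IDs.
            neg_edges.append((src, rng.randrange(num_dst_nodes)))
    return neg_edges


pos_edges = [(0, 3), (1, 4)]
neg_edges = sample_uniform_negatives(
    pos_edges, num_dst_nodes=10, num_negative_edges=2, seed=0
)
# neg_edges holds num_negative_edges (src, neg_dst) pairs per positive edge.
```

A uniform draw can coincide with the true destination of a positive edge; this sketch does not filter such cases.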

Arguments

dataset: GSgnnEdgeData

The GraphStorm edge dataset

target_idx: dict of Tensors

The target edges for prediction

fanout: list of int or dict of list

Neighbor sample fanout. If it’s a dict, it indicates the fanout for each edge type.

batch_size: int

Batch size

num_negative_edges: int

The number of negative edges per positive edge

device: torch.device

The device the trainer is running on.

train_task: bool

Whether the dataloader is used for training.

reverse_edge_types_map: dict

A map from edge types to their reverse edge types.

exclude_training_targets: bool

Whether to exclude training edges during neighbor sampling

edge_mask_for_gnn_embeddings: str

The mask that indicates the edges used for computing GNN embeddings. By default, the dataloader uses the edges in the training graph to compute GNN embeddings, which avoids information leakage in link prediction.

construct_feat_ntype: list of str

The node types whose node features need to be constructed.

construct_feat_fanout: int

The fanout required to construct node features.

edge_dst_negative_field: str or dict of str

The feature field(s) that store the hard negative edges for each edge type.

num_hard_negs: int or dict of int

The number of hard negatives per positive edge for each edge type

Examples

The following example trains a 2-layer GNN for link prediction on a set of positive edges target_idx, on a graph where each node takes messages from 15 neighbors in the first layer and 10 neighbors in the second. It uses 10 negative edges per positive edge.

from graphstorm.dataloading import GSgnnEdgeTrainData
from graphstorm.dataloading import GSgnnLinkPredictionDataLoader
from graphstorm.trainer import GSgnnLinkPredictionTrainer

lp_data = GSgnnEdgeTrainData(...)
lp_dataloader = GSgnnLinkPredictionDataLoader(lp_data, target_idx, fanout=[15, 10],
                                            num_negative_edges=10, batch_size=128)
lp_trainer = GSgnnLinkPredictionTrainer(...)
lp_trainer.fit(lp_dataloader, num_epochs=10)

__iter__()

Returns an iterator object

__next__()

Return a mini-batch for link prediction.

A link prediction mini-batch contains four objects:

* the input node IDs of the mini-batch,
* the target positive edges for prediction,
* the negative edges for prediction,
* the subgraph blocks for message passing.

Returns

Tensor or dict of Tensors : the input nodes of a mini-batch.
DGLGraph : positive edges.
DGLGraph : negative edges.
list of DGLGraph : subgraph blocks for message passing.
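The order of the four returned objects matters when a loop unpacks a mini-batch. The sketch below shows only that unpacking pattern. The generator fake_lp_batches is a hypothetical stand-in that yields placeholder values (lists and strings rather than Tensors and DGLGraphs) so that the snippet runs without GraphStorm installed. It is not the real dataloader.

```python
def fake_lp_batches(num_batches):
    """Hypothetical stand-in yielding tuples in the documented order."""
    for i in range(num_batches):
        input_nodes = [i, i + 1, i + 2]         # placeholder for the input node IDs
        pos_graph = "pos_graph_%d" % i          # placeholder for the positive-edge graph
        neg_graph = "neg_graph_%d" % i          # placeholder for the negative-edge graph
        blocks = ["block_layer0", "block_layer1"]  # one placeholder block per GNN layer
        yield input_nodes, pos_graph, neg_graph, blocks


summary = []
# Unpack in the documented order: input nodes, positive edges,
# negative edges, then the message-passing blocks.
for input_nodes, pos_graph, neg_graph, blocks in fake_lp_batches(2):
    summary.append((len(input_nodes), pos_graph, neg_graph, len(blocks)))
```

With the real dataloader, the same four-way unpacking would apply to each item yielded by iterating over the GSgnnLinkPredictionDataLoader instance.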