GSgnnLinkPredictionDataLoader
- class graphstorm.dataloading.GSgnnLinkPredictionDataLoader(dataset, target_idx, fanout, batch_size, num_negative_edges, device='cpu', train_task=True, reverse_edge_types_map=None, exclude_training_targets=False, edge_mask_for_gnn_embeddings='train_mask', construct_feat_ntype=None, construct_feat_fanout=5, edge_dst_negative_field=None, num_hard_negs=None)
Bases: GSgnnLinkPredictionDataLoaderBase
Link prediction minibatch dataloader
GSgnnLinkPredictionDataLoader samples a GraphStorm edge dataset into an iterable over mini-batches of samples. In each batch, pos_graph and neg_graph are the sampled subgraphs for the positive and negative edges, which are used by GraphStorm Trainers and Inferrers. Given a positive edge, a negative edge is composed of its source node and a random negative destination node drawn from a uniform distribution.
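The uniform negative sampling described above can be illustrated with a minimal sketch in plain Python. This is not GraphStorm's implementation: the function `uniform_negative_edges`, the edge-as-tuple representation, and the `num_nodes` parameter are assumptions made for illustration only.

```python
import random

def uniform_negative_edges(pos_edges, num_nodes, num_negative_edges, rng):
    """Hypothetical helper: pair each positive source with random destinations."""
    neg_edges = []
    for src, _dst in pos_edges:
        for _ in range(num_negative_edges):
            # Draw the negative destination uniformly over all node IDs.
            # Note: this simple sketch may occasionally pick a true neighbor.
            neg_dst = rng.randrange(num_nodes)
            neg_edges.append((src, neg_dst))
    return neg_edges

rng = random.Random(0)
pos_edges = [(0, 1), (2, 3)]
neg_edges = uniform_negative_edges(pos_edges, num_nodes=100,
                                   num_negative_edges=10, rng=rng)
```

Each positive edge keeps its source node and receives `num_negative_edges` randomly drawn destinations, so two positive edges with 10 negatives each yield 20 negative edges.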
Arguments
- dataset: GSgnnEdgeData
The GraphStorm edge dataset
- target_idx: dict of Tensors
The target edges for prediction
- fanout: list of int or dict of list
Neighbor sample fanout. If it’s a dict, it indicates the fanout for each edge type.
- batch_size: int
Batch size
- num_negative_edges: int
The number of negative edges per positive edge
- device: torch.device
The device the trainer is running on.
- train_task: bool
Whether the dataloader is used for training.
- reverse_edge_types_map: dict
A map from each edge type to its reverse edge type.
- exclude_training_targets: bool
Whether to exclude training edges during neighbor sampling
- edge_mask_for_gnn_embeddings: str
The mask that indicates the edges used for computing GNN embeddings. By default, the dataloader uses the edges in the training graphs to compute GNN embeddings, which avoids information leakage in link prediction.
- construct_feat_ntype: list of str
The node types for which node features must be constructed.
- construct_feat_fanout: int
The fanout required to construct node features.
- edge_dst_negative_field: str or dict of str
The feature field(s) that store the hard negative edges for each edge type.
- num_hard_negs: int or dict of int
The number of hard negatives per positive edge for each edge type
Examples
To train a 2-layer GNN for link prediction on a set of positive edges target_idx on a graph where each node takes messages from 15 neighbors on the first layer and 10 neighbors on the second. We use 10 negative edges per positive edge in this example.

    from graphstorm.dataloading import GSgnnEdgeTrainData
    from graphstorm.dataloading import GSgnnLinkPredictionDataLoader
    from graphstorm.trainer import GSgnnLinkPredictionTrainer

    lp_data = GSgnnEdgeTrainData(...)
    lp_dataloader = GSgnnLinkPredictionDataLoader(lp_data, target_idx, fanout=[15, 10],
                                                  num_negative_edges=10, batch_size=128)
    lp_trainer = GSgnnLinkPredictionTrainer(...)
    lp_trainer.fit(lp_dataloader, num_epochs=10)
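To make the meaning of a per-layer fanout such as [15, 10] concrete, here is a toy sketch of layer-wise neighbor sampling in plain Python. It is an illustration under stated assumptions, not GraphStorm or DGL code: the function `sample_blocks`, the adjacency-dict graph, and the convention of walking from the output layer back to the input layer are all choices made for this example.

```python
import random

def sample_blocks(adj, seeds, fanouts, rng):
    """Hypothetical sampler. adj maps a node to the list of its in-neighbors."""
    blocks = []
    frontier = list(seeds)
    # Walk from the output layer back to the input layer, so the last
    # fanout entry applies to the layer closest to the seed nodes.
    for fanout in reversed(fanouts):
        edges = []
        for dst in frontier:
            neighbors = adj.get(dst, [])
            # Keep at most `fanout` neighbors per destination node.
            picked = rng.sample(neighbors, min(fanout, len(neighbors)))
            edges.extend((src, dst) for src in picked)
        blocks.insert(0, edges)
        # The earlier layer must cover the current nodes and their sampled sources.
        frontier = sorted(set(frontier) | {src for src, _ in edges})
    return frontier, blocks

# Toy graph: 30 nodes, every node is connected to every other node.
adj = {n: [m for m in range(30) if m != n] for n in range(30)}
input_nodes, blocks = sample_blocks(adj, seeds=[0, 1], fanouts=[15, 10],
                                    rng=random.Random(0))
```

Here `blocks[0]` holds the first-layer edges (up to 15 sampled neighbors per node) and `blocks[1]` the second-layer edges (up to 10 per seed), while `input_nodes` lists every node whose features would be needed.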
- __iter__()
Returns an iterator object
- __next__()
Return a mini-batch for link prediction.
A mini-batch of link prediction contains four objects:
- the input node IDs of the mini-batch,
- the target positive edges for prediction,
- the negative edges for prediction,
- the subgraph blocks for message passing.
Returns
- Tensor or dict of Tensors: the input nodes of a mini-batch.
- DGLGraph: positive edges.
- DGLGraph: negative edges.
- list of DGLGraph: subgraph blocks for message passing.
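The four returned objects are typically unpacked directly in a loop. The sketch below uses a toy stand-in class, `ToyLinkPredictionLoader`, which is hypothetical and mimics only the iterator protocol and the four-object layout; with the real dataloader the items would be Tensors and DGLGraph objects rather than placeholder strings.

```python
class ToyLinkPredictionLoader:
    """Hypothetical stand-in that mimics only the 4-tuple mini-batch layout."""

    def __init__(self, batches):
        self._batches = batches
        self._pos = 0

    def __iter__(self):
        # Restart from the first mini-batch on every new pass.
        self._pos = 0
        return self

    def __next__(self):
        if self._pos >= len(self._batches):
            raise StopIteration
        batch = self._batches[self._pos]
        self._pos += 1
        return batch

# Placeholder strings stand in for tensors and graphs.
toy_loader = ToyLinkPredictionLoader([
    ("input_nodes_0", "pos_graph_0", "neg_graph_0", ["block_0a", "block_0b"]),
    ("input_nodes_1", "pos_graph_1", "neg_graph_1", ["block_1a", "block_1b"]),
])

seen = []
for input_nodes, pos_graph, neg_graph, blocks in toy_loader:
    # One block per GNN layer; two layers in this toy setup.
    seen.append((input_nodes, len(blocks)))
```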