GSgnnData

class graphstorm.dataloading.GSgnnData(part_config, node_feat_field=None, edge_feat_field=None, lm_feat_ntypes=None, lm_feat_etypes=None)

Bases: object

The GraphStorm data class.

Parameters

part_config: str: The path of the partition configuration JSON file.
node_feat_field: str or dict of list of str: The fields of the node features that will be encoded by GSNodeInputLayer. It’s a dict if different node types have different feature names. Default: None.
edge_feat_field: str or dict of list of str: The fields of the edge features. It’s a dict if different edge types have different feature names. This argument is reserved for future use, once GSEdgeInputLayer is implemented. Default: None.
lm_feat_ntypes: list of str: The node types that contain text features. Default: None.
lm_feat_etypes: list of tuples: The edge types that contain text features. Default: None.
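The two accepted shapes of node_feat_field can be illustrated with plain Python values. This is a hypothetical sketch: the node types ("user", "movie"), the feature names, and the helper fields_for_ntype are illustrative assumptions, not part of GraphStorm's API. It only shows the documented idea that a single string applies to every node type, while a dict assigns each node type its own list of field names.

```python
from typing import Dict, List, Optional, Union

# Hypothetical argument values for GSgnnData (all names are illustrative).
single_field = "feat"                     # one field name shared by all node types
per_type_fields = {
    "user": ["age", "occupation"],        # two feature fields for "user"
    "movie": ["title_emb"],               # one feature field for "movie"
}
lm_feat_ntypes = ["movie"]                        # node types with text features
lm_feat_etypes = [("user", "review", "movie")]    # canonical edge types with text

FeatField = Optional[Union[str, Dict[str, List[str]]]]


def fields_for_ntype(node_feat_field: FeatField, ntype: str) -> List[str]:
    """Hypothetical helper: return the feature fields that apply to ntype."""
    if node_feat_field is None:
        return []                          # no features configured
    if isinstance(node_feat_field, str):
        return [node_feat_field]           # a string applies to every node type
    return list(node_feat_field.get(ntype, []))  # per-type lookup in the dict


print(fields_for_ntype(single_field, "movie"))     # ['feat']
print(fields_for_ntype(per_type_fields, "user"))   # ['age', 'occupation']
print(fields_for_ntype(per_type_fields, "genre"))  # []
```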

property g: The distributed graph loaded using information in the given part_config JSON file.

property graph_name: The distributed graph’s name extracted from the given part_config JSON file.

property node_feat_field: The fields of node features given in initialization.

property edge_feat_field: The fields of edge features given in initialization.

has_node_feats(ntype)

Test if the specified node type has features.

Parameters

ntype: str: The node type.

Returns

bool : Whether the node type has features.
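A minimal sketch of this check, assuming it depends only on the node_feat_field passed at construction: a string means every node type has features, and a dict means only the listed node types do. The function has_feats is hypothetical; it mirrors the documented semantics rather than reproducing GraphStorm's source.

```python
from typing import Dict, List, Optional, Union


def has_feats(feat_field: Optional[Union[str, Dict[str, List[str]]]],
              ntype: str) -> bool:
    """Hypothetical stand-in for GSgnnData.has_node_feats."""
    if feat_field is None:
        return False              # no features were configured at all
    if isinstance(feat_field, str):
        return True               # one field name shared by every node type
    return ntype in feat_field    # per-type dict: only listed types have features


print(has_feats("feat", "user"))               # True
print(has_feats({"movie": ["emb"]}, "user"))   # False
```

Under the same assumption, has_edge_feats would perform the same lookup with a canonical edge type tuple, such as ("user", "rates", "movie"), as the dict key.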

has_edge_feats(etype)

Test if the specified edge type has features.

Parameters

etype: (str, str, str): The canonical edge type.

Returns

bool : Whether the edge type has features.

has_node_lm_feats(ntype)

Test if the specified node type has text features.

Parameters

ntype: str: The node type.

Returns

bool : Whether the node type has text features.

has_edge_lm_feats(etype)

Test if the specified edge type has text features.

Parameters

etype: (str, str, str): The edge type.

Returns

bool : Whether the edge type has text features.
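The text-feature checks can be sketched as list membership tests against lm_feat_ntypes and lm_feat_etypes, assuming the default None means that no type carries text features. Both helper names below are hypothetical.

```python
from typing import List, Optional, Tuple

CanonicalEtype = Tuple[str, str, str]


def has_lm_feats_ntype(lm_feat_ntypes: Optional[List[str]], ntype: str) -> bool:
    """Hypothetical stand-in for has_node_lm_feats."""
    # None (the default) means no node type has text features.
    return lm_feat_ntypes is not None and ntype in lm_feat_ntypes


def has_lm_feats_etype(lm_feat_etypes: Optional[List[CanonicalEtype]],
                       etype: CanonicalEtype) -> bool:
    """Hypothetical stand-in for has_edge_lm_feats."""
    # Edge types are compared as full (src type, relation, dst type) tuples.
    return lm_feat_etypes is not None and etype in lm_feat_etypes


print(has_lm_feats_ntype(["movie"], "movie"))                            # True
print(has_lm_feats_etype(None, ("user", "review", "movie")))             # False
```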

get_node_feats(input_nodes, nfeat_fields, device='cpu')

Get the node features of the given input nodes. The feature fields are defined in nfeat_fields.

Changed in version 0.5.0: When nfeat_fields is a dict, its value(s) can be a list of str or a list of FeatureGroup. The return value can be a dict of int or FeatureGroupSize, respectively.

Parameters

input_nodes: Tensor or dict of Tensors: The input node IDs.
nfeat_fields: str or dict of [str …] or dict of [FeatureGroup …]: The node feature fields to be extracted. A string represents the feature name. A dictionary indicates that each node type has different node feature names. When the value of a key (node type) is a list of strings, it indicates that the node type has only one group of features. When the value is a list of FeatureGroup, it indicates that the node type has more than one group of features.
device: PyTorch device: The device where the returned node features are stored.

Returns

dict of Tensors : The returned node features.
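The gathering step can be sketched with plain lists in place of tensors. This assumes a hypothetical in-memory store (FEATURES) and assumes that, when a node type lists several fields, their values are concatenated along the feature dimension. The device argument and FeatureGroup inputs are not modeled; gather_node_feats is an illustrative name, not GraphStorm's implementation.

```python
from typing import Dict, List

# Hypothetical feature store: ntype -> field -> one row per node ID.
FEATURES: Dict[str, Dict[str, List[List[float]]]] = {
    "user": {
        "age": [[0.1], [0.2], [0.3]],
        "occupation": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    },
}


def gather_node_feats(input_nodes: Dict[str, List[int]],
                      nfeat_fields: Dict[str, List[str]]
                      ) -> Dict[str, List[List[float]]]:
    """Collect the requested fields for each input node, in input order."""
    out = {}
    for ntype, nids in input_nodes.items():
        fields = nfeat_fields.get(ntype)
        if not fields:
            continue  # no requested features for this node type
        rows = []
        for nid in nids:
            row: List[float] = []
            for field in fields:
                row.extend(FEATURES[ntype][field][nid])  # concatenate fields
            rows.append(row)
        out[ntype] = rows
    return out


feats = gather_node_feats({"user": [2, 0]}, {"user": ["age", "occupation"]})
print(feats)  # {'user': [[0.3, 1.0, 1.0], [0.1, 1.0, 0.0]]}
```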

get_edge_feats(input_edges, efeat_fields, device='cpu')

Get the edge features of the given input edges. The feature fields are defined in efeat_fields.

Parameters

input_edges: Tensor or dict of Tensors: The input edge IDs.
efeat_fields: str or dict of [str …]: The edge feature fields to be extracted.
device: PyTorch device: The device where the returned edge features are stored.

Returns

dict of Tensors : The returned edge features.

get_blocks_edge_feats(input_blocks, efeat_fields, device='cpu')

Get the edge features of the given input blocks. The feature fields are defined in efeat_fields.

New in version 0.4.0: Added get_blocks_edge_feats to support edge features in message passing.

Parameters

input_blocks: list of DGLBlock: The input blocks with edge features to be extracted.
efeat_fields: str or dict of list of str: The edge feature fields to be extracted.
device: PyTorch device: The device where the returned edge features are stored.

Returns

block_edge_input_feats: list of dict of Tensors: The returned edge features for all blocks.
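The shape of the return value, one dict of edge features per block in block order, can be sketched with plain Python structures. Real DGL blocks expose their sampled edge IDs through the block's own edge data; here each hypothetical block is simply a dict from canonical edge type to edge IDs, and EDGE_FEATS and gather_block_edge_feats are illustrative assumptions.

```python
from typing import Dict, List, Tuple

CanonicalEtype = Tuple[str, str, str]
RATES: CanonicalEtype = ("user", "rates", "movie")

# Hypothetical edge feature store: etype -> field -> one row per edge ID.
EDGE_FEATS = {RATES: {"weight": [[0.5], [0.9], [0.2]]}}

# Hypothetical blocks: one per GNN layer, each listing its sampled edge IDs.
blocks: List[Dict[CanonicalEtype, List[int]]] = [
    {RATES: [0, 2]},  # first layer
    {RATES: [1]},     # second layer
]


def gather_block_edge_feats(input_blocks, efeat_fields):
    """Return one dict of edge features per block, in block order."""
    result = []
    for block in input_blocks:
        per_block = {}
        for etype, eids in block.items():
            fields = efeat_fields.get(etype, [])
            if fields:
                # Concatenate all requested fields for each sampled edge.
                per_block[etype] = [
                    [v for f in fields for v in EDGE_FEATS[etype][f][eid]]
                    for eid in eids
                ]
        result.append(per_block)
    return result


print(gather_block_edge_feats(blocks, {RATES: ["weight"]}))
# [{('user', 'rates', 'movie'): [[0.5], [0.2]]}, {('user', 'rates', 'movie'): [[0.9]]}]
```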

get_unlabeled_node_set(train_idxs, mask='train_mask')

Get node indexes not having the given mask in the training set.

Parameters

train_idxs: dict of Tensor: The training set.
mask: str or list of str: The node feature fields storing the training mask. Default: “train_mask”.

Returns

dict of Tensors : The returned node indexes.
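One plausible reading of "unlabeled" is every node of a training node type that is not in the training set. The sketch below implements that reading only; it is an assumption, not a statement of how GraphStorm computes the set, and unlabeled_node_set and the num_nodes argument are hypothetical.

```python
from typing import Dict, List


def unlabeled_node_set(train_idxs: Dict[str, List[int]],
                       num_nodes: Dict[str, int]) -> Dict[str, List[int]]:
    """For each node type in the training set, return the IDs not in it."""
    out = {}
    for ntype, idxs in train_idxs.items():
        in_train = set(idxs)
        # Complement of the training indexes within this node type's ID range.
        out[ntype] = [nid for nid in range(num_nodes[ntype]) if nid not in in_train]
    return out


print(unlabeled_node_set({"user": [0, 3]}, {"user": 5}))  # {'user': [1, 2, 4]}
```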

get_node_train_set(ntypes, mask='train_mask')

Get the training set for the given node types under the given mask.

Parameters

ntypes: str or list of str: Node types to get the training set.
mask: str or list of str: The node feature fields storing the training mask. Default: “train_mask”.

Returns

dict of Tensors : The returned training node indexes.
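Turning a stored boolean mask into index sets can be sketched as collecting the positions of nonzero entries per node type. NODE_DATA stands in for g.nodes[ntype].data and node_set is a hypothetical helper. The real method works on a distributed graph and may also split the indexes among trainers; this sketch ignores partitioning and simply skips node types that lack the mask.

```python
from typing import Dict, List, Union

# Hypothetical per-type masks, standing in for g.nodes[ntype].data.
NODE_DATA = {
    "user": {"train_mask": [1, 0, 1, 0], "val_mask": [0, 1, 0, 0]},
    "movie": {"train_mask": [0, 1, 1]},
}


def node_set(ntypes: Union[str, List[str]], mask: str) -> Dict[str, List[int]]:
    """Return the IDs of nodes whose mask entry is nonzero, per node type."""
    if isinstance(ntypes, str):
        ntypes = [ntypes]
    out = {}
    for ntype in ntypes:
        values = NODE_DATA[ntype].get(mask)
        if values is None:
            continue  # this node type has no such mask
        out[ntype] = [nid for nid, flag in enumerate(values) if flag]
    return out


print(node_set(["user", "movie"], "train_mask"))  # {'user': [0, 2], 'movie': [1, 2]}
print(node_set("user", "val_mask"))               # {'user': [1]}
```

The same pattern applies to get_node_val_set and get_node_test_set, with val_mask and test_mask as the default mask names.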

get_node_val_set(ntypes, mask='val_mask')

Get the validation set for the given node types under the given mask.

Parameters

ntypes: str or list of str: Node types to get the validation set.
mask: str or list of str: The node feature fields storing the validation mask. Default: “val_mask”.

Returns

dict of Tensors : The returned validation node indexes.

get_node_test_set(ntypes, mask='test_mask')

Get the test set for the given node types under the given mask.

Parameters

ntypes: str or list of str: Node types to get the test set.
mask: str or list of str: The node feature fields storing the test mask. Default: “test_mask”.

Returns

dict of Tensors : The returned test node indexes.

get_node_infer_set(ntypes, mask='test_mask')

Get inference node set for the given node types under the given mask.

If the mask exists in g.nodes[ntype].data, include only nodes in the mask during inference. If such a mask does not exist, run inference on the entire node set.

Parameters

ntypes: str or list of str: Node types to get the inference set.
mask: str or list of str: The node feature fields storing the inference mask. Default: “test_mask”.

Returns

dict[str, Tensor] : Mapping from node type to indices of nodes to run inference on.
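The documented fallback, where a node type without the mask contributes all of its nodes, can be sketched as follows. NUM_NODES and NODE_DATA are hypothetical stand-ins for the graph's node counts and g.nodes[ntype].data, and node_infer_set is an illustrative name.

```python
from typing import Dict, List, Union

# Hypothetical graph: node counts and stored masks per node type.
NUM_NODES = {"user": 4, "movie": 3}
NODE_DATA = {"user": {"test_mask": [0, 1, 1, 0]}, "movie": {}}


def node_infer_set(ntypes: Union[str, List[str]],
                   mask: str = "test_mask") -> Dict[str, List[int]]:
    """Masked nodes if the mask exists, otherwise every node of the type."""
    if isinstance(ntypes, str):
        ntypes = [ntypes]
    out = {}
    for ntype in ntypes:
        values = NODE_DATA[ntype].get(mask)
        if values is None:
            # No mask stored for this type: run inference on all its nodes.
            out[ntype] = list(range(NUM_NODES[ntype]))
        else:
            out[ntype] = [nid for nid, flag in enumerate(values) if flag]
    return out


print(node_infer_set(["user", "movie"]))  # {'user': [1, 2], 'movie': [0, 1, 2]}
```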

get_edge_train_set(etypes=None, mask='train_mask', reverse_edge_types_map=None)

Get the training set for the given edge types under the given mask.

Parameters

etypes: list of str: List of edge types to get the training set. If set to None, all the edge types are included. Default: None.
mask: str or list of str: The edge feature fields storing the training mask. Default: “train_mask”.
reverse_edge_types_map: dict of tuples: A map for reverse edge types in the format of {(edge type):(reversed edge type)}. Default: None.

Returns

dict of Tensors : The returned training edge indexes.
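The etypes=None default and the {(edge type):(reversed edge type)} map format can be illustrated with canonical edge type tuples. This sketch assumes edge types are given as (src type, relation, dst type) tuples; EDGE_DATA stands in for g.edges[etype].data, and edge_set is a hypothetical helper. The reverse map is shown only to illustrate its format; how GraphStorm uses it internally is not modeled here.

```python
from typing import Dict, List, Optional, Tuple

CanonicalEtype = Tuple[str, str, str]
RATES: CanonicalEtype = ("user", "rates", "movie")
RATED_BY: CanonicalEtype = ("movie", "rated-by", "user")

# Hypothetical per-edge-type masks, standing in for g.edges[etype].data.
EDGE_DATA: Dict[CanonicalEtype, Dict[str, List[int]]] = {
    RATES: {"train_mask": [1, 1, 0, 1]},
    RATED_BY: {"train_mask": [0, 0, 0, 0]},
}

# Format of reverse_edge_types_map: {(edge type): (reversed edge type)}.
reverse_edge_types_map = {RATES: RATED_BY}


def edge_set(etypes: Optional[List[CanonicalEtype]],
             mask: str) -> Dict[CanonicalEtype, List[int]]:
    """Return masked edge IDs per edge type; None selects every edge type."""
    if etypes is None:
        etypes = list(EDGE_DATA)
    out = {}
    for etype in etypes:
        values = EDGE_DATA[etype].get(mask)
        if values is not None:
            out[etype] = [eid for eid, flag in enumerate(values) if flag]
    return out


print(edge_set(None, "train_mask"))
# {('user', 'rates', 'movie'): [0, 1, 3], ('movie', 'rated-by', 'user'): []}
```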

get_edge_val_set(etypes=None, mask='val_mask', reverse_edge_types_map=None)

Get the validation set for the given edge types under the given mask.

Parameters

etypes: list of str: List of edge types to get the validation set. If set to None, all the edge types are included. Default: None.
mask: str or list of str: The edge feature fields storing the validation mask. Default: “val_mask”.
reverse_edge_types_map: dict: A map for reverse edge types in the format of {(edge type):(reversed edge type)}. Default: None.

Returns

dict of Tensors : The returned validation edge indexes.

get_edge_test_set(etypes=None, mask='test_mask', reverse_edge_types_map=None)

Get the test set for the given edge types under the given mask.

Parameters

etypes: list of str: List of edge types to get the test set. If set to None, all the edge types are included. Default: None.
mask: str or list of str: The edge feature fields storing the test mask. Default: “test_mask”.
reverse_edge_types_map: dict: A map for reverse edge types in the format of {(edge type):(reversed edge type)}. Default: None.

Returns

dict of Tensors : The returned test edge indexes.

get_edge_infer_set(etypes=None, mask='test_mask', reverse_edge_types_map=None)

Get the inference set for the given edge types under the given mask.

If the mask exists in g.edges[etype].data, the inference set is collected based on the mask. If it does not exist, the entire edge set is treated as the inference set.

Parameters

etypes: list of str: List of edge types to get the inference set. If set to None, all the edge types are included. Default: None.
mask: str or list of str: The edge feature field storing the inference mask. Default: “test_mask”.
reverse_edge_types_map: dict: A map for reverse edge types in the format of {(edge type):(reversed edge type)}. Default: None.

Returns

dict of Tensors : The returned inference edge indexes.