GSgnnData

class graphstorm.dataloading.GSgnnData(part_config, node_feat_field=None, edge_feat_field=None, lm_feat_ntypes=None, lm_feat_etypes=None)

Bases: object

The GraphStorm data class.

Parameters

part_config: str

The path of the partition configuration JSON file.

node_feat_field: str or dict of list of str

The fields of the node features that will be encoded by GSNodeInputLayer. It’s a dict if different node types have different feature names. Default: None.

edge_feat_field: str or dict of list of str

The fields of the edge features. It’s a dict if different edge types have different feature names. This argument is reserved for future use, when GSEdgeInputLayer is implemented. Default: None.

lm_feat_ntypes: list of str

The node types that contain text features. Default: None.

lm_feat_etypes: list of tuples

The edge types that contain text features. Default: None.
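As a minimal sketch of the argument shapes described above, the snippet below assembles keyword arguments for the constructor. The node types, edge type, feature names, and file path are hypothetical, and treating lm_feat_etypes entries as (source, relation, destination) tuples is an assumption. The GraphStorm import is deferred into load_data so the argument-building part runs without GraphStorm installed; any environment setup GraphStorm needs before loading a graph is omitted.

```python
def build_gsgnn_data_kwargs(part_config):
    """Assemble constructor arguments in the shapes the parameter list describes."""
    return {
        "part_config": part_config,
        # Dict form: each node type lists its own feature names (hypothetical names).
        "node_feat_field": {"paper": ["feat"], "author": ["feat", "affil_emb"]},
        # Left at its default, since the argument is described as reserved.
        "edge_feat_field": None,
        # Node types that carry text features.
        "lm_feat_ntypes": ["paper"],
        # Edge types that carry text features, written here as 3-tuples (assumption).
        "lm_feat_etypes": [("author", "writes", "paper")],
    }


def load_data(part_config):
    """Construct GSgnnData; requires GraphStorm and a real partition config."""
    from graphstorm.dataloading import GSgnnData  # deferred import

    return GSgnnData(**build_gsgnn_data_kwargs(part_config))


kwargs = build_gsgnn_data_kwargs("/path/to/partition_config.json")
```

If every node type shares one feature name, the parameter description suggests a plain string for node_feat_field is enough.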

property g

The distributed graph loaded using information in the given part_config JSON file.

property graph_name

The distributed graph’s name extracted from the given part_config JSON file.

property node_feat_field

The fields of node features given in initialization.

property edge_feat_field

The fields of edge features given in initialization.

has_node_feats(ntype)

Test if the specified node type has features.

Parameters

ntype: str

The node type.

Returns

bool : Whether the node type has features.

has_edge_feats(etype)

Test if the specified edge type has features.

Parameters

etype: (str, str, str)

The canonical edge type.

Returns

bool : Whether the edge type has features.

has_node_lm_feats(ntype)

Test if the specified node type has text features.

Parameters

ntype: str

The node type.

Returns

bool : Whether the node type has text features.

has_edge_lm_feats(etype)

Test if the specified edge type has text features.

Parameters

etype: (str, str, str)

The edge type.

Returns

bool : Whether the edge type has text features.

get_node_feats(input_nodes, nfeat_fields, device='cpu')

Get the node features of the given input nodes. The feature fields are defined in nfeat_fields.

Changed in version 0.5.0: When nfeat_fields is a dict, its value(s) can be a list of str or a list of FeatureGroup. The return value can be a dict of int or FeatureGroupSize, respectively.

Parameters

input_nodes: Tensor or dict of Tensors

The input node IDs.

nfeat_fields: str or dict of [str …] or dict of [FeatureGroup …]

The node feature fields to be extracted. A string represents the feature name. A dictionary indicates that each node type has different node feature names. When the value of a key (node type) is a list of strings, it indicates that the node type has only one group of features. When the value is a list of FeatureGroup, it indicates that the node type has more than one group of features.

device: PyTorch device

The device where the returned node features are stored.

Returns

dict of Tensors : The returned node features.
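The two simplest shapes of nfeat_fields can be illustrated with a small sketch. The helper expand_feat_fields is hypothetical and is not part of GraphStorm; it assumes that a single string means the same feature name for every node type, which is one reading of the description above. The FeatureGroup form is left out.

```python
def expand_feat_fields(nfeat_fields, ntypes):
    """Return a {node type: [feature names]} dict for either accepted shape.

    Hypothetical helper for illustration only.
    """
    if isinstance(nfeat_fields, str):
        # One name shared by every node type (assumed interpretation).
        return {ntype: [nfeat_fields] for ntype in ntypes}
    # Dict form: each node type already lists its own feature names.
    return {ntype: list(names) for ntype, names in nfeat_fields.items()}


shared = expand_feat_fields("feat", ["paper", "author"])
per_type = expand_feat_fields({"paper": ["feat", "year"]}, ["paper", "author"])
```

In the dict form, node types that are absent from the dict simply get no entry in this sketch.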

get_edge_feats(input_edges, efeat_fields, device='cpu')

Get the edge features of the given input edges. The feature fields are defined in efeat_fields.

Parameters

input_edges: Tensor or dict of Tensors

The input edge IDs.

efeat_fields: str or dict of [str …]

The edge feature fields to be extracted.

device: PyTorch device

The device where the returned edge features are stored.

Returns

dict of Tensors : The returned edge features.

get_blocks_edge_feats(input_blocks, efeat_fields, device='cpu')

Get the edge features of the given input blocks. The feature fields are defined in efeat_fields.

New in version 0.4.0: Added get_blocks_edge_feats to support edge features in message passing.

Parameters

input_blocks: list of DGLBlock

The input blocks with edge features to be extracted.

efeat_fields: string or dict of list of strings

The edge feature fields to be extracted.

device: PyTorch device

The device where the returned edge features are stored.

Returns

block_edge_input_feats: list of dict of Tensors

The returned edge features for all blocks.

get_unlabeled_node_set(train_idxs, mask='train_mask')

Get node indexes not having the given mask in the training set.

Parameters

train_idxs: dict of Tensor

The training set.

mask: str or list of str

The node feature fields storing the training mask. Default: “train_mask”.

Returns

dict of Tensors : The returned node indexes.
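One plausible reading of this method is that, for each node type in train_idxs, it returns the node IDs that are not in the training set. The sketch below shows that complement with plain Python lists standing in for tensors. The helper unlabeled_node_ids and its num_nodes argument are hypothetical, and the mask argument is left out, so this is an illustration of the idea rather than GraphStorm's implementation.

```python
def unlabeled_node_ids(num_nodes, train_idxs):
    """Return, per node type, the node IDs that are absent from train_idxs.

    num_nodes:  {node type: total number of nodes}  (hypothetical input)
    train_idxs: {node type: list of training node IDs}
    """
    unlabeled = {}
    for ntype, train_ids in train_idxs.items():
        in_train = set(train_ids)
        unlabeled[ntype] = [
            nid for nid in range(num_nodes[ntype]) if nid not in in_train
        ]
    return unlabeled


example = unlabeled_node_ids({"paper": 6}, {"paper": [0, 2, 5]})
```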

get_node_train_set(ntypes, mask='train_mask')

Get the training set for the given node types under the given mask.

Parameters

ntypes: str or list of str

Node types to get the training set.

mask: str or list of str

The node feature fields storing the training mask. Default: “train_mask”.

Returns

dict of Tensors : The returned training node indexes.

get_node_val_set(ntypes, mask='val_mask')

Get the validation set for the given node types under the given mask.

Parameters

ntypes: str or list of str

Node types to get the validation set.

mask: str or list of str

The node feature fields storing the validation mask. Default: “val_mask”.

Returns

dict of Tensors : The returned validation node indexes.

get_node_test_set(ntypes, mask='test_mask')

Get the test set for the given node types under the given mask.

Parameters

ntypes: str or list of str

Node types to get the test set.

mask: str or list of str

The node feature fields storing the test mask. Default: “test_mask”.

Returns

dict of Tensors : The returned test node indexes.

get_node_infer_set(ntypes, mask='test_mask')

Get the inference node set for the given node types under the given mask.

If the mask exists in g.nodes[ntype].data, include only nodes in the mask during inference. If such a mask does not exist, run inference on the entire node set.

Parameters

ntypes: str or list of str

Node types to get the inference set.

mask: str or list of str

The node feature fields storing the inference mask. Default: “test_mask”.

Returns

dict[str, Tensor]:

Mapping from node type to indices of nodes to run inference on.
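The fallback rule described above (use the mask if it exists, otherwise take every node) can be sketched with plain Python containers. Here node_data stands in for g.nodes[ntype].data and lists stand in for tensors; the helper infer_node_ids is hypothetical and only mirrors the documented behavior for a single node type and a single mask name.

```python
def infer_node_ids(node_data, num_nodes, mask="test_mask"):
    """Pick inference node IDs for one node type.

    node_data: dict of per-node fields (stand-in for g.nodes[ntype].data)
    num_nodes: total number of nodes of this type
    """
    if mask in node_data:
        # Mask present: keep only the nodes whose mask entry is set.
        return [nid for nid, flag in enumerate(node_data[mask]) if flag]
    # Mask absent: run inference on the entire node set.
    return list(range(num_nodes))


with_mask = infer_node_ids({"test_mask": [0, 1, 1, 0]}, num_nodes=4)
without_mask = infer_node_ids({"feat": [0.1, 0.2, 0.3]}, num_nodes=3)
```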

get_edge_train_set(etypes=None, mask='train_mask', reverse_edge_types_map=None)

Get the training set for the given edge types under the given mask.

Parameters

etypes: list of str

List of edge types to get the training set. If set to None, all the edge types are included. Default: None.

mask: str or list of str

The edge feature fields storing the training mask. Default: “train_mask”.

reverse_edge_types_map: dict of tuples

A map for reverse edge types in the format of {(edge type):(reversed edge type)}. Default: None.

Returns

dict of Tensors : The returned training edge indexes.
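To make the {(edge type):(reversed edge type)} format concrete, the sketch below builds such a map and checks its shape. The edge type names are hypothetical, and the check that a reversed type swaps the source and destination node types is an assumption about what "reverse" means here, not something GraphStorm is known to enforce.

```python
# Each key is an edge type and each value is its reverse (hypothetical names).
reverse_edge_types_map = {
    ("user", "rates", "movie"): ("movie", "rated-by", "user"),
    ("user", "follows", "user"): ("user", "followed-by", "user"),
}


def looks_like_reverse_map(rev_map):
    """Return True if every entry maps a 3-tuple to a 3-tuple with
    source and destination node types swapped (assumed convention)."""
    for etype, rev_etype in rev_map.items():
        if len(etype) != 3 or len(rev_etype) != 3:
            return False
        if (etype[0], etype[2]) != (rev_etype[2], rev_etype[0]):
            return False
    return True


is_valid = looks_like_reverse_map(reverse_edge_types_map)
```

A map built this way would be passed as reverse_edge_types_map to the edge set methods in this section.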

get_edge_val_set(etypes=None, mask='val_mask', reverse_edge_types_map=None)

Get the validation set for the given edge types under the given mask.

Parameters

etypes: list of str

List of edge types to get the validation set. If set to None, all the edge types are included. Default: None.

mask: str or list of str

The edge feature field storing the validation mask. Default: “val_mask”.

reverse_edge_types_map: dict

A map for reverse edge types in the format of {(edge type):(reversed edge type)}. Default: None.

Returns

dict of Tensors : The returned validation edge indexes.

get_edge_test_set(etypes=None, mask='test_mask', reverse_edge_types_map=None)

Get the test set for the given edge types under the given mask.

Parameters

etypes: list of str

List of edge types to get the test set. If set to None, all the edge types are included. Default: None.

mask: str or list of str

The edge feature field storing the test mask. Default: “test_mask”.

reverse_edge_types_map: dict

A map for reverse edge types in the format of {(edge type):(reversed edge type)}. Default: None.

Returns

dict of Tensors : The returned test edge indexes.

get_edge_infer_set(etypes=None, mask='test_mask', reverse_edge_types_map=None)

Get the inference set for the given edge types under the given mask.

If the mask exists in g.edges[etype].data, the inference set is collected based on the mask. If the mask does not exist, the entire edge set is treated as the inference set.

Parameters

etypes: list of str

List of edge types to get the inference set. If set to None, all the edge types are included. Default: None.

mask: str or list of str

The edge feature field storing the inference mask. Default: “test_mask”.

reverse_edge_types_map: dict

A map for reverse edge types in the format of {(edge type):(reversed edge type)}. Default: None.

Returns

dict of Tensors : The returned inference edge indexes.