GSNodeEncoderInputLayer

class graphstorm.model.GSNodeEncoderInputLayer(g, feat_size, embed_size, activation=None, dropout=0.0, use_node_embeddings=False, force_no_embeddings=None, num_ffn_layers_in_input=0, ffn_activation=<function relu>, cache_embed=False, use_wholegraph_sparse_emb=False)

Bases: GSNodeInputLayer

The node encoder input layer for all nodes in a heterogeneous graph.

The input layer adds a linear layer to node types that have node features; this layer projects the features into the dimension given by embed_size. It also adds learnable embeddings to node types that do not have features. To add learnable embeddings to node types that do have features, set use_node_embeddings to True. In that case, the input layer combines the node features with the learnable embeddings and projects the result into the same dimension.
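The routing described above (projection for featured node types, learnable embeddings for featureless ones, both when use_node_embeddings is True) can be sketched as a small planning function. This is a hypothetical helper, not GraphStorm code. It assumes a feature size of 0 marks a featureless node type, handles only the plain {str: int} form of feat_size, and reads force_no_embeddings as a veto on learnable embeddings for the listed node types, following the parameter description rather than the library source.

```python
from typing import Dict, List, Optional


def plan_input_layer(
    feat_size: Dict[str, int],
    use_node_embeddings: bool = False,
    force_no_embeddings: Optional[List[str]] = None,
) -> Dict[str, Dict[str, bool]]:
    """Return, per node type, which input components a toy layer would build.

    Assumptions (not taken from the GraphStorm source): a feature size of 0
    marks a featureless node type, and force_no_embeddings simply vetoes the
    learnable embedding for the listed node types.
    """
    forced = set(force_no_embeddings or [])
    plan = {}
    for ntype, size in feat_size.items():
        has_feat = size > 0
        # Node types with features always get a linear projection.
        projection = has_feat
        # Featureless node types need a learnable embedding; featured node
        # types get one only when use_node_embeddings is True.
        embedding = (not has_feat or use_node_embeddings) and ntype not in forced
        plan[ntype] = {"projection": projection, "embedding": embedding}
    return plan


plan = plan_input_layer({"paper": 128, "author": 0}, use_node_embeddings=False)
print(plan)
```

With use_node_embeddings=False, "paper" gets only a projection and "author" gets only a learnable embedding; flipping the flag adds an embedding to "paper" as well.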

Parameters

g: DistGraph: The input DGL distributed graph.
feat_size: dict of int or dict of FeatureGroupSize: The original feature size of each node type, in the format of {str: int}. If a node type has multiple feature groups, the format is {str: FeatureGroupSize}.
embed_size: int: The output embedding size.
activation: callable: The activation function applied to the output embeddings. Default: None.
dropout: float: The dropout rate. Default: 0.0.
use_node_embeddings: bool: Whether to use learnable embeddings for nodes even when node features are available. Default: False.
force_no_embeddings: list of str: The list of node types that are forced not to use learnable embeddings. Default: None.
num_ffn_layers_in_input: int: (Optional) The number of feedforward neural network layers for each node type in the input layer. Default: 0.
ffn_activation: callable: The activation function for the feedforward neural networks. Default: relu.
cache_embed: bool: Whether to cache the embeddings. Default: False.
use_wholegraph_sparse_emb: bool: Whether to use WholeGraph to host the learnable embeddings for sparse updates. Default: False.
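The parameters above can be read as a per-node pipeline. The pure-Python sketch below uses lists in place of tensors and makes several assumptions that are not stated in this reference: the learnable embedding is concatenated with the raw features before the linear projection, each of the num_ffn_layers_in_input layers is a square embed_size x embed_size matrix followed by ffn_activation, activation is applied last, and dropout is omitted. The function names are hypothetical.

```python
from typing import Callable, List, Optional

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape (out_dim, in_dim)


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def encode_node(
    feat: Optional[Vector],
    node_emb: Optional[Vector],
    proj: Matrix,
    ffn_layers: List[Matrix],
    ffn_activation: Callable[[Vector], Vector] = relu,
    activation: Optional[Callable[[Vector], Vector]] = None,
) -> Vector:
    # Assumption: raw features and the learnable embedding are concatenated.
    x = (feat or []) + (node_emb or [])
    # Linear projection into embed_size dimensions (len(proj) == embed_size).
    h = matvec(proj, x)
    # num_ffn_layers_in_input == len(ffn_layers); each layer keeps embed_size.
    for w in ffn_layers:
        h = ffn_activation(matvec(w, h))
    # Optional output activation (the `activation` parameter).
    if activation is not None:
        h = activation(h)
    return h


# 2 raw features + 1 learnable embedding dimension, projected to
# embed_size=2 and passed through one FFN layer.
out = encode_node(
    feat=[1.0, 2.0],
    node_emb=[3.0],
    proj=[[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]],
    ffn_layers=[[[1.0, 0.0], [0.0, -1.0]]],
)
```

Setting ffn_layers to an empty list corresponds to the default num_ffn_layers_in_input=0, where the projection output is used directly.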

Examples:

from graphstorm import get_node_feat_size
from graphstorm.model import GSgnnNodeModel, GSNodeEncoderInputLayer
from graphstorm.dataloading import GSgnnData

np_data = GSgnnData(...)

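model = GSgnnNodeModel(alpha_l2norm=0)
# Size of the "feat" feature for every node type in the graph.
feat_size = get_node_feat_size(np_data.g, "feat")
# Project features to 4 dimensions and also add learnable embeddings
# to node types that have features.
encoder = GSNodeEncoderInputLayer(np_data.g, feat_size,
                                  embed_size=4,
                                  use_node_embeddings=True)
model.set_node_input_encoder(encoder)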

forward(input_feats, input_nodes)

Input layer forward computation.

Parameters

input_feats: dict of Tensor: The input features in the format of {ntype: feats}.
input_nodes: dict of Tensor: The input node indexes in the format of {ntype: indexes}.

Returns

embs: dict of Tensor: The projected node embeddings in the format of {ntype: emb}.
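The forward contract is dictionary-in, dictionary-out, keyed by node type. The sketch below (hypothetical names, Python lists in place of tensors) shows why input_nodes is needed: for node types without features, the node indexes select rows of a learnable embedding table. It assumes each feature row in input_feats lines up with the corresponding index in input_nodes, and it ignores the use_node_embeddings combination for brevity.

```python
from typing import Dict, List

Matrix = List[List[float]]  # row-major, shape (out_dim, in_dim)


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Multiply a row-major matrix by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def toy_forward(
    input_feats: Dict[str, List[List[float]]],
    input_nodes: Dict[str, List[int]],
    proj: Dict[str, Matrix],
    emb_tables: Dict[str, List[List[float]]],
) -> Dict[str, List[List[float]]]:
    """Map {ntype: feats} and {ntype: indexes} to {ntype: emb}."""
    embs = {}
    for ntype, nids in input_nodes.items():
        if ntype in input_feats:
            # Featured node types: project each node's feature row.
            embs[ntype] = [matvec(proj[ntype], row) for row in input_feats[ntype]]
        else:
            # Featureless node types: gather learnable rows by node index.
            embs[ntype] = [list(emb_tables[ntype][nid]) for nid in nids]
    return embs


embs = toy_forward(
    input_feats={"paper": [[1.0, 2.0]]},
    input_nodes={"paper": [0], "author": [2, 0]},
    proj={"paper": [[1.0, 1.0]]},
    emb_tables={"author": [[0.1], [0.2], [0.3]]},
)
```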

require_cache_embed()

Whether to cache the embeddings for inference.

If the input layer encoder involves heavy computation, such as a BERT model, this method should return True so that the inference engine caches the embeddings produced by the input layer encoder.

Returns

bool: True if the embeddings need to be cached for inference.
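The flag matters when the same nodes would otherwise be encoded more than once during inference. A minimal sketch of the idea follows; CachingEncoderRunner and expensive_encode are hypothetical names, and the real inference engine's caching strategy is not described in this section.

```python
from typing import Callable, Dict, Hashable, List


class CachingEncoderRunner:
    """Hypothetical inference helper; not part of the GraphStorm API."""

    def __init__(self, encode: Callable[[Hashable], List[float]], require_cache_embed: bool):
        self._encode = encode
        self._use_cache = require_cache_embed
        self._cache: Dict[Hashable, List[float]] = {}
        self.encoder_calls = 0  # counts how often the expensive encoder ran

    def embed(self, key: Hashable) -> List[float]:
        # Reuse a cached embedding when caching is enabled.
        if self._use_cache and key in self._cache:
            return self._cache[key]
        self.encoder_calls += 1
        out = self._encode(key)
        if self._use_cache:
            self._cache[key] = out
        return out


def expensive_encode(key: Hashable) -> List[float]:
    # Stand-in for a heavy encoder such as a BERT forward pass.
    return [float(len(str(key)))]


cached = CachingEncoderRunner(expensive_encode, require_cache_embed=True)
for _ in range(3):
    cached.embed(("paper", 7))
```

With require_cache_embed=True the encoder runs once for the repeated key; with False it would run on every call.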

get_sparse_params()

Get the sparse parameters of this input layer.

This function is normally called by optimizers to update sparse model parameters, i.e., learnable node embeddings.

Returns

list of Tensors: The sparse embeddings, or an empty list if there are no sparse parameters.
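Sparse parameters are updated differently from dense weights: each step touches only the rows of the embedding table for nodes that appeared in the mini-batch, which keeps updates cheap on large graphs. The sketch below uses plain SGD on Python lists to show the idea. sparse_sgd_step is a hypothetical helper, and the optimizer that GraphStorm actually pairs with get_sparse_params() is not specified here.

```python
from typing import Dict, List


def sparse_sgd_step(
    table: List[List[float]], grads: Dict[int, List[float]], lr: float
) -> None:
    """Apply SGD in place to only the embedding rows that received gradients."""
    for row_id, grad in grads.items():
        row = table[row_id]
        for j, g in enumerate(grad):
            row[j] -= lr * g


# A 3-node embedding table; only node 1 appeared in the mini-batch.
table = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
sparse_sgd_step(table, {1: [1.0, 0.5]}, lr=0.1)
```

Rows 0 and 2 are left untouched because no gradient was recorded for them.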

property in_dims: Return the input feature size, which is given in class initialization.

property out_dims: Return the number of output dimensions, which is given in class initialization.

property use_wholegraph_sparse_emb: Return whether or not to use WholeGraph to host embeddings for sparse updates, which is given in class initialization.