GSNodeEncoderInputLayer

class graphstorm.model.GSNodeEncoderInputLayer(g, feat_size, embed_size, activation=None, dropout=0.0, use_node_embeddings=False, force_no_embeddings=None, num_ffn_layers_in_input=0, ffn_activation=<function relu>, cache_embed=False, use_wholegraph_sparse_emb=False)

Bases: GSNodeInputLayer

The node encoder input layer for all nodes in a heterogeneous graph.

The input layer adds a linear layer on nodes that have node features; the linear layer projects the node features into a specified dimension. It also adds learnable embeddings on nodes that do not have features. Users can add learnable embeddings on nodes that do have features by setting use_node_embeddings to True. In this case, the input layer combines the node features with the learnable embeddings and projects them to the specified dimension.
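The three cases above can be illustrated with a small, dependency-free sketch. It is not GraphStorm's implementation: the helper names (`linear`, `make_weight`, `encode_node`) are hypothetical, and combining features with embeddings by concatenation before the projection is an assumption made for illustration.

```python
import random


def linear(x, weight):
    """Multiply a vector x (length d_in) by a d_in x d_out weight matrix."""
    return [sum(xi * w_row[j] for xi, w_row in zip(x, weight))
            for j in range(len(weight[0]))]


def make_weight(d_in, d_out, rng):
    """Create a small random d_in x d_out projection matrix."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(d_out)] for _ in range(d_in)]


def encode_node(feat, learnable_emb, proj):
    """Toy version of the three cases for a single node (hypothetical helper).

    feat: feature vector, or None if the node has no features.
    learnable_emb: learnable embedding vector, or None if not used.
    proj: projection matrix, or None when the embedding is returned directly.
    """
    if feat is not None and learnable_emb is not None:
        # Assumption: "combine" is modeled as concatenation, then projection.
        return linear(feat + learnable_emb, proj)
    if feat is not None:
        # Features only: project them to the output size.
        return linear(feat, proj)
    # No features: the learnable embedding is the node representation.
    return learnable_emb


rng = random.Random(0)
embed_size = 4
feat = [1.0, 2.0, 3.0]
emb = [rng.uniform(-0.1, 0.1) for _ in range(embed_size)]

out_feat_only = encode_node(feat, None, make_weight(3, embed_size, rng))
out_emb_only = encode_node(None, emb, None)
out_combined = encode_node(feat, emb, make_weight(3 + embed_size, embed_size, rng))
```

In all three cases the result has length embed_size, which is what lets downstream layers treat every node type uniformly.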

Parameters

g: DistGraph

The input DGL distributed graph.

feat_size: dict of int or dict of FeatureGroupSize

The original feature size of each node type in the format of {str: int}. If a node type has multiple feature groups, it is in the format of {str: FeatureGroupSize}.

embed_size: int

The output embedding size.

activation: callable

The activation function applied to the output embeddings. Default: None.

dropout: float

The dropout parameter. Default: 0.

use_node_embeddings: bool

Whether to use learnable embeddings for nodes even when node features are available. Default: False.

force_no_embeddings: list of str

The list of node types that are forced not to use learnable embeddings. Default: None.

num_ffn_layers_in_input: int

(Optional) Number of layers of feedforward neural network for each node type in the input layer. Default: 0.

ffn_activation: callable

The activation function for the feedforward neural networks. Default: relu.

cache_embed: bool

Whether or not to cache the embeddings. Default: False.

use_wholegraph_sparse_emb: bool

Whether or not to use WholeGraph to host embeddings for sparse updates. Default: False.

Examples:

from graphstorm import get_node_feat_size
from graphstorm.model import GSgnnNodeModel, GSNodeEncoderInputLayer
from graphstorm.dataloading import GSgnnData

np_data = GSgnnData(...)

model = GSgnnNodeModel(alpha_l2norm=0)
feat_size = get_node_feat_size(np_data.g, "feat")
encoder = GSNodeEncoderInputLayer(np_data.g, feat_size,
                                  embed_size=4,
                                  use_node_embeddings=True)
model.set_node_input_encoder(encoder)

forward(input_feats, input_nodes)

Input layer forward computation.

Parameters

input_feats: dict of Tensor

The input features in the format of {ntype: feats}.

input_nodes: dict of Tensor

The input node indexes in the format of {ntype: indexes}.

Returns

embs: dict of Tensor

The projected node embeddings in the format of {ntype: emb}.
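To make the dictionary formats concrete, here is a toy stand-in for the forward computation written in plain Python. It is a sketch, not the library's code: `toy_forward`, `proj_weights`, and `emb_tables` are hypothetical names, and it assumes that node types without features are simply absent from input_feats and are looked up in an embedding table by node index.

```python
def toy_forward(input_feats, input_nodes, proj_weights, emb_tables):
    """Return {ntype: embeddings} from {ntype: feats} and {ntype: indexes}."""
    embs = {}
    for ntype, node_ids in input_nodes.items():
        if ntype in input_feats:
            # Featured node type: project each feature row with a weight matrix.
            w = proj_weights[ntype]
            embs[ntype] = [
                [sum(x * w[i][j] for i, x in enumerate(row))
                 for j in range(len(w[0]))]
                for row in input_feats[ntype]
            ]
        else:
            # Featureless node type: look up learnable embeddings by node index.
            embs[ntype] = [emb_tables[ntype][nid] for nid in node_ids]
    return embs


# "user" nodes carry 2-dimensional features; "item" nodes have none.
input_nodes = {"user": [0, 2], "item": [4, 1]}
input_feats = {"user": [[1.0, 0.0], [0.0, 1.0]]}
proj_weights = {"user": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]}
emb_tables = {"item": [[float(r)] * 3 for r in range(5)]}

embs = toy_forward(input_feats, input_nodes, proj_weights, emb_tables)
```

Every node type in input_nodes gets an entry in the result, and each entry has one row per requested node.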

require_cache_embed()

Whether to cache the embeddings for inference.

If the input layer encoder includes heavy computations, such as BERT computations, it should return True and the inference engine will cache the embeddings from the input layer encoder.

Returns

bool: True if the embeddings need to be cached for inference.
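The following sketch shows one way an inference loop could consult require_cache_embed() to avoid recomputing expensive embeddings. `ToyEncoder` and `infer` are hypothetical stand-ins written for illustration; GraphStorm's inference engine is not shown here and may cache differently.

```python
class ToyEncoder:
    """Hypothetical encoder that counts how often it does real work."""

    def __init__(self, cache_embed):
        self.cache_embed = cache_embed
        self.calls = 0

    def require_cache_embed(self):
        # True signals that the caller should cache this encoder's outputs.
        return self.cache_embed

    def encode(self, node_id):
        self.calls += 1  # stands in for a heavy computation
        return [float(node_id), float(node_id) * 2.0]


def infer(encoder, node_ids, cache):
    """Encode node_ids, reusing cached results when the encoder asks for it."""
    out = []
    for nid in node_ids:
        if encoder.require_cache_embed():
            if nid not in cache:
                cache[nid] = encoder.encode(nid)
            out.append(cache[nid])
        else:
            out.append(encoder.encode(nid))
    return out


cached = ToyEncoder(cache_embed=True)
uncached = ToyEncoder(cache_embed=False)
res_cached = infer(cached, [0, 1, 0, 1], {})
res_uncached = infer(uncached, [0, 1, 0, 1], {})
```

Both calls return the same embeddings; the cached encoder computes each distinct node once, while the uncached one recomputes on every request.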

get_sparse_params()

Get the sparse parameters of this input layer.

This function is normally called by optimizers to update sparse model parameters, i.e., learnable node embeddings.

Returns

list of Tensors: the sparse embeddings, or empty list if no sparse parameters.
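As a rough illustration of why sparse parameters are handled separately, the sketch below applies a gradient step only to the embedding rows that received gradients in a mini-batch. The function `sparse_sgd_step` and the plain-list embedding table are hypothetical; real sparse optimizers operate on tensors and typically use more elaborate update rules.

```python
def sparse_sgd_step(emb_table, row_grads, lr):
    """Update only the rows listed in row_grads ({row index: gradient})."""
    for row, grad in row_grads.items():
        emb_table[row] = [w - lr * g for w, g in zip(emb_table[row], grad)]


# Four learnable node embeddings of size 2.
emb_table = [[1.0, 1.0] for _ in range(4)]

# Only node 2 appeared in this mini-batch, so only its row has a gradient.
row_grads = {2: [1.0, -1.0]}
sparse_sgd_step(emb_table, row_grads, lr=0.5)
```

Rows that were not touched by the mini-batch keep their values, which is what makes the update cheap when the embedding table is large.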

property in_dims

Return the input feature size, which is given in class initialization.

property out_dims

Return the number of output dimensions, which is given in class initialization.

property use_wholegraph_sparse_emb

Return whether or not to use WholeGraph to host embeddings for sparse updates, which is given in class initialization.