graphstorm.model.HGTLayer

class graphstorm.model.HGTLayer(in_dim, out_dim, ntypes, canonical_etypes, num_heads, activation=None, dropout=0.2, norm=True, num_ffn_layers_in_gnn=0, fnn_activation=<function relu>)

Heterogeneous graph transformer (HGT) layer from the paper Heterogeneous Graph Transformer.

Given a graph \(G(V, E)\) and the input node features \(H^{(l-1)}\) from layer \(l-1\), the layer computes the new node features of layer \(l\) as follows:

Compute a multi-head attention score for each edge \((s, e, t)\) in the graph:

\[\begin{split}Attention(s, e, t) = \text{Softmax}\left(||_{i\in[1,h]}\text{ATT-head}^i(s, e, t)\right) \\ \text{ATT-head}^i(s, e, t) = \left(K^i(s)W^{ATT}_{\phi(e)}Q^i(t)^{\top}\right)\cdot \frac{\mu_{(\tau(s),\phi(e),\tau(t))}}{\sqrt{d}} \\ K^i(s) = \text{K-Linear}^i_{\tau(s)}(H^{(l-1)}[s]) \\ Q^i(t) = \text{Q-Linear}^i_{\tau(t)}(H^{(l-1)}[t]) \\\end{split}\]
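The score of a single head can be traced by hand on a toy example. The sketch below is plain Python rather than the GraphStorm implementation: the helper names, the identity matrix standing in for \(W^{ATT}_{\phi(e)}\), and all numeric values are illustrative assumptions, and it assumes the softmax normalizes over the incoming edges of one target node.

```python
import math

def vec_mat(v, M):
    """Row vector v (length n) times matrix M (n x m)."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def att_head(k_s, w_att, q_t, mu, d):
    """Unnormalized score of one head: (K W^ATT Q^T) * mu / sqrt(d)."""
    return dot(vec_mat(k_s, w_att), q_t) * mu / math.sqrt(d)

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy setup (assumed values): one target node, two incoming edges, head size d = 2.
d = 2
w_att = [[1.0, 0.0], [0.0, 1.0]]  # identity stands in for the edge-type matrix
q_t = [1.0, 0.0]                  # Q^i(t) of the target node
keys = [[2.0, 0.0], [0.0, 2.0]]   # K^i(s) of the two source nodes
mu = 1.0                          # prior of the (source type, edge type, target type) triple

raw_scores = [att_head(k, w_att, q_t, mu, d) for k in keys]
attention = softmax(raw_scores)
```

The source whose key aligns with the target's query receives the larger weight, and the weights over the incoming edges sum to one.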

Compute the message to send on each edge \((s, e, t)\):

\[\begin{split}Message(s, e, t) = ||_{i\in[1, h]} \text{MSG-head}^i(s, e, t) \\ \text{MSG-head}^i(s, e, t) = \text{M-Linear}^i_{\tau(s)}(H^{(l-1)}[s])W^{MSG}_{\phi(e)} \\\end{split}\]

Send messages to target nodes \(t\) and aggregate:

\[\tilde{H}^{(l)}[t] = \sum_{\forall s\in \mathcal{N}(t)}\left( Attention(s,e,t) \cdot Message(s,e,t)\right)\]
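The message and aggregation steps can be illustrated for one head and one target node. This is a minimal sketch in plain Python, not the library code: the helper names and the matrices are made up for illustration, bias terms are omitted, and the attention weights are given directly instead of being computed.

```python
def vec_mat(v, M):
    """Row vector v (length n) times matrix M (n x m)."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def message_head(h_s, m_linear, w_msg):
    """One message head: M-Linear(H[s]) followed by the edge-type matrix W^MSG."""
    return vec_mat(vec_mat(h_s, m_linear), w_msg)

def aggregate(attn, messages):
    """Attention-weighted sum of the messages arriving at one target node."""
    dim = len(messages[0])
    return [sum(a * m[j] for a, m in zip(attn, messages)) for j in range(dim)]

# Assumed toy values: two source nodes sending to the same target.
m_linear = [[1.0, 0.0], [0.0, 1.0]]   # stands in for M-Linear of the source node type
w_msg = [[2.0, 0.0], [0.0, 2.0]]      # stands in for W^MSG of the edge type
h_sources = [[1.0, 0.0], [0.0, 1.0]]  # H^(l-1)[s] for the two sources
attn = [0.75, 0.25]                   # attention weights, assumed already normalized

messages = [message_head(h, m_linear, w_msg) for h in h_sources]
h_tilde = aggregate(attn, messages)
```

Here `h_tilde` plays the role of \(\tilde{H}^{(l)}[t]\) for a single head; in the multi-head case the per-head results would be concatenated.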

Compute new node features:

\[H^{(l)}[t]=\text{A-Linear}_{\tau(t)}(\sigma(\tilde{H}^{(l)}[t])) + H^{(l-1)}[t]\]
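The update step applies an activation, a linear map specific to the target node type, and a residual connection. The sketch below illustrates this under stated assumptions: ReLU stands in for \(\sigma\), an identity matrix stands in for A-Linear, the bias is omitted, and the input and output dimensions are equal so that the residual sum is well defined. The function names are hypothetical.

```python
def vec_mat(v, M):
    """Row vector v (length n) times matrix M (n x m)."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def relu(v):
    return [max(0.0, x) for x in v]

def update(h_tilde, h_prev, a_linear, sigma=relu):
    """A-Linear(sigma(h_tilde)) + h_prev, the residual update of one target node."""
    transformed = vec_mat(sigma(h_tilde), a_linear)
    return [t + p for t, p in zip(transformed, h_prev)]

# Assumed toy values.
h_tilde = [1.5, -0.5]                # aggregated messages for the target node
h_prev = [1.0, 1.0]                  # H^(l-1)[t]
a_linear = [[1.0, 0.0], [0.0, 1.0]]  # identity stands in for A-Linear of the target type

h_new = update(h_tilde, h_prev, a_linear)
```

The negative component is zeroed by the activation before the linear map, so only the residual contributes to that coordinate of the output.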

Note:

Unlike DGL’s HGTConv, this implementation works on a heterogeneous graph. The default values of the other hyperparameters are the same as in DGL’s HGTConv.
The cross-relation aggregation function of this implementation is mean, which was chosen by the authors of the HGT paper in their contribution to DGL.
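Mean cross-relation aggregation can be pictured as follows: when a target node receives results through several edge types, the per-edge-type vectors are averaged element-wise. This is a plain Python sketch, not the library code; the function name and the edge types are hypothetical.

```python
def cross_relation_mean(per_relation):
    """Element-wise mean of the per-edge-type results for one target node."""
    vectors = list(per_relation.values())
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Hypothetical canonical edge types that both point to a "user" node.
per_relation = {
    ("user", "follows", "user"): [1.0, 3.0],
    ("item", "bought-by", "user"): [3.0, 5.0],
}

combined = cross_relation_mean(per_relation)
```

With mean aggregation, a node type that is the target of many edge types does not accumulate larger feature magnitudes than one that is the target of a single edge type.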

Examples:

# Suppose the graph g and the input features input_feature are ready,
# and in_dim, out_dim, num_heads, activation, dropout and norm are defined.
from graphstorm.model.hgt_encoder import HGTLayer

layer = HGTLayer(in_dim, out_dim, g.ntypes, g.canonical_etypes,
                 num_heads, activation, dropout, norm)
h = layer(g, input_feature)

Parameters

in_dim: int: Input dimension size.
out_dim: int: Output dimension size.
ntypes: list[str]: List of node types.
canonical_etypes: list[(str, str, str)]: List of canonical edge types.
num_heads: int: Number of attention heads.
activation: callable, optional: Activation function. Default: None
dropout: float, optional: Dropout rate. Default: 0.2
norm: bool, optional: Whether to use layer normalization. Default: True
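num_ffn_layers_in_gnn: int, optional: Number of ngnn layers between GNN layers. Default: 0
fnn_activation: callable, optional: Activation function for the ngnn layers. Default: relu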