graphstorm.model.HGTLayer
- class graphstorm.model.HGTLayer(in_dim, out_dim, ntypes, canonical_etypes, num_heads, activation=None, dropout=0.2, norm=True, num_ffn_layers_in_gnn=0, fnn_activation=<function relu>)
Heterogenous graph transformer (HGT) layer from Heterogeneous Graph Transformer.
Given a graph \(G(V, E)\) and input node features \(H^{(l-1)}\) in the \(l-1\) layer, it computes the new node features in the \(l\) layer as follows:
Compute a multi-head attention score for each edge \((s, e, t)\) in the graph:
\[\begin{split}Attention(s, e, t) = \text{Softmax}\left(||_{i\in[1,h]}ATT-head^i(s, e, t)\right) \\ ATT-head^i(s, e, t) = \left(K^i(s)W^{ATT}_{\phi(e)}Q^i(t)^{\top}\right)\cdot \frac{\mu_{(\tau(s),\phi(e),\tau(t)}}{\sqrt{d}} \\ K^i(s) = \text{K-Linear}^i_{\tau(s)}(H^{(l-1)}[s]) \\ Q^i(t) = \text{Q-Linear}^i_{\tau(t)}(H^{(l-1)}[t]) \\\end{split}\]Compute the message to send on each edge \((s, e, t)\):
\[\begin{split}Message(s, e, t) = ||_{i\in[1, h]} MSG-head^i(s, e, t) \\ MSG-head^i(s, e, t) = \text{M-Linear}^i_{\tau(s)}(H^{(l-1)}[s])W^{MSG}_{\phi(e)} \\\end{split}\]Send messages to target nodes \(t\) and aggregate:
\[\tilde{H}^{(l)}[t] = \sum_{\forall s\in \mathcal{N}(t)}\left( Attention(s,e,t) \cdot Message(s,e,t)\right)\]Compute new node features:
\[H^{(l)}[t]=\text{A-Linear}_{\tau(t)}(\sigma(\tilde(H)^{(l)}[t])) + H^{(l-1)}[t]\]Note:
Different from DGL’s HGTConv, this implementation is based on heterogeneous graph. Other hyperparameters’ default values are same as the DGL’s HGTConv setting.
The cross-relation aggregation function of this implementation is mean, which was chosen by authors of the HGT paper in their contribution to DGL.
Examples:
# suppose graph and input_feature are ready from graphstorm.model.hgt_encoder import HGTLayer layer = HGTLayer(hid_dim, out_dim, g.ntypes, g.canonical_etypes, num_heads, activation, dropout, norm) h = layer(g, input_feature)
Parameters
- in_dimint
Input dimension size.
- out_dimint
Output dimension size.
- ntypes: list[str]
List of node types
- canonical_etypes: list[(str, str, str)]
List of canonical edge types
- num_headsint
Number of attention heads
- activationcallable, optional
Activation function. Default: None
- dropoutfloat, optional
Dropout rate. Default: 0.2
- use_norm: boolean
If use layer normalization or not, default is True
- num_ffn_layers_in_gnn: int, optional
Number of layers of ngnn between gnn layers