HGTLayer
- class graphstorm.model.HGTLayer(in_dim, out_dim, ntypes, canonical_etypes, num_heads, activation=None, dropout=0.2, norm='layer', num_ffn_layers_in_gnn=0, fnn_activation=<function relu>)
Bases: Module
Heterogeneous graph transformer (HGT) layer from the paper "Heterogeneous Graph Transformer".
Given a graph \(G(V, E)\) and input node features \(H^{(l-1)}\) from layer \(l-1\), the layer computes the new node features \(H^{(l)}\) of layer \(l\) as follows:
Compute a multi-head attention score for each edge \((s, e, t)\) in the graph:
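\[\begin{split}\text{Attention}(s, e, t) = \text{Softmax}_{\forall s\in \mathcal{N}(t)}\left(||_{i\in[1,h]}\text{ATT-head}^i(s, e, t)\right) \\ \text{ATT-head}^i(s, e, t) = \left(K^i(s)W^{ATT}_{\phi(e)}Q^i(t)^{\top}\right)\cdot \frac{\mu_{\langle\tau(s),\phi(e),\tau(t)\rangle}}{\sqrt{d}} \\ K^i(s) = \text{K-Linear}^i_{\tau(s)}(H^{(l-1)}[s]) \\ Q^i(t) = \text{Q-Linear}^i_{\tau(t)}(H^{(l-1)}[t]) \\\end{split}\]
where \(h\) is the number of heads, \(d\) is the dimension of each head, \(\tau(\cdot)\) maps a node to its type, \(\phi(e)\) is the type of edge \(e\), \(||\) denotes concatenation, and \(\mu\) is a learnable prior for each (source type, edge type, destination type) triplet.
Compute the message to send on each edge \((s, e, t)\):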
\[\begin{split}\text{Message}(s, e, t) = ||_{i\in[1, h]} \text{MSG-head}^i(s, e, t) \\ \text{MSG-head}^i(s, e, t) = \text{M-Linear}^i_{\tau(s)}(H^{(l-1)}[s])W^{MSG}_{\phi(e)} \\\end{split}\]
Send messages to target nodes \(t\) and aggregate:
\[\tilde{H}^{(l)}[t] = \sum_{\forall s\in \mathcal{N}(t)}\left( \text{Attention}(s,e,t) \cdot \text{Message}(s,e,t)\right)\]
Compute new node features:
\[H^{(l)}[t]=\text{A-Linear}_{\tau(t)}(\sigma(\tilde{H}^{(l)}[t])) + H^{(l-1)}[t]\]
Note:
Unlike DGL's HGTConv, this implementation operates directly on heterogeneous graphs. The remaining hyperparameters use the same default values as DGL's HGTConv. The cross-relation aggregation function of this implementation is mean, which the authors of the HGT paper chose in their contribution to DGL.
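The four steps above can be sketched in NumPy for a single target node t. This is a minimal illustrative sketch, not GraphStorm's PyTorch/DGL implementation: the helper names relation_messages and hgt_update, the per-relation weight dictionary (keys K, M, W_att, W_msg, mu), treating μ as a fixed scalar, and using ReLU for σ are assumptions. Dropout, normalization, and FFN layers are omitted, and the residual connection assumes the output dimension equals in_dim.

```python
import numpy as np


def relation_messages(h_src, q, rel, num_heads):
    """Attention-weighted messages sent to target node t over one relation.

    h_src: (n, in_dim) features H^{(l-1)}[s] of the n neighbours s of t.
    q:     (num_heads, d) queries Q^i(t) of the target node.
    rel:   hypothetical weight layout (assumption): "K" and "M" are the
           K-Linear / M-Linear weights of the source type, "W_att" and
           "W_msg" are per-head (num_heads, d, d) relation matrices, and
           "mu" is the relation prior as a fixed scalar.
    Returns the aggregated message for t with shape (num_heads * d,).
    """
    n = h_src.shape[0]
    d = q.shape[1]
    k = (h_src @ rel["K"]).reshape(n, num_heads, d)  # K^i(s)
    v = (h_src @ rel["M"]).reshape(n, num_heads, d)  # M-Linear^i(H[s])
    # ATT-head^i(s, e, t) = K^i(s) W^ATT Q^i(t)^T * mu / sqrt(d)
    k_att = np.einsum("nhd,hde->nhe", k, rel["W_att"])
    scores = np.einsum("nhe,he->nh", k_att, q) * rel["mu"] / np.sqrt(d)
    # Softmax over the neighbours s of t, separately for each head.
    scores = scores - scores.max(axis=0, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=0, keepdims=True)
    # MSG-head^i(s, e, t) = M-Linear^i(H[s]) W^MSG
    msg = np.einsum("nhd,hde->nhe", v, rel["W_msg"])
    # Attention-weighted sum over neighbours; heads are concatenated.
    return np.einsum("nh,nhe->he", attn, msg).reshape(num_heads * d)


def hgt_update(h_dst, relations, q_lin, a_lin, num_heads,
               sigma=lambda x: np.maximum(x, 0.0)):
    """New feature H^{(l)}[t] for one target node t.

    h_dst:     (in_dim,) feature H^{(l-1)}[t].
    relations: list of (h_src, rel) pairs, one per incoming relation.
    q_lin:     (in_dim, num_heads * d) Q-Linear weight of t's type.
    a_lin:     (num_heads * d, in_dim) A-Linear weight of t's type.
    sigma:     activation; ReLU is an assumption made for illustration.
    """
    d = q_lin.shape[1] // num_heads
    q = (h_dst @ q_lin).reshape(num_heads, d)  # Q^i(t)
    # Cross-relation aggregation by mean, as stated in the note.
    h_tilde = np.mean(
        [relation_messages(h_src, q, rel, num_heads) for h_src, rel in relations],
        axis=0)
    # H^{(l)}[t] = A-Linear(sigma(h_tilde)) + H^{(l-1)}[t]
    return sigma(h_tilde) @ a_lin + h_dst


# Toy usage with random weights: two relations point at the same node t.
rng = np.random.default_rng(0)
in_dim, num_heads, d = 8, 2, 4


def random_relation():
    return {
        "K": rng.normal(size=(in_dim, num_heads * d)),
        "M": rng.normal(size=(in_dim, num_heads * d)),
        "W_att": rng.normal(size=(num_heads, d, d)),
        "W_msg": rng.normal(size=(num_heads, d, d)),
        "mu": 1.0,
    }


relations = [(rng.normal(size=(3, in_dim)), random_relation()),
             (rng.normal(size=(5, in_dim)), random_relation())]
h_t = rng.normal(size=in_dim)
h_new = hgt_update(h_t, relations,
                   q_lin=rng.normal(size=(in_dim, num_heads * d)),
                   a_lin=rng.normal(size=(num_heads * d, in_dim)),
                   num_heads=num_heads)
print(h_new.shape)  # (8,)
```

Computing each relation separately and then averaging mirrors the mean cross-relation aggregation, so a node with many relation types does not receive proportionally larger updates.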
Examples:
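    # Suppose the graph g and the input features are ready, and that hid_dim,
    # out_dim, num_heads, activation, dropout, and norm are defined.
    # input_feature is a dict that maps each node type to its feature tensor.
    from graphstorm.model import HGTLayer

    layer = HGTLayer(hid_dim, out_dim, g.ntypes, g.canonical_etypes,
                     num_heads, activation, dropout, norm)
    # h is a dict that maps each node type to its new features.
    h = layer(g, input_feature)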
Parameters
- in_dim: int
Input dimension size.
- out_dim: int
Output dimension size.
- ntypes: list of str
List of node types in the format of [ntype1, ntype2, …].
- canonical_etypes: list of tuple
List of canonical edge types in the format of [('src_ntype1', 'etype1', 'dst_ntype1'), …].
- num_heads: int
Number of attention heads.
- activation: callable
Activation function. Default: None.
- dropout: float
Dropout rate. Default: 0.2.
- norm: str or None
Normalization method. Options: batch, layer, and None. Default: layer.
- num_ffn_layers_in_gnn: int
Number of FFN layers between GNN layers. Default: 0.
- fnn_activation: callable
Activation function for the FFN layers (e.g., a function from torch.nn.functional). Default: relu.
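The effect of norm, num_ffn_layers_in_gnn, and fnn_activation on a layer's output can be sketched in NumPy. This is a minimal, hypothetical sketch rather than GraphStorm's code: the helper names apply_norm and post_process, the order (normalization before the FFN layers), one activation after every FFN layer, and the omission of learnable scale/shift, dropout, and running statistics are all assumptions.

```python
import numpy as np


def relu(x):
    """NumPy stand-in for the default fnn_activation (relu)."""
    return np.maximum(x, 0.0)


def apply_norm(h, norm, eps=1e-5):
    """Normalize node features h of shape (num_nodes, out_dim).

    'layer' normalizes each node's feature vector; 'batch' normalizes each
    feature column across the nodes in the batch; None leaves h unchanged.
    Learnable scale/shift parameters are omitted (assumption).
    """
    if norm is None:
        return h
    if norm == "layer":
        axis = 1
    elif norm == "batch":
        axis = 0
    else:
        raise ValueError(f"unsupported norm: {norm!r}")
    mean = h.mean(axis=axis, keepdims=True)
    var = h.var(axis=axis, keepdims=True)
    return (h - mean) / np.sqrt(var + eps)


def post_process(h, norm="layer", ffn_weights=(), fnn_activation=relu):
    """Hypothetical post-processing: normalization, then one FFN layer per
    weight matrix, so len(ffn_weights) plays the role of
    num_ffn_layers_in_gnn."""
    h = apply_norm(h, norm)
    for w in ffn_weights:
        h = fnn_activation(h @ w)
    return h


rng = np.random.default_rng(0)
h = rng.normal(size=(4, 6))  # 4 nodes of one type, out_dim = 6
out = post_process(h, norm="layer",
                   ffn_weights=[rng.normal(size=(6, 6)) for _ in range(2)])
print(out.shape)  # (4, 6)
```

With the defaults num_ffn_layers_in_gnn=0 (an empty ffn_weights here), only the normalization step changes the features.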