REF: http://arxiv.org/abs/1903.07293
The authors propose a novel heterogeneous graph neural network (HAN) based on hierarchical attention, including node-level and semantic-level attention.
Semantic-level attention: the weights of different meta-paths
Semantic-level attention aims to learn the importance of each meta-path and assign proper weights to them.
e.g.: The Terminator can either connect to The Terminator 2 via Movie-Actor-Movie (both star Schwarzenegger) or connect to Birdy via Movie-Year-Movie (both shot in 1984). However, when identifying the genre of the movie The Terminator, MAM usually plays a more important role than MYM.
Node-level attention: the weights of different neighbours under the same meta-path
For each node, node-level attention aims to learn the importance of meta-path based neighbours and assign different attention values to them.
e.g.: using the meta-path Movie-Director-Movie (movies sharing the same director), The Terminator connects to Titanic and The Terminator 2 via the director James Cameron. To better identify the genre of The Terminator as a sci-fi movie, the model should pay more attention to The Terminator 2 rather than Titanic.
Projection
Use a node-type-specific transformation matrix to project the features of different types of nodes into the same feature space:
$$
\mathbf{h}_i^{\prime}=\mathbf{M}_{\phi_i} \cdot \mathbf{h}_i
$$
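As a minimal PyTorch sketch of this step (the node-type names, feature sizes, and class name are my own illustrative assumptions, not from the paper):

```python
import torch
import torch.nn as nn

class TypeSpecificProjection(nn.Module):
    """Project each node type's raw features into a shared space: h'_i = M_{phi_i} . h_i."""
    def __init__(self, in_dims: dict, out_dim: int):
        super().__init__()
        # One transformation matrix M per node type (type names are up to the caller).
        self.proj = nn.ModuleDict({t: nn.Linear(d, out_dim, bias=False)
                                   for t, d in in_dims.items()})

    def forward(self, feats: dict) -> dict:
        # feats: node type -> (num_nodes_of_type, in_dim) feature matrix
        return {t: self.proj[t](x) for t, x in feats.items()}
```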
Attention coefficient
Calculate the attention coefficient for each neighbour $j \in \mathcal{N}_i^{\Phi}$:
$$
\alpha_{ij}^{\Phi}=\operatorname{softmax}_j\left(e_{ij}^{\Phi}\right)=\frac{\exp \left(\sigma\left(\mathbf{a}_{\Phi}^{\mathrm{T}} \cdot\left[\mathbf{h}_i^{\prime} \| \mathbf{h}_j^{\prime}\right]\right)\right)}{\sum_{k \in \mathcal{N}_i^{\Phi}} \exp \left(\sigma\left(\mathbf{a}_{\Phi}^{\mathrm{T}} \cdot\left[\mathbf{h}_i^{\prime} \| \mathbf{h}_k^{\prime}\right]\right)\right)}
$$
where $\sigma$ denotes the activation function, $\|$ denotes the concatenation operation, and $\mathbf{a}_{\Phi}$ is the node-level attention vector for meta-path $\Phi$.
Node embedding
The meta-path based embedding of node $i$ is then obtained by aggregating the projected features of its neighbours with the corresponding coefficients:
$$
\mathbf{z}_i^{\Phi}=\sigma\left(\sum_{j \in \mathcal{N}_i^{\Phi}} \alpha_{ij}^{\Phi} \cdot \mathbf{h}_j^{\prime}\right)
$$
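A minimal single-head PyTorch sketch of node-level attention for one meta-path, following the two equations above; the dense-adjacency masking is a simplification of my own, and the paper additionally uses multi-head attention, omitted here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NodeLevelAttention(nn.Module):
    """Node-level attention for one meta-path Phi (single head)."""
    def __init__(self, dim: int):
        super().__init__()
        self.a = nn.Parameter(torch.empty(2 * dim))   # attention vector a_Phi
        nn.init.normal_(self.a, std=0.1)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (N, dim) projected features; adj: (N, N) 0/1 meta-path adjacency.
        # Assumes every node has at least one neighbour (e.g. a self-loop).
        n = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(pairs @ self.a)              # e_ij = sigma(a_Phi^T [h'_i || h'_j])
        e = e.masked_fill(adj == 0, float('-inf'))    # keep only j in N_i^Phi
        alpha = torch.softmax(e, dim=1)               # alpha_ij over the neighbours
        return F.elu(alpha @ h)                       # z_i = sigma(sum_j alpha_ij h'_j)
```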
After obtaining the semantic-specific embeddings for the different meta-paths, we want to learn a weight $\beta$ for each of them:
$$
\left(\beta_{\Phi_0}, \beta_{\Phi_1}, \ldots, \beta_{\Phi_P}\right)=\operatorname{att}_{sem}\left(\mathbf{Z}_{\Phi_0}, \mathbf{Z}_{\Phi_1}, \ldots, \mathbf{Z}_{\Phi_P}\right)
$$
To learn the importance of each meta-path, it first applies a nonlinear transformation (e.g., a one-layer MLP), then uses a semantic-level attention vector $\mathbf{q}$ to measure the importance of each semantic-specific embedding, averaged over all nodes:
$$
w_{\Phi_i}=\frac{1}{|\mathcal{V}|} \sum_{v \in \mathcal{V}} \mathbf{q}^{\mathrm{T}} \cdot \tanh \left(\mathbf{W} \cdot \mathbf{z}_v^{\Phi_i}+\mathbf{b}\right)
$$
The weight of each meta-path is then the softmax over these importances:
$$
\beta_{\Phi_i}=\frac{\exp \left(w_{\Phi_i}\right)}{\sum_{p=1}^P \exp \left(w_{\Phi_p}\right)}
$$
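These two formulas translate directly into a small module; $\mathbf{q}$, $\mathbf{W}$, and $\mathbf{b}$ are the learnable parameters from the equations, while the hidden size and class name are assumptions:

```python
import torch
import torch.nn as nn

class SemanticAttention(nn.Module):
    """Semantic-level attention: beta = softmax over meta-paths of w_{Phi_i}."""
    def __init__(self, dim: int, hidden: int = 128):  # hidden size is an assumption
        super().__init__()
        self.W = nn.Linear(dim, hidden)               # one-layer MLP: W z + b
        self.q = nn.Parameter(torch.empty(hidden))    # semantic attention vector q
        nn.init.normal_(self.q, std=0.1)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (P, N, dim), one semantic-specific embedding Z_{Phi_i} per meta-path.
        w = torch.tanh(self.W(z)) @ self.q            # (P, N): q^T tanh(W z + b)
        w = w.mean(dim=1)                             # (P,): average over all nodes
        return torch.softmax(w, dim=0)                # (P,): the weights beta
```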
Fuse these semantic-specific embeddings to obtain the final embedding:
$$
\mathbf{Z}=\sum_{i=1}^P \beta_{\Phi_i} \cdot \mathbf{Z}_{\Phi_i}
$$
where $\beta_{\Phi_i}$ is the learned weight of meta-path $\Phi_i$ and $\mathbf{Z}_{\Phi_i}$ is the corresponding semantic-specific embedding.
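Assuming the sketches above, the fusion and a full forward pass might look like this (the shapes, sparsity, and two meta-paths are illustrative):

```python
import torch

torch.manual_seed(0)
N, dim = 100, 64

proj = TypeSpecificProjection({"movie": 128}, dim)   # "movie"/128 are made up
node_att = NodeLevelAttention(dim)
sem_att = SemanticAttention(dim)

h = proj({"movie": torch.randn(N, 128)})["movie"]

# One 0/1 adjacency per meta-path (e.g. MAM, MDM); self-loops keep the softmax defined.
adjs = [((torch.rand(N, N) < 0.05) | torch.eye(N, dtype=torch.bool)).float()
        for _ in range(2)]
z = torch.stack([node_att(h, adj) for adj in adjs])  # (P, N, dim)

beta = sem_att(z)                                    # (P,)
Z = (beta.view(-1, 1, 1) * z).sum(dim=0)             # Z = sum_i beta_{Phi_i} Z_{Phi_i}
print(Z.shape)                                       # torch.Size([100, 64])
```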