A heterogeneous graph (HG) is also known as a heterogeneous information network (HIN).
A heterogeneous graph can be represented as a graph whose nodes and edges each carry a type.
- Meta-paths need domain knowledge to design.
- Different types of nodes/edges are assumed to share the same features.
- Or different types of nodes/edges keep distinct, non-shared weights.
- Ignore the dynamic nature of heterogeneous graphs.
- Incapable of modeling Web-scale (large) heterogeneous graphs.
- Node- and edge-type dependent attention mechanism.
- Does not parameterize each edge type separately.
- Instead uses the meta relation triplet \(\langle τ(s), φ(e), τ(t) \rangle\) of each edge \(e = (s, t)\), where $s$ is the source node and $t$ is the target node.
- Relative temporal encoding (RTE) strategy for dynamic graphs.
- HGSampling for Web-scale graph data.
- Graph: \(G = (\mathcal{V}, \mathcal{E}, \mathcal{A}, \mathcal{R})\)
- Node: \(v ∈ \mathcal{V}\), also written $s$ (source) and $t$ (target)
- Edge: \(e ∈ \mathcal{E}\)
- Node type: \(τ(v): \mathcal{V} → \mathcal{A}\)
- Edge type: \(φ(e): \mathcal{E} → \mathcal{R}\)
- Edge from source node $s$ to target node $t$: \(e = (s, t)\)
- Meta relation triplet of \(e = (s, t)\): \(\langle τ(s), φ(e), τ(t) \rangle\)
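The notation above maps onto a small data structure. The following is only an illustrative sketch: the class name `HeteroGraph`, its methods, and the example node and edge types are hypothetical and not taken from any library.

```python
from dataclasses import dataclass, field


@dataclass
class HeteroGraph:
    """Toy container for G = (V, E, A, R); every name here is illustrative."""
    node_type: dict = field(default_factory=dict)  # tau: node id -> type in A
    edge_type: dict = field(default_factory=dict)  # phi: (s, t) -> type in R

    def add_node(self, v, v_type):
        self.node_type[v] = v_type

    def add_edge(self, s, t, e_type):
        # An edge e = (s, t) links source node s to target node t.
        # Simplification: at most one edge per (s, t) pair.
        self.edge_type[(s, t)] = e_type

    def meta_relation(self, s, t):
        # Meta relation triplet <tau(s), phi(e), tau(t)> of the edge e = (s, t).
        return (self.node_type[s], self.edge_type[(s, t)], self.node_type[t])


g = HeteroGraph()
g.add_node("alice", "author")
g.add_node("p1", "paper")
g.add_node("kdd", "venue")
g.add_edge("alice", "p1", "writes")
g.add_edge("p1", "kdd", "published_at")

print(g.meta_relation("alice", "p1"))
print(g.meta_relation("p1", "kdd"))
```

Keying weights by the returned triplet is what lets a model treat an author-writes-paper edge differently from a paper-published_at-venue edge.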
Use the meta relations of the heterogeneous graph to parameterize the weight matrices for the heterogeneous mutual attention, message passing, and propagation steps.
Three steps:
- Heterogeneous Mutual Attention
- Input: embeddings of the source nodes \(s_1, s_2\) and the target node \(t\).
- Output: the attention matrix for \(φ(e)\).
- Heterogeneous Message Passing
- Output: the message for \(φ(e)\).
- Target-Specific Aggregation
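The three steps above can be sketched for a single attention head in plain Python. This is a minimal illustration under assumptions, not a reference implementation: the parameter names (`K`, `Q`, `M`, `A`, `W_att`, `W_msg`, `mu`), the ReLU activation, the residual connection, and the random toy weights are all choices made for this sketch.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(d, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(d)]


def hgt_layer(h, node_type, edges, params, d):
    """One single-head, HGT-style update (sketch).

    h: node -> embedding; edges: list of (source, edge_type, target);
    params: hypothetical dict of weight matrices keyed by node/edge type.
    """
    by_target = {}
    for s, r, t in edges:
        by_target.setdefault(t, []).append((s, r))

    new_h = dict(h)  # nodes without incoming edges keep their embedding
    for t, incoming in by_target.items():
        q = matvec(params["Q"][node_type[t]], h[t])  # target-type query
        scores, messages = [], []
        for s, r in incoming:
            # 1) Heterogeneous mutual attention: source-type key, then the
            #    edge-type matrix, scaled by the meta-relation prior mu.
            k = matvec(params["W_att"][r], matvec(params["K"][node_type[s]], h[s]))
            mu = params["mu"].get((node_type[s], r, node_type[t]), 1.0)
            scores.append(mu * sum(a * b for a, b in zip(k, q)) / math.sqrt(d))
            # 2) Heterogeneous message passing: source-type projection,
            #    then the edge-type message matrix.
            m = matvec(params["M"][node_type[s]], h[s])
            messages.append(matvec(params["W_msg"][r], m))
        # Softmax over all incoming neighbors of t.
        top = max(scores)
        exps = [math.exp(sc - top) for sc in scores]
        total = sum(exps)
        att = [e / total for e in exps]
        # 3) Target-specific aggregation: attention-weighted sum, activation
        #    (ReLU assumed), target-type linear map, residual connection.
        agg = [sum(a * m[i] for a, m in zip(att, messages)) for i in range(d)]
        act = [max(0.0, x) for x in agg]
        out = matvec(params["A"][node_type[t]], act)
        new_h[t] = [o + x for o, x in zip(out, h[t])]
    return new_h


rng = random.Random(0)
d = 4
node_type = {"a1": "author", "a2": "author", "p1": "paper"}
edges = [("a1", "writes", "p1"), ("a2", "writes", "p1")]  # s_1, s_2 -> t
h = {v: [rng.uniform(-1, 1) for _ in range(d)] for v in node_type}
params = {name: {ty: rand_matrix(d, rng) for ty in ("author", "paper")}
          for name in ("K", "Q", "M", "A")}
params["W_att"] = {"writes": rand_matrix(d, rng)}
params["W_msg"] = {"writes": rand_matrix(d, rng)}
params["mu"] = {("author", "writes", "paper"): 1.0}

new_h = hgt_layer(h, node_type, edges, params, d)
print(new_h["p1"])
```

Only the target node `p1` changes here, because the toy graph has no edges pointing at the two authors.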
GAT:
- Attention
- Estimates the importance of each source node.
- Message
- Extracts the message by using only the source node.
- Aggregate
- Aggregates the neighborhood messages, weighted by the attention.
Transformer: query, key, and value projections \(W_q, W_k, W_v\)
HGT:
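- Attention, edge dependent: \(W^{ATT}_{φ(e)}\)
- Prior for each meta relation triplet: \(μ_{\langle τ(s), φ(e), τ(t) \rangle}\)
- Message, edge dependent: \(W^{MSG}_{φ(e)}\)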
- Incorporate the meta relations of edges into the message passing process to alleviate the distribution differences of nodes and edges of different types.
- \(\text{A-Linear}_{τ(t)}\) maps the target node $t$ back to its type-specific distribution and updates the embedding of the \(l\)-th HGT layer.
- HGSampling: keep a similar number of nodes and edges for each type, and keep the sampled sub-graph dense to minimize the information loss and reduce the sample variance.
- RTE: \(ΔT(s, t) = T(s) - T(t)\), the relative time gap between the timestamps of $s$ and $t$.
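The relative time gap above can be turned into an encoding that is added to the source embedding. The notes only define \(ΔT\); the Transformer-style sinusoidal basis, the linear map `t_linear`, and all function names below are assumptions of this sketch.

```python
import math


def relative_time_gap(T, s, t):
    # Delta T(s, t) = T(s) - T(t), following the definition in these notes.
    return T[s] - T[t]


def sinusoid_basis(delta_t, d):
    """Fixed sinusoidal basis of size d for a time gap (assumed form)."""
    out = []
    for j in range(d):
        angle = delta_t / (10000 ** (j / d))
        # Even positions use sine, odd positions use cosine.
        out.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
    return out


def rte(delta_t, t_linear):
    """Project the basis with a (hypothetical) learnable matrix t_linear."""
    base = sinusoid_basis(delta_t, len(t_linear[0]))
    return [sum(w * b for w, b in zip(row, base)) for row in t_linear]


# Toy timestamps (publication years) for a source and a target paper.
T = {"p_old": 2015, "p_new": 2019}
gap = relative_time_gap(T, "p_old", "p_new")  # 2015 - 2019

d = 4
identity = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
h_s = [0.5, -0.5, 0.25, 0.0]

# The temporal encoding is added to the source node embedding.
h_s_hat = [x + r for x, r in zip(h_s, rte(gap, identity))]
print(gap, h_s_hat)
```

Because the encoding depends only on the gap, the same function covers time differences that never appeared together during training.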
- OAG
- All
- Computer Science (CS)
- Medicine (Med)
- Graph Convolutional Networks (GCN)
- Graph Attention Networks (GAT)
- Relational Graph Convolutional Networks (R-GCN)
- Keep a different weight for each relationship (edge).
- \(h_i^{(l+1)} = σ\left(\sum_{r ∈ \mathcal{R}} \sum_{j ∈ \mathcal{N}_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)}\right)\)
- Heterogeneous Graph Neural Networks (HetGNN)
- Adopts different Bi-LSTMs for different node types to aggregate neighbor information.
- Heterogeneous Graph Attention Networks (HAN)
- Uses hierarchical attention to aggregate neighbors via meta-paths.
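Among the baselines above, the R-GCN update (one weight matrix per relation plus a self-loop weight) is simple enough to sketch in plain Python. This is an illustration under assumptions: the normalizer is taken as \(c_{i,r} = |\mathcal{N}_i^r|\), the activation is ReLU, and the function and variable names are hypothetical.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rgcn_layer(h, neighbors, W_rel, W_self):
    """One R-GCN-style layer (sketch).

    h: node -> vector
    neighbors: node -> {relation: [neighbor nodes]}
    W_rel: relation -> weight matrix W_r; W_self: self-loop matrix W_0
    """
    out = {}
    for i, h_i in h.items():
        total = matvec(W_self, h_i)  # W_0 h_i
        for r, nbrs in neighbors.get(i, {}).items():
            if not nbrs:
                continue
            c = len(nbrs)  # c_{i,r} = |N_i^r| (assumed normalizer)
            for j in nbrs:
                msg = matvec(W_rel[r], h[j])  # W_r h_j
                total = [a + m / c for a, m in zip(total, msg)]
        out[i] = [max(0.0, x) for x in total]  # ReLU as sigma
    return out


identity = [[1.0, 0.0], [0.0, 1.0]]
h = {"p1": [1.0, 0.0], "a1": [0.0, 2.0], "a2": [0.0, 4.0]}
neighbors = {"p1": {"written_by": ["a1", "a2"]}}

out = rgcn_layer(h, neighbors, {"written_by": identity}, identity)
print(out["p1"])  # self term [1, 0] plus the mean of [0, 2] and [0, 4]
```

With identity weights, `p1` ends up as its own embedding plus the average of its two `written_by` neighbors, which makes the per-relation normalization easy to check by hand.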
- Generate heterogeneous graphs
- e.g., predict new papers and their titles
- Pre-train HGT to benefit tasks with scarce labels
- Downstream Tasks