Heterogeneous Graph Transformer

1 Introduction

1.1 Introduction

A heterogeneous graph (HG) is also known as a heterogeneous information network (HIN).

A heterogeneous graph can be represented as \(\mathcal{G} = (\mathcal{V}, \mathcal{E})\), where each node in \(\mathcal{V}\) and each edge in \(\mathcal{E}\) has its own type, drawn from the type sets \(Γ_v\) and \(Γ_e\). A heterogeneous graph has two mapping functions: \(φ_v: \mathcal{V} → Γ_v\) for node types and \(φ_e: \mathcal{E} → Γ_e\) for edge types.

./p1.png

1.2 Problem

  • Meta-path design requires domain knowledge.
  • Prior methods either assume that different types of nodes/edges share the same feature space,
  • or keep distinct, non-shared weights for each type alone, missing interactions across types.
  • They ignore the dynamic nature of heterogeneous graphs.
  • They are incapable of modeling Web-scale (large) heterogeneous graphs.

1.3 Heterogeneous graph transformer (HGT)

  • Node- and edge-type dependent attention mechanism.
    • Does not parameterize each type of edge individually;
    • instead uses the meta relation triplet \(<τ(s), φ(e), τ(t)>\) of an edge \(e = (s, t)\), where \(s\) is the source node and \(t\) is the target node.
  • Relative temporal encoding (RTE) strategy for dynamic graphs.
  • HGSampling for Web-scale (large) graph data.

2 Method

2.1 Symbols

Graph
\(G = (\mathcal{V}, \mathcal{E}, \mathcal{A}, \mathcal{R})\)
Node
\(v ∈ \mathcal{V}\), also $s,t$
Edge
\(e ∈ \mathcal{E}\)
Node Type
\(τ(v): \mathcal{V} → \mathcal{A}\)
Edge Type
\(φ(e): \mathcal{E} → \mathcal{R}\)
edge, source node, target node
\(e = (s, t)\)
meta relation triplet
\(<τ(s),φ(e),τ(t)>\)
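The symbols above can be made concrete with a minimal sketch in plain Python (the node/edge types and names are toy examples, not from the paper):

```python
# Toy node type map tau(v): V -> A, with A = {"author", "paper"}
node_type = {"n1": "author", "n2": "paper", "n3": "paper"}

# Each edge e = (s, t) carries its type phi(e): E -> R, R = {"writes", "cites"}
edges = [("n1", "n2", "writes"), ("n2", "n3", "cites")]

def meta_relation(s, t, edge_type):
    """Meta relation triplet <tau(s), phi(e), tau(t)> of an edge e = (s, t)."""
    return (node_type[s], edge_type, node_type[t])

triplets = [meta_relation(s, t, r) for s, t, r in edges]
print(triplets)  # [('author', 'writes', 'paper'), ('paper', 'cites', 'paper')]
```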

2.2 Method

Use the meta relations of the heterogeneous graph to parameterize the weight matrices for the heterogeneous mutual attention, message passing, and aggregation steps.

Three steps:

  • Heterogeneous Mutual Attention
    • input: embeddings of \(s_1, s_2, t\)
    • output: attention weights for each edge type \(φ(e)\)
  • Heterogeneous Message Passing
    • output: a message for each edge type \(φ(e)\)
  • Target-Specific Aggregation

2.3 Method

./p2.png

2.4 Heterogeneous Mutual Attention

GAT:

./p3-1.png

./p3-2.png

Attention
Estimates the importance of each source node.
Message
Extracts the message using only the source node.
Aggregate
Aggregates the neighborhood messages weighted by the attention.

2.5 Heterogeneous Mutual Attention

Transformer: \(W_q,W_k,W_v\)

HGT:

./p4.png

  • \(W^{ATT}_{φ(e)}\): edge-type dependent attention weight matrix
  • \(μ_{<τ(s),φ(e),τ(t)>}\): prior tensor scaling the attention of each meta relation

2.6 Message passing

./p5.png

  • Edge-type dependent: \(W^{MSG}_{φ(e)}\)
  • Incorporating the meta relations of edges into the message passing process alleviates the distribution differences between nodes and edges of different types.
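The message step can be sketched the same way (single head, random toy weights; names are illustrative):

```python
import numpy as np

d = 8
rng = np.random.default_rng(1)

# Source-type-specific projection M-Linear_{tau(s)} and the
# edge-type dependent matrix W^MSG_{phi(e)} (toy weights).
M_lin = {"author": rng.normal(size=(d, d))}
W_msg = {"writes": rng.normal(size=(d, d))}

def message(h_s, s_type, e_type):
    """Single-head message: M-Linear_{tau(s)}(h_s) @ W^MSG_{phi(e)}."""
    return (h_s @ M_lin[s_type]) @ W_msg[e_type]

h_s = rng.normal(size=d)
msg = message(h_s, "author", "writes")
print(msg.shape)  # (8,)
```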

2.7 Target-Specific Aggregation

./p6.png

./p6-2.png

  • \(\text{A-Linear}_{τ(t)}\) maps the aggregated vector back to the type-specific distribution of the target node \(t\) and updates its embedding in the \(l\)-th HGT layer.
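A sketch of the aggregation step under the same toy setup (the paper applies a target-type-specific A-Linear after the activation and adds a residual connection; ReLU stands in for the activation here):

```python
import numpy as np

d = 8
rng = np.random.default_rng(2)

A_lin = {"paper": rng.normal(size=(d, d))}  # A-Linear_{tau(t)}, toy weights

def aggregate(h_t_prev, attn, msgs, t_type):
    """Attention-weighted sum of messages, activation,
    type-specific projection, then residual to layer l-1."""
    agg = sum(a * m for a, m in zip(attn, msgs))
    agg = np.maximum(agg, 0.0)               # activation (ReLU stand-in)
    return (agg @ A_lin[t_type]) + h_t_prev  # residual connection

attn = np.array([0.2, 0.3, 0.5])
msgs = [rng.normal(size=d) for _ in range(3)]
h_prev = rng.normal(size=d)
h_new = aggregate(h_prev, attn, msgs, "paper")
print(h_new.shape)  # (8,)
```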

2.8 HGSampling

./p7.png

  • Keep a similar number of nodes and edges for each type, and keep the sampled sub-graph dense, to minimize information loss and reduce sampling variance.
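A toy sketch of the budgeted per-type sampling idea (the paper's HGSampling additionally assigns an importance-based probability to each budgeted node to reduce variance; here we sample uniformly for brevity, and all names are illustrative):

```python
import random

def hg_sample(adj, node_types, seeds, budget_per_type=2, depth=2, seed=0):
    """At each hop, add at most `budget_per_type` new nodes of each type,
    so the sampled sub-graph keeps a balanced number of nodes per type."""
    random.seed(seed)
    sampled, frontier = set(seeds), set(seeds)
    for _ in range(depth):
        by_type = {}                      # candidate neighbors grouped by type
        for v in frontier:
            for u in adj.get(v, []):
                if u not in sampled:
                    by_type.setdefault(node_types[u], set()).add(u)
        frontier = set()
        for t, cands in by_type.items():
            k = min(budget_per_type, len(cands))
            frontier.update(random.sample(sorted(cands), k))
        sampled |= frontier
    return sampled

adj = {"a1": ["p1", "p2", "p3"], "p1": ["v1"], "p2": [], "p3": []}
node_types = {"a1": "author", "p1": "paper", "p2": "paper",
              "p3": "paper", "v1": "venue"}
sub = hg_sample(adj, node_types, ["a1"], budget_per_type=2, depth=2)
print(sub)  # seed plus at most 2 papers per hop (and possibly one venue)
```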

2.9 Relative Temporal Encoding (RTE)

./p8.png

2.10 Relative Temporal Encoding (RTE)

./p8-2.png

  • \(Δ T(s, t) = T(s) - T(t)\)
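A simplified sketch of the encoding of this time gap (sinusoidal position encoding of \(ΔT\); the paper additionally passes it through a learnable T-Linear projection before adding it to the source node representation):

```python
import numpy as np

def rte(delta_t, d=8):
    """Sinusoidal encoding of the relative time gap Delta T(s, t) = T(s) - T(t)."""
    i = np.arange(d // 2)
    freqs = delta_t / (10000.0 ** (2 * i / d))
    enc = np.empty(d)
    enc[0::2] = np.sin(freqs)  # even dims: sine
    enc[1::2] = np.cos(freqs)  # odd dims: cosine
    return enc

# Added to the source node's representation before attention:
# H[s] <- H[s] + RTE(T(s) - T(t))
print(rte(3.0).shape)  # (8,)
```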

3 Experiments

3.1 OAG Data

./p9.png

3.2 OAG Data

  • OAG
    • All
    • Computer Science (CS)
    • Medicine (Med)

3.3 Baseline models

  • Graph Convolutional Networks (GCN)
  • Graph Attention Networks (GAT)
  • Relational Graph Convolutional Networks (RGCN)
    • Keep a separate weight matrix for each relation (edge type).
    • \(h_i^{(l+1)} = σ\left( ∑_{r ∈ \mathcal{R}} ∑_{j ∈ \mathcal{N}_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)} \right)\)
  • Heterogeneous Graph Neural Networks (HetGNN)
    • Adopt different BiLSTMs for each node type to aggregate neighbor information
  • Heterogeneous Graph Attention Networks (HAN)
    • Hierarchical attention to aggregate neighbor information via meta-paths
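The RGCN update above can be sketched in NumPy (toy identity weights; `rgcn_layer` is an illustrative name):

```python
import numpy as np

def rgcn_layer(h, edges_by_rel, W_rel, W0):
    """One RGCN layer:
    h_i^{(l+1)} = sigma( sum_r sum_{j in N_i^r} (1/c_{i,r}) W_r h_j + W_0 h_i ).
    h: (N, d) node features; edges_by_rel: {relation: [(src_j, dst_i), ...]}."""
    out = h @ W0.T
    for r, edge_list in edges_by_rel.items():
        c = {}                              # neighbor count c_{i,r} per target i
        for j, i in edge_list:
            c[i] = c.get(i, 0) + 1
        for j, i in edge_list:
            out[i] += (h[j] @ W_rel[r].T) / c[i]
    return np.maximum(out, 0.0)             # ReLU as sigma

# Toy check with identity weights: node 1 receives node 0's features.
h = np.array([[1.0, 0.0], [0.0, 1.0]])
out = rgcn_layer(h, {"cites": [(0, 1)]}, {"cites": np.eye(2)}, np.eye(2))
print(out)  # [[1. 0.] [1. 1.]]
```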

3.4 Results

./p10.png

4 Future Work

4.1 Future Work

  • Generate heterogeneous graphs
    • e.g., predict new papers and their titles
  • Pre-train HGT to benefit downstream tasks with scarce labels