Heterogeneous Graph Transformer

1 Introduction

1.1 Introduction

A heterogeneous graph (HG) is also known as a heterogeneous information network (HIN).

A heterogeneous graph can be represented as \(\mathcal{G} = (\mathcal{V}, \mathcal{E})\), where each node in \(\mathcal{V}\) and each edge in \(\mathcal{E}\) has its own type, drawn from the type sets \(Γ_v\) and \(Γ_e\). A heterogeneous graph has two mapping functions: \(φ_v: \mathcal{V} → Γ_v\) for node types and \(φ_e: \mathcal{E} → Γ_e\) for edge types.

./p1.png

1.2 Problem

  • Meta-path design requires domain knowledge.
  • Prior methods either assume that different types of nodes/edges share the same feature space,
  • or keep distinct, non-shared weights for each type alone, missing interactions across types.
  • They ignore the dynamic nature of heterogeneous graphs.
  • They are incapable of modeling Web-scale (large) heterogeneous graphs.

1.3 Heterogeneous graph transformer (HGT)

  • Node- and edge-type dependent attention mechanism.
    • Does not parameterize each type of edge individually;
    • instead uses the meta relation triplet \(<τ(s), φ(e), τ(t)>\) of an edge \(e = (s, t)\), where \(s\) is the source node and \(t\) is the target node.
  • Relative temporal encoding (RTE) strategy for dynamic graphs.
  • HGSampling for Web-scale (large) graph data.

2 Method

2.1 Symbols

Graph
\(G = (\mathcal{V}, \mathcal{E}, \mathcal{A}, \mathcal{R})\)
Node
\(v ∈ \mathcal{V}\), also $s,t$
Edge
\(e ∈ \mathcal{E}\)
Node Type
\(τ(v): \mathcal{V} → \mathcal{A}\)
Edge Type
\(φ(e): \mathcal{E} → \mathcal{R}\)
edge, source node, target node
\(e = (s, t)\)
meta relation triplet
\(<τ(s),φ(e),τ(t)>\)
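The symbols above can be made concrete with a minimal sketch in plain Python (the node/edge types and names are toy examples, not from the paper):

```python
# Toy node type map tau(v): V -> A, with A = {"author", "paper"}
node_type = {"n1": "author", "n2": "paper", "n3": "paper"}

# Each edge e = (s, t) carries its type phi(e): E -> R, R = {"writes", "cites"}
edges = [("n1", "n2", "writes"), ("n2", "n3", "cites")]

def meta_relation(s, t, edge_type):
    """Meta relation triplet <tau(s), phi(e), tau(t)> of an edge e = (s, t)."""
    return (node_type[s], edge_type, node_type[t])

triplets = [meta_relation(s, t, r) for s, t, r in edges]
print(triplets)  # [('author', 'writes', 'paper'), ('paper', 'cites', 'paper')]
```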

2.2 Method

Use the meta relations of the heterogeneous graph to parameterize the weight matrices for the heterogeneous mutual attention, message passing, and aggregation steps.

Three steps:

  • Heterogeneous Mutual Attention
    • input: embeddings of \(s_1, s_2, t\)
    • output: attention weights for each edge type \(φ(e)\)
  • Heterogeneous Message Passing
    • output: a message for each edge type \(φ(e)\)
  • Target-Specific Aggregation

2.3 Method

./p2.png

2.4 Heterogeneous Mutual Attention

GAT:

./p3-1.png

./p3-2.png

Attention
Estimates the importance of each source node.
Message
Extracts the message using only the source node.
Aggregate
Aggregates the neighborhood messages weighted by the attention.

2.5 Heterogeneous Mutual Attention

Transformer: \(W_q,W_k,W_v\)

HGT:

./p4.png

  • \(W^{ATT}_{φ(e)}\): edge-type dependent attention weight matrix
  • \(μ_{<τ(s),φ(e),τ(t)>}\): prior tensor scaling the attention of each meta relation

2.6 Message passing

./p5.png

  • Edge-type dependent: \(W^{MSG}_{φ(e)}\)
  • Incorporating the meta relations of edges into the message passing process alleviates the distribution differences between nodes and edges of different types.
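The message step can be sketched the same way (single head, random toy weights; names are illustrative):

```python
import numpy as np

d = 8
rng = np.random.default_rng(1)

# Source-type-specific projection M-Linear_{tau(s)} and the
# edge-type dependent matrix W^MSG_{phi(e)} (toy weights).
M_lin = {"author": rng.normal(size=(d, d))}
W_msg = {"writes": rng.normal(size=(d, d))}

def message(h_s, s_type, e_type):
    """Single-head message: M-Linear_{tau(s)}(h_s) @ W^MSG_{phi(e)}."""
    return (h_s @ M_lin[s_type]) @ W_msg[e_type]

h_s = rng.normal(size=d)
msg = message(h_s, "author", "writes")
print(msg.shape)  # (8,)
```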

2.7 Target-Specific Aggregation

./p6.png

./p6-2.png

  • \(\text{A-Linear}_{τ(t)}\) maps the aggregated vector back to the type-specific distribution of the target node \(t\) and updates its embedding in the \(l\)-th HGT layer.
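A sketch of the aggregation step under the same toy setup (the paper applies a target-type-specific A-Linear after the activation and adds a residual connection; ReLU stands in for the activation here):

```python
import numpy as np

d = 8
rng = np.random.default_rng(2)

A_lin = {"paper": rng.normal(size=(d, d))}  # A-Linear_{tau(t)}, toy weights

def aggregate(h_t_prev, attn, msgs, t_type):
    """Attention-weighted sum of messages, activation,
    type-specific projection, then residual to layer l-1."""
    agg = sum(a * m for a, m in zip(attn, msgs))
    agg = np.maximum(agg, 0.0)               # activation (ReLU stand-in)
    return (agg @ A_lin[t_type]) + h_t_prev  # residual connection

attn = np.array([0.2, 0.3, 0.5])
msgs = [rng.normal(size=d) for _ in range(3)]
h_prev = rng.normal(size=d)
h_new = aggregate(h_prev, attn, msgs, "paper")
print(h_new.shape)  # (8,)
```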

2.8 HGSampling

./p7.png

  • Keep a similar number of nodes and edges for each type, and keep the sampled sub-graph dense, to minimize information loss and reduce sampling variance.
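A toy sketch of the budgeted per-type sampling idea (the paper's HGSampling additionally assigns an importance-based probability to each budgeted node to reduce variance; here we sample uniformly for brevity, and all names are illustrative):

```python
import random

def hg_sample(adj, node_types, seeds, budget_per_type=2, depth=2, seed=0):
    """At each hop, add at most `budget_per_type` new nodes of each type,
    so the sampled sub-graph keeps a balanced number of nodes per type."""
    random.seed(seed)
    sampled, frontier = set(seeds), set(seeds)
    for _ in range(depth):
        by_type = {}                      # candidate neighbors grouped by type
        for v in frontier:
            for u in adj.get(v, []):
                if u not in sampled:
                    by_type.setdefault(node_types[u], set()).add(u)
        frontier = set()
        for t, cands in by_type.items():
            k = min(budget_per_type, len(cands))
            frontier.update(random.sample(sorted(cands), k))
        sampled |= frontier
    return sampled

adj = {"a1": ["p1", "p2", "p3"], "p1": ["v1"], "p2": [], "p3": []}
node_types = {"a1": "author", "p1": "paper", "p2": "paper",
              "p3": "paper", "v1": "venue"}
sub = hg_sample(adj, node_types, ["a1"], budget_per_type=2, depth=2)
print(sub)  # seed plus at most 2 papers per hop (and possibly one venue)
```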

2.9 Relative Temporal Encoding (RTE)

./p8.png

2.10 Relative Temporal Encoding (RTE)

./p8-2.png

  • \(Δ T(s, t) = T(s) - T(t)\)
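A simplified sketch of the encoding of this time gap (sinusoidal position encoding of \(ΔT\); the paper additionally passes it through a learnable T-Linear projection before adding it to the source node representation):

```python
import numpy as np

def rte(delta_t, d=8):
    """Sinusoidal encoding of the relative time gap Delta T(s, t) = T(s) - T(t)."""
    i = np.arange(d // 2)
    freqs = delta_t / (10000.0 ** (2 * i / d))
    enc = np.empty(d)
    enc[0::2] = np.sin(freqs)  # even dims: sine
    enc[1::2] = np.cos(freqs)  # odd dims: cosine
    return enc

# Added to the source node's representation before attention:
# H[s] <- H[s] + RTE(T(s) - T(t))
print(rte(3.0).shape)  # (8,)
```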

3 Experiments

3.1 OAG Data

./p9.png

3.2 OAG Data

  • OAG
    • All
    • Computer Science (CS)
    • Medicine (Med)

3.3 Baseline models

  • Graph Convolutional Networks (GCN)
  • Graph Attention Networks (GAT)
  • Relational Graph Convolutional Networks (RGCN)
    • Keep a separate weight matrix for each relation (edge type).
    • \(h_i^{(l+1)} = σ\left( ∑_{r ∈ \mathcal{R}} ∑_{j ∈ \mathcal{N}_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)} \right)\)
  • Heterogeneous Graph Neural Networks (HetGNN)
    • Adopt different BiLSTMs for each node type to aggregate neighbor information
  • Heterogeneous Graph Attention Networks (HAN)
    • Hierarchical attention to aggregate neighbor information via meta-paths
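The RGCN update above can be sketched in NumPy (toy identity weights; `rgcn_layer` is an illustrative name):

```python
import numpy as np

def rgcn_layer(h, edges_by_rel, W_rel, W0):
    """One RGCN layer:
    h_i^{(l+1)} = sigma( sum_r sum_{j in N_i^r} (1/c_{i,r}) W_r h_j + W_0 h_i ).
    h: (N, d) node features; edges_by_rel: {relation: [(src_j, dst_i), ...]}."""
    out = h @ W0.T
    for r, edge_list in edges_by_rel.items():
        c = {}                              # neighbor count c_{i,r} per target i
        for j, i in edge_list:
            c[i] = c.get(i, 0) + 1
        for j, i in edge_list:
            out[i] += (h[j] @ W_rel[r].T) / c[i]
    return np.maximum(out, 0.0)             # ReLU as sigma

# Toy check with identity weights: node 1 receives node 0's features.
h = np.array([[1.0, 0.0], [0.0, 1.0]])
out = rgcn_layer(h, {"cites": [(0, 1)]}, {"cites": np.eye(2)}, np.eye(2))
print(out)  # [[1. 0.] [1. 1.]]
```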

3.4 Results

./p10.png

4 Future Work

4.1 Future Work

  • Generate heterogeneous graphs
    • e.g., predict new papers and their titles
  • Pre-train HGT to benefit downstream tasks with scarce labels