1. What Is a Graph Neural Network?
A graph neural network (GNN) is a deep learning model that operates directly on graphs: data made of nodes (entities) joined by edges (relationships). Where a convolutional network assumes a regular grid of pixels and a recurrent network assumes an ordered sequence, a GNN makes neither assumption. It works on irregular structures in which each node may have a different number of neighbours and there is no natural ordering to exploit.
This matters because so much real-world data is naturally a graph: social networks, molecules, knowledge graphs, road systems, payment networks, and the web itself. A GNN learns a vector representation, called an embedding, for each node, edge, or entire graph. That embedding blends two kinds of information at once: a node's own features and the structure of the neighbourhood around it. The embeddings then feed an ordinary predictor for the task at hand.
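As a minimal sketch of the raw input a GNN consumes, the toy graph below is stored as an edge list plus one feature vector per node. The node names, the feature values, and the helper `build_adjacency` are illustrative assumptions, not part of any library.

```python
from collections import defaultdict

# Hypothetical toy graph: four users and their (undirected) friendships.
edges = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("bob", "dan")]

# One feature vector per node (made-up numbers).
features = {
    "ann": [1.0, 0.2],
    "bob": [0.0, 0.9],
    "cat": [1.0, 0.5],
    "dan": [0.0, 0.1],
}

def build_adjacency(edge_list):
    """Turn an undirected edge list into a node -> sorted neighbour list map."""
    adjacency = defaultdict(set)
    for u, v in edge_list:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return {node: sorted(neighbours) for node, neighbours in adjacency.items()}

adjacency = build_adjacency(edges)
degrees = {node: len(neighbours) for node, neighbours in adjacency.items()}
```

The nodes end up with different degrees (from 1 to 3), which is exactly the irregularity that a grid or sequence model cannot represent directly.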
The concept dates back to Gori et al. (2005) and Scarselli et al. (2009), who introduced "the graph neural network model." It stayed a niche idea until the deep-learning wave: between 2016 and 2019 a cluster of architectures, namely GCN, GraphSAGE, GAT, and GIN, made GNNs both scalable and accurate, and they are now a standard part of the machine-learning toolkit. For the linear-algebra roots of one important branch, see our companion article on spectral graph theory in machine learning.
2. The Core Idea: Message Passing
Almost every modern GNN follows a single unifying recipe known as message passing, formalised by Gilmer et al. (2017). The intuition is refreshingly simple. Every node begins with a feature vector. Then, in each layer, a node performs three steps: it collects messages from its neighbours, aggregates them into one summary, and updates its own vector using that summary together with its previous value.
The aggregation step has one hard requirement: it must be permutation invariant. Because a graph has no inherent order, the result cannot depend on the sequence in which the neighbours happen to be listed. The usual choices are sum, mean, or max. The update step is typically a small neural network, a learned linear transformation followed by a non-linearity. Critically, every node in a layer shares the same weights, exactly as a CNN reuses one filter across an image. This weight sharing is what lets a single trained GNN generalise across all the nodes of a graph, and even to graphs it has never seen.
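The permutation-invariance requirement can be checked directly. The sketch below, using made-up neighbour messages and a hypothetical `aggregate` helper, shows that sum, mean, and max give the same result when the neighbours are listed in a different order, while plain concatenation does not.

```python
def aggregate(messages, how="sum"):
    """Combine a list of equal-length neighbour vectors element-wise."""
    columns = list(zip(*messages))  # group values by feature dimension
    if how == "sum":
        return [sum(col) for col in columns]
    if how == "mean":
        return [sum(col) / len(col) for col in columns]
    if how == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unknown aggregator: {how}")

def concatenate(messages):
    """Order-dependent combination, shown only as a counter-example."""
    return [x for message in messages for x in message]

# Made-up messages from three neighbours.
messages = [[1.0, 4.0], [2.0, 0.5], [3.0, 1.5]]
reordered = list(reversed(messages))  # same neighbours, different listing order

invariant = {
    how: aggregate(messages, how) == aggregate(reordered, how)
    for how in ("sum", "mean", "max")
}
concat_is_invariant = concatenate(messages) == concatenate(reordered)
```

Here every entry of `invariant` is `True` while `concat_is_invariant` is `False`, which is why concatenating neighbours in list order is not a valid aggregator.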
This neighbour-then-update rhythm is the entire engine. A useful mental picture is gossip spreading through a crowd: in each round everyone refines what they know from the people standing next to them, and after a few rounds news from far away has reached the whole room. A GNN simply makes that update step learnable, so the network discovers for itself which signals from the neighbourhood actually matter for the task at hand.
3. Inside a GNN Layer (and How Deep to Go)
Formally, a single layer computes a new embedding for every node v as:
h_v⁽ᵏ⁾ = UPDATE( h_v⁽ᵏ⁻¹⁾, AGGREGATE{ h_u⁽ᵏ⁻¹⁾ : u ∈ N(v) } )
Here N(v) is the set of neighbours of v, and h_v⁽ᵏ⁾ is the embedding of v after k layers. The number of layers controls the receptive field. After one layer, a node has heard only from its immediate neighbours. After two layers it has indirectly heard from the neighbours of its neighbours, because those neighbours had themselves updated from their neighbours in the previous round. In general, k layers let information travel k hops across the graph.
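To make the k-hop claim concrete, here is a stripped-down sketch of the layer equation on scalar embeddings. It assumes a sum for AGGREGATE and a ReLU of a weighted combination for UPDATE; the function `gnn_layer` and the fixed weights are illustrative stand-ins for parameters that a real GNN would learn.

```python
def relu(x):
    return max(0.0, x)

def gnn_layer(h, neighbours, w_self=1.0, w_neigh=1.0):
    """One message-passing layer on scalar embeddings.

    AGGREGATE is a sum over neighbours; UPDATE is ReLU(w_self * h_v + w_neigh * agg).
    The two weights are shared by every node (fixed here, learned in a real GNN).
    """
    new_h = {}
    for v, h_v in h.items():
        agg = sum(h[u] for u in neighbours[v])          # AGGREGATE over N(v)
        new_h[v] = relu(w_self * h_v + w_neigh * agg)   # UPDATE from old values only
    return new_h

# Path graph 0 - 1 - 2 - 3 - 4 with a signal only on node 0.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
h = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}

reached = []  # nodes with a non-zero embedding after each layer
for k in range(1, 4):
    h = gnn_layer(h, neighbours)
    reached.append([v for v in h if h[v] > 0])
```

After one, two, and three layers the signal from node 0 has reached nodes up to 1, 2, and 3 hops away respectively, while node 4 (four hops away) is still untouched.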
This might suggest that deeper is always better, but GNNs are unusual here. Stacking too many layers causes oversmoothing: every node's embedding drifts toward the same value, and the network loses its ability to tell nodes apart. In practice, two to four layers are common, and choosing the depth is a genuine design decision rather than a simple matter of adding capacity.
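Oversmoothing is easy to reproduce in a toy setting. The sketch below deliberately removes the learned weights and non-linearity and keeps only repeated mean aggregation (each node averaged with its neighbours), which is an assumption made to isolate the averaging effect, not a model of any specific architecture.

```python
def mean_layer(h, neighbours):
    """One weight-free layer: each node becomes the mean of itself and its neighbours."""
    return {
        v: (h[v] + sum(h[u] for u in neighbours[v])) / (1 + len(neighbours[v]))
        for v in h
    }

# Path graph 0 - 1 - 2 - 3 with very different starting embeddings.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
h = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0}

spread_after = {}  # layer count -> gap between largest and smallest embedding
for layer in range(1, 31):
    h = mean_layer(h, neighbours)
    spread_after[layer] = max(h.values()) - min(h.values())
```

The gap between the most and least extreme node shrinks with every layer; after 30 layers the four embeddings are nearly identical, so nothing downstream could tell the nodes apart.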
4. Four Architectures That Shaped the Field
Four models defined the modern era of GNNs. They share the message-passing skeleton and differ mainly in how they aggregate.
- GCN (Kipf & Welling, 2017) aggregates with a degree-normalised average over each node's neighbours and the node itself. It is simple, fast, and a remarkably strong baseline, and it grew directly out of spectral graph theory.
- GraphSAGE (Hamilton, Ying & Leskovec, 2017) samples a fixed number of neighbours and supports mean, max, or LSTM aggregators. Its key contribution is inductive learning: it generalises to nodes and graphs never seen during training, which is essential at industrial scale.
- GAT (Veličković et al., 2018) introduces attention, learning a weight for each neighbour so the model can focus on the most relevant ones instead of treating every neighbour equally.
- GIN (Xu et al., 2019) was designed to be maximally expressive within the message-passing family. The authors showed that a message-passing GNN's power to distinguish graphs is bounded by the classic Weisfeiler-Lehman (1-WL) isomorphism test, and that GIN's injective sum-based aggregation reaches that bound.
These four are best seen as starting points rather than a closed list. Dozens of later variants add residual connections, gating, edge features, or smarter sampling, yet almost all of them keep the same message-passing core. Understanding GCN, GraphSAGE, GAT, and GIN therefore gives you the vocabulary to read and reason about nearly any modern GNN paper.
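The choice of aggregator affects what a model can tell apart. As a small illustration of the argument behind GIN, the sketch below compares two neighbourhoods whose members all carry the same scalar feature. The helper `summarise` is hypothetical, and a sum over raw features is not injective in general, which is why GIN combines the sum with a learned MLP.

```python
def summarise(neighbour_features):
    """Return the sum, mean, and max of a multiset of scalar neighbour features."""
    total = sum(neighbour_features)
    return {
        "sum": total,
        "mean": total / len(neighbour_features),
        "max": max(neighbour_features),
    }

# Two different neighbourhoods: two identical neighbours versus three.
two_neighbours = summarise([1.0, 1.0])
three_neighbours = summarise([1.0, 1.0, 1.0])

distinguishes = {
    name: two_neighbours[name] != three_neighbours[name]
    for name in ("sum", "mean", "max")
}
```

Only the sum separates the two cases here; mean and max collapse them to the same value, so a model built on those aggregators cannot see the difference in neighbourhood size.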
At a Glance: Four Key GNN Architectures
| Model | Aggregation | Key idea | Best for |
|---|---|---|---|
| GCN (2017) | Normalised mean | Simple, spectral-rooted baseline | A fast, strong starting point |
| GraphSAGE (2017) | Sample + mean / max / LSTM | Inductive: works on unseen nodes | Large, growing graphs |
| GAT (2018) | Attention-weighted | Learns which neighbours matter | Noisy or uneven neighbourhoods |
| GIN (2019) | Sum (injective) | Maximally expressive (WL bound) | Graph-level classification |
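The attention-weighted aggregation listed for GAT can be sketched as follows. This follows the general GAT scoring form (a LeakyReLU of a dot product with the concatenated pair of node vectors, then a softmax over neighbours), but it omits the shared linear map for brevity and uses a hand-picked attention vector `a`; in a real layer both would be learned.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def attention_weights(h_v, neighbour_feats, a):
    """Softmax-normalised attention of node v over its neighbours.

    Score for neighbour u: LeakyReLU(a . [h_v || h_u]). The shared linear map W
    used by GAT is taken to be the identity here (a simplifying assumption).
    """
    scores = []
    for h_u in neighbour_feats:
        concat = h_v + h_u  # list concatenation [h_v || h_u]
        scores.append(leaky_relu(sum(ai * xi for ai, xi in zip(a, concat))))
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(h_v, neighbour_feats, a):
    """Attention-weighted sum of neighbour features."""
    alphas = attention_weights(h_v, neighbour_feats, a)
    dim = len(neighbour_feats[0])
    return [
        sum(alpha * h_u[i] for alpha, h_u in zip(alphas, neighbour_feats))
        for i in range(dim)
    ]

# Hypothetical values: one neighbour resembles v, two do not.
h_v = [1.0, 0.0]
neighbour_feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
a = [0.0, 0.0, 2.0, -2.0]

alphas = attention_weights(h_v, neighbour_feats, a)
updated = attend(h_v, neighbour_feats, a)
```

With these values most of the weight lands on the first neighbour even though it is outnumbered two to one, whereas a plain mean would have let the other two dominate.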
5. What GNNs Predict: Three Levels
Once a GNN has produced embeddings, predictions are made at one of three levels, and the same backbone serves all of them.
- Node level: classify or score individual nodes, for example flagging an account as fraudulent or predicting a user's interests.
- Edge level (link prediction): predict whether an edge should exist between two nodes. This is the engine behind friend suggestions and product recommendations.
- Graph level: summarise an entire graph into a single prediction, such as whether a molecule is toxic. This adds a readout (pooling) step that combines all node embeddings into one graph vector.
This modularity is part of what makes GNNs so versatile: switch the output head and the labels, keep the rest, and the same architecture moves from labelling users to recommending links to screening molecules.
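The three prediction levels can be sketched as three small heads on top of the same embeddings. The embeddings and head weights below are made-up numbers standing in for a trained backbone, and the specific choices (a linear node classifier, a dot-product link score, a mean readout) are common options assumed for illustration rather than the only ones.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Pretend these 2-d embeddings came out of a trained GNN backbone.
embeddings = {
    "a": [2.0, 0.0],
    "b": [1.0, 1.0],
    "c": [-1.0, 1.0],
}

# Hypothetical head parameters; in practice they are learned with the backbone.
node_head = [1.0, -1.0]
graph_head = [0.5, 0.5]

def node_score(v):
    """Node level: a linear classifier on one node's embedding."""
    return sigmoid(dot(node_head, embeddings[v]))

def link_score(u, v):
    """Edge level: score a candidate edge by the dot product of its endpoints."""
    return sigmoid(dot(embeddings[u], embeddings[v]))

def graph_score():
    """Graph level: mean-pool all node embeddings (the readout), then classify."""
    n = len(embeddings)
    pooled = [sum(vec[i] for vec in embeddings.values()) / n for i in range(2)]
    return sigmoid(dot(graph_head, pooled))
```

Only the head changes between tasks: `node_score("a")`, `link_score("a", "b")`, and `graph_score()` all read from the same `embeddings` dictionary.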
6. Where Graph Neural Networks Are Used
GNNs have moved quickly from research papers into production systems used by millions of people every day.
- Recommendation: Pinterest's PinSage and related systems run web-scale recommendations over billions of items by treating content and the collections that link it (pins and boards, in PinSage's case) as one large graph.
- Drug discovery and chemistry: a molecule is literally a graph of atoms (nodes) and bonds (edges), so GNNs predict molecular properties, toxicity, and reactions. Quantum chemistry was the motivating application for the message-passing framework of Gilmer et al. (2017).
- Fraud and security: payment and account graphs expose coordinated rings of abuse that isolated, per-account features simply cannot see.
- Traffic and logistics: Google Maps has used GNNs to improve its estimated time-of-arrival predictions across road networks.
- Science and engineering: physics simulation, knowledge-graph reasoning, and even chip design all build on graph learning.
What unites these cases is a simple test: if your data is more naturally drawn as a network of relationships than as a flat table of rows, a GNN can often turn that structure into a measurable accuracy gain. That situation is increasingly common, which is why graph learning has spread so quickly across both industry and the sciences.
7. Challenges, Tools, and Getting Started
GNNs are powerful but not effortless. Beyond oversmoothing, scalability is a real constraint: a graph with billions of edges rarely fits in the memory of a single accelerator, which is why neighbour sampling (as in GraphSAGE) and graph partitioning are standard tools for training at scale. Other active research challenges include oversquashing (information from a rapidly growing neighbourhood compressed into one fixed-size vector, typically across a few bottleneck edges), the expressiveness ceiling set by the Weisfeiler-Lehman test, and handling dynamic or heterogeneous graphs whose nodes and edges come in many types.
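Neighbour sampling, mentioned above as a remedy for scale, can be sketched in a few lines. This is a simplified illustration in the spirit of GraphSAGE, not the paper's exact procedure: the helpers `sample_neighbours` and `sampled_subgraph`, the fanout values, and the toy hub graph are all assumptions.

```python
import random

def sample_neighbours(adjacency, node, fanout, rng):
    """Return at most `fanout` neighbours of `node`, chosen uniformly without replacement."""
    neighbours = adjacency[node]
    if len(neighbours) <= fanout:
        return list(neighbours)
    return rng.sample(neighbours, fanout)

def sampled_subgraph(adjacency, seeds, fanouts, rng):
    """Collect the nodes needed to compute embeddings for `seeds`.

    `fanouts[i]` caps how many neighbours are kept per node at hop i + 1, so the
    work per seed is bounded by the product of the fanouts, not by node degree.
    """
    frontier, visited = set(seeds), set(seeds)
    for fanout in fanouts:
        next_frontier = set()
        for node in sorted(frontier):
            next_frontier.update(sample_neighbours(adjacency, node, fanout, rng))
        visited |= next_frontier
        frontier = next_frontier
    return visited

# Toy "hub" graph: node 0 has 1,000 neighbours, each of which links back only to 0.
adjacency = {0: list(range(1, 1001))}
adjacency.update({i: [0] for i in range(1, 1001)})

rng = random.Random(0)  # fixed seed so the sketch is repeatable
needed = sampled_subgraph(adjacency, seeds=[0], fanouts=[10, 5], rng=rng)
```

Computing a two-layer embedding for the hub exactly would touch all 1,001 nodes; with fanouts of 10 and 5 the sampled computation touches only 11, at the cost of a noisier estimate of the aggregate.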
Getting started, happily, is straightforward thanks to mature libraries. PyTorch Geometric (PyG) and the Deep Graph Library (DGL) ship ready-made GCN, GraphSAGE, GAT, and GIN layers, standard benchmark datasets, and efficient sparse operations. A good first project is node classification on a citation graph such as Cora, where a two-layer GCN written in a handful of lines already beats classical baselines. From that small example, the very same toolkit scales up to the industrial applications above.
Frequently Asked Questions
What is a graph neural network in simple terms?
A graph neural network is a deep learning model that runs directly on graphs of nodes and edges. Each node repeatedly gathers information from its neighbours and updates its own vector, so the final representation captures both the node’s features and its place in the network.
What is message passing in a GNN?
Message passing is the core mechanism: in every layer, each node collects messages from its neighbours, aggregates them with a permutation-invariant function such as sum or mean, and updates its embedding using a small shared neural network. Stacking layers lets information travel further across the graph.
What is the difference between GCN, GraphSAGE, GAT, and GIN?
They mainly differ in aggregation. GCN uses a normalised mean, GraphSAGE samples neighbours and is inductive, GAT learns attention weights for neighbours, and GIN uses a sum aggregator to reach the maximum expressiveness allowed by the Weisfeiler-Lehman test.
How are GNNs different from regular neural networks?
CNNs assume a fixed grid and RNNs assume an ordered sequence. GNNs make no such assumption: they handle irregular, unordered graphs where nodes have varying numbers of neighbours, and they reuse the same weights across every node so a trained model can generalise to new graphs.