
Graph Convolutional Networks: Introduction to GNNs
A step-by-step guide to using PyTorch Geometric

Graph Neural Networks (GNNs) stand out as one of the most intriguing and swiftly advancing structures in the realm of deep learning. Developed specifically for handling data organized in graph formats, GNNs offer exceptional adaptability and potent learning prowess.

Among the diverse categories of GNNs, Graph Convolutional Networks (GCNs) have risen as the most widespread and extensively used architecture. GCNs are distinctive for their capacity to utilize a node’s characteristics and its nearby environment to make predictions, presenting an efficient approach to manage data structured in the form of graphs.

In this piece, we will delve into the functioning of the GCN layer and elucidate its internal mechanisms. Additionally, we will investigate its real-world use for tasks involving node classification, employing PyTorch Geometric as our preferred tool.

PyTorch Geometric stands as a dedicated expansion of PyTorch tailored particularly for the construction and deployment of GNNs. This sophisticated yet accessible library offers an extensive array of utilities to simplify the process of machine learning on graph-structured data. To initiate our exploration, setting up PyTorch Geometric will be essential. If Google Colab is your platform of choice, PyTorch should already be set up, necessitating only a handful of supplementary commands.

All the code is available on Google Colab and GitHub.

!pip install torch_geometric
import torch
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt

With PyTorch Geometric successfully installed, let’s now delve into the dataset that will be the focus of this tutorial.

  1. Graph Data

Graphs play a fundamental role in depicting connections among entities. Graph data is prevalent in numerous real-life situations, including social and computer networks, molecular chemical structures, natural language processing, and image recognition, among various others.

In this piece, we will examine the well-known and widely utilized Zachary’s karate club dataset.

Collected by Wayne W. Zachary in the 1970s, the dataset captures the relationships formed within a karate club. It is a kind of social network, where club members are represented as nodes and edges represent interactions that occurred between members outside the club.

In this scenario, the club members are split into four distinct groups. Our task is to assign the correct group to each member (node classification), based on the pattern of their interactions.

Let's use PyG's built-in function to import the dataset and try to understand the Datasets object it uses.

from torch_geometric.datasets import KarateClub
# Import dataset from PyTorch Geometric
dataset = KarateClub()
# Print information
print(dataset)
print('————')
print(f'Number of graphs: {len(dataset)}')
print(f'Number of features: {dataset.num_features}')
print(f'Number of classes: {dataset.num_classes}')
KarateClub()
————
Number of graphs: 1
Number of features: 34
Number of classes: 4

The dataset contains only one graph, in which every node has a feature vector of 34 dimensions and belongs to one of four classes (our four groups). In fact, the Datasets object can be seen as a collection of Data (graph) objects.

We can conduct a more detailed examination of our distinct graph to acquire additional information about it.

# Print first element
print(f'Graph: {dataset[0]}')
Graph: Data(x=[34, 34], edge_index=[2, 156], y=[34], train_mask=[34])

This Data object is particularly interesting, and printing it offers a good summary of the graph we are studying:

  • The node feature matrix x=[34, 34] has shape (number of nodes, number of features). In our case, it means we have 34 nodes (our 34 members), each associated with a 34-dimensional feature vector.
  • The edge_index=[2, 156] signifies the connectivity of the graph, indicating how the nodes are linked. This structure has dimensions of (2, number of directed edges).
  • y=[34] holds the node ground-truth labels. In this problem, every node is assigned to exactly one class (group), so we have one value per node.
  • The optional “train_mask” attribute, represented as a list of True or False values, designates the nodes intended for training, identified by their indices.

Let's print the contents of these tensors to understand what they store, starting with the node features.

data = dataset[0]
print(f'x = {data.x.shape}')
print(data.x)
x = torch.Size([34, 34])
tensor([[1., 0., 0.,  ..., 0., 0., 0.],
        [0., 1., 0.,  ..., 0., 0., 0.],
        [0., 0., 1.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 1., 0., 0.],
        [0., 0., 0.,  ..., 0., 1., 0.],
        [0., 0., 0.,  ..., 0., 0., 1.]])

In this scenario, the node feature matrix x takes the form of an identity matrix, lacking any significant data regarding the nodes. While it could potentially include details such as age or skill level, this isn’t applicable to this particular dataset. Consequently, our node classification will need to rely solely on the examination of their connections.

Next, we’ll output the edge index for examination.

print(f'edge_index = {data.edge_index.shape}')
print(data.edge_index)
edge_index = torch.Size([2, 156])
tensor([[ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  1,  1,
          1,  1,  1,  1,  1,  1,  1,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  3,
          3,  3,  3,  3,  3,  4,  4,  4,  5,  5,  5,  5,  6,  6,  6,  6,  7,  7,
          7,  7,  8,  8,  8,  8,  8,  9,  9, 10, 10, 10, 11, 12, 12, 13, 13, 13,
         13, 13, 14, 14, 15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, 20, 20, 21,
         21, 22, 22, 23, 23, 23, 23, 23, 24, 24, 24, 25, 25, 25, 26, 26, 27, 27,
         27, 27, 28, 28, 28, 29, 29, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31,
         31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33,
         33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33],
        [ 1,  2,  3,  4,  5,  6,  7,  8, 10, 11, 12, 13, 17, 19, 21, 31,  0,  2,
          3,  7, 13, 17, 19, 21, 30,  0,  1,  3,  7,  8,  9, 13, 27, 28, 32,  0,
          1,  2,  7, 12, 13,  0,  6, 10,  0,  6, 10, 16,  0,  4,  5, 16,  0,  1,
          2,  3,  0,  2, 30, 32, 33,  2, 33,  0,  4,  5,  0,  0,  3,  0,  1,  2,
          3, 33, 32, 33, 32, 33,  5,  6,  0,  1, 32, 33,  0,  1, 33, 32, 33,  0,
          1, 32, 33, 25, 27, 29, 32, 33, 25, 27, 31, 23, 24, 31, 29, 33,  2, 23,
         24, 33,  2, 31, 33, 23, 26, 32, 33,  1,  8, 32, 33,  0, 24, 25, 28, 32,
         33,  2,  8, 14, 15, 18, 20, 22, 23, 29, 30, 31, 33,  8,  9, 13, 14, 15,
         18, 19, 20, 22, 23, 26, 27, 28, 29, 30, 31, 32]])

In graph theory and network analysis, connectivity between nodes is stored using a variety of data structures. The edge_index is one such structure, where the graph's connections are stored in two lists (156 directed edges, which amounts to 78 bidirectional edges). The reason for these two lists is that one stores the source nodes, while the second identifies the destination nodes.

This technique is referred to as the coordinate list (COO) representation, which serves as an efficient way to store sparse matrices. Sparse matrices are data constructs designed for efficiently holding matrices in which the majority of the elements are zeros. Within the COO format, solely the non-zero elements are stored, conserving memory and computational power.
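To make the COO format concrete, here is a toy example that is not part of the karate club dataset: a three-node path graph (edges 0-1 and 1-2), with each undirected edge stored in both directions.

# Toy COO edge_index: first row = source nodes, second row = destinations
toy_edge_index = torch.tensor([[0, 1, 1, 2],
                               [1, 0, 2, 1]])
print(toy_edge_index.shape)  # torch.Size([2, 4])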

On the other hand, a simpler and more intuitive way to represent graph connectivity is the adjacency matrix A. This is a square matrix in which each element A_ij specifies whether an edge exists between node i and node j in the graph. In other words, a non-zero element A_ij implies a connection from node i to node j, while a zero indicates no direct connection.

Nonetheless, the adjacency matrix is less space-efficient compared to the COO format when dealing with sparse matrices or graphs containing fewer edges. Nevertheless, due to its clear representation and straightforward interpretation, the adjacency matrix continues to be a favored option for illustrating graph connectivity.

The adjacency matrix can be derived from the edge_index with a utility function, to_dense_adj().

from torch_geometric.utils import to_dense_adj
A = to_dense_adj(data.edge_index)[0].numpy().astype(int)
print(f'A = {A.shape}')
print(A)
A = (34, 34)
[[0 1 1 ... 1 0 0]
 [1 0 1 ... 0 0 0]
 [1 1 0 ... 0 1 0]
 ...
 [1 0 0 ... 0 1 1]
 [0 0 1 ... 1 0 1]
 [0 0 0 ... 1 1 0]]

With graph data, it is relatively uncommon for nodes to be densely interconnected. As you can see, our adjacency matrix A is sparse (filled with zeros).

In numerous real-world graphs, the majority of nodes establish connections with only a handful of other nodes, leading to a substantial presence of zeros within the adjacency matrix. Storing this abundance of zeros is notably inefficient, prompting the utilization of the COO format by PyG.
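The conversion also works the other way around: as a quick sketch, the utility function dense_to_sparse() from torch_geometric.utils should recover the COO edge_index from a dense adjacency matrix.

from torch_geometric.utils import dense_to_sparse
# Recover the COO representation from the dense adjacency matrix
edge_index_back, _ = dense_to_sparse(torch.tensor(A))
print(edge_index_back.shape)  # torch.Size([2, 156])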

Conversely, comprehending ground-truth labels is straightforward.

print(f'y = {data.y.shape}')
print(data.y)
y = torch.Size([34])
tensor([1, 1, 1, 1, 3, 3, 3, 1, 0, 1, 3, 1, 1, 1, 0, 0, 3, 1, 0, 1, 0, 1, 0, 0,
        2, 2, 0, 0, 2, 0, 0, 2, 0, 0])

The ground-truth labels for our nodes, stored in y, straightforwardly represent the group numbers (0, 1, 2, 3) corresponding to each node. This accounts for the presence of 34 values.

Lastly, we will display the train mask.

print(f'train_mask = {data.train_mask.shape}')
print(data.train_mask)
train_mask = torch.Size([34])
tensor([ True, False, False, False, True, False, False, False,  True, False,
        False, False, False, False, False, False, False, False, False, False,
        False, False, False, False,  True, False, False, False, False, False,
        False, False, False, False])

The training mask indicates nodes intended for training through True values. These nodes constitute the training set, while the remaining can be deemed the testing set. This separation aids in evaluating the model by offering unseen data for testing purposes.
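Although this article trains on all nodes for simplicity, here is a sketch of how the mask would typically be used: filtering predictions and labels with train_mask so that the loss only considers training nodes. The logits z are hypothetical here and would come from a forward pass such as the one defined later in this article.

# Hypothetical masked loss: only training nodes contribute
criterion = torch.nn.CrossEntropyLoss()
loss = criterion(z[data.train_mask], data.y[data.train_mask])
# The remaining nodes serve as a test set
test_mask = ~data.train_mask
test_acc = (z[test_mask].argmax(dim=1) == data.y[test_mask]).float().mean()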

However, our exploration doesn’t end here. The Data object encompasses a plethora of additional capabilities. It furnishes a range of utility functions that facilitate the examination of numerous graph properties. For example:

  • The function is_directed() informs you about whether the graph has a directed nature. A graph is considered directed when the arrangement of edges in the adjacency matrix is unsymmetrical. This unsymmetrical arrangement reflects that the orientation of edges holds significance in defining relationships between nodes.
  • The has_isolated_nodes() function checks whether some nodes are not connected to the rest of the graph. These nodes are likely to pose challenges in tasks like classification due to their lack of connections with other nodes.
  • The presence of self-loops in a graph is determined by has_self_loops(). A self-loop indicates that a node is connected to itself. This concept is distinct from that of a loop, which involves a path traveling through other nodes and returning to the starting node.

In the case of Zachary's karate club dataset, all of these properties return False. This means the graph is not directed, has no isolated nodes, and none of its nodes is connected to itself.

print(f'Edges are directed: {data.is_directed()}')
print(f'Graph has isolated nodes: {data.has_isolated_nodes()}')
print(f'Graph has loops: {data.has_self_loops()}')
Edges are directed: False
Graph has isolated nodes: False
Graph has loops: False

Ultimately, we have the capability to transform a PyTorch Geometric graph into the widely-used graph library NetworkX by employing the “to_networkx” function. This proves to be highly advantageous when aiming to visually represent a compact graph using NetworkX and matplotlib.

Now, we can create a visualization of our dataset where distinct colors are assigned to each individual group.

from torch_geometric.utils import to_networkx

G = to_networkx(data, to_undirected=True)
plt.figure(figsize=(12, 12))
plt.axis('off')
nx.draw_networkx(G,
                 pos=nx.spring_layout(G, seed=0),
                 with_labels=True,
                 node_size=800,
                 node_color=data.y,
                 cmap="hsv",
                 vmin=-2,
                 vmax=3,
                 width=0.8,
                 edge_color="grey",
                 font_size=14
                 )
plt.show()

This plot of Zachary’s karate club displays our 34 nodes, 78 (bidirectional) edges, and 4 labels with 4 different colors. Now that we’ve seen the essentials of loading and handling a dataset with PyTorch Geometric, we can introduce the Graph Convolutional Network architecture.

  2. Graph Convolutional Network

The goal of this segment is to provide an introduction and construct the graph convolutional layer starting from the basics.

In traditional neural networks, linear layers apply a linear transformation to the incoming data. This transformation converts the input features x into hidden vectors h through the use of a weight matrix 𝐖. Ignoring biases for the time being, this can be expressed as:
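h = 𝐖x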

When dealing with graph data, an extra level of intricacy emerges due to the interlinks among nodes. These interconnections are significant because, in most cases, networks tend to exhibit the tendency that nodes with similarities are more inclined to be connected compared to those with differences – a concept referred to as network homophily.

We can enrich our node representation by aggregating its features with those of its neighbors. This operation is called convolution, or neighborhood aggregation. Let's denote the neighborhood of node i, including node i itself, as Ñ_i. The hidden vector of node i then becomes:
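h_i = Σ_{j ∈ Ñ_i} 𝐖 x_j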

Unlike filters in Convolutional Neural Networks (CNNs), our weight matrix 𝐖 is unique and shared among every node. But another issue arises: unlike pixels, nodes do not have a fixed number of neighbors.

How do we handle cases where one node has only 1 neighbor while another has 500? If we simply summed the feature vectors, the resulting embedding h would be much larger for the node with 500 neighbors. To ensure a similar range of values for all nodes and keep them comparable, we can normalize the result by the degree of each node, where a node's degree is its number of connections:
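h_i = (1 / deg(i)) · Σ_{j ∈ Ñ_i} 𝐖 x_j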

We're almost there! Introduced by Kipf et al. (2016), the graph convolutional layer adds one final improvement.

The authors observed that features from nodes with many neighbors propagate much more easily than those from more isolated nodes. To offset this effect, they suggested assigning bigger weights to features coming from nodes with fewer neighbors, thus balancing the influence across all nodes. This operation is written as:
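h_i = Σ_{j ∈ Ñ_i} (1 / (√deg(i) · √deg(j))) · 𝐖 x_j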

Note that when i and j have the same number of neighbors, this expression is equivalent to our own layer. Before turning to PyTorch Geometric's implementation, the sketch below shows what this propagation rule looks like with raw tensors.
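In matrix form, and as a minimal sketch rather than PyG's actual implementation, the rule reads H = D̃^(-1/2) Ã D̃^(-1/2) X 𝐖, where Ã is the adjacency matrix with self-loops and D̃ its degree matrix. The snippet below applies it with plain tensors, reusing the dense matrix A computed earlier; the random weight matrix W is purely hypothetical.

# Illustrative propagation rule with raw tensors (not PyG's implementation)
A_tilde = torch.tensor(A, dtype=torch.float) + torch.eye(34)  # add self-loops
deg = A_tilde.sum(dim=1)                                      # node degrees
D_inv_sqrt = torch.diag(deg.pow(-0.5))                        # D^(-1/2)
W = torch.randn(34, 3)                                        # hypothetical weight matrix (34 -> 3)
H = D_inv_sqrt @ A_tilde @ D_inv_sqrt @ data.x @ W            # shape: (34, 3)
print(H.shape)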

  3. Implementing a GCN

PyTorch Geometric provides GCNConv, which directly implements the graph convolutional layer.

In this illustration, we will generate a simple Graph Convolutional Network using a solitary GCN layer. This layer will be accompanied by a ReLU activation function and a linear output layer. The output layer will provide four numerical outcomes, representing our four classifications, where the class of each node is determined by the most elevated value.

The following code snippet defines the GCN with a hidden layer of three dimensions.

from torch.nn import Linear
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.gcn = GCNConv(dataset.num_features, 3)
        self.out = Linear(3, dataset.num_classes)

    def forward(self, x, edge_index):
        h = self.gcn(x, edge_index).relu()
        z = self.out(h)
        return h, z

model = GCN()
print(model)
GCN(
  (gcn): GCNConv(34, 3)
  (out): Linear(in_features=3, out_features=4, bias=True)
)

Introducing a second GCN layer would expand the model’s capability to not only gather feature vectors from neighboring nodes but also extend this aggregation to encompass the neighbors of those neighboring nodes.

We can integrate multiple graph layers to accumulate values from progressively more distant nodes. However, there’s a caveat: if we incorporate too many layers, the aggregation becomes so intensive that all the embeddings start resembling each other. This occurrence is termed over-smoothing and can pose a significant challenge when a surplus of layers is employed.
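For illustration only, a hypothetical two-layer variant of our model might look like the sketch below; it is not used in the rest of this tutorial, and the layer sizes are arbitrary.

class TwoLayerGCN(torch.nn.Module):
    # Hypothetical two-layer variant: each node now aggregates
    # information from its 2-hop neighborhood
    def __init__(self):
        super().__init__()
        self.gcn1 = GCNConv(dataset.num_features, 3)
        self.gcn2 = GCNConv(3, 3)
        self.out = Linear(3, dataset.num_classes)

    def forward(self, x, edge_index):
        h = self.gcn1(x, edge_index).relu()
        h = self.gcn2(h, edge_index).relu()
        z = self.out(h)
        return h, z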

Now that we have outlined our Graph Neural Network (GNN), let’s construct a basic training loop utilizing PyTorch. I’ve opted for a standard cross-entropy loss since the task involves multi-class classification, and the optimizer of choice is Adam. For simplicity and a focus on comprehending GNN learning, this article won’t encompass a train/test data split.

The training loop follows a conventional pattern: the objective is to predict accurate labels, and then we compare the results generated by the GCN to the values stored in data.y. The discrepancy is quantified using the cross-entropy loss and propagated back through the Adam optimizer to fine-tune the GNN’s weights and biases. Eventually, we display metrics at intervals of every 10 epochs.

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.02)
# Calculate accuracy
def accuracy(pred_y, y):
    return (pred_y == y).sum() / len(y)
# Data for animations
embeddings = []
losses = []
accuracies = []
outputs = []
# Training loop
for epoch in range(201):
    # Clear gradients
    optimizer.zero_grad()
    # Forward pass
    h, z = model(data.x, data.edge_index)
    # Calculate loss function
    loss = criterion(z, data.y)
    # Calculate accuracy
    acc = accuracy(z.argmax(dim=1), data.y)
    # Compute gradients
    loss.backward()
    # Tune parameters
    optimizer.step()
    # Store data for animations
    embeddings.append(h)
    losses.append(loss)
    accuracies.append(acc)
    outputs.append(z.argmax(dim=1))
    # Print metrics every 10 epochs
    if epoch % 10 == 0:
        print(f'Epoch {epoch:>3} | Loss: {loss:.2f} | Acc: {acc*100:.2f}%')
Epoch   0 | Loss: 1.40 | Acc: 41.18%
Epoch  10 | Loss: 1.21 | Acc: 47.06%
Epoch  20 | Loss: 1.02 | Acc: 67.65%
Epoch  30 | Loss: 0.80 | Acc: 73.53%
Epoch  40 | Loss: 0.59 | Acc: 73.53%
Epoch  50 | Loss: 0.39 | Acc: 94.12%
Epoch  60 | Loss: 0.23 | Acc: 97.06%
Epoch  70 | Loss: 0.13 | Acc: 100.00%
Epoch  80 | Loss: 0.07 | Acc: 100.00%
Epoch  90 | Loss: 0.05 | Acc: 100.00%
Epoch 100 | Loss: 0.03 | Acc: 100.00%
Epoch 110 | Loss: 0.02 | Acc: 100.00%
Epoch 120 | Loss: 0.02 | Acc: 100.00%
Epoch 130 | Loss: 0.02 | Acc: 100.00%
Epoch 140 | Loss: 0.01 | Acc: 100.00%
Epoch 150 | Loss: 0.01 | Acc: 100.00%
Epoch 160 | Loss: 0.01 | Acc: 100.00%
Epoch 170 | Loss: 0.01 | Acc: 100.00%
Epoch 180 | Loss: 0.01 | Acc: 100.00%
Epoch 190 | Loss: 0.01 | Acc: 100.00%
Epoch 200 | Loss: 0.01 | Acc: 100.00%

Excellent! As expected, we achieve a perfect accuracy of 100% on the training set, which constitutes the entire dataset. This indicates that our model has successfully grasped the task of accurately assigning each individual within the karate club to their respective group.

To enhance our understanding, we can generate an organized visualization by animating the graph. This animation will illustrate the progression of the GNN’s predictions as the training advances.

%%capture
from IPython.display import HTML
from matplotlib import animation

plt.rcParams["animation.bitrate"] = 3000

def animate(i):
    G = to_networkx(data, to_undirected=True)
    nx.draw_networkx(G,
                     pos=nx.spring_layout(G, seed=0),
                     with_labels=True,
                     node_size=800,
                     node_color=outputs[i],
                     cmap="hsv",
                     vmin=-2,
                     vmax=3,
                     width=0.8,
                     edge_color="grey",
                     font_size=14
                     )
    plt.title(f'Epoch {i} | Loss: {losses[i]:.2f} | Acc: {accuracies[i]*100:.2f}%',
              fontsize=18, pad=20)

fig = plt.figure(figsize=(12, 12))
plt.axis('off')
anim = animation.FuncAnimation(fig, animate,
                               np.arange(0, 200, 10), interval=500, repeat=True)
html = HTML(anim.to_html5_video())
display(html)

The first predictions look random, but after a while the GCN labels every node perfectly. In fact, the final graph mirrors the one we plotted at the end of the first section. But what does the GCN actually learn along the way?

Through the accumulation of features from adjacent nodes, the Graph Neural Network (GNN) acquires a vector portrayal, or embedding, for each node within the network. In our model, the last layer simply learns how to exploit these representations to generate the most accurate classifications. Nonetheless, it’s the embeddings themselves that constitute the true outcomes of GNNs.

Let’s proceed to display the embeddings that our model has acquired.

# Print embeddings
print(f'Final embeddings = {h.shape}')
print(h)
Final embeddings = torch.Size([34, 3])
tensor([[1.9099e+00, 2.3584e+00, 7.4027e-01],
        [2.6203e+00, 2.7997e+00, 0.0000e+00],
        [2.2567e+00, 2.2962e+00, 6.4663e-01],
        [2.0802e+00, 2.8785e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 2.9694e+00],
        [0.0000e+00, 0.0000e+00, 3.3817e+00],
        [0.0000e+00, 1.5008e-04, 3.4246e+00],
        [1.7593e+00, 2.4292e+00, 2.4551e-01],
        [1.9757e+00, 6.1032e-01, 1.8986e+00],
        [1.7770e+00, 1.9950e+00, 6.7018e-01],
        [0.0000e+00, 1.1683e-04, 2.9738e+00],
        [1.8988e+00, 2.0512e+00, 2.6225e-01],
        [1.7081e+00, 2.3618e+00, 1.9609e-01],
        [1.8303e+00, 2.1591e+00, 3.5906e-01],
        [2.0755e+00, 2.7468e-01, 1.9804e+00],
        [1.9676e+00, 3.7185e-01, 2.0011e+00],
        [0.0000e+00, 0.0000e+00, 3.4787e+00],
        [1.6945e+00, 2.0350e+00, 1.9789e-01],
        [1.9808e+00, 3.2633e-01, 2.1349e+00],
        [1.7846e+00, 1.9585e+00, 4.8021e-01],
        [2.0420e+00, 2.7512e-01, 1.9810e+00],
        [1.7665e+00, 2.1357e+00, 4.0325e-01],
        [1.9870e+00, 3.3886e-01, 2.0421e+00],
        [2.0614e+00, 5.1042e-01, 2.4872e+00],

        [2.1778e+00, 4.4730e-01, 2.0077e+00],
        [3.8906e-02, 2.3443e+00, 1.9195e+00],
        [3.0748e+00, 0.0000e+00, 3.0789e+00],
        [3.4316e+00, 1.9716e-01, 2.5231e+00]], grad_fn=<ReluBackward0>)

As evident, embeddings are not obliged to possess identical dimensions as feature vectors. In this context, I opted to diminish the dimensionality from 34 (corresponding to dataset.num_features) to three. This reduction facilitates the creation of an appealing 3D visualization.

Now, let’s proceed to depict these embeddings before any training takes place, specifically at epoch 0.

# Get first embedding at epoch = 0
embed = embeddings[0].detach().cpu().numpy()
fig = plt.figure(figsize=(12, 12))
ax = fig.add_subplot(projection='3d')
ax.patch.set_alpha(0)
plt.tick_params(left=False,
                bottom=False,
                labelleft=False,
                labelbottom=False)
ax.scatter(embed[:, 0], embed[:, 1], embed[:, 2],
           s=200, c=data.y, cmap="hsv", vmin=-2, vmax=3)
plt.show()

We observe all the nodes belonging to Zachary’s karate club along with their actual labels (rather than the forecasts made by the model). Currently, their positions appear scattered as the GNN hasn’t undergone training yet. Yet, if we were to generate visualizations of these embeddings at every iteration of the training loop, we could gain insight into the actual knowledge that the GNN assimilates.

Now, let’s examine the progressive changes in these embeddings as the GCN progressively enhances its ability to classify nodes accurately.

%%capture

def animate(i):
    embed = embeddings[i].detach().cpu().numpy()
    ax.clear()
    ax.scatter(embed[:, 0], embed[:, 1], embed[:, 2],
               s=200, c=data.y, cmap="hsv", vmin=-2, vmax=3)
    plt.title(f'Epoch {i} | Loss: {losses[i]:.2f} | Acc: {accuracies[i]*100:.2f}%',
              fontsize=18, pad=40)

fig = plt.figure(figsize=(12, 12))
plt.axis('off')
ax = fig.add_subplot(projection='3d')
plt.tick_params(left=False,
                bottom=False,
                labelleft=False,
                labelbottom=False)
anim = animation.FuncAnimation(fig, animate,
                               np.arange(0, 200, 10), interval=800, repeat=True)
html = HTML(anim.to_html5_video())
display(html)

Our Graph Convolutional Network (GCN) has effectively acquired embeddings that successfully cluster similar nodes into distinctive groups. Consequently, this empowers the final linear layer to effortlessly differentiate between these clusters and assign them to separate classes.

Embeddings are not confined solely to GNNs; they are pervasive throughout the realm of deep learning. Moreover, their dimensionality need not be limited to three; in fact, such low-dimensional embeddings are seldom encountered. For instance, language models like BERT generate embeddings featuring 768 or even 1024 dimensions.

Additional dimensions serve to encode more intricate information about nodes, text, images, and other entities. However, they also contribute to larger models that are inherently more challenging to train. This delineates why the maintenance of low-dimensional embeddings for as long as possible holds distinct advantages.

  4. Conclusion

Graph Convolutional Networks exhibit remarkable adaptability that makes them applicable across numerous scenarios. In this piece, we acquainted ourselves with the PyTorch Geometric library and concepts like Datasets and Data. Subsequently, we effectively reconstructed a graph convolutional layer from scratch. We proceeded to translate theory into practical application by creating a GCN, which provided insight into tangible aspects and the interactions between individual elements. Ultimately, we visualized the training procedure, gaining a distinct understanding of its implications for such a network.

The Zachary’s karate club dataset, while basic, serves as a sufficient resource to grasp fundamental principles in graph data and GNNs. While this article focused solely on node classification, it’s important to note that GNNs can tackle various other tasks, such as link prediction (e.g., suggesting a friend), graph classification (e.g., labeling molecules), graph generation (e.g., creating new molecules), and more.

Apart from GCN, researchers have put forward a multitude of GNN layers and structures. In the upcoming article, we will present the Graph Attention Network (GAT) architecture. GAT dynamically calculates the normalization factor of GCN and determines the significance of each connection using an attention mechanism.

If you’re eager to expand your understanding of graph neural networks, explore the realm of GNNs further through my book titled Hands-On Graph Neural Networks.
