Graph Algorithms with Python

In this article, I will take you through the implementation of Graph Algorithms with Python. As a data scientist, you should be well aware to find relationships among people by using the network they create within each other. So here I will take you through the Graph Algorithms you should know for Data Science using Python.

What are Graph Algorithms?

As machine learning practitioners, we have become quite comfortable with Pandas or SQL or any relational database. We are used to seeing our users in rows with their attributes in columns. But does the real world behave this way?

Also, Read – Machine Learning Full Course for free.

In a connected world, users cannot be seen as independent entities. They have certain relationships with each other and sometimes we would like to include such relationships when building our machine learning models.

Now, while in a relational database we cannot use such relationships between different users, in a graph database it is quite trivial to do so. In this article, I’m going to talk about some of the most important graphics algorithms you should know about and how to implement them using Python.

Graph Algorithms: Connected Components

Connected Components

You can think of connected components in very simple terms as a kind of hard clustering algorithm that finds clusters in connected data.

As a concrete example: let’s say you have data on the roads connecting two cities in the world. And you have to find out all the continents of the world and what city they contain.

The algorithm of connected components that we use to do this is based on a special case of BFS / DFS. I won’t talk much about how it works here, but we’ll see how to get the code to work with Networkx.

I will be using the Networkx module in Python to build and analyze our graphical algorithms. Let’s start with an example chart that we use for our purpose. Contains cities and distance information between them.

I’ll start by creating a list of edges with the distances that I’ll add as the edge weight:

Now I will create a graph:

g = nx.Graph()
for edge in edgelist:
    g.add_edge(edge[0],edge[1], weight = edge[2])

We now want to discover the different continents and their cities from this graphic. We can now do this using the algorithm of connected components like:

for i, x in enumerate(nx.connected_components(g)):
    print("cc"+str(i)+":",x)Code language: PHP (php)
cc0: {'Frankfurt', 'Kassel', 'Munchen', 'Numberg', 'Erfurt', 'Stuttgart', 'Karlsruhe', 'Wurzburg', 'Mannheim', 'Augsburg'}
cc1: {'Kolkata', 'Bangalore', 'Mumbai', 'Delhi'}
cc2: {'ALB', 'NY', 'TX'}

So we can find distinct network components in our data by using edges and vertices. This graph algorithm can be used on different datasets to satisfy any use case as above.

Graph Algorithms: Shortest Path

Continuing with the above example only, we are given a graph with the cities of Germany and their respective distances. You want to know how to get from Frankfurt (the starting node) to Munich by covering the shortest distance.

Applying the shortest route in graphical algorithms is used in Google Maps to find the shortest route. Let’s say you are in a Walmart store. You have different aisles and the distance between all the aisles. You want to offer the customer the shortest route between aisle A and aisle D.

print(nx.shortest_path(g, 'Stuttgart','Frankfurt',weight='weight'))
print(nx.shortest_path_length(g, 'Stuttgart','Frankfurt',weight='weight'))Code language: PHP (php)
['Stuttgart', 'Numberg', 'Wurzburg', 'Frankfurt']

To find the shortest path between all the pairs we can simply use a for loop:

for x in nx.all_pairs_dijkstra_path(g,weight='weight'):
    print(x)Code language: PHP (php)
('Mannheim', {'Mannheim': ['Mannheim'], 'Frankfurt': ['Mannheim', 'Frankfurt'], 'Karlsruhe': ['Mannheim', 'Karlsruhe'], 'Augsburg': ['Mannheim', 'Karlsruhe', 'Augsburg'], 'Kassel': ['Mannheim', 'Frankfurt', 'Kassel'], 'Wurzburg': ['Mannheim', 'Frankfurt', 'Wurzburg'], 'Munchen': ['Mannheim', 'Karlsruhe', 'Augsburg', 'Munchen'], 'Erfurt': ['Mannheim', 'Frankfurt', 'Wurzburg', 'Erfurt'], 'Numberg': ['Mannheim', 'Frankfurt', 'Wurzburg', 'Numberg'], 'Stuttgart': ['Mannheim', 'Frankfurt', 'Wurzburg', 'Numberg', 'Stuttgart']})
('Frankfurt', {'Frankfurt': ['Frankfurt'], 'Mannheim': ['Frankfurt', 'Mannheim'], 'Kassel': ['Frankfurt', 'Kassel'], 'Wurzburg': ['Frankfurt', 'Wurzburg'], 'Karlsruhe': ['Frankfurt', 'Mannheim', 'Karlsruhe'], 'Augsburg': ['Frankfurt', 'Mannheim', 'Karlsruhe', 'Augsburg'], 'Munchen': ['Frankfurt', 'Wurzburg', 'Numberg', 'Munchen'], 'Erfurt': ['Frankfurt', 'Wurzburg', 'Erfurt'], 'Numberg': ['Frankfurt', 'Wurzburg', 'Numberg'], 'Stuttgart': ['Frankfurt', 'Wurzburg', 'Numberg', 'Stuttgart']})

Graph Algorithms: Pagerank

Graph Algorithms: Pagerank

It’s the page sorting algorithm that has fueled Google for a long time. It assigns scores to pages based on the number and quality of inbound and outbound links.

The Pagerank can be used anywhere we want to estimate the importance of nodes in any network. Here I am going to use Facebook data. We have an edge/link file between Facebook users. Let’s first create the FB graph using:

fb = nx.read_edgelist('facebook-combined.txt', create_using = nx.Graph(), nodetype = int)Code language: JavaScript (javascript)

Now let’s create the graph:

facebook pagerank

Graph Algorithms: Centrality Measures

There are many metrics of centrality that you can use as functionality for your machine learning models. I’ll talk about two of them:

  • Centrality between the two: It is not only the users who have the most friends that are important, the users who connect one geography to another are also important because it allows users to see content from various geographies. The centrality of the in-between quantifies the number of times a particular node arrives in the shortest path chosen between two other nodes.
  • Degree of centrality: this is simply the number of connections for a node.

The centrality measures can be used as a feature in machine learning models:

Graph Algorithms: Centrality Measures

You can see the nodes sized based on their centrality values between the two here. They can be considered as information brokers. Breaking one of the nodes with high centrality between the two will split the graph into several parts.

I hope you liked this article on the implementation of Graph Algorithms with Python that you need to know in Machine Learning. Feel free to ask your valuable questions in the comments section below.

Follow Us:

Aman Kharwal
Aman Kharwal

I'm a writer and data scientist on a mission to educate others about the incredible power of data📈.

Articles: 1535

Leave a Reply