Implementation of an information extraction pipeline that includes coreference resolution, entity linking, and relationship extraction techniques.

I am thrilled to present the latest project I have been working on. If you have been following my posts, you know that I am passionate about combining natural language processing and knowledge graphs. In this blog post, I will present my implementation of an information extraction data pipeline. Later on, I will also explain why I see the combination of NLP and graphs as one of the paths to explainable AI.

Information extraction pipeline

What exactly is an information extraction pipeline? To put it in simple terms, information extraction is the task of extracting structured information from unstructured data such as text.

Steps in my implementation of the IE pipeline. Image by author

How to combine Named Entity Linking with Wikipedia data enrichment to analyze internet news.

A wealth of information is being produced every day on the internet. Understanding the news and other content-generating websites is becoming increasingly crucial to successfully run a business. It can help you spot opportunities, generate new leads, or provide indicators about the economy.

In this blog post, I want to show you how you can create a news monitoring data pipeline that combines natural language processing (NLP) and knowledge graph technologies.

The data pipeline consists of three parts. In the first part, we scrape articles from an online news provider. Next, we run the articles through an NLP pipeline…

Learn how to use GraphSAGE embeddings in the Neo4j Graph Data Science library to improve your machine learning workflows

Knowledge graphs and graph analytics pipelines are becoming more and more popular. If you keep an eye on the graph analytics field, you already know that graph neural networks are trending. Unfortunately, there aren’t many tutorials out there on how to use them in a practical application. For this reason, I have decided to write this blog post, where you will learn how to train a convolutional graph neural network and integrate it into your machine learning workflow to improve the accuracy of a downstream classification model.


In this example, you will reproduce the protein role classification task from the…

Learn how to import, clean, and analyze the ArXiv dataset in Neo4j. In the last step, you will learn how to create a search and recommendation engine for articles.

In Europe, we are deep in the second wave of Covid lockdowns. I’ve seen some motivational speakers talk about using this time to learn a new skill set. As a child, I always liked nuclear experiments, so I decided to build a reactor in my basement and try some experiments. I already have a basement, so now I only need to learn nuclear physics or maybe get some nuclear researchers to help me out.

I got the idea from Estelle Scifo, who imported and analyzed the ArXiv dataset in Neo4j. We’ll take a detailed look at the nuclear experiments category of…

Hands-on Tutorials, Marvel network analysis

Introducing the new k-nearest neighbors algorithm in the Graph Data Science library

A wise man once said that the 2020–30 decade will be the decade of graph data science. Actually, that happened just a few days ago at the Nodes 2020 conference, and that wise man was Emil Eifrem, presenting the keynote. In case you missed the conference, all the presentations are already available online.

Fittingly, given Emil’s statement, a pre-release of version 1.4 of the Neo4j Graph Data Science library was published a couple of days ago. It is a significant milestone for the GDS library. A lot of new features were added in this…

Traveling tourist

A deep dive into the pathfinding algorithms available in the Neo4j Graph Data Science library

In the first part of the series, we constructed a knowledge graph of monuments located in Spain from the WikiData API. Now we’ll put on our graph data science goggles and explore various pathfinding algorithms available in the Neo4j Graph Data Science library. To top it off, we’ll look at a brute force solution for a Santa Claus problem. Now, you might wonder what a Santa Claus problem is. It is a variation of the traveling salesman problem, except we don’t require the solution to end in the same city where it started. …
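To make the Santa Claus problem concrete, here is a minimal brute-force sketch in plain Python. It is not the approach from the post, which works with Neo4j: the function name `shortest_open_path`, the sample coordinates, and the use of straight-line distances are all assumptions made for illustration. It tries every visiting order and keeps the shortest path that does not return to its starting point.

```python
from itertools import permutations
from math import dist


def shortest_open_path(points):
    """Brute-force the shortest path that visits every point exactly once.

    Unlike the classic traveling salesman problem, the path is open:
    it does not have to return to the point where it started.
    `points` maps a name to an (x, y) coordinate (hypothetical data).
    """
    best_order, best_length = None, float("inf")
    # Try every possible visiting order; feasible only for a handful of points.
    for order in permutations(points):
        # Sum the straight-line distances between consecutive stops.
        length = sum(dist(points[a], points[b]) for a, b in zip(order, order[1:]))
        if length < best_length:
            best_order, best_length = order, length
    return list(best_order), best_length


# Hypothetical coordinates, not real monument locations.
monuments = {"A": (0, 0), "B": (0, 1), "C": (0, 3), "D": (0, 6)}
order, length = shortest_open_path(monuments)
print(order, length)
```

The number of orderings grows factorially with the number of stops, so this only works for very small inputs, which is why it counts as a brute force solution.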

Traveling tourist

Import data from WikiData and Open Street Map API to create a knowledge graph in Neo4j

After a short summer break, I have prepared a new blog series. In this first part, we will construct a knowledge graph of monuments located in Spain. As you might know, I have lately gained a lot of interest and respect for the wealth of knowledge that is available through the WikiData API. We will continue honing our SPARQL syntax knowledge and fetch the information regarding the monuments located in Spain from the WikiData API. I wasn’t aware of this before, but scraping the RDF data available online and importing it into Neo4j is such a popular topic that Dr…

Learn how to scrape the LOTR world from WikiData and analyze it with the Neo4j for Graph Data Science toolbox

After so much success with my previous lengthy post about combining NLP techniques and graphs, I have prepared another exhaustive tutorial. We will go over a couple of topics. We will begin by importing the data into Neo4j via the WikiData API. By the time we are done, we will scrape most of the LOTR information available on WikiData. In the next step, we will prepare an exploratory data analysis and show how to populate missing values based on some hypothesis. To top it off, we will run a couple of graph algorithms and prepare some beautiful visualizations.

Make sure…

Learn how to train your custom node2vec algorithm with Neo4j Graph Data Science

My last blog post about combining graphs with NLP techniques was the most successful by far. It motivated me to write more about this topic. During my research, I stumbled upon the node2vec algorithm and noticed how easy it would be to implement it with Neo4j and the Graph Data Science library. I guess that left me no other choice than to put on my Neo4j Data Science glasses and demonstrate how easy it is to implement.

Graph import

Today we will be using the Spoonacular Food Dataset that is available on Kaggle. It contains nutritional information alongside the ingredients used in…

Learn how to set up an NLP pipeline and analyze its results with Neo4j

Get ready, as today we will be roleplaying Neo from the Matrix. After he has had his moment of revelation, Neo realizes that even though the world seems very unstructured and random, there is a structured green code hiding behind all this disarray. We will take his knowledge of finding hidden structures in chaos and apply it to text. What might at first seem like unstructured chaos will soon become quite structured and full of insights once we put on our Neo4j glasses.


  1. Set up Neo4j Desktop
  2. Graph import
  3. Text classification
  4. Named-entity recognition
  5. Sentiment analysis
  6. Unipartite projection of a…

Tomaz Bratanic

Data explorer. Turn everything into a graph.
