Thursday, May 30, 2019

Exploring Entities-viewed Graph Data

Let's say you have a graph database of entities viewed in a periodical over a five-month span (here, for example). How do you explore the data? We can explore it visually, traversing node-to-node in the graph, or we can get the results of data analysis in tabular form, because the spreadsheet is the analyst's bread and butter, don't you know. Let's explore both approaches.

I. Visually

Case Study 1: April 2019 entity views

Let's say you want to see the top entities for a particular month. And we just so happen to know which year the month we wish to explore falls in, so that's a plus.

First, we request the year node:

match (y:Year) return y

With that query, you get back the Year node (or nodes).



Select it to view the months of that year:



Ah! I'm really interested in the top entities viewed in April. Let's see what they are. Select the April Month node to view them:





And up pop the entities and their view-counts for April. But note something else. Any entities that have shown up in any other month have the view-counts for those months linked to them, as well. Because those other month-nodes are present in this graph-view, you get those linkages manifested "fo' free!"
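
If you'd rather see those cross-month linkages as rows than as edges, here is a rough sketch of a query that should list them. It only uses the labels and properties that appear in the queries in this post (Month.month, Entity.name, and count on the Count relationship); the column aliases are my own choice.

```cypher
// Start from April, hop to each of its entities, then hop back out
// along a *different* Count relationship to whatever other month
// also counted that entity.
match (:Month { month: "April" })-[:Count]->(e:Entity)<-[c:Count]-(m:Month)
return e.name as name, m.month as month, toInteger(c.count) as views
order by name, views desc
```

Since Cypher won't match the same relationship twice within a single pattern, April's own Count edge shouldn't come back as one of the "other" months, assuming there is just one Count relationship per month-entity pair.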

Case Study 2: From an Entity's Perspective

Let's now explore the graph from the opposite perspective. That is to say, instead of drilling down from the year to the month to the entities of that month, let's look at the graph starting from an entity. Choosing one, let's go with "donald trump" (because I just so happen to know this entity shows up in more than one month, oddly enough).

Our Cypher query to get a particular entity is:

match (e:Entity { name: "donald trump" }) return e

Executing that query, we get:



And – boom! – there you have it, "donald trump." Let's see the months this entity has been viewed, and the view-counts by month. To expand the entity we select it to get:



Well, how many times has this entity been viewed in total? That is to say, across the entire data set? Well, we can eyeball it and say "eh, more than 12?" and we'd not be wrong.

II. Cypher Queries

Case Study 1: April's Entities as a Spreadsheet

We've already been doing some rudimentary Cypher queries to get our starting-point nodes. Let's use Cypher to get our data back in spreadsheet-form, because we all so love spreadsheets!

Let's do a simple enough query, repeating what we already saw, visually, above, and query the entities viewed in April. The Cypher query to return that (sorted) result in tabular form is:

match (e:Entity)<-[c:Count]-(m:Month { month: "April"})
return e.name as name, toInteger(c.count) as count
order by count descending

And we get a nice little spreadsheet back:



And in the upper right-hand corner is an 'export CSV'-button, so you can actually download the data as CSV and import that to your spreadsheet; lolneat!

Case Study 2: Donald Trump's Views

Let's answer the previous question: how many views did the "donald trump"-entity get across the entire duration of the data period? The Cypher query that gets us that information is:

match (e:Entity { name: "donald trump"})<-[c:Count]-()
return e.name as name, sum(toInteger(c.count)) as views

This returns the result of:



So, clocking in at 11,709 views, we were correct in saying the entity "donald trump" has more than 12 views in total.
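
To see how that total breaks down by month (the tabular twin of expanding the entity node in the graph view), a query along these lines should do it. This is a sketch that assumes every Count relationship into the entity comes from a Month node carrying a month property, as in the April query above.

```cypher
// One row per month that counted the entity, biggest month first.
match (e:Entity { name: "donald trump" })<-[c:Count]-(m:Month)
return m.month as month, toInteger(c.count) as views
order by views desc
```

If that assumption holds, the views column should add up to the total reported above.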

Case Study 3: All Entities Viewed

Okay, we know the view-counts for the entity "donald trump." Great! How about a query that gives us all the entities viewed, sorted by most viewed first? The Cypher query to do this is:

match (e:Entity)<-[c:Count]-()
return e.name as name, sum(toInteger(c.count)) as views
order by views descending

And the results that query shows us are:



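If the full list is more than you want to scroll through, the same query can be trimmed to a leaderboard with a limit clause. A small sketch; the cutoff of 10 is an arbitrary pick on my part.

```cypher
// Same aggregation as above, keeping only the ten most-viewed entities.
match (e:Entity)<-[c:Count]-()
return e.name as name, sum(toInteger(c.count)) as views
order by views desc
limit 10
```
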
Conclusion

So there you have it! Exploring a graph both visually and by extracting data via a Cypher query.
