Logical Graphs LLC: Haskell and data analytics

Thursday, July 14, 2022

Here's a question:

Can Haskell be used to analyze data published by, say, @KingdomInformed on, say, @DefiKingdoms heroes for sale?

Answer: welp, let's find out.

We'll answer the above question by posing then answering smaller, simpler questions.

Question 1: can I look at only Serendale heroes?

Answer: yes! At https://dfkanalytics.vercel.app/hero, select the Serendale Realm in the filters.

Question 2: can I download the stats for Serendale?

Answer: yes! Yes, I can, via the 'export'-link.

Parsing DFK Heroes-for-sale Data

Question 3: can I parse this file?

Answer: YES! ... but this is divided into simpler questions.

Question 3.a: What are the relevant hero data from the CSV file?

Answer: These are, ...

... which we capture in this Haskell data-type.
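The field list itself is not reproduced in this text, so here is only a minimal sketch of what such a type might look like. It is limited to the three dimensions this post queries later (`level`, `gen`, `class1`); the field types are assumptions, and the real type almost certainly carries more fields.

```haskell
-- Hypothetical, cut-down hero record: only the dimensions queried
-- later in this post. Field types are assumptions.
data Hero = Hero
  { level  :: Int     -- hero level
  , gen    :: Int     -- generation
  , class1 :: String  -- primary class, kept as raw text in this sketch
  } deriving (Eq, Show)
```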

Question 3.b: Do I want per-line error reporting?

Answer: yeup.

Question 3.c: Does that mean I need to create a language around the DFK hero knowledge-base?

Answer: But of course!

Okay, before we get into the guts of the parser: can we parse in the DFK hero sales data?

Yes, yes we can.

What do hero sales data look like when parsed?

Answer: like this.

Alrighty, then. What does the parser look like?

Answer: At a high level, we ingest the CSV file, scan each line, then parse data from each line into the Hero-data-structure.
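A minimal sketch of that high-level shape, under stated assumptions: `parseRows` is a hypothetical name, the first line of the file is assumed to be a header, and the row parser is passed in as a parameter so the driver knows nothing about heroes.

```haskell
import Data.Either (partitionEithers)

-- Hypothetical driver: apply a row parser to every data line of a CSV
-- document, tagging each failure with its 1-based line number.
-- Returns the per-line errors and the successfully parsed rows.
parseRows :: (String -> Either String a) -> String -> ([(Int, String)], [a])
parseRows parseRow contents = partitionEithers (zipWith tag [2 ..] rows)
  where
    rows = drop 1 (lines contents)  -- assumes line 1 is the header
    tag n row = either (\err -> Left (n, err)) Right (parseRow row)
```

Keeping the line number next to each error is what makes per-line error reporting possible later on.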

What does 'parsing at a low level' look like?

Answer: It looks like this. We match each CSV column to its datum in the Hero-structure; wherever a datum fails to parse, we capture the failure as an error and report it.

Fortunately, there were no errors in this parse. 😅🎉
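Here is a rough sketch of column-by-column parsing with error capture. Everything in it is an assumption made for illustration: the three-column layout, the cut-down `Hero` record, and the helper names `splitOnComma`, `intCol` and `parseHero`. The real export has more columns, and the comma split below ignores quoted cells.

```haskell
import Data.Either (lefts)
import Text.Read (readMaybe)

-- Hypothetical, cut-down hero record; field names and types are assumptions.
data Hero = Hero { level :: Int, gen :: Int, class1 :: String }
  deriving (Eq, Show)

-- Naive comma split; assumes no quoted or escaped commas in the export.
splitOnComma :: String -> [String]
splitOnComma s = case break (== ',') s of
  (cell, [])       -> [cell]
  (cell, _ : rest) -> cell : splitOnComma rest

-- Parse an Int column, naming the column in the error message.
intCol :: String -> String -> Either String Int
intCol name cell =
  maybe (Left (name ++ ": cannot parse " ++ show cell)) Right (readMaybe cell)

-- Match each column to its datum; report every datum that failed,
-- not just the first one.
parseHero :: String -> Either [String] Hero
parseHero row = case splitOnComma row of
  [l, g, c] ->
    let lvl = intCol "level" l
        gn  = intCol "gen" g
    in case (lvl, gn) of
         (Right l', Right g') -> Right (Hero l' g' c)
         _                    -> Left (lefts [lvl, gn])
  cells -> Left ["expected 3 columns, got " ++ show (length cells)]
```

Collecting all the failures for a row, rather than stopping at the first, means one pass over the file surfaces every bad datum.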

Constructing the individual parsers was a reductive exercise, fortunately. Why?

1. First, I constructed a Language of Parsing DFK heroes.

Once you shape a language of the problem, the solution simply falls out.

2. Second, I constructed the generic `parseT` higher-order function.
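The definition of `parseT` is not reproduced in this text, so the following is not that function. It is a hypothetical stand-in, `parseField`, sketching how one higher-order function can reduce each individual parser to a one-liner.

```haskell
import Text.Read (readMaybe)

-- Hypothetical stand-in for a generic field parser: lift any
-- 'String -> Maybe a' reader into one that names the failing column.
parseField :: (String -> Maybe a) -> String -> String -> Either String a
parseField reader name cell =
  maybe (Left (name ++ ": cannot parse " ++ show cell)) Right (reader cell)

-- Individual parsers then reduce to partial applications.
intField :: String -> String -> Either String Int
intField = parseField readMaybe

nonEmptyField :: String -> String -> Either String String
nonEmptyField = parseField (\s -> if null s then Nothing else Just s)
```

The only thing that varies from column to column is the reader passed in, which is the sense in which the exercise becomes reductive.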

Distribution Analyses

That's great! I can parse DFK data!

Who cares? (I do) 🥺

That's a valid question. Data are to be analyzed.

So, what analyses can we do?

1. We can compute distributions along any dimension of the hero-vectors.

Say: "What is the distribution of levels of heroes for sale?"

Okay, but all that work just for heroes-for-sale by level?

Nupe.

We can do heroes-for-sale by x, where x is any dimension of the Hero-vector.

Let's say, heroes-by-generation? How do we code that?

Easy!

> count gen hs

And we get:

How about "Heroes for sale-by-(primary)-class"?

Answer: np

> count class1 hs

Once we've parsed the data into our SVM, we can query any dimension's distribution simply.

"What dark majick't is this `count`-function, geophf?" you demand vociferously.

Answer: um, ... it's just a distribution function.

"So, ya gon' be a d!ck? Or are you gon' share your source code, huh, geophfrmeisterG?"

Me: "...yes?" 🤓😎
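The author's own source is not reproduced in this text. As a rough idea of how little a distribution function needs, here is a minimal sketch; the signature and the list-of-pairs result type are assumptions, chosen so that calls shaped like `count gen hs` type-check.

```haskell
import Data.List (group, sort)

-- Hypothetical distribution function: project each item onto one
-- dimension, then tally how many items share each value.
-- Results come back sorted by the dimension's value.
count :: Ord k => (a -> k) -> [a] -> [(k, Int)]
count dim xs = [ (k, length g) | g@(k : _) <- group (sort (map dim xs)) ]
```

Any record accessor with an orderable result works as the first argument, so the same function serves every dimension:

> count level hs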

Postlude and teaser

These are simple analyses. Useful? Sure!

But!

How about analyses considering several, or all, vectors?

How about the question: "Given 'a priori'-data, how much can I sell my hero, x, for?"

Would you like an answer for that question?

I would.

Tune in for Bayesian analyses.

CrystalVale Hero Data

p.s.: "How many changes did I have to make to my system to parse DFK CrystalVale heroes-for-sale data?"

Answer: none.
