Thursday, July 14, 2022

Haskell and data analytics

Here's a question:

Can Haskell be used to analyze data published by, say, @KingdomInformed on, say, @DefiKingdoms heroes for sale?

Answer: welp, let's find out.

We'll answer the above question by posing then answering smaller, simpler questions.

Question 1: can I look at only Serendale heroes?

Answer: yes! At https://dfkanalytics.vercel.app/hero, select the Serendale Realm in the filters.


Question 2: can I download the stats for Serendale?

Answer: Yes, I can, via the 'export' link.


Parsing DFK Heroes-for-sale Data


Question 3: can I parse this file?

Answer: YES! ... but this is divided into simpler questions.

Question 3.a: What are the relevant hero data from the CSV file?

Answer: These are, ...


... which we capture in this Haskell data-type.
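The field list is not reproduced in this text, so here is only a minimal sketch of what such a record might look like. Only `gen` and `class1` are names that appear later in this post; `heroId`, `level`, `price`, the field types, and the sample value are all assumptions for illustration.

```haskell
-- A hypothetical Hero record. Only `gen` and `class1` are named in this post;
-- every other field name and every type here is an assumption.
data Hero = Hero
  { heroId :: Integer  -- assumed: unique hero identifier
  , class1 :: String   -- primary class (a sum type may be a better fit)
  , gen    :: Int      -- generation
  , level  :: Int      -- assumed field name for the hero's level
  , price  :: Double   -- assumed: asking price
  } deriving (Eq, Ord, Show)

-- A made-up value, for illustration only (not real sales data).
sampleHero :: Hero
sampleHero = Hero { heroId = 1, class1 = "Knight", gen = 2, level = 5, price = 42.0 }
```

Record syntax gives us one accessor function per column, which is what makes the per-dimension queries further down so short.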


Question 3.b: Do I want per-line error reporting?
Answer: yeup.


Question 3.c: Does that mean I need to create a language around the DFK hero knowledge-base?
Answer: But of course!


Okay, before we get into the guts of the parser: can we parse in the DFK hero sales data?

Yes, yes we can. 


What do hero sales data look like when parsed?

Answer: like this. 


Alrighty, then. What does the parser look like?

Answer: At a high level, we ingest the CSV file, scan each line, then parse data from each line into the Hero-data-structure.
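The ingest, scan, parse pipeline just described can be sketched roughly as follows. This is not the actual parser: the cut-down `Hero` record, the three-column layout (class, generation, level), the helper names, and the assumption that no field contains a quoted comma are all mine.

```haskell
import Data.Either (partitionEithers)

-- Hypothetical, cut-down Hero record (field names other than gen/class1 assumed).
data Hero = Hero { class1 :: String, gen :: Int, level :: Int }
  deriving (Eq, Show)

-- Split a line on commas. Assumes no quoted fields containing commas.
splitOnComma :: String -> [String]
splitOnComma s = case break (== ',') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOnComma rest

-- Parse one CSV line, assuming the column order class,generation,level.
parseHero :: String -> Either String Hero
parseHero line = case splitOnComma line of
  [c, g, l] -> case (reads g, reads l) of
    ([(g', "")], [(l', "")]) -> Right (Hero c g' l')
    _                        -> Left ("bad number in: " ++ line)
  _ -> Left ("wrong column count in: " ++ line)

-- Scan every line after the header row, numbering lines for error reports.
-- Returns the per-line errors and the successfully parsed heroes.
parseHeroes :: String -> ([String], [Hero])
parseHeroes csv = partitionEithers (zipWith tag [2 :: Int ..] (drop 1 (lines csv)))
  where tag n = either (\e -> Left ("line " ++ show n ++ ": " ++ e)) Right . parseHero

-- Ingest a CSV file from disk (the path is wherever the export was saved).
loadHeroes :: FilePath -> IO ([String], [Hero])
loadHeroes path = parseHeroes <$> readFile path
```

Keeping `parseHeroes` pure and putting the file read in `loadHeroes` means the scanning logic can be exercised in GHCi on a literal string, without touching the disk.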


What does 'parsing at a low level' look like?

Answer: It looks like this. We match each column of the CSV to each datum in the Hero-structure; wherever a datum fails to parse, we capture that as an error, which we report out.

Fortunately, there were no errors, this parse. 😅🎉
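For readers without the screenshot, here is a hedged sketch of column-by-column matching that collects an error for every column that fails, instead of stopping at the first one. The `Hero` record, the `column` helper, and `heroFromColumns` are hypothetical names, and the three-column layout is an assumption.

```haskell
import Data.Either (lefts)
import Text.Read (readMaybe)

-- Hypothetical, cut-down Hero record (field names other than gen/class1 assumed).
data Hero = Hero { class1 :: String, gen :: Int, level :: Int }
  deriving (Eq, Show)

-- Parse one column, naming the column in the error message when it fails.
column :: Read a => String -> String -> Either String a
column name raw =
  maybe (Left ("cannot parse " ++ name ++ " from " ++ show raw)) Right (readMaybe raw)

-- Match three columns to the three fields. On failure, report every
-- column that could not be parsed, not just the first.
heroFromColumns :: [String] -> Either [String] Hero
heroFromColumns [c, g, l] =
  case (column "gen" g, column "level" l) of
    (Right g', Right l') -> Right (Hero c g' l')
    (eg, el)             -> Left (lefts [() <$ eg, () <$ el])
heroFromColumns cols = Left ["expected 3 columns, got " ++ show (length cols)]
```

Reporting all the bad columns of a row at once is a design choice: with a large export, it saves fixing one column only to find the next one broken on the same line.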


Constructing the individual parsers was a reductive exercise, fortunately. Why?

1. First, I constructed a Language of Parsing DFK heroes. 

Once you shape a language of the problem, the solution simply falls out.

2. Second, I constructed the generic `parseT` higher-order function.
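The source of `parseT` is not shown in this text, so the following is not it. It is a hypothetical stand-in, `parseVia`, sketching how one generic higher-order function can reduce each field parser to a one-liner. The class names in `parseClass` are only illustrative.

```haskell
import Text.Read (readMaybe)

-- Hypothetical stand-in for a generic field parser (NOT the post's `parseT`):
-- run any String -> Maybe a reader, labelling failures with the field name.
parseVia :: (String -> Maybe a) -> String -> String -> Either String a
parseVia reader name raw = case reader raw of
  Just x  -> Right x
  Nothing -> Left (name ++ ": cannot parse " ++ show raw)

-- Field parsers then become one-liners built from the same function.
parseGen :: String -> Either String Int
parseGen = parseVia readMaybe "gen"

-- Accept only a known set of class names (an illustrative, incomplete list).
parseClass :: String -> Either String String
parseClass = parseVia known "class1"
  where known c = if c `elem` ["Knight", "Wizard"] then Just c else Nothing
```

Because the reader is passed in as an argument, adding a new kind of column means writing one small `String -> Maybe a` function rather than a whole new parser.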


Distribution Analyses

That's great! I can parse DFK data!

Who cares? (I do) 🥺

That's a valid question. Data are to be analyzed.

So, what analyses can we do?

1. We can compute distributions along any dimension of the hero-vectors.

Say: "What is the distribution of levels of heroes for sale?"


Okay, but all that work for just heroes for sale-by-level?

Nupe.

We can do heroes-for-sale by x, where x is any dimension of the Hero-vector.

Let's say, heroes-by-generation? How do we code that?

Easy!

> count gen hs

And we get:


How about "Heroes for sale-by-(primary)-class"?

Answer: np

> count class1 hs

Once we've parsed the data into our SVM, we can query any dimension's distribution simply. 


"What dark majick't is this `count`-function, geophf?" you demand vociferously.

Answer: um, ... it's just a distribution function. 
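For the curious, a distribution function of this shape takes only a few lines with `Data.Map`. This is a guess at what `count` could look like, not the actual definition; the cut-down `Hero` record and the three sample heroes in `hs` are made up for illustration.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical, cut-down Hero record (field names other than gen/class1 assumed).
data Hero = Hero { class1 :: String, gen :: Int, level :: Int }
  deriving (Eq, Show)

-- Tally how many heroes share each value of the chosen dimension.
-- A guess at the shape of `count`; the post's own definition may differ.
count :: Ord k => (Hero -> k) -> [Hero] -> Map.Map k Int
count dimension heroes = Map.fromListWith (+) [ (dimension h, 1) | h <- heroes ]

-- Three made-up heroes to query against (not real sales data).
hs :: [Hero]
hs = [Hero "Knight" 0 5, Hero "Wizard" 1 3, Hero "Knight" 1 5]
```

With these sample heroes, `count gen hs` evaluates to `fromList [(0,1),(1,2)]` and `count class1 hs` to `fromList [("Knight",2),("Wizard",1)]`. Any record accessor works as the first argument, which is why switching dimension is a one-word change.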


"So, ya gon' be a d!ck? Or are you gon' share your source code, huh, geophfrmeisterG?"

Me: "...yes?" 🤓😎 

Postlude and teaser


These are simple analyses. Useful? Sure!

But!

How about analyses considering several, or all, vectors?

How about the question: "Given 'a priori'-data, how much can I sell my hero, x, for?"

Would you like an answer for that question?

I would.

Tune in for Bayesian analyses.

CrystalVale Hero Data


p.s.: "How many changes did I have to make to my system to parse DFK CrystalVale heroes-for-sale data?"

Answer: none. 


