Saturday, February 13, 2016

Modeling Order Lifecycles with Graphs

Problem

The stock market is a large, complex system that, due to its size and complexity, is difficult to visualize in meaningful ways: the data are either too numerous to be useful or are portrayed in ways that only experts can decipher, e.g. as spreadsheets.

But what if?

What if it were possible to trace the lifecycle of an order, from when the buy or the sell was placed to the execution at the exchange?

Nice dream, and, for over 90% of order lifecycles, entirely feasible: most trades, from start to finish, consist of a total of four transactions:

1. Order placed by the individual
2. Order consolidated by brokerage firm
3. (meta)Order placed at the exchange
4. Order executed
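The four steps above can be sketched as a tiny directed graph. This is a toy sketch, not FINRA's actual schema: the node type and the event names are assumptions for illustration.

```python
# A minimal sketch of the common four-transaction lifecycle as a
# chain of graph nodes. Event names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Node:
    event: str                              # e.g. "ORDER_PLACED"
    children: list = field(default_factory=list)

def linear_lifecycle() -> Node:
    """Build the typical lifecycle: placed -> consolidated -> at exchange -> executed."""
    executed     = Node("ORDER_EXECUTED")
    at_exchange  = Node("ORDER_AT_EXCHANGE", [executed])
    consolidated = Node("ORDER_CONSOLIDATED", [at_exchange])
    return Node("ORDER_PLACED", [consolidated])

def depth(node: Node) -> int:
    """Length of the longest path starting at this node."""
    return 1 + max((depth(c) for c in node.children), default=0)

root = linear_lifecycle()
print(depth(root))  # 4: four events, one straight line
```

Four nodes, three edges, no branching: the 90% case really is that simple.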

Simple.

What's so hard about that?

Two things:

1. One (simple) order is easy to tackle, but exchange data arrive at the rate of 6 billion transactions per day. How do we handle the sheer volume of data that would cause most databases to falter on just one day's worth?

2. Some orders are not so simple. With computer trading, the order lifecycle can contain over 1 billion transactions for just one order as the computers track the volatile price of a security through its ups and downs, continuously placing and cancelling orders until the exact price is matched to the exact amount that maximizes profitability for the person or corporation making the trade.

Solution

What are we doing to address these database-killing concerns?

Nothing. HA!

No, really, that's our solution.

Up to this point, solutions to the stock-market problem have focused on capturing these transactions and then, once captured, reassembling the trade in a meaningful way and in a timely fashion. Now, thanks to FINRA's novel approach to capturing these data, half the problem is solved: storing these data is feasible, and retrieving these data is fast.

Good show.

But what about its representation?

Up to now, orders have been reassembled as rows in spreadsheets. This representation is a problem, however: it tacitly views an order as a straight-line progression from start to finish.

For the vast majority of trades, this is not a 'problem' at all, as the order lifecycle for most trades is indeed linear: the order is placed, the order is consolidated, then the order is executed.

But for the more complex cases, that is, the more interesting trades, the order lifecycle can contain mid-stream additions, subtractions, and joins with other orders across brokerage houses. This complexity makes the idea that these orders are linear, or even tree-shaped, a fallacy: there is no single origin, and, in these cases, an order can be distributed over multiple executions, so there is no single conclusion.

This structure is not linear, nor is it a tree: multiple origins and exits are best represented by a graph, not by the straight lines of rows in a spreadsheet representation.
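A quick sketch makes the "not a tree" point concrete. The adjacency list below is hypothetical: two client orders joined into one consolidated order that is then split across two executions, so the structure has several roots and several leaves, something no single spreadsheet row-sequence can express.

```python
# Hypothetical complex lifecycle: two origins joined mid-stream,
# then distributed over two executions. A directed graph, not a tree.
edges = {
    "order_A":      ["consolidated"],     # origin 1
    "order_B":      ["consolidated"],     # origin 2, joined mid-stream
    "consolidated": ["exec_1", "exec_2"], # split across executions
    "exec_1":       [],                   # exit 1
    "exec_2":       [],                   # exit 2
}

nodes = set(edges)
has_incoming = {dst for dsts in edges.values() for dst in dsts}

origins = sorted(nodes - has_incoming)          # nodes with no parent
exits   = sorted(n for n in nodes if not edges[n])  # nodes with no child

print(origins)  # ['order_A', 'order_B']: no single origin
print(exits)    # ['exec_1', 'exec_2']: no single conclusion
```

Two origins and two exits: a tree allows exactly one root, so only a graph captures this lifecycle faithfully.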

Let's take a couple of examples and show how this bears out in practice.

Your 'Joe Average' trade:

[figure 1: Joe Blow Schmoe Chloë]

Okay, you see here the linearity of an ordinary trade.

No. Big. Deal.

Or. Is. It?

What you see here, and what you don't, is the compactness of this trade. Represented as a graph, it starts from its origin: the buy or the sell order. If the analyst doesn't wish to see anything more than that, they move on to the next trade, their screen not cluttered with data they do not wish to see.

What if they wish to see the data?

Then, in that case, they expand the origin, getting the simple set of related transactions. These transactions can be drilled into to extract the specifics of the data, or they can be left as-is, keeping the data contained and the screen clear to show simply the flow of the trade.
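The expand-on-demand behavior can be sketched in a few lines. The `CHILDREN` dict below is a stand-in I've invented for the transaction store: the view starts with a single node and reveals downstream transactions only when the analyst asks.

```python
# A sketch of drill-down: the screen starts with one node, and
# expand() reveals only what the analyst asks for. CHILDREN is a
# hypothetical stand-in for the transaction store.
CHILDREN = {
    "origin":       ["consolidated"],
    "consolidated": ["executed"],
    "executed":     [],
}

class View:
    def __init__(self, root: str):
        self.visible = {root}           # one node, uncluttered screen

    def expand(self, node: str):
        """Reveal the transactions directly downstream of `node`."""
        self.visible |= set(CHILDREN.get(node, []))

v = View("origin")
print(sorted(v.visible))   # ['origin']: nothing the analyst didn't ask for
v.expand("origin")
print(sorted(v.visible))   # ['consolidated', 'origin']
```

The design choice is that the backing data never changes; only the visible set grows, one request at a time.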

The Spreadsheet

"If you live by the spreadsheet, you die by the spreadsheet."

Cases live and die on the spreadsheet, so: can the graph of a trade be exported to a spreadsheet?

Why, yes. Yes, it can.

[figure 2: spreadsheet ... so teh borin'z]

Okay, we've put that one to bed.

But what about complex trades? Or really, Really, REALLY BIG-HONKIN' trades?

How do we handle those?

Complex/Nonlinear Trades

See, now if someone wants to get down and dirty with a trade, that's the point where a spreadsheet has to throw up its hands and go hide under the covers.

But for graphs?

This is where graphs shine:

[figure 3: sparkle and shine on you crazy diamond]

With a complex order, we see that, yes, we start with a single entry point (the one that we know for this order), but as we expand the flow of the trade, discovering its transactions, we see more and more entry points, which then resolve to multiple exit points.

Gosh! Trades aren't the nice and simple things that I want them to be! But with the graph representation, we capture that complexity and show it, firstly, accurately, and secondly, at staged levels of complexity that I, the analyst, am willing to drill into ... and, importantly, not at the complexity that I do not wish to see.

BIG-OLE Trades

Now, showing BIG trades (100s+ of rows, to 1000s, to millions, to a billion rows) ... are there any problems with that?

Well, yeah. Obvs.

Let's discuss, first, the problem of showing big trades in spreadsheets.

The problem: just too much. You get just too much data, and much of those data, at certain points during the inquiry, are not at all interesting to the analyst. What do I want to see, right off the bat, as the analyst?

Firstly, am I investigating the right trade? Is the time correct, is the trader correct, and how many transactions do I need to see to answer that? One. The graph of any inquiry starts with just one node, not 100 rows of data paged in 25 rows at a time: too much information can overwhelm the analyst.

What's the next thing I want to see as the analyst? When the trade is executed and the details of that execution. How many rows do I need to see that? One. How many rows does a spreadsheet show you to get the execution of the trade?

All teh rowz. Alles o' dem.

You page through, then page through, then page through this trade, one painfully refreshed page at a time, to get to that very last row, which was all you cared about. How much brain-capital and time did you waste to get to that last row?

Now, a smart analyst can set the filters and look for the EX-row that pops right to the top.

How many analysts do that in practice? Right now? How many?

None? One? Two?

ALLES TEH ANALYSTS 'CUZ DEY SO PERFECT?

How many?

The problem with a system that provides features is the assumption that the user-base will, firstly, see them, and then, secondly, use them. In practice, this rarely, if ever, happens.

Why not this, instead? Rather than presenting alles tehz daterz that the user has to sift through manually, or be smart enough to filter out, just present the data visually, by importance.

[figure 4: teh big win]

Here we see a trade of 27,324 rows, first as one node, then as three nodes: the origin, the 'everything else in between', and the execution.
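That three-node summary is simple enough to sketch. The transaction records below are dummy data standing in for the 27,324 rows of figure 4; only the fold-to-three-nodes shape is the point.

```python
# A sketch of the three-node summary: keep the origin and the
# execution, fold everything in between into one count-carrying node.
def summarize(transactions: list) -> list:
    """Collapse a long lifecycle into origin / summary / execution."""
    if len(transactions) <= 2:
        return transactions
    middle = {"event": "TRANSACTIONS", "rows": len(transactions) - 2}
    return [transactions[0], middle, transactions[-1]]

# 27,324 transactions, as in figure 4 (contents are dummy data)
lifecycle = [{"event": "ORIGIN"}] \
          + [{"event": "ROUTE", "seq": i} for i in range(27322)] \
          + [{"event": "EXECUTED"}]

summary = summarize(lifecycle)
print(len(summary))        # 3 nodes on screen
print(summary[1]["rows"])  # 27322 rows folded into one node
```

The analyst sees three nodes; the 27,322 in-between rows are a count until someone asks for them.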

Simple.

And, if and when the analyst wishes to drill into the 'everything including the kitchen sink' set of transactions, the analyst can do that.

When they want to.

But there it is, right at their fingertips: everything vitally important about an order lifecycle, right up front, visually, in a very simple, easy-to-understand representation.

And can you save this order lifecycle out to a spreadsheet?

Yes. Yes, you can.

[figure 5: big ole spreadsheet]

Implementation

Okey-dokey (that's a technical term). So, how do we do it?

We're not here to reinvent the wheel. FINRA has done the discovery and invention already. We're here to provide a meaningful and accurate representation of the order lifecycle of a security, and we'll use FINRA-processed data to do it.

[figure 6: tehz architecturez]

(hive) -> (red shift) -> (database)
                             |
                             v
                          (graph)

All of this architecture up to the graph database has already been designed and implemented by FINRA, so, by the time the analyst requests an order lifecycle to examine, it is already right-sized to contain the transactions of that order lifecycle assembly (OLA) and that OLA alone. Uploading those transactions and linking them in the graph database takes from no perceived time for small OLAs to a few seconds for 1000+-transaction OLAs (and these few seconds can be inlined with the hive query, effectively giving the analyst the graph representation 'for free' from the user-experience perspective).

When the analyst views the summarized OLA, they determine what to drill into and examine further, how to do that, and when they want to do that. If, on summary inspection, they determine that these are not the transactions they are looking for and want to move along, move along, then this, too, is simplicity itself: the graph database flushes that OLA from the data store and loads the already-realized next graph into the database, in sub-second time.
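The flush-and-swap step can be sketched as follows. Everything here is a toy: the `prebuilt` dict stands in for graphs already realized by the upstream pipeline, and the OLA ids are invented.

```python
# A sketch of flush-and-swap: pre-realized graphs keyed by a
# (hypothetical) OLA id; loading one flushes the previous one.
prebuilt = {
    "OLA-1": {"nodes": 4},       # a simple four-transaction lifecycle
    "OLA-2": {"nodes": 27324},   # a big one, already assembled upstream
}

class GraphStore:
    def __init__(self):
        self.current = None

    def load(self, ola_id: str):
        """Flush the current OLA, then load the next realized graph."""
        self.current = None              # flush: the old OLA is gone
        self.current = prebuilt[ola_id]  # swap in the pre-built graph

store = GraphStore()
store.load("OLA-1")
print(store.current["nodes"])   # 4
store.load("OLA-2")             # not the droids we're looking for? move along
print(store.current["nodes"])   # 27324
```

Because the next graph was realized before the analyst asked, the swap is a lookup, not a rebuild, which is what makes sub-second turnaround plausible.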

Easy-peasy (a technical term).

Technical Concerns

"It can't be done."

Yes, it can. Or: what can't be done? Modelling an entire exchange's trades for a day as a graph? Yeah, I'll grant you, that's hard, but that's not what the analyst wants or needs: the analyst is assigned a set of OLAs to review and, beyond those trades, is not interested in how every other trade in the markets is doing. If there is that concern, FINRA has other very good tools to answer questions along those lines: the purpose of this business is data visualization of the Order Lifecycle Assembly, not each and every trade of every security.

Or: "it can't be done" being: "some OLAs are too big for graph databases."

No, they aren't.

Or, put another way: who is going to look at one billion transactions all at once and know that they need to go right to row 3,456,123 to see that, Ah, yes! THERE is the fraudulence! I got'm!

No. Putting a case together is an iterative process of building it from relevant transactions, and such a trend is observed over a (relatively) small subset of the data: this upspike is telling, or that downswing is crucial.

When you are looking for trends, those trends are better represented by other tools than graphs, and we have those for you, too:

[figure 7: teh spreadsheet graph. yayz]

If you need a billion rows for your case, then we can have one node represent them all:

(:TRANSACTIONS { rows: 1656332445 })

Was that hard? How much memory did that take?

And you want to drill into that? Simple: we expand the graph on an as-needed basis as you drill into the relevant nodes of interest.
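Why that single node costs almost nothing can be sketched too. This is a toy under stated assumptions: `fetch_rows` is an invented stand-in for a paged query against the transaction store, and the drill-down streams rows instead of materializing a billion of them.

```python
# A sketch of as-needed expansion: the summary node carries only a
# count; drilling in streams one page of rows from the store
# (fetch_rows is a hypothetical stand-in for a paged query).
from itertools import islice

def fetch_rows(start: int, count: int):
    """Lazily yield transaction rows in [start, start + count)."""
    return ({"seq": i} for i in range(start, start + count))

summary_node = {"label": "TRANSACTIONS", "rows": 1_656_332_445}

# Drill in: materialize only the first page the analyst looks at.
first_page = list(islice(fetch_rows(0, summary_node["rows"]), 25))
print(len(first_page))   # 25 rows in memory, not 1.6 billion
```

The count lives on the node; the rows stay in the store until a specific page is requested, so memory scales with what is on screen, not with the size of the OLA.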

Summary

Order Lifecycles are graph structures. Graph databases give you the tools to visualize these graphs meaningfully outside the same-old world of OLAs-as-spreadsheets and give you the fine-tuning controls to throttle when and how much data is coming at you.

While other companies are 'considering' graphs as solutions and 'looking at' the graph-database option, our company has set up graph databases for both U.S. Government customers and commercial clients, so we understand the various levels of technical and managerial skill and vision for visualization products, and we also understand the need for relevant data, now, shown in an effective way. Graph databases have given our customers what they want, even beyond their expectations of the technology: once you can see what you couldn't see before in your data, it becomes useful not only for the tasks needing to be addressed now, but also for new work never imagined before.

FINRA is in desperate need of a visualization tool that displays Order Lifecycles accurately and usefully.


Choose us, and be very impressed.
