Wednesday, January 6, 2016

Fixing the Data Fix: Restoring from Backup

Data fix, part II, or: how to fix the data fix

So, yesterday, we looked at doing a hot data fix, and it ... worked, or it seemed to, but then, after adding the new day's data, instead of getting something like this:



this happened:



Oh, no! Now we have data corruption: stocks from the previous day and the current day are intermingled (the second graph diagram) under one set of MKT_CAP, PRICE, and VOLUME headings, instead of being separated into their respective days (as the first graph diagram shows).

What to do? We have basically two options:

1. Tease apart all the data relations for those two days, adding the correct headings for each day and then reassigning the stocks to their respective days using the saved queries stored in the log files.

... you do save your log files, don't you?

Basically, this is another hot fix. Possible? Yes. Doable? Yes. Risky?

... well: yes.

2. Or, we could make sure the data-load Cypher queries are now correct offline (they are; I verified), blow away the current, corrupted database, restore from a backup taken before the corruption occurred, and reload the corrected data from the stored Cypher queries.

Basically, restore from backup.

Doable? Let's check. Do we have a backup? Well, thanks to grapheneDB.com, a DBaaS ('Database as a Service'), we do: under their professional plan they take a daily backup and keep a week's worth of them for you. But before we start the restore, let's play it safe and save off our database as an export. You know: just in case we lose everything.
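Choosing which backup to restore, and which days to reload afterwards, comes down to a simple rule. Here is a minimal Python sketch of it, assuming backups are identified only by the date they were taken (how grapheneDB.com actually labels them isn't covered here) and that a backup stamped with a given day was taken before that day's data load, which is consistent with restoring the 2016-01-04 backup and then reloading 2016-01-04 itself. `pick_backup` and `days_to_replay` are hypothetical helpers, not part of any grapheneDB.com tooling:

```python
from datetime import date
from typing import List


def pick_backup(backups: List[date], corrupted_on: date) -> date:
    """Return the most recent backup dated strictly before the corruption day."""
    candidates = [b for b in backups if b < corrupted_on]
    if not candidates:
        raise ValueError("no backup predates the corruption")
    return max(candidates)


def days_to_replay(record_days: List[date], backup: date) -> List[date]:
    """Days whose stored queries must be re-run after the restore, oldest first.

    Assumption (hypothetical): a backup stamped D was taken before D's data
    load, so day D itself is replayed too.
    """
    return sorted(d for d in record_days if d >= backup)


# A week of daily backups, as a one-week retention policy would keep.
week = [date(2015, 12, 30), date(2015, 12, 31), date(2016, 1, 1),
        date(2016, 1, 2), date(2016, 1, 3), date(2016, 1, 4), date(2016, 1, 5)]

backup = pick_backup(week, corrupted_on=date(2016, 1, 5))
replay = days_to_replay([date(2015, 12, 31), date(2016, 1, 4), date(2016, 1, 5)], backup)
print(backup.isoformat())
print([d.isoformat() for d in replay])
```

With the dates from this post, the rule picks the 2016-01-04 backup and replays 2016-01-04 and 2016-01-05, in that order.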

Exporting a database

How do we do that?

Simple. We go to the admin page, which has the Export Database option:



Then we select that option:


A database with six months' worth of Top 5s (under 2,000 nodes and under 2,000 relations) takes no time to prepare for export. Then we download that export:


Restoring from Backup

Now that we have a local export of our database, let's restore from our backup.

The corruption occurred on 2016-01-05, so we use the 2016-01-04 backup:

... and accept the warning that we're about to wipe our database, because, yes, we want to eliminate the data corruption I introduced:

Then grapheneDB.com tells us that, yes, our database has been restored from backup:



Which is all well and good, but I'll do my own due diligence and confirm, first, that the database is restored and, second, that it is restored to the state before the corruption occurred.

Yup!

(Actually, I checked further into the database, and you should, too, to whatever level of assurance you need, but that is beyond the scope of this article.)
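One quick check along those lines, sketched in Python under a hypothetical model where each day is a `Day` node with an ISO-8601 `date` string property: after the restore, no day should fall on or after the first day you plan to reload. `restored_before` is an illustrative name, not an existing API:

```python
from typing import List


def restored_before(day_dates: List[str], first_reload_day: str) -> bool:
    """True if no day in the restored database is on or after first_reload_day.

    ISO-8601 date strings (YYYY-MM-DD) sort lexicographically in date order,
    so plain string comparison is enough here.
    """
    return all(d < first_reload_day for d in day_dates)


# Dates as a hypothetical query such as
#   MATCH (d:Day) RETURN d.date
# might return them.
restored = ["2015-12-29", "2015-12-30", "2015-12-31"]
still_corrupt = restored + ["2016-01-04", "2016-01-05"]

print(restored_before(restored, "2016-01-04"))       # True
print(restored_before(still_corrupt, "2016-01-04"))  # False
```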

Data Correction from the point of the Restore

Okay, database restored to its old state. WHEW! So we're back at ground zero, as it were: pre-2016. Now, let's reload our corrected 2016 data. To do that, I wrote a little utility called 'jsoner.sh' that converts stock Top 5s to Cypher queries in JSON. I save off the Top 5s to a daily record, the format of which is, e.g., this:

date>kind:Leaders|Losers
2015-12-31
Mkt_Cap:EPD,LBTYK,WMB,AAPL,GOOG|GOOGL,MSFT,AMZN
Price:NM-G,SWN,WPX,BCOM,LTRPB|AXON,NK,TXMD
Volume:GE,BAC,AAPL,KMI,QQQ,SPY,MSFT,SIRI,SUNE,FCX
2016-01-04
Mkt_Cap:EPD,BXLT,WFC-L,AMZN,GOOGL|GOOG,BABA,WFC
Price:SUNE,EPE,CHK,QUNR,AXON|RARE,NK,TEAM
Volume:SPY,BP,SAN,BHP,RIG,RDS.A,EEM,RIO
2016-01-05
Mkt_Cap:LLY,BABA,WMT,NTT,CHL|TOT,RDS.B,PTR,SAN
Price:SWHC,RGR,SHI,FLIR,GPRO|STRZB,CHK-D,BCOM,XLRN,EPE
Volume:ARRS,SPY,QQQ,XIV,SUNE,GPRO,SWHC,FCS,GDX
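jsoner.sh itself isn't shown in this post, so, as an illustration only, here is a minimal Python sketch of parsing this record format. It assumes the `date>kind:Leaders|Losers` line is a legend rather than data, and that a line without `|` (like Volume) is a single list of symbols; `parse_top5s` is a hypothetical name:

```python
import re
from typing import Dict, List, Optional

DATE_LINE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def parse_top5s(text: str) -> Dict[str, Dict[str, List[List[str]]]]:
    """Parse a daily Top 5s record into {date: {KIND: [group, ...]}}.

    '|' separates groups (Leaders|Losers for Mkt_Cap and Price); Volume,
    which has no '|', yields a single group.
    """
    days: Dict[str, Dict[str, List[List[str]]]] = {}
    current: Optional[Dict[str, List[List[str]]]] = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the 'date>kind:Leaders|Losers' legend (assumed not data).
        if not line or line.startswith("date>"):
            continue
        if DATE_LINE.match(line):
            current = days.setdefault(line, {})
            continue
        if current is None:
            raise ValueError(f"category line before any date: {line!r}")
        kind, sep, symbols = line.partition(":")
        if not sep:
            raise ValueError(f"malformed category line: {line!r}")
        # Upper-case the kind to match the MKT_CAP/PRICE/VOLUME headings.
        current[kind.upper()] = [
            [s for s in group.split(",") if s] for group in symbols.split("|")
        ]
    return days


record = """date>kind:Leaders|Losers
2016-01-04
Mkt_Cap:EPD,BXLT,WFC-L,AMZN,GOOGL|GOOG,BABA,WFC
Price:SUNE,EPE,CHK,QUNR,AXON|RARE,NK,TEAM
Volume:SPY,BP,SAN,BHP,RIG,RDS.A,EEM,RIO"""

days = parse_top5s(record)
print(days["2016-01-04"]["VOLUME"])  # [['SPY', 'BP', 'SAN', 'BHP', 'RIG', 'RDS.A', 'EEM', 'RIO']]
```

Keeping each day's categories in their own dictionary means a heading can only ever belong to the date line it appeared under.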

jsoner.sh scans the above and converts it to properly formatted Cypher queries, ensuring the stocks fall under the right heading for the appropriate day. So, let's run jsoner on 2016-01-04 and re-enter the first corrected day's data into the database:
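jsoner.sh itself isn't shown here, so the following is only a hedged Python sketch of the kind of conversion it performs, not the actual script. It assumes a hypothetical graph model (`Day`, `Category`, and `Stock` nodes; `HAS` and `LISTS` relationships) and targets the JSON body format of Neo4j's transactional HTTP endpoint (`{"statements": [{"statement": ..., "parameters": ...}]}`). The design choice that matters is that each heading is keyed on both kind and date, and every statement takes its date from a single argument, so a category can't quietly end up under another day:

```python
import json
from typing import Dict, List

# Hypothetical graph model: (:Day)-[:HAS]->(:Category)-[:LISTS {group}]->(:Stock).
# Category is keyed on BOTH kind and date, so each day gets its own headings
# instead of sharing one MKT_CAP/PRICE/VOLUME set across days.
STATEMENT = (
    "MERGE (d:Day {date: $date}) "
    "MERGE (c:Category {kind: $kind, date: $date}) "
    "MERGE (d)-[:HAS]->(c) "
    "WITH c "
    "UNWIND $entries AS e "
    "MERGE (s:Stock {symbol: e.symbol}) "
    "MERGE (c)-[:LISTS {group: e.group}]->(s)"
)


def day_to_json(day: str, kinds: Dict[str, List[List[str]]]) -> str:
    """Build one transactional-endpoint request body for a single day's Top 5s."""
    statements = []
    for kind, groups in kinds.items():
        # Flatten the groups, recording which group (0 = leaders, 1 = losers)
        # each symbol came from.
        entries = [
            {"symbol": symbol, "group": i}
            for i, group in enumerate(groups)
            for symbol in group
        ]
        statements.append({
            "statement": STATEMENT,
            # Every statement takes its date from the single `day` argument.
            "parameters": {"date": day, "kind": kind, "entries": entries},
        })
    return json.dumps({"statements": statements}, indent=2)


body = day_to_json("2016-01-04", {
    "MKT_CAP": [["EPD", "BXLT", "WFC-L", "AMZN", "GOOGL"], ["GOOG", "BABA", "WFC"]],
    "VOLUME": [["SPY", "BP", "SAN", "BHP", "RIG", "RDS.A", "EEM", "RIO"]],
})
print(body)
```

The `$param` placeholders are the Neo4j 3.0+ syntax; Neo4j 2.x, the current release line in early 2016, used `{param}` placeholders instead.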

Good. Let's verify those Top 5s are in and under the correct day:


And, as you can see at the bottom of the screen, I visually verified that the Volume category is now also attached to its correct date. Before, it showed 2016-01-05, and that was the root cause of the data corruption. Now it is correct. YAY!
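That visual check can also be done mechanically: compare the (date, category, symbol) triples you meant to load against what the database actually holds. A minimal Python sketch, assuming the rows come back from a hypothetical query over a `Day`/`Category`/`Stock` model; `compare` is an illustrative name:

```python
from typing import Iterable, Set, Tuple

Row = Tuple[str, str, str]  # (date, KIND, symbol)


def compare(expected: Iterable[Row], loaded: Iterable[Row]) -> Tuple[Set[Row], Set[Row]]:
    """Return (missing, unexpected): rows in the record but not the database, and vice versa."""
    exp, got = set(expected), set(loaded)
    return exp - got, got - exp


expected = {("2016-01-04", "VOLUME", "SPY"), ("2016-01-04", "VOLUME", "BP")}
# Rows as a hypothetical query might return them, e.g.
#   MATCH (d:Day)-[:HAS]->(c:Category)-[:LISTS]->(s:Stock)
#   RETURN d.date, c.kind, s.symbol
# Here Volume has been filed under the wrong day.
loaded = {("2016-01-05", "VOLUME", "SPY"), ("2016-01-05", "VOLUME", "BP")}

missing, unexpected = compare(expected, loaded)
print(sorted(missing))
print(sorted(unexpected))
```

A symbol that shows up in `missing` under one date and in `unexpected` under another is exactly the kind of misfiled Volume heading described above; a clean load returns two empty sets.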

Okay, now let's re-enter the corrected second day's data, that is, the data for 2016-01-05:


And we verify that the new data are in place, properly sorted into the right days:


And so they are.

Database fully restored from backup and ready, once again, to receive the latest daily Top 5s from the stock market.


We are back in business. YAY!
