Monday, February 15, 2016

Plan C: Fly into the Danger Zone, or: Backup Restore

Okay, you are making and archiving daily snapshots of your database. Great! You are light-years ahead of most small businesses.

But do you know what you've got? The first step of data security is your backup; the second step is restoring from that backup. You don't know that your data is safely saved off unless you know you can restore from that save to full operability. So test that, and get that assurance now: not when you're out of the frying pan and into the fire, but now, while things are sailing smoothly on an even keel.

Yes, I am the master of the metaphor, so much so that I am actually a 'pataphorist.


Let's do this.

You have your backup. You have two ways to test that a restore will work when it needs to:

1. Blow away your production data and restore your snapshot there.

That, right there, is your 100% guarantee for that snapshot, I tell you what. And you can do that, because one day, you're gonna hafta.

(dos words, doe)

2. In a lower environment, restore the snapshots and do your quality assurance tests there.

Most people will pick option 2. Perfectly fine. It's not 100%, but it does give you the confidence that you can restore your snapshot.

We'll go over option 2 briefly, because I'm actually going to do option 1 in this essay: I'm going to blow away my production database and restore a snapshot there. Why? Because I'm going to do this article ... LIKE A BOSS!

Backup Restore Locally

So, option 2: install on your local system/laptop the same version of neo4j that runs your production database on the GrapheneDB DaaS.

Go into your neo4j data directory and just blow away graph.db/ there:

geophf:1HaskellADay geophf$ ls neo4j-community-2.3.0/data
README.txt dbms graph.db import log
geophf:1HaskellADay geophf$ rm -rf neo4j-community-2.3.0/data/graph.db/
geophf:1HaskellADay geophf$ ls neo4j-community-2.3.0/data
README.txt dbms import log

Next, copy your snapshot locally and restore it into your neo4j data directory:

geophf:1HaskellADay geophf$ ls -l ~/Desktop/logical-graphs/backups/
total 680
-rw-r-----@ 1 geophf  staff  344810 Feb 14 21:34
geophf:1HaskellADay geophf$ cp ~/Desktop/logical-graphs/backups/ neo4j-community-2.3.0/data
geophf:1HaskellADay geophf$ cd neo4j-community-2.3.0/data/
geophf:data geophf$ mkdir graph.db
geophf:data geophf$ mv graph.db/
geophf:data geophf$ cd graph.db/
geophf:graph.db geophf$ unzip 
... unzips database ...

Now remove the copy of your snapshot.

geophf:graph.db geophf$ rm 

Now start up your instance and check to see your data is restored from your snapshot.

geophf:1HaskellADay geophf$ ./neo4j-community-2.3.0/bin/neo4j start

And a Cypher query in the web client shows, yes, a full restore:

Okay, that was a nice, cool restore of a snapshot in a nice, safe place: locally.

Backup Restore in Production

Let's do this in production now.

And the way we do this is to fly right into the Danger Zone. Go to the Admin tab of your GrapheneDB instance and select "Empty Database":

You'll get this challenge.

Challenge: accepted.

Once the database is emptied, you're taken to the overview page, which shows that your database is now, indeed, empty:

Okay, we've emptied our database. (Another way to go about this is to delete your database and then create a new one, but then you have to worry about settings such as security configuration and connection information. I just went with the "Empty Database" option.) Our next step is to restore the snapshot.

Let's do that. Return to the Admin tab and go back into the Danger Zone; this time, choose the "Restore database" option. Once it's selected, you're challenged to provide a snapshot (along with instructions on how to create one from your local database if you don't have a snapshot already). The snapshot can come either from your local machine or from cloud storage, e.g. AWS S3, provided your GrapheneDB user has access to those services (something you'll have to verify with your security team). I have a local copy, so I provide that. Then select "Restore."

Several modal dialogs flash in rapid succession to report the progress of the restore, and then you get the following message:

Great! Again, let's verify. The overview page, once refreshed (so, hint: don't panic; you need to refresh manually here!), shows a populated database:

And we can dig as deeply as we need to in order to verify that our data is restored from our snapshot:

Yay! Happy ending! And in Production, too.

Plan C

That was painless.

But what if it weren't? What if you restored and the snapshot you have is corrupted? Do you have an alternative approach if the restore doesn't work?

I do. I have the source data that I derived the graph from, and an ETL process that rebuilds my graph in under a minute; I tested that out yesterday. That's why I had the confidence to take this approach.

Backup restore is a very simple process in general, and especially so here, because of the ease with which the GrapheneDB DaaS walks you through the process, holding your hand the whole way.

That is all the more reason for you to do due diligence on your side. If you do a backup restore locally first, you have confidence you can do it in the cloud on your production system (Plan B). Further, I had a Plan C: knowing I could rebuild the graph from the original sources if I had to.

Always have a Plan B, but in case Plan B fails, have Plan C ready. There's nothing like that confidence when you proceed with operations like these on production data.
