Upgrading to the Neo4j 2.3.2 database: ETL in action
We have the following etl.perl script:
geophf:1HaskellADay geophf$ cat ~/bin/etl.perl
#!/usr/bin/perl

# Skip the CSV header line, then read each four-line record
# (date, Mkt_Cap, Price, Volume) and hand each day off to jsoner.sh.
$junque = <>;
while (<>) {
    chomp;
    $day = $_;
    $one = <>;
    chomp $one;
    $two = <>;
    chomp $two;
    $three = <>;
    chomp $three;
    print `jsoner.sh $day "$one" "$two" "$three"`;
}
which calls this shell script:
geophf:1HaskellADay geophf$ cat ~/bin/jsoner.sh
#!/bin/bash

# Build the JSON/Cypher payload for one day's top 5s ...
JSON=`jsonify $1 "$2" "$3" "$4"`
# echo $JSON

# ... then ship it to the graph database; the credentials and URL
# come from the environment.
jsoneronious.sh $CYPHERDB_USER $CYPHERDB_PASSWD $CYPHERDB_URL "$JSON" $1
# echo my curl command has $CYPHERDB_USER:$CYPHERDB_PASSWD and $CYPHERDB_URL

echo
echo
echo Saved $1 top 5s to GrapheneDB
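jsoneronious.sh itself isn't shown here, but judging from its arguments and the HTTP exchange in the run below, it's essentially a curl POST to the database's transactional Cypher endpoint. A minimal sketch, assuming CYPHERDB_URL is the instance's base REST URL (the argument order is taken from the call above; the curl flags are my guess):

#!/bin/bash
# Hypothetical sketch of jsoneronious.sh (not shown in the post).
# Args: user password base-url json-payload day
USER=$1
PASSWD=$2
BASE=$3     # assumed to be the instance's base REST URL
JSON=$4
DAY=$5      # passed along, perhaps for logging

# POST the prebuilt payload to Neo4j 2.x's transactional Cypher endpoint;
# -i echoes the response headers, which matches what we see in the run below.
curl -i -X POST "$BASE/db/data/transaction/commit" \
     -u "$USER:$PASSWD" \
     -H "Content-Type: application/json" \
     -d "$JSON"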
We've created a new GrapheneDB DaaS instance (initialization step):
See? It's really empty (verification step):
Okay, let's update the access key-set with our new database name, then let's do this.
[updates database access key-set]
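Concretely, the "access key-set" is just the three environment variables jsoner.sh reads, so updating it amounts to something like this (the user, password, and hostname below are placeholders for whatever GrapheneDB hands you):

# Point the ETL at the new GrapheneDB instance.
export CYPHERDB_USER=app-user
export CYPHERDB_PASSWD=app-password
export CYPHERDB_URL=https://your-instance.graphenedb.com:24780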
Then we make sure our top5s data source (http://lpaste.net/raw/4714982275408723968) is up to date by checking its most recent (last) record:
2016-02-12
Mkt_Cap:JPM,WFC,PTR,GE,ATVI|LMCB,MOG.B,RAI,LPLA
Price:GRPN,ICPT,TCK,AXL,AVP|MOG.B,LPLA,JW.B,LMCB,P
Volume:BAC,GE,CSCO,CHK,QQQ,FCX,ATVI,AAPL,P,C
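One quick way to make that check, assuming the file always ends with a complete four-line record:

# Fetch the data source and show its last record
# (date, then the Mkt_Cap, Price, and Volume lines).
curl -s http://lpaste.net/raw/4714982275408723968 | tail -4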
Okay, let's run this thing!
geophf:1HaskellADay geophf$ etl.perl Seer/data/top5s.csv
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 2801 100 125 100 2676 557 11935 --:--:-- --:--:-- --:--:-- 11946
HTTP/1.1 100 Continue
HTTP/1.1 200 OK
Server: nginx
Date: Sun, 14 Feb 2016 06:33:58 GMT
Content-Type: application/json
Content-Length: 125
Connection: keep-alive
Access-Control-Allow-Origin: *
{"results":[{"columns":[],"data":[]},{"columns":[],"data":[]},{"columns":[],"data":[]},{"columns":[],"data":[]}],"errors":[]}
Saved 2016-02-12 top 5s to GrapheneDB
Four result entries (one per Cypher statement in the day's payload) and an empty errors list: the writes went through. I didn't time it, but it took less than a minute to upload nine months of top5s data. Let's check our results:
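A command-line way to make that check, reusing the environment variables and the same transactional endpoint (the label breakdown keeps it schema-agnostic):

# How many nodes of each label, and how many relationships, landed?
curl -s -X POST "$CYPHERDB_URL/db/data/transaction/commit" \
     -u "$CYPHERDB_USER:$CYPHERDB_PASSWD" \
     -H "Content-Type: application/json" \
     -d '{"statements":[
           {"statement":"MATCH (n) RETURN labels(n) AS label, count(*) AS nodes"},
           {"statement":"MATCH ()-[r]->() RETURN type(r) AS rel, count(*) AS links"}]}'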
Okay, great; let's do a specific query against $NFLX, on which I'm currently writing a case study.
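The node labels and properties depend on what jsonify emits, so treat this as a hypothetical sketch: it assumes stock nodes carry a symbol property.

# Hypothetical NFLX drill-down; adjust the pattern to the actual schema.
curl -s -X POST "$CYPHERDB_URL/db/data/transaction/commit" \
     -u "$CYPHERDB_USER:$CYPHERDB_PASSWD" \
     -H "Content-Type: application/json" \
     -d '{"statements":[{"statement":
           "MATCH (s {symbol: \"NFLX\"})-[r]-(x) RETURN s, r, x LIMIT 25"}]}'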
One last check: how much space did we take up in the new database, and how much do we still have available, both for nodes (not so much a worry) and for storage?
As you can see, nine months of data doesn't even scratch the surface of the developers' edition of the database. We have, oh, 500 or so years' worth of headroom before we need to start looking at other options. Sweetness.
BAM! The old database is therefore ready for decommissioning. We just have to remember, before the next scrape, to take a manual backup and to confirm the new database is receiving new top5s information as we scrape it.