AWS Neptune going GA: The good, the bad, and the ugly for graph database users and vendors | ZDNet

AWS Neptune showed up right on time for general availability (GA). While speculation ran wild, AWS sources were very specific about Neptune going GA in late May. So, now that it’s here, how much impact will Neptune have for users, and will it ‘Amazon’ the graph database landscape?

The good: Neptune tries to get the best of two worlds, looks production ready

Even before the announcement, AWS emphasized two key points: Neptune would enable users to seamlessly go from proof of concept to production, and there was great interest from many major clients. Its people were confident about the GA timeline and about AWS clients using Neptune pre-GA being able to go to production upon GA.

The names included in yesterday’s press release do not disappoint: Samsung Electronics, Pearson, Intuit, Siemens, AstraZeneca, FINRA, LifeOmic, Blackfynn, and Amazon Alexa.


Their use cases range from fraud detection to medical research, and AWS says this range of needs is precisely what drove Neptune’s development and is reflected in its profile.

RDF and its query language, SPARQL, on the other hand, offer different benefits, according to AWS. Most prominently, they lend themselves well to data exchange and integration scenarios. By enabling users to integrate and ingest datasets such as Wikidata or life sciences data, RDF can help them bootstrap their applications and exchange data.
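
Part of what makes RDF suited to integration is that resources are identified by global IRIs, so two datasets that mention the same IRI can be merged by simply taking the union of their triples. The minimal sketch below illustrates this in plain Python, not with Neptune or any RDF library: all IRIs are hypothetical (under the reserved example.org domain), and the small `match` helper is a toy stand-in for a single SPARQL triple pattern.

```python
# A minimal sketch: RDF triples modeled as (subject, predicate, object) tuples.
# All IRIs are hypothetical, under the reserved example.org domain.
EX = "http://example.org/"

# Dataset A: a hypothetical in-house dataset.
dataset_a = {
    (EX + "gene/BRCA1", EX + "associatedWith", EX + "disease/BreastCancer"),
}

# Dataset B: a hypothetical external dataset that reuses the same gene IRI.
dataset_b = {
    (EX + "gene/BRCA1", EX + "label", "BRCA1"),
    (EX + "gene/BRCA1", EX + "locatedOn", EX + "chromosome/17"),
}

# Both datasets identify the gene by the same IRI, so integration is a set union.
merged = dataset_a | dataset_b


def match(graph, pattern):
    """Return variable bindings for one triple pattern; terms starting with '?' are variables."""
    results = []
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in the pattern must bind to the same value.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# Roughly: SELECT ?p ?o WHERE { <http://example.org/gene/BRCA1> ?p ?o }
facts = match(merged, (EX + "gene/BRCA1", "?p", "?o"))
```

The query returns facts from both sources in one result set, with no join key mapping or schema alignment step, because the shared IRI does that work.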

Neptune’s primary goal at this point, however, is high availability and durability at scale: up to 100 billion nodes, edges, or triples, while automatically replicating six copies of data across three Availability Zones (AZs) and continuously backing up data to S3. AWS says Neptune is ACID-compliant in both SPARQL and Gremlin, offering repeatable reads that are up to date across AZs within 10 milliseconds.

AWS also says Neptune is designed to offer greater than 99.99 percent availability and automatically detects and recovers from most database failures in less than 30 seconds. Neptune also provides advanced security capabilities, including network security through Amazon Virtual Private Cloud (VPC) and encryption at rest using AWS Key Management Service (KMS).
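
To put the availability figure in perspective, the arithmetic below converts an availability percentage into a maximum downtime budget. Since AWS says "greater than 99.99 percent", the results are upper bounds. The last line is a purely hypothetical back-of-envelope figure, combining that budget with the 30-second recovery bound; it is not an AWS claim about how many failures Neptune tolerates.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def max_downtime_minutes(availability_pct: float, period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Upper bound on downtime over a period for a given availability percentage."""
    return (1 - availability_pct / 100) * period_minutes


yearly = max_downtime_minutes(99.99)                 # roughly 52.6 minutes per year
monthly = max_downtime_minutes(99.99, 30 * 24 * 60)  # roughly 4.3 minutes per 30-day month

# Hypothetical illustration: how many recoveries at the 30-second upper bound
# would fit in the yearly downtime budget.
RECOVERY_SECONDS = 30
recoveries_per_year = int(yearly * 60 // RECOVERY_SECONDS)
```

In other words, four nines leaves under an hour of downtime per year, which is on the order of a hundred recoveries at the 30-second bound.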

That array of features puts Neptune in the same league as Microsoft CosmosDB in terms of high availability in the cloud. There are differences too, though, most notably the fact that CosmosDB is multi-model, while Neptune is exclusively a graph database, albeit a dual-model one. CosmosDB has other APIs besides graph, while Neptune has two different graph APIs.

The bad: Neptune is missing features

Although Neptune has many things going for it, it’s not perfect. To begin with, the dual RDF and property graph (PG) model looks like a great asset compared to other graph databases, but it may not be as good as it looks. You can’t really use the two interchangeably: data has to be ingested and queried either as PG or as RDF.

That should not come as a surprise, as bridging the two models is anything but trivial. AWS wants to pursue a unified view over RDF and PG, but that’s quite hard, and we do not expect to see it anytime soon. So, while having two graph databases for the price of one looks attractive, it becomes less attractive if you have to ETL data from one to the other to use them.
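
One concrete source of friction: a property graph edge can carry its own properties, while a plain RDF triple cannot. The sketch below maps a single PG edge to RDF using standard RDF reification (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`), one of several possible encodings; named graphs and RDF-star are alternatives. This is an illustration of the modeling gap, not how Neptune stores data, and the `Edge` class, `edge_to_rdf` function, and example.org namespace are hypothetical.

```python
from dataclasses import dataclass, field

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical namespace


@dataclass
class Edge:
    """A property graph edge: a labeled, directed link that can carry its own properties."""
    edge_id: str
    source: str
    label: str
    target: str
    properties: dict = field(default_factory=dict)


def edge_to_rdf(edge: Edge) -> list:
    """Map one PG edge to RDF triples via reification, so edge properties survive."""
    s, p, o = EX + edge.source, EX + edge.label, EX + edge.target
    stmt = EX + "stmt/" + edge.edge_id  # a node that stands for the edge itself
    triples = [
        (s, p, o),  # the plain fact
        (stmt, RDF + "type", RDF + "Statement"),
        (stmt, RDF + "subject", s),
        (stmt, RDF + "predicate", p),
        (stmt, RDF + "object", o),
    ]
    # Each edge property hangs off the statement node. Values are stringified here,
    # which drops datatype information a real mapping would need to preserve.
    for key, value in edge.properties.items():
        triples.append((stmt, EX + key, str(value)))
    return triples


knows = Edge("e1", "alice", "knows", "bob", {"since": 2015})
triples = edge_to_rdf(knows)  # 1 edge with 1 property becomes 6 triples
```

Going back from RDF to PG requires recognizing such patterns again, and every choice of encoding changes how queries must be written on the other side, which is why a unified view is hard.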

While Neptune has tools for ingesting data in CSV, RDF, and GraphML, these work only with static files. AWS says you can also use DynamoDB Streams for dynamic data import, but you will have to write the ingestion code for this yourself. The same goes for exporting data: it is possible via SPARQL and Gremlin, but not very convenient in the absence of a dedicated tool.
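
To give a sense of the glue code involved, here is a minimal sketch of the translation step. It assumes records shaped like DynamoDB Streams records (`eventName` of `INSERT`, `MODIFY`, or `REMOVE`; a `dynamodb` object with `Keys` and, when the stream view type includes new images, `NewImage`; typed attribute values such as `{"S": ...}` and `{"N": ...}`). The key attribute name `id`, the function names, and the `upsert_vertex` / `drop_vertex` operation tuples are hypothetical, and actually applying the operations to Neptune (for example, via Gremlin) is left out.

```python
def from_attr(attr: dict):
    """Decode a DynamoDB typed attribute value; only S (string) and N (number) are handled."""
    if "S" in attr:
        return attr["S"]
    if "N" in attr:
        num = attr["N"]  # numbers arrive as strings
        try:
            return int(num)
        except ValueError:
            return float(num)
    raise ValueError(f"unsupported attribute type: {list(attr)}")


def to_graph_ops(records: list, key: str = "id") -> list:
    """Translate stream records into hypothetical vertex upsert/drop operations."""
    ops = []
    for record in records:
        change = record["dynamodb"]
        vertex_id = from_attr(change["Keys"][key])
        if record["eventName"] in ("INSERT", "MODIFY"):
            # NewImage is only present if the stream is configured to include new images.
            props = {k: from_attr(v) for k, v in change["NewImage"].items() if k != key}
            ops.append(("upsert_vertex", vertex_id, props))
        elif record["eventName"] == "REMOVE":
            ops.append(("drop_vertex", vertex_id))
    return ops


sample = [
    {"eventName": "INSERT",
     "dynamodb": {"Keys": {"id": {"S": "u1"}},
                  "NewImage": {"id": {"S": "u1"}, "name": {"S": "Alice"}, "age": {"N": "34"}}}},
    {"eventName": "REMOVE", "dynamodb": {"Keys": {"id": {"S": "u2"}}}},
]
ops = to_graph_ops(sample)
```

Even this toy version has to make modeling decisions (which attribute becomes the vertex ID, how types map, how deletes propagate) that a built-in tool would otherwise make for you.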

RDF inference is also missing. Inference is the ability to process rules, typically expressed in RDFS or OWL variants for RDF. These rules can be used to declare schema, including classes, inheritance, types, and restrictions for nodes, edges, and properties. Applying them derives new facts, effectively adding data to the database.
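
As an illustration of what inference adds, the sketch below applies two RDFS entailment rules by naive forward chaining until no new triples appear: rdfs11 (`rdfs:subClassOf` is transitive) and rdfs9 (an instance of a subclass is also an instance of its superclass). Prefixed names are kept as plain strings for brevity, and the `ex:` resources are hypothetical. Every derived triple is materialized, which hints at why inference can weigh on scalability.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(triples: set) -> set:
    """Apply RDFS rules rdfs9 and rdfs11 until nothing new is derived (naive forward chaining)."""
    closed = set(triples)
    while True:
        derived = set()
        subclass = [(s, o) for s, p, o in closed if p == SUBCLASS]
        for a, b in subclass:
            # rdfs11: if A subClassOf B and B subClassOf D, then A subClassOf D.
            for c, d in subclass:
                if b == c:
                    derived.add((a, SUBCLASS, d))
            # rdfs9: if X rdf:type A and A subClassOf B, then X rdf:type B.
            for s, p, o in closed:
                if p == RDF_TYPE and o == a:
                    derived.add((s, RDF_TYPE, b))
        if derived <= closed:  # fixpoint reached
            return closed
        closed |= derived


data = {
    ("ex:Fido", RDF_TYPE, "ex:Dog"),
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
}
inferred = rdfs_closure(data) - data  # triples that exist only because of inference
```

Here three asserted triples yield three inferred ones: Dog is a subclass of Animal, and Fido is both a Mammal and an Animal.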

AWS has chosen not to include RDF inference in Neptune, citing its impact on scalability. AWS notes, however, that it’s looking into adding RDFS support in the future. Doing so would enable type validation for data structures, as well as type subsumption via query rewriting. For the time being, if you want support for either, you will have to use a reasoner engine in addition to Neptune.
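
Query rewriting is the alternative to materializing inferred triples: instead of storing "Fido is an Animal", the query for Animals is expanded to also match every subclass of Animal. A minimal sketch, assuming the class hierarchy is available to the application as hypothetical `ex:` schema triples, which generates a SPARQL 1.1 query with a `VALUES` clause listing the target class and all its subclasses. The function names are illustrative; where the schema triples are stored alongside the data, a SPARQL 1.1 property path such as `?x a/rdfs:subClassOf* ex:Animal` expresses the same idea inside the query.

```python
def subclasses_of(target: str, schema: set) -> set:
    """All classes that are (transitively) rdfs:subClassOf target, including target itself."""
    found, frontier = {target}, [target]
    while frontier:
        current = frontier.pop()
        for s, p, o in schema:
            if p == "rdfs:subClassOf" and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def rewrite_type_query(target: str, schema: set) -> str:
    """Rewrite 'instances of target' into a query that also matches instances of its subclasses."""
    classes = " ".join(sorted(subclasses_of(target, schema)))
    return (
        "PREFIX ex: <http://example.org/>\n"
        "SELECT ?x WHERE {\n"
        f"  VALUES ?cls {{ {classes} }}\n"
        "  ?x a ?cls .\n"
        "}"
    )


schema = {
    ("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
}
query = rewrite_type_query("ex:Animal", schema)
```

The trade-off is the mirror image of materialization: no extra data is stored, but every query that touches a class pays for the expansion at query time.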

And if you want to apply advanced analytics to your graph, utilizing solutions such as Spark or GraphX, you will have to find a way to integrate and move that data around yourself, too. Again, AWS says it is looking into ways of adding this, considering client needs.

Finally, Neptune is also lacking when it comes to visualization, which is an important feature for querying and exploring graphs. While Neptune does offer visualization via partnerships, these options do not come out of the box and incur additional cost. So if you want to formulate queries or navigate results visually, you will have to turn to one of AWS’s partners.

The ugly: Standing up to AWS

But how do other graph database vendors measure up against Neptune? And what can they do to avoid being ‘Amazoned’? That question must have been on a lot of people’s minds for a while now. A while back, we had a discussion with the CEO and founder of Neo4j, Emil Eifrem, and this was exactly one of the things we talked about.

As Neo4j is the No. 1 graph database in terms of mindshare and adoption, Eifrem confessed to having done a lot of soul-searching on this. His conclusions may be of interest not just to the graph database community, but beyond it as well. So, how does one stand up to a mega-cloud vendor entering one’s domain?

• Focus. Here, Eifrem quotes Daniel Ek, CEO of Spotify, when saying that — for companies like Amazon, Apple, and Google — music is a hobby, while for Spotify it is its core business. Similarly, says Eifrem, graph is going to be a hobby for AWS, so "shame on me if that does not result in a better product".

• Pervasiveness. Eifrem acknowledges that it’s hard to make the transition from an on-premises company to a cloud company. But the flip side of that, he says, is that there is value in telling CIOs "we run on all clouds, and on your laptop, and on your data center."

• Vertical integration. Here, Eifrem refers to Neo4j’s graph platform strategy, claiming that Neo4j is moving up the stack and is becoming more than a database. Neo4j is the Oracle and the Tableau of graph databases, and a much richer offering, he says.
