Apache Cassandra is a NoSQL database management system designed to handle large amounts of data across many servers. It provides high availability with no single point of failure and supports clusters that span multiple datacenters. Because Cassandra is Java-based, it can be managed and monitored via Java Management Extensions (JMX). Its masterless architecture allows for low-latency operations.
Why Cassandra?
· Data is automatically replicated to multiple nodes for fault tolerance
· Data can be replicated across multiple datacenters
· Hadoop integration
· MapReduce, Apache Pig, and Apache Hive support available
· Cassandra Query Language (CQL) is a simple interface for accessing Cassandra
· CQL uses an abstraction layer to hide implementation details
· Language drivers are available for Java, Python, Node.js, and others
· Masterless architecture means there is no single point of failure, and operations complete with low latency
· Failed nodes can easily be replaced
· Users can choose between synchronous and asynchronous replication
· Features such as Hinted Handoff and Read Repair help bring replicas back in sync after a failure
· The audit logging feature records database activity to support security and compliance
· The fqltool command supports workload analysis based on full query logs
· SSL encryption protects data in flight from being compromised
· Encryption for client-to-node and node-to-node traffic is configured independently
· Cassandra scales linearly as nodes are added
· If the data model is designed correctly, queries are answered efficiently
· Data compression and tunable consistency are provided
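
To make tunable consistency a little more concrete, here is a minimal Python sketch of the replica-count arithmetic behind it. It assumes the commonly documented rules that a quorum is a strict majority of replicas, and that a read is guaranteed to overlap the latest acknowledged write when the read and write replica counts together exceed the replication factor. The function names are illustrative only and are not part of any Cassandra driver.

```python
def quorum(replication_factor: int) -> int:
    """Return the number of replicas that form a strict majority."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return replication_factor // 2 + 1


def reads_overlap_writes(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """Return True when any set of read replicas must share at least one
    replica with any set of write replicas (write_acks + read_acks > RF)."""
    return write_acks + read_acks > replication_factor


if __name__ == "__main__":
    rf = 3  # hypothetical replication factor for the example
    q = quorum(rf)
    print(f"RF={rf}, quorum={q}")
    print("quorum writes + quorum reads overlap:", reads_overlap_writes(rf, q, q))
    print("single-replica writes + reads overlap:", reads_overlap_writes(rf, 1, 1))
```

With a replication factor of 3, quorum reads and writes always overlap, while reading and writing a single replica trades that guarantee for lower latency. That trade-off is what "tunable" refers to.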