Apache HBase

HBase is a distributed, column-oriented database that supports structured data storage for large tables. Its data model closely follows Google’s Bigtable. HBase also integrates with the Hadoop MapReduce framework, so HBase tables can serve as the input or output of MapReduce jobs.

HBase is essentially a structured key-value store in which each row, identified by a row key, can hold a variable number of columns, and a table can hold an arbitrary number of rows. Because rows are stored sorted by row key, you can think of it as a key/value store that also supports ordered scans over a range of keys, optionally narrowed with filters.
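The data model described above can be illustrated with a small in-memory sketch. This is not the HBase client API: the ToyTable class, its put/get/scan methods, and the "family:qualifier" column names are hypothetical and exist only to show rows with different columns and a scan over a sorted range of row keys.

```python
import bisect


class ToyTable:
    """In-memory stand-in for an HBase-style table (hypothetical, not the HBase API)."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> {column name: value}

    def put(self, row_key, column, value):
        """Write one cell, creating the row on first use."""
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        """Random read: return a copy of the row's columns (empty if absent)."""
        return dict(self._rows.get(row_key, {}))

    def scan(self, start, stop):
        """Yield (row_key, columns) for start <= row_key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, dict(self._rows[key])


table = ToyTable()
table.put("user#002", "info:name", "Bob")
table.put("user#001", "info:name", "Alice")
table.put("user#001", "social:likes", "42")   # only this row has this column
table.put("video#001", "info:title", "Intro")

# Range scan over every row whose key starts with "user#".
users = list(table.scan("user#", "user#~"))
```

Keeping the keys sorted is what makes the range scan cheap: the scan locates the start key with a binary search and then reads neighbouring rows in order, which is why row key design matters so much in this kind of store.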

HBase is built on top of HDFS and uses ZooKeeper to coordinate processes across the cluster.

Why HBase?

  • HBase is highly scalable. You can have billions of rows in a single HBase table. It is well suited to point lookups because it supports random reads by row key. This means that you could select a specific patient's record, or look up a specific employee to find their salary information.
  • Each row can hold a variable number of columns, and a table can hold an arbitrary number of rows. This is very useful if you want to identify individual records by a unique ID and store a lot of information about each one, such as comments or likes.
  • You can run scans against subsets of the data stored in HBase, for example over a range of row keys, and narrow the results with filters.
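  • HBase behaves more predictably than many other NoSQL databases because reads and writes to a single row are strongly consistent. It also integrates with the MapReduce framework for batch processing.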
  • HBase is cost-effective since it is built on top of HDFS, which is designed to run on commodity hardware, and it uses ZooKeeper for coordination.
  • HBase also suits data warehousing workloads that need row-level updates, since it allows random reads and writes. This means that you could update the salary column for a specific employee and add a new column to that row on the fly, as long as its column family already exists.
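The random read and write behaviour in the list above can be sketched with plain Python dictionaries. This is a toy model under stated assumptions, not HBase client code: the employees store, the put and get helpers, the row keys, and the "pay:salary" and "pay:bonus" column names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy store: row key -> {"family:qualifier": value}.
employees = defaultdict(dict)


def put(row_key, column, value):
    """Write one cell; the row or column is created on first use."""
    employees[row_key][column] = value


def get(row_key, column=None):
    """Read a whole row, or a single cell when a column is given."""
    row = employees.get(row_key, {})
    return dict(row) if column is None else row.get(column)


put("emp#1001", "pay:salary", 52000)
put("emp#1002", "pay:salary", 61000)

# Random write: update one employee's salary without touching other rows.
put("emp#1001", "pay:salary", 55000)

# Add a column that only this row has; other rows store nothing for it.
put("emp#1001", "pay:bonus", 3000)

print(get("emp#1001"))
print(get("emp#1002", "pay:bonus"))
```

The second row never gains a "pay:bonus" entry, which mirrors how a sparse, column-oriented store only keeps the cells that were actually written.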