HDFS is the Hadoop Distributed File System. Its design aims for reliability and high-throughput access to application data. One of its most important design goals is to minimize the cost of handling system faults caused by planned or unplanned outages.
HDFS has a master/slave architecture: the master node runs the NameNode service, which manages the file system namespace and its metadata, and the slave nodes run DataNode services that store and serve the data blocks.
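The split between metadata and block storage is largely invisible to applications, which talk to HDFS through the regular FileSystem API. Below is a minimal sketch of a Java client, assuming the Hadoop client libraries are on the classpath and `fs.defaultFS` points at the NameNode; the path and file contents are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHello {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The client asks the NameNode only for metadata (which blocks make up a
        // file and where they live); the block bytes themselves are streamed
        // directly to and from DataNodes.
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/hello.txt"); // placeholder path
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeUTF("hello, hdfs");
        }
        try (FSDataInputStream in = fs.open(file)) {
            System.out.println(in.readUTF());
        }
        fs.close();
    }
}
```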
Why HDFS?
- HDFS is built to run on low-cost commodity hardware
- HDFS has a simple design, so it is easy for developers to code against and reason about
- HDFS has built-in replication, which makes it very reliable. Each block is replicated (three times by default) across different slave nodes, so if a node fails the data can still be read from the remaining replicas and the NameNode re-replicates the lost blocks
- HDFS provides high throughput, favoring large sequential reads and writes over low-latency random access
- HDFS keeps data available as long as at least one DataNode holding a replica of each block is reachable
- HDFS provides block-level storage: files are split into large blocks (typically 128 MB) that can be spread over different nodes
- HDFS scales performance by adding nodes, spreading both storage and I/O across the cluster
- HDFS is highly configurable (replication factor, block size, and many other settings can be tuned per cluster or even per file) and can be used for a wide range of applications; see the sketch after this list
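As a minimal sketch of that configurability, the snippet below overrides the replication factor and block size from a Java client using the standard `dfs.replication` and `dfs.blocksize` keys; the same keys can be set cluster-wide in hdfs-site.xml. The path and payload are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsTuning {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Per-client overrides of the cluster defaults.
        conf.setInt("dfs.replication", 3);                  // copies kept of each block
        conf.setLong("dfs.blocksize", 128L * 1024 * 1024);  // 128 MB blocks

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/demo/data.bin");        // placeholder path
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write(new byte[1024]);                      // sample payload
        }
        // Replication can also be raised or lowered per file after creation.
        fs.setReplication(file, (short) 2);
        fs.close();
    }
}
```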