High-level Architecture

This lesson gives a brief overview of HDFS’s architecture.

HDFS architecture

All files stored in HDFS are broken into multiple fixed-size blocks, each 128 MB by default (configurable on a per-file basis). Each file stored in HDFS consists of two parts: the actual file data and the metadata, i.e., the number of blocks the file has, their locations, the total file size, and so on. An HDFS cluster primarily consists of a NameNode, which manages the file system metadata, and DataNodes, which store the actual data.
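Because the block size is configurable per file, a client can override the 128 MB default when creating a file. Below is a minimal sketch using Hadoop's Java FileSystem API; the path, replication factor, and block size are illustrative assumptions, not values from this lesson.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateWithBlockSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(); // loads Hadoop config from the classpath
    FileSystem fs = FileSystem.get(conf);

    // Create a file with a 256 MB block size instead of the 128 MB default.
    // Overload used: create(path, overwrite, bufferSize, replication, blockSize)
    Path file = new Path("/data/example.log");   // hypothetical path
    long blockSize = 256L * 1024 * 1024;         // 256 MB
    try (FSDataOutputStream out =
             fs.create(file, true, 4096, (short) 3, blockSize)) {
      out.writeUTF("hello HDFS");
    }
  }
}
```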

[Figure: HDFS high-level architecture]
  • All blocks of a file are the same size except the last one.
  • HDFS uses large block sizes because it is designed to store extremely large files, enabling MapReduce jobs to process them efficiently.
  • Each block is identified by a unique 64-bit ID called a BlockID.
  • All read/write operations in HDFS operate at the block level.
  • DataNodes store each block in a separate file on the local file system and provide read/write access.
  • When a DataNode starts up, it scans its local file system and sends the list of data blocks it hosts (called a BlockReport) to the NameNode.
  • The NameNode maintains two on-disk data structures to store the file system's state: an FsImage file and an EditLog. FsImage is a checkpoint of the file system metadata at some point in time, while the EditLog records every metadata transaction since the checkpoint was last created. Together, these two files allow the NameNode to recover from failures.
  • User applications interact with HDFS through its client. The HDFS client contacts the NameNode for metadata, but all data transfers happen directly between the client and the DataNodes (see the sketch after this list).
  • To achieve high availability, HDFS creates multiple copies of each block and distributes them across nodes throughout the cluster.
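To make the metadata/data split concrete, the following sketch (again using Hadoop's Java API, with a hypothetical path) asks the NameNode for a file's block locations and prints the DataNodes that host each block; the bytes themselves would then be streamed directly from those DataNodes.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlockLocations {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus status = fs.getFileStatus(new Path("/data/example.log")); // hypothetical path

    // The client fetches block metadata from the NameNode; reads of the
    // actual data would go directly to the DataNodes listed below.
    BlockLocation[] blocks =
        fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
      System.out.printf("offset=%d length=%d hosts=%s%n",
          block.getOffset(), block.getLength(),
          String.join(",", block.getHosts()));
    }
  }
}
```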
[Figure: HDFS block replication]

Comparison between GFS and HDFS

HDFS's architecture is similar to GFS's, although there are differences in terminology. Here is a comparison of the two file systems:

| | GFS | HDFS |
| --- | --- | --- |
| Storage node | ChunkServer | DataNode |
| File part | Chunk | Block |
| File part size | Default chunk size is 64 MB (adjustable) | Default block size is 128 MB (adjustable) |
| Metadata checkpoint | Checkpoint image | FsImage |
| Write-ahead log | Operation log | EditLog |
| Platform | Linux | Cross-platform |
| Language | Developed in C++ | Developed in Java |
| Available implementation | Used only internally by Google | Open-source |
| Monitoring | Master receives HeartBeats from ChunkServers | NameNode receives HeartBeats from DataNodes |
| Concurrency | Follows the multiple-writers, multiple-readers model | Does not support multiple writers; follows the write-once, read-many model |
| File operations | Both appends and random writes are possible | Only appends are possible |
| Garbage collection | A deleted file is renamed into a special folder, to be garbage collected later | A deleted file is renamed to a hidden name, to be garbage collected later |
| Communication | RPC over TCP is used for communication with the master. To minimize latency, pipelining and streaming are used over TCP for data transfer. | RPC over TCP is used for communication with the NameNode. For data transfer, pipelining and streaming are used over TCP. |
| Cache management | Clients cache metadata. Neither clients nor ChunkServers cache file data; ChunkServers rely on the Linux buffer cache to keep frequently accessed data in memory. | HDFS uses a distributed cache: user-specified paths are cached explicitly in the DataNodes' memory in an off-heap block cache. A cache can be private (belonging to one user) or public (belonging to all users of the same node). |
| Replication strategy | Chunk replicas are spread across racks; the master replicates chunks automatically. By default, three copies of each chunk are stored, and users can specify a different replication factor. The master re-replicates a chunk as soon as the number of available replicas falls below a user-specified threshold. | HDFS has an automatic rack-aware replication system. By default, two copies of each block are stored on two different DataNodes in the same rack, and a third copy is stored on a DataNode in a different rack (for better reliability). Users can specify a different replication factor. |
| File system namespace | Files are organized hierarchically in directories and identified by pathnames. | HDFS supports a traditional hierarchical file organization; users and applications can create directories and store files inside them. HDFS also supports third-party file systems such as Amazon Simple Storage Service (S3) and CloudStore. |
| Database | Bigtable uses GFS as its storage engine. | HBase uses HDFS as its storage engine. |
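As the table notes, the replication factor in HDFS is a per-file setting and can be changed even after a file has been written. A minimal sketch using Hadoop's Java FileSystem API (the path and factor are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChangeReplication {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // Ask the NameNode to keep 2 replicas of this file instead of the
    // default 3; adding or removing replicas happens asynchronously.
    boolean accepted = fs.setReplication(new Path("/data/example.log"), (short) 2);
    System.out.println("replication change accepted: " + accepted);
  }
}
```

The same change can be made from the shell with `hdfs dfs -setrep`.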