GFS

Source - http://computer.howstuffworks.com/internet/basics/google-file-system2.htm

GFS - Google File System.

autonomic computing

, a concept in which computers are able to diagnose problems and solve them in real time without the need for human intervention. The challenge for the GFS team was to not only create an automatic monitoring system, but also to design it so that it could work across a huge network of computers.

The GFS team decided that users would have access to basic file commands. These include commands like open, create, read, write and close files. The team also included a couple of specialized commands: append and snapshot. They created the specialized commands based on Google's needs. Append allows clients to add information to an existing file without overwriting previously written data. Snapshot is a command that creates quick copy of a computer's contents.

Files on the GFS tend to be very large, usually in the multi-gigabyte (GB) range. Accessing and manipulating files that large would take up a lot of the network's bandwidth. Bandwidth is the capacity of a system to move data from one location to another. The GFS addresses this problem by breaking files up into chunks of 64 megabytes (MB) each. Every chunk receives a unique 64-bit identification number called a chunk handle. While the GFS can process smaller files, its developers didn't optimize the system for those kinds of tasks.

By requiring all the file chunks to be the same size, the GFS simplifies resource application. It's easy to see which computers in the system are near capacity and which are underused. It's also easy to port chunks from one resource to another to balance the workload across the system.

Google organized the GFS into clusters of computers. A cluster is simply a network of computers. Each cluster might contain hundreds or even thousands of machines. Within GFS clusters there are three kinds of entities: clients, master servers and chunkservers.

In the world of GFS, the term "client" refers to any entity that makes a file request. Requests can range from retrieving and manipulating existing files to creating new files on the system. Clients can be other computers or computer applications. You can think of clients as the customers of the GFS.

____________________________________________________

not good for files that need to be changed many times.

they care about throughput and not latency

multi gb files ar etypical

files are write once and append many times

results matching ""

    No results matching ""