Sunday, September 06, 2009

Apache Hadoop!

Interesting name, isn't it? The goals it accomplishes are as interesting as its name. Heard of MapReduce, BigTable, Cassandra, parallel processing, high-performance computing, and grid computing? Hadoop touches on most of them.

Let's go over the terminology:

MapReduce is a framework for computing certain kinds of distributable problems using a large number of computers (nodes), collectively referred to as a cluster. Computational processing can occur on data stored either in a filesystem or within a database.
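To make the idea concrete, here is a minimal single-machine sketch of the classic word-count example in plain Python. It is not Hadoop's API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real cluster would run the map and reduce steps on many nodes in parallel.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["hadoop runs map reduce", "map reduce runs on a cluster"]
    print(word_count(docs))
```

Because each document is mapped independently and each key is reduced independently, a framework can hand those pieces to different nodes, which is the property that makes a problem "distributable" in the sense above.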

BigTable is a compressed, high-performance, proprietary data storage system built on Google File System (GFS), Chubby Lock Service, and a few other Google programs. It is currently not distributed or used outside of Google, although Google offers access to it as part of Google App Engine.

Cassandra is an open source distributed database management system. It was initially developed by Facebook for storing very large amounts of data.

Hoping you get the context...

The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. Hadoop includes these subprojects:

  • Hadoop Common: The common utilities that support the other Hadoop subprojects.
  • Avro: A data serialization system that provides dynamic integration with scripting languages.
  • Chukwa: A data collection system for managing large distributed systems.
  • HBase: A scalable, distributed database that supports structured data storage for large tables.
  • HDFS: A distributed file system that provides high throughput access to application data.
  • Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying.
  • MapReduce: A software framework for distributed processing of large data sets on compute clusters.
  • Pig: A high-level data-flow language and execution framework for parallel computation.
  • ZooKeeper: A high-performance coordination service for distributed applications.

I'm excited about the project and want to try developing a reasonable business case for a tech works...

