Nowadays we generate huge amounts of data (terabytes to petabytes), and storing and processing data at this scale is a serious problem. This huge amount of data is called BIG DATA, and it is a growing challenge that organizations face today.
To address this problem, Google released white papers on GFS (the Google File System) and MapReduce in the early 2000s. Based on those papers, Doug Cutting developed a new framework called HADOOP. It got its name from a yellow toy elephant named Hadoop that Doug Cutting's son used to play with. It was officially released to the market on 15th Feb, 2011.
Hadoop is an Apache open-source software framework. It includes a number of components that were specifically designed to solve large-scale distributed data storage, analysis, and retrieval tasks. The following are the main components/ecosystems of Hadoop:
- HDFS
- MapReduce
- Apache Pig
- Hive
- Sqoop
- HBase, etc.
Among all the above components, HDFS (Hadoop Distributed File System) and MapReduce are the two we must understand first. HDFS deals with storage, whereas MapReduce deals with processing. Since Hadoop is designed for both storage and processing, these two play the most important roles. A minimal example of how they fit together is sketched below.
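To make the storage/processing split concrete, here is a minimal sketch of the classic word-count job using the standard org.apache.hadoop.mapreduce API. The job reads text files from an HDFS input path, the mapper emits a (word, 1) pair for every token, and the reducer sums the counts for each word. The input and output paths are passed as command-line arguments and are purely illustrative.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: for each line of input, emit (word, 1) for every token.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sum all the 1s emitted for the same word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: wires the mapper and reducer together and points the job
  // at HDFS input/output paths taken from the command line.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output dir
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Assuming the class is packaged into a jar, a run would look something like `hadoop jar wordcount.jar WordCount /user/demo/input /user/demo/output` (the jar name and HDFS paths are hypothetical). HDFS stores the input and output blocks across the cluster, while MapReduce moves the computation to where those blocks live.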
Big Data is commonly characterized by three V's, all of which Hadoop is designed to handle:
Volume: The volume of data that Hadoop can store and process is very high.
Variety: Hadoop can process different varieties of data (structured, semi-structured, and unstructured).
Velocity: Hadoop can process huge amounts of data with high velocity, i.e., very quickly.
Using Hadoop we can process three types of data:
- Structured Data (e.g., RDBMS tables)
- Semi-structured Data (e.g., Excel sheets)
- Unstructured Data (e.g., plain text files)