Continuing my journey with Hadoop

Published by Soma

February 20th, 2014

Lately, with the explosion and availability of more Big Data tools, most DB vendors have started providing connectors dedicated to connecting to Hadoop clusters. Packaged Hadoop distributions from vendors like Cloudera and IBM are also integrating SQL engines into their distributions. Here are a few examples I have come across:

1) SQLH connector provided by Teradata: This connector is used to bring data from a Hadoop cluster into traditional Teradata-based data warehouse applications.

2) gNet for Hadoop (supported by Greenplum): A connector for Hadoop environments that enables data exchange between the DB and Hadoop clusters. It provides direct query interoperability between Hadoop nodes and GP DB nodes, and supports conversion of custom-format data (Pig, Hive etc.) into GPDB format via MapReduce, which can then be imported into GP.

3) Big Blue's BigSQL: This is a little different from the above connectors, as it is just a SQL interface for querying the Hadoop cluster. With the support of BigSQL, we can plug any front-end reporting/analytics tool into the Hadoop cluster directly.

4) Cloudera Impala: Cloudera Impala provides fast, interactive SQL queries directly on your Apache Hadoop data stored in HDFS or HBase. In addition to using the same unified storage platform, Impala also uses the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface (Cloudera Impala query UI in Hue) as Apache Hive. This provides a familiar and unified platform for real-time or batch-oriented queries. (If you are confused about the difference between Impala and Hive, please refer to this link: http://vision.cloudera.com/impala-v-hive/)

(The above list is not complete and is still to be explored.)

While reading updates, blogs and articles online about Hadoop and Hadoop-related support for traditional RDBMS, I came across this how-to blog explaining the details of exchanging data between a Hadoop distribution and Teradata:

http://hortonworks.com/blog/round-trip-data-enrichment-teradata-hadoop/

I felt exactly the same as the author of that blog on the point below:

"As a data integrator who is familiar with RDBMS systems and is new to the Hadoop platform, I was looking for a simple way (i.e. “SQL-way”) to exchange data with Teradata. Fortunately, it was just a matter identifying the tools and connecting the dots."

Having installed the Hortonworks Sandbox on my personal laptop, I am looking forward to recreating these steps. But that might require a Teradata environment, which I have yet to set up (I am not sure whether a free version for personal learning is available).

Also, here are some interesting discussion videos comparing IBM's Big-Insights with other vendors in the Hadoop landscape.

Teradata compared with Big-Insights:
http://www.ibmbigdatahub.com/video/hadoop-competitive-landscape-teradata-compared-ibm-biginsights

Cloudera compared with Big-Insights:
http://www.ibmbigdatahub.com/video/hadoop-competitive-landscape-cloudera-compared-ibm-biginsights

Let me know if you have any interesting updates, info or experiences to share about these. It seems SQL has got its green card to stay alive in the Hadoop landscape, because of the importance of supporting traditional analytical tools alongside Big Data analytics.

Hadoop, Teradata, Sandbox

My Random notes n thoughts
