
Techie Bytes

Hadooping Around!! :)

Are you in a dilemma, or as confused as I am, trying to understand the Hadoop framework and the evolving tool stacks associated with it? Then try this blog post, which explains the different processing frameworks for Hadoop and what you need to understand when choosing a framework for your use case: http://radar.oreilly.com/2015/02/processing-frameworks-for-hadoop.html

DW Architecture Tech Byte

The ongoing challenge for enterprise analytics/data warehouse applications is budget, and the agility to adapt to evolving sources of data. Here is an excellent article from an IBM architect/consultant explaining the significance of adaptive architecture with respect to data warehouse applications (http://ibmdatamag.com/2015/02/what-is-a-data-warehouse-after-all/).

H Learning - for EDW Professionals

Data is imperative to any business, which is why we are in business today. The question for us now is how we can keep up with the traditional Enterprise Information Management landscape while equipping ourselves personally with the evolving data science and Big Data technology stack. In this competitive industry, our customers often expect us to be on top of the technological evolution and to keep abreast of the latest industry trends. This expectation also encourages organizations to form CoEs, forums etc. that help develop niche skills and capabilities, which indirectly benefit the solutions we provide to existing customers and help win new business/deals.

This expectation drives my interest in learning more about these latest technological trends and their potential impact on traditional DW environments. I have been following them passively, reading blogs, forums etc. to boost my understanding of the latest Big Data/Hadoop trends.

Below you will find my favorite links and blogs from this exploration. I intend to continue on this journey and add to the list as and when I find an interesting piece of wisdom/information.

Hadoop in 5 minutes: An inspiring blog by the MapR CEO explaining the top use cases of Hadoop. (I would like to mention that the first use case is the Aadhaar project: Aadhaar provides a unique identifier for every resident of India, so that is 1.2 billion residents.)

https://www.mapr.com/blog/hadoop-5-minutes-or-less#.U_d3XJRdXrw

Hadoop 101 for EDW Professionals: Ralph Kimball explains how Hadoop can be both a destination data warehouse and an efficient staging and ETL source for an existing data warehouse.

http://www.cloudera.com/content/cloudera/en/resources/library/recordedwebinar/building-a-hadoop-data-warehouse--hadoop-101-for-enterprise-data-slides.html

MapR Sandbox

Following IBM BigInsights Quick Start edition and the HortonWorks Sandbox, the personal Hadoop environments for educating and training customers and developers that I have come across so far, it is now MapR's turn to release its own customized Sandbox Hadoop environment for developers. More details can be found on the MapR blog.

SQL-On-Hadoop

There couldn't be a better time to bump into this blog on the topic of "SQL-On-Hadoop". As you might have noticed, my previous post is in the same genre, comparing the SQL-on-Hadoop solutions provided by a few key vendors in the Big Data/Hadoop space (Big SQL, Hive, Impala etc.). This blog gives details about many more products offering SQL-On-Hadoop capabilities (some of which I hadn't even heard of before, but nice to know!) that were showcased at the Strata Conference this year.

Continuing my journey with Hadoop

Lately, with the explosion and availability of more Big Data tools, most DB vendors have started providing support/connectors dedicated to connecting to Hadoop clusters. Packaged Hadoop distributions from vendors like Cloudera, IBM etc. are also integrating SQL engines with their Hadoop distributions. Here are a few examples I have come across:

1) SQLH connector provided by Teradata - This connector is used to bring data from a Hadoop cluster into traditional Teradata-based data warehouse applications.
2) gNet for Hadoop (supported by Greenplum) - A connector for Hadoop environments which enables data exchange between the DB and Hadoop clusters, provides direct query interoperability between Hadoop nodes and GP DB nodes, and supports conversion of custom format data (Pig, Hive etc.) into GPDB format via MapReduce so that it can be imported into GP.
3) Big Blue's BigSQL - This is a little different from the connectors above, in that it is just a SQL interface for querying the Hadoop cluster. With the support of BigSQL, we can plug any front-end reporting/analytics tool directly into the Hadoop cluster.
4) Cloudera Impala - Cloudera Impala provides fast, interactive SQL queries directly on your Apache Hadoop data stored in HDFS or HBase. In addition to using the same unified storage platform, Impala also uses the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface (Cloudera Impala query UI in Hue) as Apache Hive. This provides a familiar and unified platform for real-time or batch-oriented queries. (If you are confused about the difference between Impala and Hive, please refer to this link: http://vision.cloudera.com/impala-v-hive/)

(The above list is not complete and is still to be explored.)

While reading these updates, blogs, articles etc. about Hadoop and Hadoop-related support for traditional RDBMSs, I came across this how-to blog explaining how to exchange data between a Hadoop distribution and Teradata:
http://hortonworks.com/blog/round-trip-data-enrichment-teradata-hadoop/

I felt exactly the same as the author of this blog on the point below.

"As a data integrator who is familiar with RDBMS systems and is new to the Hadoop platform, I was looking for a simple way (i.e. “SQL-way”) to exchange data with Teradata. Fortunately, it was just a matter identifying the tools and connecting the dots."

Having installed the Hortonworks Sandbox on my personal laptop, I am looking forward to recreating these steps. That might require a Teradata environment, though, which I have yet to set up (I am not sure whether a free version for personal learning is available).

Also, here are some interesting discussion videos comparing the Hadoop landscape provided by IBM's Big-Insights with other vendors.

Teradata compared with Big-Insights:
http://www.ibmbigdatahub.com/video/hadoop-competitive-landscape-teradata-compared-ibm-biginsights
Cloudera compared with Big-Insights:
http://www.ibmbigdatahub.com/video/hadoop-competitive-landscape-cloudera-compared-ibm-biginsights

Let me know if you have any interesting updates/info or experiences to share about these. It seems SQL has got the green card to stay alive in the Hadoop landscape, because of the importance of supporting traditional analytical tools alongside Big Data analytics.

Starting my journey on Big Data and Hadoop

We all might have noticed (at least the EIM members) that Hadoop/Big Data has become the talk of the town. When we come across all this jargon, the first reaction is anxiety and confusion about the terms. I am sure some of you have been intrigued by the same questions as me:

How am I going to use it with my data warehouse?
How is it going to enhance the analytical applications in my enterprise or at my client's site?
How can I break down unstructured data using Hadoop for analytical needs?

The key to these questions is familiarizing ourselves with these Big Data technologies (Hadoop and MapReduce).
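
To make the MapReduce idea a little more concrete, here is a minimal sketch in plain Python of the classic word count, which is one simple way of breaking down unstructured text. It does not use Hadoop at all: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample lines are my own assumptions for illustration, and a real Hadoop job would run the map and reduce steps in parallel across the nodes of a cluster rather than in one process.

```python
from collections import defaultdict


def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the grouped values to get one count per word."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Chain the three steps; hypothetical helper, not a Hadoop API."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Illustrative input only; in Hadoop the lines would come from files in HDFS.
    sample = ["big data and hadoop", "hadoop and mapreduce"]
    print(word_count(sample))
```

The point of the split is that the map and reduce steps only ever look at one line or one key at a time, which is what lets a framework like Hadoop spread the work over many machines.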

There are lots of open source tools and commercial Hadoop distributions offered by different vendors which are available as free downloads online. To name a few: Cloudera, HortonWorks and IBM BigInsights are well-known Hadoop distributions that I have come across.

Recently I downloaded the Sandbox (a personal Hadoop distribution) provided by Hortonworks, installed it on my personal laptop and started exploring.
[Tip: Virtualization is often disabled by default on laptops. If your processor supports virtualization, I would recommend enabling this feature before trying to install a personal Hadoop distribution on your laptop.]

HortonWorks Sandbox - http://hortonworks.com/products/hortonworks-sandbox/
IBM Big Insights Quick Start edition - http://www-01.ibm.com/software/data/infosphere/biginsights/quick-start/

We certainly need a much clearer picture, and may need to explore a lot online, to understand the landscape of all the tools and the capabilities and purpose of each one. I would like to share a couple of nice links I have come across which give a good picture of the Big Data tools.
http://www.bigdatalandscape.com/
http://www.kernel-labs.org/node/2

You can also start your e-learning using the free courses offered on the site bigdatauniversity.com

For those of you who are familiar/well versed with Big Data/Hadoop-related technologies, here is an interesting discussion about Cloudera vs IBM BigInsights - http://www.youtube.com/watch?v=kQVMTmUgCj0

Happy learning! I appreciate your comments, feedback and valuable experiences.