This problem has been compounded by the emergence of web 2.0 technologies, whose loyal fans
can number in the millions and generate copious amounts of data every minute. Before you
realize it, you have accumulated gigabytes or even terabytes of data in a single day. Clearly,
this calls for radical departures from the current state of the art in data storage and mining
technologies.
While traditional IT houses not of the web 2.0 stripe may not face this sort of real estate issue when
it comes to data storage, mining that data for meaningful intelligence is still a work in progress and
a major headache no matter what the size of your data warehouse. So while you may not want to be
on the bleeding edge and opt for a grid-based MPP (massively parallel processing) solution for your
ever-increasing storage needs, you will certainly want to take a serious look at the emerging
algorithm- and heuristics-driven data mining techniques led by Map/Reduce.
Map/Reduce may yet be the killer app that relieves many of your Business Intelligence ailments.
This is serious stuff. If Google has bet the house on it and made it a foundation of its search
technology, then you had better believe that it is strong medicine.
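For readers new to the idea, the following is a minimal sketch of the Map/Reduce pattern in plain Python. It is not Google's or Hadoop's actual API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample records are assumptions made up for illustration. It counts words by emitting key/value pairs, grouping them by key, and then collapsing each group.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values collected for one key into a single result."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce in sequence on a single machine."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input; in practice each record might be one log line.
    print(word_count(["big data is big", "data mining is hard"]))
```

Because each call to `map_phase` looks at one record only, and each call to `reduce_phase` looks at one key only, a real framework can spread both phases across many machines; this sketch simply runs them in one process.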
The difficulty of using traditional relational database technology to meet Big Data data warehousing
(DW) needs is now quite well known. Performing operations between databases is not easy, especially
if they span networks. Try performing a join between two database instances and you will know what
I am talking about. To solve these issues, there are custom solutions from vendors like Teradata and
Netezza. The barrier to entry for adopting these systems is still quite high, however, both in terms of
license fees and in setup and maintenance costs.
There is an alternative. We are now in the era of framework-based DW, DIY DW and
DW in the Cloud. The current set of tools and technologies has helped democratize
this domain, which was for long the exclusive preserve of a few select vendors. The
revolution was led by grid-based implementations adopted by leading players like
Google (Bigtable), Facebook (Cassandra) and Yahoo (Hadoop).
Hadoop has emerged as one of the most popular Map/Reduce-based open source frameworks for Big
Data, and several information-industry majors have adopted it. Be aware that it is a framework
and may need significant amounts of customization and programming to get it to do what you
want. If Hadoop is not your cup of tea, then there are similar implementations like AsterData and
GreenPlum, which work on the same concepts but can get you up and running very quickly with
their own abstraction libraries, such as SQL-MR, and intelligent dashboards for easy configuration
and maintenance. Another very appealing feature of these offerings is their ability to be hosted in
a Cloud, so all your advanced analytic needs can be met off premises.
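To give a sense of the programming involved, here is a hedged sketch of how a join between two datasets is commonly expressed in Map/Reduce style, a pattern often called a reduce-side join. It is plain Python that simulates the shuffle step in memory, not Hadoop's API; the table layouts, the tags and the function names are all assumptions chosen for illustration.

```python
from collections import defaultdict


def map_customers(row):
    """Tag each (customer_id, name) row and key it by customer_id."""
    customer_id, name = row
    yield customer_id, ("customer", name)


def map_orders(row):
    """Tag each (order_id, customer_id, amount) row and key it by customer_id."""
    order_id, customer_id, amount = row
    yield customer_id, ("order", (order_id, amount))


def shuffle(pairs):
    """Group tagged values by join key, standing in for the framework's shuffle."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(customer_id, tagged_values):
    """Pair every customer record with every order that shares the same key."""
    names = [value for tag, value in tagged_values if tag == "customer"]
    orders = [value for tag, value in tagged_values if tag == "order"]
    for name in names:
        for order_id, amount in orders:
            yield customer_id, name, order_id, amount


def join(customers, orders):
    """Inner-join the two hypothetical tables on customer_id."""
    pairs = []
    for row in customers:
        pairs.extend(map_customers(row))
    for row in orders:
        pairs.extend(map_orders(row))
    joined = []
    for key, values in shuffle(pairs).items():
        joined.extend(reduce_join(key, values))
    return sorted(joined)


if __name__ == "__main__":
    customers = [(1, "Ada"), (2, "Bob")]
    orders = [(100, 1, 25.0), (101, 1, 10.0), (102, 3, 5.0)]
    for row in join(customers, orders):
        print(row)
```

The point of the design is that rows from both sources with the same key end up in the same reduce call, so no single database instance has to reach across the network into another one. Keys that appear on only one side produce no output here, which makes this an inner join.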
Speaking in a broad sense, there are four general flavors to choose from
when it comes to Big Data solutions:
Custom-built Big Data frameworks like Teradata and VLDB implementations from Oracle, which are
proprietary frameworks designed to deal with large datasets. These frameworks are still very
relational in orientation and are not designed to work with unstructured data sets.
Data Warehouse appliances like Oracle’s Exadata. These introduce the concept of DW-in-a-box, where
the entire stack needed for a typical DW implementation (the hardware, the software framework
for the data store, and the advanced analytical tools) is vertically integrated and provided by
the same vendor as a packaged solution.
Open Source NoSQL-oriented Big Data Frameworks such as Hadoop and Cassandra. These
frameworks support distributed processing models such as Map/Reduce for advanced analytics and
mining, and are designed to be installed on commodity hardware for an MPP architecture with huge
Master/Slave clusters. They are very good at dealing with vast amounts of unstructured,
text-oriented information.
Commercial Big Data Frameworks like AsterData and GreenPlum, which follow the same paradigm
of MPP infrastructures but have implemented their own add-ons such as SQL-MR and other
optimizations for faster analytics.
Article Source: http://EzineArticles.com/5894361
Roopal is an Online Marketing Professional from an IT Services Company, and writes blogs, web content, and articles.
She writes marketing collateral and invites readers to visit her web page with any concerns regarding Business Intelligence (BI) services.