2014 March Magazine - page 4-5

One of the most pressing issues that is not
well addressed in cloud storage today is
creating distributed (LAN or WAN) block
stores that are suitable for use with virtual
machines. There are some unique challenges
here, including the I/O patterns of typical
virtual machines (heavy random small I/O
workloads that also often require write-order
durability and sync-write mechanics).
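
To make "write-order durability" and "sync-write" concrete, here is a minimal sketch of the pattern a guest typically relies on: an intent record is flushed to stable storage before the block itself is overwritten. The function names, the journal layout, and the 4 KiB block size are assumptions made for illustration only; they are not taken from any particular hypervisor or storage product.

```python
import os

BLOCK_SIZE = 4096  # assumed block size for this sketch


def sync_write(fd, offset, data):
    """Write data at offset and return only after fsync reports completion (POSIX)."""
    os.pwrite(fd, data, offset)
    os.fsync(fd)  # the "sync-write": do not acknowledge before the flush


def ordered_update(journal_fd, disk_fd, block_index, data):
    """Hypothetical two-step update that preserves write order."""
    # Step 1: record the intent in the journal and flush it first.
    record = block_index.to_bytes(8, "little") + data
    os.write(journal_fd, record)
    os.fsync(journal_fd)
    # Step 2: only after the journal is durable, overwrite the block in place.
    sync_write(disk_fd, block_index * BLOCK_SIZE, data)
```

A distributed block store has to honor the same ordering across the network, which is part of why small random synchronous writes are so much harder to serve than large sequential ones.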
There are various open-market solutions for
distributed stores (Hadoop's HDFS, many others)
that are well suited to large datasets but do not
perform adequately for virtual machine use.
Additionally, VM installations for cloud hosting
also often demand copy-on-write and zero-
copy cloning support, features which are not
adequately represented in the market solutions.
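
As a rough illustration of what copy-on-write, zero-copy cloning means for VM images, the toy in-memory sketch below lets a clone start with no blocks of its own and read through to its parent until it writes. The `Volume` class and its methods are hypothetical names invented for this example; a real block store would also need persistence, replication, and reference counting.

```python
class Volume:
    """Toy in-memory block volume with copy-on-write clones (illustrative only)."""

    def __init__(self, parent=None):
        self.parent = parent   # volume this one was cloned from, if any
        self.blocks = {}       # only the blocks written to this volume
        self.frozen = False    # a volume with clones must stay unchanged

    def clone(self):
        # Zero-copy: no block data is duplicated at clone time.
        self.frozen = True
        return Volume(parent=self)

    def read(self, index):
        # Walk up the clone chain until some volume holds the block.
        vol = self
        while vol is not None:
            if index in vol.blocks:
                return vol.blocks[index]
            vol = vol.parent
        return None  # block was never written anywhere in the chain

    def write(self, index, data):
        if self.frozen:
            raise ValueError("volume has clones and is read-only")
        # Copy-on-write: the new data lands only in this volume.
        self.blocks[index] = data


base = Volume()
base.write(0, b"os image")
vm1 = base.clone()
vm2 = base.clone()
vm1.write(0, b"vm1 data")  # vm2 and base still see the original block
```

Freezing the parent once it has clones is a simplification chosen here so that clones never observe later changes to the base image.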
Some things to look at are the Google File
System (GFS) and tools like DRBD. The real
problem to solve, though, is how to put VMs on
commodity hardware at low cost, such that the
failure of individual nodes does not impact the VM.
I believe the biggest areas in cloud computing
will be in the big data realm. There are also
many interesting problems if you are looking
to create a startup. Those problems are largely
about implementation and how to move existing
enterprises to newer models of computing.
On the Big Data side, the areas I find interesting
are about how you combine differing data sets in
meaningful ways. A requirement going forward will
be determining effective and efficient methods for
using existing data for big data activities without
impacting the way it is already stored and used.
One of the things that the NoSQL movement
(and unstructured data in general) has brought
is a dependence on the application to supply
context and meaning to that data. That context
and meaning are not well captured in a way that
can be transferred to other potential users of
the data. In a relational data model, you can
find much meaning inherent in the data
structure itself.
Other aspects are about what types of data
can be meaningfully combined and which
cannot. I see many examples of statistically
irrelevant data being used together. This
produces a nice dashboard or infographic, but
it is ultimately meaningless for real analysis
because the data doesn’t actually mesh in the
ways proposed. It may be possible to develop
models to limit this, or at least to provide some
information on which data can show causality
and which can show merely correlation.
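
One simple way to see how unrelated data can appear to "mesh" is to correlate two series that merely trend upward over the same period. The sketch below uses synthetic data (the series names and numbers are invented for illustration, not measurements) and compares the correlation of the raw levels with the correlation of the month-to-month changes, which removes the shared trend.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


random.seed(0)
months = range(36)
# Two synthetic, independently generated series that both grow over time.
signups = [100 + 5 * t + random.gauss(0, 10) for t in months]
disk_tb = [20 + 2 * t + random.gauss(0, 4) for t in months]

# Correlation of the raw levels is dominated by the shared upward trend.
r_levels = pearson(signups, disk_tb)

# Differencing removes the trend and leaves only the independent noise.
d_signups = [b - a for a, b in zip(signups, signups[1:])]
d_disk_tb = [b - a for a, b in zip(disk_tb, disk_tb[1:])]
r_changes = pearson(d_signups, d_disk_tb)

print(f"levels: {r_levels:.2f}, changes: {r_changes:.2f}")
```

A high value for the levels together with a much weaker value for the changes suggests the two series share only a trend. This check does not establish causality; it only flags one common source of misleading dashboards.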
You mentioned a particular interest in PaaS
and IaaS. The platform side is rich for exploring
the above concepts, as well as creating more
efficient methods and algorithms for combining
multiple dispersed data sets and sources.
Ifttt.com is a good example of this with
popular social sites, but I would like to see
the ability to combine any platform with any
other more easily.
On the IaaS side, one of the major hurdles I
hear about is hypervisor insecurity. There
are methods to hack across the hypervisor
to gain control of another VM running on
the same hypervisor. Current techniques
are not adequate to prevent or expose this
behavior. Another area I would like to see
explored is truly distributed computing, along
the lines of every PC, laptop, server, and device
in an organization being able to power part of
the total computation for that organization.
Every processing task could be processed as
one pushed to the organization, effectively
making the entire office a large supercomputer.
There is much to be done with algorithms,
computer science, and applications for this
to go smoothly.
What Are The Hot Topics In Cloud Data Management & Cloud Computing Research?
“The Cloud is an efficiency and scale game changer....”