2014 March Magazine - page 4-5

One of the most pressing issues not well addressed in cloud storage today is creating distributed (LAN or WAN) block stores that are suitable for use with virtual machines. There are some unique challenges here, including the I/O patterns of typical virtual machines: heavy random, small-I/O workloads that often also require write-order durability and sync-write mechanics.
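To make the write-order and sync-write requirements concrete, here is a minimal Python sketch, assuming a single writer on a POSIX-style file system. The `OrderedBlockLog` class and its record layout (8-byte block index, 4-byte length, payload) are hypothetical illustrations, not taken from any real block store. Every write is appended in the order it was issued, and a write flagged as synchronous is not acknowledged until `os.fsync` has pushed it, along with everything written before it, to stable storage.

```python
import os
from typing import List, Tuple

HEADER_SIZE = 12  # hypothetical record header: 8-byte block index + 4-byte length


class OrderedBlockLog:
    """Append-only log of block writes, kept in the order they were issued."""

    def __init__(self, path: str) -> None:
        # O_APPEND makes every write land at the current end of the file.
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)

    def write(self, index: int, data: bytes, sync: bool = False) -> None:
        record = index.to_bytes(8, "little") + len(data).to_bytes(4, "little") + data
        view = memoryview(record)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(self.fd, view)
            view = view[written:]
        if sync:
            # Do not return (acknowledge) until this record and every earlier
            # record in the file have reached stable storage.
            os.fsync(self.fd)

    def close(self) -> None:
        os.fsync(self.fd)
        os.close(self.fd)


def replay(path: str) -> List[Tuple[int, bytes]]:
    """Read the log back as (block index, data) pairs in write order."""
    with open(path, "rb") as f:
        raw = f.read()
    records, pos = [], 0
    while pos + HEADER_SIZE <= len(raw):
        index = int.from_bytes(raw[pos:pos + 8], "little")
        length = int.from_bytes(raw[pos + 8:pos + HEADER_SIZE], "little")
        end = pos + HEADER_SIZE + length
        if end > len(raw):
            break  # torn final record (e.g. crash mid-write): stop replaying
        records.append((index, raw[pos + HEADER_SIZE:end]))
        pos = end
    return records
```

Replay stops at a torn final record; a production design would also checksum each record and would have to do all of this across the network, which is where the distributed case gets hard.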

There are various open-market solutions for distributed stores (Hadoop and many others) that are ideally suited for large datasets but do not perform adequately for virtual machine use.

Additionally, VM installations for cloud hosting often demand copy-on-write and zero-copy cloning support, features that are not adequately represented in the market solutions.
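Copy-on-write and zero-copy cloning can be illustrated with a layered disk image, similar in spirit to QCOW2 backing files. The following is a minimal in-memory Python sketch under that assumption; `CowDisk` and the 4 KiB block size are hypothetical. Cloning freezes the parent layer and returns an empty child, so no block data is copied; each clone records only the blocks it later writes, and unwritten blocks are read through to the shared parent.

```python
from typing import Dict, Optional

BLOCK_SIZE = 4096  # hypothetical block size in bytes


class CowDisk:
    """One layer of a virtual disk. Blocks written here are stored locally;
    all other reads fall through to the read-only parent layer."""

    def __init__(self, parent: Optional["CowDisk"] = None) -> None:
        self.parent = parent
        self.blocks: Dict[int, bytes] = {}  # only the blocks written in this layer
        self.frozen = False  # set once the layer becomes a shared base image

    def read(self, index: int) -> bytes:
        layer: Optional[CowDisk] = self
        while layer is not None:
            if index in layer.blocks:
                return layer.blocks[index]
            layer = layer.parent
        return bytes(BLOCK_SIZE)  # never-written blocks read as zeros

    def write(self, index: int, data: bytes) -> None:
        if self.frozen:
            raise PermissionError("base layer is frozen; write to a clone instead")
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes must cover exactly one block")
        self.blocks[index] = data  # the parent layer is never modified

    def clone(self) -> "CowDisk":
        # Zero-copy clone: O(1) time, no block data duplicated. The parent is
        # frozen so later writes cannot leak into clones that share it.
        self.frozen = True
        return CowDisk(parent=self)
```

In use, a provider would populate one base image, call `clone()` once per VM, and let each VM's writes accumulate in its own layer. The trade-off is read latency: a read may walk several layers, which is one reason real systems flatten or cache deep chains.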

Some things to look at are the Google File System (GFS) and projects like DRBD. The real problem to solve, though, is how to put VMs on commodity hardware at low cost such that the failure of an individual node does not impact the VM.
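One common way to keep a node failure from reaching the VM is to replicate every block write synchronously across several commodity nodes and acknowledge it only once enough replicas have stored it, similar in spirit to DRBD's synchronous replication mode. The in-memory Python sketch below assumes that design; `Node`, `ReplicatedVolume`, and `write_quorum` are hypothetical names. It deliberately ignores two things a real system must handle: resynchronizing a node that comes back after missing writes, and rolling back a write that fails to reach quorum.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """In-memory stand-in for one commodity storage server."""
    name: str
    alive: bool = True
    blocks: Dict[int, bytes] = field(default_factory=dict)

    def store(self, index: int, data: bytes) -> bool:
        if not self.alive:
            return False  # a failed node cannot acknowledge the write
        self.blocks[index] = data
        return True


class ReplicatedVolume:
    """Block volume that mirrors every write to all nodes."""

    def __init__(self, nodes: List[Node], write_quorum: int) -> None:
        if not 1 <= write_quorum <= len(nodes):
            raise ValueError("write_quorum must be between 1 and len(nodes)")
        self.nodes = nodes
        self.write_quorum = write_quorum

    def write(self, index: int, data: bytes) -> None:
        # Send the write to every node; succeed only if enough replicas stored it.
        acks = sum(node.store(index, data) for node in self.nodes)
        if acks < self.write_quorum:
            raise OSError(f"only {acks} replica(s) acknowledged; need {self.write_quorum}")

    def read(self, index: int) -> bytes:
        # Any live replica will do, because failed nodes never rejoin in this sketch.
        for node in self.nodes:
            if node.alive and index in node.blocks:
                return node.blocks[index]
        raise OSError(f"no live replica holds block {index}")
```

With three nodes and `write_quorum=2`, the volume keeps serving reads and writes after any single node fails, and refuses writes once two are down rather than silently losing redundancy.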

I believe the biggest areas of cloud computing will be in the big data realm. There are also many interesting problems for anyone looking to create a startup. Those problems are largely about implementation and about how to move existing enterprises to newer models of computing.

On the Big Data side, the areas I find interesting are about how to combine differing data sets in meaningful ways. A key requirement going forward will be finding effective and efficient methods for using existing data for big data activities without impacting the way it is already stored and used.

One of the things the NoSQL movement (and unstructured data in general) has brought is a dependence on the application to supply context and meaning for the data. That context and meaning is not well captured in a way that can be transferred to other potential users of the data. In a relational data model, by contrast, much of the meaning is inherent in the data structure itself.

Another aspect is which types of data can be meaningfully combined and which cannot. I see many examples of statistically unrelated data being used together. It produces a nice dashboard or infographic, but it is ultimately meaningless for real analysis because the data doesn't actually mesh in the ways proposed. It may be possible to develop models that limit this, or that at least indicate which data can show causality and which can show merely correlation.
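As a small illustration of the problem, using only the Python standard library: two synthetic, causally unrelated series that both trend upward over time correlate almost perfectly, yet their period-to-period changes do not. Comparing the correlation of levels with the correlation of first differences is one simple sanity check (not a test of causality) that flags this kind of trend-driven artifact. The series, seed, and coefficients are made up for the example.

```python
import math
import random
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sx * sy)


def first_differences(series: Sequence[float]) -> List[float]:
    """Period-to-period changes; a linear trend becomes a constant offset."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


rng = random.Random(42)  # fixed seed so the example is reproducible
n = 1000
# Synthetic data: independent noise around two unrelated upward trends.
series_a = [0.5 * t + rng.gauss(0, 1) for t in range(n)]
series_b = [3.0 * t + rng.gauss(0, 5) for t in range(n)]

level_r = pearson(series_a, series_b)
change_r = pearson(first_differences(series_a), first_differences(series_b))
print(f"correlation of levels:  {level_r:.3f}")
print(f"correlation of changes: {change_r:.3f}")
```

Because both series are driven by time rather than by each other, the level correlation is above 0.99, while the correlation of changes sits near zero. A dashboard built on the levels alone would suggest a relationship that is not there.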

You mentioned a particular interest in PaaS and IaaS. The platform side is rich ground for exploring the above concepts, as well as for creating more efficient methods and algorithms for combining multiple dispersed data sets and sources. Ifttt.com is a good example of this with popular social sites, but I would like to see the ability to combine any platform with any other more easily.

On the IaaS side, one of the major hurdles I hear about is hypervisor insecurity. There are methods to hack across the hypervisor to gain control of another VM running on the same hypervisor, and current techniques are not adequate to prevent or expose this behavior.

Another area I would like to see developed is truly distributed computing, along the lines of every PC, laptop, server, and device in an organization being able to power part of that organization's total computation. Every processing task could be pushed to the organization as a whole rather than to a single machine, effectively making the entire office one large supercomputer.

There is much to be done in algorithms, computer science, and applications for this to go smoothly.
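The "whole office as a supercomputer" idea is essentially scatter/gather: split a job into chunks, run each chunk on whatever device is available, and combine the partial results. The Python sketch below shows only that shape, assuming a local thread pool stands in for the organization's machines; `split` and `run_on_organization` are hypothetical names. A real system would also need network transport, scheduling, security, and handling of devices that disappear mid-task.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split(items: Sequence[T], parts: int) -> List[Sequence[T]]:
    """Divide a job into at most `parts` roughly equal chunks."""
    size = max(1, -(-len(items) // parts))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_on_organization(
    items: Sequence[T],
    work: Callable[[Sequence[T]], R],
    combine: Callable[[List[R]], R],
    devices: int,
) -> R:
    """Scatter chunks to `devices` workers, then gather and combine results."""
    chunks = split(items, devices)
    # Each worker thread stands in for one PC, laptop, or server.
    with ThreadPoolExecutor(max_workers=devices) as pool:
        partials = list(pool.map(work, chunks))
    return combine(partials)


if __name__ == "__main__":
    # Example job: sum of squares of 1..1000, spread over 4 "devices".
    total = run_on_organization(
        list(range(1, 1001)),
        work=lambda chunk: sum(x * x for x in chunk),
        combine=sum,
        devices=4,
    )
    print(total)  # 333833500
```

The split/combine shape only works when chunks are independent; jobs that need shared state between chunks are exactly where the algorithmic work mentioned above comes in.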

What Are The Hot Topics In Cloud Data Management & Cloud Computing Research?

“The Cloud is an efficiency and scale game changer...”
