WHAT IS A GRID?                                                       12.09.02
Andrew Grimshaw CTO and Founder, Avaki Corporation                   GRIDtoday
==============================================================================

Andrew S. Grimshaw is Professor of Computer Science at the University of
Virginia and founder and CTO of Avaki Corporation.

His research interests include grid computing, high-performance parallel
computing, heterogeneous parallel computing, operating systems, and high-
performance parallel I/O.

He is the chief designer and architect of Mentat, Legion, and Avaki. Grimshaw
received his M.S. and Ph.D. from the University of Illinois at Urbana-
Champaign in 1986 and 1988 respectively.

"For over thirty years science fiction writers have spun yarns featuring
worldwide networks of interconnected computers that behave as a single entity.
Until recently such science fiction fantasies have been just that.
Technological changes are now occurring which may expand computational power
in the same way that the invention of desk top calculators and personal
computers did. In the near future computationally demanding applications will
no longer be executed primarily on supercomputers and single workstations
using local data sources. Instead enterprise-wide systems, and someday
nationwide systems, will be used that consist of workstations, vector
supercomputers, and parallel supercomputers connected by local and wide area
networks. Users will be presented the illusion of a single, very powerful
computer, rather than a collection of disparate machines. The system will
schedule application components on processors, manage data transfer, and
provide communication and synchronization in such a manner as to dramatically
improve application performance. Further, boundaries between computers will be
invisible, as will the location of data and the failure of processors." [1]

Over eight years have passed since I was quoted saying this in Science.

The future is now; after almost a decade of research and development by the
Grid community we see Grids (then called metasystems [1]) being deployed
around the world, both in academic settings and, more tellingly, for
production commercial use.

What exactly is a Grid?

It seems a bit odd to be discussing such a fundamental question at this stage
of the evolution of the Grid industry.

Nonetheless there seems to be a diversity of opinions.

While this may seem like an academic question, it is not.

How we answer the question "what is a Grid?" defines what Grids will become,
by shaping the dialogue, customer requirements and expectations, and what
software vendors (both commercial and government) produce.

First, let me explain that there are different classifications of grids in the
marketplace, e.g., cluster grids, desktop grids, compute grids, and data grids.

My definition is far more general, encompassing all of these.

This is important because, to realize their full potential, grids cannot be
limited to any one of these. To me there are two different perspectives one
can take on a Grid: a hardware perspective and an end-user perspective.

From a hardware perspective a Grid is a collection of distributed resources
connected by a network, possibly at different sites and in different
organizations. Those resources may include terascale supercomputers,
instruments such as telescopes and microscopes, computer-controlled factory
floor tools, mid-level servers, desktop machines, laptops, PDAs, and even,
someday, devices such as video cameras, cell phones, and kitchen appliances.

What distinguishes these resources is that they have a network interface and
some software that grid-enables the device. Thus, one could say that from a
hardware perspective potential Grid resources range from toasters to
teraflops.

One could argue that the above definition of a Grid describes what we used to
call a "distributed system." I do not dispute that. To me Grids are the
evolution of distributed systems to a wide-area, multi-organizational context,
and we can learn a lot from how distributed systems have been built over the
years.

The physical characteristics of the Grid determine to a large extent the
challenges that are faced when writing Grid software and applications.

The resources in a Grid typically share at least some of the following
characteristics:

-- They are numerous.

-- They are owned and managed by different, potentially mutually distrustful
organizations and individuals that likely have different security policies and
practices.

-- The resources are potentially faulty.

-- They have different security requirements and policies.

-- They are heterogeneous, i.e., they have different CPU architectures, are
running different operating systems, and have different amounts of memory and
disk.

-- They are connected by heterogeneous, multilevel networks.

-- They have different resource management policies.

-- They are likely to be geographically separated (on a campus, in an
enterprise, on a continent).

These characteristics conspire to give Grids their most important
characteristic: they have the potential to become very complex, making
complexity management the number one priority of any Grid middleware system.

The reason is simple.

If the complexity of the underlying hardware resources is not managed by the
Grid middleware, then the application developer will be fully exposed to all
the myriad things that can go wrong, vastly increasing the difficulty of
writing robust Grid applications, and thus raising the costs.

From a user perspective, Grids are all about resource sharing and
virtualization across sites and organizations.

A Grid gathers together resources (e.g., CPU, data, and applications) and
makes them accessible in a secure manner to users and applications.

The objective of Grid middleware is to virtualize resources, provide access,
and in general deal with the physical characteristics of the Grid. For
example, in order to reduce complexity, Grid middleware should allow users and
applications to access Grid resources in a transparent manner, i.e., the user
does not need to know where the resource is physically located, the type of
machine it is on, that it may have failed and recovered, etc. The requirement
to virtualize and share a wide range of resources across sites and
organizations can be daunting.
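
As a rough illustration of the transparency described above, the following
Python sketch shows a caller submitting a job without naming a machine, while
a thin layer hides a failed host. It is only a toy under stated assumptions:
the host names, the HostUnavailable exception, and the make_host and
run_transparently helpers are all hypothetical and do not correspond to any
particular Grid middleware.

```python
class HostUnavailable(Exception):
    """Raised by a (hypothetical) host that cannot accept work right now."""


def make_host(name, up=True):
    """Build a stand-in for a remote machine: a callable that runs a job."""
    def run(job):
        if not up:
            raise HostUnavailable(name)
        return job()
    return run


def run_transparently(job, hosts):
    """Run job on the first host that accepts it.

    The caller passes no host name and receives only the result, so a
    failed machine is hidden as long as some other host can do the work.
    """
    failures = []
    for host in hosts:
        try:
            return host(job)
        except HostUnavailable as err:
            failures.append(str(err))
    raise RuntimeError(f"no host available (tried: {', '.join(failures)})")


# Hypothetical pool: the first machine is down, the second is healthy.
pool = [make_host("sparc-01", up=False), make_host("x86-07")]
result = run_transparently(lambda: sum(range(10)), pool)
print(result)  # prints 45
```

The caller never learns that the first machine was unavailable or which
machine finally ran the job; it sees only the answer.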

The first and most important aspect of the problem is how do you name and
access these resources?

This has been a problem in distributed systems for over two decades. The
solution is to develop an integrated, global naming scheme in which all
resources, including applications, hosts (CPUs), storage, files, people, and
security policies, are named in a consistent manner.

Thus, naming is one of the cornerstones of OGSI [3], the Grid standard being
developed in the Global Grid Forum (http://www.gridforum.org).
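
To make the idea of a single, consistent naming scheme concrete, here is a
minimal Python sketch. It is not OGSI and not any vendor's API: the
GlobalNamespace class, the path-style names, and the addresses are all
assumptions invented for illustration. It shows hosts, files, and people
sharing one namespace, and a name staying fixed while the resource behind it
moves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """One named Grid resource; both fields are illustrative only."""
    kind: str      # e.g. "host", "file", "user", "policy"
    address: str   # current physical location, never part of the name


class GlobalNamespace:
    """Hypothetical directory of location-independent names."""

    def __init__(self):
        self._bindings = {}

    def bind(self, name, resource):
        """Bind or rebind a name; rebinding models a resource that moved."""
        if not name.startswith("/"):
            raise ValueError("names must be absolute, e.g. /hosts/node17")
        self._bindings[name] = resource

    def resolve(self, name):
        """Return the resource currently bound to name."""
        try:
            return self._bindings[name]
        except KeyError:
            raise LookupError(f"no resource named {name!r}") from None

    def names_under(self, prefix):
        """List every name below prefix, in sorted order."""
        root = prefix.rstrip("/") + "/"
        return sorted(n for n in self._bindings if n.startswith(root))


ns = GlobalNamespace()
# Different kinds of resource are named the same way (addresses are made up).
ns.bind("/hosts/node17", Resource("host", "siteA:node17"))
ns.bind("/files/genome.dat", Resource("file", "siteA:/vol1/genome.dat"))
ns.bind("/users/alice", Resource("user", "siteB:directory/alice"))

# The file migrates to another site; applications keep using the same name.
ns.bind("/files/genome.dat", Resource("file", "siteB:/vol9/genome.dat"))
print(ns.resolve("/files/genome.dat").address)  # siteB:/vol9/genome.dat
```

Because applications hold only the name, the move from one site to the other
requires no change on their side, which is the property a global naming
scheme is meant to provide.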

Who are the users?

I see four different classes of Grid users:

-- end-users of applications,

-- application developers,

-- system administrators,

-- and managers of organizations that are using the Grid infrastructure to
accomplish some mission such as drug discovery.

Each of these users has a different perspective on what the Grid means to
them. They do share one common theme, though: they want the Grid to help them
get their job done with the minimum amount of hassle and disruption to their
existing routine. The Grid helps them get their job done in many ways. (More
on benefits and ROI next week.)

When used to share and manage CPU resources, Grids help organizations get more
work done (jobs per day, or some such metric) with their existing hardware
infrastructure, or perhaps even with less hardware. This allows organizations
to produce products (results) faster or to produce better products (results).
Grids can also help reduce the amount of time users spend finding the right
resource on which to execute their job, thus saving human time as well.

When used to share data, Grids can vastly simplify the complex task of
managing data on an enterprise-wide scale, reduce the amount of manual
copying, reduce the amount of managed storage required, reduce errors caused
by data inconsistency across sites, and in general take humans out of the data
loop, reducing the time needed to perform tasks and reducing the cost.

When used to share data between organizations, Grids create "virtual
organizations" and facilitate the collaboration that is increasingly a
necessary part of business life.

In summary, Grids have come a long way since the early 1990s. They are moving
beyond the academic supercomputing realm and into commercial use.

While many have defined Grids in terms of high-throughput compute Grids, I
believe that focusing on just computing fails to capture the full potential of
Grid computing.

Grid computing provides the most benefit when all resources within a virtual
organization (computing, data, storage, applications, devices, policies, etc.)
are made transparently and securely accessible to applications and users.

-- [1] A.S. Grimshaw, "Enterprise-Wide Computing," Science, 265: 892-894, Aug
12, 1994.

-- [2] L. Smarr and C.E. Catlett, "Metacomputing," Communications of the
ACM, 35(6): 44-52, June 1992.

-- [3] S. Tuecke, K. Czajkowski, I. Foster, J. Frey, S. Graham, and C.
Kesselman, "Grid Service Specification," Open Grid Services Infrastructure WG,
Global Grid Forum, Draft 3, July 17, 2002.

***************************************************************************
Copyright 2002 GRIDtoday. Redistribution of this article is forbidden by
law without the express written consent of the publisher. For a
subscription to GRIDtoday, send e-mail to gridfree@gridtoday.com