Thursday, December 20, 2007

Web Services

Distributed Computing
Pressures to share information and cooperatively share processing lead to the notion of
distributed processing. Traditional distributed processing models assume that there is
a common environment or architecture between cooperating entities. When both parties
try to accomplish a processing task using J2EE or COM+, a common architecture
exists for the invocation of operations or sharing of data. This makes it relatively easy
to connect applications. While a common architecture does not guarantee interoperability,
it makes it easier to achieve.
It isn’t always possible for all the participants in distributed processing activities to
use the same architecture and processing environment. When processing must be
spread across organizations, their architectures, platforms, and development languages
are likely to be different. Complications arising from mismatches in environments can
exist between companies and can even exist between departments or divisions within
the same company. An organization with a large investment in an existing infrastructure
cannot afford to change its architecture and processing capabilities, even if successful
distributed processing depends on it. And, if one organization is willing to make
the change to accommodate another organization, there are probably other groups it
needs to work with that can’t make such an all-encompassing change. As a result, it’s
unlikely that organizations will be able to use a common environment.
Current processing architectures are single domain, but multitiered. That is, the processing
load within a domain is spread among several systems, each handling a well-defined
portion of a transaction. The systems can work sequentially or in parallel. A
common division of responsibility is to have a front-end processor that handles data
presentation and user interaction, a middle tier that is responsible for implementing
business logic, and a back-end system that may be a data repository or a mainframe
that performs batch processing.
A logical extension of multitiered processing is multidomain processing. A processing
domain is a computing facility under the control of a single organization. A domain
may include many computers and utilize different processing architectures. A department
or a division within a company may control a domain, or a domain may be under
the control of a company. Within a large company, there may be an accounting domain
and a purchasing domain. We want the accounting system to know of purchases occurring
in the purchasing system so that the bills can be paid automatically. Between companies,
it may be desirable for a purchasing system to request bids from and send
purchase orders to vendors’ systems.
Multidomain processing is generally very difficult to implement because of the disparate
platforms, environments, and languages in different domains.
One notable attempt at achieving multidomain processing is Electronic Data Interchange
(EDI). EDI is a standard format for exchanging financial or commercial information.
Two EDI standards are in use: Accredited Standards Committee
(ASC) X12 and the International Organization for Standardization’s Electronic Data Interchange
for Administration, Commerce, and Transport (EDIFACT). The latter standard is often
referred to as UN/EDIFACT, since it was originally developed by a United Nations
working party.
With EDI, a company can transmit a purchase order to its vendor. Banks use EDI to
send funds transfer information to financial clearinghouses. Value-added networks are
used to transfer the EDI messages. EDI has existed since about 1980, and it has been
used successfully by many companies.
By dealing with the structure and format of data exchanged, EDI frees each party to
the transaction from the requirement for a uniform computing environment. So long as
the sender can construct the correct message, it does not matter what platform, operating
system, or application created the message. Likewise, on the receiving side, so long
as the receiver can parse the message, identify the elements of interest, and process
them appropriately, the processing environment at the receiver’s end is of no consequence.
The transaction has been processed by two loosely coupled systems located in
two separate domains.
There are several reasons why EDI is not used more widely. EDI messages are rigid.
The data is not self-defining, and it is presented in a prescribed order with a fixed representation.
This rigid structure often needs modification when users discover needs
that cannot be accommodated by the existing fields. However, EDI’s rigidity makes
changes, such as adding new fields, difficult to implement. This leads to a multitude of
vendor- and customer-specific implementations.
Another reason for EDI’s limited acceptance is that specialized software is required,
which can be very expensive. EDI documents are often transferred via specialized
value-added networks, increasing cost and support requirements. Implementing EDI
can be very costly, and a company needs a very compelling reason before choosing to
adopt it.
Distributed Processing across the Web
Extensible Markup Language (XML), which is a platform-independent way to specify
information, is the foundation of Web Services. SOAP, which originally stood for Simple
Object Access Protocol (newer versions of the specification do not use it as an
acronym), builds on XML and supports the exchange of information in a decentralized
and distributed environment. SOAP consists of a set of rules for encoding information
and a way to represent remote procedure calls and responses, allowing true distributed
processing across the Web. XML and SOAP enable platform- and data-independent
interfaces to applications. Because Web Services are usually built on HTTP, they
can be delivered with little change to existing infrastructures, including firewalls.
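As a rough illustration of how SOAP represents a remote procedure call in XML, the sketch below wraps a call in a SOAP 1.1 envelope and then reads it back. The GetPrice operation, its symbol parameter, and the http://www.example.com/quotes namespace are hypothetical names chosen for this example; only the envelope namespace comes from the SOAP 1.1 specification.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace, as defined by the SOAP 1.1 specification.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical application namespace, used only for illustration.
APP_NS = "http://www.example.com/quotes"


def build_request(symbol: str) -> bytes:
    """Wrap a hypothetical GetPrice call in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}GetPrice")
    ET.SubElement(call, f"{{{APP_NS}}}symbol").text = symbol
    return ET.tostring(envelope, encoding="utf-8")


def read_request(message: bytes):
    """Return the operation name and its parameters from a SOAP body."""
    envelope = ET.fromstring(message)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    call = body[0]  # the first child of Body is the procedure call
    operation = call.tag.split("}", 1)[1]  # strip the {namespace} part
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params


message = build_request("WIDG")
operation, params = read_request(message)
print(operation, params)
```

Neither side needs to know what platform produced or will consume the message; both only need to agree on the XML structure.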
UDDI and WSDL also support Web Services. Universal Description, Discovery, and
Integration (UDDI) is a mechanism for discovering where specific Web Services are
provided and who provides them. Web Services Description Language (WSDL) specifies
the interfaces to these Web Services, what data must be provided, and what is
returned. SOAP, UDDI, and WSDL are the underlying technologies upon which Web
Services are based. Using these protocols (shown in Figure 2.1), systems from different
domains, independent environments, or with different architectures can engage in a
cooperative manner to implement business functions. SOAP, UDDI, and WSDL are
built using XML and various Internet protocols such as HTTP.
SOAP, UDDI and WSDL are used in different phases, called publishing, finding, and
binding, in the Web Services development cycle. The Publish, Find, and Bind Model is
shown in Figure 2.2.
The model begins with the publish phase, when an organization decides to offer a
Web Service (1). The Web Service can be an existing application with a new Web Service
front end, or it can be a totally new application. Once an enterprise has developed
the application and made it available as a Web Service, the enterprise describes the
interface to the application so that potential users interested in subscribing to it can
understand how to access it. This description can be oral, in some human language
such as English, or it can be in a form, such as WSDL, that can be understood by Web
Services development tools. To facilitate automated lookups, the service provider
advertises the existence of the service by publishing it in a registry (2). Paper publications
or traditional Web Services can provide this service, or UDDI directories can
advertise the existence of the Web Service.
The next step of the model is the find phase. Once the service is advertised in a
UDDI registry, potential subscribers can search for possible providers (3 and 4) and
implement applications that utilize the service (5). Potential subscribers use the entries
in the registry to learn about the company offering the service, the service being
offered, and the interface to the service.
The final phase of the model is the bind phase. When a subscriber decides to use a
published service, it must implement the service interface, also called binding to the
service, and negotiate with the service provider for the use of the service. The negotiation
can cover mutual responsibilities, fees, and service levels.
When the application has been implemented and the business relationships
resolved, the Web Service is utilized operationally. The only participants at this point
are the service subscriber, who requests the service (6), and the service provider, who
delivers the service (7). WSDL and UDDI registries are generally only used during the
initial discovery of the service and the design of the application.
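The Publish, Find, and Bind Model can be sketched with a toy in-memory registry. This is only a sketch: the Registry class stands in for a UDDI directory, the interface string stands in for a WSDL description, and a local function stands in for the provider's network endpoint. All of the names below are hypothetical.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Callable


@dataclass
class ServiceEntry:
    provider: str
    name: str
    interface: str                       # stand-in for a WSDL description
    endpoint: Callable[[int], Decimal]   # stand-in for a network address


@dataclass
class Registry:
    """Toy stand-in for a UDDI registry."""
    entries: list = field(default_factory=list)

    def publish(self, entry: ServiceEntry) -> None:
        self.entries.append(entry)

    def find(self, name: str) -> list:
        return [e for e in self.entries if e.name == name]


def quote_widgets(quantity: int) -> Decimal:
    """The provider's application behind its Web Service front end."""
    return Decimal("34.23") * quantity


# Publish: the provider describes the service and advertises it.
registry = Registry()
registry.publish(ServiceEntry("Widgets Inc.", "priceQuote",
                              "priceQuote(quantity) -> total", quote_widgets))

# Find: a potential subscriber searches the registry.
candidates = registry.find("priceQuote")

# Bind: the subscriber picks a provider and codes against its interface.
service = candidates[0]

# Operational use: only the subscriber and the provider take part.
total = service.endpoint(5)
print(service.provider, total)
```

Note that the registry plays no part in the last step, which matches the observation that WSDL and UDDI are generally used only during discovery and design.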
Web Services Pros and Cons
Web Services have many advantages that were not enjoyed by earlier attempts at cross-domain
interoperability. Since Web Services are in the early phase of adoption, we cannot
readily point to many actual implementations that prove Web Services live up to
expectations. Nevertheless, Web Services have many characteristics that set them apart
from solutions that came before them and make Web Services more likely to succeed.
The advantages of Web Services are:
■■ Web Services processing is loosely coupled. Earlier attempts to address
cross-domain interoperability often assumed a common application environment
at both ends of a transaction. Web Services allow the subscriber and
provider to adopt the technology that is most suited to their needs to do the
actual processing.
■■ Web Services use XML-based messages. Web Services using XML have a
flexible model for data interchange that is independent of the computing
environment.
■■ Participating in Web Services does not require abandoning existing investments
in software. Existing applications can be used for Web Services by
adding a Web Services front end. This makes possible the gradual adoption of
Web Services.
■■ Software vendors are coming out with tools to support the use of Web Services.
Organizations can use currently available tools from vendors such as IBM,
Microsoft, Sun, and others. There is no delay between interest in the technology
and the availability of tools to implement and use Web Services.
■■ There is a lot of emphasis on the interoperability of Web Services. Web Services
tool developers are working to demonstrate interoperability between implementations.
It’s likely that this will pay off and allow developers to choose
tools from one vendor and be confident that they will be able to interoperate
with other implementations.
■■ The modular way Web Services are being defined allows implementers to pick
and choose what techniques they will adopt. Other than having a basis in XML,
SOAP, UDDI, and WSDL, the building blocks of Web Services have related, but
independent capabilities. They are not tightly coupled and don’t depend on
each other to function.
■■ Use of Internet standard protocols means that most organizations already have
much of the communications software and infrastructure needed to support
Web Services. Few new protocols need to be supported, and existing development
environments and languages can be used.
■■ Web Services can be built and interoperate independently of the underlying
programming language and operating system. In organizations where there
isn’t a single standard, Web Services make interoperability possible, even when
one part of the organization uses .NET, while another portion uses Java, to
build their Web services, and other organizations use other technologies.
Reservations about Web Services fall into two categories. First, Web Services are not
proven technology; some suspect that they are simply the current fad and that, like many
other past solutions to the distributed processing problem, they will not
deliver. While we cannot disprove this, the advantages that Web Services have over
past solutions are significant.
The second reservation about Web Services centers on their reliance on XML. While
there are many advantages to XML, size is not one of them. Use of XML expands the
size of data several times over. The size of a SOAP message translates into more storage
and transmission time. The flexibility of SOAP means that more processing is
needed to format and parse messages. Do the advantages of XML outweigh the additional
storage requirements, transmission time, and processing needed? The answer is
a qualified yes. The flexibility offered by XML is required when trying to connect two
dissimilar processing environments in a useful way. Spanning processing domains
requires a flexible representation. However, once a message is within a single environment,
on either side of the connection, implementers must decide the extent to which
XML is required. XML will not always be the choice to represent data within a single
processing domain.
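To get a feel for the size cost, the sketch below serializes the same five purchase order values twice: once as XML and once as a bare comma-separated record. The delimited form is a hypothetical compact representation chosen for comparison; it is not an EDI format.

```python
import xml.etree.ElementTree as ET

# Field names and values follow the purchase order used in this chapter.
fields = {
    "orderNum": "9876",
    "itemDescription": "widgets",
    "quantity": "5",
    "unitPrice": "34.23",
    "aggregatePrice": "171.15",
}

# Self-describing form: every value is wrapped in a start and end tag.
order = ET.Element("order")
for name, value in fields.items():
    ET.SubElement(order, name).text = value
as_xml = ET.tostring(order)

# Compact form: values only, in a fixed order agreed on in advance.
as_delimited = ",".join(fields.values()).encode("ascii")

ratio = len(as_xml) / len(as_delimited)
print(len(as_xml), len(as_delimited), round(ratio, 1))
```

The compact record is smaller, but it can be interpreted only by a receiver that already knows the field order, which is exactly the rigidity that XML avoids.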
Extensible Markup Language
In order to understand Web Services, the reader must understand XML. Much of what
we’ll be discussing in this chapter, and other chapters in this book, is based on XML.
You’ll see it in many of our examples.
XML is a derivative of the Standard Generalized Markup Language (SGML) (ISO 1986).
SGML is an international standard for defining electronic documents and has existed
as an ISO standard since 1986. SGML is a meta document definition language used for
describing many document types. It specifies ways to describe portions of a document
with identifying tags. Specific document types are defined by a document type definition
(DTD). A DTD may have an associated parser, which is software that processes
that document type.
HTML, an SGML application, has been well accepted on the Web but regarded as
limited because of its fixed set of tags and attributes. What was needed was a way to
define other kinds of Internet documents with their own markups, which led to the
creation of XML. Work on XML began in 1996, under the auspices of the World Wide
Web Consortium (W3C). The XML Working Group, chaired by Jon Bosak of Sun
Microsystems, took on the work. XML was adopted as a W3C Recommendation in 1998
(W3C 2000).
XML is a specialized version of SGML used to describe electronic documents available
over the Internet. Like SGML, XML is a document definition metalanguage. Since
XML is a subset of SGML, XML documents are legal SGML documents. However, not
all SGML documents are legal XML documents.
XML describes the structure of electronic documents by specifying the tags that
identify and delimit portions of documents. Each of these portions is called an element.
Elements can be nested. The top-level element is called the root. Elements enclosed by
the root are its child elements. Each one of these elements can, in turn, have its own
child elements. In addition, XML provides a way to associate name-value pairs, called
attributes, with elements. XML also specifies what constitutes a well-formed document
and processing requirements. XML, like SGML, allows for DTDs. But, DTDs are not
used with SOAP, which will be discussed later in this chapter. Instead, SOAP uses XML
Schemas, so our examples will be based on XML Schemas rather than DTDs.
XML elements begin with a start tag and end with an end tag. Each document type
has a set of legal tags. Start tags consist of a label enclosed by a left angle bracket (<)
and a right angle bracket (>). The corresponding end tag is the same label as in the start
tag prefaced by a slash (/), both enclosed by the left and right angle brackets. For
instance, a price element looks like <price>123.45</price>. Unlike HTML, every start
tag must be matched by a corresponding end tag.
Start tags may also contain name-value pairs called attributes. Attributes are used to
characterize the element between the start and end tags. In our previous example, a
currency attribute could be included in the start tag to designate the currency of the
price: <price currency="USD">123.45</price>. There are several kinds of attributes.
Those most commonly encountered are strings. A specific predefined attribute type
that will be important later in this chapter is ID. An attribute of type ID associates a
unique name with an element of an XML document.
XML defines a small number of syntax restrictions such as requiring an end tag to
follow a start tag. These restrictions enable the use of XML parsers, which must be flexible
enough to work with any XML-specified document. Any document that follows
these restrictions is said to be well formed.
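Because the syntax restrictions are the same for every document type, a generic parser can decide whether any document is well formed. The sketch below uses the parser in Python's standard library; the price element and its currency value of USD are illustrative.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if a generic XML parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


good = '<price currency="USD">123.45</price>'
missing_end_tag = '<price currency="USD">123.45'
mismatched_tags = '<price>123.45</cost>'

for sample in (good, missing_end_tag, mismatched_tags):
    print(is_well_formed(sample), sample)

# A well-formed element exposes its tag, attributes, and content.
price = ET.fromstring(good)
print(price.tag, price.attrib, price.text)
```

Being well formed says nothing about whether the tags are legal for a particular document type; that is the job of a schema.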
The term XML is used in the literature in several ways. The common uses are:
■■ The metalanguage specified in (W3C 2000). In our examples, this will involve
the use of XML Schemas as well.
■■ An XML specification for an application-specific document type.
■■ A specific document created using the application-specific markup language.
To clarify these uses, let’s consider the case of a developer wishing to implement a
purchasing application. This developer wants to describe a purchase order and decides
to use XML, the metalanguage, for this purpose. So, the developer uses XML, the metalanguage,
to define the tags that identify the elements of a purchase order. The developer
defines an order as a sequence of elements. Then, she defines tags for the elements.
These elements are orderNum, itemDescription, quantity, unitPrice, and aggregatePrice.
The developer also defines an attribute called currency, which can be applied to order.
If the attribute is used, the purchase order application will associate the currency of
order with the price elements. The resulting XML specification is shown below:

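<?xml version="1.0"?>
<xs:schema targetNamespace="http://www.widgets.com"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="www.widgets.com"
elementFormDefault="qualified"
attributeFormDefault="unqualified">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="orderNum"/>
        <xs:element name="itemDescription"/>
        <xs:element name="quantity"/>
        <xs:element name="unitPrice"/>
        <xs:element name="aggregatePrice"/>
      </xs:sequence>
      <xs:attribute name="currency"/>
    </xs:complexType>
  </xs:element>
</xs:schema>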

An instance of a purchase order is an order for five widgets, part number 9876, for
$34.23 each. This XML purchase order document is shown below. Note that each name
is now a tag. Values associated with each tag are sandwiched between the start tag and
its corresponding end tag. We also use the attribute to designate prices in dollars.

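<order currency="USD"
xmlns="www.widgets.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="www.widgets.com">
  <orderNum>9876</orderNum>
  <itemDescription>widgets</itemDescription>
  <quantity>5</quantity>
  <unitPrice>34.23</unitPrice>
  <aggregatePrice>171.15</aggregatePrice>
</order>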
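A receiving application might process such a purchase order along the following lines. This is a minimal sketch using Python's standard library. The document literal restates the order described above, the currency value USD is our assumption for "dollars", and the w prefix is an arbitrary local alias for the www.widgets.com namespace.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

ORDER_XML = """<order currency="USD" xmlns="www.widgets.com">
  <orderNum>9876</orderNum>
  <itemDescription>widgets</itemDescription>
  <quantity>5</quantity>
  <unitPrice>34.23</unitPrice>
  <aggregatePrice>171.15</aggregatePrice>
</order>"""

# Map a local prefix to the document's default namespace for lookups.
NS = {"w": "www.widgets.com"}

order = ET.fromstring(ORDER_XML)
currency = order.get("currency")
quantity = int(order.findtext("w:quantity", namespaces=NS))
unit_price = Decimal(order.findtext("w:unitPrice", namespaces=NS))
aggregate = Decimal(order.findtext("w:aggregatePrice", namespaces=NS))

# Decimal avoids binary floating-point error when checking money amounts.
consistent = quantity * unit_price == aggregate
print(currency, quantity, unit_price, aggregate, consistent)
```

The receiver locates each value by its tag name, not by its position, so elements it does not care about can be ignored.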

Supporting Concepts
XML relies on several other concepts to be effective. Two important concepts used
within the XML specification are Uniform Resource Identifiers (URIs) and the XML
namespace. XML Schemas, a separate W3C Recommendation, is used with XML to
provide greater control over data types. In fact, we’ve already been using all three in
our examples.
Uniform Resource Identifiers
URIs identify abstract or physical resources. The resource can be a collection of names
that has been defined by some organization or it can be a computer file that contains
that list. A URI follows the form: <scheme>:<scheme-specific-part>.
The most familiar form of a URI is the Uniform Resource Locator (URL). It usually
specifies how to retrieve a resource. It denotes the protocol used to access the resource
and the location of the resource. The location can be relative or absolute, but it must be
unambiguous. For URLs, the scheme is usually a protocol to access the resource, and
the scheme-specific part is the user’s name when accessing the resource, the password
that allows access, the host of the resource, the port, and the URL path. Not all of the
constituents of the scheme-specific part are required.
In addition to complete resources, URLs can be used to refer to an element of an
XML document. In order to do this, an ID attribute must be used with the element to
associate a unique name with the element. Then, the URL string ends with the ID
string. We modified our purchase order to include an ID attribute.

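<order currency="USD"
ID="ThisPO"
xmlns="www.widgets.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="www.widgets.com">
  <orderNum>9876</orderNum>
  <itemDescription>widgets</itemDescription>
  <quantity>5</quantity>
  <unitPrice>34.23</unitPrice>
  <aggregatePrice>171.15</aggregatePrice>
</order>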

External references to the element must be qualified by the complete URL to the document
followed by # and the ID string. An example of this is:
http://www.mysys.com/ThisOrder.xml#ThisPO. If the element is being referenced from
within the XML document, the URL can be shortened to #ThisPO.
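The way such a reference is resolved can be sketched as follows. The sketch splits the reference into its document URL and its fragment, then searches an already parsed document for the element whose ID attribute matches. Retrieving the document named by the URL is left out, and the resolve function is a hypothetical helper, not part of any standard.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

ORDER_XML = """<order currency="USD" ID="ThisPO" xmlns="www.widgets.com">
  <orderNum>9876</orderNum>
  <quantity>5</quantity>
</order>"""

document = ET.fromstring(ORDER_XML)


def resolve(reference: str, root: ET.Element):
    """Return the element named by the reference's fragment, or None."""
    _url, fragment = urldefrag(reference)  # "#ThisPO" gives an empty URL
    for element in root.iter():
        if element.get("ID") == fragment:
            return element
    return None


external = resolve("http://www.mysys.com/ThisOrder.xml#ThisPO", document)
internal = resolve("#ThisPO", document)
print(external is document, internal is document)
```

Both forms of the reference lead to the same element; the shortened form simply omits the part that names the enclosing document.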
The other form of URI is the Uniform Resource Name (URN). Unlike a URL, the
URN is not location dependent. There are no requirements that a URN be locatable. It
can be purely logical and abstract. It does have to be globally unique and persistent.
Global uniqueness is ensured by registering the URN. For a URN, the scheme is “urn:”,
which is fixed. The scheme-specific part consists of an identifier followed by a “:” and
then a namespace-specific string, which is interpreted according to the rules of the
namespace (this is described in the next chapter). An example of a URN is:
urn:ISBN:0471267163. In this case, ISBN identifies the namespace as an International
Standard Book Number and the number identifies a particular book.
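The structure of a URN can be made concrete with a small parser. This is a simplified sketch that only splits the three parts described above; it does not check the namespace-specific string against the rules of its namespace, and the Urn type and parse_urn function are hypothetical names.

```python
from typing import NamedTuple


class Urn(NamedTuple):
    namespace_id: str         # for example, ISBN
    namespace_specific: str   # interpreted by the rules of the namespace


def parse_urn(text: str) -> Urn:
    """Split a URN into its namespace identifier and specific string."""
    parts = text.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn" or not all(parts[1:]):
        raise ValueError(f"not a URN: {text!r}")
    return Urn(parts[1], parts[2])


book = parse_urn("urn:ISBN:0471267163")
print(book.namespace_id, book.namespace_specific)
```

Nothing in the result says where the book can be found, which is the point: a URN names a resource without locating it.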
Namespaces
As XML-based applications are implemented, a developer may wish to use elements
defined by the service developer. But, XML documents are likely to consist of a combination
of elements and attributes from several different sources, each source working
independently of the others. It should be possible to associate elements and attributes
with specific applications, while eliminating confusion due to duplication of element
or attribute names.
To make it easier to use elements or attributes associated with specific applications
while resolving possible ambiguity over the use of an element or attribute name,
namespaces are used (W3C 2002c). A namespace is a collection of names. An element
or an attribute can be associated with a namespace, thereby identifying it as having the
semantics of the elements or attributes from that namespace. Qualifying a local name
with a namespace eliminates the possibility of misunderstanding what a name denotes
or how its value should be formatted. Qualifying a name is accomplished by declaring
a namespace, then associating the namespace with a local name.
Namespaces are identified by a URI, usually a URL. An example of a namespace
declaration is: <order xmlns:acct="http://www.widgets.com/accounting">. This declaration
allows elements and attributes within the scope of order to identify their membership
within the namespace by prepending acct: to the element or attribute name. The URL
in the declaration does not always resolve to a location that can be reached over the
Internet. It may simply serve to make any names qualified in the namespace unique.
The following example takes our purchase order and illustrates how to qualify
names. Two namespaces are declared. The first is used for elements defined by the purchasing
department, which includes the purchase order number and the item description.
The second declares a namespace defined by the accounting department, which
includes the number of units and the prices. To make this example more meaningful,
we’ve renamed both orderNum and quantity to num. Now, without
some assistance, we wouldn’t be able to differentiate the two elements named num.
This is where namespaces are useful.

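<order currency="USD"
xmlns="www.widgets.com"
xmlns:orderform="http://www.widgets.com/purchasing"
xmlns:acct="http://www.widgets.com/accounting"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="www.widgets.com">
  <orderform:num>9876</orderform:num>
  <orderform:itemDescription>widgets</orderform:itemDescription>
  <acct:num>5</acct:num>
  <acct:unitPrice>34.23</acct:unitPrice>
  <acct:aggregatePrice>171.15</acct:aggregatePrice>
</order>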

In this example, two additional namespaces are declared for use within a purchase
order. The first is designated orderform, and the second is acct. Neither of the URLs that
specify the namespaces has to be reachable via the Internet, nor does either even have to
exist as a file. Their purpose is to uniquely qualify names and attributes as belonging to
the purchasing namespace or the accounting namespace. Later, two child elements
orderform:num and acct:num are specified. Because they are qualified, we know that
9876 is a purchase order number and that 5 is a number of units.
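A namespace-aware parser makes the distinction explicit. In the sketch below, which uses Python's standard library and a shortened copy of the order, the two num elements end up with different fully qualified names, so a lookup for one can never return the other.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """<order xmlns="www.widgets.com"
    xmlns:orderform="http://www.widgets.com/purchasing"
    xmlns:acct="http://www.widgets.com/accounting">
  <orderform:num>9876</orderform:num>
  <acct:num>5</acct:num>
</order>"""

# Prefixes used for lookups; only the namespace URIs have to match.
NS = {
    "orderform": "http://www.widgets.com/purchasing",
    "acct": "http://www.widgets.com/accounting",
}

order = ET.fromstring(ORDER_XML)
order_number = order.findtext("orderform:num", namespaces=NS)
units = order.findtext("acct:num", namespaces=NS)

# The parser stores each name as {namespace URI}local-name.
qualified_names = [child.tag for child in order]
print(order_number, units)
print(qualified_names)
```

The prefixes themselves carry no meaning; a document that bound the same URIs to different prefixes would be processed identically.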
XML Schema
XML Schema (W3C 2001d, W3C 2001e) is a language used with XML specifications to
describe data’s structure, the constraints on content, and data types. It was designed to
provide more control over data than DTDs provide, and, unlike DTDs, it uses XML syntax.
While XML Schema and DTDs are not mutually exclusive, XML Schema is regarded as
an alternative to DTDs for specifying data types. SOAP, which we will discuss later,
explicitly prohibits the use of DTDs.
In many ways, XML Schema makes XML interesting. XML provided two ways to
aggregate elements: sequence and choice. A sequence of elements requires that each
element of the sequence appear once in the order specified. Choice requires that a
single element be present from a list of potential elements. With XML Schema, the
language designer can specify whether an element in a sequence must appear at all,
minOccurs, or whether there is a maximum number of appearances, maxOccurs.
XML Schema datatypes are primitive or derived. A primitive datatype does not
depend on the definition of any other datatype. Many built-in primitive datatypes have
been predefined by XML Schema. They include decimal, boolean, date, and others; the
built-in integer datatype is itself derived from decimal.
Derived datatypes are other datatypes that have been constrained, explicitly listed, or
combined (the actual term used in the specification is “union”). Constrained datatypes
take an existing datatype and restrict the possible values of the datatype. The derived
datatype belowSix consists of integers restricted to values between 0 and 5. The restriction
on the datatype is called a facet. A datatype may consist of a list of acceptable values. A
datatype of U.S. coins contains penny, nickel, dime, and quarter. The union of U.S. coins
with U.S. paper denominations results in all United States currency denominations.
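The three ways of deriving a datatype can be mimicked in a few lines of ordinary code. This is an analogy, not XML Schema itself: is_below_six plays the role of a restriction with facets, US_COINS is a set of explicitly listed values, and US_CURRENCY is their union with paper denominations. The names of the paper denominations are an illustrative subset that we have assumed.

```python
def is_below_six(text: str) -> bool:
    """Restriction: an integer whose value lies between 0 and 5."""
    try:
        value = int(text)
    except ValueError:
        return False
    return 0 <= value <= 5


# Explicitly listed values.
US_COINS = {"penny", "nickel", "dime", "quarter"}

# Illustrative subset of paper denominations (an assumption for this sketch).
US_PAPER = {"one", "five", "ten", "twenty"}

# Union: a value is acceptable if it belongs to either member type.
US_CURRENCY = US_COINS | US_PAPER

for sample in ("3", "6", "three"):
    print(sample, is_below_six(sample))
print("dime" in US_CURRENCY, "twenty" in US_CURRENCY, "euro" in US_CURRENCY)
```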
XML Schema is useful for several reasons. First, the built-in datatypes of XML
Schema support the precise definition of data. With facets, schemas can constrain the
values of XML data. Finally, a definition that is more precise can be achieved with
derived datatypes. Once a schema has been defined, schema processors are able to validate
a document to ensure that the document corresponds to the schema’s structure
and permissible values. This checking can eliminate a source of many of the vulnerabilities
that plague Web-based systems.
We have modified the purchase order example to show some of the features we’ve
just discussed. Up until now, we have conveniently avoided discussing lines 2–4 of the
example. What they do is identify this XML document as an XML Schema document
that defines the namespace http://www.widgets.com. Line 4 also declares the default
scope of the names in the schema to be www.widgets.com. We’ve been using XML
Schema all along. In this example, each of the elements is now associated with an
appropriate data type. In addition, we have specified that the itemDescription element
is optional and does not have to be in the sequence.

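<?xml version="1.0"?>
<xs:schema targetNamespace="http://www.widgets.com"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="www.widgets.com"
elementFormDefault="qualified"
attributeFormDefault="unqualified">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="orderNum" type="xs:string"/>
        <xs:element name="itemDescription" type="xs:string"
        minOccurs="0"/>
        <xs:element name="quantity" type="xs:integer"/>
        <xs:element name="unitPrice" type="xs:decimal"/>
        <xs:element name="aggregatePrice"
        type="xs:decimal"/>
      </xs:sequence>
      <xs:attribute name="currency" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>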

There are many other aspects to XML Schema. A good overview is contained in XML
Schema Part 0: Primer (W3C 2001c). XML schemas are placed in a separate schema document
so that type definitions can be reused in other XML documents. This can lead to
confusion when the term XML schema is used. This confusion is comparable to what
occurs when XML is used. When a separate XML schema document is used, references
to the XML schema instance must be namespace qualified so that the XML schema
processor can determine that a separate schema instance is being referenced. This is
usually done by declaring an XML namespace using an attribute with xmlns: for a prefix.
The location of the schema instance can also be declared, eliminating any possibility of
ambiguity. We’ve been declaring the namespace in our order examples using the
xmlns: attribute.
The advantage of using this schema is that there are schema processors that check
the values of elements to ensure that the values comply with the facets in the schema.
This reduces the possibility of using improperly formed input as a means of compromising
the security of an XML-based system.
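The kind of checking a schema processor performs can be approximated by hand. The sketch below is not a schema processor. It hard-codes a few rules for the purchase order (which elements are required and how each value must parse), ignores namespaces and element order, and assumes an integer quantity and decimal prices. A real deployment would rely on a validating parser rather than code like this.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

# (element name, value parser, required) -- assumed rules for illustration.
RULES = [
    ("orderNum", str, True),
    ("itemDescription", str, False),  # optional, like minOccurs="0"
    ("quantity", int, True),
    ("unitPrice", Decimal, True),
    ("aggregatePrice", Decimal, True),
]


def validate(xml_text: str) -> list:
    """Return a list of problems; an empty list means the order passed."""
    try:
        order = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well formed: {exc}"]
    errors = []
    for name, parse, required in RULES:
        child = order.find(name)
        if child is None:
            if required:
                errors.append(f"missing {name}")
            continue
        try:
            parse(child.text or "")
        except (ValueError, InvalidOperation):
            errors.append(f"bad value for {name}")
    return errors


good = ("<order><orderNum>9876</orderNum><quantity>5</quantity>"
        "<unitPrice>34.23</unitPrice>"
        "<aggregatePrice>171.15</aggregatePrice></order>")
bad = good.replace("<quantity>5</quantity>", "<quantity>five</quantity>")

print(validate(good))
print(validate(bad))
```

Rejecting input before it reaches the business logic is what removes malformed data as an avenue of attack.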
