Wattle Software - producers of XMLwriter XML editor
Designing Distributed Applications with XML, ASP, IE5, LDAP and MSMQ


Resource Description Framework

The Resource Description Framework (RDF) is certainly the most ambitious of all the metadata efforts from the W3C Metadata Activity. It became a W3C Recommendation on 22 February 1999. RDF is a syntax for describing resources. Resources are defined as anything that can be designated by a URI. RDF does not specify a vocabulary for describing resources. Rather, it provides the means for vocabulary authors to build up descriptions and facts about some topic of interest. It was influenced by the W3C experience with PICS, but it attempts to break out of the narrow model of PICS by providing a generalized model for describing resources.

RDF is a model for defining statements about resources. Each resource possesses one or more properties, each of which has a value. The model provides a means of defining classes of resources and properties. These classes are used to build statements which assert facts about the resource. RDF defines a syntax for writing a schema for a resource. A schema is analogous to a DTD, but is much more expressive. The schema uses the model defined for some vocabulary to express the structure of a document in the vocabulary. The statements in the model place constraints on the statements that can be made in a document conforming to the schema.

RDF Model

The basic RDF model is built from three types of objects:

  • Resources — anything that can be named with a URI
  • Properties — a specific, meaningful attribute of a resource
  • Statements — a combination of a resource, a property of the resource, and the value of the property

Resources

Resources can be almost anything: a document, a collection of documents, a site, even a specific portion of a document. This allows RDF to describe almost anything that can be placed online.

Properties

Properties have well-defined meanings. This means that constraints are placed on a property to define the types of resources to which it can be applied, the range and types of values it can take on, and how it relates to other properties. These constraints are a major reason why RDF is so expressive — the constraints give meaning to the properties, and hence to the resources they describe.

Statements

Statements are triplets consisting of a subject resource, a predicate property, and an object value. Objects can be literal values or resources, making complex statements possible. Consider the natural language statement:

The topic of urn:this-book is designing distributed applications.

The subject resource is urn:this-book. The property is topic, and the object is designing distributed applications.
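To make the triple structure concrete, here is a minimal Python sketch of the statement above. The Statement type and its field names are our own illustration, not part of RDF, and topic is used as a bare property name purely for readability.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One RDF-style statement: subject resource, predicate property, object value."""
    subject: str
    predicate: str
    obj: str  # the object of the statement; a literal or another resource


# The natural language statement from the text, written as a triple.
book_topic = Statement(
    subject="urn:this-book",
    predicate="topic",
    obj="designing distributed applications",
)

# A set of statements acts as a tiny in-memory model that we can query.
model = {book_topic}
topics = [
    s.obj
    for s in model
    if s.subject == "urn:this-book" and s.predicate == "topic"
]
print(topics)
```

Because the object may itself be a resource, a statement's obj could hold another URI, which is how more complex statements are chained together.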

Strictly speaking, properties are a subtype of resources. This is important from a theoretical perspective, but it is simpler for our introductory purposes to think of them as entirely separate entities. Our common sense view of them as separate items will make it easier to conceptualize the RDF model.

One property defined in the basic RDF model is type. This gives RDF a way to assign types to resources. Resources and properties use a class typing mechanism, so a given class may be declared a subclass of another class. The RDF schema namespace provides names for the class of all resources (Resource) and for the subClassOf property that relates a class to its parent. By successively defining new classes of resources and properties, a vocabulary builder can develop RDF statements of arbitrary complexity and meaning.

Constraints are a specialized type of property. They are further refined in the range and domain of the property. Where typing gives us specialized properties, constraints bound a property, thereby giving it definition and meaning.

RDF also defines a variety of containers and collection classes. As we have seen in the previous chapter, it is often necessary to discuss collections of objects. RDF's container classes are much more sophisticated than ours. They define a variety of ordering and containment models.

An examination of RDF container classes is outside the scope of this book. The full W3C RDF Recommendation can be found at http://www.w3.org/TR/REC-rdf-syntax/

RDF Schema

RDF would be of little more than theoretical value if it did not include a format for transmitting data models. The creators of RDF chose to define an XML vocabulary for this task. This vocabulary defines resources and properties in a typed system similar to object oriented languages like C++ and Java.

The terminology of RDF can be overly theoretical in places. A few words on terminology for those of us who are not set theorists are therefore in order. RDF is a model for talking about things. Those things we can discuss, use, or otherwise refer to in an RDF schema are called resources. Both classes and properties are kinds of resources in the RDF model. Each property has a range (the set of values it can take on) and a domain (the class to which the property applies).

Let's illustrate these concepts with a very simple RDF schema. Suppose we wish to talk about our retail customers. For generality, we'd like to say that retail customers are a specialized type of some customer class. This is done with the following lines:

<rdfs:Class rdf:ID="Customer">
	<rdfs:comment>Generic class for describing customers</rdfs:comment>
	<rdfs:subClassOf 
		rdf:resource="http://www.w3.org/TR/WD-rdf-schema#Resource"/>
</rdfs:Class>

<rdfs:Class rdf:ID="RetailCustomer">
	<rdfs:comment>Derived class for describing retail customers</rdfs:comment>
	<rdfs:subClassOf rdf:resource="#Customer"/>
</rdfs:Class>

The rdf and rdfs namespaces are part of the RDF proposal and are declared elsewhere in our schema document. Our class named Customer is a subclass of the RDF-defined class Resource. RetailCustomer, then, is a subclass of Customer. Now let's give our customer a way to pay for his purchases. RetailCustomer should have a property that will take on one of the names of a set of credit cards. That is accomplished with this property definition:

<rdf:Property ID="paymentType">
	<rdfs:range rdf:resource="#CreditCards"/>
	<rdfs:domain rdf:resource="#RetailCustomer"/>
</rdf:Property>

Our property is named paymentType. It takes on a value from the class CreditCards, which we shall define shortly. The property's domain — the class to which it can apply — is the class RetailCustomer. We know that the values for this property will be a limited number of strings naming the major credit card types. First we define a class of literals.

<rdfs:Class rdf:ID="CreditCards"/>

Next we define some literal values of this type:

<CreditCards rdf:ID="MasterCard"/>
<CreditCards rdf:ID="AmericanExpress"/>
<CreditCards rdf:ID="Visa"/>
<CreditCards rdf:ID="OtherCredit"/>

Perhaps we are interested in keeping track of who referred this customer to us. This should be a property whose value is a resource of the type Customer. This allows us to have any sort of customer derived class as a value for this property. That way, we could have referrals from RetailCustomer instances or as-yet undefined WholesaleCustomer instances without having to enumerate these specific derived classes. Similarly, if we derive more classes from Customer, the referrer can participate in these relationships without modifying the range declaration.

<rdf:Property ID="referrer">
	<rdfs:range rdf:resource="#Customer"/>
	<rdfs:domain rdf:resource="#RetailCustomer"/>
</rdf:Property>

Our property is called referrer. It can be applied to the RetailCustomer class, and its value must be a resource of the Customer class. Since we have previously defined that class, no further specification is necessary. Here's the full text of our simple RDF schema:

<rdf:RDF xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
xmlns:rdfs="http://www.w3.org/TR/WD-rdf-schema#">

	<rdfs:Class rdf:ID="Customer">
		<rdfs:comment>Generic class for describing customers</rdfs:comment>
		<rdfs:subClassOf rdf:resource=
					"http://www.w3.org/TR/WD-rdf-schema#Resource"/>
	</rdfs:Class>

	<rdfs:Class rdf:ID="RetailCustomer">
		<rdfs:comment>Derived class for describing retail customers</rdfs:comment>
		<rdfs:subClassOf rdf:resource="#Customer"/>
	</rdfs:Class>

	<rdf:Property ID="paymentType">
		<rdfs:range rdf:resource="#CreditCards"/>
		<rdfs:domain rdf:resource="#RetailCustomer"/>
	</rdf:Property>

	<rdf:Property ID="referrer">
		<rdfs:range rdf:resource="#Customer"/>
		<rdfs:domain rdf:resource="#RetailCustomer"/>
	</rdf:Property>

	<rdfs:Class rdf:ID="CreditCards"/>
	<CreditCards rdf:ID="MasterCard"/>
	<CreditCards rdf:ID="AmericanExpress"/>
	<CreditCards rdf:ID="Visa"/>
	<CreditCards rdf:ID="OtherCredit"/>
</rdf:RDF>
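To see how a program might put the domain and range declarations to work, the following Python sketch reads a trimmed copy of the schema above with the standard library's ElementTree parser. It is only an illustration under our own assumptions: the helper names (local, is_a, allowed) are hypothetical, comments and the credit card instances are omitted for brevity, and no real RDF processor is implied.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/TR/WD-rdf-syntax#"
RDFS = "http://www.w3.org/TR/WD-rdf-schema#"

# A trimmed copy of the schema from the text.
SCHEMA = """<rdf:RDF xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
         xmlns:rdfs="http://www.w3.org/TR/WD-rdf-schema#">
  <rdfs:Class rdf:ID="Customer">
    <rdfs:subClassOf rdf:resource="http://www.w3.org/TR/WD-rdf-schema#Resource"/>
  </rdfs:Class>
  <rdfs:Class rdf:ID="RetailCustomer">
    <rdfs:subClassOf rdf:resource="#Customer"/>
  </rdfs:Class>
  <rdf:Property ID="paymentType">
    <rdfs:range rdf:resource="#CreditCards"/>
    <rdfs:domain rdf:resource="#RetailCustomer"/>
  </rdf:Property>
  <rdf:Property ID="referrer">
    <rdfs:range rdf:resource="#Customer"/>
    <rdfs:domain rdf:resource="#RetailCustomer"/>
  </rdf:Property>
  <rdfs:Class rdf:ID="CreditCards"/>
</rdf:RDF>"""

root = ET.fromstring(SCHEMA)


def local(ref):
    """Reduce '#Customer' or a full URI to the name after the '#'."""
    return ref.rsplit("#", 1)[-1]


# Map each class to its declared superclass (None if it declares none).
superclass = {}
for cls in root.findall(f"{{{RDFS}}}Class"):
    parent = cls.find(f"{{{RDFS}}}subClassOf")
    superclass[cls.get(f"{{{RDF}}}ID")] = (
        local(parent.get(f"{{{RDF}}}resource")) if parent is not None else None
    )

# Map each property to its (domain, range) pair.
properties = {}
for prop in root.findall(f"{{{RDF}}}Property"):
    domain = local(prop.find(f"{{{RDFS}}}domain").get(f"{{{RDF}}}resource"))
    range_ = local(prop.find(f"{{{RDFS}}}range").get(f"{{{RDF}}}resource"))
    properties[prop.get("ID")] = (domain, range_)


def is_a(cls, ancestor):
    """True if cls equals ancestor or reaches it through subClassOf links."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = superclass.get(cls)
    return False


def allowed(prop, subject_class, value_class):
    """Check a statement's subject and value classes against domain and range."""
    domain, range_ = properties[prop]
    return is_a(subject_class, domain) and is_a(value_class, range_)


# A RetailCustomer may refer another RetailCustomer, since it is a Customer.
print(allowed("referrer", "RetailCustomer", "RetailCustomer"))
# A credit card is not a Customer, so it cannot be a referrer.
print(allowed("referrer", "RetailCustomer", "CreditCards"))
```

The subclass walk in is_a is what lets the referrer range stay declared as Customer while still accepting any class later derived from it.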

RDF and Our Philosophy

RDF, quite simply, is far too ambitious for our purposes. Many of its assignments are nothing more than names. A complicated system of mappings between names and resources is needed to discern meaning. More advanced features, e.g., ranges, domains, and container classes, are needed to communicate metadata regarding the topic under discussion. These features, however, are a bit too much for the simple kinds of automated metadata applications we are likely to support in the immediate future. If RDF can be supported, it is a powerful mechanism for communicating intellectual models. Our needs, however, are somewhat simpler.

Indeed, both XML and our development philosophy share the belief that simple features that can be readily implemented are more useful than complex features that can be implemented only with great difficulty. Given some XML vocabulary, we'd like to be able to discover the proper structure for a document that conforms to that vocabulary. This is far simpler to implement. We really need a better way of encoding a DTD. This is what the remaining proposals aim to achieve.

Meta Content Framework Using XML

The Meta Content Framework (MCF) is similar to RDF, although it doesn't seem to have influenced quite so many later efforts as RDF has. Like RDF (and, indeed, most of the metadata proposals), MCF uses a directed graph model of nodes and edges to build conceptual models. Objects are the nodes and property values are the edges. An XML vocabulary is provided for encoding MCF models. Subclassing and inheritance are permitted. Like RDF, a core set of property and object types is used to describe more complicated types, and so forth until the complete metadata model is described.

An interesting property of MCF is that its authors anticipated using MCF to define componentized blocks of metadata. These blocks would then be combined through the XML linking specification to compose complete metadata models. In this way, MCF blocks found to be useful for particular problems could be reused by other vocabulary authors working on related problems.

The following illustration shows a simple MCF schema for this book. The book object is derived from the category (MCF’s term for class) Book, which in turn derives from Document. The book has chapters (i.e., the book is the domain of the Chapter category), which take their values from the category English_Prose. That category is derived from the category text. Note that typeof, domain, and range are properties of their respective objects.

Here’s the XML document that captures the information in the illustration above:

<xml-mcf>
	<Category id="Designing_Distributed_Applications">
		<name>Designing_Distributed_Applications</name>
		<superType unit="Book"/>
		<description>The category whose sole member is this book</description>
	</Category>

	<Category id="Book">
		<name>Book</name>
		<superType unit="Document"/>
		<description>The notion of a bound book</description>
	</Category>

	<!-- The supertype, Page, is a category from MCF itself. -->
	<Category id="Document">
		<name>Document</name>
		<superType unit="Page"/>
		<description>A generalized document</description>
	</Category>

	<Category id="Chapter">
		<name>Chapter</name>
		<superType unit="Page"/>
		<description>The notion of an organized sequence of pages</description>
		<domain unit="Designing_Distributed_Applications"/>
		<range unit="English_Prose"/>
	</Category>

	<Category id="English_Prose">
		<name>English_Prose</name>
		<superType unit="text"/>
		<description>The notion of prose written in English</description>
	</Category>

	<Category id="text">
		<name>text</name>
		<superType unit="Page"/>
		<description>The notion of some organized natural language</description>
	</Category>

</xml-mcf>
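The derivation chain described above can be recovered mechanically from the superType elements. The Python sketch below does so for a trimmed copy of the document, keeping only the three categories on the book's own chain. The ancestry helper is our own hypothetical name, and the sketch assumes each category names exactly one supertype, as in this example.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the MCF document from the text (names and descriptions omitted).
MCF = """<xml-mcf>
  <Category id="Designing_Distributed_Applications"><superType unit="Book"/></Category>
  <Category id="Book"><superType unit="Document"/></Category>
  <Category id="Document"><superType unit="Page"/></Category>
</xml-mcf>"""

root = ET.fromstring(MCF)

# Map each category id to the category it derives from.
super_of = {
    category.get("id"): category.find("superType").get("unit")
    for category in root.findall("Category")
}


def ancestry(category):
    """Follow superType links until we reach a category not defined here."""
    chain = [category]
    while chain[-1] in super_of:
        chain.append(super_of[chain[-1]])
    return chain


print(" -> ".join(ancestry("Designing_Distributed_Applications")))
```

The walk stops at Page because that category comes from MCF itself and is not defined in our document.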

The W3C MCF Note can be found at http://www.w3.org/TR/NOTE-MCF-XML/.

XML Data

XML Data is an ambitious proposal for the definition of schemas. Like RDF, it can express both conceptual and syntactic models. To clarify, a DTD is an example of a syntactic model – it specifies the allowable syntax of some vocabulary, whereas a relational database schema is a conceptual model, as it describes things and the relations between things in the model. XML Data also uses an XML vocabulary as its documentation format. It can express all the information of a conventional XML DTD, but it adds strong typing of elements and attributes. In addition, constraints may be placed on the value and use of an element. XML Data also supports inheritance of types, which allows us to conveniently extend existing definitions. Further aiding authors of schemas is the ability to use a defined element type as a complex structure. Hence, our RetailCustomer from the RDF discussion may be used as a basic type in later schemas.

Unlike a DTD, an XML Data schema allows you to declare a model open. In an open model, the syntactic rules laid down in the schema do not preclude the inclusion of content not covered in the schema. This might be useful in cases when we wish to precisely define some content but are indifferent to other content that might be added to documents. If the model is declared closed, an XML Data schema specifies content in the same formal manner as a DTD. In that case, all content must be explicitly described in the schema to be permitted in a document conforming to the model.

In order to embrace conceptual models such as relational database schemas, XML Data introduces relations, a concept in which an element acts as a reference to another. This is like the notion of primary and foreign keys in a database; an element contained in one item of content establishes a relationship with another item of content. The element in question is a key or index into the other content.

Aliases are also permitted. This allows us to establish subtle concepts. An element can have an alias, or correlative in XML Data's terminology, which establishes the context of a relationship. For example, we might have a STUDIED element with the correlative STUDENT. This establishes that STUDIED is an alias for STUDENT, in the context of the student's relation to the topic she studies.
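The difference between open and closed models can be sketched in a few lines of Python. This is not XML Data syntax or any real validator; the undeclared_children function and the element names are hypothetical, and the sketch checks only element names, ignoring order and cardinality.

```python
def undeclared_children(children, declared, open_model):
    """Return the child element names that violate the content model.

    children   -- names of the child elements found in a document
    declared   -- names the schema explicitly describes
    open_model -- True to tolerate undeclared content, False to reject it
    """
    if open_model:
        # An open model does not preclude content the schema never mentions.
        return []
    # A closed model behaves like a DTD: everything must be declared.
    return [name for name in children if name not in declared]


declared = {"NAME", "ADDRESS"}
found = ["NAME", "ADDRESS", "NICKNAME"]

print(undeclared_children(found, declared, open_model=True))
print(undeclared_children(found, declared, open_model=False))
```

Under the open model the extra NICKNAME element passes unremarked, while the closed model reports it as a violation.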

We will not discuss XML Data and the related proposal that follows, XML Document Content Description, in great depth because a partial implementation is included with the version of MSXML that ships with Internet Explorer 5.0. This partial implementation, intended as a technology preview, is termed XML Schema. We will discuss its implementation at length and develop some prototype code using it later in this chapter.

For further information on XML Data see the W3C Note on their Web site at http://www.w3.org/TR/1998/NOTE-XML-data/

XML Document Content Description

The XML Document Content Description (DCD) proposal is an attempt to extract the subset of XML Data's features that permit the encoding of a DTD in XML. It is thus a simplification of XML Data that addresses a pressing need in a valuable way. Its authors modified the syntax of XML Data so that DCD would be more closely aligned with RDF.

DCD also offers a few features that cannot be expressed in an XML 1.0 DTD. The first, and perhaps most important to the exchange of business data using XML, is the ability to specify the data type of elements and attributes. One criticism of XML is that it expresses all values as text, leaving the native data type in question. DCD identifies a host of native types drawn from common programming languages as well as the core tokenized types defined in XML 1.0.

DCD explored two additional features in appendices to the main submission. The first is the ability to nest element type definitions within other definitions in order to declare an element type with scope local to the containing element type definition. The second, of somewhat broader use, is the inheritance and subclassing mechanism. This borrows a powerful technique from the world of object oriented programming. Element and attribute type definitions can be extensions of simpler type definitions. When a type definition includes the keyword element <Extends Type="some_type_definition"/>, it inherits all the elements and properties previously defined for the class some_type_definition.
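The effect of the Extends element can be pictured as a simple lookup that gathers inherited members before local ones. The Python sketch below uses a dictionary of our own devising in place of real DCD type definitions; the type names, element names, and the all_elements helper are all hypothetical.

```python
# Hypothetical type definitions: each names the type it extends (if any)
# and the elements it declares locally.
TYPES = {
    "Customer": {"extends": None, "elements": ["Name", "Address"]},
    "RetailCustomer": {"extends": "Customer", "elements": ["PaymentType"]},
}


def all_elements(type_name):
    """Collect inherited elements first, then those declared on the type itself."""
    definition = TYPES[type_name]
    parent = definition["extends"]
    inherited = all_elements(parent) if parent else []
    return inherited + definition["elements"]


print(all_elements("RetailCustomer"))
```

A derived type therefore needs to declare only what it adds, which is the convenience the inheritance mechanism is meant to provide.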

For further information on DCDs see the W3C Note on their Web site at http://www.w3.org/TR/NOTE-dcd

Metadata Support in Microsoft Internet Explorer

Internet Explorer 5.0 supports metadata in several ways. First, it uses the current draft of the namespaces specification. Second, it uses namespaces to provide an approach to typing of elements. This is coupled with Microsoft's extensions to the DOM so that a program can retrieve the value of an element in either text (i.e., as it appears in the document) or native binary data format (e.g., int, float). Finally, it offers a technology preview termed XML Schema. This is based on the XML Data proposal, but only supports the feature subset that is also part of the XML DCD proposal. These features may be used to explore the metadata in XML and suggest ways we could use it in our applications.

The various metadata efforts seen in this chapter cover a spectrum from the highly ambitious to the narrowly focused. Each minimally gives us a way to capture the same metadata about a vocabulary that a DTD expresses. Each goes further, however, adding more expressive techniques for describing data. That is what is interesting to us in terms of the third principle of developing cooperative network applications:

3. Services shall be provided as self-describing data.

The more descriptive our data can be, the better. An automated consumer of service data such as an agent may encounter an unfamiliar vocabulary. Unlike a human consumer, the robot needs a great deal of help in exploring the data. When the thicket of metadata efforts is cleared, service programmers will have a very powerful tool for providing that help. Since these efforts use XML for their own syntax, we have the added benefit of being able to reuse MSXML and other XML parsers with which we may be familiar.

Defining Datatypes in XML

There are many occasions when the textual contents of an element represent a typed value other than text in the domain we are describing. This is most obvious in the case of numeric values. The integer 1234 requires two bytes of storage in its native form on a PC. In XML's default character encoding, it consumes four bytes. Worse, before we can use it in calculations, we must perform a conversion from the string to the numeric form.

Beyond the issues of storage and conversion, if we simply use unadorned text the type of data is implicit knowledge. If we use the data type namespace, however, we can make the type explicit. This might be useful to us if we wanted to examine a document in an unknown format. For example, a graphing component might search a document for collections of numeric types. If found, these could be presented to the user for selection of what data to put in a graph.

Use of the data type namespace also allows us to manipulate data in native form. For example, if I have this element:

<VELOCITY dt:type="r8">1.5E5</VELOCITY>

I can retrieve it as either the string 1.5E5 or as an eight-byte floating point numeric value. The DOM extensions to support this consist of two properties of the Node class:

Property         Description
nodeTypedValue   read-write; typed value of the node
dataType         read-write; the type of the node
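The idea behind a typed value can be imitated outside MSXML. The Python sketch below is not the MSXML API; it merely mimics what nodeTypedValue does by reading the dt:type attribute and converting the text accordingly. The namespace URI bound to dt and the small table of converters are assumptions made for illustration, and typed_value is a hypothetical helper. The last line also checks the storage figures quoted earlier for the integer 1234.

```python
import struct
import xml.etree.ElementTree as ET

# Assumed namespace URI for the dt prefix; the text does not state it.
DT = "urn:schemas-microsoft-com:datatypes"

# An illustrative subset of type names mapped to Python converters.
CONVERTERS = {"r8": float, "int": int, "string": str}


def typed_value(element):
    """Return the element's text converted according to its dt:type attribute."""
    type_name = element.get(f"{{{DT}}}type", "string")
    return CONVERTERS[type_name](element.text)


velocity = ET.fromstring(
    f'<VELOCITY xmlns:dt="{DT}" dt:type="r8">1.5E5</VELOCITY>'
)

# The same node read as text and as a native floating point number.
print(repr(velocity.text), repr(typed_value(velocity)))

# Four bytes as UTF-8 text versus two bytes as a native 16-bit integer.
print(len("1234".encode("utf-8")), len(struct.pack("<h", 1234)))
```

An element with no dt:type attribute falls back to plain text here, which mirrors the implicit typing of unadorned XML.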

Twenty-five types frequently encountered in programming languages are supported. Additionally, the XML 1.0 recommendation defines ten enumerated or tokenized types and these are supported as well. The definitive list of types supported is found at http://www.microsoft.com/workshop/xml/schema/reference/datatypes.asp.


©1998 Wrox Press Limited, US and UK.

© Wattle Software 1998-2013. All rights reserved.