A Quick Introduction to XML Schemas

XML (“Extensible Markup Language”) is rapidly establishing itself as a useful tool for data exchange because it has the potential to become a universal format for structuring information.

To use XML effectively in a community such as the Internet, there must be some constraints on the valid tags and tag sequences so that the exchanged data makes sense to someone other than its creator. DTDs (Document Type Definitions), which are still commonly used, fulfill this need. DTDs, however, have several disadvantages, such as:

  • The creation of DTDs requires the use and knowledge of a completely different syntax from XML.
  • Very limited ability to specify custom datatypes
  • No support for commonly used database datatypes (such as dates): DTDs support only 10 types

The answer to these problems is XML Schemas. Schemas overcome the shortcomings of DTDs and still give users the power they need. Here are some advantages of schemas:

  • Schemas use the same syntax as XML, so there’s less to learn.
  • Custom datatypes can be specified.
  • More predefined types (over 40!).
  • Attribute grouping.

In this tutorial, we will cover the basics of writing a schema to validate XML documents.

A Simple Schema

First, we are going to use the following XML as an example data file that is intended to follow the schema that we discuss next.
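A minimal sketch of such a data file might look like the one below. The element names come from the discussion that follows; the sample values, the nesting of name and ID inside each park, and the schema file name schema.xml are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical data file: the values and the file name schema.xml are placeholders -->
<parks xmlns="x-schema:schema.xml">
  <zoo>
    <name>City Zoo</name>
    <ID>1</ID>
  </zoo>
  <theme>
    <name>Movie World</name>
    <ID>2</ID>
  </theme>
  <amusement>
    <name>Thrill Pier</name>
    <ID>3</ID>
  </amusement>
</parks>
```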

While this may not seem like a very useful example, we will flesh it out in a bit. As we can see, we have an XML document containing a listing of parks of different varieties. You may be unfamiliar with this line:
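With the Microsoft parser, a schema reference of this kind is typically written as a namespace declaration on the root element using the x-schema: prefix. The file name schema.xml below is a placeholder, not a name taken from the original listing.

```xml
<!-- The x-schema: prefix points the parser at the schema file; schema.xml is a placeholder -->
<parks xmlns="x-schema:schema.xml">
  <!-- park entries go here -->
</parks>
```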

This line simply tells the validating parser where to look for our schema. In this case, it is telling the parser to retrieve the schema from the same location as the XML file. This may be a local file system directory that holds all your XML stuff or a place on the Internet.

Moving forward, our schema would look something like this:
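A sketch of such a schema, assuming the Microsoft XML-Data namespaces and the element names used in this tutorial, is shown below. Here the leaf elements are declared first, so the order of declarations may differ from the listing that the text walks through.

```xml
<?xml version="1.0"?>
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">

  <!-- Leaf elements that hold the actual data -->
  <ElementType name="name" content="textOnly"/>
  <ElementType name="ID" content="textOnly"/>

  <!-- Each kind of park contains a name and an ID (assumed structure) -->
  <ElementType name="zoo" content="eltOnly">
    <element type="name"/>
    <element type="ID"/>
  </ElementType>
  <ElementType name="theme" content="eltOnly">
    <element type="name"/>
    <element type="ID"/>
  </ElementType>
  <ElementType name="amusement" content="eltOnly">
    <element type="name"/>
    <element type="ID"/>
  </ElementType>

  <!-- The root element may contain only other elements -->
  <ElementType name="parks" content="eltOnly">
    <element type="zoo"/>
    <element type="theme"/>
    <element type="amusement"/>
  </ElementType>
</Schema>
```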

In this schema, we declare each of our tags and how they relate to each other. Let’s dissect our schema line by line. First:
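Assuming the Microsoft XML-Data namespaces, the opening of the schema would look roughly like this (the closing tag and the placeholder comment are included only so the fragment is well-formed):

```xml
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- element and attribute declarations go here -->
</Schema>
```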

This line starts off the schema with the schema element, declaring the namespace and the datatype namespace (which we will explain later). A namespace is a set of names that are specified as legal elements or attributes within an XML document. Namespaces are declared using a URI (Uniform Resource Identifier), which takes the form of either a URL (Uniform Resource Locator) or a URN (Uniform Resource Name). It doesn’t matter whether you use a URL or a URN, since both are intended to be unique across the Internet.

In this tutorial, we have chosen to use the Microsoft namespace because it is a little easier to understand than the W3C standard. For more information on the W3C namespace, see the W3C XML Schema specification.

Our next line:

declares a type, conditions, and child elements of our element ‘parks’. This line states that we will have an element with the name ‘parks’ that will only contain other elements and no other values. The next three lines:

actually define what elements can legally be contained within the element ‘parks’. Note that the notation <element type='zoo' /> is shorthand for <element type='zoo'></element>, which would be an empty element in XML terms. This saves time in coding a well-formed document.
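Put together, the declaration of ‘parks’ and its three child references could be sketched as follows. The fragment mixes the short and long forms of the empty element purely to show that they are equivalent; the declarations of the referenced types are omitted here.

```xml
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
  <ElementType name="parks" content="eltOnly">
    <!-- Short form: an empty element closed in a single tag -->
    <element type="zoo"/>
    <element type="theme"/>
    <!-- Long form: the same kind of empty element with an explicit closing tag -->
    <element type="amusement"></element>
  </ElementType>
</Schema>
```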

We continue doing this for each element until we get to some element that actually contains data. For example:

declares the name element and the ID element. They both contain only text, so these are the tags that will actually contain the data in our document.
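As a sketch, text-only declarations of this kind would look like the following in the Microsoft schema syntax:

```xml
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
  <!-- textOnly: the element holds character data and no child elements -->
  <ElementType name="name" content="textOnly"/>
  <ElementType name="ID" content="textOnly"/>
</Schema>
```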

Something More Interesting

Let’s spice up our XML now to make a more interesting schema. Let’s remove the ‘theme’ and ‘amusement’ elements for brevity. Here is the XML we’ll work with:
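A data file along these lines might look like the sketch below. The child element names follow the description in this section; the element name animal, the sample values, and the schema file name schema.xml are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical data file: names, values, and schema.xml are placeholders -->
<parks xmlns="x-schema:schema.xml">
  <zoo>
    <name>City Zoo</name>
    <ID>1</ID>
    <animal species="tiger">
      <name>Raja</name>
      <gender>male</gender>
      <date_of_birth>1998-04-12</date_of_birth>
      <aggression_rating>7.5</aggression_rating>
    </animal>
  </zoo>
</parks>
```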

Assuming everyone is very interested in zoos and their animals, let’s create a schema to describe our modified example.
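One way such a schema could be written is sketched below, assuming the Microsoft XML-Data namespaces. The element name animal is an assumption, and the datatype names are the ones used in this tutorial; if a parser rejects decimal, the XML-Data type number is a close alternative.

```xml
<?xml version="1.0"?>
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">

  <!-- The species attribute must be present wherever it is used -->
  <AttributeType name="species" dt:type="string" required="yes"/>

  <!-- Leaf elements; two of them carry a datatype -->
  <ElementType name="name" content="textOnly"/>
  <ElementType name="ID" content="textOnly"/>
  <ElementType name="gender" content="textOnly"/>
  <ElementType name="date_of_birth" content="textOnly" dt:type="date"/>
  <!-- Type name as given in the text; number is an alternative -->
  <ElementType name="aggression_rating" content="textOnly" dt:type="decimal"/>

  <!-- An animal carries the species attribute plus its child elements -->
  <ElementType name="animal" content="mixed">
    <attribute type="species"/>
    <element type="name"/>
    <element type="gender"/>
    <element type="date_of_birth"/>
    <element type="aggression_rating"/>
  </ElementType>

  <ElementType name="zoo" content="eltOnly">
    <element type="name"/>
    <element type="ID"/>
    <element type="animal"/>
  </ElementType>

  <ElementType name="parks" content="eltOnly">
    <element type="zoo"/>
  </ElementType>
</Schema>
```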

It may look long, but it really isn’t all that different from our previous example. Our schema now describes valid animals for our zoo, outlining their name, gender, date of birth, and “aggression rating”. By adding the animal element, we have introduced a number of interesting changes.

You can see the differences from our previous example in bold. First, we need to declare the attribute ‘species’. It is an attribute of an element, so we use the AttributeType type definition to describe it as an attribute. We want this to be required in every element in which it is included because we may have several different ‘tigers’ but we want them to be distinguishable by more than just their elements.

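We have also changed the content type to ‘mixed’, indicating that the element is no longer restricted to child elements alone. In this case, our element carries an attribute as well as child elements. (Strictly speaking, ‘mixed’ permits text alongside child elements; attributes can be declared on an element of any content type.) We also need to add the attribute to the ‘lion’ definition, which is the final bold item.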
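The attribute-related pieces can be sketched in isolation as follows. The element name animal is an assumption; the fragment declares the attribute once and then references it from the element that uses it.

```xml
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- Declare the attribute and mark it as required -->
  <AttributeType name="species" dt:type="string" required="yes"/>

  <ElementType name="name" content="textOnly"/>

  <!-- Reference the attribute from the element that carries it -->
  <ElementType name="animal" content="mixed">
    <attribute type="species"/>
    <element type="name"/>
  </ElementType>
</Schema>
```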

In addition to adding attributes, datatypes are now used actively. For example:

In the schema element declaration, we assign the dt prefix to the datatype namespace we desire. This is used later on, as seen in the ‘date_of_birth’ and ‘aggression_rating’ elements, where we define those as being of type ‘date’ and ‘decimal’ respectively. This is useful because it allows us to impose restrictions on the type of data that can be contained in those elements. After all, it does not make much sense to have “Elephant” as the date of birth or “&” for an aggression rating.
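As a sketch, the prefix declaration and the two typed elements would fit together like this. The type names are the ones given above; if a parser does not accept decimal, the XML-Data type number is a close alternative.

```xml
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- dt:type restricts the text content of the element to the named datatype -->
  <ElementType name="date_of_birth" content="textOnly" dt:type="date"/>
  <!-- Type name as given in the text; number is an alternative -->
  <ElementType name="aggression_rating" content="textOnly" dt:type="decimal"/>
</Schema>
```

With these declarations, a value such as 1998-04-12 would be acceptable for date_of_birth, while “Elephant” would fail validation.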

Here is a list of some common primitive “built-in” datatypes you may want to use (this list is not complete):
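A few of the datatypes available in the Microsoft datatype namespace are illustrated below. The element names are hypothetical and exist only to show each type in use.

```xml
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- Hypothetical element names, one per datatype -->
  <ElementType name="nickname" content="textOnly" dt:type="string"/>
  <ElementType name="visitor_count" content="textOnly" dt:type="int"/>
  <ElementType name="weight" content="textOnly" dt:type="float"/>
  <ElementType name="ticket_price" content="textOnly" dt:type="number"/>
  <ElementType name="is_open" content="textOnly" dt:type="boolean"/>
  <ElementType name="opening_day" content="textOnly" dt:type="date"/>
  <ElementType name="last_updated" content="textOnly" dt:type="dateTime"/>
  <ElementType name="homepage" content="textOnly" dt:type="uri"/>
</Schema>
```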

Conclusion

XML Schemas are the result of a need for a new way to make XML a viable solution for data exchange and storage. Their power and ease of use make them the clear choice for XML definition.

I hope it was a useful article.