
DISA Technical Note, 2002-01
16 September 2002

Published by:
Data Interchange Standards Association
333 John Carlyle Street, Suite 600
Alexandria, VA 22314 USA
+1 703-548-7005
+1 703-548-5738 Fax

Componetizer: a tool for extracting and documenting XML Schema components

Marcel Jemio and Alan Kotok
Data Interchange Standards Association

1. Introduction

1.1 Status of this document
1.2 Acknowledgments
1.3 Disclaimer
1.4 Copyright

2. Problem statement

2.1 Business background
2.2 Example, OTA_VehAvailRateRQ.XSD

3. Componetizer solution

4. Componetizer applications

4.1 Early applications
4.2 Later applications

5. About the authors

Exhibit 1. OTA_VehAvailRateRQ.xsd, XML Schema syntax

Exhibit 2. OTA_VehAvailRateRQ.xsd, Componetizer HTML table output

1. Introduction

1.1 Status of this document

This document is a technical paper for discussion in the general e-business community. Its distribution is unlimited. Style and formatting follow the Data Interchange Standards Association (DISA) publication guidelines.

Current version: Componetizer: a tool for extracting and documenting XML Schema components, DISA Technical Note 2002-1, 13 September 2002

1.2 Acknowledgments

This paper describes standards publishing technology developed by DISA. While it documents the work of DISA’s technical operations director Marcel Jemio, it also reflects ideas and comments from DISA’s president Jerry Connors and vice presidents Tim Cochran and Julia O’Brien.

1.3 Disclaimer

The views expressed in this document are those of the authors and are not necessarily those of DISA. The authors and DISA specifically disclaim responsibility for any problems arising from correct or incorrect implementation or use of this information.

This document and the information contained herein are provided on an "AS IS" basis. DISA disclaims all warranties, express or implied, including but not limited to any warranty that the use of the information herein will not infringe any rights or any implied warranties of merchantability or fitness for a particular purpose.

1.4 Copyright

The entire contents of the document are Copyright © 2002, Data Interchange Standards Association, all rights reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to Data Interchange Standards Association, except as required to translate it into languages other than English.


2. Problem statement

The Componetizer solves the problem of identifying and extracting individual data items or components from the electronic rules for structuring XML documents, called XML Schemas. Since business processes can easily become large and detailed, the schemas representing those processes can also become lengthy and complex. This section describes the reasons for developing this tool.

2.1 Business background

The publication by the World Wide Web Consortium (W3C) of the XML Schema 1.0 standard in May 2001 marked an important milestone in the development of XML as a business resource. XML Schema adds a number of key features to the basic Extensible Markup Language (XML) 1.0 standard that give XML more power and flexibility:

Structure – XML Schema defines and catalogues XML vocabularies, describing the meaning, usage, and relationship of the constituent parts of those vocabularies

Datatypes – XML Schema provides more choices for describing the data contained in XML documents, including basic built-in (or primitive) datatypes and the ability for users to define their own datatypes

The basic XML 1.0 standard allows for hierarchical documents, using sets of electronic rules called document type definitions (DTDs) to define and validate their structure. DTDs use a syntax different from XML itself, derived from the Standard Generalized Markup Language (SGML) that preceded XML. DTDs also offer few datatypes beyond simple character strings and enumerated lists. XML Schemas, on the other hand, are written in XML itself, relieving the end user of mastering a separate syntax for document definition.
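The difference in syntax can be seen with a small test. The sketch below, with an illustrative element name, feeds a DTD element declaration and a comparable XML Schema declaration to an ordinary XML parser (Python's standard `xml.etree.ElementTree`). Only the schema parses, because only the schema is itself XML; it also carries a datatype (`xs:dateTime`) that the DTD declaration cannot express.

```python
import xml.etree.ElementTree as ET

# An illustrative DTD element declaration. It uses SGML-derived syntax,
# so an XML parser cannot read it as an XML document.
DTD_DECL = "<!ELEMENT PickUpDateTime (#PCDATA)>"

# A comparable XML Schema declaration. It is itself well-formed XML and can
# also name a datatype (xs:dateTime), which the DTD declaration cannot.
XSD_DECL = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="PickUpDateTime" type="xs:dateTime"/>
</xs:schema>"""


def is_well_formed_xml(text: str) -> bool:
    """Return True if text parses as a standalone XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def declared_type(schema_text: str, element_name: str) -> str:
    """Return the type attribute of a top-level xs:element, or '' if absent."""
    xs = "{http://www.w3.org/2001/XMLSchema}"
    for el in ET.fromstring(schema_text).findall(xs + "element"):
        if el.get("name") == element_name:
            return el.get("type", "")
    return ""


print(is_well_formed_xml(XSD_DECL))
print(is_well_formed_xml(DTD_DECL))
print(declared_type(XSD_DECL, "PickUpDateTime"))
```

Because a schema is plain XML, the same parser that reads business documents can also read the rules that govern them, which is what makes automated extraction of schema components practical.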

For business applications, XML Schema opens up a wider range of options. It makes XML more adaptable to business databases that use relational or object-oriented structures, as well as offering the ability to support more types of data, including those defined by the users themselves. But all of these features come with a cost, namely more complexity. And while XML Schema offers a more powerful set of tools for business, using those tools requires more intensive hands-on management by implementers and developers.

One of the more demanding jobs in writing XML schemas is preparing the associated documentation that implementers and developers need to understand their structure and contents. While tools exist to help develop schemas, few if any tools are available to extract the individual data items from schemas and present them in an easy-to-read and understandable form.

XML editors have features that generate documentation directly from schema files. While the documentation produced by commercial editors is often comprehensive, it can also be voluminous and contain much more than a simple table of components. The development of XML vocabularies, normally a collaborative process, requires a direct and simple way to capture and document components from multiple schemas. The Componetizer provides that function.

2.2 Example, OTA_VehAvailRateRQ.XSD

Exhibit 1 and Exhibit 2, appended to this document, illustrate the need for a tool like the Componetizer, using one of the schemas from the OpenTravel Alliance (OTA) 2002A specifications. This schema, OTA_VehAvailRateRQ.XSD, defines a request message for car rental availability and rates.

This schema, part of a larger collection of schemas in the OTA specifications, is built on two general schema formats, or complex types: VehicleAvailRQCoreType and VehicleAvailRQAdditionalInfoType. These complex types provide common components that other OTA car-rental schemas can reuse, with obvious efficiencies for schema and message designers.
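Reuse of complex types is one of the relationships a component tool has to surface. The sketch below is a hypothetical illustration, not the Componetizer's own code: it borrows the two complex type names from the OTA schema but uses invented contents and element names, then lists which element declarations reuse each named complex type.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Simplified, hypothetical fragment. The complex type names come from the
# OTA schema discussed above; the contents and element names are illustrative.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="VehicleAvailRQCoreType">
    <xs:attribute name="Status" type="xs:string"/>
  </xs:complexType>
  <xs:complexType name="VehicleAvailRQAdditionalInfoType">
    <xs:attribute name="LuggageQty" type="xs:nonNegativeInteger"/>
  </xs:complexType>
  <xs:element name="VehAvailRQCore" type="VehicleAvailRQCoreType"/>
  <xs:element name="VehAvailRQInfo" type="VehicleAvailRQAdditionalInfoType"/>
  <xs:element name="VehRateRQCore" type="VehicleAvailRQCoreType"/>
</xs:schema>"""


def complex_type_usage(schema_text: str) -> dict:
    """Map each named complexType to the elements declared with that type."""
    root = ET.fromstring(schema_text)
    usage = {ct.get("name"): [] for ct in root.iter(XS + "complexType")
             if ct.get("name")}
    for el in root.iter(XS + "element"):
        # Assumes unprefixed type references (no targetNamespace); a real
        # tool would resolve QName prefixes against the schema's namespaces.
        type_name = el.get("type")
        if type_name in usage:
            usage[type_name].append(el.get("name"))
    return usage


for type_name, elements in complex_type_usage(SCHEMA).items():
    print(type_name, "->", ", ".join(elements))
```

In the full OTA specifications the shared types live in separate files, so a working tool would also have to follow `xs:include` and `xs:import` references; this sketch reads a single string only.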

Someone familiar with the syntax could probably read and understand the contents of the schema, as listed in Exhibit 1, and identify those components and their properties. But the components of the schema and their properties are seen and understood much more clearly in Exhibit 2.

The tables prepared by the Componetizer provide seven characteristics for each component:

Container. The name of the component.

Asset. The general category of schema in which the component is defined. OTA defines, for example, OTA_CommonTypes.xsd and OTA_SimpleTypes.xsd schemas as its major categories.

Tag. The component’s XML tag, named according to OTA naming conventions.

Type. The basic datatype, known in XML Schema as a primitive datatype.

Restriction. The values allowed to represent the component.

Extension. Additional restrictions or allowances for component values.

Definition. A brief description of the component.
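One way to picture a row of these tables is as a record with seven fields. The Python sketch below is an assumption-laden illustration, not DISA's Visual FoxPro implementation: it reads named `xs:simpleType` definitions and fills the fields from the type name, the `base` of its restriction, its facets, and its `xs:documentation` text. The simple type is hypothetical, the rule that derives Tag by dropping a trailing "Type" is an invented stand-in for OTA naming conventions, and Extension is left empty.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

XS = "{http://www.w3.org/2001/XMLSchema}"


@dataclass
class ComponentRow:
    """One table row holding the seven characteristics listed above."""
    container: str
    asset: str
    tag: str
    type: str
    restriction: str
    extension: str
    definition: str


# Hypothetical simple type, written in the style of OTA_SimpleTypes.xsd.
SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="ThreeLetterCodeType">
    <xs:annotation>
      <xs:documentation>A code of exactly three letters.</xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:string">
      <xs:pattern value="[a-zA-Z]{3}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""


def extract_simple_types(schema_text: str, asset: str) -> list:
    """Build one ComponentRow per named top-level xs:simpleType."""
    rows = []
    for st in ET.fromstring(schema_text).findall(XS + "simpleType"):
        name = st.get("name", "")
        definition = st.findtext(f"{XS}annotation/{XS}documentation", "").strip()
        base, facets = "", []
        restriction = st.find(XS + "restriction")
        if restriction is not None:
            base = restriction.get("base", "")
            # Each facet (pattern, enumeration, maxLength, ...) becomes
            # "facet=value"; annotation children are skipped.
            facets = [f"{child.tag[len(XS):]}={child.get('value')}"
                      for child in restriction if child.tag != XS + "annotation"]
        rows.append(ComponentRow(
            container=name,
            asset=asset,
            tag=name[:-4] if name.endswith("Type") else name,  # hypothetical naming rule
            type=base,
            restriction="; ".join(facets),
            extension="",  # not derived in this sketch
            definition=definition,
        ))
    return rows


for row in extract_simple_types(SAMPLE, "OTA_SimpleTypes.xsd"):
    print(row)
```

Keeping each row as a flat record makes the later output step independent of the schema syntax: the same rows can be rendered as HTML, a spreadsheet, or database records.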


3. Componetizer solution

The Componetizer uses the Component Object Model (COM), Microsoft’s architecture for building software components that run in Windows. DISA implements the Componetizer with Microsoft Visual FoxPro database software, version 7.0.

Currently, the database schema is a series of tables mapped by identifiers to simulate a hierarchical relationship. XML Schema documents are hierarchical, so capturing their content in a relational database is complex. Subsequent releases of the Componetizer will store content in a native XML database to take advantage of the inherent hierarchical relationships.
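The identifier mapping described above can be sketched with any relational engine. The example below uses Python's built-in SQLite, not FoxPro, and a hypothetical one-table layout: each schema node stores its own identifier and its parent's identifier. Rebuilding the element paths then needs a recursive query, which shows why flattening a hierarchy into tables adds work that a native XML store would avoid. The node names are illustrative.

```python
import sqlite3

# Hypothetical table layout: each schema node gets an identifier and a
# pointer to its parent, simulating the hierarchy inside flat rows.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE node (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES node(id),
    name TEXT NOT NULL)""")
conn.executemany(
    "INSERT INTO node (id, parent_id, name) VALUES (?, ?, ?)",
    [
        (1, None, "OTA_VehAvailRateRQ"),
        (2, 1, "VehAvailRQCore"),
        (3, 2, "VehRentalCore"),
        (4, 1, "VehAvailRQInfo"),
    ],
)

# Walking from the root down through parent_id links rebuilds each
# node's full path; the recursion is the price of the flat layout.
PATHS = """
WITH RECURSIVE tree(id, path) AS (
    SELECT id, name FROM node WHERE parent_id IS NULL
    UNION ALL
    SELECT node.id, tree.path || '/' || node.name
    FROM node JOIN tree ON node.parent_id = tree.id
)
SELECT path FROM tree ORDER BY path
"""
paths = [row[0] for row in conn.execute(PATHS)]
for p in paths:
    print(p)
```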

All programming logic follows an object-oriented (OO) design, with code written to optimize processing speed.


4. Componetizer applications

The Componetizer has immediate applications within DISA, with payoffs for the services DISA provides to its affiliates. It also creates opportunities for expanded services for DISA and its affiliates.

4.1 Early applications

The Componetizer provides immediate payoffs in writing the documentation for schemas, including document type definitions or DTDs, developed by DISA’s affiliates. The most time-consuming part of the documentation is recording the details of the schema contents, which often involves tables. In the past, the specifications editors (industry volunteers or DISA staff) would manually capture these details in Word or Excel files. Any changes in the schemas would also mean adjusting or rewriting the tables, which for schemas of any complexity could take weeks.

With the Componetizer, however, DISA can generate the tables automatically. This tool enables industry groups to consider more comprehensive and complex schemas, as their business processes demand. It also enables DISA to publish the documentation for the proposed specifications more quickly.
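Generating the tables automatically means that a schema change only requires rerunning the tool, not retyping a table. The sketch below shows the output step in isolation: it renders rows carrying the seven characteristics as an HTML table and escapes cell text so that characters such as `&` or `<` in definitions cannot break the markup. The row is hypothetical; in practice rows would come from an extraction step, and the Componetizer's real HTML layout may differ.

```python
from html import escape

COLUMNS = ["Container", "Asset", "Tag", "Type", "Restriction", "Extension", "Definition"]


def to_html_table(rows: list) -> str:
    """Render component rows as an HTML table, escaping all cell text."""
    header = "".join(f"<th>{c}</th>" for c in COLUMNS)
    lines = ["<table>", f"  <tr>{header}</tr>"]
    for row in rows:
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} cells, got {len(row)}")
        cells = "".join(f"<td>{escape(cell)}</td>" for cell in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Hypothetical row; the "&" in the definition is escaped as "&amp;".
ROWS = [("ThreeLetterCodeType", "OTA_SimpleTypes.xsd", "ThreeLetterCode",
         "xs:string", "pattern=[a-zA-Z]{3}", "", "Letters only & length 3.")]
html_out = to_html_table(ROWS)
print(html_out)
```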

4.2 Later applications

DISA uses the Componetizer now for documentation, but HTML tables are just one product of this tool. The Componetizer can be enhanced to provide graphical output (e.g., Scalable Vector Graphics), spreadsheet formats, or word processing formats, as well as XML Topic Maps. With small adjustments, the Componetizer can also provide output in various database formats, which would enable DISA to establish a component store or warehouse. With the development of ebXML core components, the Componetizer could also associate schema components with ebXML core components in the database. With this step, DISA could also indicate where components from different DISA affiliates are semantically equivalent, which would provide a powerful interoperability feature.
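A spreadsheet format is among the simplest of these additional outputs. As a minimal sketch, assuming the same seven-column rows as the HTML tables, Python's standard `csv` module can write comma-separated text that spreadsheet programs open directly; fields containing commas are quoted automatically. The row shown is hypothetical.

```python
import csv
import io

COLUMNS = ["Container", "Asset", "Tag", "Type", "Restriction", "Extension", "Definition"]


def to_csv(rows: list) -> str:
    """Write component rows as CSV text that spreadsheet programs can open."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical row; the comma in the definition forces that field to be quoted.
ROWS = [("ThreeLetterCodeType", "OTA_SimpleTypes.xsd", "ThreeLetterCode",
         "xs:string", "pattern=[a-zA-Z]{3}", "", "Letters only, length 3.")]
text = to_csv(ROWS)
print(text)
```

Because the rows are kept separate from any one output format, each new format (spreadsheet, SVG, database load file) needs only its own small writer.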

This component warehouse will link to DISA’s registry initiative, known as DRIve, a registry of standards and specifications developed by DISA's affiliated organizations and compliant with version 2 of the ebXML registry specifications. The component warehouse would assign a unique identifier to each schema component, which the registry would index as part of the metadata for the schema or the component, depending on the level of detail appropriate for the specification.
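The paper does not say how the warehouse would form its identifiers. One possible approach, shown here purely as a hypothetical sketch, is a name-based UUID (version 5, from Python's standard `uuid` module) computed from the schema file, component name, and specification version. The namespace URL below is a placeholder, not a real DISA or DRIve value.

```python
import uuid

# Hypothetical namespace for the component warehouse. The URL is a
# placeholder, not a real DISA or DRIve value.
WAREHOUSE_NS = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/component-warehouse")


def component_id(asset: str, container: str, version: str) -> str:
    """Derive a stable, name-based identifier for one schema component.

    The same asset, container, and version always yield the same URN, so
    reprocessing a schema does not create new identifiers.
    """
    return uuid.uuid5(WAREHOUSE_NS, f"{asset}#{container}@{version}").urn


cid = component_id("OTA_SimpleTypes.xsd", "ThreeLetterCodeType", "2002A")
print(cid)
```

The design trade-off is that a name-based identifier stays stable across repeated runs but changes whenever a component is renamed or moved to another schema file; a registry that needs identifiers to survive renaming would instead assign opaque identifiers and record the name as metadata.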

The component warehouse could also support Web services applications. Universal Description, Discovery and Integration (UDDI) registries could list component identifiers as part of one or more tModel descriptions of services. Likewise, Web Services Description Language (WSDL) services could reference component identifiers as part of their Web service descriptions.


5. About the authors

Marcel Jemio

Marcel Jemio is DISA’s Director of Technical Operations and has served as specifications manager for the OpenTravel Alliance (OTA) since September 2001. For OTA and other DISA-affiliated standards organizations, Jemio manages the direction and application of all XML-related technology, including XML Schema, XML Web Services (SOAP, ebXML MS 2.0, WSDL), XSLT, XPath, and SVG. Jemio also wrote the DISA Componetizer program, which extracts XML components for rapid documentation of XML Schemas, and leads development of DISA’s XML Component Repository, which inventories and maps several XML Schema vocabularies.

Jemio is DISA’s lead representative to the World Wide Web Consortium and serves on its Web Services Architecture and XML Schema working groups. Jemio also takes part in development of the ASC X12 XML Reference Model.

In previous work, Jemio served as a systems engineer with Excelon Corporation, where he managed multiple development teams, and deployed corporate Internet-based products for multiple clients. Jemio also conducted client pre-sales meetings and presentations, facilitated functional requirements sessions, and participated in the system/database design, development and deployment of corporate products. He also has experience in product and project management for other software and end-user companies.

Alan Kotok

Alan Kotok is DISA’s Director of Publishing and editor of E-Business Standards Today, published by DISA as an online daily newswire and in a weekly newsletter. Kotok previously served as DISA’s Director of Education and as standards manager for the OpenTravel Alliance.

Before joining DISA in 1999, Kotok served 10 years with Graphic Communications Association (GCA) as Director of Management Technologies and then as Vice President for Electronic Business. Before joining GCA, he served 15 years with U.S. Information Agency in the U.S. and overseas, becoming chief of the agency’s technology planning staff.

He is the author of two books, most recently ebXML: The New Global Standard for Doing Business on the Internet (with David Webber), ISBN: 0735711178, New Riders Publishing, August 2001. Kotok also writes frequently for the information technology trade press, and is author of three DISA white papers on e-business standards.


13 September 2002
Copyright © 2002, Data Interchange Standards Association, all rights reserved
