13 November, 2000
1. Document Object Model Core
Editors
Arnaud Le Hors, IBM
Gavin Nicol, Inso EPS (for DOM Level 1)
Lauren Wood, SoftQuad, Inc. (for DOM Level 1)
Mike Champion, ArborText (for DOM Level 1 from November 20, 1997)
Steve Byrne, JavaSoft (for DOM Level 1 until November 19, 1997)
Table of contents
* 1.1. Overview of the DOM Core Interfaces
o 1.1.1. The DOM Structure Model
o 1.1.2. Memory Management
o 1.1.3. Naming Conventions
o 1.1.4. Inheritance vs. Flattened Views of the API
o 1.1.5. The DOMString type
o 1.1.6. The DOMTimeStamp type
o 1.1.7. String comparisons in the DOM
o 1.1.8. XML Namespaces
* 1.2. Fundamental Interfaces
o DOMException, ExceptionCode, DOMImplementation, DocumentFragment, Document, Node, NodeList, NamedNodeMap, CharacterData, Attr, Element, Text, Comment
* 1.3. Extended Interfaces
o CDATASection, DocumentType, Notation, Entity, EntityReference, ProcessingInstruction
1.1. Overview of the DOM Core Interfaces
This section defines a set of objects and interfaces for accessing and manipulating document objects. The functionality specified in this section (the Core functionality) is sufficient to allow software developers and web script authors to access and manipulate parsed HTML and XML content inside conforming products. The DOM Core API also allows creation and population of a Document object using only DOM API calls; loading a Document and saving it persistently is left to the product that implements the DOM API.
1.1.1. The DOM Structure Model
The DOM presents documents as a hierarchy of Node objects that also implement other, more specialized interfaces. Some types of nodes may have child nodes of various types, and others are leaf nodes that cannot have anything below them in the document structure. For XML and HTML, the node types, and which node types they may have as children, are as follows:
* Document -- Element (maximum of one), ProcessingInstruction, Comment, DocumentType (maximum of one)
* DocumentFragment -- Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference
* DocumentType -- no children
* EntityReference -- Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference
* Element -- Element, Text, Comment, ProcessingInstruction, CDATASection, EntityReference
* Attr -- Text, EntityReference
* ProcessingInstruction -- no children
* Comment -- no children
* Text -- no children
* CDATASection -- no children
* Entity -- Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference
* Notation -- no children
The DOM also specifies a NodeList interface to handle ordered lists of Nodes, such as the children of a Node, or the elements returned by the getElementsByTagName method of the Element interface, and also a NamedNodeMap interface to handle unordered sets of nodes referenced by their name attribute, such as the attributes of an Element. NodeList and NamedNodeMap objects in the DOM are live; that is, changes to the underlying document structure are reflected in all relevant NodeList and NamedNodeMap objects. For example, if a DOM user gets a NodeList object containing the children of an Element, then subsequently adds more children to that element (or removes children, or modifies them), those changes are automatically reflected in the NodeList, without further action on the user's part. Likewise, changes to a Node in the tree are reflected in all references to that Node in NodeList and NamedNodeMap objects.
Finally, the interfaces Text, Comment, and CDATASection all inherit from the CharacterData interface.
1.1.2. Memory Management
Most of the APIs defined by this specification are interfaces rather than classes. That means that an implementation need only expose methods with the defined names and specified operation, not implement classes that correspond directly to the interfaces. This allows the DOM APIs to be implemented as a thin veneer on top of legacy applications with their own data structures, or on top of newer applications with different class hierarchies. This also means that ordinary constructors (in the Java or C++ sense) cannot be used to create DOM objects, since the underlying objects to be constructed may have little relationship to the DOM interfaces. The conventional solution to this in object-oriented design is to define factory methods that create instances of objects that implement the various interfaces. Objects implementing some interface "X" are created by a "createX()" method on the Document interface; this is because all DOM objects live in the context of a specific Document.
The DOM Level 2 API does not define a standard way to create DOMImplementation objects; DOM implementations must provide some proprietary way of bootstrapping these DOM interfaces, and then all other objects can be built from there.
The Core DOM APIs are designed to be compatible with a wide range of languages, including both general-user scripting languages and the more challenging languages used mostly by professional programmers. Thus, the DOM APIs need to operate across a variety of memory management philosophies, from language bindings that do not expose memory management to the user at all, through those (notably Java) that provide explicit constructors but provide an automatic garbage collection mechanism to automatically reclaim unused memory, to those (especially C/C++) that generally require the programmer to explicitly allocate object memory, track where it is used, and explicitly free it for re-use. To ensure a consistent API across these platforms, the DOM does not address memory management issues at all, but instead leaves these for the implementation. Neither of the explicit language bindings defined by the DOM API (for ECMAScript and Java) require any memory management methods, but DOM bindings for other languages (especially C or C++) may require such support. These extensions will be the responsibility of those adapting the DOM API to a specific language, not the DOM Working Group.
1.1.3. Naming Conventions
While it would be nice to have attribute and method names that are short, informative, internally consistent, and familiar to users of similar APIs, the names also should not clash with the names in legacy APIs supported by DOM implementations. Furthermore, both OMG IDL and ECMAScript have significant limitations in their ability to disambiguate names from different namespaces that make it difficult to avoid naming conflicts with short, familiar names. So, DOM names tend to be long and descriptive in order to be unique across all environments.
The Working Group has also attempted to be internally consistent in its use of various terms, even though these may not be common distinctions in other APIs. For example, the DOM API uses the method name "remove" when the method changes the structural model, and the method name "delete" when the method gets rid of something inside the structure model. The thing that is deleted is not returned. The thing that is removed may be returned, when it makes sense to return it.
1.1.4. Inheritance vs. Flattened Views of the API
The DOM Core APIs present two somewhat different sets of interfaces to an XML/HTML document: one presenting an "object oriented" approach with a hierarchy of inheritance, and a "simplified" view that allows all manipulation to be done via the Node interface without requiring casts (in Java and other C-like languages) or query interface calls in COM environments. These operations are fairly expensive in Java and COM, and the DOM may be used in performance-critical environments, so we allow significant functionality using just the Node interface. Because many other users will find the inheritance hierarchy easier to understand than the "everything is a Node" approach to the DOM, we also support the full higher-level interfaces for those who prefer a more object-oriented API.
In practice, this means that there is a certain amount of redundancy in the API. The Working Group considers the "inheritance" approach the primary view of the API, and the full set of functionality on Node to be "extra" functionality that users may employ, but that does not eliminate the need for methods on other interfaces that an object-oriented analysis would dictate. (Of course, when the O-O analysis yields an attribute or method that is identical to one on the Node interface, we don't specify a completely redundant one.) Thus, even though there is a generic nodeName attribute on the Node interface, there is still a tagName attribute on the Element interface; these two attributes must contain the same value, but the it is worthwhile to support both, given the different constituencies the DOM API must satisfy.
1.1.5. The DOMString type
To ensure interoperability, the DOM specifies the following:
*
Type Definition DOMString
A DOMString is a sequence of 16-bit units.
IDL Definition
valuetype DOMString sequence
* Applications must encode DOMString using UTF-16 (defined in [Unicode] and Amendment 1 of [ISO/IEC 10646]).
The UTF-16 encoding was chosen because of its widespread industry practice. Note that for both HTML and XML, the document character set (and therefore the notation of numeric character references) is based on UCS [ISO-10646]. A single numeric character reference in a source document may therefore in some cases correspond to two 16-bit units in a DOMString (a high surrogate and a low surrogate).
Note: Even though the DOM defines the name of the string type to be DOMString, bindings may use different names. For example for Java, DOMString is bound to the String type because it also uses UTF-16 as its encoding.
Note: As of August 2000, the OMG IDL specification ([OMGIDL]) included a wstring type. However, that definition did not meet the interoperability criteria of the DOM API since it relied on negotiation to decide the width and encoding of a character.
1.1.6. The DOMTimeStamp type
To ensure interoperability, the DOM specifies the following:
*
Type Definition DOMTimeStamp
A DOMTimeStamp represents a number of milliseconds.
IDL Definition
typedef unsigned long long DOMTimeStamp;
*
Note: Even though the DOM uses the type DOMTimeStamp, bindings may use different types. For example for Java, DOMTimeStamp is bound to the long type. In ECMAScript, TimeStamp is bound to the Date type because the range of the integer type is too small.
1.1.7. String comparisons in the DOM
The DOM has many interfaces that imply string matching. HTML processors generally assume an uppercase (less often, lowercase) normalization of names for such things as elements, while XML is explicitly case sensitive. For the purposes of the DOM, string matching is performed purely by binary comparison of the 16-bit units of the DOMString. In addition, the DOM assumes that any case normalizations take place in the processor, before the DOM structures are built.
Note: Besides case folding, there are additional normalizations that can be applied to text. The W3C I18N Working Group is in the process of defining exactly which normalizations are necessary, and where they should be applied. The W3C I18N Working Group expects to require early normalization, which means that data read into the DOM is assumed to already be normalized. The DOM and applications built on top of it in this case only have to assure that text remains normalized when being changed. For further details, please see [Charmod].
1.1.8. XML Namespaces
The DOM Level 2 supports XML namespaces [Namespaces] by augmenting several interfaces of the DOM Level 1 Core to allow creating and manipulating elements and attributes associated to a namespace.
As far as the DOM is concerned, special attributes used for declaring XML namespaces are still exposed and can be manipulated just like any other attribute. However, nodes are permanently bound to namespace URIs as they get created. Consequently, moving a node within a document, using the DOM, in no case results in a change of its namespace prefix or namespace URI. Similarly, creating a node with a namespace prefix and namespace URI, or changing the namespace prefix of a node, does not result in any addition, removal, or modification of any special attributes for declaring the appropriate XML namespaces. Namespace validation is not enforced; the DOM application is responsible. In particular, since the mapping between prefixes and namespace URIs is not enforced, in general, the resulting document cannot be serialized naively. For example, applications may have to declare every namespace in use when serializing a document.
DOM Level 2 doesn't perform any URI normalization or canonicalization. The URIs given to the DOM are assumed to be valid (e.g., characters such as whitespaces are properly escaped), and no lexical checking is performed. Absolute URI references are treated as strings and compared literally. How relative namespace URI references are treated is undefined. To ensure interoperability only absolute namespace URI references (i.e., URI references beginning with a scheme name and a colon) should be used. Note that because the DOM does no lexical checking, the empty string will be treated as a real namespace URI in DOM Level 2 methods. Applications must use the value null as the namespaceURI parameter for methods if they wish to have no namespace.
Note: In the DOM, all namespace declaration attributes are by definition bound to the namespace URI: "http://www.w3.org/2000/xmlns/". These are the attributes whose namespace prefix or qualified name is "xmlns". Although, at the time of writing, this is not part of the XML Namespaces specification [Namespaces], it is planned to be incorporated in a future revision.
In a document with no namespaces, the child list of an EntityReference node is always the same as that of the corresponding Entity. This is not true in a document where an entity contains unbound namespace prefixes. In such a case, the descendants of the corresponding EntityReference nodes may be bound to different namespace URIs, depending on where the entity references are. Also, because, in the DOM, nodes always remain bound to the same namespace URI, moving such EntityReference nodes can lead to documents that cannot be serialized. This is also true when the DOM Level 1 method createEntityReference of the Document interface is used to create entity references that correspond to such entities, since the descendants of the returned EntityReference are unbound. The DOM Level 2 does not support any mechanism to resolve namespace prefixes. For all of these reasons, use of such entities and entity references should be avoided or used with extreme care. A future Level of the DOM may include some additional support for handling these.
The new methods, such as createElementNS and createAttributeNS of the Document interface, are meant to be used by namespace aware applications. Simple applications that do not use namespaces can use the DOM Level 1 methods, such as createElement and createAttribute. Elements and attributes created in this way do not have any namespace prefix, namespace URI, or local name.
Note: DOM Level 1 methods are namespace ignorant. Therefore, while it is safe to use these methods when not dealing with namespaces, using them and the new ones at the same time should be avoided. DOM Level 1 methods solely identify attribute nodes by their nodeName. On the contrary, the DOM Level 2 methods related to namespaces, identify attribute nodes by their namespaceURI and localName. Because of this fundamental difference, mixing both sets of methods can lead to unpredictable results. In particular, using setAttributeNS, an element may have two attributes (or more) that have the same nodeName, but different namespaceURIs. Calling getAttribute with that nodeName could then return any of those attributes. The result depends on the implementation. Similarly, using setAttributeNode, one can set two attributes (or more) that have different nodeNames but the same prefix and namespaceURI. In this case getAttributeNodeNS will return either attribute, in an implementation dependent manner. The only guarantee in such cases is that all methods that access a named item by its nodeName will access the same item, and all methods which access a node by its URI and local name will access the same node. For instance, setAttribute and setAttributeNS affect the node that getAttribute and getAttributeNS, respectively, return.
1.2. Fundamental Interfaces
The interfaces within this section are considered fundamental, and must be fully implemented by all conforming implementations of the DOM, including all HTML DOM implementations [DOM Level 2 HTML], unless otherwise specified.
A DOM application may use the hasFeature(feature, version) method of the DOMImplementation interface with parameter values "Core" and "2.0" (respectively) to determine whether or not this module is supported by the implementation. Any implementation that conforms to DOM Level 2 or a DOM Level 2 module must conform to the Core module. Please refer to additional information about conformance in this specification.
Exception DOMException
DOM operations only raise exceptions in "exceptional" circumstances, i.e., when an operation is impossible to perform (either for logical reasons, because data is lost, or because the implementation has become unstable). In general, DOM methods return specific error values in ordinary processing situations, such as out-of-bound errors when using NodeList.
Implementations should raise other exceptions under other circumstances. For example, implementations should raise an implementation-dependent exception if a null argument is passed.
Some languages and object systems do not support the concept of exceptions. For such systems, error conditions may be indicated using native error reporting mechanisms. For some bindings, for example, methods may return error codes similar to those listed in the corresponding method descriptions.
IDL Definition
exception DOMException {
unsigned short code;
};
// ExceptionCode
const unsigned short INDEX_SIZE_ERR = 1;
const unsigned short DOMSTRING_SIZE_ERR = 2;
const unsigned short HIERARCHY_REQUEST_ERR = 3;
const unsigned short WRONG_DOCUMENT_ERR = 4;
const unsigned short INVALID_CHARACTER_ERR = 5;
const unsigned short NO_DATA_ALLOWED_ERR = 6;
const unsigned short NO_MODIFICATION_ALLOWED_ERR = 7;
const unsigned short NOT_FOUND_ERR = 8;
const unsigned short NOT_SUPPORTED_ERR = 9;
const unsigned short INUSE_ATTRIBUTE_ERR = 10;
// Introduced in DOM Level 2:
const unsigned short INVALID_STATE_ERR = 11;
// Introduced in DOM Level 2:
const unsigned short SYNTAX_ERR = 12;
// Introduced in DOM Level 2:
const unsigned short INVALID_MODIFICATION_ERR = 13;
// Introduced in DOM Level 2:
const unsigned short NAMESPACE_ERR = 14;
// Introduced in DOM Level 2:
const unsigned short INVALID_ACCESS_ERR = 15;
Definition group ExceptionCode
An integer indicating the type of error generated.
Note: Other numeric codes are reserved for W3C for possible future use.
Defined Constants
DOMSTRING_SIZE_ERR
If the specified range of text does not fit into a DOMString
HIERARCHY_REQUEST_ERR
If any node is inserted somewhere it doesn't belong
INDEX_SIZE_ERR
If index or size is negative, or greater than the allowed value
INUSE_ATTRIBUTE_ERR
If an attempt is made to add an attribute that is already in use elsewhere
INVALID_ACCESS_ERR, introduced in DOM Level 2.
If a parameter or an operation is not supported by the underlying object.
INVALID_CHARACTER_ERR
If an invalid or illegal character is specified, such as in a name. See production 2 in the XML specification for the definition of a legal character, and production 5 for the definition of a legal name character.
INVALID_MODIFICATION_ERR, introduced in DOM Level 2.
If an attempt is made to modify the type of the underlying object.
INVALID_STATE_ERR, introduced in DOM Level 2.
If an attempt is made to use an object that is not, or is no longer, usable.
NAMESPACE_ERR, introduced in DOM Level 2.
If an attempt is made to create or change an object in a way which is incorrect with regard to namespaces.
NOT_FOUND_ERR
If an attempt is made to reference a node in a context where it does not exist
NOT_SUPPORTED_ERR
If the implementation does not support the requested type of object or operation.
NO_DATA_ALLOWED_ERR
If data is specified for a node which does not support data
NO_MODIFICATION_ALLOWED_ERR
If an attempt is made to modify an object where modifications are not allowed
SYNTAX_ERR, introduced in DOM Level 2.
If an invalid or illegal string is specified.
WRONG_DOCUMENT_ERR
If a node is used in a different document than the one that created it (that doesn't support it)
Interface DOMImplementation
The DOMImplementation interface provides a number of methods for performing operations that are independent of any particular instance of the document object model.
IDL Definition
interface DOMImplementation {
boolean hasFeature(in DOMString feature,
in DOMString version);
// Introduced in DOM Level 2:
DocumentType createDocumentType(in DOMString qualifiedName,
in DOMString publicId,
in DOMString systemId)
raises(DOMException);
// Introduced in DOM Level 2:
Document createDocument(in DOMString namespaceURI,
in DOMString qualifiedName,
in DocumentType doctype)
raises(DOMException);
};
Methods
createDocument introduced in DOM Level 2
Creates an XML Document object of the specified type with its document element. HTML-only DOM implementations do not need to implement this method.
Parameters
namespaceURI of type DOMString
The namespace URI of the document element to create.
qualifiedName of type DOMString
The qualified name of the document element to be created.
doctype of type DocumentType
The type of document to be created or null.
When doctype is not null, its Node.ownerDocument attribute is set to the document being created.
Return Value
Document
A new Document object.
Exceptions
DOMException
INVALID_CHARACTER_ERR: Raised if the specified qualified name contains an illegal character.
NAMESPACE_ERR: Raised if the qualifiedName is malformed, if the qualifiedName has a prefix and the namespaceURI is null, or if the qualifiedName has a prefix that is "xml" and the namespaceURI is different from "http://www.w3.org/XML/1998/namespace" [Namespaces].
WRONG_DOCUMENT_ERR: Raised if doctype has already been used with a different document or was created from a different implementation.
createDocumentType introduced in DOM Level 2
Creates an empty DocumentType node. Entity declarations and notations are not made available. Entity reference expansions and default attribute additions do not occur. It is expected that a future version of the DOM will provide a way for populating a DocumentType.
HTML-only DOM implementations do not need to implement this method.
Parameters
qualifiedName of type DOMString
The qualified name of the document type to be created.
publicId of type DOMString
The external subset public identifier.
systemId of type DOMString
The external subset system identifier.
Return Value
DocumentType
A new DocumentType node with Node.ownerDocument set to null.
Exceptions
DOMException
INVALID_CHARACTER_ERR: Raised if the specified qualified name contains an illegal character.
NAMESPACE_ERR: Raised if the qualifiedName is malformed.
hasFeature
Test if the DOM implementation implements a specific feature.
Parameters
feature of type DOMString
The name of the feature to test (case-insensitive). The values used by DOM features are defined throughout the DOM Level 2 specifications and listed in the Conformance section. The name must be an XML name. To avoid possible conflicts, as a convention, names referring to features defined outside the DOM specification should be made unique by reversing the name of the Internet domain name of the person (or the organization that the person belongs to) who defines the feature, component by component, and using this as a prefix. For instance, the W3C SVG Working Group defines the feature "org.w3c.dom.svg".
version of type DOMString
This is the version number of the feature to test. In Level 2, the string can be either "2.0" or "1.0". If the version is not specified, supporting any version of the feature causes the method to return true.
Return Value
boolean
true if the feature is implemented in the specified version, false otherwise.
No Exceptions
Interface DocumentFragment
DocumentFragment is a "lightweight" or "minimal" Document object. It is very common to want to be able to extract a portion of a document's tree or to create a new fragment of a document. Imagine implementing a user command like cut or rearranging a document by moving fragments around. It is desirable to have an object which can hold such fragments and it is quite natural to use a Node for this purpose. While it is true that a Document object could fulfill this role, a Document object can potentially be a heavyweight object, depending on the underlying implementation. What is really needed for this is a very lightweight object. DocumentFragment is such an object.
Furthermore, various operations -- such as inserting nodes as children of another Node -- may take DocumentFragment objects as arguments; this results in all the child nodes of the DocumentFragment being moved to the child list of this node.
The children of a DocumentFragment node are zero or more nodes representing the tops of any sub-trees defining the structure of the document. DocumentFragment nodes do not need to be well-formed XML documents (although they do need to follow the rules imposed upon well-formed XML parsed entities, which can have multiple top nodes). For example, a DocumentFragment might have only one child and that child node could be a Text node. Such a structure model represents neither an HTML document nor a well-formed XML document.
When a DocumentFragment is inserted into a Document (or indeed any other Node that may take children) the children of the DocumentFragment and not the DocumentFragment itself are inserted into the Node. This makes the DocumentFragment very useful when the user wishes to create nodes that are
No comments:
Post a Comment