Java XML DOM Parser
Parse XML documents into an in-memory tree in Java with the DOM parser.
The DOM (Document Object Model) parser reads an entire XML document into memory and hands you a tree of nodes you can navigate, query, and modify. It ships with the JDK in javax.xml.parsers and org.w3c.dom, so there is nothing to add to your classpath. DOM is the right tool when documents are small enough to hold in memory and you need random access to any part of the tree — reading a config file, transforming a payload, or building XML programmatically.
How DOM models a document
DOM turns markup into a tree of Node objects. Every element, attribute, piece of text, and comment is a node, and the whole document hangs off a single Document root. You read the tree by asking nodes for their children, and you change it by creating, moving, or removing nodes.
| Concept | Interface | What it represents |
|---|---|---|
| Document | Document | The whole parsed file; entry point to the tree |
| Element | Element | A tag such as <book>, with attributes and children |
| Attribute | Attr | A name/value pair on an element |
| Text | Text | Character data inside an element |
| Node list | NodeList | An ordered, index-addressable collection of nodes |
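To make the table concrete, the sketch below parses a tiny document and prints one line per node, labelled with its kind. The class name NodeKinds, the helper names parse and outline, and the sample XML are assumptions chosen for illustration. Note that attributes are reached through getAttributes(), not getChildNodes(): they belong to an element but are not its children.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeKinds {

    // Hypothetical helper: parse XML held in a string with a default factory.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns an indented outline of the subtree rooted at node.
    static String outline(Node node, String indent) {
        StringBuilder sb = new StringBuilder();
        short type = node.getNodeType();
        if (type == Node.ELEMENT_NODE) {
            sb.append(indent).append("Element ").append(node.getNodeName()).append('\n');
            // Attributes live in a NamedNodeMap, separate from the child list.
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                sb.append(indent).append("  Attr ")
                  .append(attr.getNodeName()).append('=')
                  .append(attr.getNodeValue()).append('\n');
            }
        } else if (type == Node.TEXT_NODE) {
            sb.append(indent).append("Text \"").append(node.getNodeValue()).append("\"\n");
        } else if (type == Node.COMMENT_NODE) {
            sb.append(indent).append("Comment\n");
        }
        // Recurse into children; NodeList is walked by index.
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            sb.append(outline(children.item(i), indent + "  "));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<book id=\"b1\"><title>Dune</title><!-- classic --></book>");
        System.out.print(outline(doc.getDocumentElement(), ""));
    }
}
```

Dispatching on getNodeType() is the usual way to tell node kinds apart, since every one of them is handed back as a plain Node.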
The key trade-off: DOM is convenient because the whole tree is addressable, but it loads everything into memory at once. For multi-gigabyte feeds you would reach for SAX or StAX instead, which stream the document without building a tree.
Parsing a document
You never construct a parser directly. You ask a DocumentBuilderFactory for a DocumentBuilder, then call parse on a stream, file, or URI. Configure the factory before building — namespace awareness and validation are factory-level switches.
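import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new File("library.xml"));
// Merge adjacent text nodes and drop empty ones so each run of text is a single Text node.
doc.getDocumentElement().normalize();

parse throws SAXException for malformed XML and IOException if the source cannot be read, and newDocumentBuilder throws ParserConfigurationException; all three are checked exceptions you must handle. Calling normalize() once after parsing merges adjacent text nodes, which matters when you read text node by node (for example through getFirstChild().getNodeValue()) rather than through getTextContent().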
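As a rough illustration of handling those checked exceptions, the sketch below parses XML held in a string and reports what happened. The class name ParseDemo, the method rootNameOrError, and the returned labels are assumptions for this example, not part of any API. Parsing from a string goes through InputSource wrapped around a StringReader.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ParseDemo {

    // Returns the root tag name, or a short label describing the failure.
    static String rootNameOrError(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // configure before building
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTagName();
        } catch (SAXException e) {
            return "malformed";     // not well-formed XML
        } catch (IOException e) {
            return "unreadable";    // the source could not be read
        } catch (ParserConfigurationException e) {
            return "misconfigured"; // the factory could not build a parser
        }
    }

    public static void main(String[] args) {
        System.out.println(rootNameOrError("<library><book/></library>"));
        System.out.println(rootNameOrError("<library><book></library>"));
    }
}
```

The second call passes a document with an unclosed book tag, so it ends up in the SAXException branch. The JDK's default error handler may also print a diagnostic to standard error before the exception reaches your code.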
Navigating the tree
Two methods cover most reading: getElementsByTagName finds all descendants with a given tag, and getChildNodes returns the direct children of a node. Remember that getChildNodes includes whitespace text nodes, so filter by node type when you only want elements.
Element root = doc.getDocumentElement();            // <library>
NodeList books = doc.getElementsByTagName("book");  // every <book> in the tree
for (int i = 0; i < books.getLength(); i++) {
    Element book = (Element) books.item(i);
    String id = book.getAttribute("id");            // attribute by name
    String title = book.getElementsByTagName("title")
                       .item(0).getTextContent();   // text of the first <title> descendant
    System.out.println(id + " -> " + title);
}

NodeList is index-based, not iterable, so you loop with getLength() and item(i). getAttribute returns an empty string (never null) when the attribute is absent, so a null check on its result never fires; use hasAttribute when you need to tell a missing attribute from an empty one.
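The whitespace caveat for getChildNodes is easiest to see with an indented document. The sketch below, under the assumption of a hypothetical helper childElements and a made-up two-book sample, counts the raw children and then filters them down to elements by node type. It also shows getAttribute returning an empty string for a missing attribute.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildElements {

    // Sample input (an assumption for this sketch); the second book has no id.
    static final String SAMPLE =
            "<library>\n  <book id=\"b1\"/>\n  <book/>\n</library>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Keeps only the direct children that are elements, skipping text and comments.
    static List<Element> childElements(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList kids = parent.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            if (kid.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) kid);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Element root = parse(SAMPLE).getDocumentElement();
        // 5 raw children: whitespace text, book, text, book, text.
        System.out.println("raw children: " + root.getChildNodes().getLength());
        List<Element> books = childElements(root);
        System.out.println("element children: " + books.size()); // 2
        for (Element book : books) {
            System.out.println("id='" + book.getAttribute("id")
                    + "' present=" + book.hasAttribute("id"));
        }
    }
}
```

Copying the matches into a List also gives you something you can use in a for-each loop, which NodeList itself does not support.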
Modifying and creating nodes
The DOM tree is mutable. You change text with setTextContent, change attributes with setAttribute, and grow the tree by creating nodes through the Document factory methods and appending them. Nodes must be created by the same document they are inserted into.
// Update existing content; book is one of the <book> elements from the loop above.
Element price = (Element) book.getElementsByTagName("price").item(0);
price.setTextContent("49.50");
price.setAttribute("currency", "USD");
// Build a new subtree and attach it.
Element added = doc.createElement("book");
added.setAttribute("id", "b3");
Element title = doc.createElement("title");
title.setTextContent("The Pragmatic Programmer");
added.appendChild(title);
doc.getDocumentElement().appendChild(added);

createElement makes a detached node; nothing appears in the document until you appendChild it somewhere. To remove a node, call parent.removeChild(child).
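Two of the points above, removal and the same-document rule, can be sketched as follows. The class NodeEdits and its methods removeBookById and copyInto are hypothetical helpers written for this example. Document.importNode is the standard way to bring a node from another document across: it makes a copy owned by the target document and leaves the original where it was.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeEdits {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Removes the first <book> whose id matches; returns false if none was found.
    static boolean removeBookById(Document doc, String id) {
        NodeList books = doc.getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            if (id.equals(book.getAttribute("id"))) {
                // Removal always goes through the parent node.
                book.getParentNode().removeChild(book);
                return true;
            }
        }
        return false;
    }

    // Copies an element owned by another document under target's root.
    static void copyInto(Document target, Element foreign) {
        Node copy = target.importNode(foreign, true); // true = deep copy
        target.getDocumentElement().appendChild(copy);
    }

    public static void main(String[] args) throws Exception {
        Document a = parse("<library><book id=\"b1\"/><book id=\"b2\"/></library>");
        Document b = parse("<other><book id=\"x9\"><title>T</title></book></other>");

        System.out.println("removed b1: " + removeBookById(a, "b1"));
        copyInto(a, (Element) b.getElementsByTagName("book").item(0));
        System.out.println("books in a: " + a.getElementsByTagName("book").getLength());
        System.out.println("books in b: " + b.getElementsByTagName("book").getLength());
    }
}
```

The method returns as soon as it removes a match, which sidesteps the fact that the NodeList from getElementsByTagName is live and shrinks as nodes are removed.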
Writing the tree back out
DOM has no toString() that produces XML. To serialize, hand the document to a Transformer with a DOMSource and a StreamResult. The same javax.xml.transform package lets you write to a file, a string, or any stream, and set pretty-printing options.
Transformer tr = TransformerFactory.newInstance().newTransformer();
tr.setOutputProperty(OutputKeys.INDENT, "yes");
tr.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
tr.transform(new DOMSource(doc), new StreamResult(new File("out.xml")));

For untrusted input, harden the DocumentBuilderFactory before parsing: disable DOCTYPE declarations with factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true) to block XXE (XML External Entity) attacks.
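A minimal sketch of that hardening follows, assuming the JDK's built-in parser, which recognises the Xerces feature URI quoted above (other implementations may reject it with ParserConfigurationException). The class SafeParsing and the methods hardenedFactory and accepts are names invented for this example. Enabling XMLConstants.FEATURE_SECURE_PROCESSING as well is an extra precaution added here, not something the text above requires.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class SafeParsing {

    // Builds a factory that refuses any document containing a DOCTYPE.
    static DocumentBuilderFactory hardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory;
    }

    // Returns true if the hardened parser accepts the input, false if it rejects it.
    static boolean accepts(String xml) {
        try {
            hardenedFactory().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // includes the "DOCTYPE is disallowed" rejection
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String plain = "<a>ok</a>";
        String xxe = "<!DOCTYPE a [<!ENTITY x SYSTEM \"file:///etc/passwd\">]><a>&x;</a>";
        System.out.println("plain accepted: " + accepts(plain));
        System.out.println("xxe accepted:   " + accepts(xxe));
    }
}
```

Rejecting DOCTYPE outright is the bluntest option; it also refuses legitimate documents that carry a DTD, so it suits inputs where you control the format.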
A complete worked example
This program parses an in-memory library document, reads each book, raises every price by 10%, inserts a new book, and serializes the first updated price line back to XML — exercising the full read-modify-write cycle on a single tree.
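A minimal sketch of a program matching that description is given here. The class name LibraryDemo, the run helper, the first two book titles and ids, and the output labels are assumptions for illustration; the two starting prices (45.00 and 38.50) and the appended book come from the text. It serializes only the first price element, with the XML declaration omitted, so the updated value is easy to spot.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LibraryDemo {

    // Assumed sample data: titles and ids are placeholders.
    static final String XML =
            "<library>"
          + "<book id=\"b1\"><title>Clean Code</title><price>45.00</price></book>"
          + "<book id=\"b2\"><title>Refactoring</title><price>38.50</price></book>"
          + "</library>";

    static String run() throws Exception {
        StringBuilder out = new StringBuilder();
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));

        Element root = doc.getDocumentElement();
        out.append("root=").append(root.getTagName()).append('\n');

        NodeList books = doc.getElementsByTagName("book");
        out.append("books=").append(books.getLength()).append('\n');

        // Read each price, add it to the total, then raise it by 10%.
        double total = 0;
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            Element price = (Element) book.getElementsByTagName("price").item(0);
            double value = Double.parseDouble(price.getTextContent());
            total += value;
            price.setTextContent(String.format(Locale.ROOT, "%.2f", value * 1.10));
        }
        out.append("total=").append(String.format(Locale.ROOT, "%.2f", total)).append('\n');

        // Build a new book and attach it under the root.
        Element added = doc.createElement("book");
        added.setAttribute("id", "b3");
        Element title = doc.createElement("title");
        title.setTextContent("The Pragmatic Programmer");
        added.appendChild(title);
        root.appendChild(added);
        out.append("books=").append(doc.getElementsByTagName("book").getLength()).append('\n');

        // Serialize just the first <price> element to a string.
        Transformer tr = TransformerFactory.newInstance().newTransformer();
        tr.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        Node firstPrice = doc.getElementsByTagName("price").item(0);
        tr.transform(new DOMSource(firstPrice), new StreamResult(writer));
        out.append(writer).append('\n');
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(run());
    }
}
```

Formatting with Locale.ROOT keeps the decimal point a period regardless of the machine's default locale, so the text written back into the price elements stays parseable by Double.parseDouble.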
What to take from the run:
- The root element prints as library because getDocumentElement() returns the single top node that everything else hangs off.
- getElementsByTagName("book") reports a count of 2 before the insert, confirming it collected both <book> descendants of the root.
- Prices are read with getTextContent() and parsed with Double.parseDouble, so 45.00 and 38.50 sum to the printed total of 83.50.
- After appendChild, the same getElementsByTagName("book") query returns 3, showing the live tree picked up the node created with doc.createElement.
- The serialized first price line reads 49.50, proving setTextContent mutated the in-memory node and the Transformer wrote the updated value (45.00 raised by 10%) back to XML.
Practice
In the DOM API, why must you call doc.createElement() before appendChild() to add a new node?