Java XML DOM Parser | W3Docs Learn Java

The DOM (Document Object Model) parser reads an entire XML document into memory and hands you a tree of nodes you can navigate, query, and modify. It ships with the JDK in javax.xml.parsers and org.w3c.dom, so there is nothing to add to your classpath. DOM is the right tool when documents are small enough to hold in memory and you need random access to any part of the tree — reading a config file, transforming a payload, or building XML programmatically.

This chapter walks through the full lifecycle: how DOM models a document, how to parse a source into a tree, how to read and modify nodes, and how to write the tree back out as XML. If you are new to XML in Java, start with the XML introduction; for huge documents where memory matters, compare DOM against the streaming SAX parser.

How DOM models a document

DOM turns markup into a tree of Node objects. Every element, attribute, piece of text, and comment is a node, and the whole document hangs off a single Document root. You read the tree by asking nodes for their children, and you change it by creating, moving, or removing nodes.

Concept	Interface	What it represents
Document	`Document`	The whole parsed file; entry point to the tree
Element	`Element`	A tag such as `<book>`, with attributes and children
Attribute	`Attr`	A name/value pair on an element
Text	`Text`	Character data inside an element
Node list	`NodeList`	An ordered, index-addressable collection of nodes
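
To make the table concrete, here is a minimal sketch that walks a tiny document and records which kind of node it meets at each level. The class name NodeKindsSketch and the helpers parse and describe are invented for illustration; only the org.w3c.dom and javax.xml.parsers calls come from the JDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class NodeKindsSketch {

    // Hypothetical helper: parses an in-memory string and wraps checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Appends one line per node: indentation, kind of node, and node name.
    static void describe(Node node, int depth, StringBuilder out) {
        String kind;
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE: kind = "Document"; break;
            case Node.ELEMENT_NODE:  kind = "Element";  break;
            case Node.TEXT_NODE:     kind = "Text";     break;
            case Node.COMMENT_NODE:  kind = "Comment";  break;
            default:                 kind = "Other";
        }
        out.append("  ".repeat(depth)).append(kind).append(' ')
           .append(node.getNodeName()).append('\n');

        // Attributes are not child nodes; they live in a separate NamedNodeMap,
        // which is null for every node type except Element.
        NamedNodeMap attrs = node.getAttributes();
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                out.append("  ".repeat(depth + 1)).append("Attr ")
                   .append(attrs.item(i).getNodeName()).append('\n');
            }
        }

        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            describe(children.item(i), depth + 1, out);
        }
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        describe(parse("<book id=\"b1\"><title>Effective Java</title></book>"), 0, out);
        System.out.print(out);
    }
}
```

For this input the walk visits a Document, then the book Element with its id Attr, then the title Element, and finally the Text node holding the title.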

The key trade-off: DOM is convenient because the whole tree is addressable, but it loads everything into memory at once. For multi-gigabyte feeds you would reach for SAX or StAX instead, which stream the document without building a tree. And if you are mapping XML to and from Java objects rather than walking raw nodes, JAXB is usually less code than hand-written DOM, though unlike DOM it has been a separate dependency since Java 11.
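
For contrast with the tree model, the following is a rough sketch of the streaming style using the JDK's SAX parser. It counts book elements as they go past and never builds a tree. The class name StreamingCountSketch and the method countBooks are illustrative names, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class StreamingCountSketch {

    // Counts <book> start tags as the parser streams through the input.
    static int countBooks(String xml) {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("book".equals(qName)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(countBooks("<library><book/><book/><book/></library>"));
    }
}
```

The handler only sees one event at a time, so memory use stays flat however large the input is. The price is that you cannot go back to an earlier element or modify anything.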

Parsing a document

You never construct a parser directly. You ask a DocumentBuilderFactory for a DocumentBuilder, then call parse on a stream, file, or URI. Configure the factory before building — namespace awareness and validation are factory-level switches.

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);

DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new File("library.xml"));

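// Merge adjacent text nodes and drop empty ones so each run of text is a single Text node.
doc.getDocumentElement().normalize();

parse throws SAXException for malformed XML and IOException if the source cannot be read, and newDocumentBuilder throws ParserConfigurationException when the factory settings cannot be satisfied. All three are checked exceptions you must handle or declare. Calling normalize() once after parsing merges adjacent text nodes and removes empty ones. That matters when you read text through getFirstChild() or walk getChildNodes(); getTextContent() already concatenates every descendant text node, so it is unaffected.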
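
One way to handle those exceptions, sketched under the assumption that a caller only wants to know whether the text was well-formed. The class SafeParseSketch and the method tryParse are hypothetical names chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class SafeParseSketch {

    // Returns the parsed document, or empty when the text is not well-formed XML.
    static Optional<Document> tryParse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return Optional.of(builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))));
        } catch (SAXException e) {
            // Malformed XML, for example an unclosed tag.
            return Optional.empty();
        } catch (IOException e) {
            // The source could not be read (unlikely for an in-memory stream).
            throw new UncheckedIOException(e);
        } catch (ParserConfigurationException e) {
            // The requested factory settings are not supported by this parser.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("<library><book/></library>").isPresent());
        System.out.println(tryParse("<library><book></library>").isPresent());
    }
}
```

Treating malformed input as an empty result is a design choice for this sketch. Code that needs the line and column of the error would catch SAXParseException and report it instead.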

Navigating the tree

Two methods cover most reading: getElementsByTagName finds all descendants with a given tag, and getChildNodes returns the direct children of a node. Remember that getChildNodes includes whitespace text nodes, so filter by node type when you only want elements.

Element root = doc.getDocumentElement();          // <library>
NodeList books = doc.getElementsByTagName("book"); // every <book> in the tree

for (int i = 0; i < books.getLength(); i++) {
    Element book = (Element) books.item(i);
    String id = book.getAttribute("id");           // attribute by name
    String title = book.getElementsByTagName("title")
                       .item(0).getTextContent();   // text of the first <title> descendant
    System.out.println(id + " -> " + title);
}

NodeList is index-based, not iterable, so you loop with getLength() and item(i). getAttribute returns an empty string (never null) when the attribute is absent, which is worth knowing before you write a null check that never fires.
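
The two caveats above (whitespace text nodes among the children, and the empty string for a missing attribute) can be handled along these lines. This is a sketch only; ChildElementsSketch, parse, and childElements are made-up names, and hasAttribute is the standard Element method for telling an absent attribute from an empty one.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class ChildElementsSketch {

    // Hypothetical helper: parses an in-memory string and wraps checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Copies only the direct element children into a List, skipping text and comments.
    static List<Element> childElements(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Element root = parse("<library>\n  <book id=\"b1\"/>\n  <book/>\n</library>")
                .getDocumentElement();
        // The raw child count includes the whitespace between the tags.
        System.out.println("raw children: " + root.getChildNodes().getLength());
        for (Element book : childElements(root)) {
            // hasAttribute separates "absent" from "present but empty".
            String id = book.hasAttribute("id") ? book.getAttribute("id") : "(no id)";
            System.out.println(id);
        }
    }
}
```

Here the root has five child nodes (three whitespace text nodes and two elements), while childElements returns just the two book elements. Copying into a List also gives you a for-each loop, which NodeList itself does not support.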

Modifying and creating nodes

The DOM tree is mutable. You change text with setTextContent, change attributes with setAttribute, and grow the tree by creating nodes through the Document factory methods and appending them. Nodes must be created by the same document they are inserted into; to bring in a node from another document, copy it with importNode or move it with adoptNode first.

// Update existing content.
Element price = (Element) book.getElementsByTagName("price").item(0);
price.setTextContent("49.50");
price.setAttribute("currency", "USD");

// Build a new subtree and attach it.
Element added = doc.createElement("book");
added.setAttribute("id", "b3");
Element title = doc.createElement("title");
title.setTextContent("The Pragmatic Programmer");
added.appendChild(title);
doc.getDocumentElement().appendChild(added);

createElement makes a detached node; nothing appears in the document until you appendChild it somewhere. To remove a node, call parent.removeChild(child).
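
The ownership rule and removeChild can be seen together in a small sketch that moves a book from one document to another. The names MoveBetweenDocumentsSketch, newDocument, libraryWithBook, and moveFirstBook are illustrative; importNode, removeChild, and DOMException.WRONG_DOCUMENT_ERR are standard DOM. The rejection shown in main is what the JDK's bundled parser does; other DOM implementations may behave differently.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class MoveBetweenDocumentsSketch {

    // Hypothetical helper: creates an empty document, wrapping the checked exception.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds <library><book id="..."/></library> from scratch.
    static Document libraryWithBook(String id) {
        Document doc = newDocument();
        Element root = doc.createElement("library");
        Element book = doc.createElement("book");
        book.setAttribute("id", id);
        root.appendChild(book);
        doc.appendChild(root);
        return doc;
    }

    // Copies the first <book> of source into target, then removes it from source.
    static void moveFirstBook(Document source, Document target) {
        Node book = source.getElementsByTagName("book").item(0);
        // importNode makes a deep copy owned by target; the original is untouched.
        Node copy = target.importNode(book, true);
        target.getDocumentElement().appendChild(copy);
        // removeChild detaches the original from its parent.
        book.getParentNode().removeChild(book);
    }

    public static void main(String[] args) {
        Document a = libraryWithBook("b1");
        Document b = libraryWithBook("b2");
        try {
            // Appending a node owned by another document is rejected.
            b.getDocumentElement().appendChild(a.getElementsByTagName("book").item(0));
        } catch (DOMException e) {
            System.out.println("wrong document: "
                    + (e.code == DOMException.WRONG_DOCUMENT_ERR));
        }
        moveFirstBook(a, b);
        System.out.println("books left in a: " + a.getElementsByTagName("book").getLength());
        System.out.println("books now in b: " + b.getElementsByTagName("book").getLength());
    }
}
```

Document.adoptNode would do the move in one step by changing the node's owner instead of copying it. The copy-then-remove version is shown here because it exercises both importNode and removeChild.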

Writing the tree back out

DOM has no toString() that produces XML. To serialize, hand the document to a Transformer with a DOMSource and a StreamResult. The same javax.xml.transform package lets you write to a file, a string, or any stream, and set pretty-printing options.

Transformer tr = TransformerFactory.newInstance().newTransformer();
tr.setOutputProperty(OutputKeys.INDENT, "yes");
tr.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");

tr.transform(new DOMSource(doc), new StreamResult(new File("out.xml")));

A security note on the parsing side: for untrusted input, harden the factory before calling parse. Disabling DOCTYPE declarations with factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true) blocks XXE (XML External Entity) attacks by rejecting any document that contains a DOCTYPE.
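
A hardened factory might be set up roughly as follows. The disallow-doctype-decl feature is the one named above; the secure-processing feature and the two extra setters are additional precautions not mentioned in the text, included here as an assumption about what a cautious setup looks like. HardenedFactorySketch, hardenedFactory, and accepts are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

class HardenedFactorySketch {

    static DocumentBuilderFactory hardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Reject any document that carries a DOCTYPE declaration.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Assumed extras, not from the text above: general processing limits,
        // no XInclude, and no entity reference expansion.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory;
    }

    // Returns true when the hardened parser accepts the document.
    static boolean accepts(String xml) {
        try {
            hardenedFactory().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // Includes the fatal error raised for a forbidden DOCTYPE.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("<library/>"));
        System.out.println(accepts("<!DOCTYPE a [<!ENTITY x \"y\">]><a>&x;</a>"));
    }
}
```

The feature URI is specific to the Xerces-derived parser that ships with the JDK. A different parser on the classpath may not recognise it, in which case setFeature throws ParserConfigurationException rather than silently ignoring the request.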

A complete worked example

This program parses an in-memory library document, reads each book, raises every price by 10%, inserts a new book, serializes the tree back to XML, and prints the first updated price line. It exercises the full read-modify-write cycle on a single tree.

import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.*;
import java.io.*;

public class Main {
    public static void main(String[] args) throws Exception {
        String xml = """
            <?xml version="1.0" encoding="UTF-8"?>
            <library>
              <book id="b1" lang="en">
                <title>Effective Java</title>
                <price currency="USD">45.00</price>
              </book>
              <book id="b2" lang="en">
                <title>Clean Code</title>
                <price currency="USD">38.50</price>
              </book>
            </library>
            """;

        // 1. Build a parser and load the whole document into memory.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        doc.getDocumentElement().normalize();

        // 2. Inspect the root element.
        Element root = doc.getDocumentElement();
        System.out.println("Root element: " + root.getNodeName());

        // 3. Walk every <book> node and read children + attributes.
        NodeList books = doc.getElementsByTagName("book");
        System.out.println("Book count: " + books.getLength());
        double total = 0;
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            String id = book.getAttribute("id");
            String title = book.getElementsByTagName("title")
                               .item(0).getTextContent();
            Element priceEl = (Element) book.getElementsByTagName("price").item(0);
            double price = Double.parseDouble(priceEl.getTextContent());
            String currency = priceEl.getAttribute("currency");
            total += price;
            System.out.printf("  %s: %s (%.2f %s)%n", id, title, price, currency);
        }
        System.out.printf("Total catalogue value: %.2f%n", total);

        // 4. Mutate the tree: bump every price by 10% and add a <book>.
        for (int i = 0; i < books.getLength(); i++) {
            Element price = (Element) ((Element) books.item(i))
                    .getElementsByTagName("price").item(0);
            double p = Double.parseDouble(price.getTextContent()) * 1.10;
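            // Locale.ROOT keeps the decimal point a '.' regardless of the default locale.
            price.setTextContent(String.format(java.util.Locale.ROOT, "%.2f", p));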
        }
        Element added = doc.createElement("book");
        added.setAttribute("id", "b3");
        Element t = doc.createElement("title");
        t.setTextContent("The Pragmatic Programmer");
        added.appendChild(t);
        root.appendChild(added);
        System.out.println("Books after insert: "
                + doc.getElementsByTagName("book").getLength());

        // 5. Serialize the modified DOM back to XML text.
        Transformer tr = TransformerFactory.newInstance().newTransformer();
        tr.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        tr.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter sw = new StringWriter();
        tr.transform(new DOMSource(doc), new StreamResult(sw));
        String first = sw.toString().lines()
                .filter(l -> l.contains("price"))
                .findFirst().orElse("");
        System.out.println("First price line after update: " + first.strip());
    }
}

What to take from the run:

The root element prints as library because getDocumentElement() returns the single top node that everything else hangs off.
getElementsByTagName("book") reports a count of 2 before the insert, confirming it collected both <book> descendants of the root.
Prices are read with getTextContent() and parsed with Double.parseDouble, so 45.00 and 38.50 sum to the printed total of 83.50.
After appendChild, the same getElementsByTagName("book") query returns 3, showing the live tree picked up the node created with doc.createElement.
The serialized first price line reads <price currency="USD">49.50</price>, showing that setTextContent mutated the in-memory node and the Transformer wrote the updated value (45.00 raised by 10%) back to XML.

Practice

In the DOM API, why must you call doc.createElement() before appendChild() to add a new node?

Because a node must be created by the owning 'Document' so it belongs to that tree before it can be attached
Because appendChild() only accepts strings, not Element objects
Because createElement() immediately inserts the node at the end of the document
Because DOM trees are immutable and createElement() returns a fresh copy of the whole document