Java XML Processing Overview | W3Docs Learn Java

XML (Extensible Markup Language) is a text format for representing structured, hierarchical data using nested tags. Long before JSON dominated web APIs, XML was the default for configuration files, document formats, and message exchange — and it is still everywhere, from Maven pom.xml files to SOAP services to Office documents.

Java has rich, built-in XML support in the JDK: you do not need any external library to read or write XML. The javax.xml.parsers and org.w3c.dom packages, plus org.xml.sax and javax.xml.stream, give you three distinct parsing models. This chapter maps out what XML is, which parsing model to reach for, and how XML compares to JSON — so the rest of this part (DOM, SAX, and JAXB) builds on a clear foundation.

This page covers:

What an XML document looks like and the terms you need (element, attribute, root, well-formed).
The three JDK parsing models — DOM, SAX, and StAX — and when each fits.
A runnable DOM example using only JDK classes.
How XML and JSON differ, so you can choose between them.

What XML looks like

An XML document is a tree of elements. Each element has a name, optional attributes, and either text content or nested child elements. There is always exactly one root element wrapping everything.

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <book id="1" lang="en">
    <title>Effective Java</title>
    <price>45.00</price>
  </book>
</catalog>

Here <catalog> is the root, <book> is a child element with two attributes (id and lang), and <title> and <price> carry text. The XML declaration on the first line states the version and character encoding. Well-formed XML requires every opening tag to be closed and properly nested.
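To see what "well-formed" means in practice, you can hand a few strings to the JDK parser and watch which ones it rejects. The following is a minimal sketch, not a validation tool: the class name WellFormedCheck and the helper isWellFormed are made up for illustration, and the inputs are trusted string literals (attribute values use single quotes, which XML allows, to avoid escaping inside Java strings).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, used only for this illustration.
class WellFormedCheck {

    // Returns true if the JDK parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser hit a well-formedness error
        }
    }

    public static void main(String[] args) throws Exception {
        // One root, every tag closed and properly nested.
        System.out.println(isWellFormed(
            "<catalog><book id='1'><title>Effective Java</title></book></catalog>"));
        // <title> is closed after its parent <book>: improperly nested.
        System.out.println(isWellFormed(
            "<catalog><book><title>Effective Java</book></title></catalog>"));
        // Two top-level elements: more than one root.
        System.out.println(isWellFormed("<book/><book/>"));
    }
}
```

With the JDK's built-in parser, the first call should print true and the other two false, since crossing tags and a second root element both violate the rules described above.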

The three parsing models

The JDK offers three ways to read XML, each with a different trade-off between convenience and memory. Which one you pick determines how much memory parsing needs and how your code is structured, so it is worth deciding before you write anything else.

Model	Style	Memory	Best for
DOM	Loads the whole tree into memory	High	Random access, editing, small/medium docs
SAX	Push events as it scans (callbacks)	Low	Large docs, read-only streaming
StAX	Pull events on demand (cursor)	Low	Large docs, with simpler control flow

DOM builds a complete in-memory tree you can navigate freely and modify. SAX fires callbacks (startElement, characters, endElement) as it reads, never holding the whole document. StAX is also streaming but lets your code pull the next event when ready, which is usually easier to follow than SAX callbacks.

DOM: the in-memory tree

DOM is the most convenient model when documents are small enough to fit in memory. You parse once, then walk or query the tree as often as you like.

import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import java.io.File;

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// parse(String) expects a URI, so pass a File when reading from a local path.
Document doc = factory.newDocumentBuilder().parse(new File("catalog.xml"));

// Every <book> element in document order, at any depth.
NodeList books = doc.getElementsByTagName("book");
System.out.println("Books: " + books.getLength());

getElementsByTagName returns a live NodeList; you index into it and cast nodes to Element to read attributes and child text. The dedicated Java XML DOM parser chapter walks through navigating, modifying, and writing the tree in full.
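"Live" is easy to read past, so here is a small sketch of what it means: the same NodeList object reflects later changes to the tree. The class name LiveNodeListDemo, the helper names, and the inline two-book catalog are assumptions made for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, used only for this illustration.
class LiveNodeListDemo {

    // Trusted literal input; single-quoted attributes avoid escaping.
    static final String XML =
        "<catalog>"
        + "<book id='1'><title>Effective Java</title></book>"
        + "<book id='2'><title>Clean Code</title></book>"
        + "</catalog>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Removes the first <book> and returns the list length before and after.
    static int[] lengthsAroundRemoval(Document doc) {
        NodeList books = doc.getElementsByTagName("book");
        int before = books.getLength();

        // Nodes come back as org.w3c.dom.Node; cast to Element to use element APIs.
        Element first = (Element) books.item(0);
        first.getParentNode().removeChild(first);

        // Same NodeList object, queried again after the tree changed.
        int after = books.getLength();
        return new int[] { before, after };
    }

    public static void main(String[] args) throws Exception {
        int[] lengths = lengthsAroundRemoval(parse(XML));
        System.out.println("Before: " + lengths[0] + ", after: " + lengths[1]);
    }
}
```

The list shrinks from 2 to 1 without calling getElementsByTagName again. That is also why removing nodes inside a forward index loop over a live list tends to skip elements; iterating backwards or copying the nodes into a List first avoids the problem.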

Warning

By default the JDK parser resolves external entities, which exposes you to XXE (XML External Entity) attacks when parsing untrusted input. For production code that reads XML from outside your control, disable DTDs with factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true); before creating the builder.
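A hardened factory might be set up roughly as follows. This is a sketch under assumptions: it targets the JDK's built-in parser (a third-party parser on the classpath may not recognise the same feature URI), the class name SecureDomFactory and its helpers are made up, and the settings beyond disallow-doctype-decl are common extra precautions rather than something this page requires.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, used only for this illustration.
class SecureDomFactory {

    static DocumentBuilder newHardenedBuilder() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Reject any document that contains a DOCTYPE; without DTDs there are
        // no external entities to resolve.
        factory.setFeature(
            "http://apache.org/xml/features/disallow-doctype-decl", true);
        // Extra precautions (assumed here, not required by the text above).
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);

        DocumentBuilder builder = factory.newDocumentBuilder();
        // Rethrow fatal errors instead of printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        return builder;
    }

    // Returns true if the hardened parser accepts the document.
    static boolean accepts(String xml) throws Exception {
        try {
            newHardenedBuilder().parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String plain = "<catalog/>";
        // A typical XXE payload: an external entity pointing at a local file.
        String xxe = "<!DOCTYPE catalog [<!ENTITY x SYSTEM 'file:///etc/hostname'>]>"
                   + "<catalog>&x;</catalog>";
        System.out.println("plain accepted: " + accepts(plain));
        System.out.println("xxe accepted:   " + accepts(xxe));
    }
}
```

The plain document parses, while the one carrying a DOCTYPE is rejected with a SAXException before any entity is resolved. The trade-off is that legitimate documents that rely on a DTD are rejected too.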

SAX and StAX: streaming

When a document is too large to hold in memory, you stream it. SAX pushes events to a handler you supply:

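import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.Attributes;
import java.io.File;

DefaultHandler handler = new DefaultHandler() {
    // Called once for each opening tag; name is the qualified element name.
    @Override
    public void startElement(String uri, String local, String name, Attributes a) {
        System.out.println("Start: " + name);
    }
};
// parse(String, ...) expects a URI, so pass a File for a local path.
SAXParserFactory.newInstance().newSAXParser()
    .parse(new File("catalog.xml"), handler);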
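Printing tag names only uses one callback. Because SAX never gives you the tree, reading element text means tracking state across startElement, characters, and endElement yourself. The sketch below collects the text of each title element from an in-memory string; the class name SaxTitleCollector and the helper titlesOf are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, used only for this illustration.
class SaxTitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean inTitle;

    @Override
    public void startElement(String uri, String local, String name, Attributes a) {
        if ("title".equals(name)) {
            inTitle = true;
            text.setLength(0); // start a fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver one text node in several chunks, so append.
        if (inTitle) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String local, String name) {
        if ("title".equals(name)) {
            titles.add(text.toString().trim());
            inTitle = false;
        }
    }

    // Parses the given XML text and returns the titles in document order.
    static List<String> titlesOf(String xml) throws Exception {
        SaxTitleCollector handler = new SaxTitleCollector();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog>"
            + "<book id='1'><title>Effective Java</title></book>"
            + "<book id='2'><title>Clean Code</title></book>"
            + "</catalog>";
        System.out.println(titlesOf(xml)); // [Effective Java, Clean Code]
    }
}
```

The inTitle flag and the shared buffer are the bookkeeping that the pull-based model described next lets you avoid.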

StAX gives you a cursor you advance yourself, which many find clearer:

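import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamConstants;
import java.io.FileInputStream;
import java.io.InputStream;

// Pass a byte stream, not a Reader, so the parser honors the encoding
// named in the XML declaration instead of the platform default.
try (InputStream in = new FileInputStream("catalog.xml")) {
    XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
    while (r.hasNext()) {
        if (r.next() == XMLStreamConstants.START_ELEMENT) {
            System.out.println("Start: " + r.getLocalName());
        }
    }
    r.close(); // frees parser resources; does not close the underlying stream
}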
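The cursor can do more than report element names. As a rough sketch, the loop below reads the id attribute of each book and the text of its price element, using getAttributeValue and getElementText on the same XMLStreamReader. The class name StaxPriceReader, the helper pricesById, and the inline catalog are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, used only for this illustration.
class StaxPriceReader {

    // Returns book id -> price, in document order.
    static Map<String, Double> pricesById(String xml) throws Exception {
        Map<String, Double> prices = new LinkedHashMap<>();
        XMLStreamReader r = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
        String currentId = null;
        try {
            while (r.hasNext()) {
                if (r.next() != XMLStreamConstants.START_ELEMENT) {
                    continue; // only opening tags matter here
                }
                if ("book".equals(r.getLocalName())) {
                    // null namespace: match the attribute by local name only.
                    currentId = r.getAttributeValue(null, "id");
                } else if ("price".equals(r.getLocalName())) {
                    // getElementText reads up to the matching end tag.
                    prices.put(currentId, Double.parseDouble(r.getElementText()));
                }
            }
        } finally {
            r.close();
        }
        return prices;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog>"
            + "<book id='1'><title>Effective Java</title><price>45.00</price></book>"
            + "<book id='2'><title>Clean Code</title><price>38.50</price></book>"
            + "</catalog>";
        System.out.println(pricesById(xml)); // {1=45.0, 2=38.5}
    }
}
```

The state lives in ordinary local variables inside one loop rather than in handler fields, which is the control-flow advantage pull parsing has over callbacks.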

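See the Java XML SAX parser chapter for a complete event-handler walkthrough. If you would rather skip parsing nodes by hand and map XML straight onto Java objects, JAXB binds elements to annotated classes for you. Note that JAXB was removed from the JDK in Java 11, so on current versions you add it as a separate dependency.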

A self-contained example

The runnable example below uses only JDK classes — no Jackson or JAXB needed. It parses an XML catalog from an in-memory string with DOM, walks the <book> elements, reads attributes and child text, and sums the prices.


import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class XmlIntroDemo {
    public static void main(String[] args) throws Exception {
        String xml = """
            <?xml version="1.0" encoding="UTF-8"?>
            <catalog>
              <book id="1" lang="en">
                <title>Effective Java</title>
                <price>45.00</price>
              </book>
              <book id="2" lang="en">
                <title>Clean Code</title>
                <price>38.50</price>
              </book>
            </catalog>
            """;

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        doc.getDocumentElement().normalize();

        Element root = doc.getDocumentElement();
        System.out.println("Root element: " + root.getNodeName());

        NodeList books = doc.getElementsByTagName("book");
        System.out.println("Books found: " + books.getLength());

        double total = 0;
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            String id = book.getAttribute("id");
            String title = book.getElementsByTagName("title")
                               .item(0).getTextContent();
            double price = Double.parseDouble(
                book.getElementsByTagName("price").item(0).getTextContent());
            total += price;
            System.out.printf("  #%s %s ($%.2f)%n", id, title, price);
        }
        System.out.printf("Total catalog value: $%.2f%n", total);
    }
}

What to take from the run:

DOM parsing needs no external dependency — DocumentBuilderFactory and org.w3c.dom ship with the JDK, which is why the program prints results with nothing on the classpath.
The root element name printed as catalog confirms there is exactly one root wrapping the whole document.
getElementsByTagName("book") returned a NodeList of length 2, so you index it like a list and cast each item to Element.
Attributes (id) are read with getAttribute, while text content (title, price) is read with getTextContent — they are different kinds of data on the same element.
Because the entire tree is in memory, summing prices across all books to $83.50 is just a loop with random access — the convenience that makes DOM worth its memory cost.

XML or JSON?

XML and JSON solve the same problem — exchanging structured data — but make different trade-offs.

Aspect	XML	JSON
Syntax	Verbose tags, opening and closing	Compact braces and brackets
Attributes	Yes (`id="1"`)	No — everything is a key/value pair
Comments	Supported (`<!-- ... -->`)	Not supported
Schema/validation	Mature (XSD, DTD)	JSON Schema, less ubiquitous
JDK support	Built in (`javax.xml.*`)	None built in — needs a library
Typical use today	Config, documents, SOAP, legacy systems	Web/REST APIs, modern services

Reach for XML when you must consume an existing XML format (a SOAP service, a pom.xml, an Office document) or when you need attributes, comments, or strict schema validation. Reach for JSON for new web APIs, where its smaller size and native browser support win.

Next steps

Java XML DOM parser — read, modify, and write the tree.
Java XML SAX parser — stream large documents with event handlers.
JAXB — map XML to and from annotated Java objects.
Java JSON introduction — the modern alternative for web data.

Practice

Which XML parsing model loads the entire document into memory as a navigable tree?

DOM, which builds a complete in-memory tree you can navigate and modify
SAX, which pushes events to callback handlers
StAX, which exposes a pull-based cursor
None of them keep the document in memory