Everyone says JSON won. But I still use XML regularly - for document processing, enterprise integrations, and anywhere I need real validation. Here's when and why.
The project that made me appreciate XML
I used to roll my eyes at XML. Then I worked on a healthcare integration project where data validation was legally required. JSON Schema couldn't cut it. XSD saved us weeks of manual validation code.
Now I see XML and JSON as tools for different jobs. JSON for APIs and lightweight data. XML when you need schema validation, document structure, or you're integrating with systems that have used XML for decades. Knowing both makes you more versatile.
XML vs JSON: When to Choose Which
The choice between XML and JSON is not about which is better—each format serves different purposes. Understanding their strengths helps you make informed architectural decisions.
- Choose JSON for: REST APIs, JavaScript applications, simple data structures, mobile apps, and when minimizing payload size matters
- Choose XML for: Document-centric data, strict schema validation, mixed content (text with markup), namespaces for vocabulary mixing, and regulated industries requiring formal schemas
- XML separates attributes (metadata about an element) from element content; JSON has only keys and values, so that distinction has to be made by convention
- XML namespaces allow mixing vocabularies; JSON has no native equivalent
- XML supports document validation via XSD, DTD, or RELAX NG; JSON Schema is less mature and less widely adopted
- XML can preserve whitespace inside content when needed (xml:space="preserve"); in JSON, whitespace is significant only inside string values
- XML comments are part of the format; JSON does not support comments
- Both support Unicode and can represent hierarchical data structures
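To make the attribute and mixed-content points concrete, here is a small sketch using Python's standard xml.etree.ElementTree. The element and attribute names (instruction, dose, unit) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet: text interleaved with markup, plus attributes as metadata.
SAMPLE = (
    '<instruction lang="en">Take <dose unit="mg">200</dose> '
    "twice <em>daily</em>.</instruction>"
)

root = ET.fromstring(SAMPLE)

# Attributes describe the element rather than carrying its content.
language = root.get("lang")

# ElementTree stores mixed content as .text (before the first child)
# and .tail (the text that follows each child element).
pieces = [root.text] + [child.tail for child in root]

# itertext() walks all text in document order, giving the plain reading.
plain_text = "".join(root.itertext())

print(language)
print(pieces)
print(plain_text)
```

Representing the same sentence in JSON would need an invented convention, such as an array that alternates strings and objects, which is why document-centric data tends to stay in XML.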
Understanding XML Namespaces
Namespaces solve the problem of element name collisions when combining XML vocabularies. They use URIs (usually URLs) as unique identifiers, though the URI does not need to point to an actual resource.
- Default namespace: xmlns="http://example.com/ns" applies to unprefixed elements
- Prefixed namespace: xmlns:prefix="http://example.com/ns" requires explicit prefix:element syntax
- Namespace scope: Declaration applies to the element and all descendants unless overridden
- Namespace URIs are compared as plain strings: parsers never fetch them, and two URIs that differ only in case or a trailing slash name different namespaces
- Common namespaces: XHTML (http://www.w3.org/1999/xhtml), SOAP, XLink, and SVG
- Namespace-aware parsers distinguish elements by (namespace, local-name) pairs
- Attributes can have namespaces too, though unqualified attributes belong to no namespace
- Multiple namespaces can coexist in a single document using different prefixes
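A short sketch of how a namespace-aware parser sees these rules, using Python's standard xml.etree.ElementTree. The two vocabularies and their example.com URIs are invented for the example; both define a title element.

```python
import xml.etree.ElementTree as ET

# Two made-up vocabularies that both define <title>.
DOC = """
<book xmlns="http://example.com/ns/book"
      xmlns:p="http://example.com/ns/person">
  <title>XML in Practice</title>
  <p:author>
    <p:title>Dr.</p:title>
    <p:name>A. Writer</p:name>
  </p:author>
</book>
"""

root = ET.fromstring(DOC)

# ElementTree expands every tag to "{namespace-uri}local-name",
# which is the (namespace, local-name) pair in string form.
tags = [el.tag for el in root.iter()]

# Prefixes used in queries are local aliases; only the URIs must match
# the document. Note that "person" here stands in for the document's "p".
ns = {"b": "http://example.com/ns/book", "person": "http://example.com/ns/person"}
book_title = root.find("b:title", ns).text
honorific = root.find("person:author/person:title", ns).text

print(tags)
print(book_title, "/", honorific)
```

The two title elements never collide because their expanded names differ, even though the local name is identical.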
XML Validation with XSD (XML Schema Definition)
XSD provides powerful validation capabilities that go far beyond what JSON Schema offers. It defines the structure, data types, and constraints for XML documents with precision.
- Element types: Define complex types with sequences, choices, or all groups
- Data types: XSD 1.0 defines 44 built-in types (string, integer, decimal, date, boolean, etc.)
- Constraints: minOccurs, maxOccurs, minLength, maxLength, pattern (regex), enumeration
- Custom types: Extend or restrict base types to create domain-specific types
- Namespace support: XSD validates namespaced documents correctly
- Identity constraints: unique and key enforce uniqueness; keyref enforces referential integrity against a key
- Substitution groups: Allow polymorphic element substitution
- Annotations: Document your schema with appinfo and documentation elements
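As a rough illustration of several of these features together (a restricted custom type, a built-in date type, and occurrence constraints), the sketch below validates two documents with lxml. This assumes the third-party lxml package, since Python's standard library has no XSD validator. The patient schema is hypothetical and not taken from any real healthcare standard.

```python
try:
    from lxml import etree  # third-party dependency: pip install lxml
except ImportError:
    etree = None  # lets the sketch load without lxml; validation is then skipped

# Hypothetical schema: a <patient> with a pattern-restricted id,
# a real calendar date, and between one and three phone numbers.
XSD = b"""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="PatientId">
    <xs:restriction base="xs:string">
      <xs:pattern value="P[0-9]{6}"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="patient">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="birthDate" type="xs:date"/>
        <xs:element name="phone" type="xs:string" minOccurs="1" maxOccurs="3"/>
      </xs:sequence>
      <xs:attribute name="id" type="PatientId" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def validate(xml_bytes):
    """Return (is_valid, messages); is_valid is None when lxml is unavailable."""
    if etree is None:
        return None, ["lxml not installed"]
    schema = etree.XMLSchema(etree.fromstring(XSD))
    doc = etree.fromstring(xml_bytes)
    ok = schema.validate(doc)
    return ok, [error.message for error in schema.error_log]


good = b'<patient id="P123456"><birthDate>1980-02-29</birthDate><phone>555-0100</phone></patient>'
# Bad id pattern, an impossible date, and a missing <phone>.
bad = b'<patient id="123"><birthDate>1980-02-30</birthDate></patient>'

print(validate(good))
print(validate(bad))
```

The point of the sketch is that none of the checks live in application code: the pattern, the date check, and the cardinality rule are all declared in the schema.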
Validation Alternatives: DTD and RELAX NG
XSD is not the only validation option. DTD (Document Type Definition) and RELAX NG offer alternatives with different trade-offs.
- DTD: Original XML validation method; simpler but limited (no namespaces, few data types)
- DTD is still used for entity declarations and basic structure validation
- RELAX NG: Modern alternative with simpler syntax than XSD and full namespace support
- RELAX NG Compact: An even more readable non-XML syntax for RELAX NG schemas
- Schematron: Rule-based validation for business logic constraints XSD cannot express
- Many projects combine schemas: XSD for structure, Schematron for business rules
- For new projects, prefer XSD for widespread tool support or RELAX NG for simplicity
- Legacy systems often use DTDs; migration to XSD is possible but may not be worth the effort
DOM Parsing: Full Document Access
DOM (Document Object Model) parsing loads the entire XML document into memory as a tree structure. This provides random access to any element but requires memory proportional to document size.
- Loads complete document into memory as navigable tree
- Allows random access: jump to any element, traverse up/down/sideways
- Supports document modification: add, remove, and change elements
- XPath queries work on DOM trees for powerful element selection
- Memory usage: typically several times the file size (5-10x is a common rule of thumb; the real figure depends on the parser and the document's shape)
- Best for: Small to medium documents (under 10MB), documents needing modification, complex queries
- Available in all major languages: JavaScript (DOMParser), Python (xml.dom.minidom), Java (javax.xml.parsers)
- Modern browsers provide native DOM parsing for XML just like HTML
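Here is a minimal sketch of the random-access and modification points using Python's standard xml.dom.minidom. The orders document is invented for the example.

```python
from xml.dom import minidom

DOC = (
    "<orders>"
    "<order id='1'><total>10.00</total></order>"
    "<order id='2'><total>25.50</total></order>"
    "</orders>"
)

dom = minidom.parseString(DOC)

# Random access: jump straight to the second order, then move around.
second = dom.getElementsByTagName("order")[1]
first = second.previousSibling            # sideways
parent_name = second.parentNode.tagName   # up

# Modification: add an attribute and append a new child element.
second.setAttribute("status", "shipped")
note = dom.createElement("note")
note.appendChild(dom.createTextNode("gift wrap"))
second.appendChild(note)

# Serialize the modified tree back to a string.
result = dom.documentElement.toxml()
print(result)
```

Everything here relies on the whole tree being in memory, which is exactly what makes DOM convenient for small documents and costly for large ones.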
SAX Parsing: Memory-Efficient Streaming
SAX (Simple API for XML) is an event-based parser that reads XML sequentially without building a tree. It calls handler functions when encountering elements, attributes, and text. This uses minimal memory but only allows forward-only reading.
- Event-driven: Your code receives callbacks for startElement, endElement, characters, etc.
- Constant memory: Processes gigabyte files with minimal RAM
- Forward-only: Cannot go back or look ahead in the document
- No modification: Read-only; you cannot change the document
- Best for: Large files, data extraction tasks, streaming scenarios
- Requires managing state: You track context (current element path) manually
- Pull parsers (StAX in Java; xml.dom.pulldom or ElementTree's iterparse in Python) offer similar efficiency with an iterator-style API
- For validation during SAX parsing, use a validating parser with schema attached
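The callback style and the manual state tracking look roughly like this with Python's standard xml.sax. The handler name and the orders document are made up for the example; the handler sums every total that sits directly inside an order.

```python
import xml.sax


class TotalHandler(xml.sax.ContentHandler):
    """Sums <order>/<total> values while tracking the element path by hand."""

    def __init__(self):
        super().__init__()
        self.path = []       # stack of currently open element names
        self.buffer = []     # text chunks of the <total> being read
        self.total = 0.0
        self.order_ids = []

    def in_total(self):
        return self.path[-2:] == ["order", "total"]

    def startElement(self, name, attrs):
        self.path.append(name)
        if name == "order":
            self.order_ids.append(attrs.get("id"))
        elif self.in_total():
            self.buffer = []

    def characters(self, content):
        # The parser may deliver one text node in several chunks, so accumulate.
        if self.in_total():
            self.buffer.append(content)

    def endElement(self, name):
        if self.in_total():
            self.total += float("".join(self.buffer))
        self.path.pop()


DOC = (
    b"<orders>"
    b"<order id='1'><total>10.00</total></order>"
    b"<order id='2'><total>25.50</total></order>"
    b"</orders>"
)

handler = TotalHandler()
xml.sax.parseString(DOC, handler)
print(handler.order_ids, handler.total)
```

The path stack is the state you have to manage yourself; the parser never holds more than the current event.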
Choosing Between DOM and SAX
The choice between DOM and SAX depends on your use case. Modern applications often use hybrid approaches, processing large documents in chunks while using DOM for complex subsections.
- File size under 10MB with complex queries? Use DOM
- File size over 100MB? SAX or streaming parser is essential
- Need to modify the document? DOM (or load section, modify, stream out)
- Extracting specific data from large files? SAX with state machine
- Web services with medium payloads? DOM is usually fine
- Processing log files or data feeds? SAX for efficiency
- XPath queries required? DOM or streaming XPath libraries
- Memory-constrained environment? SAX or pull parser
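One way to sketch the hybrid approach in Python is ElementTree's iterparse: stream through the file, but hand each completed record to the caller as a small in-memory subtree. The helper name iter_records and the feed layout are assumptions for the example.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, tag):
    """Yield each completed <tag> subtree, then empty it to limit memory growth."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem      # a small tree for just this record
            # Drop the record's children and text once the caller is done.
            # An empty shell stays attached to the parent, which is cheap.
            elem.clear()


# Synthetic feed standing in for a large file on disk.
FEED = b"<feed>" + b"".join(
    b"<entry id='%d'><title>Item %d</title>"
    b"<tags><tag>a</tag><tag>b</tag></tags></entry>" % (i, i)
    for i in range(1000)
) + b"</feed>"

titles = []
tag_count = 0
for entry in iter_records(io.BytesIO(FEED), "entry"):
    # Within one record, tree-style path queries are available.
    titles.append(entry.findtext("title"))
    tag_count += len(entry.findall("tags/tag"))

print(len(titles), titles[0], tag_count)
```

In real use the source would be a file path or an open file object rather than a BytesIO buffer.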
Modern XML Processing Libraries
Modern XML libraries offer better APIs than the standard DOM and SAX implementations. They provide cleaner syntax, better error messages, and often better performance.
- Python: lxml (C-based, fast, XPath 1.0 and XSLT 1.0 via libxml2/libxslt), ElementTree (stdlib, simple API)
- Java: JAXB (binding to objects), StAX (pull parser), Jackson XML (JSON-like API)
- JavaScript: fast-xml-parser (fast, configurable), xml2js (promises, simple), libxmljs (native bindings)
- .NET: LINQ to XML (XDocument and XElement, modern query API), XmlReader (streaming)
- Go: encoding/xml (stdlib), etree (ElementTree-like API)
- Rust: quick-xml (fast streaming), serde-xml-rs (serde integration)
- For web browsers: DOMParser and XMLSerializer are built-in and efficient
- Consider schema-driven code generation (JAXB, xjc) for strongly-typed access to known schemas
Where XML Still Dominates
Despite JSON's popularity, XML remains the standard format in many domains. Knowing these areas helps you understand where XML expertise is valuable.
- Enterprise integration: SOAP services, EDI, B2B communications
- Document formats: OOXML (Office), ODF (LibreOffice), EPUB, DocBook
- Configuration: Maven pom.xml, Ant, Spring XML config, Android layouts
- Graphics: SVG (scalable vector graphics) used everywhere on the web
- Feeds: RSS and Atom for content syndication
- Healthcare: HL7 CDA, FHIR (which supports XML alongside JSON), DICOM metadata
- Finance: FpML, XBRL for regulatory reporting, ISO 20022 messaging
- Government and legal: Often mandated by regulation for data exchange
Best Practices for XML in 2026
Whether maintaining legacy systems or choosing XML for new projects, following best practices ensures maintainability and interoperability.
- Always use a schema (XSD preferred) for production XML formats
- Validate early: Validate input at system boundaries before processing
- Namespace everything: Even if you control all vocabularies, namespaces prevent future conflicts
- Use meaningful element names: Self-documenting XML reduces need for external documentation
- Prefer elements over attributes for data; use attributes for metadata
- Handle encoding correctly: Declare encoding in XML declaration, prefer UTF-8
- Pretty-print for human consumption, minify for transmission if size matters
- Use XML tools for XML: Regex parsing XML is fragile and error-prone
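A few of these practices (declared UTF-8 encoding, compact versus pretty-printed output, and parsing with a real parser instead of regexes) can be sketched with Python's standard xml.etree.ElementTree. The patient element is invented, and ET.indent assumes Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

root = ET.Element("patient", {"id": "P123456"})
ET.SubElement(root, "name").text = "Zoë Müller"

# Compact form for transmission: UTF-8 bytes with an explicit XML declaration.
compact = ET.tostring(root, encoding="utf-8", xml_declaration=True)

# Indented form for human readers (ET.indent modifies the tree in place).
ET.indent(root, space="  ")
pretty = ET.tostring(root, encoding="unicode")

# Round trip with a real parser, which honors the declared encoding.
parsed = ET.fromstring(compact)

print(compact)
print(pretty)
print(parsed.findtext("name"))
```

The compact form is serialized before indenting because the indentation is stored in the tree as whitespace text.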
Conclusion
XML is not dead—it is specialized. While JSON handles most API and configuration needs today, XML remains essential for document processing, strict validation, enterprise integration, and regulated industries.
Understanding XML deeply—namespaces, validation with XSD, and efficient parsing strategies—makes you effective in the many domains where XML is the standard. The choice between DOM and SAX parsing depends on your specific use case: random access and modification favor DOM, while large files and streaming scenarios demand SAX or pull parsers.
FAQ
Is XML still used in modern development?
Absolutely. Enterprise systems, Office documents, SVG graphics, RSS feeds, healthcare data, financial reporting. It's everywhere once you look. JSON didn't replace XML - they serve different purposes.
What is the main difference between XML and JSON?
XML is for documents with validation, namespaces, and mixed content. JSON is for simple data structures. I use JSON for APIs, XML when I need schema validation or I'm working with document-centric data.
What are XML namespaces and why do they matter?
They prevent name collisions when combining XML vocabularies. Without them, 'title' in one format would conflict with 'title' in another. Essential for large integrations.
When should I use DOM vs SAX parsing?
DOM loads everything into memory - fine for small files. SAX streams through with constant memory - necessary for large files. My rule: under 10MB use DOM, over 100MB use SAX.
What is XSD and why should I use it?
XSD defines the structure and data types for XML documents. It catches invalid data automatically. I use it whenever data correctness matters - it's like TypeScript for XML.
Can I use XML and JSON together?
All the time. Accept JSON from web clients, convert to XML for enterprise backends, convert back. Every language has libraries for this. Pick the right format for each context.
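As a rough sketch of the JSON-to-XML direction using only Python's standard library: the mapping below (objects become child elements, arrays become repeated item elements) is one made-up convention among many, and it assumes every JSON key is already a valid XML element name.

```python
import json
import xml.etree.ElementTree as ET


def dict_to_xml(tag, value):
    """Convert a parsed JSON value into an Element using a simple convention."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(dict_to_xml(key, child))   # object -> child elements
    elif isinstance(value, list):
        for child in value:
            elem.append(dict_to_xml("item", child))  # array -> repeated <item>
    elif isinstance(value, bool):
        elem.text = "true" if value else "false"   # XML-style booleans
    elif value is not None:
        elem.text = str(value)
    return elem


payload = json.loads(
    '{"id": 42, "customer": {"name": "Ada"}, "lines": [{"sku": "A1"}, {"sku": "B2"}]}'
)
xml_text = ET.tostring(dict_to_xml("order", payload), encoding="unicode")
print(xml_text)
```

A real integration would usually target a specific schema, so the mapping is dictated by the backend's XSD rather than by a generic rule like this one.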
What XML library should I use in JavaScript?
Browser: built-in DOMParser. Node.js: fast-xml-parser for speed, xml2js for simplicity. I usually reach for fast-xml-parser these days.
How do I handle large XML files efficiently?
SAX or streaming parsers - they don't load the whole file into memory. Combine with state machines to track position. For Java, StAX. For Python, iterparse or xml.dom.pulldom. Don't try DOM on gigabyte files.