

Well, you can always extract the top-level elements. In C#, you'd use the XmlDocument class. For example, if your XML file looked something like this:

<Document>
  <Piece>
     Some text
  </Piece>
  <Piece>
     Some other text
  </Piece>
</Document>

Then you'd use code like this to extract all of the Pieces:

using System.Xml;

XmlDocument doc = new XmlDocument();
doc.Load("<path to xml file>");
XmlNodeList nl = doc.GetElementsByTagName("Piece");
foreach (XmlNode n in nl)
{
    // Do something with each Piece node
}

Once you've got the nodes, you can do something with them in your code, or you can transfer the entire text of a node to its own XML document and act on that as if it were an independent piece of XML.
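To make the "transfer a node to its own XML document" step concrete, here is a minimal sketch of the same idea in Java, using the DOM API that ships with the JDK. The class name PieceExtractor, the method name extract, and the sample root element name Document are all assumptions made for illustration; the sketch also ignores namespaces.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: the name and signature are illustrative only.
final class PieceExtractor {

    /** Returns every element named tagName, each serialized as its own XML document. */
    static List<String> extract(String xml, String tagName) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document source = builder.parse(new InputSource(new StringReader(xml)));

        // Identity transformer, used here only to serialize a DOM tree to text.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        List<String> result = new ArrayList<>();
        NodeList nodes = source.getElementsByTagName(tagName);
        for (int i = 0; i < nodes.getLength(); i++) {
            // importNode makes a deep copy that is owned by the new document.
            Document standalone = builder.newDocument();
            standalone.appendChild(standalone.importNode(nodes.item(i), true));

            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(standalone), new StreamResult(out));
            result.add(out.toString());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Sample input; the root element name is an assumption.
        String xml = "<Document><Piece>Some text</Piece><Piece>Some other text</Piece></Document>";
        for (String piece : extract(xml, "Piece")) {
            System.out.println(piece);
        }
    }
}
```

Each returned string can be parsed again on its own, which is what lets you treat a piece as an independent document. Like XmlDocument, this loads the whole input into memory, so it only suits files that fit there.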


This Groovy script uses StAX (Streaming API for XML) to split an XML document between the top-level elements (those that share the same QName as the first child of the root element). It is very fast, handles arbitrarily large documents, and is very useful when you want to split a large batch file into smaller pieces.


Requires Groovy on Java 6, or a StAX API and implementation such as Woodstox in the CLASSPATH.

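import javax.xml.stream.*

pieces = 5
input = "input.xml"
output = "output_%04d.xml"
eventFactory = XMLEventFactory.newInstance()
fileNumber = elementCount = 0

// Opens the input and reads up to the first child of the root element.
def createEventReader() {
    reader = XMLInputFactory.newInstance().createXMLEventReader(new FileInputStream(input))
    start = reader.next()          // StartDocument event
    root = reader.nextTag()        // start tag of the root element
    firstChild = reader.nextTag()  // start tag of the first top-level element
    return reader
}

// Opens the next output file and copies the document start and the root tag into it.
def createNextEventWriter () {
    println "Writing to '${filename = String.format(output, ++fileNumber)}'"
    writer = XMLOutputFactory.newInstance().createXMLEventWriter(new FileOutputStream(filename), start.characterEncodingScheme)
    writer.add(start)
    writer.add(root)
    return writer
}

// First pass: count the remaining elements that share the first child's name.
elements = createEventReader().findAll { it.startElement && it.name == firstChild.name }.size()
println "Splitting ${elements} <${firstChild.name.localPart}> elements into ${pieces} pieces"
chunkSize = elements / pieces
writer = createNextEventWriter()
// createEventReader() has already consumed the first child's start tag, so write it explicitly.
writer.add(firstChild)
// Second pass: copy every event, switching to a new file once the current chunk is full.
createEventReader().each { 
    if (it.startElement && it.name == firstChild.name) {
        if (++elementCount > chunkSize) {
            writer.add(eventFactory.createEndDocument())  // closes the open root element
            writer.flush()
            writer = createNextEventWriter()
            elementCount = 0
        }
    }
    writer.add(it)
}
writer.flush()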


Alternatively, you can try it this way:

If you have just two huge "top-level" tags, it will be extremely hard to split the file in a way that makes it possible both to merge it back together and to read it piece by piece as valid XML.

Here is some pseudocode in C#:

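// Requires: using System.Collections.Generic; using System.Xml;
int nrOfPieces = 5;
XmlDocument xmlOriginal = some input parameter..

// Construct the list we need, and fill it with XmlDocuments
// that each have the same root element name as the original.
var xmlList = new List<XmlDocument>();
for (int i = 0; i < nrOfPieces; i++)
{
    var xmlDoc = new XmlDocument();
    xmlDoc.AppendChild(xmlDoc.CreateElement(xmlOriginal.DocumentElement.Name));
    xmlList.Add(xmlDoc);
}

var nodeList = xmlOriginal.GetElementsByTagName("Piece");
// Copy the nodes from the original into the pieces, round-robin.
for (int i = 0; i < nodeList.Count; i++)
{
    var xmlDoc = xmlList[i % nrOfPieces];
    // ImportNode makes a deep copy that belongs to the target document.
    var nodeToCopy = xmlDoc.ImportNode(nodeList[i], true);
    xmlDoc.DocumentElement.AppendChild(nodeToCopy);
}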

This should give you n documents with valid XML and the possibility of merging them back together. But again, it depends on the XML file.
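For the merge step, a rough sketch in Java (JDK DOM API only) could look like the following. It assumes every piece has the same root element and that the pieces are small enough to load into memory; the class name PieceMerger and the method name merge are made up for illustration, and namespaces are not handled.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: the name and signature are illustrative only.
final class PieceMerger {

    /** Appends the children of every piece's root element under one shared root. */
    static String merge(List<String> pieces) throws Exception {
        if (pieces.isEmpty()) {
            throw new IllegalArgumentException("no pieces to merge");
        }
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document merged = builder.newDocument();

        for (String piece : pieces) {
            Document doc = builder.parse(new InputSource(new StringReader(piece)));
            if (merged.getDocumentElement() == null) {
                // Reuse the root element name of the first piece (assumed equal in all pieces).
                merged.appendChild(merged.createElement(doc.getDocumentElement().getTagName()));
            }
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                // importNode copies the node, so the source list is not modified while iterating.
                merged.getDocumentElement().appendChild(merged.importNode(children.item(i), true));
            }
        }

        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(merged), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(merge(Arrays.asList(
            "<Document><Piece>a</Piece></Document>",
            "<Document><Piece>b</Piece></Document>")));
    }
}
```

Note that this appends the pieces one after another. If the nodes were distributed round-robin (i % nrOfPieces), the merged document contains every node but not in the original order; to keep the order, either split into contiguous ranges or interleave the pieces again when merging.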
