CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900 The Oracle XDK includes components that help you to determine the differences between the contents of two XML documents and then to apply the differences (patch) to one of the XML documents.
You can use Oracle XmlDiff to determine the differences between two similar XML documents. XmlDiff generates an Xdiff instance document that indicates the differences. The Xdiff instance document is an XML document that conforms to an XML schema, the Xdiff schema. You can then use XmlPatch, which takes the Xdiff instance document and applies the changes to other documents. This can be used to apply the same changes to a large number of XML documents. XmlDiff only supports the DOM API for input and output. XmlPatch also supports the DOM for the input and patch documents. XmlDiff and XmlPatch can be used through a C API or a command-line tool, and they are exposed by two SQL functions. An XmlHash C API is provided to compute the hash value of an XML tree or subtree. If hash values of two trees or subtrees are equal, the trees are identical to a very high probability.
The flow of the process is as follows:
XmlDiff compares the trees that represent the two input documents to determine differences. Both input documents must use the same character-set encoding. The Xdiff (output) instance document has the same encoding as the data encoding (DOM encoding) of the input documents.
There are two options for the comparison, known as optimizations:
Global optimization can take more time and space for large documents but always produces the smallest set of differences (the optimal difference). Local optimization is much faster, but may not produce the optimal difference.
Hashing generally speeds up global optimization with a small possible loss in quality. Hashing improves the quality of the difference output, with local optimization. Using different hash levels may generate both local and global differences. You can specify the use of hashing for both local and global optimization. To specify hashing, provide the hashLevel parameter. If hashLevel is greater than 1, then only the DOMHash values are used for comparing all subtrees at depth >= hashLevel of difference. If the hash values are equal, then the subtrees are presumed to be equal.
XmlDiff ignores differences in the order of attributes while doing the comparison. XmlDiff ignores DocType declarations. Files are not validated against the DTD. XmlDiff ignores any differences in the namespace prefixes as long as the namespace prefixes refer to the same namespace URI. Otherwise, if two nodes have the same local name and content but differ in namespace URI, these differences are indicated.
Table 21-1 describes command-line options:
Table 21-1 XmlDiff Command-Line Options for the C Language
Example 21-1 is a sample xml document that can be used to explain updates resulting from using both XmlDiff and XmlPatch. It is followed by some hypothetical changes.
Example 21-1 book1.xml <?xml version="1.0"?> <booklist xmlns="http://booklist.oracle.com"> <book> <title>Twelve Red Herrings</title> <author>Jeffrey Archer</author> <publisher>Harper Collins</publisher> <price>7.99</price> </book> <book> <title language="English">The Eleventh Commandment</title> <author>Jeffrey Archer</author> <publisher>McGraw Hill</publisher> <price>3.99</price> </book> <book> <title language="English" country="USA">C++ Primer</title> <author>Lippmann</author> <publisher>Harper Collins</publisher> <price>4.99</price> </book> <book> <title>Emperor's New Mind</title> <author>Roger Penrose</author> <publisher>Oxford Publishing Company</publisher> <price>15.9</price> </book> <book> <title>Evening News</title> <author>Arthur Hailey</author> <publisher>MacMillan Publishers</publisher> <price>9.99</price> </book> </booklist>Assume that there is another file, book2.xml, that looks just like the Example 21-1, "book1.xml" except that it results in the following: Deletes "The Eleventh Commandment", a delete-node operation. Changes the country code for the "C++ Primer" to US from USA, an update-node operation. Adds a description to "Emperor's New Mind", an append-node operation. Add the edition to "Evening News", an insert-node-before operation. Updates the price of "Evening News", an update-node operation.
This section shows the Xdiff instance document produced by the comparison of these two XML files described in the previous section. The sections that follow explain the XML processing instructions and the operations on this document. You can invoke XmlDiff as follows: > xmldiff book1.xml book2.xmlYou can also examine the sample application for arguments and flags.
Example 21-2 Sample Xdiff Instance Document <?xml version="1.0" encoding="UTF-8"?> <xd:xdiff xsi:schemaLocation="http://xmlns.oracle.com/xdb/xdiff.xsd xmlns:xd="http://xmlns.oracle.com/xdb/xdiff.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:oraxdfns_0="http://booklist.oracle.com"> <?oracle-xmldiff operations-in-docorder="true" output-model="snapshot" diff-algorithm="global"?> <xd:delete-node xd:node-type="element" xd:xpath="/oraxdfns_0 :booklist[1]/oraxdfns_0:book[2]"/> <xd:update-node xd:node-type="attribute" xd:parent-xpath="/oraxdfns_0:booklist[1]/oraxdfns_0:book[3]/oraxdfns_0 :title[1]" xd:attr-local="country"> <xd:content>US</xd:content> </xd:update-node> <xd:append-node xd:node-type="element" xd:parent-xpath="/oraxdfns_0 :booklist[1]/oraxdfns_0:book[4]"> <xd:content> <oraxdfns_0:description> This is a classic </oraxdfns_0:description> </xd:content> </xd:append-node> <xd:insert-node-before xd:node-type="element" xd:xpath="/oraxdfns_0 :booklist[1]/oraxdfns_0:book[5]/oraxdfns_0:author[1]"> <xd:content> <oraxdfns_0:edition>Hardcover</oraxdfns_0:edition> </xd:content> </xd:insert-node-before> <xd:update-node xd:node-type="text" xd:xpath="/oraxdfns_0 :booklist[1]/oraxdfns_0:book[5]/oraxdfns_0:price[1]/text()[1]"> <xd:content>12.99</xd:content> </xd:update-node> </xd:xdiff>
The Xdiff instance document uses some XML processing instructions (shown in bold in the previous section) that are used to represent certain aspects of the differencing process. See "Xdiff Schema". These instructions and related options are:
XmlDiff captures differences using operations indicated by the Xdiff instance document. Note the following about Xdiff operations:
The Xdiff operations, presented in the Xdiff Instance Document, are as follows:
The output of XmlDiff, the Xdiff instance document, is in XML format and conforms to the Xdiff schema shown in the next section. The output document contains a sequence of operations describing the differences between the two input documents. If you apply the differences from the first document, you get the second document.
Example 21-3 shows the Xdiff schema to which the Xdiff instance document (output) adheres.
Example 21-3 Xdiff Schema: xdiff.xsd <schema targetNamespace="http://xmlns.oracle.com/xdb/xdiff.xsd" xmlns="http://www.w3.org/2001/XMLSchema" xmlns:xd="http://xmlns.oracle.com/xdb/xdiff.xsd" version="1.0" elementFormDefault="qualified" attributeFormDefault="qualified"> <annotation> <documentation> Defines the structure of XML documents that capture the difference between two XML documents. Changes that are not supported by Oracle XmlDiff may not be expressible in this schema. 'oracle-xmldiff' PI in Xdiff document: We use 'oracle-xmldiff' PI to describe certain aspects of the diff. The PI denotes values for 'operations-in-docorder' and 'output-model'. The output of XmlDiff has the PI always. If the user hand-codes a diff doc then it must also have the PI in it as the first child of top level xdiff element, to be able to call XmlPatch. operations-in-docorder: Can be either 'true' or 'false'. If true, the operations in the diff document refer to the elements of the input doc in the same order as document order. Output of global algorithm meets this requirement while local does not. output-model: output models for representing the diff. Can be either 'Snapshot' or 'Current'. Snapshot model: Each operation uses Xpaths as if no operations have been applied to the input document. (like UNIX diff) This is the model used in the output of XmlDiff. XmlPatch works with this (and the current model too). For XmlPatch to handle this model, "operations-in-docorder" must be true and the Xpaths must be simple. (see XmlDif C API documentation). Current model: Each operation uses Xpaths as if all operations till the previous one have been applied to the input document. Works with XmlPatch even if the 'operations-in-docorder' criterion is not met and the xpaths are not simple. <!-- Example: <?oracle-xmldiff operations-in-docorder="true" output-model= "snapshot" diff-algorithm="global"?> --> </documentation> </annotation> <!-- Enumerate the supported node types --> <simpleType name="xdiff-nodetype"> <restriction base="string"> <enumeration value="element"/> <enumeration value="attribute"/> <enumeration value="text"/> <enumeration value="cdata"/> <enumeration value="entity-reference"/> <enumeration value="entity"/> <enumeration value="processing-instruction"/> <enumeration value="notation"/> <enumeration value="comment"/> </restriction> </simpleType> <element name="xdiff"> <complexType> <choice minOccurs="0" maxOccurs="unbounded"> <element name="append-node"> <complexType> <sequence> <element name="content" type="anyType"/> </sequence> <attribute name="node-type" type="xd:xdiff-nodetype"/> <attribute name="xpath" type="string"/> <attribute name="parent-xpath" type="string"/> <attribute name="attr-local" type="string"/> <attribute name="attr-nsuri" type="string"/> </complexType> </element> <element name="insert-node-before"> <complexType> <sequence> <element name="content" type="anyType"/> </sequence> <attribute name="xpath" type="string"/> <attribute name="node-type" type="xd:xdiff-nodetype"/> </complexType> </element> <element name="delete-node"> <complexType> <attribute name="node-type" type="xd:xdiff-nodetype"/> <attribute name="xpath" type="string"/> <attribute name="parent-xpath" type="string"/> <attribute name="attr-local" type="string"/> <attribute name="attr-nsuri" type="string"/> </complexType> </element> <element name="update-node"> <complexType> <sequence> <element name="content" type="anyType"/> </sequence> <attribute name="node-type" type="xd:xdiff-nodetype"/> <attribute name="parent-xpath" type="string"/> <attribute name="xpath" type="string"/> <attribute name="attr-local" type="string"/> <attribute name="attr-nsuri" type="string"/> </complexType> </element> <element name="rename-node"> <complexType> <sequence> <element name="content" type="anyType"/> </sequence> <attribute name="xpath" type="string"/> <attribute name="node-type" type="xd:xdiff-nodetype"/> </complexType> </element> </choice> <attribute name="xdiff-version" type="string"/> </complexType> </element> </schema>
In an application, XmlDiff takes the source types and locations of the input documents as arguments. The source type can be a URL, file, orastream and stream context pointers, buffer, and buffer_length pointers or the pointer to a DOM document element (docelement). XmlDiff returns the document node for the DOM for the Xdiff instance document. XmlDiff builds the DOM for the two documents, if they are not already provided as DOM, before performing a comparison.
Example 21-4 XMLDiff Application # include <xmldf.h> ... xmlctx *xctx; xmldocnode *doc1, *doc2, *doc3; uword hash_level; oratext *s, *inp1 = "book1.xml", *inp2="book2.xml"; xmlerr err; ub4 flags; flags = 0; /* defaults : global algorithm */ hash_level = 0; /* no hashing */ /* create XML meta context */ if (!(xctx = XmlCreate(&err, (oratext *) "XmlDiff", NULL))) { printf("Failed to create XML context, error %u\n", (unsigned) err); err_exit("Exiting"); } /* Load the two input files */ if (!(doc1 = XmlLoadDom(xctx, &err, "file", inp1, "discard_whitespace", TRUE, NULL))) { printf("Parsing first file failed, error %u\n", (unsigned)err); err_exit((oratext *)"Exiting."); } if (!(doc2 = XmlLoadDom(xctx, &err, "file", inp2, "discard_whitespace", TRUE, NULL))) { printf("Parsing second file failed, error %u\n", (unsigned)err); err_exit((oratext *)"Exiting."); } /* run XmlDiff on the DOM trees. */ doc3 = XmlDiff(xctx, &err, flags, XMLDF_SRCT_DOM, doc1, NULL, XMLDF_SRCT_DOM, doc2, NULL,hash_level, NULL); if(!doc3) printf("XmlDiff Failed, error %u\n", (unsigned)err); else { if(err != XMLERR_OK) printf("XmlDiff returned error %u\n", (unsigned)err); /* Now we have the DOM tree in doc3 which represent the Diff */ ... } XmlFreeDocument(xctx, doc1); XmlFreeDocument(xctx, doc2); XmlFreeDocument(xctx, doc3); XmlDestroy(xctx);
A customized output builder stores differences in any format suitable to the application. You can create your own customized output builder, rather than using the default Xdiff instance document, generated by XmlDiff and which conforms to the Xdiff schema. To do this, you must provide a callback that can be called after XmlDiff determines the differences. The differences are passed to the callback as an array of xmdlfop. The callback may be called multiple times as the differences are being generated. Using a customized output builder may perform better than using the default, because it does not have to maintain the internal state necessary for XPath generation. By default, XmlDiff captures the differences in XML conforming to Xdiff schema. If necessary, plug in your own output builder. The differences are represented as an array xmldfop. You must write an output builder callback function. The function signature is: xmlerr(*xdfobcb)(void *uctx, xmldfop *escript, ub4 escript_siz);uctx is the user specific context. escript is the array of size escript_siz: diff[escript_siz]mctx is the memory context. Supply this memory context through properties to XmlDiff(). Use this memory context to allocate escript. You must later free escript. Invoke the output builder callback after the differences have been found which happens even before the call to XmlDiff() returns. The output builder callback can be called multiple times.
Example 21-5 Customized XMLDiff Output /* Sample useage: */ ... #include <orastruc.h> / * for 'oraprop' * / ... static oraprop diff_props[] = { ORAPROP(XMLDF_PROPN_CUSTOM_OB, XMLDF_PROPI_CUSTOM_OB, POINTER), ORAPROP(XMLDF_PROPN_CUSTOM_OBMCX, XMLDF_PROPI_CUSTOM_OBMCX, POINTER), ORAPROP(XMLDF_PROPN_CUSTOM_OBUCX, XMLDF_PROPI_CUSTOM_OBUCX, POINTER), { NULL } }; ... oramemctx *mymemctx; ... xmlerr myob(void *uctx, xmldfop *escript, ub4 escript_siz) { /* process diff which is available in escript * / /* free escript - the caller has to do this * / OraMemFree(mymemctx, escript); } main() { ... myctxt *myctx; diff_props[0].value_oraprop.p_oraprop_v = myob; diff_props[1].value_oraprop.p_oraprop_v = mymemctx; diff_props[2].value_oraprop.p_oraprop_v = myctx; XmlDiff(xctx, &err, 0, doc1, NULL, 0, doc2, NULL, 0, diff_props); ... }
XmlPatch takes the Xdiff instance document, either as generated by XmlDiff or created by another mechanism, and follows the instructions in the Xdiff instance document to modify other XML documents as specified.
Table 21-2 describes the XmlPatch command-line options:
Table 21-2 XmlPatch for C Command-Line Options
XmlPatch takes the source types and locations of the input document and the diff document as arguments. The source type can be a URL, file, orastream and stream context pointers, buffer and buffer_length pointers, or the pointer to a DOM document element (docelement). The modes that were set by the Xdiff schema affect how XmlPatch works. If the output-model is Snapshot, XmlPatch only works if operations-in-docorder is TRUE. If the output-model is Current, it is not necessary that operations-in-docorder be set to TRUE.
Example 21-6 Sample Application for XmlPatch ... #include <xmldf.h> ... xmlctx *xctx; xmldocnode *doc1, *doc2; oratext *s; oratext *inp1 = "book1.xml"; /* input document */ oratext *inp2 = "diff.xml", /* diff document */ xmlerr err; /* create XML meta context */ if (!(xctx = XmlCreate(&err, (oratext *) "XmlPatch", NULL))) { printf("Failed to create XML context, error %u\n", (unsigned) err); err_exit("Exiting"); } /* Load the two input files */ if (!(doc1 = XmlLoadDom(xctx, &err, "file", inp1, "discard_whitespace", TRUE, NULL))) { printf("Parsing first file failed, error %u\n", (unsigned)err); err_exit((oratext *)"Exiting."); } if (!(doc2 = XmlLoadDom(xctx, &err, "file", inp2, "discard_whitespace", TRUE, NULL))) { printf("Parsing second file failed, error %u\n", (unsigned)err); err_exit((oratext *)"Exiting."); } /* call XmlPatch */ if(!XmlPatch(xctx, &err, 0, XMLDF_SRCT_DOM, doc1, NULL, XMLDF_SRCT_DOM, doc2, NULL, NULL)); printf("XmlPatch Failed, error %u\n", (unsigned)err); else { if(err != XMLERR_OK) printf("XmlPatch returned error %u\n", (unsigned)err); /* Now we have the patched document in doc1 */ ... } XmlFreeDocument(xctx, doc1); XmlFreeDocument(xctx, doc2); XmlDestroy(xctx);
Oracle XDK provides XmlHash, which computes a hash value for an XML tree or subtree. If the hash values of two subtrees are equal, it is highly probable that they are the same XML. This can be used to do a quick comparison, for example, if you want to see if the XML tree is already in the database. You can run XmlDiff again, if necessary, on any matches, to be absolutely certain there is a match. You can compute the hash value of the new document and query the database for it. Example 21-7 shows a sample program that uses XmlHash.
Example 21-7 XmlHash Program sword main(sword argc, char *argv[]) { xmlctx *xctx; xmldfsrct srct; oratext *data_encoding, *input_encoding, *s, *inp1; ub1 flags; xmlerr err; ub4 num_args; xmlhasht digest; flags = 0; /* defaults */ srct = XMLDF_SRCT_FILE; inp1 = "somexml.xml"; xctx = XmlCreate(&err, (oratext *) "XmlHash", NULL); if (!xctx) { /* handle error with creating xml context and exit */ ... } /* run XmlHash */ err = XmlHash(xctx, &digest, 0, srct, inp1, NULL, NULL); if(err) printf("XmlHash returned error:%d \n", err); else txdfha_pd(digest); XmlDestroy(xctx); return (sword )err; } /* print bytes in xml hash */ static void txdfha_pd(xmlhasht digest) { ub4 i; for(i = 0; i < digest.l_xmlhasht; i++) printf("%x ", digest.d_xmlhasht[i]); printf("\n"); } |