JARV is an implementation-independent interface set for validators developed by the RELAX community. There are several implementations available that support this interface.
Although it originally came from the RELAX community, JARV is not limited to RELAX; it can be used with many other schema languages. One advantage of JARV is that it lets you use multiple schema languages with minimal changes to your code.
First, you need the latest isorelax.jar file, which is available here.
Then you need an actual implementation. Currently, the following implementations are available:
You need to set up those jars so that the class loader can find them.
JARV consists of three components: VerifierFactory, Schema, and Verifier.
The VerifierFactory interface is the main interface between the implementation and your application. It has a method to compile a schema into a Schema object.
The Schema interface is the internal representation of the schema. This interface is thread-safe, so multiple threads can access one Schema object concurrently. It also has a method to create a new Verifier object.
The Verifier interface represents a so-called "validator"; it holds a schema object and validates documents against that schema.
VerifierFactory
The first thing to do is to create an instance of VerifierFactory. To do that, simply instantiate a VerifierFactory implementation. For MSV, that is:
VerifierFactory factory = new com.sun.msv.verifier.jarv.TheFactoryImpl();
To use Swift RELAX Verifier for Java:
VerifierFactory factory = new jp.xml.gr.relax.swift.SwiftVerifierFactory();
JARV is also capable of finding an implementation that supports a particular schema language at run-time. To learn more about this discovery mechanism, please read this.
Once you have a factory, you can use it to compile a schema by calling the compileSchema method of the factory.
Schema schema = factory.compileSchema("http://www.example.org/test.xsd");
This method accepts many types of input. For example, you can pass an InputSource, a File, an InputStream, etc.
Schema objects are thread-safe. So even if you have more than one thread, you need only one instance of Schema; you can share that one instance with as many threads as you want.
Schema is just a compiled schema, so it cannot do anything by itself. A Verifier object is what performs the actual validation. To create a Verifier object, do as follows:
Verifier verifier = schema.newVerifier();
In this way, you can create a Verifier that checks documents against a particular schema.
Verifier is not thread-safe, so you typically create one instance per validation (or per thread).
Verifier has several methods to validate documents. One way is to call the verify method, which accepts a DOM tree, a File, a URL, etc., and returns whether the document is valid. For example, to validate a DOM document, simply pass it as an argument:
    if (verifier.verify(domDocument)) {
        // the document is valid
    } else {
        // the document is invalid (wrong)
    }
This method gives you only a yes/no answer, but you can get more detailed error information by setting an error handler through the setErrorHandler method.
Just as a parser reports well-formedness errors through org.xml.sax.ErrorHandler, JARV implementations (like MSV) report validity errors through the same interface. In this way, you can get the error message, the line number that caused the error, and so on. For example, in the following code, a custom error handler is set to report error messages to the client.
    verifier.setErrorHandler( new MyErrorHandler() );
    try {
        if (verifier.verify(new File("abc.xml"))) {
            // the document is valid
        } else {
            // the execution will never reach here because
            // if the document is invalid, then an exception should be thrown.
        }
    } catch( SAXParseException e ) {
        // if the document is invalid, then the execution will reach here
        // because we throw an exception for an error.
    }

    ...

    class MyErrorHandler implements ErrorHandler {
        public void fatalError( SAXParseException e ) throws SAXException {
            error(e);
        }
        public void error( SAXParseException e ) throws SAXException {
            System.out.println(e);
            throw e;
        }
        public void warning( SAXParseException e ) {
            // ignore warnings
        }
    }
If you throw an exception from the error handler, that exception is not caught by the verify method; it propagates to your code, so the validation is effectively aborted at that point. If you return from the error handler normally, MSV tries to recover from the error and find as many errors as possible.
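The second option, returning normally so that validation continues, can be sketched with a handler that records each error instead of throwing. `CollectingErrorHandler` and its `countErrors` driver are hypothetical names, and the driver uses the JDK's built-in DTD-validating parser purely so that the sketch runs without a JARV implementation on the class path. Since JARV reports errors through the same `org.xml.sax.ErrorHandler` interface, the handler itself should be usable with `setErrorHandler` unchanged.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

/** Hypothetical handler: records validity errors and returns normally. */
class CollectingErrorHandler implements ErrorHandler {
    // Internal DTD subset used only by the demo driver below (an assumption of this sketch).
    static final String DOCTYPE =
        "<!DOCTYPE root [<!ELEMENT root (a,b)*><!ELEMENT a EMPTY><!ELEMENT b EMPTY>]>";

    final List<SAXParseException> errors = new ArrayList<>();

    @Override public void warning(SAXParseException e) {
        // ignore warnings
    }

    @Override public void error(SAXParseException e) {
        errors.add(e);   // remember the error and let the validator continue
    }

    @Override public void fatalError(SAXParseException e) throws SAXParseException {
        throw e;         // a fatal error cannot be recovered from, so abort
    }

    /** Demo driver: validates DOCTYPE + body with the JDK parser and counts the errors. */
    static int countErrors(String body) {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setValidating(true);   // turn on DTD validation
            XMLReader reader = spf.newSAXParser().getXMLReader();
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(DOCTYPE + body)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.errors.size();
    }

    public static void main(String[] args) {
        System.out.println(countErrors("<root><b/><b/></root>") + " error(s) collected");
    }
}
```

After parsing, the `errors` list holds every reported problem along with its line and column, which suits batch reporting better than stopping at the first one.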
Every JARV implementation supports validation via SAX2 in two ways.
The first is a validator implemented as a ContentHandler, which can be obtained by calling the getVerifierHandler method.
This content handler validates incoming SAX2 events, and you can obtain the validity through the isValid method. For example,
    XMLReader reader = ... ; // get XML reader from somewhere
    VerifierHandler handler = verifier.getVerifierHandler();
    reader.setContentHandler(handler);
    reader.parse("http://www.mydomain.com/some/file.xml");

    if (handler.isValid()) {
        // the document is correct
    } else {
        // the document is incorrect
    }
The second is a validator implemented as an XMLFilter, which can be obtained by calling the getVerifierFilter method.
A verifier implemented as a filter, VerifierFilter, is particularly useful because you can plug it right into the middle of any SAX event pipeline.
Not only can you validate documents before you process them, you can also validate them after your application has processed them.
In the following example, a verifier filter is used to validate documents before your own handler processes them.
    VerifierFilter filter = verifier.getVerifierFilter();

    // create a new XML reader and set up the pipeline
    filter.setParent(getNewXMLReader());
    filter.setContentHandler( new MyApplicationHandler() );

    // parse the document
    filter.parse("http://www.mydomain.com/some/file.xml");

    if (filter.isValid()) {
        // the parsed document was valid
    } else {
        // invalid
    }
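The plumbing in this example relies only on the standard `org.xml.sax.XMLFilter` contract. The sketch below uses a hypothetical pass-through filter, `ElementCountingFilter`, as a stand-in for `VerifierFilter` to show how a filter sits between the parser and the application handler. It performs no validation and is not part of JARV; it only makes the event flow visible.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical stand-in for a verifier filter: counts elements, then forwards them. */
class ElementCountingFilter extends XMLFilterImpl {
    int seen = 0;

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        seen++;                                        // the filter sees the event first...
        super.startElement(uri, local, qName, atts);   // ...then passes it downstream
    }

    /** Returns {elements seen by the filter, elements seen by the downstream handler}. */
    static int[] run(String xml) {
        final int[] downstream = {0};
        ElementCountingFilter filter = new ElementCountingFilter();
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            // the parser is the parent; the application handler hangs off the filter
            filter.setParent(spf.newSAXParser().getXMLReader());
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String u, String l, String q, Attributes a) {
                    downstream[0]++;
                }
            });
            // parsing is started on the filter, not on the underlying reader
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return new int[] {filter.seen, downstream[0]};
    }
}
```

Because the filter receives each event before the downstream handler does, a validating filter has the chance to reject an event before the application ever sees it.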
SAX-based validation does not make much sense unless you set an error handler, because learning that the document was invalid after you have already processed it is too late.
To set an error handler, call the setErrorHandler method just as you did with the verify method.
    VerifierFilter filter = verifier.getVerifierFilter();
    verifier.setErrorHandler(new MyErrorHandler());
    ...
    filter.parse(...);
In this way, you can abort the processing by throwing an exception in case of an error. If you are using VerifierFilter, you can also set an error handler by calling the setErrorHandler method of the VerifierFilter interface.
Some JARV implementations (e.g., MSV, Jing, RELAX Verifier for Java) always run in a fail-fast manner. So as long as you set an error handler that throws, it is guaranteed that your application will never see an incorrect document.
A simple, obvious way to create a VerifierFactory is to create a new instance of the appropriate implementation class (such as com.sun.msv.verifier.jarv.TheFactoryImpl).
In this way, you choose the JARV implementation at compile time. With MSV in particular, this is advantageous because of its "multi-schema" capability: the MSV factory accepts a schema written in any of the supported languages, so you can switch schema languages without changing your code at all.
However, there is one problem in this approach. Specifically, it locks you into a particular JARV implementation, so you need to change your code to use other JARV implementations.
For this reason, you may want to "discover" an implementation at run-time (just as you usually do with JAXP) by calling the static newInstance method of the VerifierFactory class. To do that, you pass the name of the schema language you want to use. This method finds an implementation on the class path that supports the given schema language and returns its VerifierFactory.
VerifierFactory factory = VerifierFactory.newInstance( "http://relaxng.org/ns/structure/1.0");
Usually, the namespace URI of the schema language is used as the name. For the complete list, please consult the javadoc.
One of the problems with some validators (like the DTD validator in Xerces) is that they do not work in a fail-fast manner. This problem is unique to SAX.
What is "fail-fast"? A fail-fast validator is one that flags an error as soon as the error is found. A non-fail-fast validator may let part of an invalid document slip through and flag the error only at some later point.
When you are using a non-fail-fast validator, you need to take extra care in writing your code, because it may be exposed to bad documents.
For example, imagine the following simple DTD and a bad document:
    <!ELEMENT root (a,b)*>
    <!ELEMENT a EMPTY>
    <!ELEMENT b EMPTY>

    <root>
      <b/> <!-- error -->
      <b/>
    </root>
Surprisingly, in a typical non-fail-fast validator, the error is signaled as late as the end-element event of the root element. So you have to make sure that your application behaves gracefully when it sees the wrong 'b'.
This typically defeats the purpose of validation, because you validate precisely to protect your application code from unexpected input.
Many JARV implementations (including MSV, Jing, RELAX Verifier for Java) are fail-fast validators, so they signal an error at the start-element event of the first 'b'. This guarantees that the application will never see a wrong document.
Note that some other JARV implementations may be non fail-fast validators.
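To see where a given validator reports the error, you can log SAX events and errors in the order they arrive. The sketch below does this with the JDK's built-in DTD validator and the document from the example above; `EventOrderDemo` and `trace` are hypothetical names. Which line the error appears on depends on the parser in use, so no particular output is assumed here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class EventOrderDemo {
    // The DTD and bad document from the text, with the DTD as an internal subset.
    static final String BAD_DOCUMENT =
          "<!DOCTYPE root [\n"
        + "  <!ELEMENT root (a,b)*>\n"
        + "  <!ELEMENT a EMPTY>\n"
        + "  <!ELEMENT b EMPTY>\n"
        + "]>\n"
        + "<root><b/><b/></root>";

    /** Parses xml with DTD validation and returns events and errors in arrival order. */
    static List<String> trace(String xml) {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String u, String l, String qName, Attributes a) {
                log.add("start:" + qName);
            }
            @Override
            public void endElement(String u, String l, String qName) {
                log.add("end:" + qName);
            }
            @Override
            public void error(SAXParseException e) {
                log.add("error: " + e.getMessage());   // record it and keep parsing
            }
        };
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setValidating(true);
            spf.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        trace(BAD_DOCUMENT).forEach(System.out::println);
    }
}
```

If the "error" entry shows up only after both "start:b" entries, the application handler has already been handed the invalid elements by the time it learns about the problem.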
The VerifierFactory class has the newVerifier method as a short-cut. It is a short-cut in the sense that the following two code fragments have exactly the same meaning:
    // long form
    Verifier v = factory.compileSchema(x).newVerifier();

    // short-cut
    Verifier v = factory.newVerifier(x);
This is sometimes useful when you are using only one thread.
The JAXP masquerading feature is a wrapper implementation of JAXP. This wrapper enhances another JAXP implementation (such as Aelfred or Crimson) by adding JARV-based validation to it. Parsing is done by the wrapped JAXP implementation, and the JARV implementation adds the validation on top.
Because it requires so little code, this is often the easiest way to incorporate validation into your application.
To create a wrapped SAXParserFactory, do as follows:

    Schema schema = /* compile schema */;
    SAXParserFactory parserFactory =
        new org.iso_relax.jaxp.ValidatingSAXParserFactory(schema);

This creates a JAXP SAXParserFactory that validates every parsed document against the specified schema. Similarly, to create a wrapped DocumentBuilderFactory, do as follows:

    Schema schema = /* compile schema */;
    DocumentBuilderFactory dbf =
        new org.iso_relax.jaxp.ValidatingDocumentBuilderFactory(schema);
Once those instances are created, just use them as you use a normal JAXP implementation.
The VerifierFactory interface is not thread-safe. This basically means that you cannot use one factory object from two threads at the same time.
The Schema interface is thread-safe. So once you compile a schema file into a Schema object, it can be shared by multiple threads and accessed concurrently. This is useful on the server side, where multiple threads process client requests simultaneously.
The Verifier interface is again not thread-safe. Each thread needs its own Verifier.
Verifier objects are still re-usable: you can use the same object to validate multiple documents one after another. What you cannot do is validate multiple documents simultaneously with one Verifier.
The thread affinity of JARV is modeled after that of the TrAX API (the javax.xml.transform package). Familiarity with TrAX will help you understand JARV better.
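Since TrAX is named as the model, the pattern can be illustrated with JDK classes alone. In the sketch below, `Templates` plays the role of `Schema` (compiled once, shared by all threads) and `Transformer` plays the role of `Verifier` (one per task, never shared). This is an analogy rather than JARV code; the class name, the stylesheet, and the input document are made up for the illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TraxThreadingDemo {
    // Made-up stylesheet: outputs the number of <item> elements as text.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'><xsl:value-of select='count(//item)'/></xsl:template>"
        + "</xsl:stylesheet>";
    static final String INPUT = "<list><item/><item/></list>";

    /** Runs the same compiled stylesheet on several threads and returns each result. */
    static List<String> run(int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Compile once on a single thread; the factory itself is not shared.
            Templates templates = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(XSLT)));

            // Every task shares the thread-safe Templates object.
            Callable<String> task = () -> countItems(templates);
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    /** Creates a fresh Transformer per call, because Transformer is not thread-safe. */
    static String countItems(Templates templates) throws Exception {
        Transformer transformer = templates.newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(INPUT)), new StreamResult(out));
        return out.toString();
    }
}
```

The corresponding JARV arrangement would compile one Schema up front and call newVerifier inside each task.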
com.sun.msv.verifier.jarv.TheFactoryImpl automatically detects the schema language from the schema file. However, there is one important limitation. Currently, the detection of XML DTDs is based on the file extension. Specifically, if the schema name has the ".dtd" extension, it is treated as an XML DTD; otherwise it is treated as one of the other schema languages.
This causes a problem when you pass an InputStream as the parameter to the compileSchema method. Since InputStreams do not have names, they are always treated as non-DTD schemas.
To avoid this problem, wrap the stream in an InputSource and call the setSystemId method to set the system id. The following example shows how to do that:
    InputSource is = new InputSource(
        MyClass.class.getResourceAsStream("abc.dtd") );
    is.setSystemId("abc.dtd");
    verifierFactory.compileSchema(is);
This ugly limitation came from the difficulty in correctly detecting XML DTDs, which are written in non-XML syntax, from other schema languages, which are written in XML syntax.
Any input on this restriction is very welcome.
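The reason the system id matters can be shown with a small model of the rule described above. `looksLikeDtd` is a hypothetical helper that mirrors the stated rule (a name ending in ".dtd" means DTD); it is not MSV's actual detection code, and the class and method names are assumptions made for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;

class DtdDetectionSketch {
    /** Hypothetical model of the rule in the text: only a ".dtd" name marks a DTD. */
    static boolean looksLikeDtd(InputSource source) {
        String systemId = source.getSystemId();
        return systemId != null && systemId.endsWith(".dtd");
    }

    /** Wraps an anonymous stream; without a system id there is no name to inspect. */
    static InputSource unnamed(String dtdText) {
        InputStream in = new ByteArrayInputStream(dtdText.getBytes(StandardCharsets.UTF_8));
        return new InputSource(in);
    }

    /** Wraps the same stream and gives it a name through setSystemId. */
    static InputSource named(String dtdText, String systemId) {
        InputSource source = unnamed(dtdText);
        source.setSystemId(systemId);
        return source;
    }
}
```

Under this model, the same bytes are classified differently depending only on whether a system id was supplied, which matches the behavior described for InputStream arguments.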
If you need an example that is not listed here, please let me know so that I can add it in the next release.
Have a look at the SingleThreadDriver.java example in this zip file. It compiles a schema and obtains a verifier object, then uses the same verifier to validate multiple documents.
Have a look at the MultiThreadDriver.java example in this zip file. This example first compiles a schema, then launches a number of threads and lets them share one schema object. It shows how to use JARV in a multi-threaded environment and how to cache a compiled schema in memory.
The following code shows how you can validate DOM by using JARV.
    import java.io.File;
    import org.iso_relax.verifier.*;
    import org.w3c.dom.Element;

    void f( org.w3c.dom.Document dom ) throws Exception {
        // create a VerifierFactory
        VerifierFactory factory = VerifierFactory.newInstance(
            "http://relaxng.org/ns/structure/1.0");

        // compile a RELAX NG schema
        Schema schema = factory.compileSchema( new File("foo.rng") );

        // obtain a verifier
        Verifier verifier = schema.newVerifier();

        // check the validity of a DOM.
        if( verifier.verify(dom) ) {
            // the document is valid
        } else {
            // the document is not valid
        }

        // you can use the same verifier object to test multiple DOMs
        // as long as you don't use it concurrently.
        if( verifier.verify(anotherDom) )
            ...

        // or you can pass an Element to validate that subtree.
        // (getFirstChild() returns a Node; the cast assumes it is an element.)
        Element e = (Element)dom.getDocumentElement().getFirstChild();
        if( verifier.verify(e) )
            ...
    }
The following code shows how you can use JARV together with SAX.
    import java.io.File;
    import org.iso_relax.verifier.*;
    import org.xml.sax.*;

    void f( javax.xml.parsers.SAXParserFactory parserFactory ) throws Exception {
        // create a VerifierFactory with the default SAX parser
        VerifierFactory factory = VerifierFactory.newInstance(
            "http://www.xml.gr.jp/xmlns/relaxCore");

        // compile a RELAX schema
        Schema schema = factory.compileSchema( new File("foo.rxg") );

        // obtain a verifier
        Verifier verifier = schema.newVerifier();

        // set an error handler
        // this error handler will throw an exception if there is an error
        verifier.setErrorHandler( new MyErrorHandler() );

        // get an XMLFilter
        VerifierFilter filter = verifier.getVerifierFilter();

        // set up the pipeline
        XMLReader reader = parserFactory.newSAXParser().getXMLReader();
        filter.setParent( reader );
        filter.setContentHandler( new MyContentHandler() );

        // parse the document
        try {
            filter.parse( "MyInstance.xml" );
            // if the execution reaches here, the document was valid and
            // there was nothing wrong.
        } catch( SAXException e ) {
            // error.
            // maybe the document is not well-formed, or it's not valid,
            // or something else went wrong.
        }
    }
The following code shows how you can use JARV via JAXP-masquerading.
    import java.io.File;
    import javax.xml.parsers.SAXParserFactory;
    import org.iso_relax.verifier.*;
    import org.iso_relax.jaxp.*;
    import org.xml.sax.*;

    void f() throws Exception {
        // create a RELAX NG validator
        VerifierFactory factory = VerifierFactory.newInstance(
            "http://relaxng.org/ns/structure/1.0");

        // compile a schema
        Schema schema = factory.compileSchema( new File("myschema.rng") );

        // wrap it into a JAXP SAXParserFactory
        SAXParserFactory parserFactory = new ValidatingSAXParserFactory(schema);

        // create a new XMLReader from it
        parserFactory.setNamespaceAware(true);
        XMLReader reader = parserFactory.newSAXParser().getXMLReader();

        // set an error handler
        // this error handler will throw an exception if there is a
        // well-formedness error or a validation error.
        reader.setErrorHandler( new MyErrorHandler() );

        // set the content handler
        reader.setContentHandler( new MyContentHandler() );

        // parse the document
        try {
            reader.parse( "MyInstance.xml" );
            // if the execution reaches here, the document was valid and
            // there was nothing wrong.
        } catch( SAXException e ) {
            // error.
            // maybe the document is not well-formed, or it's not valid,
            // or something else went wrong.
        }
    }