This project has been created to clean up (X)HTML and XML documents as part of a Maven build. Through configuration, it can be specialized to prepare XHTML files for a Maven-generated project site. One case where this is required is when HTML has been generated from DocBook content and needs to be included in a project site.
The cleaning up, also called tidying, of (X)HTML and XML documents is done through JTidy [1], a Java implementation of the Tidy utility.
When a Maven project site is generated, it incorporates static pages and generated reports. Besides the static pages in the standard Maven site directory ${projectDir}/src/site/${format}, it also includes dynamically generated static pages, which are placed in the directory ${projectDir}/target/generated-site/${format}. Here ${format} must be one of the formats supported by the Maven Site Plug-in (xdoc, apt, fml, xhtml, twiki, confluence). Both sets of static pages are picked up automatically by the Maven Site Plug-in and converted, together with the generated reports, into a project site.
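The directory layout described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not part of the plug-in: the class `SiteDirs` and its methods are hypothetical names, and the two paths simply mirror `${projectDir}/src/site/${format}` and `${projectDir}/target/generated-site/${format}` as given in this document.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

/**
 * Hypothetical helper: resolves the two directories from which the
 * Maven Site Plug-in picks up static pages for a given format.
 */
public class SiteDirs {

    // Formats listed in this document as supported by the Maven Site Plug-in.
    static final List<String> FORMATS =
            List.of("xdoc", "apt", "fml", "xhtml", "twiki", "confluence");

    /** Hand-written pages: ${projectDir}/src/site/${format} */
    static Path staticPages(Path projectDir, String format) {
        requireSupported(format);
        return projectDir.resolve("src").resolve("site").resolve(format);
    }

    /** Generated pages: ${projectDir}/target/generated-site/${format} */
    static Path generatedPages(Path projectDir, String format) {
        requireSupported(format);
        return projectDir.resolve("target").resolve("generated-site").resolve(format);
    }

    // Rejects any format outside the list above.
    private static void requireSupported(String format) {
        if (!FORMATS.contains(format)) {
            throw new IllegalArgumentException("Unsupported site format: " + format);
        }
    }

    public static void main(String[] args) {
        Path project = Paths.get("my-project");
        System.out.println(staticPages(project, "xhtml"));
        System.out.println(generatedPages(project, "xhtml"));
    }
}
```

Tidied XHTML written to the second directory would be picked up alongside the hand-written pages in the first.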
Not all tools create well-formed XHTML, which causes a problem when their output needs to be included in a Maven-generated project site. One of the great Maven Plug-ins for generating HTML (and PDF) documents from DocBook content is the Maven DocBkx Plug-in, but the HTML it generates is not well-formed. This is where the Maven Tidy Plug-in comes into action: it cleans up the generated HTML. DocBook is just one example; the Maven Tidy Plug-in can clean up any (X)HTML or XML document that the JTidy Java API can process.
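To make the problem concrete, the following sketch checks whether a fragment is well-formed XML, the property that matters when XHTML is included in a project site. It uses only the JDK's standard `javax.xml.parsers` API; the class `WellFormedCheck` is a hypothetical name. It only detects ill-formed markup; the actual repair is left to JTidy.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: reports whether markup is well-formed XML. */
public class WellFormedCheck {

    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(false); // check well-formedness only, not validity
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler ignores warnings and recoverable errors and rethrows
            // fatal ones, which also keeps the parser from printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Well-formedness violations surface as fatal parse errors.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<p>Hello<br/></p>")); // true
        System.out.println(isWellFormed("<p>Hello<br></p>"));  // false
    }
}
```

An unclosed `<br>` is legal in HTML but not in XHTML, so markup like the second sample is typical of what JTidy needs to clean up before inclusion in a site.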
JTidy is a nice Java API specialized in cleaning up incorrectly formatted (X)HTML and XML documents. It ships with an Ant task, but a Maven2 Plug-in is missing. The Object Best Group has already created a Maven JTidy Plug-in [2], but it is based on a rather old version [3] of JTidy.
Cleaning up the formatting and improving the readability of (X)HTML and XML documents through a Maven2 Plug-in.
Once this plug-in is ready, (X)HTML and XML documents can be cleaned up as part of the Maven build process. One case in which this immediately adds value is cleaning up incorrectly formatted HTML files for inclusion in Maven-generated project sites.
Besides the plug-in itself, a user guide describing its configuration is required.
The deliverables
The Maven2 Plug-in maven-tidy-plugin
The first release, 1.0.0 beta-1, is planned for April 2010, to see whether it meets the requirements.
This document is based on the Project Overview template from the Ready-to-use Software Engineering Templates.
[1] A tool for cleaning up incorrectly formatted (X)HTML and XML documents.
[2] There is a Maven 1 Plug-in and a Maven 2 Plug-in. This project does not support / contain a Maven 1 Plug-in.
[3] As of this writing the OBG - Maven JTidy Plug-in is based on JTidy 4aug2000r7-dev.