Maven HTML Cleaner Plug-in - Project Overview

Making DocBook content available through Maven generated project sites

1.0.0.beta-1


Table of Contents

Mission and Scope
What problem does this project address?
What is the goal of this project?
What is the scope of this project?
Status
Project Documents

Mission and Scope

This project was created to transform HTML into well-formed XML as part of a Maven build. One case where this is required is when HTML has been generated from DocBook content and needs to be included in a Maven-generated project site.

The transformation of HTML into well-formed XML is done through HTML Cleaner.

What problem does this project address?

When a Maven project site is generated, it incorporates static pages and generated reports. Apart from the static pages in the standard Maven site directory ${project}/src/site/${format}, it also includes dynamically generated static pages, which are placed in the directory ${projectDir}/target/generated-site/${format}. Here ${format} should be one of the formats supported by the Maven Site Plug-in (xdoc, apt, fml, xhtml, twiki, confluence). Both sets of static pages are picked up automatically by the Maven Site Plug-in and converted, together with the generated reports, into a project site.
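
The two locations described above can be sketched in a few lines of Java. This is only an illustration of the directory layout, not code from the plug-in: the class name SiteDirectories, its methods, and the project directory "my-project" are hypothetical, and a single project directory is assumed for both ${project} and ${projectDir}.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only; not part of any Maven API.
final class SiteDirectories {

    // The formats named in the text as supported by the Maven Site Plug-in.
    static final List<String> FORMATS =
            Arrays.asList("xdoc", "apt", "fml", "xhtml", "twiki", "confluence");

    // Hand-written static pages: <projectDir>/src/site/<format>
    static Path staticPages(Path projectDir, String format) {
        return projectDir.resolve("src").resolve("site").resolve(checked(format));
    }

    // Pages generated during the build: <projectDir>/target/generated-site/<format>
    static Path generatedPages(Path projectDir, String format) {
        return projectDir.resolve("target").resolve("generated-site").resolve(checked(format));
    }

    // Rejects any format that is not in the list above.
    private static String checked(String format) {
        if (!FORMATS.contains(format)) {
            throw new IllegalArgumentException("Unsupported site format: " + format);
        }
        return format;
    }

    public static void main(String[] args) {
        Path projectDir = Paths.get("my-project"); // assumed project directory
        System.out.println(staticPages(projectDir, "xhtml"));
        System.out.println(generatedPages(projectDir, "xhtml"));
    }
}
```

A tool that wants its output to end up in the project site would write into the second directory, for example generatedPages(projectDir, "xhtml") for XHTML pages.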

Not all tools create well-formed (X)HTML, which causes a problem when these documents need to be included in a Maven-generated project site. One of the great Maven plug-ins for generating HTML (and PDF) documents from DocBook content is the Maven DocBkx Plug-in, but the HTML it generates is not well-formed XML. This is where the Maven HTML Cleaner Plug-in comes into action: it transforms the generated HTML into well-formed XML. This is just one example; the Maven HTML Cleaner Plug-in can transform any document that HTML Cleaner can handle.
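
To make the problem concrete, the following minimal sketch checks whether a piece of markup is well-formed XML, using only the XML parser that ships with the JDK. It does not use HTML Cleaner or the plug-in, and it does not repair anything; the class name WellFormedCheck and the sample markup are made up for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
final class WellFormedCheck {

    // Returns true if the markup parses as XML, false if the parser rejects it.
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Well-formedness violations are reported as fatal SAX errors.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("XML parser could not be used", e);
        }
    }

    public static void main(String[] args) {
        // Common in HTML: <br> is never closed, so an XML parser rejects it.
        System.out.println(isWellFormed("<p>First line<br>Second line</p>"));
        // The same content with a self-closing <br/> is well-formed XML.
        System.out.println(isWellFormed("<p>First line<br/>Second line</p>"));
    }
}
```

The first sample is acceptable to a browser but fails this check, while the second passes. Turning input like the first sample into output like the second is the kind of work that is delegated to HTML Cleaner.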

HTML Cleaner is a nice Java API that specializes in transforming HTML documents into well-formed XML documents. It ships with an Ant task, but a Maven2 plug-in is missing.

What is the goal of this project?

Transform HTML into well-formed XML, through a Maven2 Plug-in.

What is the scope of this project?

Once this plug-in is ready, it will be possible to transform HTML documents into well-formed XML documents as part of the Maven build process. One case in which this immediately adds value is cleaning up incorrectly generated HTML files for inclusion in Maven project sites (HTML needs to be well-formed before it can be included in a Maven-generated project site).

The deliverables are the plug-in itself and a User Guide describing the setup and usage of the plug-in.

Status

A first release, 1.0.0.beta-1, is planned for April 2010, to see whether it fulfills the needs.

Project Documents

For Everyone

For End Users

Note

This document is based on the Project Overview template from Ready-to-use Software Engineering Templates.