1.0.0.beta-1
Copyright © 2010 Team - Maven HTML Cleaner Plug-in
This project
The Maven HTML Cleaner Plug-in has been created to transform HTML documents into well-formed XML documents as part of a Maven build. One case where this is required is HTML generated from DocBook content by the docbkx-maven-plugin. The generated HTML is not 100% well-formed, so HTML Cleaner is used to transform it into well-formed XML. This project makes HTML Cleaner available as a Maven 2 plug-in.
The section “Transform HTML into well-formed XML” describes how to add and configure the plug-in for general use.
The section “Making DocBook content available in a Maven project site” covers all the details. These include setting up the DocBook content, adding the plug-ins to the pom.xml, and the steps that lead to a Maven-generated project site containing the DocBook content.
The maven-html-cleaner-plugin is not yet available through a central Maven repository. Therefore, the following steps are required to prepare the Maven environment so that it can access the maven-html-cleaner-plugin and the libraries it needs.
The Maven HTML Cleaner Plug-in is not yet available through the central Maven Repository. It can, however, be found in the project-specific Maven Repository http://docbook-utils.sourceforge.net/maven2.
This Maven Repository can be added to a Maven build environment in a few different ways. The most common options are listed below.
Option | Scope |
---|---|
Add the <pluginRepository> lines directly to the project pom.xml | Single user - Single project |
Add the <pluginRepository> lines to a <profile> in the ~/.m2/settings.xml | Single user - Multiple projects |
Add the repository http://docbook-utils.sourceforge.net/maven2 to a Maven Repository Proxy, such as Nexus or Artifactory | Multiple users - Multiple projects |
<pluginRepositories>
  <pluginRepository>
    <id>docbook-utils</id>
    <name>DocBook Utils</name>
    <url>http://docbook-utils.sourceforge.net/maven2</url>
    <layout>default</layout>
    <releases>
      <enabled>true</enabled>
      <updatePolicy>daily</updatePolicy>
      <checksumPolicy>warn</checksumPolicy>
    </releases>
  </pluginRepository>
</pluginRepositories>
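For the second option, the same <pluginRepository> lines can be wrapped in a profile in ~/.m2/settings.xml. That profile is then activated for every build. The following is a minimal sketch. The profile id docbook-utils-profile is a hypothetical name chosen for this example; any id works as long as it matches the <activeProfile> entry.

```xml
<settings>
  <profiles>
    <profile>
      <!-- Hypothetical profile id; it must match the activeProfile entry below -->
      <id>docbook-utils-profile</id>
      <pluginRepositories>
        <pluginRepository>
          <id>docbook-utils</id>
          <name>DocBook Utils</name>
          <url>http://docbook-utils.sourceforge.net/maven2</url>
          <layout>default</layout>
          <releases>
            <enabled>true</enabled>
            <updatePolicy>daily</updatePolicy>
            <checksumPolicy>warn</checksumPolicy>
          </releases>
        </pluginRepository>
      </pluginRepositories>
    </profile>
  </profiles>
  <!-- Activate the profile for every build run by this user -->
  <activeProfiles>
    <activeProfile>docbook-utils-profile</activeProfile>
  </activeProfiles>
</settings>
```

If a settings.xml already exists, the <profile> and <activeProfile> entries go into its existing <profiles> and <activeProfiles> sections rather than into a second <settings> root.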
The files can also be downloaded from the project page on SourceForge.
After downloading, the file(s) need to be installed in a local or central Maven repository.
The following lines are needed because the plug-ins used here are not in the default plug-in group, <groupId>org.apache.maven.plugins</groupId>. Once these lines are added, calling the plug-ins from the command line is much easier. It also feels more natural, as the plug-ins then behave like plug-ins from the default group.
<!-- When Plug-ins should be taken into account during the plug-in search, add them -->
<!-- to this list, as they are not in the default groupId 'org.apache.maven.plugins'. -->
<pluginGroups>
  <!-- http://docbook-utils.sourceforge.net/maven-htmlCleaner-plugin_1.0 -->
  <pluginGroup>net.sourceforge.docbook-utils.maven-plugins</pluginGroup>
  <!-- http://code.google.com/p/docbkx-tools/ -->
  <pluginGroup>com.agilejava.docbkx</pluginGroup>
</pluginGroups>
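The <pluginGroups> element sits directly under the <settings> root of ~/.m2/settings.xml. The following is a minimal sketch of a complete file, assuming no other settings are present yet:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <!-- Extra groupIds searched when a plug-in is called by its short prefix -->
  <pluginGroups>
    <pluginGroup>net.sourceforge.docbook-utils.maven-plugins</pluginGroup>
    <pluginGroup>com.agilejava.docbkx</pluginGroup>
  </pluginGroups>
</settings>
```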
After adding the above lines the new Plug-ins can be used directly from the command line:
$ mvn docbkx:generate-html
$ mvn html-cleaner:transform
If the settings.xml is not updated, the plug-ins can still be called from the command line. They must then be called by their fully qualified name (groupId:artifactId:goal):
$ mvn com.agilejava.docbkx:docbkx-maven-plugin:generate-html
$ mvn net.sourceforge.docbook-utils.maven-plugins:maven-html-cleaner-plugin:transform
<!-- Clean up (X)HTML and XML documents -->
<plugin>
  <groupId>net.sourceforge.docbook-utils.maven-plugins</groupId>
  <artifactId>maven-html-cleaner-plugin</artifactId>
  <version>1.0.0.beta-1</version>
  <executions>
    <execution>
      <phase>pre-site</phase>
      <goals>
        <goal>transform</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <!-- REQUIRED: The source directory (input files) -->
    <sourceDir>${project.build.directory}/generated-html</sourceDir>
  </configuration>
</plugin>
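The configuration above sets only the required sourceDir. The DocBook example in this guide also uses the destinationDir and replaceExtensionMap fields. The following sketch combines all three for general use. The destinationDir path is a hypothetical choice for illustration. The goal page remains the authority on defaults and on which fields are required.

```xml
<plugin>
  <groupId>net.sourceforge.docbook-utils.maven-plugins</groupId>
  <artifactId>maven-html-cleaner-plugin</artifactId>
  <version>1.0.0.beta-1</version>
  <executions>
    <execution>
      <phase>pre-site</phase>
      <goals>
        <goal>transform</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <!-- REQUIRED: the source directory (input files) -->
    <sourceDir>${project.build.directory}/generated-html</sourceDir>
    <!-- Where the well-formed XML is written; hypothetical path -->
    <destinationDir>${project.build.directory}/cleaned-html</destinationDir>
    <!-- Write *.html input files out as *.xhtml -->
    <replaceExtensionMap>
      <html>xhtml</html>
    </replaceExtensionMap>
  </configuration>
</plugin>
```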
An overview of the Plug-in is given on the Plug-in Usage page.
Details of the configuration options are found on the goal html-cleaner:transform page.
This section describes how to add DocBook content to a Maven project site. The docbkx-maven-plugin converts the DocBook content into HTML. Once the HTML has been generated from the DocBook content, the maven-html-cleaner-plugin transforms it into well-formed XML. After that, the maven-site-plugin can generate the project site.
The example that follows is available both as a Maven Archetype and as a downloadable archive file.
The Maven Archetype only works after the steps in the section “Adding the Maven Repository” have been completed.
The files can be found on DocBook Publishing Utilities - Files Section.
By default, the docbkx-maven-plugin expects the DocBook sources in the directory ${project}/src/docbkx. Create the directory and add the following file, article-hello-world.xml, to it:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
  "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">
<article lang="en">
  <title>Hello World!</title>
  <section>
    <title>Welcome</title>
    <para>
      A small DocBook example.
    </para>
    <para>
      The details of this example can be found in
      <ulink url="http://docbook-utils.sourceforge.net/maven-html-cleaner-plugin_1.0/docbook/article-user-guide.html#section-maven-html-cleaner-plugin-for-docbook-content">
        Making DocBook content available in a Maven project site
      </ulink>.
    </para>
  </section>
</article>
Three Maven plug-ins are involved in converting DocBook content into an HTML-based project site. Table 1, “The DocBook Process Flow”, gives an overview of how the DocBook source files are processed in three steps.
Table 1. The DocBook Process Flow
Process | Input Directory | Output Directory |
---|---|---|
docbkx-maven-plugin | src/docbkx | target/docbkx/html |
maven-html-cleaner-plugin | target/docbkx/html | target/generated-site/xhtml/docbook |
maven-site-plugin | target/generated-site/xhtml/docbook | target/site/docbook |
The only plug-in that needs to be configured for this process is the maven-html-cleaner-plugin. Because it can transform any HTML document into a well-formed XML document, it is not pre-configured for the DocBook-specific setup. The configuration fields sourceDir and destinationDir therefore need to be set. The replaceExtensionMap must also be configured. It changes the file extension from html to xhtml, which is needed because the maven-site-plugin only picks up files with the xhtml extension.
An additional docbook directory is appended to the destination directory. This gives all DocBook documents their own directory under the site directory, target/site/docbook. This keeps the site structure clear. It also reduces the chance of duplicate file names, as the maven-site-plugin generates pages from several sources: static files as well as report plug-ins.
The best place to configure the three Maven plug-ins in the pom.xml is the <pluginManagement> section. Each plug-in is then referenced in the <plugins> section, as seen below.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>test</groupId>
  <artifactId>hello-world</artifactId>
  <version>1.0.0.alpha-1</version>
  <name>DocBook Hello World</name>
  <description>A small example, to show how DocBook content can be added to a Maven project site.</description>
  <pluginRepositories>
    <pluginRepository>
      <id>docbook-utils</id>
      <name>DocBook Utils</name>
      <url>http://docbook-utils.sourceforge.net/maven2</url>
      <layout>default</layout>
      <releases>
        <enabled>true</enabled>
        <updatePolicy>daily</updatePolicy>
        <checksumPolicy>warn</checksumPolicy>
      </releases>
    </pluginRepository>
  </pluginRepositories>
  <build>
    <pluginManagement>
      <plugins>
        <!-- Generate (X)HTML / PDF documents from DocBook content -->
        <plugin>
          <groupId>com.agilejava.docbkx</groupId>
          <artifactId>docbkx-maven-plugin</artifactId>
          <version>2.0.10</version>
          <executions>
            <execution>
              <phase>pre-site</phase>
              <goals>
                <goal>generate-html</goal>
                <goal>generate-xhtml</goal>
                <goal>generate-pdf</goal>
              </goals>
            </execution>
          </executions>
          <configuration>
            <xincludeSupported>true</xincludeSupported>
          </configuration>
          <dependencies>
            <dependency>
              <groupId>org.docbook</groupId>
              <artifactId>docbook-xml</artifactId>
              <version>4.4</version>
              <scope>runtime</scope>
            </dependency>
          </dependencies>
        </plugin>
        <!-- Clean up the HTML generated by the docbkx-maven-plugin -->
        <plugin>
          <groupId>net.sourceforge.docbook-utils.maven-plugins</groupId>
          <artifactId>maven-html-cleaner-plugin</artifactId>
          <version>1.0.0.beta-1</version>
          <executions>
            <execution>
              <phase>pre-site</phase>
              <goals>
                <goal>transform</goal>
              </goals>
            </execution>
          </executions>
          <configuration>
            <sourceDir>${project.build.directory}/docbkx/html</sourceDir>
            <destinationDir>${project.build.directory}/generated-site/xhtml/docbook</destinationDir>
            <replaceExtensionMap>
              <html>xhtml</html>
            </replaceExtensionMap>
          </configuration>
        </plugin>
        <!-- Generate Project Site -->
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-site-plugin</artifactId>
          <version>2.1</version>
          <dependencies>
            <dependency>
              <groupId>org.apache.maven.doxia</groupId>
              <artifactId>doxia-module-xhtml</artifactId>
              <version>1.1.2</version>
            </dependency>
          </dependencies>
        </plugin>
      </plugins>
    </pluginManagement>
    <!-- Plug-ins used in this project -->
    <plugins>
      <!-- Generate (X)HTML / PDF documents from DocBook content -->
      <plugin>
        <groupId>com.agilejava.docbkx</groupId>
        <artifactId>docbkx-maven-plugin</artifactId>
      </plugin>
      <!-- Clean up the HTML generated by the docbkx-maven-plugin -->
      <plugin>
        <groupId>net.sourceforge.docbook-utils.maven-plugins</groupId>
        <artifactId>maven-html-cleaner-plugin</artifactId>
      </plugin>
      <!-- Generate Project Site -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-site-plugin</artifactId>
      </plugin>
    </plugins>
  </build>
  <reporting>
    <plugins>
      <!-- Add the Maven project information reports -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-project-info-reports-plugin</artifactId>
        <version>2.1.2</version>
        <reportSets>
          <reportSet>
            <reports>
              <report>index</report>
              <!--
              <report>dependencies</report>
              <report>project-team</report>
              <report>mailing-list</report>
              <report>cim</report>
              <report>issue-tracking</report>
              <report>license</report>
              <report>scm</report>
              -->
            </reports>
          </reportSet>
        </reportSets>
      </plugin>
    </plugins>
  </reporting>
</project>
The docbkx-maven-plugin can also generate XHTML, but that generated XHTML also causes problems for the maven-site-plugin. The default phase in which the maven-html-cleaner-plugin is executed is pre-site. Therefore, the docbkx-maven-plugin is also bound to that phase.
The maven-html-cleaner-plugin must always run after the docbkx-maven-plugin. If both execute in the same phase, make sure that the maven-html-cleaner-plugin is declared after the docbkx-maven-plugin in the pom.xml.
The reporting section for the maven-project-info-reports-plugin is only added here to reduce the number of empty report pages. Many of the omitted reports depend on pom.xml details or on Java code, neither of which is included in this example.
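The license report is one of the reports commented out in the pom.xml that depends on pom.xml details. It only has something to show once the project declares a license. The following is a sketch of re-enabling it. The license entry is a hypothetical placeholder, not this project's actual license.

```xml
<!-- In pom.xml, directly under <project> -->
<licenses>
  <license>
    <!-- Hypothetical license; replace with the project's real one -->
    <name>Apache License, Version 2.0</name>
    <url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
  </license>
</licenses>

<!-- In the reportSet of the maven-project-info-reports-plugin -->
<reports>
  <report>index</report>
  <report>license</report>
</reports>
```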
An overview of the Plug-in is given on the Plug-in Usage page. Details of the configuration options are found on the goal html-cleaner:transform page.
Create a menu item in the src/site/site.xml so that the DocBook content can be found directly on the project web site.
<?xml version="1.0" encoding="UTF-8"?>
<!-- Configuring the Site Descriptor http://maven.apache.org/plugins/maven-site-plugin/examples/sitedescriptor.html -->
<project xmlns="http://maven.apache.org/DECORATION/1.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/DECORATION/1.0.0 http://maven.apache.org/xsd/decoration-1.0.0.xsd"
         name="${project.name}">
  <bannerLeft>
    <name>${project.name}</name>
    <href>${project.url}</href>
  </bannerLeft>
  <body>
    <menu name="${project.name}">
      <item name="About" href="index.html" />
      <item name="Hello World" href="docbook/article-hello-world.html" />
    </menu>
    <!-- <menu ref="reports" /> -->
  </body>
</project>
The <menu ref="reports" /> element was left out because the number of reports was reduced to a minimum. The only generated report is About (index.html), which is referenced directly through <item name="About" href="index.html" />.
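If more reports are enabled later, for example by uncommenting some of the reports in the pom.xml, the reports menu can be restored. The following is a sketch of the <body> section of site.xml under that assumption:

```xml
<body>
  <menu name="${project.name}">
    <item name="About" href="index.html" />
    <item name="Hello World" href="docbook/article-hello-world.html" />
  </menu>
  <!-- Lists all generated project reports -->
  <menu ref="reports" />
</body>
```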
Because the project's pom.xml depends on maven-site-plugin v2.1, the minimum Maven version required to run this example is Maven v2.1.
Procedure 1. Steps to generate the Maven Project Site
The steps are given in detail below. If something goes wrong, run these steps one by one, in this order, to see in which step the problem arises.
1. Generate HTML from the DocBook content
   $ mvn docbkx:generate-html
2. Transform the generated HTML into well-formed XML
   $ mvn html-cleaner:transform
3. Generate the Maven project site
   $ mvn site
Procedure 2. The normal, quick way to generate the Maven Project Site
Generate the Maven project site
$ mvn clean site
This document is based on the User Guide template from Ready-to-use Software Engineering Templates.