Introduction

The Maven Tidy Plug-in has been created to clean up (X)HTML and XML documents as part of a Maven build. Through configuration, it can be specialized to prepare XHTML files for a Maven-generated project site. One case where this is required is HTML generated from DocBook content by the docbkx-maven-plugin, because that generated HTML is not always correctly formatted. JTidy is used to clean it up, and this plug-in makes JTidy available as part of the Maven build process.

The Maven Tidy Plug-in can clean up HTML, XHTML and XML documents [1]. The section “Cleaning up (X)HTML and XML source files” describes how to add and configure the plug-in for general use.

The section “Making DocBook content available in a Maven project site” covers all the details: setting up DocBook content, adding the plug-ins to the pom.xml, and generating the DocBook content in a project site.

Getting the Plug-in

Adding the Maven Repository

The Maven Tidy Plug-in is not yet available through the central Maven Repository, but can be found in the project-specific Maven Repository http://docbook-utils.sourceforge.net/maven2.

This Maven Repository can be added to a Maven build environment in a few different ways. The most common options are listed below.

  • Add the <pluginRepository> lines directly to the project pom.xml

    Single user - Single project

  • Add the <pluginRepository> lines to a <profile> in ~/.m2/settings.xml

    Single user - Multiple projects

  • Add the repository http://docbook-utils.sourceforge.net/maven2 to a Maven Repository Proxy, like Nexus, Artifactory, etc.

    Multiple users - Multiple projects

  <pluginRepositories>
    <pluginRepository>
      <id>docbook-utils</id>
      <name>DocBook Utils</name>
      <url>http://docbook-utils.sourceforge.net/maven2</url>
      <layout>default</layout>
      <releases>
        <enabled>true</enabled>
        <updatePolicy>daily</updatePolicy>
        <checksumPolicy>warn</checksumPolicy>
      </releases>
    </pluginRepository>
  </pluginRepositories>
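For the second option, the repository lines go inside a profile in ~/.m2/settings.xml. The sketch below assumes a profile id of docbook-utils, chosen only for illustration; the <activeProfiles> entry activates that profile, and with it the plug-in repository, for every build run by this user.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <profiles>
    <profile>
      <!-- Hypothetical profile id; any unique id will do -->
      <id>docbook-utils</id>
      <pluginRepositories>
        <pluginRepository>
          <id>docbook-utils</id>
          <name>DocBook Utils</name>
          <url>http://docbook-utils.sourceforge.net/maven2</url>
          <layout>default</layout>
          <releases>
            <enabled>true</enabled>
            <updatePolicy>daily</updatePolicy>
            <checksumPolicy>warn</checksumPolicy>
          </releases>
        </pluginRepository>
      </pluginRepositories>
    </profile>
  </profiles>

  <!-- Activate the profile above for all builds of this user -->
  <activeProfiles>
    <activeProfile>docbook-utils</activeProfile>
  </activeProfiles>
</settings>
```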

Download the Plug-in

The files can also be downloaded from the project page on SourceForge.

Cleaning up (X)HTML and XML source files

Adding the Plug-in to a project

FIXME: Test whether this indeed works (without the execution part). FIXME: Add a TIP on how to let the plug-in run in another phase.

      <!--  Clean up (X)HTML and XML documents  -->
      <plugin>
        <groupId>net.sourceforge.docbook-utils.maven-plugin</groupId>
        <artifactId>maven-tidy-plugin</artifactId>
        <version>1.0.0.beta-1</version>
        <configuration>
          <!--  REQUIRED: The source directory (input files)  --> 
          <sourceDir>${project.build.directory}/generated-html</sourceDir>
        </configuration>
      </plugin>
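In Maven, a plug-in declared without an <executions> section, as above, is generally not run by the build lifecycle on its own; it runs when its goal is invoked directly, for example with mvn tidy:tidy. To run the plug-in in a specific phase, the tidy goal can be bound through an execution. A sketch, in which the execution id tidy-generated-html and the process-resources phase are assumptions chosen only for illustration:

```xml
      <!--  Clean up (X)HTML and XML documents in an explicitly chosen phase  -->
      <plugin>
        <groupId>net.sourceforge.docbook-utils.maven-plugin</groupId>
        <artifactId>maven-tidy-plugin</artifactId>
        <version>1.0.0.beta-1</version>
        <executions>
          <execution>
            <!-- Hypothetical id and phase: choose a phase in which the input files already exist -->
            <id>tidy-generated-html</id>
            <phase>process-resources</phase>
            <goals>
              <goal>tidy</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <!--  REQUIRED: The source directory (input files)  -->
          <sourceDir>${project.build.directory}/generated-html</sourceDir>
        </configuration>
      </plugin>
```

Whichever phase is chosen, the files in sourceDir must already have been generated by an earlier phase or plug-in.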

An overview of the Plug-in is given on the Plug-in Usage page. Details of the configuration options are found on the goal tidy:tidy page.

Making DocBook content available in a Maven project site

This section describes how to add DocBook content to a Maven project site when project documentation is written in the DocBook format. The docbkx-maven-plugin converts the DocBook content into HTML. Once the DocBook content is converted, the generated HTML needs to be cleaned up with the maven-tidy-plugin. After that, the maven-site-plugin can generate the project site.

The XML code for the DocBook content, the pom.xml and the site.xml shown in the sections below is also available as the bundle docbook-example-hello-world_1.0.0.alpha-1.tar.gz.

Adding DocBook content

The docbkx-maven-plugin requires that the DocBook sources are placed in the directory ${project}/src/docbkx. Create the directory and add the following file article-hello-world.xml to it:

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
    "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">

<article  lang="en">
  <title>Hello World!</title>
  
  <section>
    <title>Welcome</title>
    <para>
        A small DocBook example. 
    </para>

    <para>
        The details of this example can be found in
        <ulink url="http://docbook-utils.sourceforge.net/maven-tidy-plugin_1.0/docbook/article-user-guide.html#section-maven-tidy-plugin-for-docbook-content">
          Making DocBook content available in a Maven project site
        </ulink>
        .
    </para>
  </section>
</article>

Adding the Required Plug-ins

Three Maven Plug-ins are involved in converting DocBook content into an HTML-based project site. Table 1, “The DocBook Process Flow”, gives an overview of how the DocBook source files are processed in three steps.

Table 1. The DocBook Process Flow

Process              Input Directory                      Output Directory
==================== ==================================== ====================================
docbkx-maven-plugin  src/docbkx                           target/docbkx/html
maven-tidy-plugin    target/docbkx/html                   target/generated-site/xhtml/docbook
maven-site-plugin    target/generated-site/xhtml/docbook  target/site/docbook

The only plug-in that needs to be configured for this process setup is the maven-tidy-plugin. Because it can process any (X)HTML and XML document and is not restricted to this specific setup, its sourceDir and destinationDir must be set. The replaceExtensionMap must also be configured: it changes the file extension from html to xhtml, which is required because the maven-site-plugin only picks up files with the xhtml extension.

Note

An additional directory, docbook, is appended to the destination directory. This gives all DocBook documents their own directory in the site, target/site/docbook. This keeps the site clearer and also reduces the chance of duplicate file names, as the maven-site-plugin generates pages from several source locations and from Maven reports.

The best place to configure the three Maven Plug-ins in the pom.xml is the <pluginManagement> section; the <plugins> section then only needs to reference them by groupId and artifactId, as seen below.

<?xml  version="1.0"  encoding="UTF-8"?>

<project  xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0  http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  
  <groupId>test</groupId>
  <artifactId>hello-world</artifactId>
  <version>1.0.0.alpha-1</version>
  
  <name>DocBook Hello World</name>
  <description>A small example, to show how DocBook content can be added to a Maven project site.</description>

  <pluginRepositories>
    <pluginRepository>
      <id>docbook-utils</id>
      <name>DocBook Utils</name>
      <url>http://docbook-utils.sourceforge.net/maven2</url>
      <layout>default</layout>
      <releases>
        <enabled>true</enabled>
        <updatePolicy>daily</updatePolicy>
        <checksumPolicy>warn</checksumPolicy>
      </releases>
    </pluginRepository>
  </pluginRepositories>

  <build>
    <pluginManagement>
      <plugins>
    
        <!--  Generate (X)HTML / PDF documents from DocBook content  -->
        <plugin>
          <groupId>com.agilejava.docbkx</groupId>
          <artifactId>docbkx-maven-plugin</artifactId>
          <version>2.0.10</version>
          <executions>
            <execution>
              <phase>pre-site</phase>
              <goals>
                <goal>generate-html</goal>
                <goal>generate-xhtml</goal>
                <goal>generate-pdf</goal>
              </goals>
            </execution>
          </executions>
          <configuration>
            <xincludeSupported>true</xincludeSupported>
          </configuration>
          <dependencies>
            <dependency>
              <groupId>org.docbook</groupId>
              <artifactId>docbook-xml</artifactId>
              <version>4.4</version>
              <scope>runtime</scope>
            </dependency>
          </dependencies>
        </plugin>

        <!--  Clean up the HTML generated by the docbkx-maven-plugin  -->
        <plugin>
          <groupId>net.sourceforge.docbook-utils.maven-plugin</groupId>
          <artifactId>maven-tidy-plugin</artifactId>
          <version>1.0.0.beta-1</version>
          <executions>
            <execution>
              <phase>pre-site</phase>
              <goals>
                <goal>tidy</goal>
              </goals>
            </execution>
          </executions>
          <configuration>
            <sourceDir>${project.build.directory}/docbkx/html</sourceDir>
            <destinationDir>${project.build.directory}/generated-site/xhtml/docbook</destinationDir>
            <replaceExtensionMap>
              <html>xhtml</html>
            </replaceExtensionMap>
            <jtidyConfiguration>
              <property>
                <name>output-encoding</name>
                <value>UTF-8</value>
              </property>
              <property>
                <name>output-xhtml</name>
                <value>true</value>
              </property>
              <property>
                <name>indent</name>
                <value>true</value>
              </property>
              <property>
                <name>wrap</name>
                <value>120</value>
              </property>
              <property>
                <name>write-back</name>
                <value>true</value>
              </property>
            </jtidyConfiguration>
          </configuration>
        </plugin>

        <!--  Generate Project Site  -->
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-site-plugin</artifactId>
          <version>2.1</version>
          <dependencies>
            <dependency>
              <groupId>org.apache.maven.doxia</groupId>
              <artifactId>doxia-module-xhtml</artifactId>
              <version>1.1.2</version>
            </dependency>
          </dependencies>
        </plugin>
      </plugins>
    </pluginManagement>
    
    <!--  Plug-ins used in this project  -->
    <plugins>

      <!--  Generate (X)HTML / PDF documents from DocBook content  -->
      <plugin>
        <groupId>com.agilejava.docbkx</groupId>
        <artifactId>docbkx-maven-plugin</artifactId>
      </plugin>

      <!--  Clean up the HTML generated by the docbkx-maven-plugin  -->
      <plugin>
        <groupId>net.sourceforge.docbook-utils.maven-plugin</groupId>
        <artifactId>maven-tidy-plugin</artifactId>
      </plugin>

      <!--  Generate Project Site  -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-site-plugin</artifactId>
      </plugin>
    </plugins>
  </build>

  <reporting>
    <plugins>
    
      <!--  Add the Maven project information reports  -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-project-info-reports-plugin</artifactId>
        <version>2.1.2</version>
        <reportSets>
          <reportSet>
            <reports>
              <report>index</report>
              <!--
              <report>dependencies</report>
              <report>project-team</report>
              <report>mailing-list</report>
              <report>cim</report>
              <report>issue-tracking</report>
              <report>license</report>
              <report>scm</report>
               -->
            </reports>
          </reportSet>
        </reportSets>
      </plugin>
    </plugins>
  </reporting>
</project>

Note

The docbkx-maven-plugin can also generate XHTML, but that generated XHTML also causes problems for the maven-site-plugin. The default phase in which the maven-tidy-plugin is executed is the pre-site phase.
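Given that pre-site is the default phase of the tidy goal, the <phase> element of the maven-tidy-plugin execution could, as a sketch, be left out; Maven then binds the goal to the default phase that the plug-in declares. This relies on the default phase stated in this note; the ordering rule in the warning below still applies, since the docbkx-maven-plugin also runs in pre-site.

```xml
        <!--  Clean up the HTML generated by the docbkx-maven-plugin  -->
        <plugin>
          <groupId>net.sourceforge.docbook-utils.maven-plugin</groupId>
          <artifactId>maven-tidy-plugin</artifactId>
          <version>1.0.0.beta-1</version>
          <executions>
            <execution>
              <!-- No <phase> element: the goal runs in its default phase (pre-site) -->
              <goals>
                <goal>tidy</goal>
              </goals>
            </execution>
          </executions>
          <!-- <configuration> unchanged from the pom.xml above -->
        </plugin>
```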

Warning

The maven-tidy-plugin must always run after the docbkx-maven-plugin. If they are executed in the same phase, make sure the maven-tidy-plugin is placed after the docbkx-maven-plugin in the pom.xml.

Note

In the reporting section, the maven-project-info-reports-plugin was configured only to reduce the number of generated reports, as they are not relevant for this example project.

An overview of the Plug-in is given on the Plug-in Usage page. Details of the configuration options are found on the goal tidy:tidy page.

Updating the Project Site Menu

Create a menu item in the site.xml (src/site/site.xml), so the DocBook content can be found directly from the project web site menu.

<?xml  version="1.0"  encoding="UTF-8"?>

<!--
  Configuring the Site Descriptor
  http://maven.apache.org/plugins/maven-site-plugin/examples/sitedescriptor.html 
 -->
<project  xmlns="http://maven.apache.org/DECORATION/1.0.0"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
        xsi:schemaLocation="http://maven.apache.org/DECORATION/1.0.0  http://maven.apache.org/xsd/decoration-1.0.0.xsd"
        name="${project.name}">
  <bannerLeft>
    <name>${project.name}</name>
    <href>${project.url}</href>
  </bannerLeft>
  
  <body>
    <menu  name="${project.name}">
      <item  name="About"        href="index.html" />
      <item  name="Hello World"  href="docbook/article-hello-world.html" />
    </menu>

    <!--
    <menu ref="reports" />
     -->
  </body>  
  
</project>

Note

The <menu ref="reports" /> was left out, as the number of reports was reduced to a minimum. The only generated report is About (index.html), which is referenced directly through <item name="About" href="index.html" />.

Generating the Maven Project Site

As the project's pom.xml depends on maven-site-plugin v2.1, this project requires Maven v2.1 or above.
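The minimum Maven version can also be recorded in the pom.xml itself. A sketch using the standard <prerequisites> element, placed directly under <project>. Maven 2.x refuses to build the project with an older version; Maven 3 no longer uses this element for that purpose outside plug-in projects, where the maven-enforcer-plugin is the usual alternative.

```xml
  <!--  Placed directly under <project>: minimum Maven version for this build  -->
  <prerequisites>
    <maven>2.1</maven>
  </prerequisites>
```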

Procedure 1. Steps to generate the Maven Project Site

The steps are given in detail below. They show what needs to take place; if something goes wrong, it is wise to run these steps one by one to see where the problem occurs.

  1. Generate HTML from the DocBook content

    $  mvn  docbkx:generate-html
    
  2. Convert the HTML generated by docbkx into XHTML

    $  mvn  tidy:tidy
    
  3. Generate the Maven project site

    $  mvn  site
    

Procedure 2. The Normal, Quick Way to Generate the Maven Project Site

  • Generate the Maven project site

    $  mvn  clean  site
    

JTidy

JTidy Configuration

An overview of the JTidy configuration keys and their default values, as of JTidy r938 (2009.12.01).

Name                        Type       Default Value
=========================== =========  ========================================
add-xml-decl                Boolean    no
add-xml-pi                  Boolean    no
add-xml-space               Boolean    no
alt-text                    String     
ascii-chars                 Boolean    yes
assume-xml-procins          Boolean    no
bare                        Boolean    no
break-before-br             Boolean    no
char-encoding               Encoding   ISO8859_1
clean                       Boolean    no
css-prefix                  Name       
doctype                     DocType    auto
drop-empty-paras            Boolean    yes
drop-font-tags              Boolean    no
drop-proprietary-attributes Boolean    no
enclose-block-text          Boolean    no
enclose-text                Boolean    no
error-file                  Name       
escape-cdata                Boolean    yes
fix-backslash               Boolean    yes
fix-bad-comments            Boolean    yes
fix-uri                     Boolean    yes
force-output                Boolean    no
gnu-emacs                   Boolean    no
hide-comments               Boolean    no
hide-endtags                Boolean    no
indent                      Indent     false
indent-attributes           Boolean    no
indent-cdata                Boolean    no
indent-spaces               Integer    2
input-encoding              Encoding   ISO8859_1
input-xml                   Boolean    no
join-classes                Boolean    no
join-styles                 Boolean    yes
keep-time                   Boolean    yes
language                    Name       
literal-attributes          Boolean    no
logical-emphasis            Boolean    no
lower-literals              Boolean    yes
markup                      Boolean    yes
ncr                         Boolean    yes
new-blocklevel-tags         Tag names  
new-empty-tags              Tag names  
new-inline-tags             Tag names  
new-pre-tags                Tag names  
newline                     Enum       lf
numeric-entities            Boolean    no
only-errors                 Boolean    no
output-encoding             Encoding   ASCII
output-html                 Boolean    no
output-raw                  Boolean    no
output-xhtml                Boolean    no
output-xml                  Boolean    no
quiet                       Boolean    no
quote-ampersand             Boolean    yes
quote-marks                 Boolean    no
quote-nbsp                  Boolean    yes
repeated-attributes         Enum       keep-last
replace-color               Boolean    no
show-body-only              Boolean    no
show-errors                 Integer    6
show-warnings               Boolean    yes
slide-style                 Name       
split                       Boolean    no
tab-size                    Integer    8
tidy-mark                   Boolean    yes
trim-empty-elements         Boolean    yes
uppercase-attributes        Boolean    no
uppercase-tags              Boolean    no
word-2000                   Boolean    no
wrap                        Integer    68
wrap-asp                    Boolean    yes
wrap-attributes             Boolean    no
wrap-jste                   Boolean    yes
wrap-php                    Boolean    yes
wrap-script-literals        Boolean    no
wrap-sections               Boolean    yes
write-back                  Boolean    no
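The keys in this table are passed to JTidy through the <jtidyConfiguration> element of the maven-tidy-plugin, in the same name/value form used in the pom.xml above. A sketch for tidying plain XML files instead of HTML; the selection of keys and the input directory generated-xml are assumptions chosen only for illustration.

```xml
        <configuration>
          <!-- Hypothetical input directory holding XML files -->
          <sourceDir>${project.build.directory}/generated-xml</sourceDir>
          <jtidyConfiguration>
            <!-- Parse the input as XML instead of HTML -->
            <property>
              <name>input-xml</name>
              <value>true</value>
            </property>
            <!-- Write the result as XML -->
            <property>
              <name>output-xml</name>
              <value>true</value>
            </property>
            <!-- Override the ISO8859_1 / ASCII defaults listed in the table -->
            <property>
              <name>input-encoding</name>
              <value>UTF-8</value>
            </property>
            <property>
              <name>output-encoding</name>
              <value>UTF-8</value>
            </property>
            <property>
              <name>indent</name>
              <value>true</value>
            </property>
          </jtidyConfiguration>
        </configuration>
```

Keys that are not set keep the default values listed in the table.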

Note

This document is based on the User Guide template from the Ready-to-use Software Engineering Templates.



[1] See the JTidy site for the exact details.