1.0.0.beta-1
Copyright © 2010 Team - Maven Tidy Plug-in
The Maven Tidy Plug-in was created to clean up (X)HTML and XML documents as part of a Maven build. Through configuration, it specializes in preparing XHTML files for a Maven-generated project site. One case where this is required is when HTML has been generated from DocBook content through the docbkx-maven-plugin: the generated HTML is not 100% correctly formatted, and JTidy is used to clean it up. This plug-in makes JTidy available as part of the Maven build process.
The Maven Tidy Plug-in can clean up HTML, XHTML and XML documents [1]. The section called “Cleaning up (X)HTML and XML source files” describes how to add and configure the plug-in for general use.
The section called “Making DocBook content available in a Maven project site” shows all the details, from setting up DocBook content and adding the plug-ins to the pom.xml, to generating DocBook content in a project site.
The Maven Tidy Plug-in is not yet available through the central Maven Repository, but can be found in the project-specific Maven Repository http://docbook-utils.sourceforge.net/maven2.
This Maven Repository can be added to a Maven build environment in a few different ways. The most common options are listed below.
Add the <pluginRepository> lines directly to the project pom.xml
Single user - Single project
Add the <pluginRepository> lines to a <profile> in the ~/.m2/settings.xml
Single user - Multiple projects
Add the repository http://docbook-utils.sourceforge.net/maven2 to a Maven Repository Proxy, like Nexus, Artifactory, etc.
Multiple users - Multiple projects
<pluginRepositories>
  <pluginRepository>
    <id>docbook-utils</id>
    <name>DocBook Utils</name>
    <url>http://docbook-utils.sourceforge.net/maven2</url>
    <layout>default</layout>
    <releases>
      <enabled>true</enabled>
      <updatePolicy>daily</updatePolicy>
      <checksumPolicy>warn</checksumPolicy>
    </releases>
  </pluginRepository>
</pluginRepositories>
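For the "Single user - Multiple projects" option, the same <pluginRepository> lines can be wrapped in a profile. The following is a minimal sketch of a ~/.m2/settings.xml, not a tested configuration. The profile id docbook-utils is an arbitrary name chosen for illustration, and the <activeProfiles> entry is one possible way to switch the profile on for every build.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<settings>
  <profiles>
    <profile>
      <!-- Hypothetical profile id; any unique name will do -->
      <id>docbook-utils</id>
      <pluginRepositories>
        <pluginRepository>
          <id>docbook-utils</id>
          <name>DocBook Utils</name>
          <url>http://docbook-utils.sourceforge.net/maven2</url>
          <layout>default</layout>
          <releases>
            <enabled>true</enabled>
            <updatePolicy>daily</updatePolicy>
            <checksumPolicy>warn</checksumPolicy>
          </releases>
        </pluginRepository>
      </pluginRepositories>
    </profile>
  </profiles>

  <!-- Activate the profile for all builds of this user -->
  <activeProfiles>
    <activeProfile>docbook-utils</activeProfile>
  </activeProfiles>
</settings>
```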
The files can also be downloaded from the project page on SourceForge.
FIXME: Test if this indeed works (without the execution part).
FIXME: Add a TIP on how to let the plug-in run in another phase.
<!-- Clean up (X)HTML and XML documents -->
<plugin>
  <groupId>net.sourceforge.docbook-utils.maven-plugin</groupId>
  <artifactId>maven-tidy-plugin</artifactId>
  <version>1.0.0.beta-1</version>
  <configuration>
    <!-- REQUIRED: The source directory (input files) -->
    <sourceDir>${project.build.directory}/generated-html</sourceDir>
  </configuration>
</plugin>
An overview of the Plug-in is given on the Plug-in Usage page. Details of the configuration options are found on the goal tidy:tidy page.
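To let the plug-in run in a phase other than its default, the tidy goal can probably be bound through a standard Maven <executions> block, in the same way the DocBook example later in this guide binds it to pre-site. The fragment below is an untested sketch: the execution id tidy-generated-html and the phase process-resources are assumptions chosen for illustration only.

```xml
<plugin>
  <groupId>net.sourceforge.docbook-utils.maven-plugin</groupId>
  <artifactId>maven-tidy-plugin</artifactId>
  <version>1.0.0.beta-1</version>
  <executions>
    <execution>
      <!-- Hypothetical execution id -->
      <id>tidy-generated-html</id>
      <!-- Assumed phase; replace with the phase that suits the build -->
      <phase>process-resources</phase>
      <goals>
        <goal>tidy</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <!-- REQUIRED: The source directory (input files) -->
    <sourceDir>${project.build.directory}/generated-html</sourceDir>
  </configuration>
</plugin>
```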
This section describes how to add DocBook content to a Maven project site when the project documentation is written in the DocBook format. The docbkx-maven-plugin is used to convert the DocBook content into HTML. Once the DocBook content is converted, the generated HTML needs to be cleaned up with the maven-tidy-plugin. After that, the maven-site-plugin can generate the project site.
The XML code for the DocBook content, the pom.xml and the site.xml shown in the sections below is also available as a bundle: docbook-example-hello-world_1.0.0.alpha-1.tar.gz.
The docbkx-maven-plugin requires that the DocBook sources are placed in the directory ${project}/src/docbkx. Create the directory and add the following file article-hello-world.xml to it:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
  "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">
<article lang="en">
  <title>Hello World!</title>
  <section>
    <title>Welcome</title>
    <para>
      A small DocBook example.
    </para>
    <para>
      The details of this example can be found in
      <ulink url="http://docbook-utils.sourceforge.net/maven-tidy-plugin_1.0/docbook/article-user-guide.html#section-maven-tidy-plugin-for-docbook-content">
        Making DocBook content available in a Maven project site
      </ulink>
      .
    </para>
  </section>
</article>
Three Maven plug-ins are involved in converting DocBook content into an HTML-based project site. Table 1, “The DocBook Process Flow” gives an overview of how the DocBook source files are processed in three steps.
Table 1. The DocBook Process Flow
Process | Input Directory | Output Directory |
---|---|---|
docbkx-maven-plugin | src/docbkx | target/docbkx/html |
maven-tidy-plugin | target/docbkx/html | target/generated-site/xhtml/docbook |
maven-site-plugin | target/generated-site/xhtml/docbook | target/site/docbook |
The only plug-in that needs to be configured for this process setup is the maven-tidy-plugin. This is because the plug-in can convert any (X)HTML and XML document and is not restricted to this specific setup. Therefore the sourceDir and destinationDir need to be set. The replaceExtensionMap also needs to be configured here. It changes the file extension from html to xhtml, which is needed because the maven-site-plugin only picks up files with the extension xhtml.
An additional directory, docbook, is added at the end of the destination directory. This gives all the DocBook documents their own docbook directory under the site directory, target/site/docbook. This is clearer, and it also reduces the possibility of duplicate file names, as the maven-site-plugin generates pages from several locations and Maven reports.
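Isolated from the full pom.xml in this section, the three settings discussed above look roughly as follows. This is only an excerpt of the <configuration> element of the maven-tidy-plugin, with the values taken from the process flow in Table 1. It is not a complete plug-in declaration.

```xml
<configuration>
  <!-- Input: the HTML generated by the docbkx-maven-plugin -->
  <sourceDir>${project.build.directory}/docbkx/html</sourceDir>
  <!-- Output: picked up by the maven-site-plugin; note the extra docbook directory -->
  <destinationDir>${project.build.directory}/generated-site/xhtml/docbook</destinationDir>
  <!-- Rename *.html to *.xhtml, the extension the maven-site-plugin looks for -->
  <replaceExtensionMap>
    <html>xhtml</html>
  </replaceExtensionMap>
</configuration>
```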
The best place to add the three Maven Plug-ins in the pom.xml is under the <pluginManagement> section as seen below.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>test</groupId>
  <artifactId>hello-world</artifactId>
  <version>1.0.0.alpha-1</version>
  <name>DocBook Hello World</name>
  <description>A small example, to show how DocBook content can be added to an Maven project site.</description>

  <pluginRepositories>
    <pluginRepository>
      <id>docbook-utils</id>
      <name>DocBook Utils</name>
      <url>http://docbook-utils.sourceforge.net/maven2</url>
      <layout>default</layout>
      <releases>
        <enabled>true</enabled>
        <updatePolicy>daily</updatePolicy>
        <checksumPolicy>warn</checksumPolicy>
      </releases>
    </pluginRepository>
  </pluginRepositories>

  <build>
    <pluginManagement>
      <plugins>
        <!-- Generate (X)HTML / PDF documents from DocBook content -->
        <plugin>
          <groupId>com.agilejava.docbkx</groupId>
          <artifactId>docbkx-maven-plugin</artifactId>
          <version>2.0.10</version>
          <executions>
            <execution>
              <phase>pre-site</phase>
              <goals>
                <goal>generate-html</goal>
                <goal>generate-xhtml</goal>
                <goal>generate-pdf</goal>
              </goals>
            </execution>
          </executions>
          <configuration>
            <xincludeSupported>true</xincludeSupported>
          </configuration>
          <dependencies>
            <dependency>
              <groupId>org.docbook</groupId>
              <artifactId>docbook-xml</artifactId>
              <version>4.4</version>
              <scope>runtime</scope>
            </dependency>
          </dependencies>
        </plugin>

        <!-- Clean up the, by docbkx-maven-plugin, generated HTML -->
        <plugin>
          <groupId>net.sourceforge.docbook-utils.maven-plugin</groupId>
          <artifactId>maven-tidy-plugin</artifactId>
          <version>1.0.0.beta-1</version>
          <executions>
            <execution>
              <phase>pre-site</phase>
              <goals>
                <goal>tidy</goal>
              </goals>
            </execution>
          </executions>
          <configuration>
            <sourceDir>${project.build.directory}/docbkx/html</sourceDir>
            <destinationDir>${project.build.directory}/generated-site/xhtml/docbook</destinationDir>
            <replaceExtensionMap>
              <html>xhtml</html>
            </replaceExtensionMap>
            <jtidyConfiguration>
              <property>
                <name>output-encoding</name>
                <value>UTF-8</value>
              </property>
              <property>
                <name>output-xhtml</name>
                <value>true</value>
              </property>
              <property>
                <name>indent</name>
                <value>true</value>
              </property>
              <property>
                <name>wrap</name>
                <value>120</value>
              </property>
              <property>
                <name>write-back</name>
                <value>true</value>
              </property>
            </jtidyConfiguration>
          </configuration>
        </plugin>

        <!-- Generate Project Site -->
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-site-plugin</artifactId>
          <version>2.1</version>
          <dependencies>
            <dependency>
              <groupId>org.apache.maven.doxia</groupId>
              <artifactId>doxia-module-xhtml</artifactId>
              <version>1.1.2</version>
            </dependency>
          </dependencies>
        </plugin>
      </plugins>
    </pluginManagement>

    <!-- Plug-in used in this project -->
    <plugins>
      <!-- Generate (X)HTML / PDF documents from DocBook content -->
      <plugin>
        <groupId>com.agilejava.docbkx</groupId>
        <artifactId>docbkx-maven-plugin</artifactId>
      </plugin>
      <!-- Clean up the, by docbkx-maven-plugin, generated HTML -->
      <plugin>
        <groupId>net.sourceforge.docbook-utils.maven-plugin</groupId>
        <artifactId>maven-tidy-plugin</artifactId>
      </plugin>
      <!-- Generate Project Site -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-site-plugin</artifactId>
      </plugin>
    </plugins>
  </build>

  <reporting>
    <plugins>
      <!-- Add the Maven project information reports -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-project-info-reports-plugin</artifactId>
        <version>2.1.2</version>
        <reportSets>
          <reportSet>
            <reports>
              <report>index</report>
              <!--
              <report>dependencies</report>
              <report>project-team</report>
              <report>mailing-list</report>
              <report>cim</report>
              <report>issue-tracking</report>
              <report>license</report>
              <report>scm</report>
              -->
            </reports>
          </reportSet>
        </reportSets>
      </plugin>
    </plugins>
  </reporting>
</project>
The docbkx-maven-plugin can indeed also generate XHTML, but that generated XHTML also causes problems for the maven-site-plugin. The default phase in which the maven-tidy-plugin is executed is the pre-site phase.
The maven-tidy-plugin must always run after the docbkx-maven-plugin, so if they are executed in the same phase, make sure the maven-tidy-plugin is placed after the docbkx-maven-plugin in the pom.xml.
The maven-project-info-reports-plugin was added to the reporting section only to reduce the number of generated reports, as they are not relevant for this example project.
An overview of the Plug-in is given on the Plug-in Usage page. Details of the configuration options are found on the goal tidy:tidy page.
Create a menu item in the site.xml, so the DocBook content can be directly found on the project web site.
<?xml version="1.0" encoding="UTF-8"?>
<!--
  Configuring the Site Descriptor
  http://maven.apache.org/plugins/maven-site-plugin/examples/sitedescriptor.html
-->
<project xmlns="http://maven.apache.org/xsd/decoration-1.0.0.xsd"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/xsd/decoration-1.0.0 http://maven.apache.org/xsd/decoration-1.0.0.xsd"
         name="${project.name}">
  <bannerLeft>
    <name>${project.name}</name>
    <href>${project.url}</href>
  </bannerLeft>
  <body>
    <menu name="${project.name}">
      <item name="About" href="index.html" />
      <item name="Hello World" href="docbook/article-hello-world.html" />
    </menu>
    <!-- <menu ref="reports" /> -->
  </body>
</project>
The <menu ref="reports" /> was left out, as the number of reports was reduced to a minimum. The only generated report is About (index.html), which is referenced directly through <item name="About" href="index.html" />.
Because the project's pom.xml depends on maven-site-plugin v2.1, this project requires Maven v2.1 or above.
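One way to make this requirement explicit is the <prerequisites> element of the POM. The fragment below is a sketch and is not part of the example pom.xml in this guide. It assumes 2.1.0 as the minimum Maven version and would be placed directly under <project>.

```xml
<!-- Assumption: not present in the example pom.xml; added for illustration -->
<prerequisites>
  <!-- Minimum Maven version required to build this project -->
  <maven>2.1.0</maven>
</prerequisites>
```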
Procedure 1. Steps to generate the Maven Project Site
The steps are given in detail below. If something goes wrong, it is wise to run these steps one by one to see where the problem occurs.
Generate HTML from the DocBook content
$ mvn docbkx:generate-html
Convert the HTML generated by docbkx into XHTML
$ mvn tidy:tidy
Generate the Maven project site
$ mvn site
Procedure 2. The Normal and Quick way for generating the Maven Project Site
Generate the Maven project site
$ mvn clean site
An overview of the JTidy configuration keys and their default values, as of JTidy r938 (2009.12.01).
Name                        Type      Current Value
=========================== ========= ========================================
add-xml-decl                Boolean   no
add-xml-pi                  Boolean   no
add-xml-space               Boolean   no
alt-text                    String
ascii-chars                 Boolean   yes
assume-xml-procins          Boolean   no
bare                        Boolean   no
break-before-br             Boolean   no
char-encoding               Encoding  ISO8859_1
clean                       Boolean   no
css-prefix                  Name
doctype                     DocType   auto
drop-empty-paras            Boolean   yes
drop-font-tags              Boolean   no
drop-proprietary-attributes Boolean   no
enclose-block-text          Boolean   no
enclose-text                Boolean   no
error-file                  Name
escape-cdata                Boolean   yes
fix-backslash               Boolean   yes
fix-bad-comments            Boolean   yes
fix-uri                     Boolean   yes
force-output                Boolean   no
gnu-emacs                   Boolean   no
hide-comments               Boolean   no
hide-endtags                Boolean   no
indent                      Indent    false
indent-attributes           Boolean   no
indent-cdata                Boolean   no
indent-spaces               Integer   2
input-encoding              Encoding  ISO8859_1
input-xml                   Boolean   no
join-classes                Boolean   no
join-styles                 Boolean   yes
keep-time                   Boolean   yes
language                    Name
literal-attributes          Boolean   no
logical-emphasis            Boolean   no
lower-literals              Boolean   yes
markup                      Boolean   yes
ncr                         Boolean   yes
new-blocklevel-tags         Tag names
new-empty-tags              Tag names
new-inline-tags             Tag names
new-pre-tags                Tag names
newline                     Enum      lf
numeric-entities            Boolean   no
only-errors                 Boolean   no
output-encoding             Encoding  ASCII
output-html                 Boolean   no
output-raw                  Boolean   no
output-xhtml                Boolean   no
output-xml                  Boolean   no
quiet                       Boolean   no
quote-ampersand             Boolean   yes
quote-marks                 Boolean   no
quote-nbsp                  Boolean   yes
repeated-attributes         Enum      keep-last
replace-color               Boolean   no
show-body-only              Boolean   no
show-errors                 Integer   6
show-warnings               Boolean   yes
slide-style                 Name
split                       Boolean   no
tab-size                    Integer   8
tidy-mark                   Boolean   yes
trim-empty-elements         Boolean   yes
uppercase-attributes        Boolean   no
uppercase-tags              Boolean   no
word-2000                   Boolean   no
wrap                        Integer   68
wrap-asp                    Boolean   yes
wrap-attributes             Boolean   no
wrap-jste                   Boolean   yes
wrap-php                    Boolean   yes
wrap-script-literals        Boolean   no
wrap-sections               Boolean   yes
write-back                  Boolean   no
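Any key from this overview can presumably be overridden through the <jtidyConfiguration> element of the plug-in, using the same <property> name and value pairs as in the DocBook example pom.xml. The fragment below is an illustrative sketch only. The choice of keys is arbitrary, and it assumes that false is accepted for Boolean keys in the same way the example uses true, and that input-encoding accepts UTF-8 just as output-encoding does.

```xml
<jtidyConfiguration>
  <!-- Assumed value: read the input as UTF-8 instead of the default ISO8859_1 -->
  <property>
    <name>input-encoding</name>
    <value>UTF-8</value>
  </property>
  <!-- Assumed value: switch off the tidy-mark setting, which defaults to yes -->
  <property>
    <name>tidy-mark</name>
    <value>false</value>
  </property>
  <!-- Widen the wrap column from the default 68, as in the example pom.xml -->
  <property>
    <name>wrap</name>
    <value>120</value>
  </property>
</jtidyConfiguration>
```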