Mr. XML Publisher provides a Java web-app interface to server-side XML processing capabilities and exposes those capabilities as formatting services.
XML processing has traditionally been performed using command line tools. The use of those tools is the most powerful way to process XML, and that is unlikely to change soon. The trick has always been how to expose the power of those tools to end users over a networked server. From within a web container, Mr. XML Publisher does exactly that, and it makes doing so easy by providing a rich set of administrative controls and data access to nearly every native XML database or relational database on the market.
When you use any XML publishing system, you make possible the enforcement of consistent formatting and the removal of formatting duties from your writers' responsibilities. Depending on the system's data access mechanisms, XML publishing systems can also make it easy to re-use content.
When you use Mr. XML Publisher, you centralize on a server the executables and config files that might otherwise exist on the desktop machines of many writers. Without a centralized server, the individual efforts required for installation, configuration, and any necessary synchronization can be very expensive.
You can try the Mr. XML Publisher for DocBook demo at http://swhitlat.com/XML_Publisher. The demo allows optional use of XSL customization layers and all its data pullers are activated.
Features
Mr. XML Publisher imposes almost no limits of any kind. It will access whatever libraries and run whatever executables it's instructed to access or run, and it imposes no per-user or per-CPU licensing restrictions.
End users select a format and upload an XML project. Mr. XML Publisher accepts the uploaded project and runs an XML processing tool chain that an administrator has specified for the format. Mr. XML Publisher runs each tool-chain command as an external subprocess managed by a Java virtual machine and then returns the formatted output.
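As a rough illustration of what "running each tool-chain command as an external subprocess" can look like from inside a JVM, here is a minimal sketch built on the standard java.lang.ProcessBuilder class. It is not Mr. XML Publisher's actual code (the servlet source is not part of this page); the class name ToolChainSketch and the method runToolChain are invented for the example, and Redirect.DISCARD assumes Java 9 or later.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

// Hypothetical sketch; names are illustrative, not taken from Mr. XML Publisher.
final class ToolChainSketch {

    /**
     * Runs each command of a tool chain, in order, inside workDir.
     * Returns 0 if every command succeeded; otherwise returns the exit
     * code of the first failing command and skips the remaining ones.
     */
    public static int runToolChain(List<List<String>> commands, File workDir)
            throws IOException, InterruptedException {
        for (List<String> command : commands) {
            ProcessBuilder builder = new ProcessBuilder(command);
            builder.directory(workDir);
            // Merge stderr into stdout and discard both so the child can
            // never block on a full pipe. A real service would log them.
            builder.redirectErrorStream(true);
            builder.redirectOutput(ProcessBuilder.Redirect.DISCARD);
            Process process = builder.start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                return exitCode;
            }
        }
        return 0;
    }
}
```

A DocBook-to-PDF chain might then be expressed as two commands, for example ["xsltproc", "-o", "book.fo", "fo.xsl", "book.xml"] followed by ["fop", "-fo", "book.fo", "-pdf", "book.pdf"]. Those file names are placeholders, and the commands an installation really uses are whatever the administrator configures for the format.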
Mr. XML Publisher is designed to work with any DTD, schema, or XSL, but it comes "out-of-the-box" ready to process DocBook XML using DocBook XSL. Mr. XML Publisher will work "out-of-the-box" only if you already have a DocBook XML publishing system and a web container in place and working on the target server. Therefore, with each license I provide up to fifteen hours of consultation, configuration, and custom programming to get your Mr. XML Publisher installation running the way you want it to run (cumulative limit of fifteen hours, by phone, email, and remote login). Note: I have not run Mr. XML Publisher with any XML/XSL combination other than DocBook XML formatted with DocBook XSL. If you have your own custom DTD/schema and XSL, Mr. XML Publisher can be made to work for you too. But it may take me more than fifteen hours to make that happen and I will need to charge you for the extra time.
Source code files for all JSP pages, the Java applet, JavaScript, HTML, and CSS are provided with a license. You may change them in any way you like, including re-naming "Mr. XML Publisher" to "Mrs. XML Publisher" or anything else. Customers are encouraged to use their local programming talent for modifications to the user interface. Source code for servlets and other classes is not provided under normal licensing terms but may be negotiated for sites with multiple licenses.
Any DTD, Schema, or XSL
Mr. XML Publisher imposes no limitations on what DTD, schema, or XSL is used. It can use XSL uploaded by users, pull the XSL from a database/repository, or use XSL residing on a server disk.
Any Transformer
Use any XSL transformer (xsltproc, Xalan, Saxon, etc.) to create HTML, FO, PostScript, PDF, CHM, or any other format for which you have XSL. Note: you will need the appropriate XSL to perform the transform; Mr. XML Publisher does not provide any XSL.
Data Access
As part of the formatting process, Mr. XML Publisher can include XML or textual data pulled from a native XML or relational database server. The data pulled can then be included in an XML project via <include> elements in the project's XML.
Mr. XML Publisher provides custom data pullers that have been tested with:
- IBM DB2 9.1
- Oracle 10g 10.2
- MS SQL Server 2005
- Sybase 15.0.2
- MySQL 5.0.41
- Tamino XML Server 4.4.1
- XHive 7.5.6
- MarkLogic 3.2-1
- TigerLogic XDMS 2.6.4
- XStreamDB 3.2
- eXist 1.1.1
- Xindice 1.1
- Sedna 2.0
For each server type listed above, a custom data puller takes full advantage of any XML-specific features, such as running XQuery queries. To pull data from an XML or relational database not listed, I can probably write a custom puller if the target database provides a Java API.
All custom data pullers are available with every Mr. XML Publisher installation. There are no required <ejb-ref>, <ejb-local-ref>, or <env-entry> elements to configure in Mr. XML Publisher's web.xml file. Mr. XML Publisher's data pullers operate independently of any JNDI mechanisms you have set up for your container or JEE server. Activating the data pullers requires no configuration related to the container or JEE app server in which Mr. XML Publisher runs.
When end users need to include pulled data in a project, they include a special file named "IncludeMap.xml" that specifies exactly what data needs to be pulled and what filenames are needed to satisfy the project's <include> elements. Mr. XML Publisher parses the project's IncludeMap.xml file and takes instructions from the file's content. Those instructions include a connection string and an SQL/XQuery string. As you plan your Mr. XML Publisher deployment, you may want to plan for providing assistance to those writers who will need help constructing connection and SQL/XQuery strings.
Automatic Connection Pooling
Connection pooling works automatically. There is no need for special integration with your container or JEE server, and there is no special configuration required in Mr. XML Publisher's web.xml. Mr. XML Publisher uses the Apache Jakarta DBCP libraries internally to automatically provide connection pooling for the following:
- IBM DB2
- Oracle
- MS SQL Server
- Sybase
- MySQL
Several of the supported native XML databases provide their own connection pooling, and Mr. XML Publisher takes advantage of those facilities whenever available. For example, the custom data puller for Tamino XML Server provides connection pooling through Tamino's proprietary connection pooling libraries.
Rich Administrative Controls
Administrators control Mr. XML Publisher by editing its web.xml file. Every Java web app has a web.xml file, and that file is called the "deployment descriptor." Most administration of Mr. XML Publisher is performed via the deployment descriptor's <context-param> elements.
Using <context-param> elements in the deployment descriptor, administrators can specify:
- A limit to the maximum number of concurrent requests.
The maximum number of concurrent requests should be less than or equal to the number of CPUs in the server machine. Hyperthreading counts as 1/2 CPU.
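One common way to enforce a cap on concurrent requests in Java is a counting semaphore. The sketch below shows that general technique only; the class RequestLimiter and its methods are hypothetical, and nothing here is claimed to be how Mr. XML Publisher implements its limit.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch of a concurrent-request cap; not Mr. XML Publisher code.
final class RequestLimiter {
    private final Semaphore slots;

    RequestLimiter(int maxConcurrentRequests) {
        this.slots = new Semaphore(maxConcurrentRequests);
    }

    /** Returns true if the request may proceed, false if the server is full. */
    public boolean tryEnter() {
        return slots.tryAcquire();
    }

    /** Must be called exactly once for every successful tryEnter(). */
    public void leave() {
        slots.release();
    }
}
```

A request handler would call tryEnter() before starting the tool chain and call leave() in a finally block afterward. When choosing the limit, keep in mind that Runtime.getRuntime().availableProcessors() reports logical processors, so on a hyper-threaded machine it counts every hardware thread and needs to be discounted by hand to follow the guideline above.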
- A limit to the maximum number of data pulls per-request.
Mr. XML Publisher does not impose any hard-coded limit on the number of data pulls and it does not dictate any restrictions as to the combination or order of data pulls.
- A limit to the maximum size allowed for any single data pull.
Once Mr. XML Publisher detects that a returned result set is over the limit, it ceases servicing of the request and sends the user an error message.
- A log level and a log file location.
Mr. XML Publisher uses log4j to log all application-generated messages. Administrators can specify a location for the log4j.properties file in Mr. XML Publisher's web.xml file. The log4j.properties file instructs log4j as to what log level to use ("info", "warn", "error", or "debug") and all other log4j details.
- Whether logging for performance timing is on/off.
Performance timing includes log entries for the time taken to fully service each request and for the time taken to complete each data pull. So as to get more accurate results, performance timing log entries are sent to System.out, not the log4j log file.
- The name and location of a temporary directory.
Uploaded projects are unpacked and processed in this directory. I recommend that this temporary directory be located on a RAM disk.
- An upper file-size threshold beyond which projects are written to disk before unpacking.
Whenever possible, Mr. XML Publisher processes uploaded projects in memory. However, depending on how much memory your system has, you might want to write uploaded project data to disk earlier than Mr. XML Publisher would otherwise if memory were abundant.
- A limit for the maximum size of an uploaded zipped project.
By default, the maximum size allowed for a zipped project is 1048576000 bytes (~1GB), but it's an arbitrary limit that in most cases administrators are expected to override.
- Allowed filename extensions.
Based on filename extensions, Mr. XML Publisher can accept or reject files contained in uploaded projects. Rejected files are never written to disk and projects containing prohibited files are deleted before beginning any processing.
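To make the idea of rejecting files before they ever reach the disk concrete, the following sketch scans a zipped project held in memory and checks every entry's extension against an allow-list. It uses only java.util.zip from the standard library. The class ExtensionCheck and the method allEntriesAllowed are invented for this example and make no claim about Mr. XML Publisher's internals.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Locale;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical sketch; names are illustrative, not taken from Mr. XML Publisher.
final class ExtensionCheck {

    /**
     * Returns true only if every file entry in the zip has an extension
     * contained in allowedExtensions (lowercase, without the dot).
     * Nothing is written to disk while checking.
     */
    public static boolean allEntriesAllowed(byte[] zipBytes, Set<String> allowedExtensions)
            throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // Zip entry names use '/' as the separator on every platform.
                String path = entry.getName();
                String fileName = path.substring(path.lastIndexOf('/') + 1);
                int dot = fileName.lastIndexOf('.');
                String extension = dot < 0
                        ? ""
                        : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
                if (!allowedExtensions.contains(extension)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

With an allow-list such as "xml", "xsl", and "png", a project containing a shell script or an executable would fail the check and could be discarded before unpacking.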
- Allowed mime types.
Based on a file's mime type, Mr. XML Publisher can accept or reject it. As with filename extension checks, files failing a mime type check are never written to disk, and projects containing prohibited files are deleted before any processing begins (this is the default behavior, and it can be changed). Mr. XML Publisher uses jMimeMagic to determine the mime type of each file in an uploaded project.
- Permissions and ownership of unpacked project files.
Setting file permissions and ownership is done differently for Windows and Unix/Linux. On Windows, the administrator specifies a CACLS command to be run. On Unix/Linux, the administrator specifies both a chmod command and a chown command. These commands are run immediately after unpacking an uploaded project.
- An entire custom ENVIRONMENT in which to run each command.
By default, each command that Mr. XML Publisher runs in an XML processing tool-chain inherits the ENVIRONMENT of the process that started it. That ENVIRONMENT is the default ENVIRONMENT of the user under whose name the container or JEE server is running. Administrators can change that by defining a different set of ENVIRONMENT variables in web.xml. Mr. XML Publisher can then pass the new ENVIRONMENT to the external subprocess.
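In plain Java, replacing the inherited environment of a subprocess comes down to editing the map returned by ProcessBuilder.environment(). The sketch below illustrates that mechanism. The class CustomEnvironment and the method withEnvironment are hypothetical names, and the way Mr. XML Publisher reads the variables from web.xml is not shown because it is not documented here.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names are illustrative, not taken from Mr. XML Publisher.
final class CustomEnvironment {

    /**
     * Builds a ProcessBuilder whose child process will see only the
     * variables in customEnv instead of the environment inherited from
     * the container's JVM.
     */
    public static ProcessBuilder withEnvironment(List<String> command,
                                                 Map<String, String> customEnv) {
        ProcessBuilder builder = new ProcessBuilder(command);
        // environment() starts out as a copy of the current JVM's environment.
        Map<String, String> env = builder.environment();
        env.clear();
        env.putAll(customEnv);
        return builder;
    }
}
```

For example, an administrator who wants xsltproc to resolve DTDs through a particular catalog could supply XML_CATALOG_FILES together with a PATH entry. Because the inherited environment is cleared, any variable the tool needs (PATH, JAVA_HOME, and so on) has to be listed explicitly.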
- Whether to retain uploaded projects.
Administrators can instruct Mr. XML Publisher to retain uploaded projects for all users, just for specific users, or not at all. By default, Mr. XML Publisher deletes all project files upon completing servicing of a request. However, administrators may want to retain uploaded projects along with the results of processing.
- Whether to allow data pulls.
Data pulls are allowed or not allowed for all users as a group. You cannot turn on/off Mr. XML Publisher's data access mechanisms for specific users. If a user has a valid connection string and a runnable query, by simply including an IncludeMap.xml file in an uploaded project, that user can include pulled data as part of the formatting process.
- A default timeout for data queries, after which they are cancelled.
Once a database query is executed, the thread running the query is mostly beyond control until the method call returns. But it is entirely possible for a query to never return or just take longer than is acceptable. Some database drivers provide a timeout setting, as is the case with Oracle's driver. However, even for JDBC-compliant drivers, each vendor's implementation is incomplete and unreliable. Mr. XML Publisher solves this problem. Administrators just specify a time limit in seconds beyond which a long-running query is cancelled or abandoned and local resources are recovered.
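A driver-independent way to bound a query's running time is to run it on a worker thread and wait on a Future with a timeout. The sketch below shows that general pattern with java.util.concurrent. It is an assumption about how such a limit can be built, not a description of Mr. XML Publisher's implementation, and the names QueryTimeout and runWithTimeout are invented.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; names are illustrative, not taken from Mr. XML Publisher.
final class QueryTimeout {

    /**
     * Runs the query on a separate daemon thread and waits at most the
     * given time. On timeout the worker is interrupted and abandoned,
     * and a TimeoutException is thrown to the caller.
     */
    public static <T> T runWithTimeout(Callable<T> query, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread worker = new Thread(runnable, "data-pull");
            worker.setDaemon(true); // an abandoned query must not block JVM shutdown
            return worker;
        });
        Future<T> future = executor.submit(query);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Interrupts the worker; a driver call blocked on I/O may ignore
            // the interrupt, in which case the thread is simply abandoned.
            future.cancel(true);
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

The caller regains control after the time limit whether or not the driver cooperates, which matches the "cancelled or abandoned" wording above. Releasing the underlying connection is a separate step that this sketch leaves out.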
- Turn internal connection pooling on or off.
Administrators can disable automatic internal connection pooling for the JDBC-compliant databases. However, the speed at which data pulls are performed is optimized when internal connection pooling is turned on, and I recommend that it always be on. Internal connection pooling for Tamino XML Server and some of the other native XML databases cannot be turned off.
- XML, XSL, or any textual data to pull on web-app startup.
For some data pulls that are performed repeatedly, it can be useful to pull the data once and write it to disk. Pulling the data just once upon web-app startup can help to more effectively use your local resources.
Requirements
Automated publishing of XML in a client-server environment requires a bit more software, hardware, and administrative expertise than would be required if users were to publish XML from desktop machines.
Software
Mr. XML Publisher is currently developed on Linux and Windows using Tomcat 5.5/6.0. It will run under any operating system that hosts an appropriate servlet container. Mr. XML Publisher requires a servlet container compliant with the Servlet 2.4 specification (or higher). It does not require a full JEE application server; nonetheless, Mr. XML Publisher has been successfully tested on the following JEE application servers:
- JBoss 4.2.1.GA
- IBM WebSphere Application Server 6.1
- BEA WebLogic Server 10.0
- GlassFish V2 / Sun Java System Application Server 9.1
- Geronimo 2.0.1 (both Jetty6 and Tomcat6)
Mr. XML Publisher depends on several open-source software packages, all of which are governed by the terms of the Apache Software Foundation License or the GNU Lesser General Public License.
Hardware
Mr. XML Publisher simultaneously runs multiple processes, and it runs multiple threads within many of those processes. Thus, it takes full advantage of all your server's CPUs, CPU cores, and hyper-threading capabilities.
Mr. XML Publisher will run on a desktop class machine, or even a notebook, but it is not designed for those machines. Nor is it designed to run in a distributed environment. I have never even attempted to run Mr. XML Publisher in a distributed environment. Mr. XML Publisher is designed to run on a single server with multiple CPUs. If it is expected to service continuous requests, that server should be a dedicated server.
At the low end, for example, I suggest a dual Xeon machine running hyper-threaded CPUs at 2.4GHz, with 2GB of RAM. Mr. XML Publisher was developed on a dual 1.7GHz Xeon machine with 2GB of RAM. The Mr. XML Publisher for DocBook demo at http://swhitlat.com/XML_Publisher runs on a dual 2.4GHz Xeon machine with hyper-threading and 4GB of RAM.
At the high end, perhaps the server would use multiple quad-core CPUs.
Administrative Expertise
All Java web apps require some degree of administration. Mr. XML Publisher needs to be administered by someone familiar with the following:
- XML processing tool chains
- Java web app authentication
- The use of <context-param> values in the web.xml file
- Constructing connection strings and SQL/XQuery queries
An XML processing tool chain generally consists of command-line executables in which the output of one executable becomes the input for the next. After a series of transformations, the last executable in the chain outputs an HTML, PDF, CHM file, or possibly something else. If you do not have personnel who can set up the various XML processing tool chains needed for your publishing environment, I can set them up for you or coach someone through it.
Authenticating users via LDAP, a database, or a proprietary security system is, of course, probably a good idea; however, it is unlikely that I can help much with that. As you plan your Mr. XML Publisher deployment, plan to use your local talent if you intend to integrate user authentication with an LDAP, database, or similar security system. Mr. XML Publisher's default web.xml file specifies the use of BASIC authentication. BASIC is the least secure authentication method, but it is easy and it is available in all web containers. If your container provides DIGEST authentication services, by changing the value of the <auth-method> element in Mr. XML Publisher's web.xml file you can switch to DIGEST authentication. Doing so will greatly improve security, but you must consult the documentation for your container and follow its directions for setting up encrypted passwords. To authenticate users against your local LDAP or database server, you would probably want to use a custom HTML form and force the login to occur over HTTPS; thus, you would need an administrator/developer with knowledge of your LDAP or database server, knowledge of Java web-app security as administered via a web.xml file, and fundamental knowledge of HTML forms.
Administrators control Mr. XML Publisher through <context-param> values. So, a Mr. XML Publisher administrator will need to read and understand the documentation for all of Mr. XML Publisher's <context-param> elements.
To use Mr. XML Publisher's data pullers, end users include an IncludeMap.xml file in their project. Writers may need help with their IncludeMap.xml files or have them written for them. Thus, someone, most likely the administrator, needs the ability to write connection strings and SQL/XQuery queries.
Support
With the purchase of each license, I include one year of unlimited technical support by email. Thereafter, unlimited technical support by email costs $200 per year. Technical support includes free access to all bug fixes and component upgrades. If I become unable to provide technical support (disability, death, etc.) and no legal entity has assumed the obligation, all source code will be provided to all customers with current support contracts.
Suggestions
If reasonably applicable to your publishing environment, plan for your initial Mr. XML Publisher deployment to be similar to the Mr. XML Publisher for DocBook demo at: http://swhitlat.com/XML_Publisher. If you want to use a different or an additional DTD/schema and XSL combination, I can help you with the decision and I can probably modify Mr. XML Publisher to accommodate any necessary changes.
I designed Mr. XML Publisher so that it could accommodate any DTD/schema and XSL combination, requiring changes only to the files for which source code is provided. But that design has not been thoroughly tested. A Mr. XML Publisher license does include fifteen free hours of consultation and custom programming and I'm willing to use that time to make Mr. XML Publisher use whatever DTD/schema and XSL you have. However, I do not promise that I can make Mr. XML Publisher work with your DTD/schema and XSL combination within the free fifteen hours. Custom programming beyond the fifteen hours included with the license will cost extra.
I suggest that you use your local programming talent for most GUI modifications, especially for those that are mainly visual. You will have all the source code necessary to do that. If you save the fifteen hours of consultation and custom programming that I provide, I can use that time for any necessary modifications to the servlets and class files. And whether you are modifying the GUI or developing the tool-chain commands for your formats, use a web container that makes the job easy. I recommend Tomcat. Later, if desired, you can move Mr. XML Publisher to your production environment and onto a JEE server.
The default Mr. XML Publisher GUI operates the same as the Mr. XML Publisher for DocBook demo at http://swhitlat.com/XML_Publisher, but it looks a bit different because some of the visual elements in the demo are not appropriate for customer installations. You can view the way Mr. XML Publisher would look for you "out-of-the-box" here.
Price and Licensing Terms
The price is $1750 in US dollars. Included in this price is up to fifteen hours of custom Java programming and/or consultation via phone, email, or remote login. Additional custom programming costs $75/hour. For example, it would probably require many hours to get Mr. XML Publisher to take full advantage of the resources in a distributed environment.
A single Mr. XML Publisher license entitles you to run Mr. XML Publisher on a single machine. There is no limit to the number of CPUs that machine can have.
Licensing terms include the following restrictions:
- You are not allowed to make Mr. XML Publisher publicly available over the Internet. Specifically, you are not allowed to use any of Mr. XML Publisher's servlets or class files in any application accessible outside of your local area network, regardless of whether you have renamed Mr. XML Publisher and regardless of any user interface changes you may have made.
- You are not allowed to re-sell, deconstruct, or reverse engineer any of the servlets or class files for which source code is not provided.
I accept payment only by check or money order. I do not accept credit card payments.
Steve Whitlatch
P.O. Box 32841
Phoenix, Arizona
85064
602-956-2966
swhitlat@getnet.net
You are welcome to contact me to discuss deployment of Mr. XML Publisher.