Eric S. Raymond wrote:
Howard Chu <hyc@symas.com>:
The patch changes much more than your bug report mentions. The error message you provide is pretty ambiguous; in particular, you haven't mentioned exactly which markup element is out of order. Without this information we can't confirm what you're fixing.
Globally, I'm trying to fix up the entire Linux manual page corpus so it can be automatically lifted to clean HTML. I've been working on this intermittently since 2002, and am not far from the goal. Of the approximately 12,000 pages carried in stock Ubuntu, 94% now lift without errors or warnings. For all but 17 of the remaining 6% I have fix patches, which I'm trying to get merged upstream.
I entered one of those patches in your bugtracker, but failed to notice that the error code in my bug database entry for slapd.conf.5 was incorrect (it did not match the patch). I apologize for this; a full explanation follows.
My conversion tool is doclifter, which you can read about at http://www.catb.org/~esr/doclifter/. It lifts manual pages to XML-DocBook, which in turn is very easily rendered to HTML. This sequence produces better-quality HTML than tools like man2html can generate, because of the amount of content analysis done in the doclifter intermediate step.
There are some legal troff constructions that doclifter cannot parse well. These are the same sorts of things that any man-page renderer other than groff itself is likely to get wrong; thus, they are likely to confuse tools such as XMan, TkMan, and Rosetta as well as doclifter.
Hm. I use my own man2html (http://highlandsun.com/hyc/man2html.c), which gives pretty good-looking output for us. Really, if you're developing a tool that claims to read troff input, it has to actually do so. I mean, the point of tools such as this is to be able to convert existing documents without modifying them, isn't it?
The patch I sent you introduces an .EX/.EE macro pair for framing code examples, a common extension often found on older Unix manual pages (I believe it originated in DEC Ultrix). The pair encapsulates a bunch of low-level troff operations that are hard to lift well in isolation. There's a ruleset in doclifter that recognizes .EX/.EE, analyzes the example content, and generates <programlisting> or <literallayout> tags as appropriate.
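To illustrate the general idea of that ruleset, here is a minimal sketch in Python. It is not doclifter's actual code: the function names (extract_examples, classify_example, to_docbook), the CODE_CUES patterns, and the 50% threshold are hypothetical choices made for this example, and the sketch ignores troff escape sequences entirely. It collects the body of each .EX/.EE block, guesses whether the body looks like code, and wraps it in the matching DocBook element.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical cues that a line looks like code or a command rather than
# plain laid-out text. Illustrative assumptions only, not doclifter's rules.
CODE_CUES = (
    re.compile(r"[{};]\s*$"),             # C-like statement or block ending
    re.compile(r"^\s*[$#%]\s+\S"),        # shell prompt followed by a command
    re.compile(r"^\s*[\w.-]+\s*=\s*\S"),  # assignment or key=value setting
)

def extract_examples(lines):
    """Yield the body (a list of lines) of each .EX/.EE block."""
    body = None
    for line in lines:
        stripped = line.strip()
        if stripped == ".EX":
            body = []                  # start collecting an example
        elif stripped == ".EE" and body is not None:
            yield body                 # example closed: hand it back
            body = None
        elif body is not None:
            body.append(line)

def classify_example(body, threshold=0.5):
    """Pick a DocBook element name for an example body.

    Hypothetical heuristic: if at least `threshold` of the non-blank lines
    match a code cue, treat it as a programlisting, else a literallayout.
    """
    content = [l for l in body if l.strip()]
    if not content:
        return "literallayout"
    hits = sum(1 for l in content if any(c.search(l) for c in CODE_CUES))
    return "programlisting" if hits / len(content) >= threshold else "literallayout"

def to_docbook(body):
    """Wrap an example body in the element chosen by classify_example."""
    tag = classify_example(body)
    text = "\n".join(escape(l) for l in body)  # escape &, <, > for XML
    return f"<{tag}>{text}</{tag}>"
```

For instance, a block containing `int main(void) {`, `    return 0;` and `}` would come out as a <programlisting>, while two lines of verse would come out as a <literallayout>. Real man pages defeat heuristics this crude fairly often, which is part of why an explicit .EX/.EE marker helps a lifter.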
In some cases .EX/.EE replaces .RS/.RE pairs; in others it replaces .nf/.fi pairs; in still others it replaces .TP; and I think there are a couple of odd combinations of .RS/.RE with .TP in there too. The fact that it has to touch all of these is what makes the patch so complex. But the result is simple: the "example" intention is expressed explicitly, so it can be translated structurally.
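The simplest of those substitutions, .nf/.fi becoming .EX/.EE, can be sketched in Python as below. This is a hypothetical helper, not a description of how the patch was produced: the name nf_to_ex and the is_example predicate are invented here, with is_example standing in for the human judgment about whether a no-fill block really is an example. The .RS/.RE and .TP cases are not handled.

```python
def nf_to_ex(lines, is_example):
    """Rewrite .nf/.fi pairs whose body satisfies is_example into .EX/.EE.

    Hypothetical helper: is_example(body) stands in for the manual decision
    about whether a no-fill block is really an example. Other .nf/.fi
    blocks, and any unterminated .nf, are copied through unchanged.
    """
    out, i = [], 0
    while i < len(lines):
        if lines[i].strip() == ".nf":
            # Locate the matching .fi, if there is one.
            j = next((k for k in range(i + 1, len(lines))
                      if lines[k].strip() == ".fi"), None)
            if j is None:
                out.extend(lines[i:])  # no .fi: leave the tail untouched
                break
            body = lines[i + 1:j]
            if is_example(body):
                out += [".EX", *body, ".EE"]
            else:
                out += lines[i:j + 1]
            i = j + 1
        else:
            out.append(lines[i])
            i += 1
    return out
```

Keeping the decision in a caller-supplied predicate reflects the point above: deciding which blocks are examples takes judgment, even though the substitution itself is mechanical.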