Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 31, 2014

DOCX -> HTML/CSS

Filed under: Conversion,Microsoft,XML — Patrick Durusau @ 2:04 pm

Transform DOCX to HTML/CSS with High-Fidelity using PowerTools for Open XML by Eric White.

From the post:

Today I am happy to announce the release of HtmlConverter version 2.06.00, which is a high fidelity conversion from DOCX to HTML/CSS. HtmlConverter is a module in the PowerTools for Open XML project.

….
HtmlConverter.cs 2.06.00 supports:

  • Paragraph styles, character styles, and table styles, including styles that are based on other styles.
  • Table styles, including support for conditional table style options (header row, total row, banded rows, first column, last column, and banded columns).
  • Fonts, including font styles such as bold, italic, underline, strikethrough, foreground and background colors, shading, sub-script, super-script, and more.  HtmlConverter is, in effect, guidance on how to correctly determine the font and formatting for each paragraph and text run in a document.
  • Numbered and bulleted lists.  Current support is only for en-US and fr-FR; however, HtmlConverter is factored and parameterized so that you can support other languages without altering the source code.  In the near future, I’ll be publishing guidance and instructions on how to support additional languages, and I’ll be asking for volunteers to write and contribute the bits of code to generate cardinal (one, two, three) and ordinal (first, second, third) implementations for your native language, as well as the various Asian and RTL numbering systems.
  • Tabs, including left tabs, right tabs, centered tabs, and decimal tabs.  HtmlConverter takes the approach of using font metrics to calculate the exact width of the various pieces of text in a line, and inserts <span> elements with precisely calculated widths.
  • High fidelity support for vertical white space and horizontal white space, including indented text, hanging indents, centered text, right justified text, and justified text.
  • Borders around paragraphs, and high fidelity for borders of tables.
  • Horizontally and vertically merged cells in tables.
  • External hyperlinks, and internal hyperlinks to bookmarks within the document.
  • You have much more control over the conversion when compared to other approaches to converting to HTML.  There are already a number of parameters that enable you to control the transformation, and in the future I’ll be adding many more knobs and levers to fine tune the conversion.  And of course, you have the source code, so you can customize the conversion for your scenario.
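
The tab handling described in the list above (measure the text, then insert <span> elements of calculated width) can be illustrated with a short sketch. This is not HtmlConverter's code, which is C# and measures real fonts. It is a minimal Python illustration in which every name (CHAR_WIDTH_IN, text_width, spacer_width, render_line) is hypothetical, and a fixed per-character width stands in for actual font metrics.

```python
from html import escape

# Assumption: a crude stand-in for real font metrics. Every character is
# treated as 0.08 inches wide; a real converter would measure the font.
CHAR_WIDTH_IN = 0.08


def text_width(text: str) -> float:
    """Approximate rendered width of text, in inches."""
    return len(text) * CHAR_WIDTH_IN


def spacer_width(x: float, stop: float, kind: str, following: str) -> float:
    """Width of the empty span that stands in for one tab character.

    x is the current horizontal position, stop is the tab stop position,
    and following is the text that comes after the tab.
    """
    if kind == "left":
        offset = 0.0                                   # text starts at the stop
    elif kind == "right":
        offset = text_width(following)                 # text ends at the stop
    elif kind == "center":
        offset = text_width(following) / 2             # text is centered on the stop
    elif kind == "decimal":
        offset = text_width(following.split(".")[0])   # decimal point sits on the stop
    else:
        raise ValueError(f"unknown tab kind: {kind}")
    return max(stop - x - offset, 0.0)


def render_line(line: str, stops: list[tuple[float, str]]) -> str:
    """Render a tab-separated line as spans; stops is [(position_in, kind), ...]."""
    pieces = line.split("\t")
    if len(pieces) - 1 > len(stops):
        raise ValueError("more tabs in the line than tab stops supplied")
    x = text_width(pieces[0])
    html = [f"<span>{escape(pieces[0])}</span>"]
    for piece, (stop, kind) in zip(pieces[1:], stops):
        gap = spacer_width(x, stop, kind, piece)
        # The spacer replaces the tab character with an exact-width box.
        html.append(f'<span style="display:inline-block;width:{gap:.2f}in"></span>')
        html.append(f"<span>{escape(piece)}</span>")
        x += gap + text_width(piece)
    return "".join(html)
```

For example, `render_line("Total\t9.50", [(2.0, "decimal")])` places a spacer between the two text spans so that the decimal point of "9.50" lands at the 2.0 inch stop. The point of the sketch is only that each tab becomes a measured box rather than a guess at whitespace.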

See Eric’s post for his questions about which desired features should get priority for addition to HtmlConverter.

BTW:

PowerTools for Open XML is licensed under the Microsoft Public License (Ms-PL), which gives you wide latitude in how you use the code, including its use in commercial products and open source projects.

It won’t be long before it is “not open source” software that is worthy of comment.

I first saw this in a tweet by Open Microsoft.
