A Python interface to XSL-FO libraries (Conversion HTML to PDF, RTF, DOCX, WML and ODT)

=======================================
A Python interface to XSL-FO libraries.
=======================================

The zopyx.convert package helps you to convert HTML to PDF, RTF, ODT, DOCX and
WML using XSL-FO technology.


Requirements
============

- Java 1.5.0 or higher (FOP 0.94 requires Java 1.6 or higher)

- `csstoxslfo`__ (included)

__ http://www.re.be/css2xslfo

- `XFC-4.0`__ (XMLMind) for ODT, RTF, DOCX and WML support (if needed)

__ http://www.xmlmind.com/foconverter

- `XINC 2.0`__ (Lunasil) for PDF support (commercial)

__ http://www.lunasil.com/products.html

- or `FOP 0.94`__ (Apache project) for PDF support (free)

__ http://xmlgraphics.apache.org/fop/download.html#dist-type

- `BeautifulSoup`__ (will be installed automatically through easy_install. See Installation.)

__ http://www.crummy.com/software/BeautifulSoup/

- `ElementTree`__ (will be installed automatically through easy_install. See Installation.)

__ http://effbot.org/zone/element-index.html

Installation
============

- install **zopyx.convert** either using ``easy_install`` or by downloading the sources from the Python Cheeseshop.
  This will automatically install the BeautifulSoup and ElementTree modules if necessary.
- the environment variable *$XFC_DIR* must be set and point to the root of your XFC installation directory
- the environment variable *$XINC_HOME* must be set and point to the root of your XINC installation directory
- the environment variable *$FOP_HOME* must be set and point to the root of your FOP installation directory
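
Before running a conversion it can help to check that these variables are in place. The following is a minimal sketch and not part of zopyx.convert: the helper ``missing_converter_dirs`` is hypothetical, and the variable names (in particular ``XFC_DIR``) are assumed to match the list above.

```python
import os

# Variable names taken from the installation notes above; the exact
# spelling of XFC_DIR is an assumption.
REQUIRED_DIRS = {
    "XFC_DIR": "XFC (RTF, ODT, DOCX, WML)",
    "XINC_HOME": "XINC (PDF)",
    "FOP_HOME": "FOP (PDF)",
}


def missing_converter_dirs(environ=None):
    """Return the names of variables that are unset or do not name a directory."""
    environ = os.environ if environ is None else environ
    missing = []
    for name in sorted(REQUIRED_DIRS):
        path = environ.get(name)
        if not path or not os.path.isdir(path):
            missing.append(name)
    return missing
```

Only the converters you actually use have to be installed, so a missing ``XINC_HOME`` is harmless when PDF output goes through FOP.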

Supported platforms
===================

Windows, Unix


Subversion repository
=====================

- http://svn-public.zopyx.com/viewvc/python-projects/zopyx.convert/trunk/


Usage
=====

Some examples from the Python command-line::

    from zopyx.convert import Converter
    C = Converter('/path/to/some/file.html')
    pdf_filename = C('pdf')      # using XINC
    pdf2_filename = C('pdf2')    # using FOP
    rtf_filename = C('rtf')
    odt_filename = C('odt')
    wml_filename = C('wml')
    docx_filename = C('docx')
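
When several output formats are needed from the same source, the calls above can be wrapped in a small loop. This is only a sketch: ``convert_many`` and ``demo`` are hypothetical helpers, and the only thing assumed about the converter is what the example above shows, namely that calling it with a format name returns the output filename.

```python
def convert_many(converter, formats):
    """Call ``converter(fmt)`` for each format and collect the output filenames."""
    results = {}
    for fmt in formats:
        results[fmt] = converter(fmt)
    return results


def demo(path):
    """Convert one HTML file to RTF, ODT and DOCX (needs zopyx.convert and XFC)."""
    from zopyx.convert import Converter  # imported lazily, only when demo() runs
    return convert_many(Converter(path), ["rtf", "odt", "docx"])
```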

A very simple command-line converter is also available::

    xslfo-convert --format rtf --output foo.rtf sample.html
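
The same command can be started from a Python script. The sketch below assumes that ``xslfo-convert`` is on the ``PATH`` and uses only the ``--format`` and ``--output`` options shown above; the function names are hypothetical.

```python
import subprocess


def xslfo_convert_command(fmt, output, source):
    """Build the argument list for one xslfo-convert call."""
    return ["xslfo-convert", "--format", fmt, "--output", output, source]


def run_xslfo_convert(fmt, output, source):
    """Run xslfo-convert and return its exit status."""
    return subprocess.call(xslfo_convert_command(fmt, output, source))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.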


``xslfo-convert`` has a ``--test`` option that converts some
sample HTML. If everything is OK, you should see something like this::

    >xslfo-convert --test
    Entering testmode
    pdf: /tmp/tmpuOb37m.html -> /tmp/tmpuOb37m.pdf
    rtf: /tmp/tmpuOb37m.html -> /tmp/tmpuOb37m.rtf
    docx: /tmp/tmpuOb37m.html -> /tmp/tmpuOb37m.docx
    odt: /tmp/tmpuOb37m.html -> /tmp/tmpuOb37m.odt
    wml: /tmp/tmpuOb37m.html -> /tmp/tmpuOb37m.wml
    pdf: /tmp/tmpZ6PGo9.html -> /tmp/tmpZ6PGo9.pdf
    rtf: /tmp/tmpZ6PGo9.html -> /tmp/tmpZ6PGo9.rtf
    docx: /tmp/tmpZ6PGo9.html -> /tmp/tmpZ6PGo9.docx
    odt: /tmp/tmpZ6PGo9.html -> /tmp/tmpZ6PGo9.odt
    wml: /tmp/tmpZ6PGo9.html -> /tmp/tmpZ6PGo9.wml


How zopyx.convert works internally
==================================

- the source HTML file is converted to XHTML using BeautifulSoup
  (releases before 0.5.0 used mxTidy)
- the XHTML file is converted to FO using the great "csstoxslfo" converter
  written by Werner Donne
- the FO file is passed to one of the external converters (XINC or FOP for
  PDF, XFC for the other formats) to generate the desired output format
- all converters are based on Java technology, which makes the conversion
  solution highly portable across operating systems (including Windows)
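
The stages above can be summarised as a lookup from output format to external converter. This is an illustrative sketch and not the package's internal code: the ``BACKENDS`` table and ``plan_conversion`` are hypothetical, and the format names follow the Usage section (``pdf`` for XINC, ``pdf2`` for FOP).

```python
# Output format -> external converter, as described in Requirements and Usage.
BACKENDS = {
    "pdf": "XINC",
    "pdf2": "FOP",
    "rtf": "XFC",
    "odt": "XFC",
    "docx": "XFC",
    "wml": "XFC",
}


def plan_conversion(fmt):
    """Return the ordered conversion stages for one output format."""
    try:
        backend = BACKENDS[fmt]
    except KeyError:
        raise ValueError("unsupported format: %r" % (fmt,))
    return [
        "HTML -> XHTML (tidy step)",
        "XHTML -> FO (csstoxslfo)",
        "FO -> %s (%s)" % (fmt, backend),
    ]
```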

Known issues
============

- If you are using zopyx.convert together with FOP: use the latest FOP 0.94
  only. Don't use any packaged FOP version like the one from MacPorts, which is
  known to be broken.

- Ensure that you have read the ``csstoxslfo`` documentation. ``csstoxslfo`` has
  several requirements about the HTML markup. Don't expect it to be the ultimate
  HTML converter. The necessary markup is described in the ``csstoxslfo``
  documentation; questions about it will not be answered.

Author
======

**zopyx.convert** was written by Andreas Jung for ZOPYX Ltd. & Co. KG, Tuebingen, Germany.


License
=======

**zopyx.convert** is published under the Zope Public License 2.1 (ZPL).
See LICENSE.txt.


Contact
=======

| ZOPYX Ltd. & Co. KG
| c/o Andreas Jung,
| Charlottenstr. 37/1
| D-72070 Tuebingen, Germany
| E-mail: info at zopyx dot com
| Web: http://www.zopyx.com

Changes:
========

1.1.11 (07.06.2009)
-------------------
- moved code repository to svn.zope.org
- changed license to ZPL

1.1.10 (29.05.2009)
-------------------
- support for USE_OS_SYSTEM environment variable (as workaround
  for hanging Java processes)

1.1.9 (04.01.2009)
------------------
- fixed packaging issue

1.1.8 (26.06.2008)
------------------
- changed logging levels
- reorganized files

1.1.7 (20.06.2008)
------------------
- better support for csstoxslfo commandline options

1.1.6 (19.04.2008)
------------------
- call 'fop' using bash
- better logger configuration
- minor code cleanup

1.1.5 (01.03.2008)
------------------

- updated documentation

1.1.4 (05.02.2008)
------------------

- remove duplicate ID attributes

1.1.3 (31.01.2008)
------------------

- clarified Java requirements for FOP

1.1.2 (22.01.2008)
------------------

- removed some nasty debugging code

1.1.1 (22.01.2008)
------------------

- supporting FOP on Windows

1.1.0 (20.01.2008)
------------------

- support for free FOP PDF converter


1.0.6 (14.10.2007)
------------------

- html2fo: added workaround for generated FO code for
  PRE tags

1.0.5 (05.10.2007)
------------------

- minor bugfixes

1.0.4 (05.10.2007)
------------------

- Windows support added

1.0.3 (04.10.2007)
------------------

- passing -Duser.language=en to java in order to
  prevent corrupted FO code caused by locales


1.0.2 (03.10.2007)
------------------

- bugfix

1.0.1 (03.10.2007)
------------------

- added --test option to command-line frontend


1.0.0 (30.09.2007)
------------------

- update to css2xslfo V 1.5.0

- official 1.0.0 release

0.5.0 (09.09.2007)
------------------

- replaced mxTidy related code with the BeautifulSoup
  module (no longer requires any compiling)

- html2fo checks the existence of images


0.4.9 (25.07.2007)
------------------

- support for utidy lib (which is the preferred tidy library).
  Using mx.Tidy only as fallback

0.4.8 (unreleased)
------------------

- unreleased

0.4.7 (08.07.2007)
------------------

- reSTified documentation

0.4.6 (08.07.2007)
------------------

- fixes in availableFormats()

0.4.5 (07.07.2007)
------------------

- various FO fixes

0.4.4 (06.07.2007)
------------------

- using logging module

0.4.3 (05.07.2007)
------------------

- html2fo: using ElementTree for most FO modifications

0.4.2 (30.06.2007)
------------------

- converting page-break-after: always back into break-after: page

0.4.1 (24.06.2007)
------------------

- various fixes

0.4.0 (24.06.2007)
------------------

- added zope interfaces
- converters are now classes
- added unittests

0.3.1 (18.06.2007)
------------------

- html2fo() and the converter constructor got a new 'encoding'
  parameter in order to specify the input encoding of the
  HTML file. This parameter will be passed down to Tidy in order
  to perform a proper conversion of non-ascii characters.

0.3.0 (unreleased)
------------------

- using subprocess module of Python
- new Convert() class for high-level XSLFO access
- logger added
- better checks for XINC, XFC
- updated documentation


0.2.0 (16.06.2007)
------------------

- PDF support added
- command line interface added
- mxTidy integration


0.1.0 (16.06.2007)
------------------

- initial release
