
PortalTransforms: PDF with Images

When you upload a PDF to a Plone document, Plone's standard portal transforms render HTML with broken image references. This document describes how to enable the images.

The default pdf_to_html transform from PortalTransforms only reads the HTML code generated by the pdftohtml program. Loading the images as well is not much work: the office converter already does it.

class pdf_to_html(commandtransform):
    ...

    def convert(self, data, cache, **kwargs):
        name = self.name()
        tmpdir, fullname = self.initialize_tmpdir(data, filename=name)
        target_name = '%s/%s.html' % (tmpdir, name)

        # Run pdftohtml inside the temporary directory so that the
        # extracted images end up next to the generated HTML file.
        command = 'cd "%s" && pdftohtml -noframes -enc UTF-8 %s %s' % (
            tmpdir, fullname, target_name)
        log('PortalTransforms: Calling %s' % command)
        os.system(command)

        # Read the HTML and collect the images before removing the directory.
        html = self.html(target_name)
        path, images = self.subObjects(tmpdir)
        objects = {}
        if images:
            self.fixImages(path, images, objects)
        self.cleanDir(tmpdir)

        # Store the images as sub-objects alongside the HTML.
        cache.setData(html)
        cache.setSubObjects(objects)
        return cache

    def html(self, html_file_name):
        htmlfile = open(html_file_name, 'r')
        html = htmlfile.read()
        htmlfile.close()
        html = scrubHTML(html)
        body = bodyfinder(html)
        return body
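
The image handling in convert comes down to reading every image file that pdftohtml wrote into the temporary directory and keeping it under its file name. The following standalone sketch illustrates that idea only. It is not the PortalTransforms code: the function name collect_images and the list of image suffixes are assumptions made for this example, and the sketch is written for a current Python rather than the one shipped with Plone 2.0.5.

```python
import os

# Assumed set of suffixes; pdftohtml's actual output formats may differ.
IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg', '.gif')


def collect_images(directory):
    """Return {file name: raw bytes} for the image files directly in directory.

    Hypothetical helper illustrating what the transform does with
    subObjects and fixImages; it does not call either of them.
    """
    objects = {}
    for entry in sorted(os.listdir(directory)):
        # Skip the generated HTML file and anything else that is not an image.
        if not entry.lower().endswith(IMAGE_EXTENSIONS):
            continue
        path = os.path.join(directory, entry)
        if not os.path.isfile(path):
            continue
        with open(path, 'rb') as handle:
            objects[entry] = handle.read()
    return objects
```

The resulting dictionary has the same shape as the objects mapping passed to cache.setSubObjects above: one entry per image, keyed by the name the generated HTML refers to.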

The whole file is available on gocept's subversion server as part of glome.

To enable the transform, point your browser to the portal_transforms tool of your Plone site. Delete the existing pdf_to_html transform and add a new one, using pdf_to_html as its Id. The module name depends on where you installed the transform; in the case of glome it would be Products.glome.transforms.pdf_to_html.
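
The same replacement can probably be scripted instead of clicked. The sketch below assumes that the portal_transforms tool offers objectIds and manage_delObjects like other Zope folders, and a manage_addTransform(id, module) method matching the add form. The function name replace_pdf_transform is hypothetical; check the method names against your PortalTransforms version before relying on it.

```python
def replace_pdf_transform(portal_transforms,
                          transform_id='pdf_to_html',
                          module='Products.glome.transforms.pdf_to_html'):
    """Delete the existing transform, if present, and add the replacement.

    Hypothetical helper: portal_transforms is expected to be the
    portal_transforms tool of the Plone site (assumed API, see above).
    """
    # Remove the stock transform first so the Id is free again.
    if transform_id in portal_transforms.objectIds():
        portal_transforms.manage_delObjects([transform_id])
    # Add the new transform under the same Id, pointing at the new module.
    portal_transforms.manage_addTransform(transform_id, module)
```

From a Zope debug session this might be called with the tool of your own site as the only argument; the default module name is the glome one mentioned above and needs adjusting if the transform is installed elsewhere.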

Note that the PortalTransforms version which ships with Plone 2.0.5 has a bug: deleting a transform does not unregister it.


Created by zagy
Last modified 04.10.2005 19:02
 
