Using Ht://Dig more efficiently for PDF

Indexing PDF for Ht://Dig and keeping page number information

Here you'll find a Python script called pdftodig.py that indexes PDF files for Ht://Dig and adds 'page=XX' anchors to the index information. The Acrobat Reader plugin for Netscape can use such anchors: they tell it to download the corresponding xml document (usually generated by a CGI script like the one on http://www.volkspost.de/cgihffen.html ), which tells Acrobat at what page to open the document.
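To illustrate the idea of per-page anchors, here is a minimal sketch. It is not the code of pdftodig.py: it assumes the text of each page has already been extracted from the PDF by some other tool, and it assumes the converter hands Ht://Dig an HTML document. The function name pages_to_html and its arguments are hypothetical.

```python
from html import escape


def pages_to_html(title, pages):
    """Return an HTML document with one named anchor per PDF page.

    `pages` is a list of strings, one per page, in page order.
    Hypothetical helper: how the page text is extracted is out of scope.
    """
    parts = ["<html><head><title>%s</title></head><body>" % escape(title)]
    for number, text in enumerate(pages, start=1):
        # The anchor name becomes the URL fragment, e.g. mydoc.pdf#page=2
        parts.append('<a name="page=%d"></a>' % number)
        parts.append("<p>%s</p>" % escape(text))
    parts.append("</body></html>")
    return "\n".join(parts)
```

Because every page starts with its own anchor, any word indexed from page N sits closest to the anchor 'page=N', which is what makes the precise links described below possible.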

Searching with Ht://Dig and providing precise links

If you want such anchors to appear in the search results of htsearch, you need to add anchors to excerpts ( add_anchors_to_excerpt: true ). The htsearch results will then contain links like "http://server/mydoc.pdf#page=234" on the words matching your search keywords when the excerpt shows page 234 of your document.

Viewing the results with Xpdf

Of course, you don't need Acrobat Reader to make the most of such URLs: you can use a small wrapper script for the excellent Xpdf program, like my little pdf_helper . To use it under Netscape, specify pdf_helper %s %u as your PDF helper application. Xpdf will then open the right page every time you click on a link in an excerpt.
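A wrapper of this kind can be very small. The sketch below is not the actual pdf_helper, only a hypothetical illustration in Python. It assumes that Netscape replaces %s with the path of the downloaded file and %u with the original URL, and it relies on Xpdf accepting a page number as an argument after the file name. The names PAGE_RE, build_command and main are made up for this example.

```python
import re
import subprocess
import sys

# Matches a trailing "#page=N" fragment in the URL passed as %u.
PAGE_RE = re.compile(r"#page=(\d+)$")


def build_command(pdf_path, url):
    """Build the xpdf command line for a local file and its source URL."""
    command = ["xpdf", pdf_path]
    match = PAGE_RE.search(url)
    if match:
        # xpdf opens the file at the page given after the file name.
        command.append(match.group(1))
    return command


def main(argv):
    """Run xpdf on argv[1], at the page named in the URL argv[2]."""
    return subprocess.call(build_command(argv[1], argv[2]))


# Only launch the viewer when called as: script <file> <url>
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```

When the URL carries no 'page=' fragment, the command simply opens the document at its first page, so the same helper works for ordinary PDF links too.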

Gaillard Pierre-Olivier

Last modified: Mon Dec 13 22:37:47 CET 1999