PDF2HTM converts your PDF files into web-compatible documents almost instantly. The program can create either an HTML file with the complete text and all the images present in the original document, or a text-only HTML file. When you choose to include the images, they can be converted from BMP to JPEG to reduce the size of the output, and they are also saved separately as individual image files.
The program can convert either individual PDF files or all the PDF documents contained in a given folder. To use this “batch processing” functionality, you need to put all the input files in the same folder so that they can be processed in one go. Password-protected files can also be converted into HTML, as long as you can provide the corresponding password.
Speed is certainly its most notable asset, though the information the program gives you does not always match reality. When converting a PDF file of nearly 700 pages (filled with images, tables, etc.), the process seems to take less than a second. It all happens so fast that you doubt whether the program has actually converted the file. In fact, it has not finished. You are immediately asked whether you want to view the HTML file, and if you do, you will find that the document is incomplete. PDF2HTM is probably still busy saving the individual JPEG files and the final HTML document; what you are seeing is a temporary file with the first hundred pages or so. Even so, the whole conversion does not take more than a few seconds.
This speed is not matched by the quality of the final HTML file. Even though PDF2HTM claims to reproduce the “exact layout” of the original PDF document, most (if not all) of the non-linear elements are lost in the process. Even if the original PDF was created with properly defined textual elements (columns, tables, and so on), structure, and reading order, the program tends to ignore all this useful information and produces a line-by-line HTML clone of the PDF document. The tool therefore shows its true potential mainly with PDF files that have basic textual structures (standard paragraphs). Hyperlinks, cross-references, and similar elements are faithfully represented in the HTML file, which also includes a “Document Outline” section at the end that reproduces the Bookmarks panel of the original PDF file (when present).