14 posts tagged with "OCR"

Quality of text recognition in OCR webservice improved | webPDF

August 3, 2020

Minimum technical requirements

Java version: 11
webPDF version: 8 (revision 2159)

Existing functions, such as text recognition (OCR) in PDF documents or graphics, are also improved by the latest update of webPDF (revision 2159). You can now prepare (optimize) your document before recognition to get a better result.

webPDF 8: First update with new features

February 19, 2020

Following the release of webPDF 8.0 last November, the first update is now available. The new version, revision 2058, can be downloaded now and brings several new features in addition to bug fixes:

How-to: Using the OCR webservice of webPDF 7

September 11, 2018

Minimum technical requirements

Java version: 7
webPDF version: 7
wsclient version: 1

This example explains how to use the OCR webservice of webPDF. OCR in webPDF is based on Tesseract. By default, German, English, French, Spanish, and Italian are supported. Additional languages can be installed in the Tesseract folder (see the webPDF manual for details).
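To make the language setting more concrete, here is a minimal sketch of how the five default languages map to Tesseract's own language codes and how several languages are combined in a single run. It illustrates the underlying Tesseract command line only, not the webPDF OCR webservice or the wsclient API. The helper name `build_tesseract_command` and the file names are hypothetical, and the command is only assembled, not executed.

```python
# Tesseract language codes for the five languages the post lists as defaults.
DEFAULT_LANGUAGES = {
    "German": "deu",
    "English": "eng",
    "French": "fra",
    "Spanish": "spa",
    "Italian": "ita",
}


def build_tesseract_command(image_path, output_base, languages):
    """Return the argument list for a Tesseract run that writes a searchable PDF.

    Hypothetical helper for illustration; it is not part of webPDF or wsclient.
    """
    unknown = [name for name in languages if name not in DEFAULT_LANGUAGES]
    if unknown:
        raise ValueError(f"no language code known for: {', '.join(unknown)}")
    # Tesseract accepts several languages joined with '+', e.g. "deu+eng".
    codes = "+".join(DEFAULT_LANGUAGES[name] for name in languages)
    # The trailing "pdf" selects Tesseract's PDF output configuration.
    return ["tesseract", image_path, output_base, "-l", codes, "pdf"]


command = build_tesseract_command("scan.png", "scan_ocr", ["German", "English"])
print(" ".join(command))
```

A language outside the table raises a `ValueError`. This loosely mirrors the point above that additional languages have to be installed before they can be used.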

Languages that use a multibyte character set, such as Arabic and several Far Eastern languages, are currently not supported. OCR is mainly useful for documents whose text is visible on the page but not embedded as searchable text. To extract text that is already embedded in a PDF document, use the corresponding option in the Toolbox webservice instead.
