KGS Optical Character Recognition Plugin
About
Some KGS OSGi products can perform optical character recognition (OCR) on documents. This plugin provides that capability.
Introduction
The plugin is used by products such as KGS Migration and KGS scan server.
The bundle is designed as a multi-instance bundle, which means that several instances can be configured. By default, a local OCR recognizer instance is configured; it can be disabled by configuration.
Because OCR is a computationally expensive and time-consuming task, configuring additional remote instances is recommended for large document volumes.
This article describes how this plugin works and how to configure it.
Precondition
If you are installing on a Linux system, install Tesseract first. On Windows systems this is not required.
CentOS 8
sudo dnf config-manager --add-repo https://download.opensuse.org/repositories/home:/Alexander_Pozdnyakov/CentOS_8/
sudo rpm --import https://build.opensuse.org/projects/home:Alexander_Pozdnyakov/public_key
sudo dnf install tesseract
sudo dnf install tesseract-langpack-deu
RHEL 7
sudo yum-config-manager --add-repo https://download.opensuse.org/repositories/home:/Alexander_Pozdnyakov/RHEL_7/
sudo yum update
sudo yum install tesseract
sudo yum install tesseract-langpack-deu
CentOS 7
sudo yum-config-manager --add-repo https://download.opensuse.org/repositories/home:/Alexander_Pozdnyakov/CentOS_7/
sudo rpm --import https://build.opensuse.org/projects/home:Alexander_Pozdnyakov/public_key
sudo yum update
sudo yum install tesseract
sudo yum install tesseract-langpack-deu
Debian (run as root)
How it works
The plugin has a built-in recognizer. In addition, remote instances can be configured. How remote instances work is out of scope for this article; only the built-in OCR recognizer is explained here.
When the recognizePage method of this plugin is called, the plugin tries to obtain a free service. This can be a remote service or the local one.
Once a service has been selected, it is called.
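The selection step described above can be pictured with a small Java sketch. It is not the plugin's actual code: the OcrService interface, the SimpleOcrService and OcrDispatcher classes, and the in-order selection strategy are all assumptions made for illustration. Only the method name recognizePage comes from this article, and its real signature may differ. The sketch also assumes that a call simply yields no result when no service is free; the plugin may wait or queue instead.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical abstraction over a local or remote recognizer (not the plugin's real API).
interface OcrService {
    String name();
    boolean tryAcquire();                 // true if the service was free and is now reserved
    void release();                       // marks the service as free again
    String recognizePage(byte[] image);
}

// Minimal stand-in service that can handle one page at a time.
class SimpleOcrService implements OcrService {
    private final String name;
    private final AtomicBoolean busy = new AtomicBoolean(false);

    SimpleOcrService(String name) { this.name = name; }

    public String name() { return name; }
    public boolean tryAcquire() { return busy.compareAndSet(false, true); }
    public void release() { busy.set(false); }

    public String recognizePage(byte[] image) {
        // Placeholder result; a real service would run OCR on the image here.
        return "text from " + name + " (" + image.length + " bytes)";
    }
}

// Assumed dispatcher: picks the first free service and calls it.
class OcrDispatcher {
    private final List<OcrService> services;

    OcrDispatcher(List<OcrService> services) { this.services = List.copyOf(services); }

    Optional<String> recognizePage(byte[] image) {
        for (OcrService service : services) {
            if (service.tryAcquire()) {
                try {
                    return Optional.of(service.recognizePage(image));
                } finally {
                    service.release();    // always hand the service back
                }
            }
        }
        return Optional.empty();          // assumption: no free service means no result
    }
}
```

In this sketch the local and remote services are treated identically, which mirrors the statement that the free service can be either kind. Reserving a service with an atomic flag keeps two concurrent calls from selecting the same instance.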
How the local recognizer works
The local recognizer is based on Tesseract. It maintains a pool of local instances. If not configured the pool has