KGS Optical Character Recognition Plugin

About

Some KGS OSGi products can perform OCR on documents. This plugin provides that capability.

Introduction

The plugin is used by some products, such as KGS Migration and KGS scan server.

The bundle is designed as a multi-instance bundle, which means that several instances can be configured. By default, a local OCR recognizer instance is configured; it can be disabled through configuration.

Since OCR is a CPU-intensive and time-consuming task, it is recommended to configure additional remote instances when processing large numbers of documents.

This article describes how this plugin works and how to configure it.

Precondition

If you are installing on a Linux system, install Tesseract first. On Windows systems this is not required.

CentOS 8

sudo dnf config-manager --add-repo https://download.opensuse.org/repositories/home:/Alexander_Pozdnyakov/CentOS_8/
sudo rpm --import https://build.opensuse.org/projects/home:Alexander_Pozdnyakov/public_key
sudo dnf install tesseract
sudo dnf install tesseract-langpack-deu

RHEL 7

sudo yum-config-manager --add-repo https://download.opensuse.org/repositories/home:/Alexander_Pozdnyakov/RHEL_7/
sudo yum update
sudo yum install tesseract
sudo yum install tesseract-langpack-deu

CentOS 7

sudo yum-config-manager --add-repo https://download.opensuse.org/repositories/home:/Alexander_Pozdnyakov/CentOS_7/
sudo rpm --import https://build.opensuse.org/projects/home:Alexander_Pozdnyakov/public_key
sudo yum update
sudo yum install tesseract
sudo yum install tesseract-langpack-deu

Debian (run as root)

How it works

The plugin has a built-in recognizer. In addition, remote instances can be configured. How remote instances work is out of scope for this article; only the built-in OCR recognizer is explained here.

When the recognizePage method of this plugin is called, the plugin tries to obtain a free service. This can be a remote service or the local one.

Once a service has been selected, it is called to perform the recognition.
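
The selection step described above can be illustrated with a minimal Java sketch. It is not the plugin's actual code: the names OcrService, Slot and OcrDispatchSketch are hypothetical, and so is this recognizePage signature. The order in which services are tried and the behaviour when no service is free (here: an empty result) are assumptions made for the example only.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicBoolean;

public class OcrDispatchSketch {

    /** Hypothetical abstraction over one recognizer instance (local or remote). */
    public interface OcrService {
        String recognize(byte[] pageImage);
    }

    /** Pairs a service with a busy flag so that only one caller uses it at a time. */
    public static final class Slot {
        private final OcrService service;
        private final AtomicBoolean busy = new AtomicBoolean(false);

        public Slot(OcrService service) {
            this.service = service;
        }

        /** Marks the slot as busy; returns false if it was already taken. */
        public boolean tryAcquire() {
            return busy.compareAndSet(false, true);
        }

        public void release() {
            busy.set(false);
        }
    }

    private final List<Slot> slots;

    public OcrDispatchSketch(List<Slot> slots) {
        this.slots = slots;
    }

    /**
     * Hypothetical stand-in for recognizePage: take the first free service,
     * call it, and release it afterwards.
     * Assumption: an empty result signals that no service was free.
     */
    public Optional<String> recognizePage(byte[] pageImage) {
        for (Slot slot : slots) {
            if (slot.tryAcquire()) {
                try {
                    return Optional.of(slot.service.recognize(pageImage));
                } finally {
                    slot.release();
                }
            }
        }
        return Optional.empty();
    }

    /** Stub recognizer for the demo: returns its name and the input size. */
    public static OcrService stub(String name) {
        return page -> name + ":" + page.length;
    }

    public static void main(String[] args) {
        Slot local = new Slot(stub("local"));
        Slot remote = new Slot(stub("remote"));
        OcrDispatchSketch dispatcher = new OcrDispatchSketch(List.of(local, remote));
        System.out.println(dispatcher.recognizePage(new byte[3]));
    }
}
```

In this sketch the local and remote recognizers are interchangeable behind one interface, so the caller does not need to know which instance handled the page.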

How the local recognizer works

The local recognizer is based on Tesseract. It maintains a pool of local instances. If not configured the pool has