Optical Character Recognition (OCR) Application Using Angular and Azure Computer Vision


Introduction

In this article, we will create an optical character recognition (OCR) application using Angular and the Azure Computer Vision Cognitive Service.

Jan 23, 2020

Computer Vision is an AI service that analyzes content in images. We will use the OCR feature of Computer Vision to detect the printed text in an image. The application will extract the text from the image and detect the language of the text.

Currently, the OCR API supports 25 languages.

Prerequisites

  • Install the latest LTS version of Node.js from https://nodejs.org/en/download/
  • Install the Angular CLI from https://cli.angular.io/
  • Install the .NET Core 3.1 SDK from https://dotnet.microsoft.com/download/dotnet-core/3.1
  • Install the latest version of Visual Studio 2019 from https://visualstudio.microsoft.com/downloads/
  • An Azure subscription account. You can create a free Azure account at https://azure.microsoft.com/en-in/free/

Source Code

You can get the source code from GitHub.

We will use an ASP.NET Core backend for this application. The ASP.NET Core backend provides a straightforward authentication process to access Azure Cognitive Services. This will also ensure that the end user won’t have direct access to Cognitive Services.

Create the Azure Computer Vision Cognitive Service resource

Log in to the Azure portal, search for Cognitive Services in the search bar, and click on the result. Refer to the image shown below.

On the next screen, click on the Add button. It will open the Cognitive Services marketplace page. Search for Computer Vision in the search bar and click on the search result. It will open the Computer Vision API page. Click on the Create button to create a new Computer Vision resource. Refer to the image shown below.

On the Create page, fill in the details as indicated below.

  • Name: Give a unique name for your resource.
  • Subscription: Select the subscription type from the dropdown.
  • Pricing tier: Select the pricing tier as per your choice.
  • Resource group: Select an existing resource group or create a new one.

Click on the Create button. Refer to the image shown below.

After your resource is successfully deployed, click on the “Go to resource” button. You can see the Key and the endpoint for the newly created Computer Vision resource. Refer to the image shown below.

Make a note of the key and the endpoint. We will use these later in this article to invoke the Computer Vision OCR API from the .NET code. The values are masked here for privacy.

Creating the ASP.NET Core application

Open Visual Studio 2019 and click on “Create a new project”. A “Create a new project” dialog will open. Select “ASP.NET Core Web Application” and click on Next. On the “Configure your new project” screen, provide the name for your application as ngComputerVision and click on Create. Refer to the image shown below.

You will be navigated to “Create a new ASP.NET Core web application” screen. Select “.NET Core” and “ASP.NET Core 3.1” from the dropdowns on the top. Then, select the “Angular” project template and click on Create. Refer to the image shown below.

This will create our project. The folder structure of the application is shown below.

The ClientApp folder contains the Angular code for our application. The Controllers folder will contain our API controllers. The Angular components are present inside the ClientApp\src\app folder.

The default template contains a few Angular components. These components won’t affect our application, but for the sake of simplicity, we will delete the fetchdata and counter folders from the ClientApp/src/app/components folder. Also, remove the references to these two components from the app.module.ts file.

Installing Computer Vision API library

We will install the Azure Computer Vision API library, which provides out-of-the-box models for handling the Computer Vision REST API response. To install the package, navigate to Tools >> NuGet Package Manager >> Package Manager Console. It will open the Package Manager Console. Run the command as shown below.
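The command itself did not survive extraction. Assuming the official client library (which the model names used later in this article suggest), the Package Manager Console command would be:

```shell
Install-Package Microsoft.Azure.CognitiveServices.Vision.ComputerVision
```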

You can learn more about this package at the NuGet gallery.

Create the Models

Right-click on the ngComputerVision project and select Add >> New Folder. Name the folder Models. Again, right-click on the Models folder and select Add >> Class to add a new class file. Name the class LanguageDetails.cs and click Add.

Open LanguageDetails.cs and put the following code inside it.

Similarly, add a new class file AvailableLanguage.cs and put the following code inside it.

We will also add two classes as DTO (Data Transfer Object) for sending data back to the client.

Create a new folder and name it DTOModels. Add the new class file AvailableLanguageDTO.cs in the DTOModels folder and put the following code inside it.

Add the OcrResultDTO.cs file and put the following code inside it.

Adding the OCR Controller


We will add a new controller to our application. Right-click on the Controllers folder and select Add >> New Item. An “Add New Item” dialog box will open. Select “Visual C#” from the left panel, then select “API Controller Class” from templates panel and put the name as OCRController.cs. Click on Add.

Refer to the image below.

The OCRController will handle the image recognition requests from the client app. This controller will also return the list of all the languages supported by OCR API.

Open the OCRController.cs file and put the following code inside it.

In the constructor of the class, we have initialized the key and the endpoint URL for the OCR API.

The Post method will receive the image data as a file collection in the request body and return an object of type OcrResultDTO. We will convert the image data to a byte array and invoke the ReadTextFromStream method. We will deserialize the response into an object of type OcrResult. We will then form the sentence by iterating over the OcrWord objects.

Inside the ReadTextFromStream method, we will create a new HttpRequestMessage. This HTTP request is a Post request. We will pass the subscription key in the header of the request. The OCR API will return a JSON object having each word from the image as well as the detected language of the text.
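As a sketch of that response handling: the OCR API’s JSON nests words inside lines inside regions, and the sentence is assembled by walking that hierarchy. The interfaces and sample object below are illustrative stand-ins, not a captured API response:

```typescript
// Relevant shape of the OCR API's JSON response.
interface OcrWord { text: string; }
interface OcrLine { words: OcrWord[]; }
interface OcrRegion { lines: OcrLine[]; }
interface OcrResponse { language: string; regions: OcrRegion[]; }

// Flatten the region/line/word hierarchy into a single sentence.
function extractSentence(result: OcrResponse): string {
  const lines: string[] = [];
  for (const region of result.regions) {
    for (const line of region.lines) {
      lines.push(line.words.map(word => word.text).join(' '));
    }
  }
  return lines.join(' ');
}

// Hand-written sample mimicking the response structure.
const ocrSample: OcrResponse = {
  language: 'en',
  regions: [{
    lines: [
      { words: [{ text: 'Hello' }, { text: 'world' }] },
      { words: [{ text: 'from' }, { text: 'OCR' }] }
    ]
  }]
};

console.log(extractSentence(ocrSample)); // "Hello world from OCR"
```

The server-side code performs the same walk over the deserialized OcrResult and OcrWord objects.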

The GetAvailableLanguages method will return the list of all the languages supported by the Translate Text API. We will set the request URI and create an HttpRequestMessage, which will be a Get request. This request URI will return a JSON object which will be deserialized to an object of type AvailableLanguage.

Why do we need to fetch the list of supported languages?

The OCR API returns the language code (e.g. en for English, de for German, etc.) of the detected language. But we cannot display the language code on the UI as it is not user-friendly. Therefore, we need a dictionary to look up the language name corresponding to the language code.

The Azure Computer Vision OCR API supports 25 languages. To see all the languages supported by the OCR API, refer to the list of supported languages. These languages are a subset of the languages supported by the Azure Translate Text API.

Since there is no dedicated API endpoint to fetch the list of languages supported by OCR API, we are using the Translate Text API endpoint to fetch the list of languages. We will create the language lookup dictionary using the JSON response from this API call and filter the result based on the language code returned by the OCR API.
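A sketch of that lookup construction, assuming the Translate Text API’s /languages response shape (a translation object keyed by language code); the miniature response here is hand-written for illustration:

```typescript
// Relevant shape of the Translate Text API /languages response.
interface LanguageDetails { name: string; nativeName: string; dir: string; }
interface LanguagesResponse { translation: { [code: string]: LanguageDetails }; }

// Build a language-code -> display-name dictionary from the response.
function buildLanguageLookup(response: LanguagesResponse): Map<string, string> {
  const lookup = new Map<string, string>();
  for (const code of Object.keys(response.translation)) {
    lookup.set(code, response.translation[code].name);
  }
  return lookup;
}

// Hand-written miniature response for illustration.
const languagesSample: LanguagesResponse = {
  translation: {
    en: { name: 'English', nativeName: 'English', dir: 'ltr' },
    de: { name: 'German', nativeName: 'Deutsch', dir: 'ltr' }
  }
};

const lookup = buildLanguageLookup(languagesSample);
console.log(lookup.get('de')); // "German"
```

With this dictionary in hand, the language code returned by the OCR API resolves to a user-friendly name in a single lookup.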

Working on the Client side of the application

The code for the client-side is available in the ClientApp folder. We will use Angular CLI to work with the client code.

Using Angular CLI is not mandatory. I am using Angular CLI here as it is easy to use. If you don’t want to use the CLI, you can create the files for components and services manually.

Navigate to the ngComputerVision\ClientApp folder on your machine and open a command window. We will execute all our Angular CLI commands in this window.

Create the client-side models

Create a folder called models inside the ClientApp\src\app folder. Now we will create a file availablelanguage.ts in the models folder. Put the following code in it.

Similarly, create another file inside the models folder called ocrresult.ts. Put the following code in it.

You can observe that both these classes have the same definition as the DTO classes we created on the server-side. This will allow us to bind the data returned from the server directly to our models.

Create the Computervision Service


We will create an Angular service which will invoke the Web API endpoints, convert the Web API response to JSON and pass it to our component. Run the following command.

This command will create a folder named services and then create the following two files inside it.

  • computervision.service.ts — the service class file.
  • computervision.service.spec.ts — the unit test file for service.

Open computervision.service.ts file and put the following code inside it.

We have defined a variable baseURL which will hold the endpoint URL of our API. We will initialize the baseURL in the constructor and set it to the endpoint of the OCRController.

The getAvailableLanguage method will send a Get request to the GetAvailableLanguages method of the OCRController to fetch the list of supported languages for OCR.

The getTextFromImage method will send a Post request to the OCRController and supply the parameter of type FormData. It will fetch the detected text from the image and language code of the text.

Create the Ocr component

Run the following command in the command prompt to create the OcrComponent.

The --module flag will ensure that this component gets registered in app.module.ts.

Open ocr.component.html and put the following code in it.


We have defined a text area to display the detected text and a text box for displaying the detected language. We have defined a file upload control which will allow us to upload an image. After uploading the image, the preview of the image will be displayed using an <img> element.

Open ocr.component.ts and put the following code in it.

We will inject the ComputervisionService in the constructor of the OcrComponent and set a message and the value for the max image size allowed inside the constructor.

We will invoke the getAvailableLanguage method of our service in the ngOnInit and store the result in an array of type AvailableLanguage.

The uploadImage method will be invoked upon uploading an image. We will check if the uploaded file is a valid image and within the allowed size limit. We will process the image data using a FileReader object. The readAsDataURL method will read the contents of the uploaded file.

Upon successful completion of the read operation, the reader.onload event will be triggered. The value of imagePreview will be set to the result returned by the FileReader object, which for readAsDataURL is a base64-encoded data URL string.
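The validation step can be sketched as a small helper; the extension allow-list and size limit below are assumptions for illustration, not values from the article:

```typescript
// Check the file extension against a small allow-list and enforce a size cap.
function isValidImage(fileName: string, fileSizeBytes: number, maxSizeBytes: number): boolean {
  const allowed = ['.jpg', '.jpeg', '.png', '.bmp'];
  const dot = fileName.lastIndexOf('.');
  if (dot < 0) {
    return false;
  }
  const extension = fileName.slice(dot).toLowerCase();
  return allowed.indexOf(extension) !== -1 && fileSizeBytes <= maxSizeBytes;
}

console.log(isValidImage('scan.PNG', 2000000, 4000000)); // true
console.log(isValidImage('notes.txt', 1000, 4000000));   // false
```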

Inside the GetText method, we will append the image file to a variable of type FormData. We will invoke the getTextFromImage method of the service and bind the result to an object of type OcrResult. We will search for the language name in the availableLanguage array, based on the language code returned from the service. If the language code is not found, we will set the language as unknown.
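The client-side lookup with its unknown fallback can be sketched like this (the field names are assumptions chosen to mirror the description above):

```typescript
// Assumed shape of the entries in the availableLanguage array.
interface AvailableLanguage { languageID: string; languageName: string; }

// Resolve a language code to a display name, falling back to "unknown".
function languageNameFor(code: string, languages: AvailableLanguage[]): string {
  const match = languages.filter(l => l.languageID === code)[0];
  return match ? match.languageName : 'unknown';
}

const availableLanguage: AvailableLanguage[] = [
  { languageID: 'en', languageName: 'English' },
  { languageID: 'de', languageName: 'German' }
];

console.log(languageNameFor('de', availableLanguage)); // "German"
console.log(languageNameFor('xx', availableLanguage)); // "unknown"
```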

We will add the styling for the text area in ocr.component.css as shown below.

Adding the links in Nav Menu

We will add the navigation links for our components in the nav menu. Open nav-menu.component.html and remove the links for Counter and Fetch data components. Add the following lines in the list of navigation links.

Execution Demo

Press F5 to launch the application. Click on the Computer Vision button on the nav menu at the top. You can upload an image and extract the text from the image as shown in the image below.

Summary

We have created an optical character recognition (OCR) application using Angular and the Computer Vision Azure Cognitive Service. The application extracts the printed text from the uploaded image and recognizes the language of the text using the Computer Vision OCR API, which supports 25 languages.

I just released a free eBook on Angular and Firebase. You can download it from Build a Full-Stack Web Application Using Angular & Firebase.

See Also

If you liked the article, share it with your friends. You can also connect with me on Twitter and LinkedIn.

In version 12.60, the Optical Character Recognition feature was upgraded to a new plugin powered by Google Cloud Vision API. To learn more, see Optical Character Recognition.
The deprecated OCR plugin was removed from TestComplete in version 12.60. If you need to use the deprecated OCR plugin with this version of TestComplete, please contact our Customer Care team. The deprecated OCR plugin was restored in TestComplete version 14.0. To use the plugin with this or later TestComplete version, you need to install and enable the plugin manually.

The TestComplete OCR engine uses specific algorithms to “read” text from an onscreen region character by character. These algorithms depend on several factors: font, text and background colors, text size and others. All of these factors make optical recognition prone to errors. TestComplete includes the Text Recognition plugin that uses other principles for text recognition. The plugin intercepts calls to certain Windows API functions that output text on screen and tries to create special test objects for this text. This plugin works faster than the OCR engine and, if used, provides 100% recognition accuracy.

This topic describes how you can use the Text Recognition plugin to perform the actions which the OCR engine performs and explains when you use this or that technology.

Typical OCR Tasks

Typically, testers use the TestComplete OCR subsystem when they cannot obtain text from an object for some reason or when they need to find a test object that contains text. Let’s consider these typical tasks in detail:

  • Finding an object by text

    Suppose you are testing an application that uses a third-party menu system which is not supported by TestComplete. Since the menu is not supported, TestComplete will not provide methods and objects for simulating user actions over the menu.

    To work around the problem, you can simulate selecting the menu item by calling the Click method of the tested form and passing the coordinates of desired items to this method as parameters. However, it is difficult to create and maintain such tests. Also, hard-coded coordinates will make the test less stable: if the position of the desired menu item changes, the test will fail.

    The OCR subsystem offers a better solution. It helps you find the rectangular area of screen that contains the desired text. After you get the area, you can pass the desired coordinates to the Click method and simulate the desired action.

    The Text Recognition plugin also helps you solve this problem. It recognizes menu items by their text and creates the TextObject object for each item. You can easily simulate selecting an item by calling the Click method of the TextObject object.

  • Getting text of an object

    Let’s consider another example: a tested form contains a label and your test should simulate a click over it. This task is not a problem if you are testing an Open Application. However, it may become a problem if you have to test a black-box application. It is quite possible that the tested application does not create a special window to display the label’s text, but draws the text on the surface of the tested form. The TestComplete object hierarchy will not contain an object that corresponds to this label, so it may be difficult to determine the label’s text and compare it with some baseline value.

    Using the OCR engine you can recognize this text and perform the desired comparison. The Text Recognition plugin also helps you solve this problem. See below for more information.

Finding an Object by Text

To find the coordinates of text on screen, you can use the FindRectByText method of an OCRObject object. This method recognizes the whole text within some visual region and then searches for the rectangle that surrounds the desired text.

As an alternative to the FindRectByText method, you can use the Text Recognition plugin:

  • Enable the Text Recognition support for your tested form or controls in TestComplete project properties. For more information about this, see the description of the Text Recognition subsystem.

    After this, TestComplete will use the Text Recognition engine to recognize text drawn in your tested application and will create TextObject objects for them.

  • If you know the text the desired control contains, then after you enable the Text Recognition support, you can use code like TextObject('My Text') to obtain the object holding the desired text. For instance:

    JavaScript, JScript

    var myWnd, textObj;
    // Get the tested window
    myWnd = Sys.Process('MyProcess').Window('MyWindowClass', 'My Window Caption', 1);
    // Get the TextObject object
    textObj = myWnd.TextObject('My Desired Text');

    Python

    # Get the tested window
    myWnd = Sys.Process('MyProcess').Window('MyWindowClass', 'My Window Caption', 1)
    # Get the TextObject object
    textObj = myWnd.TextObject('My Desired Text')

    VBScript

    ' Get the tested window
    Set myWnd = Sys.Process("MyProcess").Window("MyWindowClass", "My Window Caption", 1)
    ' Get the TextObject object
    Set textObj = myWnd.TextObject("My Desired Text")

    DelphiScript

    var
    myWnd, textObj : OleVariant;
    begin
    // Get the tested window
    myWnd := Sys.Process('MyProcess').Window('MyWindowClass', 'My Window Caption', 1);
    // Get the TextObject object
    textObj := myWnd.TextObject('My Desired Text');
    end;

    C++Script, C#Script

    var myWnd, textObj;
    // Get the tested window
    myWnd = Sys['Process']('MyProcess')['Window']('MyWindowClass', 'My Window Caption', 1);
    // Get the TextObject object
    textObj = myWnd['TextObject']('My Desired Text');

    If you do not know the exact text, then you can enumerate child objects of the tested window and compare their names and text with the desired values.

    To enumerate child objects, you use the Child method and ChildCount property of the form. The names of the objects created by the Text Recognition plugin look like TextObject('desired text'). To obtain the text, use the Text property of TextObject objects.

    If the form contains several TextObject objects having similar text, then to find the needed object, you can check its Index property or check the properties that return the object's coordinates: Top, Left, ScreenTop and ScreenLeft.

    To find a child object by its property values, you can also use the Find and FindChild methods of the tested form.

After you obtain the text object, you can perform the needed actions with it. For instance, you can use the Click and ClickR methods to simulate mouse clicks over the object.
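The Child/ChildCount enumeration described above can be sketched with a mock form object; the shapes below are illustrative stand-ins for the objects TestComplete provides at run time, not its actual API:

```typescript
// Illustrative stand-ins for the TestComplete members used below.
interface TextLikeObject { Name: string; Text: string; }
interface Form { ChildCount: number; Child(index: number): TextLikeObject; }

// Walk the child objects and return the first one whose Text matches.
function findTextObject(form: Form, desiredText: string): TextLikeObject | null {
  for (let i = 0; i < form.ChildCount; i++) {
    const child = form.Child(i);
    if (child.Text === desiredText) {
      return child;
    }
  }
  return null;
}

// Mock form with two recognized text objects.
const children: TextLikeObject[] = [
  { Name: "TextObject('File')", Text: 'File' },
  { Name: "TextObject('Edit')", Text: 'Edit' }
];
const mockForm: Form = {
  ChildCount: children.length,
  Child: (index: number) => children[index]
};

const found = findTextObject(mockForm, 'Edit');
console.log(found ? found.Name : 'not found'); // "TextObject('Edit')"
```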

Getting Text of an Object

Another common task for the OCR engine is to obtain text of some control. To do this, you can use the OCR.CreateObject(...).GetText method. Alternatively, you can use Text Recognition services:

  • First, enable Text Recognition services in your project’s properties. For more information on this, see the description of the Text Recognition subsystem.

  • Your further actions depend on the application under test:

    • If you know the screen coordinates of some point within the text, you can pass them to the Sys.ObjectFromPoint or Sys.Desktop.ObjectFromPoint method and the method will return the TextObject object that provides a scripting interface to the desired text.

      You can find the coordinates in the Object Browser panel before running tests.

    • If the text coordinates may change from one test run to another, then you need to enumerate through child objects of the window that contains the desired text and check the properties of the child objects. For instance, you can use the Index property or the properties that return the coordinates of the text object: Top, Left, ScreenTop and ScreenLeft.

      To find a child object by its property values, you can also use the Find or FindChild method of the tested form.

When to Use OCR and When to Use Text Recognition

Since the Text Recognition plugin uses other principles to determine object text, it works faster than the OCR engine and, if used, provides 100% recognition accuracy. We recommend that you always use Text Recognition services to work with text controls. If Text Recognition does not help, then use OCR.

Note however that there are situations when the Text Recognition plugin will not be helpful:


  • A tested window displays an image that includes some text. Since this text is part of the image, the Text Recognition plugin will not be able to create a TextObject object for it. Such images are typically used by web pages. If you need to obtain text shown in an image, you have to use the OCR engine.

  • The way the tested application draws text is not typical for Windows applications. For instance, if the application draws text pixel by pixel rather than calls the DrawText function of Windows API, the Text Recognition plugin will not be able to determine the drawn text and will not create a TextObject object for it. To recognize such text, use the OCR engine.

See Also


Using Optical Character Recognition - Overview
Using Text Recognition Technology