Using Tesseract with Selenium WebDriver for checking text on images using OCR

Recently a team approached me looking for a solution to extract text from an image displayed on a web page and verify it’s contents as part of Selenium tests.

This post explains the solution using Tesseract, Tess4J along with Selenium for checking text displayed on images.

Tesseract is a famous open source OCR engine. It uses the Leptonica Image Processing Library. Tesseract support a wide variety of image formats and convert them to text in over 60 languages.

Tesseract works on Linux, Windows and Mac OSX. Please refer Readme page for installation instructions.

This sample is built on Mac. You can install Tesseract on Mac using homebrew:

brew install tesseract

In addition to Tesseract (written in C++), we need a Java wrapper called Tess4J which provides JNA wrapper for Tesseract OCR API.

Here is a sample page which has a barcode displayed as image. We will extract the barcode number and assert it’s value.

ocr_example

Since I am using Maven for this project, I added Tess4j dependency to my pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>me.unmesh</groupId>
    <artifactId>selenium-ocr-example</artifactId>
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>net.sourceforge.tess4j</groupId>
            <artifactId>tess4j</artifactId>
            <version>2.0.0</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.seleniumhq.selenium</groupId>
            <artifactId>selenium-java</artifactId>
            <version>2.46.0</version>
        </dependency>
    </dependencies>
</project>

Here’s JUnit test which navigates to the sample page and checks the number displayed on the barcode image:

package me.unmesh.selenium.ocr.example;

import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.By;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import static org.junit.Assert.*;

import net.sourceforge.tess4j.*;
import java.io.File;

/**
 * A demo test to verify text from an image using Tesseract OCR API
 *
 * @author  upgundecha
 *
 */
public class BarcodeTest {
    private WebDriver driver;

    @Before
    public void setUp() {
        driver = new FirefoxDriver();
        // navigate to the dummy page with a barcode image
        driver.get("https://dl.dropboxusercontent.com/u/55228056/barcode.html");
    }

    @After
    public void tearDown() {
        driver.quit();
    }

    @Test
    public void testBarcodeNumber() throws Exception {
        // get and capture the picture of the img element used to display the barcode image
        WebElement barcodeImage = driver.findElement(By.id("barcode"));
        File imageFile = WebElementExtender.captureElementPicture(barcodeImage);

        // get the Tesseract direct interace
        Tesseract instance = new Tesseract();

        // the doOCR method of Tesseract will retrive the text
        // from image captured by Selenium
        String result = instance.doOCR(imageFile);

        // check the the result
        assertEquals("Application number did not match", "123-45678", result.trim());
    }
}

Instead of capturing screenshot of the entire page using Selenium, I captured screenshot of the image element where the barcode is displayed on the page.

<html>
  <head>
    <title>Barcode Sample</title>
   </head>
  <body>
    <table>
      <tr>
        <td style="padding:10px; font-size:15px; font-family:Arial, Helvetica; text-align:center;">
          <p> Please write down your application id</p>
        <td>
          <img id="barcode" src="barcode.png" />
        </td>
      </tr>
  </table>
  </body>
 </html>

The captured image is then passed to doOCR() method of Tesseract instance to retrieve the text.

To capture the image of a WebElement I used captureElementPicture() method from WebElementExtender class which is described in my book Selenium Testing Tools Cookbook:

package me.unmesh.selenium.ocr.example;

import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.File;

import javax.imageio.ImageIO;

import org.openqa.selenium.OutputType;
import org.openqa.selenium.Point;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.internal.WrapsDriver;

/**
 * This class provides various additional helper methods on elements
 *
 * @author upgundecha
 *
 */

public class WebElementExtender {

    /**
     * Gets a picture of specific element displayed on the page
     * @param element The element
     * @return File
     * @throws Exception
     */
    public static File captureElementPicture(WebElement element)
            throws Exception {

        // get the WrapsDriver of the WebElement
        WrapsDriver wrapsDriver = (WrapsDriver) element;

        // get the entire screenshot from the driver of passed WebElement
        File screen = ((TakesScreenshot) wrapsDriver.getWrappedDriver())
                .getScreenshotAs(OutputType.FILE);

        // create an instance of buffered image from captured screenshot
        BufferedImage img = ImageIO.read(screen);

        // get the width and height of the WebElement using getSize()
        int width = element.getSize().getWidth();
        int height = element.getSize().getHeight();

        // create a rectangle using width and height
        Rectangle rect = new Rectangle(width, height);

        // get the location of WebElement in a Point.
        // this will provide X & Y co-ordinates of the WebElement
        Point p = element.getLocation();

        // create image  for element using its location and size.
        // this will give image data specific to the WebElement
        BufferedImage dest = img.getSubimage(p.getX(), p.getY(), rect.width,
                rect.height);

        // write back the image data for element in File object
        ImageIO.write(dest, "png", screen);

        // return the File object containing image data
        return screen;
    }
}

Tesseract is clean, fast and accurate for OCR testing needs. Similar approach can be followed for .NET using Emgu library