Cozmo, IFTTT and Jenkins Build Notifications

CozmoJenkins.png

Recently I bought an Anki Cozmo. This tiny robot is amazing and a great companion on your desk. Cozmo comes with a great personality: he is full of fun, plays games with you, does tricks, animates and brings a lot of joy. Cozmo has a camera that can detect faces and greet people and pets. It also has an SDK which allows you to customize Cozmo and create cool apps and IoT-connected programs.

In this post, I’ll show how to use Cozmo with my Jenkins IFTTT Build Notification plugin to send build notifications to Cozmo. He will react to these notifications with animations and tricks like lighting up his cubes. This will make Cozmo an eXtreme Feedback Device.

The Cozmo SDK comes with a bundle of examples and apps that you can play with and modify. Several of the examples connect Cozmo to IFTTT and use different channels like Gmail or sports news. I took the Gmail example and modified it to receive Jenkins notifications. Here’s how this works at a high level:

Cozmo.png

Step 1 – Connecting Cozmo from IFTTT

In order to connect Cozmo from IFTTT, we need the computer that runs the Cozmo script to be reachable from the internet. You can do this either by using a static IP or by using a tool like ngrok, which sets up a secure tunnel to a port on localhost. To set up ngrok, follow the instructions at https://ngrok.com/download

Run this command to create a secure public URL for port 8080:

 ./ngrok http 8080 

Note the HTTP forwarding address shown in the terminal (e.g., http://4916890d.ngrok.io). This is required while creating the IFTTT applet.

ngrok.png

WARNING: Using ngrok exposes your local web server to the internet. See the ngrok documentation for more information: https://ngrok.com/docs

Step 2 – IFTTT Jenkins Cozmo Script

The Cozmo SDK is currently available in Python. The IFTTT examples use the aiohttp module to create a web server with an endpoint whose handler calls the Cozmo SDK.

Complete source code is available at https://github.com/upgundecha/cozmo-python-sdk
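The repository above has the full script. As a rough illustration of what the endpoint handler has to do with an incoming request, here is a minimal sketch that uses only the Python standard library. It covers just the payload handling: `build_announcement` and the sentence it returns are hypothetical names chosen for this example, while `project`, `build` and `status` are the fields sent by the applet created in Step 3. The real script does this inside an aiohttp handler and then calls the Cozmo SDK.

```python
import json


def build_announcement(raw_body: bytes) -> str:
    """Turn the JSON body of an IFTTT web request into a sentence to speak."""
    payload = json.loads(raw_body.decode("utf-8"))
    # Field names follow the applet body defined in Step 3;
    # missing fields fall back to a neutral placeholder.
    project = payload.get("project", "unknown project")
    build = payload.get("build", "unknown")
    status = payload.get("status", "unknown").lower()
    return f"Build {build} of {project} finished with status {status}"


body = b'{"project": "demo", "build": "42", "status": "SUCCESS"}'
print(build_announcement(body))
```

The returned sentence is the kind of text you could hand to Cozmo to announce; how that call looks depends on the SDK version you use.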

Step 3 – Creating IFTTT Recipe

IFTTT is a web service that lets you create chains of simple conditional statements, called applets. An applet is triggered by changes that occur within other web services such as Gmail, Facebook, Instagram, or Twitter. For example, an applet may send an e-mail message if the user tweets using a hashtag, copy a photo on Facebook to a user’s archive if someone tags the user in a photo, or trigger a supported IoT device to perform a specific action.

In this example, we will create a custom IFTTT trigger and action using the Maker Webhooks feature.

  1. Sign up and sign in to https://ifttt.com
    1. Create an applet: https://ifttt.com/create
    2. Set up your trigger.
      1. Click “this”.
      2. Select “Maker Webhooks” as your service.
      3. Under “Choose a Trigger”, select “Receive a Web request”.
      4. In “Receive a Web Request”, enter “JenkinsBuild” as “Event Name”.
      5. Click the “Create Trigger” button.
    3. Set up your action.
      1. Click “that”.
      2. Select “Maker Webhooks” to set it as your action channel. Connect to the Maker channel if prompted.
      3. Click “Make a web request” and fill out the fields as follows. Remember your publicly accessible URL from above (e.g., http://55e57164.ngrok.io) and use it in the URL field, followed by “/iftttJenkins” as shown below:

URL: http://55e57164.ngrok.io/iftttJenkins
Method: POST
Content Type: application/json
Body: {"project": "{{Value1}}", "build": "{{Value2}}", "status": "{{Value3}}"}

Click “Create Action” then “Finish”.

Here is a video loop of the above settings:

ifttt_maker.gif

Step 4 – Configuring Jenkins Build Job

Set up the Jenkins job (this requires the IFTTT Build Notification plugin).

In the “Post-build Action” section of the Jenkins job, add a new “IFTTT Build Notifier” action with the following values:

  1. Event Name: JenkinsBuild
  2. Key: <Maker Webhooks Key>

Note: You can get your unique Maker Webhooks Key from https://ifttt.com/services/maker_webhooks/setting

Finally, run the Jenkins job to test the setup. In response to the IFTTT web request, Cozmo should roll off the charger, raise and lower his lift, announce the status, and then animate and light up the cubes.

Here is a video loop of the above settings:

jenkins.gif

Running it together

Here’s a video of Cozmo’s reaction to a passed build vs. a failed build:

You can also connect CI tools like Travis or Circle CI by sending a curl request to the Maker Webhooks endpoint.
We can add more actions to this web server and make Cozmo even more fun.
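For CI tools without a plugin, the request to the Maker Webhooks endpoint is simple enough to build by hand. The sketch below constructs the same kind of POST that a curl command would send, without actually sending it. The helper name `build_trigger_request` and the placeholder key are assumptions for this example; the three values map to `{{Value1}}`, `{{Value2}}` and `{{Value3}}` in the applet body from Step 3.

```python
import json
import urllib.request


def build_trigger_request(event: str, key: str, project: str,
                          build: str, status: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that fires an IFTTT Webhooks event."""
    url = f"https://maker.ifttt.com/trigger/{event}/with/key/{key}"
    # Webhooks exposes value1..value3 to the applet as {{Value1}}..{{Value3}}.
    body = json.dumps({"value1": project, "value2": build, "value3": status})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "YOUR_KEY" is a placeholder for your own Maker Webhooks key.
request = build_trigger_request("JenkinsBuild", "YOUR_KEY", "demo", "42", "SUCCESS")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen()` would send it; keep the key out of build logs, since anyone who has it can trigger the applet.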

API Testing with Postman Collections in AWS CodePipeline

This post explains how to set up an AWS CodePipeline to run Postman collections for testing REST APIs using AWS CodeCommit and AWS CodeBuild.

Step 1 – Creating a Git repository with AWS CodeCommit

AWS CodeCommit is a version control service hosted by AWS. You can create and manage Git repositories on CodeCommit.

For this project, we will create a new repository named postman-sample from AWS Console > Developer Tools > CodeCommit

Picture1.png

Clone the newly created repository on your computer

Step 2 – Exporting Postman Collection

Next, we need to export the Postman collection so we can run it using the Newman CLI.

Picture2.png

Select the Collection v2 option in the Export Collection dialog box and click Export.

Picture3.png

Save the collection in the cloned repository folder from Step 1 above.

Creating buildspec file

We need to create a buildspec file to tell AWS CodeBuild how we want to run the collection. Before running the collection, we need to install the newman npm package in the pre_build phase and then call the newman CLI with the collection we want to run in the build phase. We can also specify report options to generate an HTML file at the end. We will upload this file to S3 after CodeBuild executes the collection:

Picture5.png
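Since the buildspec is only shown as a screenshot, here is a small sketch of the command line that the build phase boils down to. It is written as a Python helper purely for illustration; `newman_command` and the file names are assumptions, and depending on the Newman version the HTML reporter may have to be installed as a separate npm package.

```python
import shlex


def newman_command(collection: str, report_path: str) -> list:
    """Return the argument list for running a collection with an HTML report."""
    return [
        "newman", "run", collection,
        "-r", "html",                           # enable the HTML reporter
        "--reporter-html-export", report_path,  # where to write the report
    ]


# File names are placeholders for the exported collection and the report.
cmd = newman_command("sample.postman_collection.json", "report.html")
print(" ".join(shlex.quote(part) for part in cmd))
```

The printed line is what would go into the build phase of the buildspec, with the report file listed under the artifacts so that CodeBuild uploads it to S3.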

Commit and push the collection and buildspec file to the master branch in CodeCommit.

You can find the collection and buildspec file used for this example at https://github.com/upgundecha/postman-sample

Step 3 – Creating an S3 Bucket to save report

Let’s create an S3 bucket to save the report file generated by Newman. We can use this file to publish the results.

From AWS Console > Storage > S3 create a new bucket named postmanreport (you will need to use a unique name). You can enable versioning on the S3 bucket to keep historical reports.

Picture6.png

Step 4 – Creating an AWS CodeBuild Project

We will use CodeBuild to fetch the changes from CodeCommit and run the collection using Newman. We already have the buildspec file, which defines the sequence of execution.

From AWS Console > Developer Tools > CodeBuild create a project named postman-sample.

Set Source Provider as AWS CodeCommit and Repository as postman-sample:

Picture8.png

We need an environment to run the build job. Let’s configure the Ubuntu/Node.js environment and the Artifacts settings as shown in the screenshot below:

Picture9.png

Next, we need to configure a service role. Create a new service role and click Continue.

Picture10.png

The CodeBuild project is ready. We can test this project by manually starting the build.

Picture11.png

Step 5 – Bringing it together with AWS CodePipeline

Finally, we need to create a new AWS CodePipeline to trigger CodeBuild when a new change is pushed to CodeCommit.

From AWS Console > Developer Tools > CodePipeline create a pipeline named postman-sample:

Picture12.png

Configure the Source Location as AWS CodeCommit as shown in the screenshot below:

Picture13.png

Configure the Build provider as AWS CodeBuild with the CodeBuild project created in Step 4 above as shown in the screenshot below:

Picture14.png

We just want to run tests and stop there for now. We will not deploy anything in this project, so we will select the No Deployment option in the Deploy section as shown in the screenshot below:

Picture15.png

Finally, we need to configure the AWS Service role as shown in the screenshot below:

Picture16.png

This will take you to IAM to define the new role:

Picture17.png

Back in CodePipeline make sure the role is specified:

Picture18.png

Review the CodePipeline configuration and create the pipeline.

Picture19.png

You will see the Pipeline created success message. You can try invoking the newly created Pipeline by clicking the Release Change button:

Picture20

Once the Pipeline is executed you will see both the Source and Build stages in green (unless there are any errors) as shown in the screenshot below:

Picture21

After a successful run, you can go and check the S3 bucket to view the report.

Picture22.png

Newman generates a nicely formatted report as shown in the screenshot below:

Picture23.png

You can also configure a Lambda function or SNS to send a notification along with the report.


Generating Test Data with Faker and friends

During my recent endeavours to build a REST API, I wanted to test the bulk upload feature of the API, which required a large set of test data. I was looking for a quick way to create a fake test data set and found an interesting Ruby library called Faker. It is a port of Perl’s Data::Faker library.

Faker is a popular choice in the Rails community. It has also been ported to Java, Python and JavaScript.

Faker provides a number of categories for test data generation, for example Names, Addresses, Pictures, Business & Finance (Credit Card Numbers, IBAN numbers, Swift Codes etc.), Text placeholders and much more.

Let’s say we want to create a list of records on the fly, each with a name and a contact phone number. In Ruby, this takes only a few lines.

First install the Faker gem with:


gem install faker

and create a simple script to generate records:


require 'faker'
require 'json'

landlords = []

100.times do
  landlord = Hash.new
  landlord["name"] = Faker::Name.name
  landlord["contact_number"] = Faker::PhoneNumber.cell_phone
  landlords.push(landlord)
end

puts JSON.pretty_generate(landlords)

This will create records similar to the output below. You can copy or store the output and use it to seed a database or as input during testing.


[
  {
    "name": "Ms. Adelia Ortiz",
    "contact_number": "1-443-107-3897"
  },
  {
    "name": "Delores Cassin",
    "contact_number": "868.775.6054"
  },
  {
    "name": "Rose Klocko",
    "contact_number": "759-090-9777"
  },
  {
    "name": "Josiah Langworth I",
    "contact_number": "1-308-497-4606"
  },
  {
    "name": "Kattie Hamill",
    "contact_number": "266-980-1233"
  }
]

We can also use Faker in automated tests. For example, here’s a Capybara test that uses fake user data to test the Sign-Up feature:


name = Faker::Name.name
email =  Faker::Internet.email

visit("/signup")

fill_in('account_name', with: name)
fill_in('account_email', with: email)
click_button('Sign Up')

# assert_text raises if the text is missing; has_content? alone only returns true/false
page.assert_text("Dear #{name}")
page.assert_text('Verify Your Email Address')

Here’s a JavaScript version of the Ruby script made with Faker.js:


var faker = require('faker');

var landlords = [];
for (var i = 0; i < 100; i++) {
  var landlord = {
    name: faker.name.findName(),
    // phone number stored under the same key as in the Ruby script
    contact_number: faker.phone.phoneNumber(),
  };
  landlords.push(landlord);
}

console.log(landlords);

By default, these libraries generate values which may not be unique. However, you can set options to generate unique values. You can also extend/customise the output.
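To make the uniqueness point concrete, here is a generic sketch of the idea in Python: keep drawing values and discard duplicates until enough distinct ones have been collected. This is not how any of the Faker ports implement their uniqueness options; `unique_values` and `fake_phone` are hypothetical helpers written with the standard library only, standing in for a real generator.

```python
import random


def unique_values(generate, count, max_attempts=10_000):
    """Collect `count` distinct values from `generate`, retrying on duplicates."""
    seen = set()
    attempts = 0
    while len(seen) < count:
        if attempts >= max_attempts:
            raise RuntimeError("could not produce enough unique values")
        seen.add(generate())
        attempts += 1
    return sorted(seen)


rng = random.Random(7)  # seeded so repeated runs give the same data


def fake_phone():
    # Hypothetical stand-in for a Faker phone-number generator.
    return f"555-{rng.randint(0, 9999):04d}"


numbers = unique_values(fake_phone, 100)
print(len(numbers), len(set(numbers)))
```

The attempt limit matters: if the generator can only produce a few distinct values, an unbounded retry loop would never finish.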

There is a Java port of Faker called Java Faker.
There is also a Python port available at https://github.com/joke2k/faker

Faker and friends are must-have power tools for developers and testers.


Adding Swagger UI support to Spring Boot REST API Project

Recently I was working on a project to build a REST API using the Spring Boot framework. I was looking for an easy way to document the service and all the operations it supports. I used Swagger to document the API with very little configuration.

Swagger does an awesome job to document your APIs directly from the source code.

The documentation and Swagger’s interactive Web UI help users explore the API, try supported operations with different input values, and look at the output of an operation or any validations configured for it. My tester colleagues really liked this API playground.

Let’s explore the steps to configure Swagger UI on an existing project.

In order to set up Swagger UI with the project, we will use the Springfox library. Springfox enables Swagger by scanning the application at runtime to infer API semantics based on Spring configurations, class structure and various compile-time Java annotations. To get started with Springfox, we need to add the following dependencies to the POM (I’m using Maven for this project):

<dependency>
  <groupId>io.springfox</groupId>
  <artifactId>springfox-swagger2</artifactId>
  <version>2.6.1</version>
  <scope>compile</scope>
</dependency>
<dependency>
  <groupId>io.springfox</groupId>
  <artifactId>springfox-swagger-ui</artifactId>
  <version>2.6.1</version>
  <scope>compile</scope>
</dependency>

Next, we need to configure a Docket bean in the Application class, which will do the rest of the magic:

@Bean
public Docket simpleDiffServiceApi() {
  return new Docket(DocumentationType.SWAGGER_2)
      .groupName("calculator")
      .apiInfo(apiInfo())
      .select()
      .apis(RequestHandlerSelectors.any())
      .paths(PathSelectors.any())
      .build()
      .pathMapping("/");
}

This will scan the service (in this example, the Calculator service request handlers). We also need to pass the API info object, which is used to configure API details like Title, Description, Version, Developer Information etc. See the Springfox docs for more configuration options.

private ApiInfo apiInfo() {
  return new ApiInfoBuilder()
      .title("A simple calculator service")
      .description("A simple calculator REST service made with Spring Boot in Java")
      .contact(new Contact("Unmesh Gundecha", "http://unmesh.me", "upgundecha@gmail.com"))
      .version("1.0")
      .build();
}

Here is the Controller class for the Calculator Service with a request handler for the add operation:

@RestController
@RequestMapping("api/v1/")
public class CalculatorController {

  @Autowired
  private Calculator calculatorService;

  @RequestMapping(value = "calculator/add", method = RequestMethod.POST)
  public Result add(@RequestBody Values values) {
    int result = calculatorService.add(values.getFirstNumber(),
        values.getSecondNumber());
    return new Result(result);
  }
}

That’s it. Let’s run this service using:

mvn spring-boot:run

This will launch the application. Navigate to http://localhost:8080/swagger-ui.html in a browser window. This will show the Swagger UI with the service details. In this example, it will show the Calculator service with the add operation as shown in the screenshot below:

swagger1

Let’s open the /api/v1/calculator/add operation. This will show details about this operation as shown in the screenshot below:

swagger2

Users can try the add operation or any other operation provided by the service by submitting parameters and using the Try it out! button on the form. Here is the add operation in action:

calc_service
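The same add operation can also be called from a script instead of the Swagger UI. The sketch below only builds the request and does not send it. It assumes the service runs on localhost:8080 as above and that the JSON property names are `firstNumber` and `secondNumber`, which is inferred from the `getFirstNumber()` and `getSecondNumber()` getters and not confirmed by the post; `build_add_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_add_request(first: int, second: int,
                      base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build (but do not send) a POST request for the calculator add operation."""
    # Property names are assumed from the getFirstNumber()/getSecondNumber() getters.
    body = json.dumps({"firstNumber": first, "secondNumber": second})
    return urllib.request.Request(
        f"{base_url}/api/v1/calculator/add",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_add_request(2, 3)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

With the service running, `urllib.request.urlopen(request)` would send it and return the JSON result.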

Operations can also be further explained using various Swagger annotations like @ApiOperation, @ApiResponse etc.

You can also define a new API from scratch in Swagger format and generate source code for over a dozen different frameworks supported by Swagger. You can use Swagger Editor and Swagger Codegen to build new APIs.


Automated Acceptance testing for Mainframe with Cucumber and Jagacy

 

fullsizerender-1

I often get questions about Mainframe automation using Selenium. However, Selenium automates browsers, that’s it! Selenium does not automate Mainframe Green screens, which are a completely different technology.

Automating Mainframe Green screens is primarily needed to test front-to-back scenarios in complex transaction processing systems with Web and Mobile integration.

There are tools available that can be used to automate Mainframe Green screen interaction. In this post, we will see how to use Jagacy3270 from Jagacy Software along with Cucumber to write automated acceptance tests for Mainframe Green screens, also known as the CICS interface.

About Jagacy3270

Jagacy3270 is a 3270 screen-scraping library written entirely in Java. As described on the Jagacy product website, it supports SSL, TN3270E, internationalization, and over thirty languages. This library can be used to create highly reliable and fast screen-scraping applications. However, it is a commercial tool; pricing details are available on the website.

We can use this same screen-scraping capability to create automated tests on Mainframe Green screens.

Jagacy provides the Session3270 class, through which we can connect to a Mainframe host and read from or write to screens to perform actions.

Writing Automated Acceptance Tests for Green Screens

We can use Jagacy along with testing frameworks like Cucumber in Agile software development to automate acceptance criteria for Mainframe user stories involving Green screen interaction. These tests can be integrated into a CI/CD pipeline and run in headless mode (Jagacy provides both an emulator screen mode and a headless mode).

In this post, we will use a Mainframe host used in the Jagacy examples to create a simple automated acceptance test for a Phonebook application. Here is our example user story:

As a university mainframe user
I should be able to search faculty members in Phonebook
So I can contact them for help

Here is the Feature file with one of the acceptance criteria, a scenario that searches for a faculty phone number in the Phonebook application:

Feature: Phonebook

  As a university mainframe user
  I should be able to search faculty members in Phonebook
  So I can contact them for help

  Scenario: Search faculty phone number using name
    Given I start a new emulator session
    When I open phonbook application
    And search for faculty name "NAME"
    Then I should see the results matching with my search criteria
      |NAME1              111-111-1111  LIBRARIES                    5000|
      |NAME2             UNAVAILABLE  MARY KAY O'CONNOR PROCESS    3122|

To automate screen entry and retrieve values, we need to address fields by screen coordinates, using row and column numbers (a typical 3270 screen is 24 rows by 80 columns).
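To illustrate what addressing by row and column means, here is a small Python sketch that treats the screen as one long string of 24 × 80 characters and reads a field from it. It is independent of Jagacy: `read_field` is a hypothetical helper, and the zero-based coordinates are an assumption made for this example.

```python
ROWS, COLS = 24, 80  # typical 3270 screen size mentioned above


def read_field(screen: str, row: int, col: int, length: int) -> str:
    """Return `length` characters starting at a zero-based (row, col)."""
    if not (0 <= row < ROWS and 0 <= col < COLS):
        raise ValueError("position is outside the screen")
    if col + length > COLS:
        raise ValueError("field would run past the end of the row")
    offset = row * COLS + col  # the screen is stored row after row
    return screen[offset:offset + length]


# Build a blank screen and place a label at row 17, column 6.
label = "TEXAS A & M UNIVERSITY"
blank = " " * (ROWS * COLS)
start = 17 * COLS + 6
screen = blank[:start] + label + blank[start + len(label):]
print(read_field(screen, 17, 6, len(label)))
```

Because every field is found purely by position, any change to the screen layout silently shifts what a test reads, which is why the screen checks described further down are useful.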

We can use the PageObject pattern to abstract Mainframe Green screens and provide screen actions and state to tests. For example, here is the HomeScreen object, which provides the features of the Home screen:

package com.example.screens;

import com.example.Fields.EntryField;
import com.example.session.Session;
import com.example.Fields.LabelField;
import com.jagacy.Key;
import com.jagacy.util.JagacyException;

/**
 * Created by upgundecha on 14/10/16.
 */
public class HomeScreen {

    private Session session;
    private String screenCrc = "0xb0c10358";

    // Screen fields
    private LabelField waitForLabel =
            new LabelField(17, 6, "TEXAS A & M UNIVERSITY");
    private EntryField applicationEntryField = new EntryField(23, 1);

    public HomeScreen(final Session s) throws JagacyException {
        this.session = s;
        if (!session.waitForTextLabel(waitForLabel)) {
            throw new IllegalStateException("Not Home screen!");
        }

        if (session.getCrc32() != Long.decode(screenCrc)) {
            throw new IllegalStateException("Home Screen has been changed!");
        }
    }

    /**
     * Open Phonbook Menu screen.
     * @return Phonbook Menu Screen
     * @throws JagacyException JagacyException
     */
    public final PhonbookMenuScreen openPhonbook() throws JagacyException {
        session.setEntryFieldValue(applicationEntryField, "PHONBOOK");
        session.writeKey(Key.ENTER);
        session.waitForChange(10000);
        return new PhonbookMenuScreen(session);
    }
}

We can check that the correct screen is displayed in the emulator by looking for a known piece of text on the screen. We can also use the CRC of the screen to make sure the screen has not changed; otherwise tests might fail because fields or text values are no longer found at the specified locations.
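The CRC check works like a fingerprint of the screen contents. The Python sketch below shows the principle with the standard `zlib.crc32` function; the sample screen text is made up, and since the post does not describe exactly what Jagacy feeds into its CRC, the values here will not match those returned by `getCrc32()`.

```python
import zlib


def screen_crc(screen_text: str) -> int:
    """Return an unsigned 32-bit CRC of the screen contents."""
    return zlib.crc32(screen_text.encode("utf-8")) & 0xFFFFFFFF


baseline = screen_crc("TEXAS A & M UNIVERSITY\nAPPLICATION ==>")
unchanged = screen_crc("TEXAS A & M UNIVERSITY\nAPPLICATION ==>")
changed = screen_crc("TEXAS A & M UNIVERSITY\nAPPLICATION =->")
print(hex(baseline), baseline == unchanged, baseline == changed)
```

A single changed character is enough to alter the CRC, so this kind of check is best suited to screens without dynamic content such as dates or timestamps.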

Finally, we will call the PageObjects in step definitions to perform the scenario and validate the output:

package com.example.test;

import com.example.screens.HomeScreen;
import com.example.screens.PhonbookMenuScreen;
import com.example.screens.PhonbookSearchScreen;
import com.example.session.Session;
import com.jagacy.util.JagacyException;
import cucumber.api.Scenario;
import cucumber.api.java.After;
import cucumber.api.java.Before;
import cucumber.api.java.en.Given;
import cucumber.api.java.en.Then;
import cucumber.api.java.en.When;

import static org.junit.Assert.*;

import java.util.List;


/**
 * Created by upgundecha on 14/10/16.
 */
public class Stepdefs {

    private Session session;
    private HomeScreen homeScreen;
    private PhonbookMenuScreen phonbookMenuScreen;
    private PhonbookSearchScreen phonbookSearchScreen;
    private Scenario scenario;

    @Before
    public void setUp(Scenario scenario){
        this.scenario = scenario;
    }

    @Given("^I start a new emulator session$")
    public void i_start_a_new_emulator_session() throws Throwable {

        session = new Session("test");
        session.open();

    }

    @When("^I open phonbook application$")
    public void i_open_phonbook_application() throws Throwable {

        homeScreen = new HomeScreen(session);
        scenario.embed(session.getScreenshot(), "image/png");
        phonbookMenuScreen = homeScreen.openPhonbook();

    }

    @When("^search for faculty name \"([^\"]*)\"$")
    public void search_for_faculty_name(String q) throws Throwable {

        scenario.embed(session.getScreenshot(), "image/png");
        phonbookSearchScreen = phonbookMenuScreen.openFacultyStaffListing();
        phonbookSearchScreen.searchByFirstOrMiddleName(q);

    }

    @Then("^I should see the results matching with my search criteria$")
    public void i_should_see_the_results_matching_with_my_search_criteria(List<String> records) throws Throwable {

        scenario.embed(session.getScreenshot(), "image/png");
        assertEquals(records, phonbookSearchScreen.getResults());

    }

    @After
    public void tearDown() throws JagacyException {
        session.close();
    }
}

We can also embed a screenshot of the screen in the reports (this is not available as part of the Jagacy API).

And if you already have a well-established Selenium framework (on the JVM) and want to automate Mainframe Green screens, you can use the above approach with Cucumber or any other xUnit testing framework that you might be using.

The complete source code for this post is available here. Please drop me a message with your GitHub username for access. I have built this simple framework on top of the Jagacy API. If you need any further information or have suggestions, please do reach out to me.


Setting up minimal Selenium Grid with Docker

Here’s a simple guide to set up a minimal Selenium Grid with Docker. To run Docker on your machine you will need Docker Toolbox installed from https://www.docker.com/products/docker-toolbox. The steps below were done on a Mac.

We will use the Hub and Node images from the Selenium project hosted on Docker Hub: https://hub.docker.com/r/selenium/

Next, we need to create a docker-compose file describing how we want to run the Selenium Grid Hub and connect nodes to the Hub. In this example we will launch a multi-container setup with a Hub connected to Firefox and Chrome nodes.

If you don’t have Docker running, start the default Docker machine with the following command:

docker-machine start default

To connect to the Docker shell, run the following command:

docker-machine env

and then:

eval $(docker-machine env)

This will connect the terminal session to the Docker shell.

Finally, run the docker-compose command from the directory where the docker-compose.yml file is stored:

docker-compose up

This will pull the required images from Docker Hub and launch the Hub followed by the Firefox and Chrome nodes, which will register with the Hub. Now we have a minimal Selenium Grid up and running. We can point Selenium tests to this Grid for execution. In the next post we’ll see some advanced options and integration with Maven and Cloud tools.


Using Tesseract with Selenium WebDriver for checking text on images using OCR

Recently a team approached me looking for a solution to extract text from an image displayed on a web page and verify its contents as part of Selenium tests.

This post explains the solution, which uses Tesseract and Tess4J along with Selenium to check text displayed on images.

Tesseract is a well-known open source OCR engine. It uses the Leptonica Image Processing Library. Tesseract supports a wide variety of image formats and converts them to text in over 60 languages.

Tesseract works on Linux, Windows and Mac OS X. Please refer to the README page for installation instructions.

This sample was built on a Mac. You can install Tesseract on a Mac using Homebrew:

brew install tesseract

In addition to Tesseract (written in C++), we need a Java wrapper called Tess4J, which provides a JNA wrapper for the Tesseract OCR API.

Here is a sample page which displays a barcode as an image. We will extract the barcode number and assert its value.

ocr_example

Since I am using Maven for this project, I added the Tess4J dependency to my pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>me.unmesh</groupId>
    <artifactId>selenium-ocr-example</artifactId>
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>net.sourceforge.tess4j</groupId>
            <artifactId>tess4j</artifactId>
            <version>2.0.0</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.seleniumhq.selenium</groupId>
            <artifactId>selenium-java</artifactId>
            <version>2.46.0</version>
        </dependency>
    </dependencies>
</project>

Here’s a JUnit test which navigates to the sample page and checks the number displayed on the barcode image:

package me.unmesh.selenium.ocr.example;

import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.By;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import static org.junit.Assert.*;

import net.sourceforge.tess4j.*;
import java.io.File;

/**
 * A demo test to verify text from an image using Tesseract OCR API
 *
 * @author  upgundecha
 *
 */
public class BarcodeTest {
    private WebDriver driver;

    @Before
    public void setUp() {
        driver = new FirefoxDriver();
        // navigate to the dummy page with a barcode image
        driver.get("https://dl.dropboxusercontent.com/u/55228056/barcode.html");
    }

    @After
    public void tearDown() {
        driver.quit();
    }

    @Test
    public void testBarcodeNumber() throws Exception {
        // get and capture the picture of the img element used to display the barcode image
        WebElement barcodeImage = driver.findElement(By.id("barcode"));
        File imageFile = WebElementExtender.captureElementPicture(barcodeImage);

        // get the Tesseract direct interface
        Tesseract instance = new Tesseract();

        // the doOCR method of Tesseract will retrieve the text
        // from the image captured by Selenium
        String result = instance.doOCR(imageFile);

        // check the result
        assertEquals("Application number did not match", "123-45678", result.trim());
    }
}

Instead of using a screenshot of the entire page, I used only the part of it that shows the image element where the barcode is displayed. The HTML of the sample page looks like this:

<html>
  <head>
    <title>Barcode Sample</title>
  </head>
  <body>
    <table>
      <tr>
        <td style="padding:10px; font-size:15px; font-family:Arial, Helvetica; text-align:center;">
          <p> Please write down your application id</p>
        </td>
        <td>
          <img id="barcode" src="barcode.png" />
        </td>
      </tr>
    </table>
  </body>
</html>

The captured image is then passed to the doOCR() method of the Tesseract instance to retrieve the text.

To capture the image of a WebElement, I used the captureElementPicture() method from the WebElementExtender class, which is described in my book Selenium Testing Tools Cookbook:

package me.unmesh.selenium.ocr.example;

import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.File;

import javax.imageio.ImageIO;

import org.openqa.selenium.OutputType;
import org.openqa.selenium.Point;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.internal.WrapsDriver;

/**
 * This class provides various additional helper methods on elements
 *
 * @author upgundecha
 *
 */

public class WebElementExtender {

    /**
     * Gets a picture of specific element displayed on the page
     * @param element The element
     * @return File
     * @throws Exception
     */
    public static File captureElementPicture(WebElement element)
            throws Exception {

        // get the WrapsDriver of the WebElement
        WrapsDriver wrapsDriver = (WrapsDriver) element;

        // get the entire screenshot from the driver of passed WebElement
        File screen = ((TakesScreenshot) wrapsDriver.getWrappedDriver())
                .getScreenshotAs(OutputType.FILE);

        // create an instance of buffered image from captured screenshot
        BufferedImage img = ImageIO.read(screen);

        // get the width and height of the WebElement using getSize()
        int width = element.getSize().getWidth();
        int height = element.getSize().getHeight();

        // create a rectangle using width and height
        Rectangle rect = new Rectangle(width, height);

        // get the location of WebElement in a Point.
        // this will provide X & Y co-ordinates of the WebElement
        Point p = element.getLocation();

        // create image  for element using its location and size.
        // this will give image data specific to the WebElement
        BufferedImage dest = img.getSubimage(p.getX(), p.getY(), rect.width,
                rect.height);

        // write back the image data for element in File object
        ImageIO.write(dest, "png", screen);

        // return the File object containing image data
        return screen;
    }
}
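One thing to watch with this cropping approach is that `getSubimage` throws an exception when the requested region extends beyond the screenshot, for example when the element is only partly visible. On some displays the screenshot may also contain more pixels than the page coordinates suggest. The Python sketch below shows one way to compute a safe crop region; `crop_box` and its `scale` parameter are hypothetical and not part of Selenium or the class above.

```python
def crop_box(x, y, width, height, image_width, image_height, scale=1.0):
    """Return (left, top, right, bottom) in screenshot pixels, clamped to the image."""
    left = max(0, round(x * scale))
    top = max(0, round(y * scale))
    right = min(image_width, round((x + width) * scale))
    bottom = min(image_height, round((y + height) * scale))
    if right <= left or bottom <= top:
        raise ValueError("element lies outside the screenshot")
    return left, top, right, bottom


# Element at (40, 120), 200x80, on a 1280x800 screenshot.
print(crop_box(40, 120, 200, 80, 1280, 800))
# Same element when the screenshot has twice the pixel density.
print(crop_box(40, 120, 200, 80, 2560, 1600, scale=2.0))
```

The same clamping could be applied in the Java helper before calling `getSubimage`, using `right - left` and `bottom - top` as the width and height.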

Tesseract is clean, fast and accurate for OCR testing needs. A similar approach can be followed for .NET using the Emgu library.