More Selenium4 Goodies

After my previous post on Selenium 4 Relative Locators, I explored Selenium 4 further and found a few more goodies in the WebElement and WebDriver interfaces.

Element Screenshots

Yes, now we can capture a screenshot of an individual element or a group of elements. This is a very useful feature. I talked about capturing element screenshots in my Selenium Testing Tools Cookbook. However, the new feature added in Selenium 4 (alpha-3) is built in and much simpler.

The WebElement interface now supports the getScreenshotAs() method by extending the TakesScreenshot interface, so a screenshot of a single element can be captured.

This method accepts an OutputType argument, and screenshots can be captured as a FILE, as BYTES or as a BASE64 string.
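
When OutputType.BASE64 is used, getScreenshotAs() returns the PNG image as a Base64-encoded string. Here is a minimal sketch, using only the JDK, of turning such a string back into image bytes and checking that it looks like a PNG. The ScreenshotDecoder class and its method names are hypothetical, and the sample string in main() is only a stand-in for a real screenshot.

```java
import java.util.Base64;

// Hypothetical helper, not part of Selenium
final class ScreenshotDecoder {
    // every PNG file starts with these eight signature bytes
    private static final byte[] PNG_SIGNATURE =
            {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};

    // decode a Base64 string into raw bytes; the MIME decoder
    // tolerates line breaks inside the encoded text
    static byte[] decode(String base64) {
        return Base64.getMimeDecoder().decode(base64);
    }

    // check whether the decoded bytes start with the PNG signature
    static boolean looksLikePng(byte[] data) {
        if (data.length < PNG_SIGNATURE.length) {
            return false;
        }
        for (int i = 0; i < PNG_SIGNATURE.length; i++) {
            if (data[i] != PNG_SIGNATURE[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // stand-in for a real screenshot: only the PNG signature, Base64-encoded
        String sample = Base64.getEncoder().encodeToString(PNG_SIGNATURE);
        System.out.println(looksLikePng(decode(sample)));
    }
}
```

In a real test, the string passed to decode() would come from a call such as imagesLink.getScreenshotAs(OutputType.BASE64), and the bytes could then be written to a file or attached to a report.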

Let’s try to capture a screenshot of a link and the search box displayed on the Google Search home page:


// find the Images link on Google Search home page
WebElement imagesLink = driver.findElement(By.linkText("Images"));

// take a screenshot of the link element
File linkScr = imagesLink.getScreenshotAs(OutputType.FILE);
FileUtils.copyFile(linkScr, new File("./target/linkScr.png"));

We can also capture a group of elements by taking a screenshot of the parent element. Here is a complete example capturing the Images link and the search box:

[Code listing image: carbon (4).png]

The new getRect() method

The new getRect() method, available on the WebElement interface, is essentially a combination of the earlier getSize() and getLocation() methods. It returns a Rectangle object that exposes getX(), getY(), getWidth() and getHeight(). Here’s the difference between the earlier methods and the new getRect() method:

[Code listing image: carbon (5).png]

New additions in WebDriver

In addition to the maximize() method, the browser window can now be made fullscreen with the new fullscreen() method, called as driver.manage().window().fullscreen().


The parentFrame() method helps with navigating between frames: driver.switchTo().parentFrame() switches focus to the parent of the current frame.


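I’m not really sure if this is a completely new feature (or maybe I’m too lazy to go through the changes), but we can now create a new empty tab or a new browser window with the newWindow() method, for example driver.switchTo().newWindow(WindowType.TAB) or driver.switchTo().newWindow(WindowType.WINDOW).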


That’s it for now. I’ll deep dive into new Selenium Grid features in an upcoming post.

Closing note

These features are in an alpha release and are subject to change in the future. Please use them with caution. You can find the complete code example from this post in my GitHub repo.

Gherkin Dialects

One of the core principles of Behaviour Driven Development (BDD) is having meaningful conversations to describe the behaviour of the software with concrete examples.

…having conversations is more important than capturing conversations is more important than automating conversations. – Liz Keogh

BDD practitioners have a choice to select a language that is commonly used and understood by the team to describe the behaviour of the software.

Gherkin, the ubiquitous language used by BDD practitioners to describe the behaviour of software, has been translated to over 70 languages.

To allow Gherkin to be written in a number of languages, keywords such as Feature, Background, Scenario, Scenario Outline, Given, When, Then, And, But and Examples have been translated into multiple languages. The Gherkin documentation lists the supported languages and the keywords translated into each of them.

Some of these keywords have more than one translation to improve readability and flow.

Let’s take a personal loan calculator application. The feature and scenario for this software are described in the English language as below:


Now, let’s translate the behaviour of the personal loan calculator into Hindi (one of my native languages):


Automating conversations with Cucumber

The Cucumber framework, widely used by BDD practitioners, supports Gherkin dialects.

A # language: header on the first line of a feature file tells Cucumber which spoken language to use. For Hindi, the header is # language: hi. Without a header, Cucumber assumes English.
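
To make the role of the header concrete, here is a minimal sketch of reading the language code from the first line of a feature file. This is not Cucumber's own parser; the FeatureLanguage class and the detect() method are hypothetical names used only for illustration, and the fallback to "en" mirrors the English default.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Cucumber
final class FeatureLanguage {
    // matches a first line such as "# language: hi"
    private static final Pattern HEADER =
            Pattern.compile("^\\s*#\\s*language\\s*:\\s*([A-Za-z_-]+)\\s*$");

    // return the language code from the header, or "en" when there is none
    static String detect(String featureText) {
        // look only at the first line of the feature text
        String firstLine = featureText.split("\\R", 2)[0];
        Matcher matcher = HEADER.matcher(firstLine);
        return matcher.matches() ? matcher.group(1) : "en";
    }

    public static void main(String[] args) {
        // the feature body here is a placeholder, not a real scenario
        System.out.println(detect("# language: hi\n(feature text in Hindi)"));
        System.out.println(detect("Feature: Personal loan calculator"));
    }
}
```

The detected code is what selects the set of translated keywords used to read the rest of the file.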

To automate this feature and scenario, Cucumber will generate step definitions in the selected language. In this example, it uses Hindi, as shown in the code below:


Both Cucumber and Java support internationalization and this example is automated with Selenium WebDriver, navigating to the Hindi version of the personal loan calculator and checking the behaviour of the software.

Along with Java, Gherkin dialect (i18n) support is available in other Cucumber implementations such as Ruby, Python, Go and .NET.

The working example is available in the GitHub repo.


Selenium4 Relative Locators

Selenium 4 alpha-3 was released yesterday with the much-awaited friendly locators, now called relative locators. These new locator methods find elements based on their location relative to other elements, visually!

You can find the desired element or elements by calling the withTagName method along with the near, above, below, toRightOf and toLeftOf methods. These methods take the relative WebElement or a By locator as an argument. The overloaded near method takes the pixel distance as an additional argument. I did a trial run of this cool new feature on a sample app:

In the sample application, to find the input field to the right of the label Height in Centimeters, we’ll first locate the label using the By.cssSelector method. Next, we will use the located element to find the input field. For this, we will call the withTagName method along with the toRightOf method, passing the label WebElement as shown in the snippet below:


WebElement heightLabel = driver.findElement(By.cssSelector("label[for='heightCMS']"));
WebElement heightInput = driver.findElement(withTagName("input")
        .toRightOf(heightLabel));


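We can also chain the relative locator methods to narrow down the search, as shown in the code below, to find the input field for entering the weight. The weight field is below the height input and to the right of the weight label (weightLabel here is assumed to be located in the same way as heightLabel):

WebElement weightInput = driver.findElement(withTagName("input")
        .below(heightInput).toRightOf(weightLabel));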

You can find the sample code in my GitHub repo.

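Selenium uses the JavaScript getBoundingClientRect() method to find elements with relative locators. The method returns the size of an element and its position relative to the viewport, which is what makes comparisons such as "to the right of" or "below" possible.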
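
To illustrate the idea, here is a minimal sketch of how "to the right of" and "below" can be decided from two bounding rectangles. This is not Selenium's actual implementation, which runs in the browser. The RelativePosition class is hypothetical, and the coordinates in main() are made up to resemble a label with an input field next to it.

```java
import java.awt.Rectangle;

// Hypothetical helper, not part of Selenium
final class RelativePosition {
    // true when the candidate starts at or beyond the anchor's right edge
    static boolean isRightOf(Rectangle candidate, Rectangle anchor) {
        return candidate.x >= anchor.x + anchor.width;
    }

    // true when the candidate starts at or beyond the anchor's bottom edge
    static boolean isBelow(Rectangle candidate, Rectangle anchor) {
        return candidate.y >= anchor.y + anchor.height;
    }

    public static void main(String[] args) {
        // made-up layout: a label with an input field to its right
        Rectangle label = new Rectangle(10, 10, 100, 20);
        Rectangle input = new Rectangle(120, 10, 150, 20);
        System.out.println(isRightOf(input, label));
        System.out.println(isBelow(input, label));
    }
}
```

Chaining relative locators then amounts to requiring that several such conditions hold for the same candidate element.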

If you want to know more about these new locators and see sample usage, look at the tests in the Selenium code base (tests are living documentation).

Closing note

Relative locators should be used judiciously, when other locator strategies won’t work. Features in alpha releases may change in future releases. Also, these methods may not work well on overlapping elements.

This is not an entirely new concept. There are other tools, in both the commercial and open-source space, offering similar features to locate elements based on visual cues.

Using Tesseract with Selenium WebDriver for checking text on images using OCR

Recently a team approached me looking for a solution to extract text from an image displayed on a web page and verify its contents as part of Selenium tests.

This post explains a solution that uses Tesseract and Tess4J along with Selenium to check text displayed on images.

Tesseract is a well-known open source OCR engine. It uses the Leptonica Image Processing Library. Tesseract supports a wide variety of image formats and converts them to text in over 60 languages.

Tesseract works on Linux, Windows and Mac OS X. Please refer to the README page for installation instructions.

This sample is built on a Mac. You can install Tesseract on a Mac using Homebrew:

brew install tesseract

In addition to Tesseract (written in C++), we need a Java wrapper called Tess4J, which provides a JNA wrapper for the Tesseract OCR API.

Here is a sample page which has a barcode displayed as an image. We will extract the barcode number and assert its value.


Since I am using Maven for this project, I added the Tess4J dependency to my pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns=""


Here’s a JUnit test which navigates to the sample page and checks the number displayed on the barcode image:

package me.unmesh.selenium.ocr.example;

import java.io.File;

import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.By;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import static org.junit.Assert.*;

import net.sourceforge.tess4j.*;

/**
 * A demo test to verify text from an image using Tesseract OCR API
 * @author  upgundecha
 */
public class BarcodeTest {
    private WebDriver driver;

    @Before
    public void setUp() {
        driver = new FirefoxDriver();
        // navigate to the dummy page with a barcode image
        // (the driver.get(...) call and its URL are not preserved in this listing)
    }

    @After
    public void tearDown() {
        // close the browser; the original body is not preserved, quit() is assumed
        driver.quit();
    }

    @Test
    public void testBarcodeNumber() throws Exception {
        // get and capture the picture of the img element used to display the barcode image
        WebElement barcodeImage = driver.findElement(By.id("barcode"));
        File imageFile = WebElementExtender.captureElementPicture(barcodeImage);

        // get the Tesseract direct interface
        Tesseract instance = new Tesseract();

        // the doOCR method of Tesseract will retrieve the text
        // from the image captured by Selenium
        String result = instance.doOCR(imageFile);

        // check the result
        assertEquals("Application number did not match", "123-45678", result.trim());
    }
}

Instead of capturing a screenshot of the entire page using Selenium, I captured a screenshot of only the image element where the barcode is displayed on the page.

    <title>Barcode Sample</title>
        <td style="padding:10px; font-size:15px; font-family:Arial, Helvetica; text-align:center;">
          <p> Please write down your application id</p>
          <img id="barcode" src="barcode.png" />

The captured image is then passed to the doOCR() method of the Tesseract instance to retrieve the text.
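
OCR output often carries trailing newlines or stray whitespace, which is why the test trims the result before comparing it. Here is a minimal sketch of a slightly more thorough clean-up step. The OcrText class and the normalize() method are hypothetical and not part of Tess4J, and the sample string in main() is made up.

```java
// Hypothetical helper, not part of Tess4J
final class OcrText {
    // collapse every run of whitespace into one space and trim both ends
    static String normalize(String raw) {
        return raw.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // made-up OCR output with surrounding whitespace and blank lines
        System.out.println(normalize("  123-45678\n\n"));
    }
}
```

In the test above, the assertion could then compare the expected value against OcrText.normalize(result) instead of result.trim().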

To capture the image of a WebElement, I used the captureElementPicture() method from the WebElementExtender class, which is described in my book Selenium Testing Tools Cookbook:

package me.unmesh.selenium.ocr.example;

import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.File;

import javax.imageio.ImageIO;

import org.openqa.selenium.OutputType;
import org.openqa.selenium.Point;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.internal.WrapsDriver;

/**
 * This class provides various additional helper methods on elements
 * @author upgundecha
 */
public class WebElementExtender {

    /**
     * Gets a picture of specific element displayed on the page
     * @param element The element
     * @return File
     * @throws Exception
     */
    public static File captureElementPicture(WebElement element)
            throws Exception {

        // get the WrapsDriver of the WebElement
        WrapsDriver wrapsDriver = (WrapsDriver) element;

        // get the entire screenshot from the driver of passed WebElement
        File screen = ((TakesScreenshot) wrapsDriver.getWrappedDriver())
                .getScreenshotAs(OutputType.FILE);

        // create an instance of buffered image from captured screenshot
        BufferedImage img = ImageIO.read(screen);

        // get the width and height of the WebElement using getSize()
        int width = element.getSize().getWidth();
        int height = element.getSize().getHeight();

        // create a rectangle using width and height
        Rectangle rect = new Rectangle(width, height);

        // get the location of WebElement in a Point.
        // this will provide X & Y co-ordinates of the WebElement
        Point p = element.getLocation();

        // create image for element using its location and size.
        // this will give image data specific to the WebElement
        BufferedImage dest = img.getSubimage(p.getX(), p.getY(), rect.width,
                rect.height);

        // write back the image data for element in File object
        ImageIO.write(dest, "png", screen);

        // return the File object containing image data
        return screen;
    }
}
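
One thing to watch with this approach: BufferedImage.getSubimage() throws a RasterFormatException when the requested region falls outside the image, which can happen if the element is only partly inside the captured area. Here is a minimal sketch, using only the JDK, of clamping the crop region to the screenshot first. The ElementCropper class is hypothetical and not part of the book's code, and the sizes in main() are made up.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of the original WebElementExtender
final class ElementCropper {
    // crop the part of the element's rectangle that lies inside the screenshot
    static BufferedImage crop(BufferedImage page, Rectangle element) {
        Rectangle pageBounds = new Rectangle(0, 0, page.getWidth(), page.getHeight());
        Rectangle visible = element.intersection(pageBounds);
        if (visible.isEmpty()) {
            throw new IllegalArgumentException("Element lies outside the screenshot");
        }
        return page.getSubimage(visible.x, visible.y, visible.width, visible.height);
    }

    public static void main(String[] args) {
        // made-up sizes: a 200x100 screenshot and an element overlapping its corner
        BufferedImage page = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        BufferedImage cropped = crop(page, new Rectangle(150, 50, 100, 100));
        System.out.println(cropped.getWidth() + "x" + cropped.getHeight());
    }
}
```

The element rectangle would be built from the values returned by getLocation() and getSize(), as in the listing above.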

Tesseract is clean, fast and accurate for OCR testing needs. A similar approach can be followed for .NET using the Emgu library.