Matt Archer's Blog

How test automation with Selenium can fail

Posted by Matt Archer on November 29, 2010

Introduction

I like automated tests.  They’re typically quick to run and, assuming they acquire the correct actual and expected results, they rarely fail to produce an accurate comparison.  Don’t get me wrong, I’m no fanatic, but I have seen enough success to know that the right tool, in the right tester’s hands, can be a valuable addition to a project.

That said, I can see how test automation can fall out of favour within a project or organisation.  Tools have changed significantly over the years; however, for many projects, their tool’s knack for doing just about anything ironically remains both its greatest strength and its greatest weakness.  Even if a project decides to use one of the latest tools for their particular technology, such as Selenium or Watir for GUI-level web testing, it is easy to erode the expected benefits if either tool is used inefficiently (something that is easier to do than you may think).

As I’m sure many of you already know, automated tests are small pieces of software that test (typically) larger pieces of software.  This makes them great for comparing one thing to another, assuming the comparison isn’t subjective.  Provide two numbers and an automated test will tell you whether they are equal.  Provide two pieces of text and an automated test will tell you whether they are the same.  Provide two pictures and an automated test will even tell you whether they contain the same pixels.  Bottom line – if an automated test gets the opportunity to compare two things we would otherwise compare by hand we can feel relatively confident that the final outcome (pass or fail) is the correct one.

So why can so many of us relate to the feeling I am about to describe?  You know, that feeling you sometimes get when you start to run your automation and, as the results are produced, you cannot tell whether it is the software you are testing that is failing or a problem with your automated tests.  For me, one of the goals of testing is to help a team feel confident that the software we are building is ready for a given event (such as a demo, a release or maybe just the next phase of the project).  On occasion, I find this desire for greater confidence to be in direct conflict with test automation.  This typically occurs if the automation is “flaky” and consequently I end up feeling less confident in my results than if I had performed the tests manually.  Have you ever tried making others feel confident when you’re not feeling confident yourself?  It’s not easy.  In fact, easy or not, passing on false hope (or despair) is not the sort of habit you want to embrace.

At first glance, the root cause of this uncertainty can be difficult to track down, often leading to the conclusion that all automation is bad.  For me this is not true.  There are just a few approaches to automation that, whilst they can prove useful in the right situation, more often than not cause the team an unnecessary headache.  One of these approaches is calculating expected results on the fly.  Another is including unnecessary navigation.

All example code that follows is a mix of pseudo code and Selenium C#.

Calculating expected results on the fly

Imagine for a moment that we are testing an online dispatch system for a logistics company.  Each morning the system receives a CSV file containing a list of items to dispatch that day.  There is a simple business rule that we want to test.  If an item weighs more than 50kg then mark the item as a two-man lift, otherwise mark the item as a one-man lift.  This sounds easy enough so we create an automated test to pass a file containing fictional items to the dispatch system and check each item has been labelled correctly on the website.

public void CheckWeightCategory_Version1()
{
    // For each item in the file, check it has been labelled correctly
    foreach (Item currentItem in testFile)
    {
        // Navigate to the item tracking page on the website
        selenium.Open("/ItemTrackingPage.html");

        // Get the weight category for the current item
        string actualWeightCategory = selenium.GetText("weightCategory" + currentItem.ID);

        // Calculate the expected result
        string expectedWeightCategory;

        // If an item weighs more than 50kg then mark the item as a two-man lift,
        // otherwise mark the item as a one-man lift
        if (currentItem.weight > 50)
        {
            expectedWeightCategory = "two-man lift";
        }
        else
        {
            expectedWeightCategory = "one-man lift";
        }

        // Compare actual to expected result
        Assert.AreEqual(expectedWeightCategory, actualWeightCategory);
    }
}

So that we can calculate the expected result on the fly, we have coded the if-then-else business rule in a similar way to the dispatch system itself.  On the up side this means that we can test as many input files as we like without having to worry about the expected result; however, we now have a mixture of code to prepare the expected result and code to perform the test, all intermingled together.  As the example stands at present, this doesn’t feel like a huge problem, but we are standing at the top of a very slippery slope.  Consider the following:

Q: What happens if the business rule is made more complex?

A: We can add some more code to represent it.

Q: What happens if a particular item is missing a weight?

A: We can add some more code to trap it.

Adding more code is fine in principle, but it can have an unpleasant side effect.  Below is an updated example based upon the two questions above.  Notice we now have eight lines of code related to calculating the expected result and only four lines of code that relate to performing the test itself.  In essence, we have created a test with three times as many lines of code as are absolutely necessary to perform the test.  By embedding the business rule, we have also made (or at least are well on our way towards making) the code for our automated test as complicated as the system we are testing.  And this is one reason I can personally lose confidence in automation.  Crudely speaking, when the odds of finding a bug are equally divided between the system we are testing and our own test automation code, we must think carefully about the return on investment we are likely to receive.

public void CheckWeightCategory_Version2()
{
    // For each item in the file, check it has been labelled correctly
    foreach (Item currentItem in testFile)
    {
        // Navigate to the item tracking page on the website
        selenium.Open("/ItemTrackingPage.html");

        // Get the weight category for the current item
        string actualWeightCategory = selenium.GetText("weightCategory" + currentItem.ID);

        // Calculate the expected result (initialised to null so that every
        // path to the final assert leaves it definitely assigned)
        string expectedWeightCategory = null;

        // If the customer wishes to collect the item from the warehouse,
        // the weight category should be marked as N/A
        if (currentItem.customerCollect == true)
        {
            expectedWeightCategory = "N/A";
        }
        else
        {
            // If the item has a weight, calculate the expected category;
            // if it does not, fail the test
            if (currentItem.weight != null)
            {
                // If an item weighs more than 50kg then mark the item as a two-man lift,
                // otherwise mark the item as a one-man lift
                if (currentItem.weight > 50)
                {
                    expectedWeightCategory = "two-man lift";
                }
                else
                {
                    expectedWeightCategory = "one-man lift";
                }
            }
            else
            {
                Assert.Fail("No weight available to calculate expected result");
            }
        }

        // Compare actual to expected result
        Assert.AreEqual(expectedWeightCategory, actualWeightCategory);
    }
}

That said, any automated test that performs a comparison needs to know the expected results.  I can’t argue against this fact, but my recommendation would be to avoid calculating any expected results automatically, especially if that automatic calculation takes place as part of the test itself.  Like all aspects of testing there are countless ways of achieving a particular task, and maintaining expected results is no exception.  If you believe that automatically calculating expected results is for you then I would at least consider separating the code that calculates the expected results from the code that performs the actual test.  And I don’t just mean splitting the code into two different methods that are executed one after the other by a single piece of code we call the test.  I’m talking about a much harder divide, where one piece of code saves the expected results to a known location that can be read (maybe even edited) by a member of the team, before a second piece of code takes the persisted values and performs the comparison.  Not only does this keep a clean divide between the two pieces of code, it also allows for easier debugging based on human inspection (and where necessary manipulation) of the expected results before they are used as part of the test.
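
To make that divide more concrete, below is a minimal sketch of what the two halves might look like.  The file name, field layout and method names are illustrative assumptions rather than anything from the original project; the Selenium calls and the Item/testFile objects are the same as in the earlier examples, and the sketch assumes System.IO is available for the file handling.  The first method calculates the expected weight categories and persists them to a CSV file that any member of the team can open, review and, if necessary, correct.  The second method is the test itself and does nothing more than read the persisted values back and perform the comparison.

// Stage 1 - run before the test, not as part of it: calculate the expected
// results and save them somewhere a human can inspect (and edit) them.
// "ExpectedWeightCategories.csv" is an illustrative file name.
public void GenerateExpectedWeightCategories()
{
    using (StreamWriter writer = new StreamWriter("ExpectedWeightCategories.csv"))
    {
        foreach (Item currentItem in testFile)
        {
            // The same business rule as before, but kept out of the test itself
            string expectedWeightCategory =
                currentItem.weight > 50 ? "two-man lift" : "one-man lift";

            writer.WriteLine(currentItem.ID + "," + expectedWeightCategory);
        }
    }
}

// Stage 2 - the test: read the persisted (and possibly hand-corrected)
// expected results and perform only the comparison.
public void CheckWeightCategory_UsingPersistedExpectedResults()
{
    foreach (string line in File.ReadAllLines("ExpectedWeightCategories.csv"))
    {
        string[] fields = line.Split(',');
        string itemId = fields[0];
        string expectedWeightCategory = fields[1];

        // Navigate to the item tracking page on the website
        selenium.Open("/ItemTrackingPage.html");

        // Get the weight category for the current item and compare
        string actualWeightCategory = selenium.GetText("weightCategory" + itemId);
        Assert.AreEqual(expectedWeightCategory, actualWeightCategory);
    }
}

Whether the expected results end up in a CSV file, a spreadsheet or a database matters less than the fact that they exist as a reviewable artefact before the test runs, rather than being derived inside it.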

Another option is to work out any expected results “by hand” in the same way as if we were performing the test manually.  Once we have the expected results (assuming they are in a sensible format) we can use them as the basis for both manual and automated tests.  I can tell what some of you are thinking: “my aim is to automate the majority of my tests and do very little testing manually”.  That may be a great aim for your project, but I guarantee that you will end up performing a great deal of manual testing, even if you do not call it by that name.  Whilst debugging automated tests (both as part of their creation and later as part of their maintenance), sometimes the best approach is to step through the lines of code, placing various break points and watching the value of variables change.  That said, on other occasions it’s far easier to perform the test manually to double check that the application is behaving as we expect, often proving both the quality of the website and the quality of the automation at the same time.  It can also be easier to explain a bug to somebody when you can demonstrate it manually rather than attempting to commentate on a screen that is racing past at one hundred miles per hour.

Unnecessary Navigation

Let us now take a look at another potential destroyer of automated tests: unnecessary navigation.  Imagine for a moment that we are testing an online book store.  The customer is keen to ensure that every book is displayed with the correct title, description and price, and has provided this data for a sample of 100 books.  To automatically check our sample books we could write something similar to the code below.  As part of its creation we would need to give it a meaningful name, such as CheckBookTitleDescriptionAndPrice.

public void CheckBookTitleDescriptionAndPrice_Version1()
{
    foreach (Book currentBook in testFile)
    {
        // Open the homepage
        selenium.Open("/HomePage.html");
        selenium.WaitForPageToLoad("30000");

        // Login
        selenium.Click("link=Login");
        selenium.WaitForPageToLoad("30000");
        selenium.Type("TxtUserName", "TestAccount1");
        selenium.Type("TxtPassword", "TestPassword1");
        selenium.Click("Submit");
        selenium.WaitForPageToLoad("30000");

        // Search for the book we want to check
        selenium.Click("link=Search");
        selenium.WaitForPageToLoad("30000");
        selenium.Type("TxtSearchTerm", currentBook.Title);
        selenium.Click("Submit");
        selenium.WaitForPageToLoad("30000");

        // Open the Detailed Information page for the book we want to check
        selenium.Click(currentBook.ID);
        selenium.WaitForPageToLoad("30000");

        // Check that the Detailed Information page has loaded
        Assert.AreEqual("Detailed Information page", selenium.GetTitle());
        
        // Compare the information on the Detailed Information page to our expected results
        string actualBookTitle = selenium.GetText("BookTitle" + currentBook.ID);
        Assert.AreEqual(currentBook.Title, actualBookTitle);

        string actualBookDescription = selenium.GetText("BookDescription" + currentBook.ID);
        Assert.AreEqual(currentBook.Description, actualBookDescription);

        string actualBookPrice = selenium.GetText("BookPrice" + currentBook.ID);
        Assert.AreEqual(currentBook.Price, actualBookPrice);
    }
}

This code is fine in principle, but just how much of it is actually related to the things we want to check?  Do we really need to log in to the system, start this test from the homepage and use the search functionality to find the relevant book?  Probably not is the answer.  But what is the problem with having these actions in the test?  They’re surely not causing any harm, and you never know, we may stumble across a bug that we weren’t expecting to find.

Whilst this sounds nice in principle, for me there are two problems with the philosophy of “if I’m lucky I might find some extra bugs by chance, for free!”  Even though this sounds great on the surface, automated tests that stumble across a bug (other than the one(s) they were deliberately trying to detect) can take a considerable time to diagnose as we ponder over whether it is the website that is broken or, dare I say it, the test itself.  The frustration, however, does not stop there.  If the homepage, search or login happened to be broken then there is a strong possibility that any test that uses those features as part of its navigation will fail to reach the part of the system we are trying to test.  Not ideal if we need a quick assessment of the website’s quality as we prepare for a release later that day.  At this point in time, our automated tests are next to useless, as all they tell us (and this may not even be directly) is that the homepage is broken.  It is also misleading for a test entitled CheckBookTitleDescriptionAndPrice to fail even though there is nothing necessarily wrong with anything the title suggests.

Fortunately the solution is easy.  We put this (in my opinion, unnecessary) navigation in by choice, but if it’s surplus to requirements why not leave it out or replace it with something more succinct?  We could rewrite our test to look something like the code below.  Notice how we have reduced the amount of unnecessary navigation by replacing much of our previous example with a single, parameterised deep link.

public void CheckBookTitleDescriptionAndPrice_Version2()
{
    foreach (Book currentBook in testFile)
    {
        // Navigate directly to the Detailed Information Page for the book we want to check
        selenium.Open("/DetailedInformationPage?bookId" + currentBook.ID);
        selenium.WaitForPageToLoad("30000");

        // Check that the Detailed Information page has loaded
        Assert.AreEqual("Detailed Information page", selenium.GetTitle());

        // Compare the information on the Detailed Information page to our expected results
        string actualBookTitle = selenium.GetText("BookTitle" + currentBook.ID);
        Assert.AreEqual(currentBook.Title, actualBookTitle);

        string actualBookDescription = selenium.GetText("BookDescription" + currentBook.ID);
        Assert.AreEqual(currentBook.Description, actualBookDescription);

        string actualBookPrice = selenium.GetText("BookPrice" + currentBook.ID);
        Assert.AreEqual(currentBook.Price, actualBookPrice);
    }
}

Summary

I am not suggesting that calculating expected results on the fly or tests with large amounts of navigation should never be used, but I do believe they should be treated with caution.  Both approaches leave tests less focused, harder to maintain and open to a variety of false failures.  As I mentioned at the beginning, I like automated tests.  They’re typically quick to run and, assuming they acquire the correct actual and expected results, they rarely fail to produce an accurate comparison.  And here lies the problem.  Over time, calculating expected results on the fly can reduce the probability of a test acquiring the correct expected results, and unnecessary navigation can make it much harder for a test to capture the actual result.  If we keep both of these things to a minimum, we provide each test with a much greater chance of getting to the part it does best – the comparison.
