Verify expectations; Do not just test correctness
A subtler, everyday approach to software testing.
tl;dr: Think of software testing (and coding) in terms of verifying expectations: expected values, expected behaviors; and examine how you arrive at a value or behavior.
As software testers, we are always looking at values returned by our code: methods, functions, and individual instructions. Such values are either “right” or “wrong”. Software testing frameworks provide assertions for comparing the “right” value with a potentially “wrong” one, to check whether the two are equal.
I wager that this “wrong vs. right” framing in software testing stems from our habits with print statements and debugging, where we routinely probe our program’s variables to check whether they hold the “correct” values.
Thinking about software tests in terms of “right” and “wrong” masks two essential subtleties:
Someone (or something) has to define “correctness”.
Programs can stutter, stammer, and accidentally stumble towards the “right” value/answer.
Let’s unpack those subtleties and offer modifications to the “right vs. wrong” value-based testing.
Defining Correctness and Testing Expectations
Correctness is such a rigid and absolute notion that presents a deep issue: someone or something has to define or establish it.
Umm… good luck with that. Depending on the frequency of light and sound waves, different people see and hear the same thing differently ( I heard “laurel”).
My point is that defining correctness is hard in practice. And I stay away from discussions around correctness.
“Expectations” is such a nicer word, with so much wiggle room! And you can manage expectations!
To me, software development is the management of a complex web of often conflicting expectations, most of them stemming from the people running your program.
And so, instead of thinking about the correctness of the output produced by my code, I think in terms of how/if my program’s values match my expectations, and the expectations of my program’s target audience.
And so when writing test cases, I make it a point to have two variables with the following names: `expectedValue` and `actualValue`.
As the tester, I define and establish the value/state of `expectedValue`:
I am not thinking about the correctness of the `expectedValue`.
I am thinking about what the users would expect the value to be.
I source my definition/understanding of “expectations” from user interviews, and talking to other stakeholders.
I try to ground the `expectedValue` in the peculiarities and subtleties of practical use and application of the code that I am writing.
Then, as the coder, I derive the `actualValue` by invoking my code/method/function/program. This is the value that my program is actually spitting out.
I then compare the two — `actualValue` and `expectedValue` — to figure out if my program is doing the “right” … sorry, the expected thing. 😅
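The steps above can be sketched as a small Python test. The `format_price` function and its expected output are hypothetical examples of mine, not from any particular codebase:

```python
# A minimal sketch of expectation-based testing, in Python.
# The function under test and its expected value are hypothetical.

def format_price(cents):
    """Code under test: format a price in cents as a dollar string."""
    return f"${cents / 100:.2f}"

def test_format_price_matches_user_expectation():
    # As the tester, I establish what users expect to see, sourced
    # (say) from user interviews: a leading "$" and two decimals.
    expectedValue = "$19.99"

    # As the coder, I derive the actual value by invoking my code.
    actualValue = format_price(1999)

    # Compare the two: is the program doing the expected thing?
    assert actualValue == expectedValue
```

Note that the test never claims `"$19.99"` is “correct” in any absolute sense; it only records what the program’s audience expects to see.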
But that’s just half the story. Read on…
Curiosity about how the method computes
If we were to judge a book by its price tag, instead of actually reading it cover-to-cover, then the world would miss out on a lot. And yet, we take a similar outlook to testing our code: we test our code by its output, for some given input. We never dig into how it arrived at that output.
It is funny to me that we test a method by its output. What is the point of calling a method, if you are not going to test the … well, method? // there is bad-programmer humor lurking in that statement.
Most of this is driven by the “competent programmer” hypothesis: that most programmers are competent and create correct, or nearly correct, programs (that word, “correct”, again). This view contends that mistakes are outliers in the enterprise of programming. As a coder, I tend to agree: if I second-guessed myself on every line of code I wrote, I would be no use as a programmer. The “competent programmer” hypothesis is a useful idea for trusting a programmer’s capabilities when thinking about their productivity.
And so, since we assume that the code we write is “mostly correct” (that word again), it is then sufficient to test it around the edges. And there is nothing more on the edge of a program than its final output.
But as a software tester, the “competent programmer” hypothesis is beside the point. Frankly, it is no longer about the programmer. At that point it is about the program at hand, assuming that it compiles and runs.
As a tester, I love being curious about how a program works. So it is not enough for me that it produces the “correct”, or rather the expected, output; it also matters how it got there:
what steps did the program take?
how could it have gone astray?
what conditions exist that make the program’s behavior uncertain or ambiguous?
how many steps did it take before achieving what it set out to do?
could it have taken fewer steps?
are there certain conditions that favor the program’s working, more than others?
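Some of these questions can be asked directly in a test. A hypothetical sketch (the search function and its step log are my own illustration): record the steps a function takes, then assert on the process as well as the output.

```python
# A sketch of probing *how* a result is computed, not just what it is.
# The binary-search function and step log are hypothetical illustrations.

def binary_search(items, target, steps=None):
    """Find target in a sorted list, optionally recording each probe."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if steps is not None:
            steps.append(mid)  # record which index was probed, in order
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

def test_search_behavior_not_just_output():
    items = list(range(100))
    steps = []
    result = binary_search(items, 42, steps)

    # The expected value still matters...
    assert result == 42
    # ...but so does how we got there: a binary search over 100 items
    # should never need more than 7 probes. A "correct" answer reached
    # in 50 probes would be an accidental linear scan, not the intended
    # behavior.
    assert len(steps) <= 7
```

Asserting on the step count catches the case where the program stumbles toward the right answer by accident, which an output-only assertion would happily wave through.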
Answering such questions helps to assess whether the expected output was intentional or an accident. But more importantly …
In asking those questions, the program becomes an anthropomorphic entity, entirely separate from the original programmer who created it. Indeed, quite often a single program is created by multiple programmers, not a lone coder in a dark room.
At this point, the program has its peculiarities, its own temperaments, its own behaviors.
This is when I like to look to the wisdom of Dan North, and all that he has to offer with Behavior-Driven Development. The biggest shift in testing vocabulary that Mr. North offers is:
“tests” → “behaviors”
But perhaps a subtler, and more important, suggestion he ends up making is the shift from
“testing” → “verifying”
Just as with “correctness,” “testing” is a rigid word that not only invites discussions of “wrong and right”, but also requires that there be a bottom line, an end result that can be judged wrong or right.
“Verification”, on the other hand, invites discussion of process, and almost places the method or process above outcomes. Yes, outcomes matter in verification, but the method and process matter just as much, if not more.
And with all that said, I submit:
Verifying behaviors, instead of only testing correctness, offers a more nuanced and complete look at the workings of any program.
This brings us to a much more important, empirical discussion:
How do we actually verify behaviors instead of just testing correctness?
And, are we doing enough of it?
Does any of this amount to better software?
Useful questions. Subjects that I will start addressing tomorrow — I have rambled on enough for today 😅.
In the meanwhile, if you have thoughts, or disagree with what I lay out here, please leave me a note 🙂 (and go read that Dan North article about Behaviors.)