Fuzz… no, not the stuff growing on the vegetables in your fridge, but the data kind. Data fuzz is rumored to be behind the successful cracking of the iPhone in recent months. However, some people in the industry may not have heard the term “data fuzz”. So I will briefly discuss what it is, how it relates to blackbox testing and how it can make your applications more tolerant of outside forces… on this entry of the Programmer’s Underground!
Data fuzz is a relatively simple concept: random data noise. Much as white noise is to a signal, “fuzz” is to data. Programs can be written to generate random data, be it a stream of characters, simulated interface events, pseudorandom numbers or some other kind of data. That data is then thrown at a specific application to see where and how it fails.
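As a rough illustration, here is a minimal Python sketch of the kinds of generators described above. The function names, the character alphabet and the (kind, x, y) event format are hypothetical choices made for this example, not the interface of any particular fuzzing tool.

```python
import random
import string

# Printable characters plus a few control bytes that often trip up naive parsers.
ALPHABET = string.printable + "\x00\x1b\x7f"
EVENT_KINDS = ("click", "double_click", "key_press", "drag")

def random_chars(rng: random.Random, max_len: int = 64) -> str:
    """Return a random string of 0 to max_len characters drawn from ALPHABET."""
    length = rng.randint(0, max_len)
    return "".join(rng.choice(ALPHABET) for _ in range(length))

def random_number(rng: random.Random) -> int:
    """Return a random integer anywhere in the signed 64-bit range."""
    return rng.randint(-2**63, 2**63 - 1)

def random_event(rng: random.Random) -> tuple:
    """Return a simulated interface event as (kind, x, y).

    The coordinate range deliberately includes negative values that no
    real screen would report.
    """
    return (rng.choice(EVENT_KINDS), rng.randint(-10, 2000), rng.randint(-10, 2000))

if __name__ == "__main__":
    rng = random.Random(1234)  # fixed seed so the same "noise" can be replayed later
    for _ in range(3):
        print(repr(random_chars(rng)), random_number(rng), random_event(rng))
```

Passing a `random.Random` instance around, rather than calling the module-level functions, keeps each generator's stream independent and repeatable from its seed.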
It would be like taking your favorite little brother, duct taping him to a tree and throwing as many snowballs at him as possible (some with rocks, some without, some with pine cones or sand) until he screams for mommy. This fits quite nicely with the idea of blackbox testing.
For those who didn’t read my previous post on blackbox testing a while back, it is the concept of creating self-contained modules that provide only a simple interface and keep their data members encapsulated and hidden from the entities using them. You don’t know how a module works (hence the ‘black’ in blackbox) and you really don’t care, as long as it does its job and does it well.
We can test a blackbox module by throwing a bunch of random data at it and seeing whether it fails. Again, we don’t care what is inside the blackbox, but we do want to record each input we throw at it and the result. If it fails, we note the input that caused the failure and use it to work out where the blackbox is screwing up.
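The test loop just described can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: `fuzz_blackbox`, `parse_age` and `digits_or_junk` are hypothetical names, and "failure" here simply means the target raised an exception. Catching crashes of the whole process or hangs would need a harness that runs the target in a separate process with a timeout.

```python
import random
from typing import Callable, List, Tuple

def fuzz_blackbox(target: Callable[[str], object],
                  generate: Callable[[random.Random], str],
                  runs: int = 1000,
                  seed: int = 0) -> List[Tuple[str, str]]:
    """Call target with generated inputs and return (input, error) for every failure."""
    rng = random.Random(seed)
    failures = []
    for _ in range(runs):
        data = generate(rng)
        try:
            target(data)  # we only observe the outcome, never the internals
        except Exception as exc:  # any uncaught exception is noted as a failure
            failures.append((data, f"{type(exc).__name__}: {exc}"))
    return failures

def parse_age(text: str) -> int:
    """Hypothetical blackbox that assumes its input is always a valid age string."""
    age = int(text)
    if age < 0:
        raise ValueError("negative age")
    return age

def digits_or_junk(rng: random.Random) -> str:
    """Mostly digits, with the occasional minus sign or stray letter."""
    return "".join(rng.choice("0123456789-x") for _ in range(rng.randint(0, 4)))

if __name__ == "__main__":
    found = fuzz_blackbox(parse_age, digits_or_junk, runs=200, seed=42)
    print(f"{len(found)} failing inputs")
    for data, error in found[:5]:
        print(repr(data), "->", error)
```

Because the seed is fixed, running the same call again reproduces exactly the same failing inputs, which makes it easy to confirm a fix later.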
Now, there are advantages and disadvantages to this. The advantage is that it can be automated and is free of any skew in perception from the tester (who might also be the designer). It throws whatever it likes at the module without care. It wants to break the module and can make the attempt thousands of times a minute… far more than any human could. This allows for much larger test ranges and for data that a human tester would never have conceived of.
The disadvantage is that it can often miss crucial areas of testing or specific sections of code (how much of the code a test actually exercises is known as code coverage). For example, let’s say you have a function that expects a number between 1 and a billion. A human tester can’t possibly test all 1 billion inputs. A program could probably cover a few million, far more than a human… but it may not test the right numbers. In a range like this you always want to test the edge numbers: does 1 work? Does 1 billion work? What about 1 billion and 1?
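Random sampling is unlikely to land on those edges by chance: a uniform draw from 1 to 1,000,000,000 hits exactly 1 with probability one in a billion, so even a million draws have only about a 0.1% chance of ever trying it. A common remedy, usually called boundary value analysis, is to always put the edge values in front of the random ones. A minimal Python sketch, with hypothetical helper names:

```python
import random
from typing import List

def boundary_values(low: int, high: int) -> List[int]:
    """Values just outside, on, and just inside each edge of [low, high]."""
    return [low - 1, low, low + 1, high - 1, high, high + 1]

def mixed_inputs(low: int, high: int, random_count: int, seed: int = 0) -> List[int]:
    """Boundary values first, followed by uniformly random values from the range."""
    rng = random.Random(seed)
    return boundary_values(low, high) + [rng.randint(low, high) for _ in range(random_count)]

if __name__ == "__main__":
    LOW, HIGH = 1, 1_000_000_000
    inputs = mixed_inputs(LOW, HIGH, random_count=5)
    print(inputs[:6])  # [0, 1, 2, 999999999, 1000000000, 1000000001]
```

The random values still explore the middle of the range, while the edges the fuzzer would almost never find on its own are guaranteed to be tried.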
A human tester could test these values because they know where to look. Maybe the tester knows that the module handles numbers between 34,566 and 34,999 in a special way and would test values in that range. A program generating numbers at random may never hit that range. If those numbers exercise a particular set of if statements, that code goes untested whenever the fuzzer never produces one of them. This is what is meant by code coverage: in complex systems, randomly generated data can’t guarantee that every path of execution gets tested.
For this reason, some companies throw several different fuzz data generators at the same module, each testing different things with different inputs. One might test character data while another tests numbers and yet another tests interface events. Some of these tools even use a database to generate data, test applications and record the results, making failures reproducible so that you can confirm a fix actually corrected them.
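To make the coverage problem concrete, here is a Python sketch with a hypothetical `classify` function that treats 34,566 to 34,999 specially, plus a hand-rolled counter recording which branch each input reached. In practice you would let a coverage tool do this (for Python, coverage.py can measure branch coverage) instead of counters like these. The special range holds 434 of the billion possible values, so 100,000 random draws expect only about 0.04 hits there. The fixed seed makes the run reproducible, in the spirit of the tools that record results so a fix can be confirmed.

```python
import random
from collections import Counter

# Hand-rolled branch counter for illustration; a coverage tool would record this automatically.
hits: Counter = Counter()

def classify(n: int) -> str:
    """Hypothetical module: numbers from 34,566 to 34,999 take a special code path."""
    if 34_566 <= n <= 34_999:
        hits["special_range"] += 1
        return "special"
    hits["general"] += 1
    return "general"

def run_random(trials: int, seed: int) -> Counter:
    """Feed uniformly random inputs to classify and return a copy of the branch counts."""
    hits.clear()
    rng = random.Random(seed)  # the seed makes the whole run reproducible
    for _ in range(trials):
        classify(rng.randint(1, 1_000_000_000))
    return Counter(hits)

if __name__ == "__main__":
    counts = run_random(trials=100_000, seed=7)
    print("branch hits:", dict(counts))
    # Expected special-range hits: 100,000 * 434 / 1,000,000,000, about 0.04.
    if counts["special_range"] == 0:
        print("special_range branch never executed: that code went untested")
```

A human who knows about the special range would simply call `classify(34_566)` and `classify(34_999)` and cover the branch with two inputs.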
But… given simpler applications, enough time and perhaps some rules of testing to follow, a fuzzer could generate enough data to make the module fall flat on its face, exposing obvious errors that would otherwise be discovered by hackers or ordinary members of the public. So it does have value and may be worth the time.
As I alluded to earlier, it is rumored that the new iPhone was hacked because the hacker fed fuzz data to it and found flaws in its software. That gave them a place to start figuring out how to circumvent the software and gain access. Now, I know they also opened up the phone and did some soldering, but it was that hardware work combined with the software hack that made it possible to use the phone on other networks.
So if you have the opportunity to get your hands on or create a data fuzzer, it might be wise to run it against one of your latest creations before deploying it to a production environment. Try out a few if you can, or find one that can be configured to generate numbers, character data and interface events. Even if the fuzzer finds only one or two errors, that could mean the difference between making your application stronger and letting the public find those errors for you, costing your company thousands or millions of dollars in damages.
Thanks for reading! 🙂