Hello once again! Today I want to talk to you about a design decision that involves validating data. Oftentimes newbies build error-prone code. This is because they are just learning and lack the experience of proper validation… and lack the experience of being smacked in the face when the code fails on the input of “a” when an integer is expected. As we become more experienced we learn about validating incoming data. Then we oftentimes take it to the extreme, validating our data in every routine and every class method. Sometimes we do this to the point where our code becomes bloated with endless validation routines or slows to a crawl while we check that the input is a number, then that it is between -1 and 37, then that it is prime, then that it is not 5, and so on. How do we strike a good balance? I will talk about creating “validation barriers” in your code and how we can segregate code that needs validation from the routines and classes that do not. All this on another great episode of the Programming Underground!
Many experts may argue that validation should be done on a function-by-function basis. The reasoning behind this is portability (and extreme paranoia). If you move the function to another project, it can protect itself. Fantastic idea, but do you want to always have to worry about validating input, or would you rather worry about getting the function right? Do I really want to worry about whether num1 and num2 are valid in a simple add(num1, num2) function? I think I would rather worry about making sure add returns the correct result when I give it two integers. Wouldn’t you?
Setting up barriers within a project can protect classes or functions from this validation chaos. Think of it as setting up a clean room: a section of the program where functions don’t have to worry about the incoming data being incorrect. The data has already passed through validation, so these functions can assume the input is going to be correct. On one side of this layer are the functions which take in user input or input from external “dirty” sources like a file or a database connection. On the other side are functions which work with data that they know is valid and can freely work on.
One of these barriers can be a set of validation functions or classes that sit between the input functions/classes and the “pure” functions, validating the data before passing it on. If data is bad, the barrier can reject it or it can simply scrub it and make the bad data into good clean data. For instance, assume we ask the user for a letter. They enter -1. Obviously wrong. The validation routine would take in this data, see it is not a letter, and perhaps set up a default value like ‘a’ which it then passes on to the functions that deal with this character. Or perhaps it sees the -1 and sends back a message to the user saying “Hey dummy, I asked for a letter not an integer! Try again!”.
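To make the letter example concrete, here is a minimal sketch in Python. The function names (`scrub_letter`, `require_letter`, `alphabet_position`) are made up for illustration, and restricting "letter" to a single ASCII letter is my own assumption. The first two functions are the barrier (one scrubs, one rejects) and the third lives in the safe zone and does no checking at all.

```python
def _is_single_letter(text: str) -> bool:
    # Assumption for this sketch: a "letter" is exactly one ASCII letter.
    return len(text) == 1 and text.isascii() and text.isalpha()


def scrub_letter(raw: str, default: str = "a") -> str:
    """Barrier, scrub style: replace bad input with a known-good default."""
    candidate = raw.strip()
    return candidate if _is_single_letter(candidate) else default


def require_letter(raw: str) -> str:
    """Barrier, reject style: refuse bad input and tell the caller why."""
    candidate = raw.strip()
    if not _is_single_letter(candidate):
        raise ValueError(f"I asked for a letter, not {raw!r}. Try again!")
    return candidate


def alphabet_position(letter: str) -> int:
    """Safe-zone function: trusts that `letter` already passed the barrier."""
    return ord(letter.lower()) - ord("a") + 1


# Dirty input goes through the barrier first, then into the safe zone.
position = alphabet_position(scrub_letter("-1"))  # "-1" is scrubbed to "a"
```

Notice that `alphabet_position` would happily return nonsense for "-1" on its own. That is the trade: it stays tiny and fast because the barrier is the only way in.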
Below is a graphic to illustrate two kinds of barriers. The first one is the barrier of classes/functions which separates the input classes from the “safe zone” of pure functions/classes. You will also notice that one of the classes has a second barrier on it. This is the interface for the class, which validates data coming into the class. In other words, it is the set of public properties and methods which ensure that the class is always in a valid state. This is the class barrier I will speak about next.
Class Barriers (The Interface)
We often write classes to be reusable, and why should we reinvent the wheel when we don’t have to? Classes should always ensure that the data they receive keeps the class valid. At no time should your Color class say it is of color “February 17, 2011”. Makes no sense. We as programmers should set up barriers in the interface methods and properties of our class to ensure that the class is created with proper data. The same goes for any time we want to alter the state of that class: these methods/properties will keep the class valid. So when we call Color’s “setColor()” method we should validate that the incoming data is indeed a valid color name before we change the class to represent said color. Any internal helper methods for Color, on the other hand, already know that the internal data is good, no validation necessary.
Helper methods within these classes, private to the class and hidden from the outside world, can then be in their own “safe zone” within the class. They can always assume that the class’ private data members are valid and safe to use. In other words no validation is necessary… unless of course the class is reaching out for possibly dirty input again.
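Here is a rough sketch of the Color idea in Python. The palette in `VALID_NAMES`, the `label()` method, and the `_initial()` helper are all invented for illustration, and I have spelled the setter `set_color` to follow Python naming. The public setter is the class barrier; the private helper sits in the safe zone and never re-checks the data.

```python
class Color:
    # Assumed palette, just for this sketch.
    VALID_NAMES = frozenset({"red", "green", "blue"})

    def __init__(self, name: str) -> None:
        # Construction goes through the same barrier as later changes.
        self.set_color(name)

    def set_color(self, name: str) -> None:
        """Class barrier: validate first, change state only if the data is good."""
        if not isinstance(name, str):
            raise ValueError(f"{name!r} is not a valid color name")
        cleaned = name.strip().lower()
        if cleaned not in self.VALID_NAMES:
            raise ValueError(f"{name!r} is not a valid color name")
        self._name = cleaned

    def _initial(self) -> str:
        """Private helper in the safe zone: trusts self._name without checking."""
        return self._name[0].upper()

    def label(self) -> str:
        return f"{self._initial()}: {self._name}"


color = Color("Red")
print(color.label())  # prints "R: red"
```

If someone calls `color.set_color("February 17, 2011")`, the setter raises before `_name` is touched, so the object keeps its last valid color.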
So these barriers we set up can help us in the following ways…
1) Provide a central location for the routines that scrub and validate data from all kinds of sources. Easy for maintenance.
2) The barrier classes and routines can be portable. Yay for reusability!
3) We cut down on the need for validation code for all of our classes. We can get to work on solving problems without worrying about the incoming data being tainted and needing to be cleaned for each and every function. (Simplification and reduced complexity)
4) We keep classes in a proper and valid state which will cut down on errors later when we use them in other areas of code.
5) Less time validating every step of the way cuts down the bloat and increases the performance.
Other places you might want to consider these validation barriers…
1) Between subsystems in a complex system (Between items in a block diagram for those engineers out there)
2) Anywhere your system is expected to output valid data (Barriers can help our systems strive to be good citizens and output valid info)
3) Perhaps to act as wrappers around possibly unsafe or dangerous code. (Wall off bad code with validation)
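As a small sketch of the last two ideas in this list (validating output and wrapping unsafe code), here is one way it might look in Python. `legacy_average` is a made-up stand-in for code we cannot or do not want to change, and `safe_average` is a hypothetical wrapper that checks what goes in and what comes out.

```python
import math
from typing import Sequence


def legacy_average(values):
    """Hypothetical unsafe code: crashes on an empty list, can return inf."""
    return sum(values) / len(values)


def safe_average(values: Sequence[float]) -> float:
    """Barrier around legacy_average: validate the input and the output."""
    if len(values) == 0:
        raise ValueError("need at least one value")
    for v in values:
        # bool is excluded on purpose; True/False are not measurements.
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            raise ValueError(f"{v!r} is not a number")
        if isinstance(v, float) and not math.isfinite(v):
            raise ValueError(f"{v!r} is not a finite number")
    result = legacy_average(values)
    # Output barrier: be a good citizen and never hand back inf or nan.
    if not math.isfinite(result):
        raise ValueError("legacy_average produced a non-finite result")
    return result
```

The rest of the system only ever calls `safe_average`, so the bad behavior of the wrapped code is walled off behind a single choke point.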
With these barriers put in the correct places, we can keep our code secure and quickly isolate bugs as they appear. These barriers create choke points where bad data is forced through some kind of cleaning process before being sent on. Once cleared, the data can be trusted to meet a certain level of expectations. I hope you enjoyed the entry and I look forward to writing another article to help you all become better programmers! Thanks for reading! 🙂