You can probably guess what a Data Flow Diagram might mean. It has something to do with Data from Star Trek and his schematics, right? No, no… while he kicks ass on the Enterprise, Data Flow Diagrams kick ass in system design. You might not be able to build a robot with artificial intelligence that calculates at teraflop speeds, but you can dramatically improve your skill at converting a problem into a programming solution. We will talk about it here on the Programming Underground!
Have you ever been overwhelmed by some huge project that has a ton of working pieces and you don’t know where to start? I am sure we all have. While writing some pseudocode is always a good plan and highly recommended, even writing that for a complicated task requires you to know where to start. How do the pieces of the system work together? This is where Data Flow Diagrams (also known as DFDs) fit in.
Don’t confuse these with Entity Relationship Diagrams, which describe the entities in a system and how they relate to one another. Data flow diagrams are used to track where data goes in a system: how it “flows”, much like how you would track a molecule of water in a huge river. You are in the middle of Canada and you want to know how that molecule gets from point A to point B at the mouth of the Mississippi.
You start with a general overview of the system. This is known as level 0, or the context level, and from there you slowly break each piece down further and further. This is known as decomposing the system, and after each successful round of decomposition you increase the level of the view. So if you were to take the level 0 diagram and peek inside the system to see that it had several subsystems, that view would be known as level 1. If you were to decompose one of those subsystems you would get to level 2. You keep decomposing until you fully understand how things are put together and the pieces are simple enough for you to… what else, program into some functions.
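To make the idea of levels a bit more concrete, here is a minimal sketch of a decomposition written as a tree. The `Process` class and the `levels` function are hypothetical names made up for this illustration, and the microwave parts are just a rough guess at a breakdown, not a real design.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """One box in the diagram, plus the smaller boxes found inside it."""
    name: str
    children: list = field(default_factory=list)


def levels(process, level=0):
    """Yield (level, name) pairs, walking the decomposition from the top down."""
    yield level, process.name
    for child in process.children:
        yield from levels(child, level + 1)


# Hypothetical breakdown of a microwave, for illustration only.
microwave = Process("microwave", [
    Process("keypad"),
    Process("timer", [
        Process("decrement timer"),
        Process("put together display"),
    ]),
    Process("buzzer"),
])

for level, name in levels(microwave):
    # Indent each box by the level of view where it first shows up.
    print("  " * level + f"level {level}: {name}")
```

The box printed at level 0 is the whole system, the boxes at level 1 are what you see when you peek inside it, and so on down until the pieces are small enough to code.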
Let’s look at an example. Let’s take a microwave. Just looking at the microwave from the outside would be your level 0. You can see the types of input you can give it (such as entering digits on the keypad). Now if you were to take off the cover of your microwave, you might notice that inside it has a few subsystems like the timer mechanism, the keypad, the handle for the door, perhaps a buzzer and a tool to cook that macaroni and cheese until it is pretty much black. This would be your level 1 view of the microwave. Now let’s say we took out the timer and looked inside it to see a series of springs, maybe some circuit boards, wiring etc. This would be your level 2. At each of these levels you would make a note of the inputs and outputs of the subsystems/parts and how they interact with one another. You can see a relationship between the levels if you look closely enough. If at level 1 you see that the timer has one input and one output, you can rest assured that at level 2 you will see that same one input coming into the timer and that same one output going out of it. No more, no less.
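That matching of inputs and outputs between levels can be checked mechanically. Below is a rough sketch, assuming each flow is written as a `(source, destination, label)` tuple. The function name `boundary_flows`, the process names inside the timer and the flow labels are all invented for this example.

```python
def boundary_flows(flows, inside):
    """Return (inputs, outputs) crossing the boundary of the `inside` boxes."""
    inputs = {label for src, dst, label in flows
              if src not in inside and dst in inside}
    outputs = {label for src, dst, label in flows
               if src in inside and dst not in inside}
    return inputs, outputs


# Level 1: the timer drawn as a single box (hypothetical flow labels).
level1 = [
    ("keypad", "timer", "cook time"),
    ("timer", "buzzer", "time is up"),
]

# Level 2: the same timer opened up into two smaller processes.
level2 = [
    ("keypad", "set timer", "cook time"),
    ("set timer", "count down", "seconds left"),
    ("count down", "buzzer", "time is up"),
]

parent = boundary_flows(level1, {"timer"})
child = boundary_flows(level2, {"set timer", "count down"})

# The levels agree when the same flows cross the timer's boundary in both views.
print("balanced" if parent == child else "not balanced")
```

The internal "seconds left" flow is ignored because it never leaves the timer. Only the flows that cross the boundary have to match the level above.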
These inputs and outputs are drawn as lines with arrows from each part of the system to the other parts it interacts with. These arrows represent the data flows. You might have one or more arrows pointing to a system (the input… from another system or even a user punching in a time on our microwave) and one or more pointing away from the system (the output of the system… like the visual readout on the timer display showing how long before our dinner will explode). Some of these lines will point at other subsystems, where the output from our timer might be the input for our buzzer, telling it that time is up and it should go off. Sounds a bit simple, right? It is, really. There are just a few traps you have to watch for. Two of the most common are known as “blackholes” and “miracles”.
Input going into a system with nothing leaving it is called a “blackhole” and is a violation of the diagram rules. It would be like punching in numbers on our microwave and then it doesn’t accept them or the microwave just doesn’t start. There would have to be something wrong. “Miracles” are the opposite. This is when a subsystem is giving output without needing any input, which again is a violation. How can a system work without getting some kind of instructions on what it needs to do? How does a timer know you want 3 minutes for your lunch unless you punch it in?
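Both traps are easy to hunt for once the arrows are written down. Here is a small sketch, assuming the diagram is just a list of `(source, destination)` pairs. The function `find_violations` and the deliberately broken microwave wiring are made up for illustration.

```python
def find_violations(processes, flows):
    """Return (blackholes, miracles) among the given processes."""
    has_input = {dst for src, dst in flows}
    has_output = {src for src, dst in flows}
    # Blackhole: data goes in, nothing comes out.
    blackholes = sorted(p for p in processes
                        if p in has_input and p not in has_output)
    # Miracle: data comes out, nothing went in.
    miracles = sorted(p for p in processes
                      if p in has_output and p not in has_input)
    return blackholes, miracles


processes = ["keypad", "timer", "buzzer"]

# The arrow from timer to buzzer is left out on purpose.
flows = [
    ("user", "keypad"),
    ("keypad", "timer"),
    ("buzzer", "user"),
]

blackholes, miracles = find_violations(processes, flows)
print("blackholes:", blackholes)  # ['timer']
print("miracles:", miracles)      # ['buzzer']
```

The user is left out of `processes` because it sits outside the system, so it is allowed to be a pure source or a pure destination.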
To identify these problems you have to look at the different levels of your diagram. Analyze where your data is moving to and where it is moving from. Make sure that each component is getting information from the user or another subsystem and that it is sending data to another subsystem or back to the user. Data is always in motion, even if it is temporarily stored (like in a database). When it is stored it is considered to be in a “waiting state”. Even when you print a file out, the paper copy you have in your hand is considered data in a state of waiting. It is data that could either be fed into another system or packed away (in which case it is considered to be in a waiting state indefinitely).
Below are several diagrams which show how to start at the context level and move down through the levels. Pay particular attention to how the number of inputs and outputs matches from level to level and how each level is a part of the level above it. At the context level we show that there is the user entering input and two forms of output, displaying the time and beeping.
Now I will show you the next level. This is inside the microwave. Notice how the user input is still seen coming into the microwave and how we still have two outputs, one for the beep and one for the time display.
Lastly we go down another level, which zooms in on the subsystem “timer”. But you know what this means? You could potentially have three different sheets showing you level 2, one each for the timer, buzzer, and keypad. Or, as you may often see it, everything is drawn out on one huge piece of paper where all three components are shown broken down and linked to one another. You can see how after 3 levels you can quickly have a lot of boxes and lines.
Now the above images are very simplistic in nature and show just the most basic of systems. They are meant only as an example! They don’t include any blackholes or miracles, but you can quickly get the idea of each. Imagine if, in figure 3 above, we had no line from “decrement timer” to “put together display”, so all “put together display” had was a single line coming out of it to show the time to the user. This would be an instance of a miracle. No inputs, but an output. The same goes if we had no line coming out of the “put together display” module. It would then become a blackhole.
One thing I touched on earlier is the idea of a storage component like a database or a file. These are known as “data stores” and they are a temporary location used to hold data that is in a waiting state, as discussed earlier. You would still see input and output lines going to and from a data store, but it would be marked with a slightly different symbol than the other rectangles. Sometimes you see them as rectangles with double bars on the sides, sometimes as a rectangle with no sides (just the top and bottom are displayed), and sometimes as two rectangles connected together, with the one on the left being the smaller of the two and the one on the right missing its right border.
Again, this symbol is just to let you know that at this part of the system the data is being stored as the output of one or more processes and is waiting to be the input for one or more processes. Some web sites will mention the idea of data being “permanently” stored in a data store, but don’t fall into that trap. It is better to think of it as data ready to go and just waiting to be let out. It may never be let out, but it is ready. You never know when you will be asked to build onto the system and that data store will become the input to a whole other subsystem elsewhere.
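If it helps to picture that waiting state, here is a tiny sketch of a data store sitting between two processes. The `DataStore` class, the cook log and the `total_cook_time` process are all hypothetical, invented only to show data waiting until something asks for it.

```python
class DataStore:
    """A hypothetical data store: it holds data in a waiting state."""

    def __init__(self, name):
        self.name = name
        self._records = []

    def write(self, record):
        # The output of some process flows into the store and waits here.
        self._records.append(record)

    def read(self):
        # Hand back a copy so the stored data keeps waiting for other readers.
        return list(self._records)


cook_log = DataStore("cook log")

# One process writes its output into the store...
cook_log.write({"seconds": 180})
cook_log.write({"seconds": 45})


# ...and a process added to the system later uses that waiting data as input.
def total_cook_time(store):
    return sum(record["seconds"] for record in store.read())


print(total_cook_time(cook_log))  # 225
```

Nothing about the store had to change when the second process showed up. The data was already sitting there, ready to go.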
So how does this make you a better programmer or system designer? Well, remember that project you didn’t know where to start on? Start at the top and work your way down (known as the top down approach). Look at that CMS and identify its parts, then look at each part and identify how just that part works. Take each one of those parts and break it down even further until you are at a point where you know you could code the component in a snap. Just don’t get too deep. Most systems never need more than 4 or 5 levels before it is clear how to program them.
Thank you for reading and I hope you learned something from this. Now Lt. Commander Data, well… he might take quite a few data flow diagrams to fully document his systems. Now we know what Doctor Noonien Soong was up to for all those years… drawing boxes and lines instead of being with the Mrs. No wonder Data kicks ass!
🙂