Hello everyone! The other day I was approached by macosxnerd101, a newly minted moderator at Dream.In.Code, and asked to contribute to a new thread coming up on Java data structures. Now most of the new programmers here know of the basic data structures like binary trees, arrays, heaps or stacks. Many of these topics I knew would be covered by other contributors to the thread. So I decided to introduce you to one that is based on our old friend the linked list, but with a new twist. It is called the SkipList (aka JumpList). We will talk a bit about what it is, how to build one in Java and what the heck they are good for. So sit right back and lets throw around some data structures on this episode of the Programming Underground!
To understand what a Skiplist is, we have to understand what a linked list is. For those of you who don’t know, a linked list is a collection of objects which are linked together by one “node” pointing to the next one in the chain. Think of it as a single file congo line. You put your hands on the hips of the person in front of you and they do likewise to the one in front of them. Starting at one end of the line, we can reach each node by following who they are holding onto.
Now as you can imagine a double linked list points in both directions. It points to the one in front of it and following it as well. This allows us to start at the beginning of the line or at the end of the line and still reach a node going either direction. To form a double linked list we simple make sure each node is pointing to the node we are adding to the list and that the one we are adding is pointing back to the one before it.
A SkipList (also known as a JumpList when the links are geometrically spaced) starts with the classic linked list. However, instead of just pointing to the node that it follows, or is in front of it, it can also point to nodes much further down the line. By having these extra pointers we can literally “skip” nodes in the line if we know they are not what we are after. If we know we need the 50th person in the congo line, why start at the beginning and move through the first 49 people? What if we asked the person at the start of the line “Can you point out who the 50th person is?” and then go straight there.
By having these extra pointers spaced out at our discretion, we can speed up lookups within a linked list. Depending on how many pointers and how we organize them, we can drastically increase operations in a linked list. Just keep in mind however that we have to take care of the pointers as we modify the list and this means writes/deletes may need to make additional operations from time to time.
In our example below we create a list of 100 nodes in our linked list. Each one of them is simply carrying an integer representation of where in the line they are. So we have nodes marked from 1 to 100. We are going to setup our nodes to include four extra pointers. Not all nodes in the chain will need to have these point to an actual object. Instead we will set them only on certain nodes. Below is a picture of how our setup works…
As you can see from the image above, our first node points to node number 2, but also will be set to point to node 25 (quarter of the way through the list) and also point to node number 50 (half way through the list). Node 25 will point to the next quarter which is also node number 50. Number 50 points to nodes 100 (next half) and also 75 (next quarter).
These new pointers are then used as “express lanes” when it comes to finding a value. If we are looking for node number 77, instead of looping through 76 nodes to get to it, we can ask “Is the first half way node still less than 77?” Yes, it is 50. So we jump all the way to node 50. Once there we ask that node “Is the next half way node still less than 77?” In this case it would be no because the next node is 100. We then ask “Ok, is the next quarter node less than 77?” Yes, it is 75. So we jump to 75. “Is the next quarter less than 77?” No, the next quarter is 100.
Now that we have no more express lanes to jump through, all we have left is our single linked list. We then go to 76 and finally reach 77. The overall process has lead to 4 iterations rather than 76 iterations if we had gone through one by one. This process can also work in reverse starting from the tail since we are also double linked through our previous pointers.
How might we accomplish this task using Java? Well… here is an example…
public class SkipListExample { private static Node Head = null; private static Node Tail = null; private static Node Current = null; // Records previous quarter or half mark private static Node quarter = null; private static Node half = null; public static void main(String[] args) { int ListCount = 100; // Setup a standard double linked list manually for (int i = 1; i <= ListCount; i++) { Node newNode = new Node(); newNode.data = i; // Assign to head node if this is the first node if (Head == null) { Head = newNode; Tail = newNode; Current = newNode; quarter = newNode; half = newNode; } else { Current.next = newNode; newNode.prev = Current; // Add a quarter pointer if this node is a multiple of 25 if ((i % 25) == 0) { newNode.prevQuarter = quarter; quarter.nextQuarter = newNode; quarter = newNode; } // Add a half pointer if this node is a multiple of 50 if ((i % 50) == 0) { newNode.prevHalf = half; half.nextHalf = newNode; half = newNode; } // Set current node to be the next in line, set the tail // and then loop back around for next node. Current = newNode; Tail = newNode; } } // Run some tests System.out.println("Find number 7... "); FindNumber(7); System.out.println("Find Number 33... "); FindNumber(33); System.out.println("Find number 67... "); FindNumber(67); System.out.println("Find number 101... "); FindNumber(101); } // Searches our skip lists to locate our number public static void FindNumber(int num) { Node currentNode = Head; boolean Found = false; while (currentNode != null) { // Simply show what nodes we have checked in our search System.out.println("Checked node with data: " + currentNode.data); // End search if value is greater than the value we are looking for. if (currentNode.data > num) { break; } // Check our various pointers to see if a jump would get us closer. if (currentNode.data != num) { if ((currentNode.nextHalf != null) && (currentNode.nextHalf.data <= num)) { currentNode = currentNode.nextHalf; } else if ((currentNode.nextQuarter != null) && (currentNode.nextQuarter.data <= num)) { currentNode = currentNode.nextQuarter; } else { currentNode = currentNode.next; } } else { Found = true; break; } } // Report our findings if (Found) { System.out.println("Number Found!"); } else { System.out.println("Number wasn't found!"); } } } // Our Node object with prev/next pointers and jump pointers class Node { public int data; public Node next = null; public Node prev = null; // Specialized skip list pointers public Node nextHalf = null; public Node prevHalf = null; public Node nextQuarter = null; public Node prevQuarter = null; }
The code above starts off by first creating a 100 node double linked list using our class “Node” (as seen towards the end). In addition to each next/prev pointer we have also added nextHalf/prevHalf and nextQuarter/prevQuarter. As we are building the list we are testing the value given to each node to see if it is a multiple of 25 (presenting a quarter of our 100 node list) and also if it is a multiple of 50 (start, half way or end of our list). At these special nodes we also set their corresponding skip list pointers. At each one of these special nodes we also save a reference it for when we reach the next “milestone”. We can use this reference to then get at it and set its next pointer while at the same time being able to set the previous pointer for the node we are currently adding. Think of that part like a bookmark of sorts.
Once we are done building the list, we have all our pointers in place and now we are ready to test them out with a test “FindNumber” function. Each time we visit a node we print out a small message so we can see how many nodes are visited during our search. The function goes through each node and takes the quickest path to its destination. It first tests the half way node and uses it if it can, otherwise it will look to the next quarter pointer and then finally the individual node pointers.
How can this sort of system break down? Well, if you misplace your pointers, perhaps don’t space them out properly or decide to add/remove nodes from the list without updating pointers then you can quickly find yourself cutting down the efficiency. For instance if you were to take our list, just the way it is, and made it 1 million nodes you would be able to look up values under 100 quite fast, but finding node 402,199 would still require a sequential lookup from node 100 onwards.
Other skipping schemes you could implement would be even or odd, marking prime numbers, square numbers, showing multiples or perhaps built in decision making. This type of data structure allows a SORTED linked list to lookup and locate nodes more quickly than a sequential process. Skip list structures can also be thought of as tree data structures but with multiple branches. Either way, if you have a lot of nodes in a sorted order and are going to be using it to find values multiple times, consider a skiplist. Depending on the data and how often it is accessed, this could even be faster than a binary search algorithm!
Hope you enjoyed this introduction to skip lists! Thanks again for reading! 🙂