Ever hear the term “Kicking the bucket”? If not, it is usually used here in North America to mean someone died. Well rest assured I am not going to be whipping out expressions like “I see dead programmerssss” or anything in this post. Ok, maybe I just did, but the idea behind BucketSort (aka Bin sort) is just that, creating buckets of numbers which we attack with our choice of sorting algorithm (even Bucketsort itself if you wanted). I will show you one way of using bucketsort in tandem with a previous entry I made about quicksort and sort an integer array all in the lovely lovely language of C#.. on this entry of the Programmer’s Underground!
Imagine if you will a list of integers in an array. Random integers scattered about like Kramer’s mind on Seinfeld. We would like to sort them in a nice neat little fashion, maybe to show our friends or boss how elite we are. To tell other programmers about a sort they probably haven’t dealt with and why you kick ass because you know of DIC and they don’t. Perhaps both. Now take these numbers and imagine creating a series of buckets. You then take each number out of the array and throw it in the appropriate bucket. Now the size of your bucket will determine how many buckets you will need to store all the values. If you have a bucket that can only hold 1 value, in an array of 100 numbers, you are going to have 100 buckets. But lets say you have a bucket that can hold numbers in groups of ten. Maybe one bucket will be for values 1 to 10, the other for 11 to 20, 21 to 30 etc. For the same 100 numbers you will have 10 buckets.
So you go through the array and place the integer into the appropriate bucket based on its value. So you might have 20 numbers in the 1 to 10 bucket and none in the 21 to 30 bucket. Empty buckets are fine.
After you have taken all 100 numbers and categorized them in one of our ten buckets, we then go bucket by bucket and sort the numbers in that bucket. How we sort them will be up to you. You can wire up any sorting algorithm you want. In our demonstration below we use quicksort, but you could easily use bucketsort again. Taking all the numbers in that bucket and again break them down into smaller buckets. Eventually you will get it down to a set of buckets with a number in each. After sorting a bucket, you move on to the next bucket.
Lastly, with all your buckets now sorted, you just have to go bucket by bucketing throwing out the sorted numbers to an output array (or in our case a listbox) and they will be in sorted order. Voila! Sorted numbers!
How about a little demonstration? I knew you wanted one. 😉
Lets say we have an integer array of 10 numbers that range from 1 to 100.
7, 11, 4, 5, 22, 73, 88, 39, 38, 1
We would first get the range, 1 to 100 and create our buckets based on this range. You can make ten buckets or 2 buckets. Changing the bucket size will effect the efficiency of this algorithm, so if you have a large range, use larger buckets! Here we will use 10 buckets, each which cover groups of ten.
Going to the first element, we see 7. Ok, throw that in the 1 – 10 bucket. Next is 11, throw that into the 11 – 20 bucket, both 4 and 5 will also go in the 1 – 10 bucket, 22 goes in the 21 – 30 bucket, 73 in the 71 – 80 bucket etc etc. Notice something? Buckets 41 – 50, 51 – 60, 61 – 70, 90 – 100 buckets are all empty. That is perfectly OK!
The first bucket 1 – 10 has the values 7, 4, 5, and 1 in it. We sort this bucket using any sort algorithm you find necessary. Afterwards the bucket has 1, 4, 5, and 7. Do the same with the other buckets. At the end we just go to the first bucket and print out 1, 4, 5, 7 and the second bucket, 11, and the third bucket 22, next bucket 38, 39…on to the last bucket where you would have 88.
Now lets see this in C#!
// Button event to setup an array of numbers and sort them with a modified bucketSort private void btnSort_Click(object sender, EventArgs e) { int[] testNumbers = { 1, 5, 7, 2, 74, 23, 55, 84, 32, 11, 23, 35, 46, 59, 61, 77, 99, 90, 85 }; bucketSort(testNumbers, 0, 100); } // Slight variation on bucketSort algorithm private void bucketSort(int[] arValues, int min, int max) { int interval = 10; int bucketnum = (max - min) / interval; // Setup our buckets according to interval across the range ArrayList[] buckets = new ArrayList[bucketnum]; for (int i = 0; i < bucketnum; i++) { buckets[i] = new ArrayList(); } // Throw values into the appropriate bucket. for (int i = 0; i < arValues.Length; i++) { buckets[arValues[i] / interval].Add(arValues[i]); } // Sort each bucket for (int i = 0; i < bucketnum; i++) { // This is where we would wire in our choice of sort. // Here we are converting our Arraylist to Int32 and passing it to // quicksort() int[] currentbucket = (int[]) buckets[i].ToArray(typeof(int)); quicksort(currentbucket, 0, currentbucket.Length - 1); for (int j = 0; j < currentbucket.Length; j++) { listValues.Items.Add(currentbucket[j]); } } } // Quicksort controller function, it partitions the different pieces of our array. private void quicksort(int[] arIntegers, int left, int right) { if (right > left) { int pivotIndex = left; int pivotNewIndex = partition(arIntegers, left, right, pivotIndex); // Recursive call to quicksort to sort each half. quicksort(arIntegers, left, pivotNewIndex - 1); quicksort(arIntegers, pivotNewIndex + 1, right); } } // This function takes an array (or one half an array) and sorts it. // It then returns a new pivot index number back to quicksort. private int partition(int[] arIntegers, int left, int right, int pivot) { int pivotValue = arIntegers[pivot]; // Swap it out all the way to the end of the array // So we know where it always is. swap(ref arIntegers[pivot], ref arIntegers[right]); int storeIndex = left; // Move through the array from start to finish comparing each to our // pivot value (not index, the value that was located at the pivot index) for (int i = left; i < right; i++) { if (arIntegers[i] <= pivotValue) { swap(ref arIntegers[i], ref arIntegers[storeIndex]); storeIndex++; } } swap(ref arIntegers[storeIndex], ref arIntegers[right]); return storeIndex; } // Simple swap function for our in place swapping. private void swap(ref int val1, ref int val2) { int temp = val1; val1 = val2; val2 = temp; }
If you step through the code you will see the classic quicksort from my previous article (unchanged) and how I tied it directly into the bucketSort setup. For demonstration purposes and easy use, I implemented an ArrayList from C# to handle the adding of elements. I could have used the ArrayList.Sort setup to sort the buckets if I so chose, but I wanted to demonstrate how we could hookup just about any sort into this setup by “plugging it in” when sorting the bucket. To help facilitate the use of the ArrayList with my quicksort, which took a standard integer array, I had to convert the ArrayList to an integer array and cast it.
The actual call to bucketSort takes place in a button where I have setup a demonstration array of integers and pass that array to the bucketSort along with the min and maximum range of numbers. NOT the size of the array, but the range of values those numbers are in. So to use this effectively you will have to know a bit about the range of values in your array prior to this sort. You could simply find the minimum and maximum values of the array using a loop (examples can be found all over DIC forums).
The swap and partition functions are part of the quicksort algorithm I setup.
I encourage you to read my blog entry on quicksort found at The Coders Lexicon – Quicksort Definitive (C++,C#,VB.NET, Java, PHP) if you haven’t already and then step through the code here to see how my variation of the BucketSort works with it. You could essentially use any sort (bubblesort, mergesort, heapsort etc) and give the sort the ability to divide and conquer.
I eluded to the idea that bucket size can effect efficiency on this and that is true. The bigger the buckets the more processing power your sort will have to exert on each bucket (it is dealing with more numbers and increases your computational complexity). It is good to strike a balance and set up your ranges based on a nice interval and depending on the type of sort you are using in tandem. Don’t create huge buckets of 1000 numbers and then go with a bubblesort on it or you might be slowing yourself down.
Under certain situations this algorithm may only run at linear time O(n). So if you are looking for some serious processing power, you might want to play with the type of sort you use in combination with this bucketSort approach.
I hope you had fun diving into the world of the bucketSort and if you ever find yourself lonely, nothing on TV, maybe even feeling too old to be much good to anyone, drop on by Dream.In.Code and we can certainly help you get back onto the road of looking slick at your next party with programming trivia!
Thanks for reading. 🙂