Sorting Algorithms: Quicksort

    Sorting Algorithms: Quicksort

      All the other algorithms we've talked about have a clear connection between their name and what they do. Quick Sort isn't like that. At all.

      Okay, maybe a little. At its best, it can run as fast as Merge Sort's O(nlogn). At its worst? It's only a little better than Bubble Sort's O(n2).

      Like Merge Sort, Quick Sort divides and conquers—meaning it uses recursion—but programmers tend to choose Quick Sort because it runs faster on average.

      We're getting ahead of ourselves, though. Let's talk about how it runs. Quicksort first picks a "pivot" value and compares every other value to the pivot, placing the value to the left of the pivot if it's smaller and to the right of the pivot if it's larger.

      Then it goes into the left side and picks a pivot to sort this half, working recursively until it's sorted each value on the left. After the left side's done, it recursively sorts the right side.

      Let's raid the songs off of Shmoop's modern pan flute playlist for an example. Here are the songs:

      • "Expensive Tuxedo"
      • "It's Only Shmoop"
      • "Motion Safari"
      • "Day Glo Battle"
      • "Attack of the Renegade"
       

      First we pick a pivot. "Motion Safari" is in the middle, so let's go with that one.

       

      Now we compare every value to "Motion Safari." If it comes first alphabetically, it moves to the left. If it comes after "Motion Safari" alphabetically, it shifts to the right. The end result will look like this:

       

      Hmm, maybe that wasn't the best choice of pivots: comes before of "Motion Safari" alphabetically. Even so, since we've compared everything to "Motion Safari," we know it's in the right place. Let's pick a pivot from the middle of the left side (compared to "Motion Safari"): "Day Glo Battle."

       

      After comparing it to everything on the left of "Motion Safari," we get something like this:

       

      So we know exactly where "Day Glo Battle" goes:

       

      Notice how "It's Only Shmoop" now comes before "Expensixe Tuxedo?" That's because Quick Sort's unstable (even though there are stable versions) meaning that it can't be bothered to remember how things were sorted in the last round.

      Since there's only one value to the left of "Day Glow Battle," we know the left must be sorted, so we pick a pivot from the right: "It's Only Shmoop." After comparing the two songs to the right of "Day Glo Battle" and the left of "Motion Safari," we've got a sorted list.

       

      Log25 +1 pivots later, and it's sorted.

      Boom.

      The first pivot showed exactly what's dangerous about Quick Sort: it's only quick quick if the pivot values belong near the middle of the list. In the worst case scenario, where we continually pick the largest and smallest values as pivots, we have to compare every value to every other value.

      That's not good.

      If we had picked "Expensive Tuxedo" (which happens to be the middle of the sorted list) in the first place, our sort would've gone more like this.

       

      Sorted:

       

      Then we place "Expensive Tuxedo," and pick a pivot to the left:

       

      Since we know everything to the left of "Expensive Tuxedo's larger than "Day Glo Battle," we only need to compare it to "Attack of the Renegade," and after that the left side's sorted.

       

      We do the same to the right:

       

      And, finding that it's less than "Motion Safari," we have our sorted list.

       

      Still log25 pivots, but it cut out one step, so that's something…it's a lot more impressive when we're dealing with huge lists, okay?

      When we pick good pivots, the run time's nlogn, but it does have the potential to run in n2 time, which is a little dangerous. That's why this sort has a simple way to pick pretty good pivots;

      1. Pick three things (whether numbers or tasteful pan flute music) from the list.
      2. Find the middle value from the three things. 
      3. That's it. That's your pivot.
      4. …?
      5. Take a well-deserved pizza break.

      The worst case for this optimization is that all three will be either the greatest or the smallest values in the list, meaning that our middle pivot will be the second to largest or the second to smallest value. It's possible, but that probability's a lot lower than randomly picking the largest or smallest value.

      If we blindly pick a pivot, the probability that we'll pick the largest or smallest value will be

      (1 / n) + (1 / n) = 2 / n

      Let's set n to our example (5) and a larger number (100):

      n = 5

      (1 / 5) + (1 / 5) = 2 / 5

      n = 100

      (1 / 100) + (1 / 100) = 2 / 100
      = 1 / 50

      If we pick three pivots, though, the probability's:

      n = 5

      (2 / 5) * (2 / 3) * (2 / 1) = 8 / 15

      n = 100

      (2 / 100) * (2 / 98) * (2 / 96) = 1 / 117600

      Picking the pivot's "worse" for n = 5 but better for n = 100. But keep in mind: since we're picking three of five values, getting the second highest value is one away from the middle, so it doesn't really matter until we hit those large values.

      Now that it's sorted, we can go back to listening to our playlist.