Quick-Sort Schemes

Any student of algorithms knows that there are plenty of algorithms for sorting an array. Many algorithms have their use cases: some are fast, some are easy to implement, and some achieve other qualities such as stability. Even when we talk about speed, there might be nuances to this quality: an algorithm might be fast in sorting an array in general does not mean it is fast in sorting an array with many duplicate values. This all explains the wealth of algorithms that simply sort.

But Quick Sort, or qsort for short, is a unique algorithm. Qsort proved to be a well-suited algorithm for general-purpose sorting. Nowadays, many implementations of sorting in programming language standard libraries use qsort for sorting in some fashion. The C++ standard library in Clang and the standard library in go, for example, both use qsort for sorting an array.

There has been many different implementations of qsort. One way of implementing qsort is to provide a cutoff value \(c\) such that if the array size to be sorted in the recursive routine is less than \(c\), then a different technique is used such as insertion sort or shell sort (go uses shell sort for arrays smaller than 12). Another way of implementation qsort is using 3-way-qsort, which after selecting the pivot, the array is divided 3-ways instead of 2: smaller than the pivot, equal to the pivot, and greater than the pivot. 3-way-qsort performs better when then array has many duplicates. But when I am interested in here is the classical qsort, but different approaches in selecting the pivot and partitioning: Lomuto scheme and Hoare scheme.

Lomuto scheme is the simplest: select the last element (or the first element) and that should be the pivot. Hoare scheme is slightly more complicated: the algorithm runs in both directions of the array, and swaps the elements from the right and from the left if they are on the opposite side of the pivot, until the two points meet.

I wrote the implementations in github. The test ran 100 times for arrays of size 1000, and 100 times for arrays of size 10000. These arrays are randomly generated (I expect no patterns). The expectation is that hoare will be faster on average, because that’s what the theory says.o

But to my surprised, Hoare’s is much faster than Lomuto. The result is posted here, and from what I see is that Hoare takes about 8 percent of the time Lomuto takes for 1000 arrays, and 20 percent for 10000 arrays. This is surprising because I thought initially that the difference would not be much. However, I suspect that the difference will diminish as the size of the array becomes larger, since both implementations are still \(O(n \log n)\).

It is important to note that both implementations are slow when the array is sorted or almost sorted. Other implementations as mentioned above might do better on average and in sorted or semi-sorted arrays than these two implementations. Unless you have special cases, use the standard library sorting algorithms because they are proven to work effectively for the general case. Below is the golang implementation. Golang uses shell sort for small slices. For the pivot, if the slice is small, it calculates the ninther median, otherwise it uses Hoare’s scheme. It also makes some consideration for duplicates in the array.

func quickSort(data Interface, a, b, maxDepth int) {
	for b-a > 12 { // Use ShellSort for slices <= 12 elements
		if maxDepth == 0 {
			heapSort(data, a, b)
		mlo, mhi := doPivot(data, a, b)
		// Avoiding recursion on the larger subproblem guarantees
		// a stack depth of at most lg(b-a).
		if mlo-a < b-mhi {
			quickSort(data, a, mlo, maxDepth)
			a = mhi // i.e., quickSort(data, mhi, b)
		} else {
			quickSort(data, mhi, b, maxDepth)
			b = mlo // i.e., quickSort(data, a, mlo)
	if b-a > 1 {
		// Do ShellSort pass with gap 6
		// It could be written in this simplified form cause b-a <= 12
		for i := a + 6; i < b; i++ {
			if data.Less(i, i-6) {
				data.Swap(i, i-6)
		insertionSort(data, a, b)

func doPivot(data Interface, lo, hi int) (midlo, midhi int) {
	m := int(uint(lo+hi) >> 1) // Written like this to avoid integer overflow.
	if hi-lo > 40 {
		// Tukey's ``Ninther,'' median of three medians of three.
		s := (hi - lo) / 8
		medianOfThree(data, lo, lo+s, lo+2*s)
		medianOfThree(data, m, m-s, m+s)
		medianOfThree(data, hi-1, hi-1-s, hi-1-2*s)
	medianOfThree(data, lo, m, hi-1)

	// Invariants are:
	//	data[lo] = pivot (set up by ChoosePivot)
	//	data[lo < i < a] < pivot
	//	data[a <= i < b] <= pivot
	//	data[b <= i < c] unexamined
	//	data[c <= i < hi-1] > pivot
	//	data[hi-1] >= pivot
	pivot := lo
	a, c := lo+1, hi-1

	for ; a < c && data.Less(a, pivot); a++ {
	b := a
	for {
		for ; b < c && !data.Less(pivot, b); b++ { // data[b] <= pivot
		for ; b < c && data.Less(pivot, c-1); c-- { // data[c-1] > pivot
		if b >= c {
		// data[b] > pivot; data[c-1] <= pivot
		data.Swap(b, c-1)
	// If hi-c<3 then there are duplicates (by property of median of nine).
	// Let be a bit more conservative, and set border to 5.
	protect := hi-c < 5
	if !protect && hi-c < (hi-lo)/4 {
		// Lets test some points for equality to pivot
		dups := 0
		if !data.Less(pivot, hi-1) { // data[hi-1] = pivot
			data.Swap(c, hi-1)
		if !data.Less(b-1, pivot) { // data[b-1] = pivot
		// m-lo = (hi-lo)/2 > 6
		// b-lo > (hi-lo)*3/4-1 > 8
		// ==> m < b ==> data[m] <= pivot
		if !data.Less(m, pivot) { // data[m] = pivot
			data.Swap(m, b-1)
		// if at least 2 points are equal to pivot, assume skewed distribution
		protect = dups > 1
	if protect {
		// Protect against a lot of duplicates
		// Add invariant:
		//	data[a <= i < b] unexamined
		//	data[b <= i < c] = pivot
		for {
			for ; a < b && !data.Less(b-1, pivot); b-- { // data[b] == pivot
			for ; a < b && data.Less(a, pivot); a++ { // data[a] < pivot
			if a >= b {
			// data[a] == pivot; data[b-1] < pivot
			data.Swap(a, b-1)
	// Swap pivot into middle
	data.Swap(pivot, b-1)
	return b - 1, c