A general framework for the realistic analysis of sorting and searching algorithms. Application for some popular algorithms

Thu Hien Ngyuen Thi (GREYC, Caen)

March 26, 2013

As there are some algorithms which deal with keys (viewed as a whole structure) and some others which deal with symbols, it is not easy to compare their efficiency. Furthermore, since in most real-life situations a key is complex and cannot be reduced to a single machine word, it is not realistic to consider the comparison between two keys as a unitary comparison. That is why Sedgewick proposed in 1998 to unify the analyses of basic sorting and searching algorithms by studying on the average the number of symbol comparisons between keys (seen as words). First results are due to Fill and Janson, Fill and Nakama who analyzed QuickSort and QuickSelect, but only on data composed of random uniform bits. Then, Clément, Fill, Flajolet and Vallée extend the previous analyses when data are words emitted by a general source.

In the present work, we draw a general framework for such realistic analyses: we provide an “automatic” method which takes as an input the probability of comparing two random words for a given algorithm and computes the exact formula for the average number of symbol comparisons. This method is applied to three popular algorithms: Insertion Sort, Bubble Sort and Minimum Selection. We also recover the results of QuickSort and QuickSelect. The results are the following: with respect to key comparisons, the average-case complexity of QuickSort is asymptotic to 2 n log n , Insertion Sort to 2 n , BubbleSort to n 2/2, QuickMin to 2 n , Selection Min to n -1. With respect to the number of symbol comparisons, the complexity becomes of respective order Θ( n log 2 n ), Θ( n 2), Θ( n 2 log n ) for sorting algorithms and remains Θ( n ) for Selection algorithms. We further discuss the change in the complexity measure of the algorithms, from the number of key comparisons to the number of symbol comparisons and also the different constants arising in the analysis reflecting the mixing between algorithms and the way data are produced.

Joint work with Julien Clément and Brigitte Vallée.