Nombre de comparaisons entre symboles dans QuickSort et QuickSelect

Brigitte Vallée (GREYC)

November 25, 2008

We revisit the famous QuickSort algorithm and its derivative, QuickSelect. Although the analyses of these algorithms are well-established, we argue, following Sedgewick, that their simplifying assumptions are unsatisfactory, both from the information-theoretic viewpoint and from the perspective of algorithmic engineering. Our complexity model fully takes into account the elementary comparisons between symbols that compose records to be sorted, while our probabilistic models comprise a wide category of information sources that encompass memoryless (i.e., independent-symbol), Markov, as well as many unbounded-correlation sources. Under this perspective, commonly accepted assertions, such as “the complexity of Quicksort is O ( n log n )” are to be challenged, and the relative merits of methods relying on different principles (e.g., radix-based versus comparison-based) can be precisely assessed. Our results build upon and broadly extend earlier ones of Fill, Janson, and Nakama that are specific to the model of uniform independent bits.

A novel idea, in this range of problems, is the introduction of fundamental probabilities that a word of the source begins with a given prefix. Our approach otherwise relies on methods from analytic combinatorics (Mellin transforms, complex asymptotics), as well as information theory and the theory of dynamical systems, this in a way that relates to earlier analyses of digital trees (“tries”) by Clément, Flajolet, and Vallée. We regard the present work as a first step towards revisiting the analysis of algorithms and data structures in a direction that should be both theoretically satisfactory – by developing a new kind of bit-level boolean-complexity-like average-case analysis – and practically meaningful – by taking into account “long” non- atomic records in data structures.

In collaboration with Julien Clément, Jim Allen Fill and Philippe Flajolet.