Sorting and Selection by Random Sampling

Some of the earliest randomized algorithms included algorithms for sorting a set $(S)$ of numbers, and the related problem of finding the $k$ th smallest element in $S$ . The main idea behind these algorithms is the use of random sampling: a randomly chosen member of $S$ is unlikely to be one of its largest or smallest elements; rather, it is likely to be “near the middle.” Extending this intuition suggests that a random sample of elements from $S$ is likely to be spread “roughly uniformly” in $S$ . We now describe randomized algorithms for sorting and selection based on these ideas.

: Algorithm RQS:
: Input: A set of numbers .
: Output: The elements of sorted in increasing order.
1.: Choose an element y uniformly at random from : every element in has equal probability of being chosen.
2.: By comparing each element of with , determine the set of elements smaller than and the set of elements larger than .
3.: Recursively sort and . Output the sorted version of , followed by , and then the sorted version of .

Algorithm RQS is an example of a randomized algorithm-an algorithm that makes random choices during execution. It is inspired by the Quicksort algorithm. We assume that the random choice in Step 1 can be made in unit time. What can we prove about the running time of RQS?

We now analyze the expected number of comparisons in an execution of RQS. Comparisons are performed in Step 2, in which we compare a randomly chosen element to the remaining elements. For $1 \leq i \leq n$ , let $S_{(i)}$ denote the element of rank $i$ (the $i$ th smallest element) in the set $S$ . Define the random variable $X_{ij}$ to assume the value $1$ if $S_{(i)}$ and $S_{(j)}$ are compared in an execution, and the value 0 otherwise. Thus, the total number of comparisons is

$\displaystyle \sum_{i=1}^n\sum_{j>i}X_{ij}.$

(8.9)

By linearity of expectation the expected number of comparisons is

$\displaystyle E[\sum_{i=1}^n\sum_{j>1}X_{ij}] = \sum_{i=1}^n\sum_{j>1}E[X_{ij}].$

(8.10)