Sequences of operations

For many applications, an algorithm's “input” is not just data, but a sequence of operations performed by the client. For example, a pushdown stack in which the client pushes $N$ values and then pops them all may have quite different performance characteristics from one in which the client issues an alternating sequence of $N$ push and pop operations. Our analysis has to take both situations into account (or to include a reasonable model of the sequence of operations).
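
As a purely illustrative sketch of the two usage patterns, consider a list-backed stack; the class and the two driver functions below are ours, not part of any standard interface:

\begin{verbatim}
# Illustrative sketch: a list-backed stack and two client usage
# patterns whose cost profiles an analysis must account for.

class Stack:
    def __init__(self):
        self._items = []

    def push(self, x):
        self._items.append(x)

    def pop(self):
        return self._items.pop()

def push_all_then_pop_all(n):
    """Client A: n pushes followed by n pops."""
    s = Stack()
    for i in range(n):
        s.push(i)
    for _ in range(n):
        s.pop()

def alternate_push_pop(n):
    """Client B: n rounds of a push immediately followed by a pop."""
    s = Stack()
    for i in range(n):
        s.push(i)
        s.pop()
\end{verbatim}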

A randomized algorithm flips coins during its execution to determine what to do next. When considering a randomized algorithm, we usually care about its expected worst-case performance, which is the average amount of time it takes on the worst input of a given size. This average is computed over all the possible outcomes of the coin flips during the execution of the algorithm. We may also ask for a high-probability bound, showing that the algorithm does not consume too many resources most of the time.

In studying randomized algorithms, we consider almost the same issues as for deterministic algorithms.

The main difference is that it is often easier to design a randomized algorithm, because randomness turns out to be a good substitute for cleverness, but it is harder to analyze one. So much of the work consists of developing good techniques for analyzing the often very complex random processes that arise during the execution of an algorithm. In doing so we draw on techniques already developed in probability and statistics.

Example 7.1   Suppose you have two copies of the same long binary file at two different locations: call them $F_1$ and $F_2$. Suppose a hacker has tried to tamper with $F_1$. You would like to check whether the file has been corrupted.

Solution. Obviously, this can be achieved by sending the first file to the other location and comparing the two copies directly. However, we would like to minimize the number of bits that are communicated. This can be done with random check-bits. Assuming that you have the same random number generator at both ends, you can do the following: choose a random subset $S$ of the bit positions by tossing a coin for each position, then compute the XOR of the bits of $F_i$ at the positions in $S$ and call the result $c_i$. That is, $c_i = \bigoplus_{j \in S} F_i[j]$ for $i = 1, 2$. Obviously, if $F_1$ is not corrupted (so $F_1 = F_2$) then $c_1 = c_2$.
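
A minimal sketch of one such check-bit computation is given below; the names check_bit, file_bits, and seed are ours, and the shared seed plays the role of the common random number generator at both ends:

\begin{verbatim}
import random

def check_bit(file_bits, seed):
    """XOR of the bits of the file at the positions of a random
    subset S.  Both sites call this with the same seed, so they
    toss the same coins and agree on S.  (Illustrative sketch;
    file_bits is a sequence of 0/1 values.)"""
    rng = random.Random(seed)      # shared random number generator
    c = 0
    for bit in file_bits:
        if rng.random() < 0.5:     # coin toss: put this position in S?
            c ^= bit               # XOR in the selected bit
    return c
\end{verbatim}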

Lemma 7.2   If $F_1 \neq F_2$ then $c_1 \neq c_2$ with probability $1/2$.

Proof. Consider the last position at which the files differ: call it $b$. (Positions after $b$ agree in both files, so they contribute equally to $c_1$ and $c_2$.) Let $u_1$ and $u_2$ be the partial XORs accumulated over the selected positions before $b$. If $u_1 = u_2$, then $c_1 \neq c_2$ iff $S$ includes $b$, which is a 50:50 event, since each position is included in $S$ independently with probability $1/2$. If $u_1 \neq u_2$, then $c_1 \neq c_2$ iff $S$ does not include $b$, again a 50:50 event. $\Box$

Thus, if $F_1 \neq F_2$, there is a 50:50 chance that a single random check-bit differs; this is a typical Monte Carlo algorithm. If you perform the computation 200 times with independent random subsets and every check bit matches, then the probability that the files nevertheless differ is only 1 in $2^{200}$. Now 1 in $2^{200}$ is roughly 1 in $10^{60}$, a vanishingly small chance. For all practical purposes, we can safely say that the algorithm gives the correct answer. $\Box$
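
For illustration, here is a hypothetical wrapper that repeats the check; in this sketch both check bits are computed locally, whereas in the real protocol each site computes its own bit and only the check bits are exchanged:

\begin{verbatim}
def files_probably_equal(f1_bits, f2_bits, trials=200):
    """Monte Carlo equality test built on check_bit() above.
    If the files differ, each trial misses the difference with
    probability 1/2, so all trials agree with probability at
    most 2**(-trials).  The seeds would really come from a
    shared random source; range(trials) is just a placeholder."""
    for seed in range(trials):
        if check_bit(f1_bits, seed) != check_bit(f2_bits, seed):
            return False    # a mismatching check bit: files differ
    return True             # all check bits matched
\end{verbatim}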

This greatly reduces the communication complexity, since otherwise all $n$ bits of the file would have to be sent to the other side.

Simplicity: This is the first and foremost reason for using randomized algorithms. There are numerous examples where a simple randomized algorithm can match (or even surpass) the performance of a deterministic algorithm.

Example 7.3   The Merge-sort algorithm is asymptotically the best deterministic sorting algorithm, and it is not too hard to describe. However, the same asymptotic running time is achieved (in expectation) by the even simpler randomized QuickSort algorithm: pick a random element as the pivot, partition the remaining elements into those smaller than the pivot and those bigger than the pivot, and recurse on these two partitions. $\Box$
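
A sketch of randomized QuickSort along these lines (returning a new sorted list rather than sorting in place, for brevity):

\begin{verbatim}
import random

def quicksort(a):
    """Randomized QuickSort: pick a random pivot, partition the
    remaining elements, and recurse on the two parts."""
    if len(a) <= 1:
        return list(a)
    pivot = random.choice(a)                     # random pivot
    smaller = [x for x in a if x < pivot]
    equal   = [x for x in a if x == pivot]
    bigger  = [x for x in a if x > pivot]
    return quicksort(smaller) + equal + quicksort(bigger)
\end{verbatim}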

Speed: For some problems, the best known randomized algorithms are faster than the best known deterministic algorithms. This is achieved by requiring only that the correct answer be found with high probability, or that the algorithm run in expected polynomial time. This means that the randomized algorithm may return an incorrect answer in some cases, or may occasionally take a very long time.

Example 7.4   Checking whether a multivariate polynomial is the zero polynomial. There is no known deterministic polynomial-time algorithm that decides whether a given multivariate polynomial is identically zero. On the other hand, there is a very simple and efficient randomized algorithm for this problem: just evaluate the given polynomial at random points. Note that such a polynomial could be represented implicitly, e.g., as the determinant of a matrix whose entries contain different variables (see the figure below).

Figure: Polynomial curve.
\includegraphics[scale=0.35]{ranvar}
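
A sketch of the random-evaluation test, treating the polynomial as a black box; the function name, the sample-set size, and the number of trials are our choices, and the correctness claim in the comment rests on the Schwartz-Zippel lemma (a nonzero polynomial of total degree $d$ vanishes at a random point of $S^m$ with probability at most $d/|S|$):

\begin{verbatim}
import random

def is_probably_zero(poly, num_vars, degree_bound, trials=20):
    """Randomized zero test for a polynomial given as a black box.
    `poly` takes num_vars integer arguments and returns the value
    of the polynomial (it might, e.g., compute the determinant of
    a matrix whose entries are those variables).  A nonzero
    polynomial of total degree <= degree_bound vanishes at a random
    point of S^num_vars with probability at most degree_bound/|S|,
    so repeated trials make a false "zero" answer very unlikely."""
    sample_size = 100 * max(degree_bound, 1)   # the set S = {0, ..., |S|-1}
    for _ in range(trials):
        point = [random.randrange(sample_size) for _ in range(num_vars)]
        if poly(*point) != 0:
            return False        # nonzero value found: definitely not zero
    return True                 # zero at every sampled point: probably zero
\end{verbatim}

For instance, is_probably_zero(lambda x, y: (x + y)**2 - x*x - 2*x*y - y*y, 2, 2) returns True, since that expression is identically zero.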

It is important to note that a randomized algorithm is not the same thing as probabilistic analysis. In probabilistic analysis, (deterministic) algorithms are analyzed assuming a random input; e.g., in sorting, one might analyze deterministic Quicksort (with a fixed pivot) on an input that is a random permutation of $n$ elements.

On the other hand, randomized algorithms use random coin flips during their execution, which amounts to picking a deterministic algorithm at random from a family of algorithms. Moreover, randomized algorithms give guarantees that hold for worst-case inputs.