# Probability Questions

Let $0 < \alpha < .5$ be some constant (independent of the input array length $n$). Recall the Partition subroutine employed by the QuickSort algorithm, as explained in lecture. What is the probability that, with a randomly chosen pivot element, the Partition subroutine produces a split in which the size of the smaller of the two subarrays is $\geq \alpha$ times the size of the original array?

• $1 - 2 * \alpha$
• $\alpha$
• $1 - \alpha$
• $2 - 2 * \alpha$

ANSWER: If the picked pivot is one of the smallest $n\alpha$ elements in the array, then the left partition’s size will be less than $n\alpha$. Similarly, if the picked pivot is one of the largest $n\alpha$ elements in the array, the right partition’s size will be less than $n\alpha$. Hence, there are $n - 2 * n\alpha$ elements that we can pick such that both partitions have size greater than or equal to $n\alpha$. Since the algorithm picks a pivot randomly, all elements have an equal probability of getting picked as the pivot, i.e. $\frac{1}{n}$. Therefore, $Pr = \frac{1}{n}(n - 2n\alpha) = 1 - 2\alpha$. Option 1 is correct.

Now assume that you achieve the approximately balanced splits above in every recursive call — that is, assume that whenever a recursive call is given an array of length $k$, then each of its two recursive calls is passed a subarray with length between $k\alpha$ and $k(1 - \alpha)$ (where $\alpha$ is a fixed constant strictly between $0$ and $.5$). How many recursive calls can occur before you hit the base case? Equivalently, which levels of the recursion tree can contain leaves? Express your answer as a range of possible numbers $d$, from the minimum to the maximum number of recursive calls that might be needed.

• $-\frac{\log n}{\log \alpha} \leq d \leq -\frac{\log n}{\log(1 - \alpha)}$
• $0 \leq d \leq -\frac{\log n}{\log \alpha}$
• $-\frac{\log n}{\log(1 - \alpha)} \leq d \leq -\frac{\log n}{\log \alpha}$
• $-\frac{\log n}{\log(1 - 2\alpha)} \leq d \leq -\frac{\log n}{\log(1 - \alpha)}$

ANSWER: If $\alpha < \frac{1}{2}$, then $1 - \alpha > \alpha$, thus the branch with subarray length $n(1 - \alpha)$ will have greater recursion depth.

If $d$ is the height of the recursion tree, $n(1 - \alpha)^d = 1$
Taking $log_{1 - \alpha}$ on both sides $log_{1 - \alpha} n= -d$
By log rule $d = -\frac{\log n}{\log(1 - \alpha)}$

The best case is $-\frac{\log n}{\log \alpha}$. Since both $\alpha$ and $(1 - \alpha)$ are less than $1$, so, their logarithms are negative. Adding one more negative symbol on that results in an overall positive value for minimum and maximum depth. Therefore, option 1 is correct.

Define the recursion depth of QuickSort to be the maximum number of successive recursive calls before it hits the base case — equivalently, the number of the last level of the corresponding recursion tree. Note that the recursion depth is a random variable, which depends on which pivots get chosen. What is the minimum-possible and maximum-possible recursion depth of QuickSort, respectively?

• $Minimum: \Theta(\log n); Maximum: \Theta(n)$
• $Minimum: \Theta(\log n); Maximum: \Theta(n\log n)$
• $Minimum: \Theta(1); Maximum: \Theta(n)$
• $Minimum: \Theta(\sqrt{n}); Maximum: \Theta(n)$

Consider a group of k people. Assume that each person’s birthday is drawn uniformly at random from the $365$ possibilities. (And ignore leap years.) What is the smallest value of k such that the expected number of pairs of distinct people with the same birthday is at least one?

[Hint: define an indicator random variable for each ordered pair of people. Use linearity of expectation.]

• $27$
• $28$
• $366$
• $20$
• $23$

ANSWER: For each pair of $k$ people in the room, let $X$ be an indicator random variable such that: $X = \begin{cases} 1 & \text{if the pair has the same birthday} \\ 0 & \text{otherwise} \end{cases}$

$E[X] = \sum_{i=1}^{k-1}\sum_{j=i+1}^{k} X_{ij}Pr[\text{i and j have the same birthday}]$ $Pr[\text{i and j have unique birthdays}] = 365/365 * 364/365$ ($i$ may have been born on any of the $365$ days, and $j$ on any of the remaining $364$ days). \therefore Pr[\text{i and j have the same birthday}] = 1 - \frac{364}{365} = \frac{1}{365} \\ \begin{equation*} \begin{aligned} E[X] & = \frac{1}{365} * \sum_{i=1}^{k-1}\sum_{j=i+1}^{k} X_{ij} \\ & = \frac{1}{365} * \sum_{i=1}^{k-1} (k - i - 1 + 1) \\ & = \frac{1}{365} * \sum_{i=1}^{k-1} (k - i) \\ & = \frac{1}{365} * (\sum_{i=1}^{k-1} k - \mathop{\sum_{i=1}^{k-1}} i) \\ & = \frac{1}{365} * (k(k - 1) - (1 + 2 +...+ k - 1)) \\ & = \frac{1}{365} * (k(k - 1) - \frac{k(k - 1)}{2}) \\ & = \frac{k(k - 1)}{(365 * 2)} \end{aligned} \end{equation*}

If $E[X] = 1, k * (k - 1) = 365 * 2$
Rewriting as a general quadratic equation $ax^2 + bx + c = 0$, for which $x$ is given by $\frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$
k^2 -k -730 = 0, a = 1, b = -1, c= -730 \\ \begin{equation*} \begin{aligned} k & = 1 \pm \frac{\sqrt{1 + 2920}}{2} \\ & = 1 \pm \frac{54.05}{2} \\ & = 1 \pm 27.02 \end{aligned} \end{equation*}

Since the number of people cannot be negative, $k \approx 28$. Option 2 is correct.

Let $X{1}, X{2}, X{3}$ denote the outcomes of three rolls of a six-sided die. (i.e., each $X{i}$ is uniformly distributed among $1,2,3,4,5,6$, and by assumption they are independent.) Let $Y$ denote the product of $X{1} \text{ and } X{2}$ and $Z$ the product of $X{1} \text{ and } X{3}$. Which of the following statements is correct?

• Y and Z are independent, but $E[YZ] \neq E[Y]*E[Z]$
• Y and Z are not independent, but $E[YZ] = E[Y]*E[Z]$
• Y and Z are not independent, but $E[YZ] \neq E[Y]*E[Z]$
• Y and Z are independent, but $E[YZ] \neq E[Y]*E[Z]$

ANSWER: For $Y$ and $Z$ to be independent, by definition, the two events $[Y = x1]$ and $[Z = x2]$ must be independent $\forall \text{ x1, x2}$. If we choose $x1 = x2 = 1$, then $Pr[Y = 1 \text{ AND } Z = 1]$ is the probability that the outcome of all three rolls was one. Since each roll is independent, this is equal to ${\frac{1}{6}}^3$. $Pr[Y = 1]Pr[Z = 1] = \frac{1}{36} * \frac{1}{36} = {\frac{1}{6}}^4$

Clearly, $Pr[Y = 1 \text{ AND } Z = 1] \neq Pr[Y = 1] * Pr[Z = 1]$, therefore, $Y$ and $Z$ are not independent.

Covariance reference: Expectations on the product of two dependent random variables

\begin{equation*} \begin{aligned} Cov[X,Y] &= E[(X - E[X]) \cdot (Y − E[Y])] \\ &= E[X \cdot Y] - E[X \cdot E[Y]] - E[Y \cdot E[X]] + E[E[X] \cdot E[Y]] \\ &= E[X \cdot Y] - E[X] \cdot E[Y] \\ \therefore E[XY] &= Cov[X,Y] + E[X]E[Y] \\ E[YZ] &= Cov[Y,Z] + E[Y]E[Z] \text{ (*)} \\ E[Y] &= \frac{1}{36} * (1 * (1 + 2 +... + 6) + 2 * (1 + 2 + ... + 6) \\ & + ... + 6 * (1 + 2 + ... + 6)) \\ &= \frac{1}{36} * (1 + 2 + ... + 6)^2 \\ &= \frac{1}{36} * 21^2 \\ &= 12.25 \end{aligned} \end{equation*}

Clearly, $E[Z]$ is the same.

$Cov(Y, Z) = \sum_{i=1}^{6}(y_{i} - E(Y))(z_{i} - E(Z))$
$y{i} \in \{1*1, 1*2,...,6*1,...,6*6\}$ and clearly $\neq E(Y)$. Similarly, $z_{i} \neq E(Z)$. Therefore, $Cov(Y, Z) \neq 0$, and from expression $(*), E[YZ] \neq E[Y]E[Z]$. Option 3 is correct.