Hacked Exam

Code Jam 2021 round 1A’s problem C. “Hacked Exam”. Now this one was painful.

It was so painful it’s the primary, dare I say only reason there’s such a pause in my Code Jam postings. It was so painful I had to take a peek at the analysis. Then read it in full. Then implement it all. Then flip a table or two and give it a rest. Then come back to it a few times. Then re-write key parts of it. And again for performance. And check out a few contest solutions for inspiration. Then bite the bullet and fall back on the linear-time solution.

I’m not so good at probabilities in competitive programming.

So, fair warning, this post is mostly a re-wording of the analysis. But it’s the re-wording I finally (mostly) came to accept and understand, so it may help. And it comes with a solution in Haskell, which isn’t something you’re going to find on the website.¹

So what’s the fuss all about? We’re cheating at an exam. An exam is a bitstring of length Q. An exam attempt² is a bitstring of length Q. An attempt’s score is the number of bits it has in common with the exam. For a given exam, we’re given one to three attempts and their score, and are asked for an attempt and its expected score. Maximized.

Let’s start out with a few types.

type Exam = [Bool]
type Score = Int
type Exp = Rational

data Attempt a = Attempt
  { attemptScore :: !a
  , attemptAnswers :: [Bool]
  }

I’m making Attempt parametric so its score can either be an integer, for the input, or a rational, for the output.

Ok. The small test set is ten questions, two attempts. We can literally generate all possible 1024 exams, the “universe”, keep those who are compatible with the two attempts we have, then identify the most likely answer per question.

small :: [Attempt Score] -> Attempt Exp
small as = foldMap computeAnswer (transpose feasibleExams)
  where
    universe :: [Exam]
    universe = mapM (const [False,True]) (attemptAnswers (head as))

    feasibleExams :: [Exam]
    feasibleExams = filter (\truth -> all (isFeasible truth) as) universe

    isFeasible :: Exam -> Attempt Score -> Bool
    isFeasible truth attempt =
      evaluate truth (attemptAnswers attempt) == attemptScore attempt

    evaluate :: Exam -> Exam -> Score
    evaluate = ((genericLength . filter (uncurry (==))) .) . zip

    computeAnswer :: [Bool] -> Attempt Exp
    computeAnswer bs | 2*t >= l   = Attempt (  t/l) [True]
                     | otherwise = Attempt (1-t/l) [False]
      where t = genericLength (filter id bs)
            l = genericLength bs

The most likely answer per question is simply the one that is most frequently found in the exams.

Attempts are generated using a monoid instance defined as such:

instance Num a => Semigroup (Attempt a) where
  Attempt s1 a1 <> Attempt s2 a2 = Attempt (s1+s2) (a1++a2)
instance Num a => Monoid (Attempt a) where
  mempty = Attempt 0 []

For the medium test case, we can have up to 40 questions, so 2⁴⁰ ≈ 10¹² possible tests. Enumeration won’t complete in time.

The key observations is that even though the number of questions increased, they still form classes: there are only four possible combinations the two provided attempts could have answered a question. So all questions fall into one of four groups, and within a group they are indistinguishable, so must have the same probability of the correct answer being true or false.

That’s four unknown probabilities. But we only have two equations: the attempts’ binding to their score. The other insight is that a question’s correct value from the exam is bound to an answer pair just as indistinguishably as above. For example, a question to which both attempts answered true has the same probability of actually being true as a question to which both answered false has of being false. Conversely a question to which both answered differently has the same bias towards the first attempt being correct as one to which both also answered dirrerently, but with the opposite absolute answer. (This aspect of the problem took some time to sink in.)

Using the problem analysis’s notation, we call p a question’s probability of being true, and subdivide into p_TT its probability of being true when both attempts answered true, p_TF when the first attempt answered true and the second false, and so on. The symmetries described in the previous paragraph can then be summarized as p_TT = 1 − p_FF and p_TF = 1 − p_FT.³

To aggregate the score from attempts, I’ll whip up a little auxiliary record type:⁴

data Medium = M !Int {-^ const -} !Int {-^ TT -} !Int {-^ TF -}
instance Semigroup Medium where M a b c <> M d e f = M (a+d) (b+e) (c+f)
instance Monoid Medium where mempty = M 0 0 0

As it says on the tin, it holds a constant term, a count of TTs encountered, and a count of TFs encountered. The constant term is there so I can replace TFs with 1-FTs on the fly.

medium :: [Attempt Score] -> Attempt Exp
medium [x] = medium [x,x]
medium [Attempt s1 as1,Attempt s2 as2] = answer
  where
    (M k1 stt1 stf1,M k2 stt2 stf2,answer) = mconcat $ zipWith combine as1 as2

Our combination function is where most of the ~~magic~~tedium lies. Brace yourselves.

    combine True  True  = (M 0 1 0,M 0 1   0, if pTT >= 0.5 then Attempt pTT [True] else Attempt (1 - pTT) [False])
    combine True  False = (M 0 0 1,M 1 0 (-1),if pTF >= 0.5 then Attempt pTF [True] else Attempt (1 - pTF) [False])
    combine False True  = (M 0 0 1,M 1 0 (-1),if pTF <  0.5 then Attempt (1 - pTF) [True] else Attempt pTF [False])
    combine False False = (M 0 1 0,M 0 1   0, if pTT <  0.5 then Attempt (1 - pTT) [True] else Attempt pTT [False])

It’s verbose, but it’s just what we said: convert FFs and FTs by symmetry and sum probabilities. Since this is Haskell, we can share the traversal to also build up the attempt: we know the question type, we know its probability of being true or false, so we pick the most likely and accumulate the probability of being right to our score’s expected value.

No, wait, we don’t know its probability of being true or false yet. We still need to actually solve the equation system. I was lazy, I used Cramer’s rule, which I won’t reproduce here. We’re using rationals, we won’t run into any numerical stability issues today.

    (pTT,pTF) = cramer2 (fromIntegral stt1) (fromIntegral stf1) (fromIntegral (s1-k1))
                        (fromIntegral stt2) (fromIntegral stf2) (fromIntegral (s2-k2))

So far, so good. Now for the large test set.

The major difference is that we now have three scored attempts at our disposal. We also have up to 120 questions per test, but with the current approach that oughtn’t change much of anything. Or ought it?

Let’s start by counting the questions per type. Sorry, per half-type, aggregating symmetries.

large :: [Attempt Score] -> Attempt Exp
large [x] = large [x,x,x]
large [x,y] = large [x,x,y]
large [Attempt s1 as1,Attempt s2 as2,Attempt s3 as3] =
  let
    tally :: Vector Int
    tally = V.accum (+) (V.replicate 4 0)
            [ (2*fromEnum (a2 /= a1) + fromEnum (a3 /= a1),1)
            | a1 <- as1 | a2 <- as2 | a3 <- as3 ]

tally is a vector of length 4, whose elements are the number of questions of types FFF/TTT, FFT/TTF, FTF/TFT, FTT/TFF respectively.

We can split the population of exams along the number of correct answers for questions a given type, giving rise to C_n^k distinct sub-exams. Now the same magic as for the medium set: knowing the number n of questions of a certain type, we can constrain its k to be between 0 and n; but choosing a specific k for any question type enforces specific ks for all other question types. It’s just a different phrasing of the three scores’ formulae, except this time we’re using question counts, which are enumerable, instead of probabilities.

But the core idea is the same: the score of attempt 2 (attempt 1 would be boring) is the sum of the ks of the question types where attempts 1 and 2 answer the same, plus the complements to n of the ks of the question types where attempts 1 and 2 answer differently. Repeat for attempts 1 and 3, we get three independent equations. Here’s my function to generate the full vector of ks from the first one:

    nsFrom0 :: Int -> Maybe (Vector Int)
    nsFrom0 n0 = do
      let twoN1 = s1 + s2 - 2*n0 - tally!2 - tally!3
      guard (even twoN1)
      guard (twoN1 >= 0)
      guard (twoN1 <= 2 * tally!1)
      let twoN2 = s1 + s3 - 2*n0 - tally!1 - tally!3
      guard (even twoN2)
      guard (twoN2 >= 0)
      guard (twoN2 <= 2 * tally!2)
      let twoN3 = s1 - s2 + tally!2 + tally!3 - twoN2
      guard (even twoN3)
      guard (twoN3 >= 0)
      guard (twoN3 <= 2 * tally!3)
      pure $ V.fromList [n0, twoN1 `div` 2, twoN2 `div` 2, twoN3 `div` 2]

We have all feasible exam structures. So how do we compute their payoffs and construct a satisfactory attempt out of them?

    construct :: Vector (Vector Int) -> Attempt Exp
    construct kss =

We’ll work by question type. Given a question’s k, we know it awarded k points to attempts that put a T there and n − k to those that put an F instead. So a question’s score is the average value of those per-k awards, weighted by the structure’s frequency within the feasible exams. Let’s start easy: computing those weights.

      let
        -- | number of exams with a given structure (count of Ts per
        -- question type)
        structureCounts :: Vector Integer
        structureCounts = V.product . V.zipWith ncr tally <$> kss

Next I’ll write the function to weight a question type’s awards.

        -- | structure count -> k -> (score of T,score of F) per question type
        weightedAwards :: Integer -> Vector Int -> Vector (Sum Integer,Sum Integer)
        weightedAwards c = V.zipWith (\n k -> ( Sum (c * fromIntegral (n-k))
                                              , Sum (c * fromIntegral k) )) tally

And actually aggregate:

        -- | (score of T,score of F) per exam structure
        typeAwards :: Vector (Sum Integer,Sum Integer)
        typeAwards = V.foldl1' (V.zipWith (<>)) $
                     V.zipWith weightedAwards structureCounts kss

This enables us (1) to compute the expected score, by always picking the answer with the biggest payoff.

        -- | sum of scores over feasible exams
        typeScores = sum $ uncurry max <$> typeAwards

        -- | expected score
        score = sum typeScores % sum structureCounts

And (2) to generate an attempt bitstring. We’ll first compute the best answer per question.

        -- | best choice per answer
        typeAnswers = uncurry (<) <$> typeAwards

This one might need a bit of specifying. We’re computing a “desymmetrised” answer, so we get choose the meaning of the bit as meaning, in effect, “is the correct answer what attempt 1 answered?” Our aggregated tables are all folded such that attempt 1 consistently answered false.⁵ So given a (score of T,score of F) pair, we want to answer true if the best answer is false. Which simplifies to “true if T awards less than F”.

With that out of the way, we can generate our attempt’s bitstring by re-identifying the question types and looking up the correct choice.

        -- | fetch an answer per type and unfold symmetry
        genAnswer a1 a2 a3 =
          typeAnswers ! (2*fromEnum (a2 /= a1)+fromEnum (a3 /= a1)) == a1

        -- | the attempt's bitstring
        answers = zipWith3 genAnswer as1 as2 as3

We may now construct the much awaited Attempt record.

    ( let score :: Exp
          answers :: [Bool]
    ) in Attempt score answers

And thread it all together for a large test set solver:

( let tally :: Vector Int
      nsFrom0 :: Int -> Maybe (Vector Int)
      construct :: Vector (Vector Int) -> Attempt Exp
) in construct $ V.mapMaybe nsFrom0 $ V.enumFromTo 0 (tally!0)

Whew.

Well I’m glad this one is over. Maybe I can get back to a normal posting schedule at last. As usual, the full code is on github.

Either.↩︎
I’m deliberately avoiding the problem statement’s vocabulary, that requires distinction between answers and correct answers.↩︎
It seems more natural when summarized this way then when trying to phrase it. And that’s… dangerous.↩︎
I wish Haskell had support for lexically-scoped types.↩︎
False, not wrong. (Object, not predicative.)↩︎