Cheating Detection

The final problem in this year’s qualifier, Cheating Detection, is really more of a statistics than a programming puzzle. It’s also the one I didn’t have time to check out during the contest.¹ Nevertheless, let’s tackle it with an open mind.

One hundred contestants answer the same ten thousand questions. They get each one either right or wrong. So we can visualize the entire input set quite easily with a simple two-dimensional bitmap.

It’s apparent that there is some form of structure here: there is a grid of horizontal and vertical patterns of sorts.² This is explained by a careful reading of the statement: each contestant as well as each question has a specific skill rating—a real number between -3 and 3. The probability of a contestant answering a question correctly is given by the sigmoid function applied to the difference of the contestant’s skill to the question’s.

So vertical black lines represent hard questions; horizontal black lines represent weak contestants. You can figure out what the white lines mean.

We can estimate how the questions rate with regard to each other by simply tallying their success rate over the contestant set and sorting them. Conversely, we can estimate how skilled the contestants are by tallying their success rate over the question set and sorting them.

The easy questions are to the left: they were answered successfully most of the time. The better contestants are to the bottom: they answered successfully most of the time. There’s still a random aspect: the border between black and white isn’t too clear.

Now a cheater happens to lurk in this data. But the cheater is devious: to throw the jury off scent, they didn’t cheat for all of the questions, merely for about half of them, selected at random by an independent coin flip per question. Can you spot the cheater in the above data?

Let’s zoom a bit to have distinguishable contestants.

[Noisy diagonal white/black split square with an anomaly]

There’s a crisp line around the 70% mark. An anomaly. That’s our cheater. Still, their line has more white to it than the line before, and less than the line after: that’s why it ended up there in the first place. But it definitely doesn’t appear “in place”.

Applying a simple edge detection convolution³ makes it much more apparent.

[Black square with a broad noisy gray diagonal, and a horizontal line with a few very distinctly bright pixels]

OK, let’s implement.

I’ll read and store the contest statistics as a contestant-indexed Vector of question-indexed Vectors of successes represented as a boolean stored in an Int.

readData :: IO (Vector (Vector Int))
readData = V.replicateM 100 $ V.fromList . map digitToInt <$> getLine

Now to identify the cheater. I’ll first rank the players by their success rate.

idCheater :: Vector (Vector Int) -> Int
idCheater fs = cheater
  where
    pEase = V.sum <$> fs
    pIndex = V.fromList $ map fst $ sortOn snd $ V.toList $ V.indexed pEase

Then rank the questions by their success rate.

    qEase = V.foldl1' vAdd fs
    qIndex = V.fromList $ map fst $ sortOn snd $ V.toList $ V.indexed qEase

Note I didn’t bother to reverse the sort; we’re getting easier first as opposed to the pictures above. It doesn’t change a thing for the rest of the process, so I’m aiming for simplicity.⁴

We reorder the data to (virtually) get the rectangular diagonal split.

    ranked = for pIndex $ \i ->
             for qIndex $ \j ->
             fs ! i ! j

Now we “zoom”, i.e. aggregate nearby questions by batches of 100, to get the square diagonal distribution with a horizontal line.

    zoom = for ranked $ \qs ->
           V.map V.sum $
           V.unfoldrN 100 (Just . V.splitAt 100) qs

Yeah, the GCJ platform is so old it doesn’t have V.unfoldrExactN. Moving on.

Edge detection:

    edges = ifor zoom $ \i z ->
            fmap ((^3) . abs) $
            vAvg $ catMaybes
              [ (z `vSub`) <$> zoom !? (i-1)
              , (z `vSub`) <$> zoom !? (i+1)
              ]

The slight complication here is to account for borders: we do want them tested because there is a higher chance of the cheater ranking best, but we don’t have a further contestant to compare to. So we just express the computation as a sum of neighborwise differences, and average to keep the order of magnitude the same.

It took a bit of trial and error to find an appropriate value for N.

N	Success Rate
1	88,6%
2	91,0%
3	93,8%
4	92,0%
∞	83,2%

So mostly anything passes as long as it doesn’t get out-of-handedly big. I might as well stick with 3 now I’ve identified it. I have no idea if there’s a solid explanation behind that. Or if it could get even better with fractionals.

Anyway, it seems robust enough. Let’s submit.

[Five practice attempts, all correct in TS1 and wrong in TS2]

Infuriating. And doubly so, too.

It takes three to four minutes per attempt to judge.
The input set is very precisely specified. My benchmarking generated tests that followed the spec to the letter. My code had acceptable success rates on such test sets, by a margin. I can’t think of a good reason for it to fail on the platform.

Oh well. Bear with me while I go take a peek at the problem analysis.

That was short.

Take-away 1: the test set is actually static. So there’s no need to mass-submit hoping for a luckier draw.

Take-away 2: I fail at 8 cases, which is exactly 1 too many to pass. 50 tests make for 2% success per test, it is sensitive. I’m tempted to call that some form of bad meta-luck.

Take-away 3: My edge detection is probably not the perfect approach, statistics-wise. But intuitively, the inversion count metric seems worse to me! I’ll really have to find the time to refresh my statistics. Or ask a friend who’s still involved enough.

This problem will leave a bit of a bitter taste, because it leaves me stranded without a preferable path forward. It doesn’t make sense to optimize for the specific test set GCJ uses. Optimizing for random was already the proper way to do it. Maybe I’ll just try and look for the smallest possible change that passes?

Update 22:23 well LOL. After a bit of above-comfort statistical hacking, in the end the easy fix was to just dumb it down. I’m now getting 98,3% accuracy on random input by quantizing to 10 times fewer pixels per contestant than what I initially wrote. With a perfect 100% on the test set.

I feel a bit better about my distant statistical abilities.

So now I’ve got a proper conclusion to today’s puzzle! As usual, fuller code is on my github. Stay tuned for a qualifier recap.

Hey, I’ve got a life too, you know!↩︎
The verticals are less visible because the squashing is more pronounced. Click through to see the square pixels.↩︎
Pictured: some form of |−1;2;−1|^N for some N lost to the gods of the GIMP’s curves tool.↩︎
Hah! I’m aiming to get the line of code to fit without a scrollbar, is all.↩︎