Digit Blocks

The round’s final problem, “Digit Blocks”, is a nice little statistical hacking problem.

We have N piles. Each pile starts empty and can hold up to B blocks. Blocks each have a value of D × 10^H where D is the digit printed on the block and H the height the block ends up at.

N × B blocks arrive, one by one, each painted with a digit coming from a uniform distribution over [0..9]; for each block we are asked in which pile to put it, with no knowledge of what digits the future blocks will have.

The submission is a success if the summed value over a few attempts is over 90% or 98% of the theoretical maximum. I’m still not totally clear to me if that’s the theoretical maximum over all lucks (including the one that happens to yield N × B nines in a row), or the theoretical maximum assuming the perfect strategy. Anyway, we’ll proxy that by doing the best we can at every attempt and not worry so much more about it.

So it’s a classical act of balance between committing score points and creating opportunity.

For knowledge representation, my wording of the statement may have made it a bit more obvious than the original, so I’ll repeat it: it doesn’t matter which pile a block is in; all that matters is its height. So we can represent our state as a map from height to cardinality of the piles who have that height.

As a side note, the puzzle’s I/O protocol does ask us for a column index to place digits. We still don’t really need to track the column contents: we can maintain a decreasing set of piles by merely choosing which of the available heights we’ll fill with a block, and get a valid column index by summing the cardinalities of higher columns.

So here’s a general wrapper.

type RemHeight = Int
type Card = Int
type Digit = Int

type PureStrategy = Card -> RemHeight -> St -> Digit -> RemHeight

playStrategy :: MonadIO m => Card -> RemHeight -> PureStrategy -> m ()
playStrategy n b strat = do
  let play = strat n b
  flip evalStateT (st0 n b) $ replicateM_ (n * b) $ do
    d <- readFromJudge
    st <- get
    let rh = play st d
    put (consume st rh)
    writeToJudge (heightToIndex st rh)

type St = IntMap Card

st0 :: Card -> RemHeight -> St
st0 n b = IMap.singleton b n

-- | Place a block at a remaining height of rh.  Incorrect (unchecked)
-- to call for rh ≤ 0.
consume :: St -> RemHeight -> St
consume s rh = IMap.unionWith (+) s $ IMap.fromList [(rh,-1),(rh-1,1)]

heightToIndex :: St -> RemHeight -> Int
heightToIndex st rh = 1 + sum (IMap.filterWithKey (\h _ -> h < rh) st)

The top spots of every column are worth much more than anything below them, so they’re the ones we ought to optimize first. How do we put the most of the best value, nine, there?

It takes about ten draws to get a nine, so we want a spot for it to be available for as long as possible. So we can try a simple greedy algorithm of always placing digits 0 through 8 in the lower spots, and nines on the top spot.

We won’t necessarily have a top spot open when a nine comes up. That’s literally why we said we want those open as soon as possible. At any rate, if that happens, well… just treat the nine as a lower digit.
We won’t necessarily have a lower spot open when a lower digit comes up. It would be real unlucky (and unlikely) most of the time, but it’s extremely likely to happen for the very last spot. So when that happens, well, there’s no choice either.

Here’s that greedy strategy expressed in code.

-- | Always place on the leftmost tower that fits.  Nines fit
-- anywhere; other digit fit as long as they don't conclude the
-- column.  Unless there's no other choice left.
greedy :: PureStrategy
greedy n b st d = fst $ IMap.findMin $
                  if d == 9 || IMap.null safe then valid else safe
  where valid = IMap.filter (> 0) $ IMap.delete 0 st
        safe = IMap.delete 1 valid

This happens to be enough to pass the first test set. The problem analysis gives a much better explanation of why than I could.¹ Now for the harder set.

Obviously, the greedy strategy described above can’t be optimal. We’ll achieve a better optimum considering the actual expected values of the positions we reach after every choice we make. Each position’s value can be computed recursively from its successors’, so Dynamic Programming fits the bill.

But the number of states to compute is still prohibitive: there are 3 247 943 160 states² for normalized (decreasing) unlabeled block piles only. I’m not aware of a straightforward indexing for them, especially with the 1GB RAM limit this problem has.

But we can reduce the number of states by using a variant of the greedy heuristic and only ever considering placing a digit in the top R positions, or in the leftmost sub-R column.

R	states	time
0	3k	<1s
1	30k	3,5s
2	200k	30s
3	1M	3,5m

(Those are columns×digits states, so there are about 10× more naturally.)

R = 0 is the “always play leftmost nonfull” strategy. It yields about 50%, which is as good as random. Indeed, with no decision taken, it’s as good as the “play a random nonfull column” one.

R = 1 is close to the greedy strategy, except special-casing 9 isn’t hard-coded and the chosen digit to top a column may vary depending on how close we are to the end. It returns 92%, so indeed better than greedy but unworthy of the second test set.

R = 2 is mostly the sweet spot. It’s computable within problem limits and doesn’t overflow. It returns some yield over 98%³ reliably.

R = 3 goes overtime, but not by too much. It seems conceivable optimizing it could give even greater results. My implementation is a rather straightforward one, there is definitely room for improvement:

-- | Place at the “optimal” height among those <th or on the leftmost
-- among those >= th.
dp :: RemHeight -> PureStrategy
dp th n b = \ st d -> fst (cache ! (st,d)) where
  cache = execState (mapM_ (go (st0 n b)) [0..9]) Map.empty
  go st d = gets (Map.lookup (st,d)) >>= \ case
    Just r -> pure r
    Nothing -> do
      r <- expected th go st d
      modify' (Map.insert (st,d) r)
      pure r

-- | Expected gain of placing a digit.
expected :: Monad m => RemHeight
         -> (St -> Digit -> m (RemHeight,Double))
         ->  St -> Digit -> m (RemHeight,Double)
expected th rec st d = do
    candidates <- mapM f (prune th st)
    pure $ if null candidates then (-1,0)
           else maximumBy (comparing snd) candidates
  where f rh = expected' rec (consume st rh) >>= \e ->
               pure (rh,fromIntegral d * 10^^(-rh) + e)

-- | Expected gain of placing any digit.
expected' :: Monad m => (St -> Digit -> m (RemHeight,Double)) -> St -> m Double
expected' rec st = (/ 10) . sum . map snd <$> mapM (rec st) [0..9]

-- | Reduce search space by only allowing one tower that's not close
-- to the max to be filled in.  In dual terms, only allow one height
-- among those greater than the threshold.
prune :: RemHeight -> St -> [RemHeight]
prune th st = case r of []    ->   l
                        (x:_) -> x:l
  where (l,r) = partition (< th) $ dropWhile (<= 0) $
                IMap.keys $ IMap.filter (> 0) st

So yeah, improvement is still feasible. OTOH, it passes the test reliably, so don’t hold your breath waiting for me to do it.

In the meantime, I’ve given out an almost complete program, so here’s the little bit of wrapper and protocols holding it all together.

main :: IO ()
main = do
  [t,n,b,p] <- map read . words <$> getLine
  let strategy = case p of
        860939810732536850 -> greedy
        937467793908762347 -> dp 3
  replicateM_ t $ playStrategy n b strategy
  void readFromJudge
  traceM "VICTORY"

readFromJudge :: MonadIO m => m Int
readFromJudge = liftIO $ do
  num <- readLn
  when (num < 0) $ traceM "DEFEAT" *> exitSuccess
  pure num

writeToJudge :: MonadIO m => Int -> m ()
writeToJudge i = liftIO (print i *> hFlush stdout)

And… as is too often the case with those DP strategy problems, I’ve now got a close-to-perfect solution while still having very little inner understanding of how to play the game in real life. Oh, well…

This concludes round 1B’s final problem. As usual, the fuller code is on GitHub. Stay tuned for a short round recap before I go back to counting days to round 2!

It probably helps that they understand what the test set’s ratio is really about.↩︎
If my calculation isn’t mistaken, that is.↩︎
The local testing judge provided by GCJ doesn’t give the sum when the attempt passes, and I CBA to patch it. So I know it’s good enough, but not by how much.↩︎