Append Sort

Hey, there! Long time no see. Real life and so on, but especially real nasty problem C. Wait, I’m getting ahead of myself.

Code Jam 2021 round 1A’s problem A, “Append Sort”, is such a twist on this year’s leitmotiv it’s really not a sorting problem at all anymore. We’re given a list of integers, and asked to convert it into a sorted list by appending as few digits as possible to existing entries.

The main observation here is that making each number larger than the previous one by minimal appending is enough to solve the problem. It obviously sorts the list. It less-obviously results in a minimal solution, so long as the updated entry is not only as short as possible, but also as small as possible. That’s… actually not too hard to achieve.

So how do we make a number bigger than the previous one?

Well, it could already be bigger. That’s the trivial case. Don’t change anything then.

So let’s suppose it’s smaller or equal. No number starts with zero, so it’s clear that making it wider than the previous, no matter the actual contents, will make it larger.

The interesting case lies in the middle: in which cases can we make it larger without making it wider? Narrower yet larger is obviously impossible. But with both numbers of the same width, there’s still some wiggle room. For example 5 and 6 are of the same width, yet one is larger than the other.

It’s easier to reach the clear idea by trying to bring a narrower number to be equal to the previous one. In this case, it’s only possible if the latter number is a prefix of the former. We’d just add the correct digits one by one until we reached it.

But the problem statement won’t tolerate an equal one: the list order has to be strictly increasing. So we’ll compare prefixes and match digits for one plus the former entry.

If we don’t have a full prefix match, we have a digit of first mismatch. The cases here are:

it’s a greater digit. In that case, anything we add will result in a strictly larger number as soon as we match the width. So let’s add zeros, to keep options for later.
it’s a lower digit. In that case, anything we add will result in a strictly smaller number before we exceed the width. So we’ll need to add one more digit, it might as well be zeros again.

And that’s about it. Let’s get to the code.

We want to be able to operate on the numbers from a variety of views:

we need to be able to compare them with one another as whole numbers.
we need to be able to compare their prefixes.
having direct access to their width would be nice.
also, incrementing it.

So let’s simply bundle it all in a record and maintain it as we go:

data Number = N
  { nRaw    :: [Int]    -- ^ digits
  , nCooked :: Integer  -- ^ value
  , nWidth  :: Int      -- ^ width
  }

Input comes as a string, so we’ll need a function to import them.

fromString :: String -> Number
fromString s = N (digitToInt <$> s) (read s) (length s)

We want an easy comparison.

instance Eq  Number where  (==)   =  (==)   `on` nCooked
instance Ord Number where compare = compare `on` nCooked

Incrementing could come in a variety of ways. We only need one per entry, so performance isn’t too important. I’ll just use perform it on the bigint representation and re-import.

nbSucc :: Number -> Number
nbSucc (N _ cooked _) = N ds cooked' (length ds)
  where cooked' = succ cooked
        ds = digitToInt <$> show cooked'

We need a prefix check. That’s why we maintained the per-digit split representation.

nbPrefixOf :: Number -> Number -> Bool
nbPrefixOf = isPrefixOf `on` nRaw

Oh, and zero-padding, that came up a lot.

zeroPad :: Number -> Int -> Number
zeroPad (N raw cooked width) pad =
  N (raw ++ replicate pad 0) (cooked * 10^pad) (width + pad)

We decided to go the greedy route. Our state is simply the previous (resulting) entry; our accumulator the number of added digits. I’ll store both in a pair and fold.

solve :: [Number] -> Int
solve (x:xs) = snd $ foldl f (x,0) xs where

We now proceed one entry after the other. Cases ordered smallest possible result first.

Smallest (and easiest) case: when there’s nothing to do. Just update the current entry; don’t touch the accumulator.

  f (prev,acc) next
    | prev < next = (next,acc)

Next: we have a compatible prefix. Add digits as required to reach the number one above the previous one.

    | n' <- nbSucc prev, next `nbPrefixOf` n' = (n',acc + nWidth n' - nWidth next)

If the prefix isn’t compatible, we’ll need to check whether it’s above or below the previous number’s prefix of the same length. We could extract them and perform a lexicographical comparison, but that would be a bit tedious. It’s easier to just build the resulting candidate numbers and compare those directly. It’s more wasteful, but we’re absolutely not thirsty for performance.

    | prev < paddedSmall = (paddedSmall,acc + padWidth)
    | otherwise          = (paddedLarge,acc + padWidth + 1)
    where padWidth = nWidth prev - nWidth next
          paddedSmall = zeroPad next padWidth
          paddedLarge = zeroPad paddedSmall 1

And that’s it!

A wrapper for main and we’re good to go.

main :: IO ()
main = do
  t <- readLn :: IO Int
  forM_ [1..t] $ \i -> do
    void getLine -- n
    xs <- map fromString . words <$> getLine
    putStrLn $ "Case #" ++ show i ++ ": " ++ show (solve xs)

I’ve mumbled on about performance being ok, but hand-waved any actual justification away. Let’s take a quick closer look.

Reading the input is linear in input size. We usually don’t worry too much about this, but I’m using bigints to make my life easier, so it’s always something to be conscious about.
We have a greedy algorithm, functioning per entry, so globally we’ll run N times something.
The easy case is a bigint comparison, linear in int width.
The prefix match case increments then compares bigints. Both are linear in int width.
The prefix mismatch case pads twice. Both linear in int width.

So our global running time is O(NW), where W is the worst case integer width we’re handling. How big can it get?

Input numbers go up to 10⁹. Worst case width increase is one per entry, so W ≤ W_input + N. For a global runtime of O(N×(W_i+N)) = O(N²).

So we’re quadratic in N. But N ≤ 100. We’re fine, really.

This concludes today’s solution. The full code is on github.