Today we have a drastic shift for Advent of Code, in both difficulty of implementation and resulting program length.
I’m still going literate Haskell, so you’re not getting to the rest of the post without me first dropping a few imports.
import Control.Applicative (Alternative,empty)
import Control.Arrow ((&&&))
import Control.Monad.RWS.Strict (join,evalRWS,get,put,modify,tell)
import Data.List (find,foldl',sort,unfoldr)
import System.FilePath.Posix ((</>),takeDirectory)
import qualified Data.Map.Strict as Map
The name of the game today is accounting for aggregated file sizes in a tree structure.
It’s the first day in a long time where I can see a plethora of different ways to implement it:
- we could do it somewhat optimally and reproduce a clean tree structure of the size occupied by the files at the local directory level, summing it up a single time for the aggregate volumes.
- we could do it somewhat brutally and convert the input into an actual filesystem representation of what the Elves have, and then analyze it with regular filesystem tools. A cursory glance at the input file verifies it’s safe enough. That’s not the way I picked, but it’s definitely one of the fun ones, and if nobody on the subreddit did it, I’ll have to.
- I picked a more hybrid way.
I am merely going to generate a list of path-size pairs, where the path represents the complete directory trace from root to file. I will then, file by file, aggregate those sizes upwards in the directory structure. This is time-suboptimal because I am summing each file individually at each level it contributes to, as opposed to summing only once per first-degree directory entry (file or subdir). It gains a lot on the code simplicity front: I skip building a tree structure, and I can use standard third-party functions to parse paths.
It’s not so obvious to deduce a time complexity from that. For a given tree structure, it’s obvious my way is O(n_files × max_file_depth), where the optimal would be O(n_files × log(max_file_depth)). But actually constructing a worst case out of that formula from a constant-size input such as mine gets tricky. And on my input, total time remains small no matter what.
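To make that concrete, here’s a toy version of the intermediate data, with made-up paths and sizes rather than anything from my actual input:

examplePairs :: [(FilePath,Int)]
examplePairs = [("/a/b/f.dat",100),("/a/g.dat",200),("/h.dat",50)]

-- Aggregating upwards credits each file to every ancestor directory:
-- "/a/b" ends up at 100, "/a" at 100 + 200 = 300, "/" at all 350.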
So I’ll start with a stateful parsing phase. The state will be the current path. The output will be the list of (file,size) pairs.
type FileEvent = (FilePath,Int)
I’ll tokenize by words. The starting state is the root directory `/`.
parse :: String -> [FileEvent]
parse s = snd $ evalRWS (readCommand (words s)) () "/"
  where
    readCommand [] = pure ()
Directory-changing commands will update the state accordingly and yield nothing.
Changing to root is a pure state assignment:
"$":"cd":"/":ts) = put "/" *> readCommand ts readCommand (
…whereas relative changes make use of standard directory-handling functions and operators.
"$":"cd":"..":ts) = modify takeDirectory *> readCommand ts
readCommand ("$":"cd":p:ts) = modify (</> p) *> readCommand ts readCommand (
The command that lists entries switches parsing from command mode to file listing parsing mode.
"$":"ls":ts) = readLsOutput ts readCommand (
We remain in listing-parsing mode until we encounter something that looks like a command again, or we reach the end of input.
@("$":_) = readCommand ts
readLsOutput ts= pure () readLsOutput []
Encountering a subdirectory in a listing would be very useful if we were actually generating those commands. But we’re merely analyzing their result, so we don’t need to do anything specific when discovering a new directory.
"dir":_:ts) = readLsOutput ts readLsOutput (
Encountering a file is the real meat: we get to generate one of those coveted filesystem entry pairs.
    readLsOutput (fs:fn:ts) = do
      p <- get
      tell [(p </> fn,read fs)]
      readLsOutput ts
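A quick sanity check in GHCi, on a fabricated terminal excerpt (any whitespace does as a separator, since we split with `words`):

λ> parse "$ cd /\n$ ls\n100 a.txt\ndir sub\n$ cd sub\n$ ls\n200 b.txt"
[("/a.txt",100),("/sub/b.txt",200)]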
Once we have those entries, we can aggregate sizes on all levels.
The outer loop handles files one by one:
type FileSystem = Map.Map FilePath Int

buildFileSystem :: String -> FileSystem
buildFileSystem = foldl' registerFile Map.empty . parse
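Peeking ahead to the helpers defined below, the same fabricated excerpt aggregates like this:

λ> Map.toList (buildFileSystem "$ cd /\n$ ls\n100 a.txt\ndir sub\n$ cd sub\n$ ls\n200 b.txt")
[("/",300),("/sub",200)]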
And the inner loop adds their sizes to each of their parent directories’ tally.
registerFile :: FileSystem -> FileEvent -> FileSystem
registerFile t0 (path,size) = foldl' account t0 (trail path)
  where account t fp' = Map.insertWith (+) fp' size t
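For a single hypothetical file, every ancestor directory picks up its full size; this leans on the `trail` helper defined just below:

λ> Map.toList (registerFile Map.empty ("/a/b/f.dat",100))
[("/",100),("/a",100),("/a/b",100)]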
The `trail` helper enumerates a path’s parent directories.
trail :: FilePath -> [FilePath]
trail = ("/" :) . unfoldr (fmap dup . guarded (/= "/") . takeDirectory)
`guarded` and `dup` are semi-standard auxiliaries I could have imported from most anywhere else, but didn’t bother to this time.
  where
    guarded :: Alternative f => (a -> Bool) -> a -> f a
    guarded p a = if p a then pure a else empty
    dup :: a -> (a,a)
    dup = join (,)
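A couple of GHCi probes on hypothetical paths show the shape of the result: the root is prepended, then the unfold walks up from the immediate parent, stopping before it reaches `/` again:

λ> trail "/a/b/c.dat"
["/","/a/b","/a"]
λ> trail "/top.txt"
["/"]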
Once we have this directory-to-aggregate-volume mapping, solving part 1 is a simple matter of performing the requested computation: summing the sizes of all directories of size at most 100 000.
You’ll notice there’s really no need for directory names or paths anymore, so we’ll operate directly on the sizes.
part1 :: [Int] -> Int
part1 = sum . filter (<= 100000)
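If I recall the puzzle statement’s example correctly, its four directories weigh 584, 94853, 24933642 and 48381165; only the first two pass the filter:

λ> part1 [584,94853,24933642,48381165]
95437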
A small wrapper gets us in business:
main :: IO ()
main = interact $ show . (part1 &&& part2) . Map.elems . buildFileSystem
Ah, yes, part 2. There we have to perform a more convoluted computation. We could probably set the math down and simplify it, but it’s really a constant-time operation much better left to the compiler to optimize. As if it could have any significant runtime impact.
part2 :: [Int] -> Maybe Int
part2 ss = find (>= target) (sort ss)
  where
    size   = 70000000    -- total disk capacity
    need   = 30000000    -- space the update requires
    used   = maximum ss  -- "/" aggregates every file, so it’s the maximum
    unused = size - used
    target = need - unused
This concludes today’s solution. But this one is a good candidate to come back to.