Using a modern GHC, how much memory would this program use?

import Control.DeepSeq (deepseq)
main = do
  x <- readFile "foo"
  x `deepseq` print ()  -- force the whole string before printing

Linear in the size of foo? Or something else?

Turns out, for the default readFile from the Prelude, the answer is about 40 times the size of the input file.

The default Haskell String takes 5 words per character, so on a 64-bit machine this is 5*8 = 40 bytes per character. The characters are stored as a linked list, roughly like this diagram (taken from Johan Tibell’s ZuriHac 2015 talk, slides are here):
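
To make the arithmetic concrete, here is a back-of-the-envelope estimate written as a small program. It is only a sketch: the names `wordsPerChar` and `estimateStringBytes` are made up for illustration, and the split of the five words into a three-word cons cell plus a two-word boxed Char is the commonly cited breakdown rather than something measured here.

```haskell
-- Rough estimate of heap usage for a fully evaluated String.
-- Assumption: 5 words per character (3 for the (:) cell, 2 for the boxed Char).
wordsPerChar :: Int
wordsPerChar = 5

-- wordSize is in bytes (8 on a 64-bit machine), nChars is the file length.
estimateStringBytes :: Int -> Int -> Int
estimateStringBytes wordSize nChars = wordsPerChar * wordSize * nChars

main :: IO ()
main = do
  let nChars = 1000000 -- a hypothetical 1 MB ASCII file
  putStrLn ("estimated bytes: " ++ show (estimateStringBytes 8 nChars))
  -- prints "estimated bytes: 40000000", i.e. about 40 MB for a 1 MB file
```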

We can check the actual memory usage of a Haskell program (compiled with GHC) by using the RTS options:

$ ghc readfile.hs -Wall -O2 -rtsopts
$ ./readfile +RTS -toutput <other options>
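
As an alternative to the command-line statistics, the same kind of numbers can be read from inside the program with GHC.Stats. The following is a minimal sketch, not the setup used for the benchmarks in this post: `reportStats` and the command-line handling are hypothetical, and it assumes the binary is built with -rtsopts and run with +RTS -T so that the RTS collects statistics.

```haskell
import Control.DeepSeq (deepseq)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Environment (getArgs)
import System.Mem (performMajorGC)

-- Read a file as a String, force it, and print peak memory figures.
reportStats :: FilePath -> IO ()
reportStats path = do
  contents <- readFile path
  -- Force the whole String, then run a major GC so the heap gets sampled.
  contents `deepseq` performMajorGC
  enabled <- getRTSStatsEnabled
  if enabled
    then do
      stats <- getRTSStats
      putStrLn ("max live bytes:    " ++ show (max_live_bytes stats))
      putStrLn ("max memory in use: " ++ show (max_mem_in_use_bytes stats))
    else putStrLn "RTS statistics are off; rerun with +RTS -T"
  -- Using contents here keeps it alive across the GC above.
  putStrLn ("characters read:   " ++ show (length contents))

main :: IO ()
main = do
  args <- getArgs
  case args of
    [path] -> reportStats path
    _      -> putStrLn "usage: readfile-stats FILE +RTS -T"
```

Without +RTS -T the program still runs, but it only prints the character count and a hint to enable statistics.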

See this repository for some scripts to benchmark a few variants of readFile:

  • Prelude
  • Data.ByteString
  • Data.ByteString.Char8
  • Data.ByteString.Lazy
  • Data.ByteString.Lazy.Char8
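
For comparison with the Prelude version, here is a sketch of the strict Data.ByteString variant. A strict ByteString keeps the file contents in one packed buffer of bytes instead of a linked list of boxed characters, so its memory use should stay close to the file size. The helper name `strictFileSize` is made up for this example, and the file name "foo" is carried over from the snippet at the top.

```haskell
import qualified Data.ByteString as B

-- Read the whole file into one strict, contiguous buffer
-- and return its size in bytes.
strictFileSize :: FilePath -> IO Int
strictFileSize path = B.length <$> B.readFile path

main :: IO ()
main = do
  n <- strictFileSize "foo"
  putStrLn ("read " ++ show n ++ " bytes")
```

The lazy variants expose the same readFile name under Data.ByteString.Lazy, but read the file in chunks on demand rather than all at once.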

Basically, anything’s better than the default readFile: