# How much overhead do function calls have in Haskell?

I am new to Haskell and I am puzzled by the cost of a function call, which seems to be completely unreasonable to me, and makes me think I am doing something fundamentally wrong.

Consider the following Haskell code:

module Main where logistic x = 4.0*x*(1.0-x) lg :: Double -> Int -> Double lg !x 0 = x lg !x !n = lg (logistic x) (n-1) main = putStrLn $ show $ lg 0.7861 100000000

Compiling this with the command

ghc -O3 -XBangPatterns -o tsths tst.hs

and running it, I get:

real 0m15.904s user 0m15.853s sys 0m0.016s

If instead of calling the function logistic I calculate the expression inline:

module Main where lg :: Double -> Int -> Double lg !x 0 = x lg !x !n = lg (4.0*x*(1.0-x)) (n-1) main = putStrLn $ show $ lg 0.7861 100000000

the execution time becomes:

real 0m0.838s user 0m0.828s sys 0m0.004s

which is exactly the same as the equivalent C program, which is

#include <stdio.h> int main() { int i, num=100000000; double x=0.7861; for (i=0; i<num; ++i) x *= 4.0*(1.0-x); printf("%lg\n", x); }

Am I doing something terribly wrong?

Many thanks.

## Answers

Add a type signature to logistic and you will see the speedup. Allow me to use CPP to demonstrate the difference.

bash> cat tst.hs module Main where #if defined(SIG) logistic :: Double -> Double #endif logistic x = 4.0*x*(1.0-x) lg :: Double -> Int -> Double lg !x 0 = x lg !x !n = lg (logistic x) (n-1) main = putStrLn $ show $ lg 0.7861 100000000 bash> ghc --version The Glorious Glasgow Haskell Compilation System, version 7.4.1

If compiled without SIG defined (the type signature is excluded):

bash> ghc -O3 -XBangPatterns -XCPP -o tsths tst.hs [1 of 1] Compiling Main ( tst.hs, tst.o ) Linking tsths ... bash> time ./tsths 0.34209286442469333 real 0m13.187s user 0m13.177s sys 0m0.008s

Now lets compile with SIG defined so the signature is included:

bash> rm tsths *.o *.hi bash> ghc -O3 -XBangPatterns -XCPP -DSIG -o tsths tst.hs [1 of 1] Compiling Main ( tst.hs, tst.o ) Linking tsths ... bash> time ./tsths 0.34209286442469333 real 0m0.464s user 0m0.440s sys 0m0.020s

Not sure why GHC doesn't optimize it without the signature; the monomorphism restriction should restrict it to Double -> Double anyways.

It's a bug in GHC-7.4.1. Looking at the generated core (only the core for the function lg is important, from GHC-7.4.2, we get

Main.lg3 :: GHC.Types.Double [GblId, Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False, ConLike=False, Cheap=False, Expandable=False, Guidance=IF_ARGS [] 30 0}] Main.lg3 = GHC.Float.$w$cfromRational Main.lg4 Main.lg2 Main.lg1 :: GHC.Types.Double [GblId, Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False, ConLike=False, Cheap=False, Expandable=False, Guidance=IF_ARGS [] 30 0}] Main.lg1 = GHC.Float.$w$cfromRational Main.lg2 Main.lg2 Main.$wlg :: GHC.Prim.Double# -> GHC.Prim.Int# -> GHC.Prim.Double# [GblId, Arity=2, Str=DmdType LL, Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=2, Value=True, ConLike=True, Cheap=True, Expandable=True, Guidance=IF_ARGS [0 30] 158 0}] Main.$wlg = \ (ww_s1Oy :: GHC.Prim.Double#) (ww1_s1OC :: GHC.Prim.Int#) -> case ww1_s1OC of ds_Xvs { __DEFAULT -> case Main.lg3 of _ { GHC.Types.D# x_awJ -> case Main.lg1 of _ { GHC.Types.D# x1_awV -> letrec { $wlg1_X1PF [Occ=LoopBreaker] :: GHC.Prim.Double# -> GHC.Prim.Int# -> GHC.Prim.Double# [LclId, Arity=2, Str=DmdType LL] $wlg1_X1PF = \ (ww2_X1Pv :: GHC.Prim.Double#) (ww3_X1PA :: GHC.Prim.Int#) -> case ww3_X1PA of ds1_Xwr { __DEFAULT -> $wlg1_X1PF (GHC.Prim.*## (GHC.Prim.*## x_awJ ww2_X1Pv) (GHC.Prim.-## x1_awV ww2_X1Pv)) (GHC.Prim.-# ds1_Xwr 1); 0 -> ww2_X1Pv }; } in $wlg1_X1PF (GHC.Prim.*## (GHC.Prim.*## x_awJ ww_s1Oy) (GHC.Prim.-## x1_awV ww_s1Oy)) (GHC.Prim.-# ds_Xvs 1) } }; 0 -> ww_s1Oy }

two top-level Doubles and a decent loop.

GHC-7.4.1 was a bit too inlining-happy, that produced

Rec { Main.$wlg [Occ=LoopBreaker] :: GHC.Prim.Double# -> GHC.Prim.Int# -> GHC.Prim.Double# [GblId, Arity=2, Str=DmdType LL] Main.$wlg = \ (ww_s1NS :: GHC.Prim.Double#) (ww1_s1NW :: GHC.Prim.Int#) -> case ww1_s1NW of ds_Xvb { __DEFAULT -> case GHC.Float.$wfromRat'' (-1021) 53 Main.logistic4 Main.logistic2 of ww2_a1Mt { __DEFAULT -> case GHC.Float.$wfromRat'' (-1021) 53 Main.logistic2 Main.logistic2 of ww3_X1Nq { __DEFAULT -> Main.$wlg (GHC.Prim.*## (GHC.Prim.*## ww2_a1Mt ww_s1NS) (GHC.Prim.-## ww3_X1Nq ww_s1NS)) (GHC.Prim.-# ds_Xvb 1) } }; 0 -> ww_s1NS } end Rec }

and gave you two calls to the fromRational worker in each iteration.

Now, fromRational is a fairly complicated function. It is still rather slow, despite having gotten a much faster implementation in the 7.2 series, so these calls hurt big time.

With a type signature, there are no Rational top-level constants produced, only Double constants, and these are then used, which of course doesn't include a gratuitous slowdown.

As suggested by Dan Burton it is actually overhead of polymorphic function, because GHC infers type logistic :: Fractional a => a -> a. If you specify type explicitly you generally enable both better checking and better optimizations. I believe it is good practice to specify type of function explicitly.

If you want to have function with polymorphic type but have full speed of monomorphic call in case of specific usage you can use SPECIALIZE pragma, but I believe this is GHC specific.

{-# LANGUAGE BangPatterns #-} module Main where logistic :: Fractional a => a -> a {-# SPECIALISE logistic :: Double -> Double #-} logistic x = 4.0*x*(1.0-x) lg :: Double -> Int -> Double lg !x 0 = x lg !x !n = lg (logistic x) (n-1) main = putStrLn $ show $ lg 0.7861 100000000

Also note that you can specify LANGUAGE pragma at the beginning of file to enable bang patterns and don't need to enable them on command line.

Times on my machine were 21 seconds for original, 0.67 sec for explicit type, 0.7 sec for specialize (which is basically the same).

I believe overhead of specialized call is very small because it is just bunch of instructions which gets inlined anyway but polymorphic function results in call. Though it is weird that GHC cant inline despite polymorphism.