Why is `vapply` safer than `sapply`?

The documentation says

vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer [...] to use.

Could you please elaborate as to why it is generally safer, maybe providing examples?


P.S.: I know the answer and I already tend to avoid sapply. I just wish there was a nice answer here on SO so I can point my coworkers to it. Please, no "read the manual" answer.

Answers


As has already been noted, vapply does two things:

  • It gives a slight speed improvement.
  • It improves consistency by performing limited checks on the return type and length.

The second point is the greater advantage: it catches a malformed result at the call itself, rather than letting it cause a confusing failure further downstream, and so leads to more robust code. The same check could be done by hand, calling sapply and then stopifnot to confirm that the return values match what you expected, but vapply is a little easier (if more limited, since custom checking code could also test that values fall within bounds, etc.).
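
The manual alternative described above (sapply followed by stopifnot) might look like the sketch below. `checked_sapply` is a hypothetical helper name, not part of base R, and the particular checks (character result, one value per input) are assumptions chosen to match the example that follows.

```r
# Hypothetical helper: sapply() followed by explicit checks on the result.
checked_sapply <- function(X, FUN, ...) {
  res <- sapply(X, FUN, ...)
  # sapply() silently falls back to a list (or builds a matrix) when the
  # results are not all length 1, so verify the shape we actually expect.
  stopifnot(is.character(res), length(res) == length(X))
  res
}

# Every element yields one value, so the checks pass.
checked_sapply(list("a", "b"), toupper)

# The second element yields two values, so sapply() returns a list and
# stopifnot() raises an error, which is captured here for illustration.
ok <- tryCatch(
  checked_sapply(list("a", c("b", "c")), toupper),
  error = function(e) "check failed"
)
```

This works, but every call site has to remember the checks, whereas vapply builds them into the call.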

Here's an example of vapply ensuring your result is as expected. It parallels something I was just working on while scraping PDFs, where findD would use a regex to match a pattern in raw text data. For example, I'd have a list that was split by entity, and a regex to match addresses within each entity. Occasionally the PDF had been converted out of order and there would be two addresses for an entity, which caused badness.

> input1 <- list( letters[1:5], letters[3:12], letters[c(5,2,4,7,1)] )
> input2 <- list( letters[1:5], letters[3:12], letters[c(2,5,4,7,15,4)] )
> findD <- function(x) x[x=="d"]
> sapply(input1, findD )
[1] "d" "d" "d"
> sapply(input2, findD )
[[1]]
[1] "d"

[[2]]
[1] "d"

[[3]]
[1] "d" "d"

> vapply(input1, findD, "" )
[1] "d" "d" "d"
> vapply(input2, findD, "" )
Error in vapply(input2, findD, "") : values must be length 1,
 but FUN(X[[3]]) result is length 2
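
The third argument to vapply, FUN.VALUE, acts as a template: each result is checked against its type as well as its length. The `""` used above is the same template as `character(1)`. A small sketch of the type check follows; the helper `first_letter` and the `words` list are made up for illustration.

```r
# "" and character(1) are the same length-1 character template.
identical("", character(1))

first_letter <- function(x) x[1]   # hypothetical helper for illustration
words <- list(letters[1:3], letters[4:6])

# Results are length-1 character values, so the template matches.
vapply(words, first_letter, character(1))

# Asking for a numeric result when FUN returns character is an error;
# it is captured here so the script keeps running.
type_check <- tryCatch(
  vapply(words, first_letter, numeric(1)),
  error = function(e) "type mismatch"
)
```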

As I tell my students, part of becoming a programmer is changing your mindset from "errors are annoying" to "errors are my friend."

Zero length inputs

One related point is that if the input length is zero, sapply will always return an empty list, regardless of the input type. Compare:

sapply(1:5, identity)
## [1] 1 2 3 4 5
sapply(integer(), identity)
## list()
# vapply requires FUN.VALUE, the template for each result
vapply(1:5, identity, integer(1))
## [1] 1 2 3 4 5
vapply(integer(), identity, integer(1))
## integer(0)

With vapply, you are guaranteed to have a particular type of output, so you don't need to write extra checks for zero length inputs.
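
To see why this matters downstream, here is a minimal sketch. `total_chars` is a hypothetical helper, not something from the original answer; it relies only on the base functions `nchar`, `sum`, and `vapply`.

```r
# Hypothetical helper: total number of characters in a character vector.
# vapply() always hands sum() an integer vector, even for empty input.
total_chars <- function(x) sum(vapply(x, nchar, integer(1)))

total_chars(c("ab", "cde"))   # 2 + 3 characters
total_chars(character(0))     # empty input still gives a numeric total

# With sapply(), the empty case produces a list instead of an integer
# vector, so code expecting a vector would need a special case.
is.list(sapply(character(0), nchar))
```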

Benchmarks

vapply can be a bit faster because it already knows what format it should be expecting the results in.

input1.long <- rep(input1,10000)

library(microbenchmark)
m <- microbenchmark(
  sapply(input1.long, findD ),
  vapply(input1.long, findD, "" )
)
library(ggplot2) # provides autoplot(); microbenchmark now supplies the method for its own results, so taRifx is no longer needed
autoplot(m)

