Checking whether a row has only NA values
We create test data: a 5000-by-50 matrix of zeros in which 100 randomly chosen rows are set entirely to NA.
A <- array(0, dim = c(5000, 50))   # 5000 x 50 matrix of zeros
nar <- sample(nrow(A), 100)        # pick 100 random rows ...
A[nar, ] <- NA                     # ... and set them entirely to NA
We write three functions, each of which returns a logical vector with one element per row of A (TRUE if the row contains only NA). The functions go from straightforward to more obscure.
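# f1: test each row separately with apply()
f1 <- function(a) apply(a, 1, function(x) all(is.na(x)))
# f2: count the NAs in each row and compare with the number of columns
f2 <- function(a) rowSums(is.na(a)) == dim(a)[[2L]]
# f3: the same, using the bare-bones .rowSums(), which needs the dimensions passed explicitly
f3 <- function(a) {
  mn <- dim(a)
  .rowSums(is.na(a), m = mn[[1L]], n = mn[[2L]]) == mn[[2L]]
}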
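To see what f2 and f3 compute, it helps to look at the intermediate steps on a tiny matrix. The sketch below uses a made-up 3-by-3 matrix m (not part of the original example): is.na() returns a logical matrix of the same shape, summing a row of it counts that row's NAs, and a row is all-NA exactly when the count equals the number of columns.

```r
# Made-up matrix: row 1 has no NA, row 2 is entirely NA, row 3 is partly NA
m <- matrix(c(1,  2,  3,
              NA, NA, NA,
              7,  NA, 9), nrow = 3, byrow = TRUE)

na_mask <- is.na(m)             # logical 3 x 3 matrix, TRUE where m is NA
n_na    <- rowSums(na_mask)     # NAs per row: 0 3 1
all_na  <- n_na == ncol(m)      # FALSE TRUE FALSE: only row 2 is all-NA
```

Note that row 3, with a single NA, is counted but not flagged.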
We check that each function finds exactly the rows we set to NA.
all.equal(sort(nar), which(f1(A)))
all.equal(sort(nar), which(f2(A)))
all.equal(sort(nar), which(f3(A)))
[1] TRUE
[1] TRUE
[1] TRUE
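Because of how A was built, this check only covers rows that are either entirely NA or entirely 0; a row that is only partly NA must also come out FALSE. The sketch below builds a small, deterministic matrix B (a made-up example) with fully NA, partly NA and NA-free rows, and checks that the apply, rowSums and .rowSums approaches agree. The expressions repeat the bodies of f1 to f3 so the snippet runs on its own.

```r
# Hypothetical test matrix: 5 rows, 4 columns
B <- matrix(1:20, nrow = 5)
B[2, ] <- NA            # row 2: entirely NA
B[4, c(1, 3)] <- NA     # row 4: partly NA
B[5, 1:3] <- NA         # row 5: all but one entry NA

via_apply      <- apply(B, 1, function(x) all(is.na(x)))
via_rowSums    <- rowSums(is.na(B)) == ncol(B)
via_dotRowSums <- .rowSums(is.na(B), nrow(B), ncol(B)) == ncol(B)

# All three approaches must flag the same rows
stopifnot(identical(via_apply, via_rowSums),
          identical(via_apply, via_dotRowSums))
which(via_apply)
#> [1] 2
```

Only row 2 is flagged; rows 4 and 5 contain NAs but are not entirely NA.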
The functions differ in computing time.
library("rbenchmark")
benchmark(f1(A), f2(A), f3(A),
          order = "relative", replications = 1000)[, 1:4]
   test replications elapsed relative
3 f3(A)         1000   0.765    1.000
2 f2(A)         1000   0.824    1.077
1 f1(A)         1000   7.244    9.469
What if we had fewer columns?
A <- array(0, dim = c(5000, 3))
nar <- sample(nrow(A), 100)
A[nar, ] <- NA
benchmark(f1(A), f2(A), f3(A),
          order = "relative", replications = 1000)[, 1:4]
   test replications elapsed relative
3 f3(A)         1000   0.058    1.000
2 f2(A)         1000   0.062    1.069
1 f1(A)         1000   4.634   79.897
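Altogether there seems to be little difference between using .rowSums and rowSums (under 10% in these runs), but both variations are much faster than the straightforward use of apply: roughly 9 times faster with 50 columns and roughly 80 times faster with 3 columns.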