Sample with constraint, vectorized

What is the most efficient way to sample a data frame under a certain constraint?

For example, say I have a directory of Names and Salaries, how do I select 3 such that their sum does not exceed some value. I'm just using a while loop but that seems pretty inefficient.

Answers


You could face a combinatorial explosion. This simulates the selection of 3 combinations of the EE's from a set of 20 with salaries at a mean of 60 and sd 20. It shows that from the enumeration of the 1140 combinations you will find only 263 having sum of salaries less than 150.

> sum( apply( combn(1:20,3) , 2, function(x) sum(salry[x, "sals"]) < 150))
[1] 200
> set.seed(123)

> salry <- data.frame(EEnams = sapply(1:20 , 
                                       function(x){paste(sample(letters[1:20], 6) , 
                                             collapse="")}), sals = rnorm(20, 60, 20))
> head(salry)
  EEnams     sals
1 fohpqa 67.59279
2 kqjhpg 49.95353
3 nkbpda 53.33585
4 gsqlko 39.62849
5 ntjkec 38.56418
6 trmnah 66.07057

> sum( apply( combn(1:NROW(salry), 3) , 2, function(x) sum(salry[x, "sals"]) < 150))
[1] 263

If you had 1000 EE's then you would have:

> choose(1000, 3)   # Combination possibilities
# [1] 166,167,000   Commas added to output

One approach would be to start with the full data frame and sample one case. Create a data frame which consists of all the cases which have a salary less than your constraint minus the selected salary. Select a second case from this and repeat the process of creating a remaining set of cases to choose from. Stop if you get to the number you need (3), or if at any point there are no cases in the data frame to choose from (reject what you have so far and restart the sampling procedure).

Note that different approaches will create different probability distributions for a case being included; generally it won't be uniform.


How big is your dataset? If it is small (and small really depends on your hardware), you could just list all groups of three, calculate the sum, and sample from that.

## create data frame
N <- 100
salary <- rnorm(N))
## list all possible groups of 3 from this
x <- combn(salary, 3)
## the sum
sx <- colSums(x)
sxc <- sx[sx<1]
## sampling with replacement
sample(sxc, 10, replace=TRUE) 

Need Your Help

For Loop not executing properly

excel vba excel-vba

I have the following VBA code:

How to solve Object reference not set to an instance of an object.?

c# asp.net nullreferenceexception

In my asp.net program.I set one protected list.And i add a value in list.But it shows Object reference not set to an instance of an object error

About UNIX Resources Network

Original, collect and organize Developers related documents, information and materials, contains jQuery, Html, CSS, MySQL, .NET, ASP.NET, SQL, objective-c, iPhone, Ruby on Rails, C, SQL Server, Ruby, Arrays, Regex, ASP.NET MVC, WPF, XML, Ajax, DataBase, and so on.