C++: function creation using array

Write a function which has:

input: an array of N pairs (unique id, weight), with K <= N  
output: K random unique ids (drawn from the input array)  

Note: when the function is called many times, an id should appear in the output more often the greater its weight. Example: an id with weight 5 should appear in the output 5 times more often than an id with weight 1. Also, the amount of memory allocated should be known at compile time, i.e. no additional memory should be allocated at runtime.

My question is: how to solve this task?

EDIT Thanks for the responses, everybody! Currently I can't understand how the weight of a pair affects its frequency of appearance in the output. Can you give me a clearer, "for dummies" explanation of how it works?

Answers


My short answer: strictly speaking, you can't.

Simply because the problem definition is self-contradictory. As Axn brilliantly noticed:

There is a little bit of contradiction going on in the requirement. It states that K <= N. But as K approaches N, the frequency requirement will be contradicted by the uniqueness requirement. Worst case, if K=N, all elements will be returned (i.e. appear with the same frequency), irrespective of their weight.

Anyway, when K is small relative to N, the calculated frequencies will be pretty close to the theoretical values.

The task can be split into two subtasks:

  1. Generate random numbers with a given distribution (specified by weights)
  2. Generate unique random numbers

Generate random numbers with a given distribution
  1. Calculate sum of weights (sumOfWeights)
  2. Generate random number from the range [1; sumOfWeights]
  3. Find an array element where the sum of weights from the beginning of the array is greater than or equal to the generated random number

Code

#include <iostream>
#include <cstdlib>
#include <ctime>
#include <utility> // for std::swap

// 0 - id, 1 - weight
typedef unsigned Pair[2];

// Picks a position in i_indexes with probability proportional to the
// weight of the pair it refers to
unsigned Random(Pair* i_set, unsigned* i_indexes, unsigned i_size)
{
   unsigned sumOfWeights = 0;
   for (unsigned i = 0; i < i_size; ++i)
   {
      const unsigned index = i_indexes[i];
      sumOfWeights += i_set[index][1];
   }

   const unsigned random = rand() % sumOfWeights + 1;

   sumOfWeights = 0;
   unsigned i = 0;
   for (; i < i_size; ++i)
   {
      const unsigned index = i_indexes[i];
      sumOfWeights += i_set[index][1];
      if (sumOfWeights >= random)
      {
         break;
      }
   }

   return i;
}

Generate unique random numbers

The well-known Fisher-Yates shuffle (in its Durstenfeld variant) can be used to generate unique random numbers. See this great explanation.

It requires O(N) extra space (an array of N indices), so if N is known at compile time, we are able to allocate the necessary space at compile time.

Now we have to combine these two algorithms: we just need to use our own Random() function instead of the standard rand() in the unique-number generation algorithm.

Code

template<unsigned N, unsigned K>
void Generate(Pair (&i_set)[N], unsigned (&o_res)[K])
{
   unsigned deck[N]; // deck[0..max] holds indices of ids still available
   for (unsigned i = 0; i < N; ++i)
   {
      deck[i] = i;
   }

   unsigned max = N - 1;

   for (unsigned i = 0; i < K; ++i)
   {
      const unsigned index = Random(i_set, deck, max + 1);

      std::swap(deck[max], deck[index]); // move the pick out of the live range
      o_res[i] = i_set[deck[max]][0];    // record its id
      --max;
   }
}

Usage

int main()
{
   srand((unsigned)time(0));

   const unsigned c_N = 5;    // N
   const unsigned c_K = 2;    // K
   Pair input[c_N] = {{0, 5}, {1, 3}, {2, 2}, {3, 5}, {4, 4}}; // input array
   unsigned result[c_K] = {};

   const unsigned c_total = 1000000; // number of iterations
   unsigned counts[c_N] = {0};       // frequency counters

   for (unsigned i = 0; i < c_total; ++i)
   {
      Generate<c_N, c_K>(input, result);
      for (unsigned j = 0; j < c_K; ++j)
      {
         ++counts[result[j]];
      }
   }

   unsigned sumOfWeights = 0;
   for (unsigned i = 0; i < c_N; ++i)
   {
      sumOfWeights += input[i][1];
   }

   for (unsigned i = 0; i < c_N; ++i)
   {
      std::cout << (double)counts[i]/c_K/c_total  // empirical frequency
                << " | "
                << (double)input[i][1]/sumOfWeights // expected frequency
                << std::endl;
   }

   return 0;
}

Output

N = 5, K = 2

      Frequencies
Empirical   | Expected
 0.253813   | 0.263158
 0.16584    | 0.157895
 0.113878   | 0.105263
 0.253582   | 0.263158
 0.212888   | 0.210526

Corner case when weights are actually ignored

N = 5, K = 5

      Frequencies
Empirical   | Expected
 0.2        | 0.263158
 0.2        | 0.157895
 0.2        | 0.105263
 0.2        | 0.263158
 0.2        | 0.210526

Assuming a good enough random number generator:

  • Sum the weights (total_weight)
  • Repeat K times:
    • Pick a random number in the range [1, total_weight] (selection)
    • Find the first pair where the sum of all the weights from the beginning of the array to that pair is greater than or equal to selection
    • Write the first part of the pair to the output

You need enough storage to store the total weight.
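
A minimal sketch of these steps (the struct and function names here are mine, and plain rand() stands in for the "good enough" generator; it assumes srand() was called). Note that by itself it does not make the K ids unique:

#include <cstdlib>

struct Entry { unsigned id; unsigned weight; };

template<unsigned N, unsigned K>
void WeightedPick(const Entry (&input)[N], unsigned (&output)[K])
{
   unsigned total_weight = 0;
   for (unsigned i = 0; i < N; ++i)
      total_weight += input[i].weight;

   for (unsigned k = 0; k < K; ++k)
   {
      // selection uniform in [1, total_weight]
      const unsigned selection = rand() % total_weight + 1;

      // Find the first pair whose running weight sum reaches selection
      unsigned running = 0;
      for (unsigned i = 0; i < N; ++i)
      {
         running += input[i].weight;
         if (running >= selection)
         {
            output[k] = input[i].id; // the first part of the pair
            break;
         }
      }
   }
}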


Ok so you are given input as follows:

(3, 7)
(1, 2)
(2, 5)
(4, 1)
(5, 2)

And you want to pick a random number so that the weight of each id is reflected in the picking, i.e. pick a random number from the following list:

3 3 3 3 3 3 3 1 1 2 2 2 2 2 4 5 5

Initially I created a temporary array, but this can be done without one: you can calculate the size of the conceptual list by summing up all the weights = X, which in this example = 17

Pick a random number in [0, X-1], and calculate which id should be returned by looping through the list, doing a cumulative addition on the weights. Say the random number is 8:

(3, 7) total = 7 which is < 8
(1, 2) total = 9 which is >= 8 **boom** 1 is your id!

Now, since you need K random unique ids, you can create a hashtable from the initial array passed to you and work with that. Once you find an id, remove it from the hashtable and continue with the algorithm. Edit: Note that you create the hashtable only once! Your algorithm works on it instead of looking through the array. I did not put this at the top to keep the answer clear. A sketch follows the runtime breakdown below.

As long as your random calculation is not secretly using any extra memory, you need to store the K random pickings (K <= N) and a copy of the original array, so the maximum space requirement at runtime is O(2*N) = O(N).

Asymptotic runtime is :

O(n) : create copy of original array into hashtable +
(
   O(n) : calculate sum of weights +
   O(1) : calculate random between range +
   O(n) : cumulative totals
) * K random pickings
= O(n*k) overall
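
A hypothetical sketch of this approach. A plain array copy stands in for the hashtable (picked entries are swapped out of the live range instead of being erased), which keeps the O(n*k) runtime derived above:

#include <cstdlib>
#include <utility>

struct Entry { unsigned id; unsigned weight; };

template<unsigned N, unsigned K>
void PickK(const Entry (&input)[N], unsigned (&output)[K])
{
   Entry pool[N]; // the copy of the original array
   for (unsigned i = 0; i < N; ++i)
      pool[i] = input[i];

   unsigned live = N;
   for (unsigned k = 0; k < K; ++k)
   {
      unsigned x = 0; // X = sum of the remaining weights
      for (unsigned i = 0; i < live; ++i)
         x += pool[i].weight;

      const unsigned r = rand() % x; // random number in [0, X-1]

      // Cumulative addition on the weights, as in the example above
      unsigned total = 0, i = 0;
      for (; i < live; ++i)
      {
         total += pool[i].weight;
         if (total > r) // boom -- pool[i].id is the pick
            break;
      }

      output[k] = pool[i].id;
      std::swap(pool[i], pool[live - 1]); // "remove" the picked id
      --live;
   }
}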

This is a good question :)


This solution works with non-integer weights and uses constant space (i.e. space complexity = O(1)). It does, however, modify the input array, but the only difference in the end is that the elements will be in a different order.

  • Add the weight of each input to the weight of the following input, starting from the first element and working forward. Now each weight is actually the sum of that input's weight and all of the previous weights (a running sum).

  • sum_weights = the sum of all of the weights, and n = N.

  • K times:

    • Choose a random number r in the range [0,sum_weights)

    • Binary search the first n elements for the first slot whose (now summed) weight is greater than or equal to r; call that slot i.

    • Add input[i].id to output.

    • Subtract input[i-1].weight from input[i].weight (unless i == 0), recovering input[i]'s own weight. Now subtract input[i].weight from the following (> i) input weights and also from sum_weights.

    • Move input[i] to position [n-1] (sliding the intervening elements down one slot). This is the expensive part, as it's O(N) and we do it K times. You can skip this step on the last iteration.

    • Subtract 1 from n.

  • Fix back all of the weights from n-1 down to 1 by subtracting the preceding input's weight

Time complexity is O(K*N). The expensive part (of the time complexity) is shuffling the chosen elements. I suspect there's a clever way to avoid that, but haven't thought of anything yet.
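
Here is a sketch of that procedure under the same assumptions (the names and the use of rand() are mine, not part of the original description):

#include <cstdlib>

struct Item { unsigned id; double weight; };

template<unsigned N, unsigned K>
void PickUnique(Item (&input)[N], unsigned (&output)[K])
{
   // Weights -> running sums, first element to last
   for (unsigned i = 1; i < N; ++i)
      input[i].weight += input[i - 1].weight;

   double sum_weights = input[N - 1].weight;
   unsigned n = N;

   for (unsigned k = 0; k < K; ++k)
   {
      // r uniform in [0, sum_weights)
      const double r = sum_weights * rand() / ((double)RAND_MAX + 1.0);

      // Binary search the live region for the first running sum >= r
      unsigned lo = 0, hi = n - 1;
      while (lo < hi)
      {
         const unsigned mid = lo + (hi - lo) / 2;
         if (input[mid].weight >= r) hi = mid;
         else                        lo = mid + 1;
      }
      const unsigned i = lo;

      output[k] = input[i].id;

      // Recover this item's own weight from the running sums
      const double w = input[i].weight - (i > 0 ? input[i - 1].weight : 0.0);

      // Remove it from the following running sums and from the total
      for (unsigned j = i + 1; j < n; ++j)
         input[j].weight -= w;
      sum_weights -= w;

      // Slide the chosen element out of the live region (the O(N) part;
      // skippable on the last iteration, as noted above)
      Item chosen = input[i];
      chosen.weight = w; // keep its raw weight
      for (unsigned j = i; j + 1 < n; ++j)
         input[j] = input[j + 1];
      input[n - 1] = chosen;
      --n;
   }

   // Fix back the still-summed weights to raw weights, last to first
   for (unsigned i = n; i > 1; --i)
      input[i - 1].weight -= input[i - 2].weight;
}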

Update

It's unclear what the question means by "output: K random unique Ids". The solution above assumes that this meant that the output ids are supposed to be unique/distinct, but if that's not the case then the problem is even simpler:

  • Add the weight of each input to the weight of the following input, starting from the first element and working forward. Now each weight is actually the sum of that input's weight and all of the previous weights (a running sum).

  • sum_weights = the sum of all of the weights, and n = N.

  • K times:

    • Choose a random number r in the range [0,sum_weights)

    • Binary search the first n elements for the first slot whose (now summed) weight is greater than or equal to r; call that slot i.

    • Add input[i].id to output.

  • Fix back all of the weights from n-1 down to 1 by subtracting the preceding input's weight

Time complexity is O(N + K*log(N)); the initial summing and the final fix-up account for the O(N) term.
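
And a sketch of this simpler variant, reusing the same binary search (again, the names and rand() are my own stand-ins):

#include <cstdlib>

struct Item { unsigned id; double weight; };

template<unsigned N, unsigned K>
void PickWithRepeats(Item (&input)[N], unsigned (&output)[K])
{
   // Weights -> running sums, first element to last
   for (unsigned i = 1; i < N; ++i)
      input[i].weight += input[i - 1].weight;
   const double sum_weights = input[N - 1].weight;

   for (unsigned k = 0; k < K; ++k)
   {
      const double r = sum_weights * rand() / ((double)RAND_MAX + 1.0);

      // First slot whose running sum is >= r
      unsigned lo = 0, hi = N - 1;
      while (lo < hi)
      {
         const unsigned mid = lo + (hi - lo) / 2;
         if (input[mid].weight >= r) hi = mid;
         else                        lo = mid + 1;
      }
      output[k] = input[lo].id;
   }

   // Fix the running sums back to raw weights, last element to first
   for (unsigned i = N - 1; i >= 1; --i)
      input[i].weight -= input[i - 1].weight;
}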

