Tuesday, October 23, 2007

Ranking permutations of multiset - more details

I refactored and documented my algorithms, which I described before to be more readable. Now I will try to explain how it work.

So, let's have multi-set, set with repetitions. For example (0,0,1,2). Let's agree that list of permutations will be ordered in lexicographic order. Permutations are indexed from zero.
0012


My algorithms are able to find index of given permutation, this is called Rank(). And for given index generate the permutation, this is called Unrank(). Now I will explain Unrank(), which is easier to understand. Let's try to find permutation with index 7 from total 12 possible permutations.
0012-2
We will iterate thru all 4 positions from,left to right and identify members of multiset which should be there. We fill now follow the first iteration of outer loop in algo below. So for first position, you can already see that we should choose member type 1, but in the algo we don't know it yet. Let's look at first column, what possibilities we have.
0012-2
As you can see, there is imbalance between number of permutations which start with member type 0. This is caused by fact that member type 0 appears twice in input multiset. Or from other viewpoint the number of possibilities (permutations) for sub-multisets is bigger for (0,1,2) than for (0,0,1) or (0,0,2). Those sub-multisets are defined as original multiset without member already used (chosen).
0012-2

Back to row on index 7, so we will shrink index(in range of potential) to fit to range of length. Then we will scan for offset of group which has selector inside of it's range. For our example it will be group of type 1 starting at offset 6. Now after some expanding we will subtract offset from index (we will shift base), so that all possible indexes are in range of new potential which is computed on next line. This potential is potential of sub-multiset of remaining, not yet used members. The new index (for our example) for that sub-multiset is 1. Outer loop will process next column now.

Now it should be easy follow the algo. Note that number of permutation is called 'potential' here and that some helper structures were filled before call to that method.
function Unrank(permutationIndex)
{
    // initialize: length, maximal potential and counts could be prepared for all ranking operation on same multiset
    currentPotential = maxPotential;
    currentLength = length;
    currentCounts[] = preparedCounts[];

    for each position from left
    {
        // note that 
        //  - sum of count of all currenttly remaining types is length of multiset
        //  - current potential is sum of potentials of sub-multisets
        
        // compute selector, which is just rank shrink to be in range between zero and current length of multiset
        selector = ((permutationIndex * currentLength) / currentPotential);
    
        // initialize
        offset = 0; type = 0;
    
        // scanning for offset of sub-multiset in which range selector could be found
        while ((offset + currentCounts[type]) <= selector)
        {
            offset += currentCounts[type];
            type++;
        }
    
        // remove consumed offset
        permutationIndex -= (currentPotential * offset) / currentLength;
    
        // compute potential of sub-multiset
        currentPotential = (currentPotential * currentCounts[type]) / currentLength;
    
        // consume type
        currentCounts[type]--;
        
        // store chosen type
        result[position] = type;
    }
    return result;
}

Ranking is very similar to this, but just inverse.

Ideas

It would be much better if we could have faster algo which will have only single loop thru positions, even when order of permutations wouldn't be lexicographical.

In simple example like one above, it could seem that solution is easy. Just pick type of current member which is on index of selector (in buffer of remaining unused members). Then swap that member from buffer of remaining members. But this will introduce problem for multisets where length of set is aliqant (not divisor) of permutation potential. This work in my algo, because when all members of same type are together, decimal fragments of such division is always inside one group (type). But swapping of already used members with unused ones broke groups of members of same type. This swapping idea works for ranking of set, because all sub-sets have same potential.
0012

Above is example of problematic multiset. Note that on first position we are choosing 1/6 permutations, in my algo by scanning for offset of group. But for swapped buffer we should find constant time function which will give member type with correct distribution for all permutations of working buffer (because buffer state/permutation could be heritage of ranking wider multiset).

I know that algo could be probably made bit faster by swapping out empty slots from currentCounts[] array. But this will help just a bit and will broke lexicographical order, while it will complicate algo lot. I don't think it's worth of trying for mid-sized sets.

I'am interested to hear your ideas! And of course corrections if I missed something.

Clarification 13.4.2008:

Whole algorithm is not designed for permutating of small multisets, because such small ones could be easily precomputed and stored in memory for reuse. Let's talk about multiset of 2*A, 2*B, 2*C ... 2*J or something like that. This (2*10)! / ((2!)*10), a big space.
My algo could find one rank in O(10*2*(10/2)) = circa 100 operations. (some of them are quite big, but constant time)
Also notice that reasonable limit for size of space is 64bit or 128bit numbers, so that it could fit into CPU register and allow for constant time operations.
I'm looking for something faster in terms of computational complexity. Linear not quadratic.

1 comment:

jesus.salas_at_gmail.com said...

Thank you very much for this code, is exactly what I was looking for, I'm going to port to GMP library to support really big permutation index, I'm working with permutations between 48! and 60! that generates indexes since 203 to 273 bits (depending on repetitions on the set).

greets.