hashed-permutation

(Timothy Hobbs 18.9.2019)

project homepage ipynb

hashed-permutation is a rust library which allows you to quickly grab a number out of a shuffled list of numbers. Hashed permutation does not actually create a permutation of a list, rather it uses an algorithm to grab the number in O(1) time from a virtual list. This is both memory and time efficient. You can read about how the algorithm works in the library author's post A Memory and Space Constant Shuffling Algorithm.

In [ ]:

In [2]:
:dep hashed-permutation="2.1.1"
In [3]:
:dep factorial="0.1.1"
In [4]:
extern crate hashed_permutation;
use hashed_permutation::*;
use factorial::Factorial;

Basic usage is pretty simple. First you describe your virtual shuffled list, then you ask for the nth number from that list.

In [5]:
let my_shuffled_list = HashedPermutation {
        seed: 3,
        length: 5,
};

my_shuffled_list.shuffle(3).unwrap()
Out[5]:
1
In [6]:
let len = 4;

for i in 0..len.factorial() {
    let perm = HashedPermutation {
        seed: i,
        length: len,
    };

    print!("[seed: {:2}] ", i);
    for n in 0..len {
        let permuted_number = perm.shuffle(n).unwrap();
        print!("{:2} ", permuted_number);
    }
    print!("\n")
}
[seed:  0]  0  1  2  3 
[seed:  1]  1  2  3  0 
[seed:  2]  2  3  0  1 
[seed:  3]  3  0  1  2 
[seed:  4]  0  1  2  3 
[seed:  5]  1  2  3  0 
[seed:  6]  2  3  0  1 
[seed:  7]  3  0  1  2 
[seed:  8]  0  1  2  3 
[seed:  9]  1  2  3  0 
[seed: 10]  2  3  0  1 
[seed: 11]  3  0  1  2 
[seed: 12]  0  1  2  3 
[seed: 13]  1  2  3  0 
[seed: 14]  2  3  0  1 
[seed: 15]  3  0  1  2 
[seed: 16]  0  1  2  3 
[seed: 17]  1  2  3  0 
[seed: 18]  2  3  0  1 
[seed: 19]  3  0  1  2 
[seed: 20]  0  1  2  3 
[seed: 21]  1  2  3  0 
[seed: 22]  2  3  0  1 
[seed: 23]  3  0  1  2 
Out[6]:
()

One of the first things that occured to me when I saw this library was the claim that it was space constant. This didn't really make sense to me. Afterall, if you are able to express any permutation of any list no matter with a space constant number, wouldn't that be a magic compression algorithm which violated information theory? After all, information theory tells us that it is impossible to create a compression algorithm which would cover the entire space of inputs and always successfully compress them to less than their origional size. The proof of this is pretty trivial. Say you have a decompressor d(compressed_input) → decompressed_output. For every decompressed_output there must be at least one compressed_input. Otherwise, your compression algorithm wouldn't cover the entire space of inputs. But in order to do that, you need to have enough bits to hold all those possible compressed_input values.

If you think about the example above where I listed every permutation of a sequence of four numbers, there are 24 possible permutations and the maximum seed value is 23 (there are 24 seed values). In order to express every possible permutation of any given sequence, the variable that holds the seed must be able to store a number as large as the number of possible permutations. That number can be quite large (it is calculated by n!, aka factorial(n). So what is the longest possible sequence that this library can mutate in all it's possible permutations?

Well, with the current source code that is about 4 billion permutations.

/// The `HashedPermutation` struct stores the initial `seed` and `length` of the permutation
/// vector. In other words, if you want to shuffle the numbers from `0..n`, then `length = n`.
///
/// Because the shuffle is performed using bit arithmetic, the fields have to be 32 bit integers.
/// Unfortunately, larger types are not supported at this time.
#[derive(Clone, Debug)]
pub struct HashedPermutation {
    /// The random seed that dictates which permutation you want to use. The shuffle is
    /// deterministic, so using the same seed will yield the same permutation every time.
    pub seed: u32,

    /// The upper bound on the range of numbers to shuffle (from `0..length`). This value must be
    /// greater zero, otherwise undefined behavior may occur.
    pub length: u32,
}

Looking at the wikipedia article I see that 13! is 6 227 020 800 which overflows a u32 integer, so the maximum sequence length that this library can handle is 12. Too bad, I was hoping to win a Hutter prize with this thing.

In [ ]: