Bit Vector in JavaScript


A bit vector (also known as bit set or bit array) is a set data structure which uses only 1 bit per element. It answers the question is X in the set?.

The main advantage is memory efficiency and speed, as the memory is allocated in a single continuous block, making it very cache friendly (unlike some tree based data structures), and requiring only a few bitwise operations to access/modify elements in the set. The disadvantage is that we need to know the size of the bit vector beforehand, and that we might be wasting some of the memory if we only store a few elements in a large vector. Let’s look at this more closely.

For simplicity, we can think of the bit vector as an array of bits. Not boolean true/false values, or bytes, but bits. Our goal is to map the set of all possible values we might want to store (also called the domain) to a unique index in the bit vector. A good example would be if we wanted a set of small integers (say for an algorithm like the prime sieve of Eratosthenes). We would then need as many bits as is the highest integer we might want to store. If the highest number is 1024, our vector would need 1024 bits, or 128 bytes to store all our membership values (flags).

As a small sidenote, you can use the Chrome developer console to run the example code. It was written in a way that you can copy paste each snippet as you go along and everything will work.

Implementation using basic Array and Number types

Let’s do a simple implementation first, using bare JavaScript arrays of Number. We can do this because the JavaScript bitwise operators treat their operands as 32 bit integers. Afterwards, we’ll do the same using the new Uint32Array type. But first, let’s use a regular Array to make things simpler.

To create a bit vector, we first need to specify the number of bits we need (which is also the number of possible values we can store membership of). The question becomes, how many 32-bit integers do we need to store N bits? The answer is unsurprisingly N / 32.

// A function which takes a number of bits and returns an initialized
// bit vector of given length.
function buildVector(bitCount) {
    // The number of bits each `Number` can store.
    var BITS_PER_ELEMENT = 32;

    // Total number of `Number` values in the vector.
    // We round up, because even if we need less than 32 bits, we need at least 1 `Number`.
    var elementCount = Math.ceil(bitCount / BITS_PER_ELEMENT);

    var vector = new Array(elementCount);

    // We initialize our bit vector to all zeros
    for (var i = 0; i < elementCount; i++) {
        vector[i] = 0;
    }

    return vector;
}

Now that we have our bit vector, all that is left to do is write the get and set methods for manipulating bit values by their respective bit index. We’ll first consider only having a single Number representing 32 bits.

Short introduction to binary

Binary numbers are represented as a sum of powers of two, for example:

  • \(1 = 2^0\)
  • \(2 = 2^1\)
  • \(3 = 1 + 2 = 2^0 + 2^1\)
  • \(4 = 2^2\)

If you don’t have much experience with binary, you might be tempted to think that \(4 = 1 + 3 = 2^0 + (2^0 + 2^1)\), but that wouldn’t work, as we only want one of each power of two. A simple rule to achieve this is that we do a greedy approach, starting from the biggest power of two we can.

  • 7 … we can fit a 4 into that, which means it’s \(4 + 3\), and we have to convert the 3, which is \(2 + 1\), resulting in \(4 + 2 + 1\) or \(2^2 + 2^1 + 2^0\)
  • 14 … we can fit an 8, leaving us with 6, on which we iterate the same rule and get \(4 + 2\), resulting in \(8 + 4 + 2\) or \(2^3 + 2^2 + 2^1\)

A binary number is then simply a sequence of 0 or 1, stating 1 for each power of 2 we have (going from the lowest from the right), and 0 for each one we don’t have, thus:

  • \(7 = 2^2 + 2^1 + 2^0\) which we can write as \(1 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0\), which gives us \(111\)
  • \(14 = 2^3 + 2^2 + 2^1\) which we can write as \(1 \cdot 2^3 + 1 \cdot 2^2 + 1 \cdot 2^1 + 0 \cdot 2^0\), which gives us \(1110\) reading from the right

Note that we can add any number of zeros to the left, so 111 is equivalent to 0111 and 000000111.

A small sidestep here, we can also use hexadecimal numbers to represent binary, as the conversion is rather simple. A hexadecimal digit represents a value from 0 to 15, which is exactly what 4 bits represent. We can thus take any binary number, such as 100000101011111010 and convert it to hex (or back):

  • look at it as groups of 4 bits from the right 10 0000 1010 1111 1010
  • add leading zeros to the leftmost group if needed 0010 0000 1010 1111 1010
  • convert each individual group 2 0 10 15 10 to decimal
  • write each decimal in hex 2 0 A F A
  • put them back together, prefixing with 0x to get 0x20AFA

Converting hexadecimal back to binary is simple, just take these steps backwards. The reason we use hexadecimal numbers instead of binary often is because they are much easier to visually parse, understand and remember. Looking at a number like 0xA1 is much clearer than looking at 10100001, because you don’t have to count how long is each run of zeros/ones.

Bitwise operators

To manipulate individual bits, we’ll make use of a few simple bitwise operators. They are called bitwise, because they manipulate individual bits. Specifically, we’ll need:

  • negation (NOT), using the ~ (tilde) operator, which simply flips all the bits
  • conjunction (AND), using the & operator, which returns 1 when both bits are 1, otherwise 0
  • disjunction (OR), using the | operator, which returns 0 when both bits are 0, otherwise 1
  • left shift, using the << operator, which is shifting all the bits to the left, or semantically multiplying by a given power of 2.

Because these operators are bitwise, they will operate on all the bits in parallel. This is different from the more common logical operators && and || (note that they’re doubled), which operate on the whole numbers. Here are a few examples (the b suffix signifies a binary string, this is not proper JavaScript syntax and only used for demonstration purposes):

  • 1 | 2 = 01b | 10b = 11b = 3
  • 1 || 2 = 1, because both are converted to booleans, and both are true
  • 1 & 3 = 01b & 11b = 01b = 2
  • 1 && 3 = 1, because both are converted to booleans, and both are true
  • 1 << 1 = 1b << 1 = 10b = 2, or \(1 \cdot 2^1\)
  • 1 << 3 = 1b << 3 = 1000b = 8, or \(1 \cdot 2^3\)

To understand the NOT operator, we first need to understand that while mathematically, binary numbers are infinite (or can be), we are only working with 32-bit integers. This means if we start with 0 and do a negation ~0, we get a 32 bit number with all bits set to 1.

Because JavaScript uses two’s complement, ~0 will actually be -1 (or 11111111111111111111111111111111 in binary). This is because the Number type behaves as a signed 32-bit number, which means it also has to represent negative values. The important thing here is that two’s complement doesn’t say anything about the actual bits, it only specifies what value those bits represent when doing other mathematical operations (+, -, *, etc.). It also affects how the browser will display each number. If you want to learn more about two’s complement, check out the wikipedia article or the following online calculator (there are many others) to get an idea for how it works.

But since our bit vector doesn’t need to do arithmetic, we don’t really need to worry about this. We might occasionally want to print out a given number in hex or binary, which can be done using the .toString function, for example (14).toString(2) outputs binary 1110 and (14).toString(16) outputs hex e.

Knowing how binary and bitwise operators work, we can finally figure out how to set a specific bit in a given 32-bit integer. The OR operator | is perfect for this, as it won’t change a 1 to 0 (and thus leaving the existing values alone), but will be able to set a 0 to 1. Counting from 0, if we want to set the 1st bit (at index 0) to 1, we simply do num | 1, as this can also be read as num | 00000000000000000000000000000001b. If we wanted to set the 2nd bit (at index 1), we’d want num | 10b, or num | 2 in decimal/hex. The 3rd bit (at index 2) would be num | 100b or num | 4, the 4th bit (at index 3) would be num | 1000b or num | 8, and so on. We’ll call the number on the right side of the operator a bit mask.

If you look closely, you can probably figure out the pattern. To set the i-th bit, we need to OR the number with \(2^i\), which can be easily created with a left shift as 1 << i. The whole operation then becomes num | (1 << i). Before moving on, let’s do a more visual example. We’ll start with num = 0xDA (or 11011010, or 218 dec), and toggle the 3rd bit (index 2).

num  11011010
mask 00000100
OR | --------
     11011110
````

We can also use the same operation to check if a given bit is set. As all the bits except for the `i`-th are zero, we can use the `&` operator, which will return a non-zero number if and only if the `i`-th bit in `num` is 1. The `get` operation is then `num & (1 << i)`.

Lastly, we might want to remove elements from the bit vector, which means we need the ability to clear a specific bit. A small recap of the `&` operator.
& 0 1
0 0 0
1 0 1


We can see that if we set the mask to all `1`, doing `num & 1111...1111b` doesn't change the `num` value. We also see that no matter what value is in `num`, if any of the bits in the mask is `0`, the resulting bit will also be `0`. Thus `num & 11101b` will set the 2nd bit from the right (index 1) to `0` and leave all of the other bits intact.

Constructing such mask is simple, since we only need to take our `set` mask from before and flip all the bits using the NOT operator `~`. Resulting in `num & (~(1 << i))`. I've added extra parentheses to make the order of operations clear. Beware that `&` and `|` have a very low priority, so it might be a good idea to be very explicit with parens around bit operations unless you're sure what you're doing is correct.

Here's a similar example as we did for OR, starting with `num = 0xDA`, clearing the 7th bit (index 6). We first construct the mask step by step `1 << 6 = 01000000b`, followed by a negation `~01000000 = 10111111`.

```text
num   11011010
mask  10111111
AND & --------
      10011010

Implementing get, set and clear on a 32-bit vector

As mentioned before, let’s first consider only a 32-bit vector stored in a single Number. The operations would be:

// Set the i-th bit to 1
function set(vec, i) {
    return vec | (1 << i);
}

// Clear the i-th bit
function clear(vec, i) {
    return vec & (~(1 << i));
}

// Return the value of the i-th bit
function get(vec, i) {
    var value = vec & (1 << i);
    // we convert to boolean to make sure the result is always 0 or 1,
    // instead of what is returned by the mask
    return value != 0;
}

Note that all of these functions return a new number as their result. Let’s test to see if it works:

// Since our bit vector is stored in a single number, we simply initialize it as 0.
var vec = 0;

vec = set(vec, 3);
console.log("is 3 in vec? " + get(vec, 3));
// is 3 in vec? true
console.log("is 4 in vec? " + get(vec, 4));
// is 4 in vec? false

vec = clear(vec, 3);
console.log("is 3 in vec? " + get(vec, 3));
// is 3 in vec? false

Remember the number only has 32-bits, so don’t use an index bigger than 31. If you do, it will simply wrap around, so you’ll get set(0, 0) == set(0, 32).

Implementing get, set and clear on an arbitrary length bit vector

Now we’re finally ready to create the whole data structure, an arbitrary length bit vector. We need to modify our get, set, and clear to calculate the right Number within the array first, and then to do the same bit manipulation they did before.

Going again from the right, bits 0 - 31 will be stored in the 1st Number (at index 0), bits 32 - 63 at index 1, 64 - 95 at index 2, etc. From this, we can see that the index in the bigger array is simply the bit index divided by 32 and rounded down. Simply Math.floor(i / 32). This gives us the index of the Number.

To get the bit index within the number, we simply take the remainder of dividing by 32, or the modulo 32 of the original bit index. This gives us i % 32 for the bit index. Putting this together (note that since we’re using an Array, the bit vector is now mutable, unlike the previous 32-bit version using only a Number). I’ve added the original buildVector function to make it easier to copy paste this code as a whole.

// Set the i-th bit to 1
function set(vec, i) {
    var bigIndex = Math.floor(i / 32);
    var smallIndex = i % 32;

    vec[bigIndex] = vec[bigIndex] | (1 << smallIndex);
}

// Clear the i-th bit
function clear(vec, i) {
    var bigIndex = Math.floor(i / 32);
    var smallIndex = i % 32;

    vec[bigIndex] = vec[bigIndex] & (~(1 << smallIndex));
}

// Return the value of the i-th bit
function get(vec, i) {
    var bigIndex = Math.floor(i / 32);
    var smallIndex = i % 32;

    var value = vec[bigIndex] & (1 << smallIndex);
    // we convert to boolean to make sure the result is always 0 or 1,
    // instead of what is returned by the mask
    return value != 0;
}

// A function which takes a number of bits and returns an initialized
// bit vector of given length.
function buildVector(bitCount) {
    // Total number of `Number` values in the vector.
    // Adding Math.ceil here to make sure we allocate enough space even if the size
    // is not divisible by 32.
    var elementCount = Math.ceil(bitCount / 32);
    var vector = new Array(elementCount);

    for (var i = 0; i < elementCount; i++) {
        vector[i] = 0;
    }

    return vector;
}

We can do a similar test as we did before to test our bit vector:

// Since our bit vector is stored in a single number, we simply initialize it as 0.
var vec = buildVector(64);

set(vec, 30);
console.log("is 30 in vec? " + get(vec, 30));
// is 30 in vec? true
console.log("is 40 in vec? " + get(vec, 40));
// is 40 in vec? false

clear(vec, 30);
console.log("is 30 in vec? " + get(vec, 30));
// is 30 in vec? false

Using Uint32Array instead of an Array of Number

Modern browsers now provide a better and more efficient variant to an Array of Number, which is using Uint32Array. The difference here is that JavaScript Array is not exactly the array you would expect if you came out of a computer science class. It behaves more like a hash map with integer keys. You can also store different types in the same array, for example [1, "hello"] is completely valid JavaScript. Secondly, Number is not a 32-bit integer. According to the standard, Number is a IEEE-754 double precision float. The trick here is that the bitwise operators convert their operands to a 32-bit integer before applying the operation. The conversion is defined as an abstract operation, so it most likely comes down to how the implementation chooses to handle things.

The ideal scenario would be that the JIT (just in-time compiler) recognizes that we’re only doing bitwise operations on something that starts out as a constant zero, and thus uses a 32-bit integer as the backing store for our data, and also recognizes that the array doesn’t contain anything else, so it wouldn’t have to use a generic implementation that allows different types, but rather a continuous block of memory. While this might be possible, it’s most likely not what happens, at least not something that can be guarateed to happen 100% of the time, because the JIT would need to understand everything your code is doing to prove that such optimization is possible. The halting problem however proves that the compiler can’t understand any arbitrary code, and as such any optimization could be only based on heuristics.

This is why the Uint32Array type was added to JavaScript. While the compiler/interpreter/JIT can’t know that we only intend to use 32-bit integers, we as the programmers do know it, so we can choose a more specific data structure that allows for exactly that. Uint32Array is a type which has only one purpose, to store unsigned 32-bit integers in a continuous block of memory.

Using it is actually even simpler than what we did before, as our buildVector function turns into a one liner.

function buildVector(bitCount) {
    // The constructor accepts a number of 32-bit integers in the array,
    // which is simply the number of bits in our bit vector divided by 32.
    // We also keep the `Math.ceil` just to make the API more robust.
    return new Uint32Array(Math.ceil(bitCount / 32));
}

On the outside, the Uint32Array behaves just like an Array, with the exception that the operands to the indexer [] operator get converted to unsigned 32-bit integers.

var arr = new Uint32Array(10);

arr[0] = "foo";
arr[1] = "123";
arr[2] = 3.14;

console.log(arr[0] + " " + arr[1] + " " + arr[2]);
// 0 123 3

Everything else about the bit vector (set, get, and clear) will stay the same, so there isn’t really anything we’re giving up for using the more efficient Uint32Array version.

Conclusion

If you’ve read this far, you should now feel pretty confident about how the bit vector works, and be able to implement it yourself. A bit vector might not be the most popular data structure, but it can come handy in various different scenarios. A specific example could be using binary frames with the WebSocket API, in which case you might want to minimize the network traffic as much as possible. When working with binary frames, you will most certainly run into Uint32Array and bitwise operators, so at least knowing how a bit vector works can help you there. It’s also useful to know that there are other built-in array types with predefined length, such as Uint8Array, Int32Array (note the lack of U, as this is a signed integer version of a 32-bit array), Float64Array, etc. For more details on these check out the Indexed collections section under Global Objects on MDN. You might also be interested in seeing browser support of typed arrays in the modern browsers and different polyfill options.

Lastly, I’d like to note a little bit about dynamically sized bit vectors. Much like a regular array, a bit vector can also be implemented in a way that allows for resizing. In the Array variant, we would just need to push a few additional zeroed Number instances into the array to make the bit vector larger, while the Uint32Array variant would require us to allocate a new Uint32Array with larger size and copy things over. At first it might seem like the Array variant is clearly superior in this regard, but here’s a few thoughts:

  • if the JIT recognized our Array should use an efficient packed 32-bit integer block of memory to store the data, pushing a new element into it would do exactly the same as if we create a new Uint32Array (there could be more optimizations going on, but the same could be said for the JIT optimizing a resize of the Uint32Array variant)
  • if the Array is backed by a generic array of objects with extended capacity for pushing new elements into it, the push itself wouldn’t cost as much, but there could be a price paid in terms of performance of the regular set, get and clear operations

Note that this is mostly food for thought, I haven’t done any benchmarks comparing the two variants, and could be very wrong with regards what happens in actual JavaScript implementations. If I was made to guess, I’d say the Uint32Array would outperform the Array even with an occasional resize. But feel free to correct me on this in the comments.

References

See also


Liked this blog post? Share it!


comments powered by Disqus