Addendum#
Performance#
bidict
is written to be as performant as possible
without sacrificing other important goals,
such as safety, portability, and maintainability.
In general, using a bidict
to maintain a bidirectional mapping
should exhibit about the same performance as
keeping two mutually-inverse one-directional mappings
in sync manually.
The test suite includes benchmarks so that bidict’s performance
can be continuously measured and improved.
If you spot an opportunity to improve bidict
’s performance further,
please don’t hesitate to
file an issue or submit a pull request.
bidict
Avoids Reference Cycles#
A careful reader might notice the following…
>>> fwd = bidict(one=1)
>>> inv = fwd.inverse
>>> inv.inverse is fwd
True
…and worry that a bidict
and its inverse
create a reference cycle.
If this were true,
in CPython this would mean that the memory for a bidict
could not be immediately reclaimed when you retained no more references to it,
but rather would have to wait for the next garbage collection to kick in
before it could be reclaimed.
However, bidict
s use a weakref.ref
to store the inverse reference in one direction,
avoiding the strong reference cycle.
As a result, when you no longer retain
any references to a bidict
you create,
you can be sure that its refcount in CPython drops to zero,
and that its memory will therefore be reclaimed immediately.
Note
In PyPy this is not an issue, as PyPy doesn’t use reference counts. The memory for unreferenced objects in PyPy is only reclaimed when GC kicks in, which is unpredictable.
Terminology#
It’s intentional that the term “inverse” is used rather than “reverse”.
Consider a collection of (k, v) pairs. Taking the reverse of the collection can only be done if it is ordered, and (as you’d expect) reverses the order of the pairs in the collection. But each original (k, v) pair remains in the resulting collection.
By contrast, taking the inverse of such a collection neither requires the collection to be ordered nor guarantees any ordering in the result, but rather just replaces every (k, v) pair with the inverse pair (v, k).
“keys” and “values” could perhaps more properly be called “primary keys” and “secondary keys” (as in a database), or even “forward keys” and “inverse keys”, respectively.
bidict
sticks with the terms “keys” and “values” for the sake of familiarity and to avoid potential confusion, but technically values are also keys themselves.Concretely, this allows
bidict
s to return a set-like (dict_keys) object forvalues()
, rather than a non-set-like dict_values object.
Missing bidict
s in the Standard Library#
The Python standard library actually contains some examples
where bidict
s could be used for fun and profit
(depending on your ideas of fun and profit):
The
logging
module contains a private_levelToName
dict which maps integer levels like 10 to their string names like DEBUG. If I had a nickel for every time I wanted that exposed in a bidirectional map (and as a public attribute, no less), I bet I could afford some better turns of phrase.The
dis
module maintains a mapping from opnames to opcodesdis.opmap
and a separate list of opnames indexed by opcodedis.opnames
. These could be combined into a single bidict.Python 3’s
html.entities
module maintains separatehtml.entities.name2codepoint
andhtml.entities.codepoint2name
dicts. These could be combined into a single bidict.
Caveats#
Non-Atomic Mutation#
As with built-in dicts,
mutating operations on a bidict
are not atomic.
If you need to mutate the same bidict
from different threads,
use a
synchronization primitive
to coordinate access. [1]
Equivalent but distinct Hashable
s#
Consider the following:
>>> d = {1: int, 1.0: float}
How many items do you expect d to contain? The actual result might surprise you:
>>> len(d)
1
And similarly,
>>> dict([(1, int), (1.0, float), (1+0j, complex), (True, bool)])
{1: <... 'bool'>}
>>> 1.0 in {True}
True
(Note that 1 == 1.0 == 1+0j == True
.)
This illustrates that a mapping cannot contain two items with equivalent but distinct keys (and likewise a set cannot contain two equivalent but distinct elements). If an object that is being looked up in a set or mapping is equal to a contained object, the contained object will be found, even if it is distinct.
With a bidict
,
since values function as keys in the inverse mapping,
this behavior occurs in the inverse direction too,
and means that a bidict
can end up with a different
but equivalent key from the corresponding value
in its own inverse:
>>> b = bidict({'false': 0})
>>> b.forceput('FALSE', False)
>>> b
bidict({'FALSE': False})
>>> b.inverse
bidict({0: 'FALSE'})
nan as a Key#
In CPython, nan is especially tricky when used as a dictionary key:
>>> d = {float('nan'): 'nan'}
>>> d
{nan: 'nan'}
>>> d[float('nan')]
Traceback (most recent call last):
...
KeyError: nan
>>> d[float('nan')] = 'not overwritten'
>>> d
{nan: 'nan', nan: 'not overwritten'}
In other Python implementations such as PyPy,
nan behaves just like any other dictionary key.
But in CPython, beware of this unexpected behavior,
which applies to bidict
s too.
bidict
contains no special-case logic
for dealing with nan as a key,
so bidict’s behavior will match dict
’s
on whatever runtime you’re using.
See e.g. these docs for more info (search the page for “nan”).
Simultaneous Assignment#
bidict
s may behave differently
from dicts with respect to so-called “simultaneous assignment”.
Consider the following:
>>> m = {'a': 'a', 'b': 'b'}
>>> m['a'], m['b'] = m['b'], m['a'] # swap two values
>>> m
{'a': 'b', 'b': 'a'}
With a bidict
,
simultaneous assignment cannot be used
to swap two values in this way:
>>> m = bidict({'a': 'a', 'b': 'b'})
>>> m['a'], m['b'] = m['b'], m['a']
Traceback (most recent call last):
...
bidict.KeyAndValueDuplicationError: ('a', 'b')
This is because “simultaneous” assignments like the above are by definition just syntax sugar for:
# desugaring: m['a'], m['b'] = m['b'], m['a']
tmp = (m['b'], m['a'])
m['a'] = tmp[0]
m['b'] = tmp[1]
and so the intermediate m['a'] = tmp[0]
assignment
raises KeyAndValueDuplicationError
before the second half of the swap assignment has a chance to run.
For a working alternative, you can write:
>>> m.forceupdate({m['a']: m['b'], m['b']: m['a']})
>>> m
bidict({'a': 'b', 'b': 'a'})
For more in this vein, check out Learning from bidict.