Learning from bidict

Below are some of the more fascinating Python corners I got to explore further thanks to working on bidict.

If you are interested in learning more about any of the following, reviewing the (small) codebase could be a great way to get started.


The following is just an outline. Expand and provide more references and examples.

Python’s data model

  • Using object.__new__() to bypass default object initialization, e.g. for better copy() performance

    • See _base.py for an example
  • Overriding object.__getattribute__() for custom attribute lookup

  • Using object.__getstate__(), object.__setstate__(), and object.__reduce__() to make an object pickleable that otherwise wouldn’t be, due to e.g. using weakrefs (see below)

  • Using __slots__ to speed up attribute access and reduce memory usage

    • Must be careful with pickling and weakrefs, see BidictBase.__getstate__()
  • Making an immutable type hashable, i.e. insertable into dicts and sets

    • See object.__hash__() and object.__eq__() docs

      • Interestingly, unlike other attributes, if a class implements __hash__(), any subclasses will not inherit it, as if Python implicitly adds __hash__ = None to the body of every class that doesn’t define __hash__ otherwise. So if you do want a subclass to inherit a base class’s __hash__() implementation, you have to set that manually (e.g. by adding __hash__ = BaseClass.__hash__ in the class body, which is exactly what FrozenOrderedBidict does).
      • This is consistent with the fact that object implements __hash__(), but subclasses of object are not hashable by default.
    • If overriding object.__eq__():

    • How this affects hashable ordered collections like FrozenOrderedBidict that have an order-insensitive __eq__()

      • All contained items must participate in the hash, order-insensitively

      • The collections.abc.Set._hash method provides a pure Python implementation of the same hash algorithm used to hash frozensets.

        Since ItemsView extends Set, bidict.frozenbidict.__hash__() can just call ItemsView(self)._hash().

        • Why is collections.abc.Set._hash() private?
        • Why isn’t the C implementation of this algorithm directly exposed in CPython? Only way to use it is to call hash(frozenset(self.items())), which wastes memory allocating the ephemeral frozenset, and time copying all the items into it before they’re hashed.
  • Resulting corner cases produce possibly surprising results:

    • See nan As Key
    • See pywat#38 for some surprising results when keys of (related but) different types compare equal, or when a hashable type’s __eq__() is intransitive (as in OrderedDict):
    • If a bidict contains the same items as another Mapping of a different subtype, should the bidict compare equal to the other mapping? Or should it at least compare unequal if the other instance is not also a BidirectionalMapping? Or should it return the NotImplemented object?
      • bidict’s __eq__() design errs on the side of allowing more type polymorphism, on the grounds that this is probably what the majority of use cases expect and that this is more Pythonic.
      • Any user who does need exact-type-matching equality can just override bidict’s __eq__() method in a subclass.
        • If this subclass were also hashable, would it be worth overriding bidict.frozenbidict.__hash__() too to include the type?
        • Only point would be to reduce collisions when multiple instances of different types contained the same items and were going to be inserted into the same dict or set (since they’d now be unequal but would hash to the same value otherwise). Seems rare, probably not worth it.

Other interesting things discovered in the standard library

namedtuple()-style dynamic class generation

  • See _named.py for an example

How to efficiently implement an ordered mapping

API Design

  • Integrating with collections via collections.abc and abc
  • Implementing ABCs like collections.abc.Hashable
  • Thanks to Hashable implementing abc.ABCMeta.__subclasshook__(), any class that implements all the required methods of the Hashable interface (namely, __hash__()) makes it a virtual subclass already, no need to explicitly extend. I.e. As long as Foo implements a __hash__() method, issubclass(Foo, Hashable) will always be True, no need to explicitly subclass via class Foo(Hashable): ...
  • collections.abc.Mapping and collections.abc.MutableMapping don’t implement __subclasshook__(), so must either explicitly subclass (if you want to inherit any of their implementations) or use abc.ABCMeta.register() (to register as a virtual subclass without inheriting any implementation)
  • Providing a new open ABC like BidirectionalMapping
    • Just override __subclasshook__()!
    • See _abc.py for an example
    • Interesting consequence of the __subclasshook__() design: the “subclass” relation is now intransitive, e.g. object is a subclass of Hashable, list is a subclass of object, but list is not a subclass of Hashable
  • Notice we have collections.abc.Reversible but no collections.abc.Ordered or collections.abc.OrderedMapping
    • Would have been useful for bidict’s __repr__() implementation (see _base.py), and potentially for interop with other ordered mapping implementations such as SortedDict
  • Beyond collections.abc.Mapping, bidicts implement additional APIs that dict and OrderedDict implement.
    • When creating a new API, making it familiar, memorable, and intuitive is hugely important to a good user experience.
  • Making APIs Pythonic
    • Zen of Python
    • “Errors should never pass silently. Unless explicitly silenced. In the face of ambiguity, refuse the temptation to guess.” → bidict’s default duplication policies
    • “Explicit is better than implicit. There should be one—and preferably only one—obvious way to do it.” → dropped the alternate .inv APIs that used the ~ operator and the old slice syntax


Correctness, performance, code quality, etc.

bidict provided a need to learn these fantastic tools, many of which have been indispensable (especially hypothesis – see test_hypothesis.py):