Learning from bidict

Below are some of the more fascinating Python corners I got to explore further thanks to working on bidict.

If you are interested in learning more about any of the following, reading through or even contributing to bidict’s code could be a great way to get started.

Todo

The following is just an outline. Expand and provide more references and examples.

Python’s data model

  • Using object.__new__() to bypass default object initialization, e.g. for better copy() performance

  • Overriding object.__getattribute__() for custom attribute lookup

  • Using object.__getstate__(), object.__setstate__(), and object.__reduce__() to make an object pickleable that otherwise wouldn’t be, due to e.g. using weakrefs (see below)

  • Using __slots__ to speed up attribute access and reduce memory usage

    • Must be careful with pickling and weakrefs, see frozenbidict.__getstate__
  • Making an immutable type hashable, i.e. insertable into dicts and sets

    • See object.__hash__() and object.__eq__() docs

    • If overriding object.__eq__():

    • How this affects hashable ordered collections like FrozenOrderedBidict that have an order-insensitive __eq__()

      • All contained items must participate in the hash, order-insensitively

      • The collections.abc.Set._hash method provides a pure Python implementation of the same hash algorithm used to hash frozensets.

        Since ItemsView extends Set, frozenbidict can just call ItemsView(self)._hash().

        • Why is collections.abc.Set._hash() private?
        • Why isn’t the C implementation of this algorithm directly exposed in CPython? Only way to use it is to call hash(frozenset(self.items())), which wastes memory allocating the ephemeral frozenset, and time copying all the items into it before they’re hashed.
  • Resulting corner cases produce possibly surprising results:

    • See nan As Key
    • See pywat#38 for some surprising results when keys of (related but) different types compare equal, or when a hashable type’s __eq__() is intransitive (as in OrderedDict):
    • If a bidict contains the same items as another Mapping of a different subtype, should the bidict compare equal to the other mapping? Or should it at least compare unequal if the other instance is not also a BidirectionalMapping? Or should it return the NotImplemented object?
      • bidict’s __eq__() design errs on the side of allowing more type polymorphism, on the grounds that this is probably what the majority of use cases expect and that this is more Pythonic.
      • Any user who does need exact-type-matching equality can just override bidict’s __eq__() method in a subclass.
        • If this subclass were also hashable, would it be worth overriding bidict.frozenbidict.__hash__() too to include the type?
        • Only point would be to reduce collisions when multiple instances of different frozenbidict subclasses contained the same items and were going to be inserted into the same dict or set (since they’d now be unequal but would hash to the same value otherwise). Seems rare, probably not worth it.

Other interesting things discovered in the standard library

namedtuple()-style dynamic class generation

How to efficiently implement an ordered mapping

API Design

Correctness, performance, code quality, etc.

bidict provided a need to learn these fantastic tools, many of which have been indispensable (especially hypothesis – see bidict’s usage):