Learning from bidict

Below is an outline of some of the more fascinating Python corners I got to explore further thanks to working on bidict.

If you are interested in learning more about any of the following, reviewing the (small) codebase could be a great way to get started.

Python’s data model

  • Using __new__() to bypass default object initialization, e.g. for better copy() performance. See _base.py.

  • Overriding object.__getattribute__() for custom attribute lookup. See Sorted Bidict Recipes.

  • Using object.__getstate__(), object.__setstate__(), and object.__reduce__() to make an object pickleable that otherwise wouldn’t be, due to e.g. using weakrefs, as bidicts do (covered further below).

  • Using __slots__ to speed up attribute access and reduce memory usage. Must be careful with pickling and weakrefs. See BidictBase.__getstate__().

  • What happens when you implement a custom __eq__()? e.g. a == b vs. b == a when only a is an instance of your class? Great write-up in https://eev.ee/blog/2012/03/24/python-faq-equality/

  • Making an immutable type hashable (so it can be inserted into dicts and sets): Must implement __hash__() such that a == b hash(a) == hash(b). See the object.__hash__() and object.__eq__() docs. See bidict.frozenbidict.

    • Consider FrozenOrderedBidict: its __eq__() is order-insensitive. So all contained items must participate in the hash order-insensitively.

    • Can use collections.abc.Set._hash which provides a pure Python implementation of the same hash algorithm used to hash frozensets. (Since ItemsView extends Set, bidict.frozenbidict.__hash__() just calls ItemsView(self)._hash().)

      • Does this argue for making collections.abc.Set._hash() non-private?
      • Why isn’t the C implementation of this algorithm directly exposed in CPython? Only way to use it is to call hash(frozenset(self.items())), which wastes memory allocating the ephemeral frozenset, and time copying all the items into it before they’re hashed.
    • Unlike other attributes, if a class implements __hash__(), any subclasses of that class will not inherit it. It’s like Python implicitly adds __hash__ = None to the body of every class that doesn’t explicitly define __hash__. So if you do want a subclass to inherit a base class’s __hash__() implementation, you have to set that manually, e.g. by adding __hash__ = BaseClass.__hash__ in the class body. See FrozenOrderedBidict.

      This is consistent with the fact that object implements __hash__(), but subclasses of object are not hashable by default.

  • Surprising Mapping corner cases:

  • If an instance of your custom mapping type contains the same items as a mapping of another type, should they compare equal? What if one of the mappings is ordered and the other isn’t? What about returning the NotImplemented object?

    • bidict’s __eq__() design errs on the side of allowing more type polymorphism on the grounds that this is what the majority of use cases expect, and that it’s more Pythonic.
    • Any user who does need exact-type-matching equality can just override bidict’s __eq__() method in a subclass.
      • If this subclass were also hashable, would it be worth overriding bidict.frozenbidict.__hash__() too to include the type?
      • Only point would be to reduce collisions when multiple instances of different types contained the same items and were going to be inserted into the same dict or set (since they’d now be unequal but would hash to the same value otherwise). Probably not worth it.

Using weakref

See inv Avoids Reference Cycles. The doubly-linked lists that back ordered bidicts also use weakrefs to avoid creating strong reference cycles.

Other interesting stuff in the standard library

Subclassing namedtuple() classes

To get the performance benefits, intrinsic sortability, etc. of namedtuple() while customizing behavior, state, API, etc., you can subclass a namedtuple() class. Just make sure to include __slots__ = (), or you’ll lose a lot of the performance benefits.

_marker.py contains a small example. Here’s a larger one:

>>> from collections import namedtuple
>>> from itertools import count

>>> class Node(namedtuple('_Node', 'cost tiebreaker data parent')):
...     """Represent nodes in a graph traversal. Suitable for use with e.g. heapq."""
...
...     __slots__ = ()
...     _counter = count()  # break ties between equal-cost nodes, avoid comparing data
...
...     # Give call sites a cleaner API for creating new Nodes
...     def __new__(cls, cost, data, parent=None):
...         tiebreaker = next(cls._counter)
...         return super(Node, cls).__new__(cls, cost, tiebreaker, data, parent)
...
...     @property
...     def depth(self):
...         return self.parent.depth + 1 if self.parent else 0
...
...     def __repr__(self):
...         return 'Node(cost={cost}, data={data!r})'.format(**self._asdict())

>>> start = Node(cost=0, data='foo')
>>> child = Node(cost=5, data='bar', parent=start)
>>> child
Node(cost=5, data='bar')
>>> child.parent
Node(cost=0, data='foo')
>>> child.depth
1

namedtuple()-style dynamic class generation

See _named.py.

How to efficiently implement an ordered mapping

  • Use a backing dict and doubly-linked list.
  • See _orderedbase.py. OrderedDict provided a good reference.

API Design

  • Integrating with collections via collections.abc and abc
  • Implementing ABCs like collections.abc.Hashable
  • Thanks to Hashable implementing abc.ABCMeta.__subclasshook__(), any class that implements all the required methods of the Hashable interface (namely, __hash__()) makes it a virtual subclass already, no need to explicitly extend. I.e. As long as Foo implements a __hash__() method, issubclass(Foo, Hashable) will always be True, no need to explicitly subclass via class Foo(Hashable): ...
  • collections.abc.Mapping and collections.abc.MutableMapping don’t implement __subclasshook__(), so must either explicitly subclass (if you want to inherit any of their implementations) or use abc.ABCMeta.register() (to register as a virtual subclass without inheriting any implementation)
  • Providing a new open ABC like BidirectionalMapping
  • Notice we have collections.abc.Reversible but no collections.abc.Ordered or collections.abc.OrderedMapping. Proposed in bpo-28912 but rejected. Would have been useful for bidict’s __repr__() implementation (see _base.py), and potentially for interop with other ordered mapping implementations such as SortedDict
  • Beyond collections.abc.Mapping, bidicts implement additional APIs that dict and OrderedDict implement.
    • When creating a new API, making it familiar, memorable, and intuitive is hugely important to a good user experience.
  • Making APIs Pythonic
    • Zen of Python
    • “Errors should never pass silently. Unless explicitly silenced. In the face of ambiguity, refuse the temptation to guess.” → bidict’s default duplication policies
    • “Explicit is better than implicit. There should be one—and preferably only one—obvious way to do it.” → dropped the alternate .inv APIs that used the ~ operator and the old slice syntax

Portability

Python Syntax hacks

bidict used to support slice syntax for looking up keys by value.

See this for an example of how it was implemented.

See #19 for why it was dropped.

Tools

See Projects for some of the fantastic tools for software verification, performance, code quality, etc. that bidict has provided a reason to learn and master.