Learning from bidict¶
Below is an outline of some of the more fascinating Python corners I got to explore further thanks to working on bidict.
If you are interested in learning more about any of the following, reviewing the (small) codebase could be a great way to get started.
Python’s data model¶
Using
object.__new__()
to bypass default object initialization, e.g. for bettercopy()
performance. See_base.py
.Overriding
object.__getattribute__()
for custom attribute lookup. See Sorted Bidict Recipes.Using
object.__getstate__()
,object.__setstate__()
, andobject.__reduce__()
to make an object pickleable that otherwise wouldn’t be, due to e.g. using weakrefs, as bidicts do (covered further below).Using __slots__ to speed up attribute access and reduce memory usage. Must be careful with pickling and weakrefs. See
BidictBase.__getstate__()
.What happens when you implement a custom
__eq__()
? e.g.a == b
vs.b == a
when onlya
is an instance of your class? Great write-up in https://eev.ee/blog/2012/03/24/python-faq-equality/Making an immutable type hashable (so it can be inserted into
dict
s andset
s): Must implement__hash__()
such thata == b ⇒ hash(a) == hash(b)
. See theobject.__hash__()
andobject.__eq__()
docs. Seebidict.frozenbidict
.Consider
FrozenOrderedBidict
: its__eq__()
is order-insensitive. So all contained items must participate in the hash order-insensitively.Can use collections.abc.Set._hash which provides a pure Python implementation of the same hash algorithm used to hash
frozenset
s. (SinceItemsView
extendsSet
,bidict.frozenbidict.__hash__()
just callsItemsView(self)._hash()
.)- Does this argue for making
collections.abc.Set._hash()
non-private? - Why isn’t the C implementation of this algorithm directly exposed in
CPython? Only way to use it is to call
hash(frozenset(self.items()))
, which wastes memory allocating the ephemeral frozenset, and time copying all the items into it before they’re hashed.
- Does this argue for making
Unlike other attributes, if a class implements
__hash__()
, any subclasses of that class will not inherit it. It’s like Python implicitly adds__hash__ = None
to the body of every class that doesn’t explicitly define__hash__
. So if you do want a subclass to inherit a base class’s__hash__()
implementation, you have to set that manually, e.g. by adding__hash__ = BaseClass.__hash__
in the class body. SeeFrozenOrderedBidict
.This is consistent with the fact that
object
implements__hash__()
, but subclasses ofobject
are not hashable by default.
Surprising
Mapping
corner cases:- nan as key
- Equivalent but distinct Hashables
- pywat#38
- “Intransitive equality
(of
OrderedDict
) was a mistake.” –Raymond Hettinger - Hence __eq__() is order-insensitive for ordered bidicts.
- “Intransitive equality
(of
If an instance of your custom mapping type contains the same items as a mapping of another type, should they compare equal? What if one of the mappings is ordered and the other isn’t? What about returning the
NotImplemented
object?- bidict’s
__eq__()
design errs on the side of allowing more type polymorphism on the grounds that this is what the majority of use cases expect, and that it’s more Pythonic. - Any user who does need exact-type-matching equality can just override
bidict’s __eq__()
method in a subclass.- If this subclass were also hashable, would it be worth overriding
bidict.frozenbidict.__hash__()
too to include the type? - Only point would be to reduce collisions when multiple instances of different
types contained the same items
and were going to be inserted into the same
dict
orset
(since they’d now be unequal but would hash to the same value otherwise). Probably not worth it.
- If this subclass were also hashable, would it be worth overriding
- bidict’s
Other interesting stuff in the standard library¶
reprlib
andreprlib.recursive_repr()
(but not needed for bidict because there’s no way to insert a bidict into itself)operator.methodcaller()
platform.python_implementation
- See Missing bidicts in Stdlib!
Subclassing namedtuple()
classes¶
To get the performance benefits, intrinsic sortability, etc.
of namedtuple()
while customizing behavior, state, API, etc.,
you can subclass a namedtuple()
class.
Just make sure to include __slots__ = ()
,
or you’ll lose a lot of the performance benefits.
_marker.py
contains a small example.
Here’s a larger one:
>>> from collections import namedtuple
>>> from itertools import count
>>> class Node(namedtuple('_Node', 'cost tiebreaker data parent')):
... """Represent nodes in a graph traversal. Suitable for use with e.g. heapq."""
...
... __slots__ = ()
... _counter = count() # break ties between equal-cost nodes, avoid comparing data
...
... # Give call sites a cleaner API for creating new Nodes
... def __new__(cls, cost, data, parent=None):
... tiebreaker = next(cls._counter)
... return super(Node, cls).__new__(cls, cost, tiebreaker, data, parent)
...
... @property
... def depth(self):
... return self.parent.depth + 1 if self.parent else 0
...
... def __repr__(self):
... return 'Node(cost={cost}, data={data!r})'.format(**self._asdict())
>>> start = Node(cost=0, data='foo')
>>> child = Node(cost=5, data='bar', parent=start)
>>> child
Node(cost=5, data='bar')
>>> child.parent
Node(cost=0, data='foo')
>>> child.depth
1
namedtuple()
-style dynamic class generation¶
See _named.py
.
How to efficiently implement an ordered mapping¶
- Use a backing dict and doubly-linked list.
- See
_orderedbase.py
.OrderedDict
provided a good reference.
API Design¶
- Integrating with
collections
viacollections.abc
andabc
- Implementing ABCs like
collections.abc.Hashable
- Thanks to
Hashable
implementingabc.ABCMeta.__subclasshook__()
, any class that implements all the required methods of theHashable
interface (namely,__hash__()
) makes it a virtual subclass already, no need to explicitly extend. I.e. As long asFoo
implements a__hash__()
method,issubclass(Foo, Hashable)
will always be True, no need to explicitly subclass viaclass Foo(Hashable): ...
collections.abc.Mapping
andcollections.abc.MutableMapping
don’t implement__subclasshook__()
, so must either explicitly subclass (if you want to inherit any of their implementations) or useabc.ABCMeta.register()
(to register as a virtual subclass without inheriting any implementation)- Providing a new open ABC like
BidirectionalMapping
- Notice we have
collections.abc.Reversible
but nocollections.abc.Ordered
orcollections.abc.OrderedMapping
. Proposed in bpo-28912 but rejected. Would have been useful for bidict’s__repr__()
implementation (see_base.py
), and potentially for interop with other ordered mapping implementations such as SortedDict - Beyond
collections.abc.Mapping
, bidicts implement additional APIs thatdict
andOrderedDict
implement.- When creating a new API, making it familiar, memorable, and intuitive is hugely important to a good user experience.
- Making APIs Pythonic
- Zen of Python
- “Errors should never pass silently. Unless explicitly silenced. In the face of ambiguity, refuse the temptation to guess.” → bidict’s default duplication policies
- “Explicit is better than implicit.
There should be one—and preferably only one—obvious way to do it.”
→ dropped the alternate
.inv
APIs that used the~
operator and the old slice syntax
Portability¶
Python 2 vs. Python 3
mostly
dict
API changes, but also functions likezip()
,map()
,filter()
, etc.If you define a custom
__eq__()
on a class, it will not be used for!=
comparisons on Python 2 automatically; you must explicitly add an__ne__()
implementation that calls your__eq__()
implementation. If you don’t,object.__ne__()
will be used instead, which behaves likeis not
. GOTCHA alert!Python 3 thankfully fixes this.
borrowing methods from other classes:
In Python 2, must grab the
.im_func
/__func__
attribute off the borrowed method to avoid gettingTypeError: unbound method ...() must be called with ... instance as first argument
See
_frozenordered.py
.
CPython vs. PyPy
- gc / weakref
- http://doc.pypy.org/en/latest/cpython_differences.html#differences-related-to-garbage-collection-strategies
- hence
test_no_reference_cycles
(intest_hypothesis.py
) is skipped on PyPy
- primitives’ identities, nan, etc.
- gc / weakref
Python Syntax hacks¶
bidict
used to support
slice syntax
for looking up keys by value.
See this for an example of how it was implemented.
See #19 for why it was dropped.