Effective Python 86 - 90

Click here for the first post, which contains the context of this series.

Item #86: Consider module-scoped code to configure deployment environments.

  • Programs often need to run in multiple deployment environments that each have unique assumptions and configurations.
  • You can tailor a module's contents to different deployment environments by using normal Python statements in module scope.
  • Module contents can be the product of any external condition, including host introspection through the sys and os modules.
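
A minimal sketch of the idea (the class names here are hypothetical placeholders for platform-specific implementations):

import sys

class Win32Database:
    pass  # hypothetical Windows-specific implementation

class PosixDatabase:
    pass  # hypothetical POSIX-specific implementation

# Module-scoped logic: the right class is chosen once, at import time.
if sys.platform.startswith('win32'):
    Database = Win32Database
else:
    Database = PosixDatabase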

Item #87: Define a root Exception to insulate callers from APIs.

  • Defining root exceptions for modules allows API consumers to insulate themselves from an API.
  • Catching root exceptions can help you find bugs in code that consumes an API.
  • Catching the Python Exception base class can help you find bugs in API implementations.
  • Intermediate root exceptions let you add more specific types of exceptions in the future without breaking your API consumers.
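
A minimal sketch of the pattern, with a hypothetical determine_weight API:

import logging

class Error(Exception):
    """Root exception for this module's API."""

class InvalidDensityError(Error):
    """A provided density value was invalid."""

def determine_weight(volume, density):  # hypothetical API function
    if density <= 0:
        raise InvalidDensityError('density must be positive')
    return volume * density

try:
    weight = determine_weight(1, -1)
except InvalidDensityError:
    weight = 0  # expected failure mode of the API
except Error:
    logging.exception('Bug in the calling code')  # unexpected API error
    raise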

Item #88: Know how to break circular dependencies.

  • Circular dependencies happen when two modules must call into each other at import time. They can cause your program to crash at startup.
  • The best way to break a circular dependency is by refactoring mutual dependencies into a separate module at the bottom of the dependency tree.
  • Dynamic imports are the simplest solution for breaking a circular dependency between modules while minimizing refactoring and complexity.
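
For example (a sketch assuming a hypothetical app module that holds preferences), a dynamic import defers the dependency until call time:

class Dialog:
    def __init__(self):
        self.save_dir = None

save_dialog = Dialog()

def show():
    import app  # dynamic import: deferred until the first call, breaking the cycle
    save_dialog.save_dir = app.prefs.get('save_dir')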

Item #89: Consider warnings to refactor and migrate usage.

  • The warnings module can be used to notify callers of your API about deprecated usage. Warning messages encourage such callers to fix their code before later changes break their programs.
  • Raise warnings as errors by using the -W error command-line argument to the Python interpreter. This is especially useful in automated tests to catch potential regressions of dependencies.
  • In production, you can replicate warnings into the logging module to ensure that your existing error reporting systems will capture warnings at runtime.
  • It's useful to write tests for the warnings that your code generates to make sure that they'll be triggered at the right time in any of your downstream dependencies.
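
A minimal sketch covering these points (old_helper is a hypothetical deprecated function):

import logging
import warnings

def old_helper(x):
    warnings.warn('old_helper is deprecated; use new_helper instead',
                  DeprecationWarning, stacklevel=2)
    return x * 2

# In tests, verify the warning with the catch_warnings context manager:
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    old_helper(1)
    assert issubclass(caught[0].category, DeprecationWarning)

# In production, replicate warnings into the logging module:
logging.captureWarnings(True)  # routes warnings to the 'py.warnings' logger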

Item #90: Consider static analysis via typing to obviate bugs.

  • Python has special syntax and the typing built-in module for annotating variables, fields, functions, and methods with type information.
  • Static type checkers can leverage type information to help you avoid many common bugs that would otherwise happen at runtime.
  • There are a variety of best practices for adopting types in your programs, using them in APIs, and making sure they don't get in the way of your productivity.
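
A small example of annotations that a static checker such as mypy can verify before runtime:

from typing import List, Optional

def find_first(values: List[int], target: int) -> Optional[int]:
    """Return the index of target in values, or None if it is absent."""
    for i, value in enumerate(values):
        if value == target:
            return i
    return None

# Running `mypy example.py` would flag a call like find_first(['a'], 1).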

Effective Python 81 - 85

Click here for the first post, which contains the context of this series.

Item #81: Use tracemalloc to understand memory usage and leaks.

  • It can be difficult to understand how Python programs use and leak memory.
  • The gc module can help you understand which objects exist, but it has no information about how they were allocated.
  • The tracemalloc built-in module provides powerful tools for understanding the sources of memory usage.
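
A minimal sketch of the snapshot-and-compare workflow:

import tracemalloc

tracemalloc.start(10)  # keep up to 10 stack frames per allocation

before = tracemalloc.take_snapshot()
leaky = [object() for _ in range(100000)]  # hypothetical suspect workload
after = tracemalloc.take_snapshot()

stats = after.compare_to(before, 'lineno')
for stat in stats[:3]:
    print(stat)  # the lines that allocated the most memory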

Item #82: Know where to find community-built modules.

  • The Python Package Index (PyPI) contains a wealth of common packages that are built and maintained by the Python community.
  • pip is the command-line tool you can use to install packages from PyPI.
  • The majority of PyPI modules are free and open source software.

Item #83: Use virtual environments for isolated and reproducible dependencies.

  • Virtual environments allow you to use pip to install many different versions of the same package on the same machine without conflicts.
  • Virtual environments are created with python3 -m venv, enabled with source bin/activate, and disabled with deactivate.
  • You can dump all of the requirements of an environment with python3 -m pip freeze. You can reproduce an environment by running python3 -m pip install -r requirements.txt.

Item #84: Write docstrings for every function, class, and module.

  • Write documentation for every module, class, method, and function using docstrings. Keep them up-to-date as your code changes.
  • For modules: Introduce the contents of a module and any important classes or functions that all users should know about.
  • For classes: Document behavior, important attributes, and subclass behavior in the docstring following the class statement.
  • For functions and methods: Document every argument, returned value, raised exception, and other behaviors in the docstring following the def statement.
  • If you're using type annotations, omit the information that's already present in type annotations from docstrings since it would be redundant to have it in both places.
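
For instance, a function docstring following these conventions:

def palindrome(word):
    """Return True if the given word is a palindrome.

    Args:
        word: String to check; the comparison is case-sensitive.

    Returns:
        True if word reads the same forwards and backwards.
    """
    return word == word[::-1]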

Item #85: Use packages to organize modules and provide stable APIs.

  • Packages in Python are modules that contain other modules. Packages allow you to organize your code into separate, non-conflicting namespaces with unique absolute module names.
  • Simple packages are defined by adding an __init__.py file to a directory that contains other source files. These files become the child modules of the directory's package. Package directories may also contain other packages.
  • You can provide an explicit API for a module by listing its publicly visible names in its __all__ special attribute.
  • You can hide a package's internal implementation by only importing public names in the package's __init__.py file or by naming internal-only members with a leading underscore.
  • When collaborating within a single team or on a single codebase, using __all__ for explicit APIs is probably unnecessary.
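
A minimal sketch (the package and member names are hypothetical):

# mypackage/models.py
__all__ = ['Projectile']          # explicit public API of this module

class Projectile:
    def __init__(self, mass, velocity):
        self.mass = mass
        self.velocity = velocity

def _simulate(projectile):        # leading underscore: internal-only helper
    pass

# mypackage/__init__.py would then re-export only the public names:
# from .models import *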

Effective Python 76 - 80

Click here for the first post, which contains the context of this series.

Item #76: Verify related behaviors in TestCase subclasses.

  • You can create tests by subclassing the TestCase class from the unittest built-in module and defining one method per behavior you'd like to test. Test methods on TestCase classes must start with the word test.
  • Use the various helper methods defined by the TestCase class, such as assertEqual, to confirm expected behaviors in your tests instead of using the built-in assert statement.
  • Consider writing data-driven tests using the subTest helper method in order to reduce boilerplate.
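
A short sketch with a hypothetical to_str function under test:

from unittest import TestCase, main

def to_str(data):  # hypothetical function under test
    if isinstance(data, bytes):
        return data.decode('utf-8')
    return data

class ToStrTest(TestCase):
    def test_bytes(self):
        self.assertEqual('hello', to_str(b'hello'))

    def test_many_cases(self):
        # Data-driven: subTest reports each failing case without stopping.
        for value, expected in [(b'a', 'a'), ('b', 'b')]:
            with self.subTest(value):
                self.assertEqual(expected, to_str(value))

if __name__ == '__main__':
    main()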

Item #77: Isolate tests from each other with setUp, tearDown, setUpModule, and tearDownModule.

  • It's important to write both unit tests (for isolated functionality) and integration tests (for modules that interact with each other).
  • Use the setUp and tearDown methods to make sure your tests are isolated from each other and have a clean test environment.
  • For integration tests, use the setUpModule and tearDownModule module-level functions to manage any test harnesses you need for the entire lifetime of a test module and all of the TestCase classes that it contains.
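
A minimal sketch of both levels of isolation:

from pathlib import Path
from tempfile import TemporaryDirectory
from unittest import TestCase, main

def setUpModule():
    print('set up module-wide harness once')    # e.g. start a test database

def tearDownModule():
    print('tear down module-wide harness once')

class EnvironmentTest(TestCase):
    def setUp(self):
        self.test_dir = TemporaryDirectory()    # fresh directory per test
        self.test_path = Path(self.test_dir.name)

    def tearDown(self):
        self.test_dir.cleanup()

    def test_writes_file(self):
        (self.test_path / 'data.bin').write_bytes(b'hello')
        self.assertTrue((self.test_path / 'data.bin').exists())

if __name__ == '__main__':
    main()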

Item #78: Use mocks to test code with complex dependencies.

  • The unittest.mock module provides a way to simulate the behavior of interfaces using the Mock class. Mocks are useful in tests when it's difficult to set up the dependencies that are required by the code that's being tested.
  • When using mocks, it's important to verify both the behavior of the code being tested and how dependent functions were called by that code, using the Mock.assert_called_once_with family of methods.
  • Keyword-only arguments and the unittest.mock.patch family of functions can be used to inject mocks into the code being tested.
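
A minimal sketch using keyword injection (list_dogs and its query dependency are hypothetical):

from unittest.mock import Mock

def list_dogs(database, query):  # the query dependency is injected
    return query(database, 'dog')

mock_query = Mock(return_value=['spot', 'fluffy'])  # simulates the real query
database = object()                                 # stand-in connection

assert list_dogs(database, query=mock_query) == ['spot', 'fluffy']
mock_query.assert_called_once_with(database, 'dog')  # verify the interaction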

Item #79: Encapsulate dependencies to facilitate mocking and testing.

  • When unit tests require a lot of repeated boilerplate to set up mocks, one solution may be to encapsulate the functionality of dependencies into classes that are more easily mocked.
  • The Mock class of the unittest.mock built-in module simulates classes by returning a new mock, which can act as a mock method, for each attribute that is accessed.
  • For end-to-end tests, it's valuable to refactor your code to have more helper functions that can act as explicit seams to use for injecting mock dependencies in tests.
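
A minimal sketch of the encapsulation idea (ZooDatabase and do_rounds are hypothetical names):

from unittest.mock import Mock

class ZooDatabase:                 # hypothetical class that wraps the dependency
    def get_animals(self, species):
        raise NotImplementedError  # the real implementation queries a database

def do_rounds(database, species):  # database is an explicit seam for injection
    return len(database.get_animals(species))

database = Mock(spec=ZooDatabase)  # spec= makes the mock mimic the class
database.get_animals.return_value = ['spot', 'fluffy']
assert do_rounds(database, 'dog') == 2
database.get_animals.assert_called_once_with('dog')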

Item #80: Consider interactive debugging with pdb.

  • You can initiate the Python interactive debugger at a point of interest directly in your program by calling the breakpoint built-in function.
  • The Python debugger prompt is a full Python shell that lets you inspect and modify the state of a running program.
  • pdb shell commands let you precisely control program execution and allow you to alternate between inspecting program state and progressing program execution.
  • The pdb module can be used to debug exceptions after they happen, either in independent Python programs (using python -m pdb -c continue <program path>) or in the interactive Python interpreter (using import pdb; pdb.pm()). A tiny example follows.
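
def compute(value):
    squared = value ** 2
    breakpoint()  # drops into the pdb shell with local state available
    return squared

# At the pdb prompt, `p squared` prints state, while `step`, `next`,
# `return`, and `continue` control execution.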

Effective Python 71 - 75

Click here for the first post, which contains the context of this series.

Item #71: Prefer deque for producer-consumer queues.

  • The list type can be used as a FIFO queue by having the producer call append to add items and the consumer call pop(0) to receive items. However, this may cause problems because the performance of pop(0) degrades superlinearly as the queue length increases.
  • The deque class from the collections built-in module takes constant time—regardless of length—for append and popleft, making it ideal for FIFO queues.
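
For example:

from collections import deque

queue = deque()
queue.append('order-1')  # producer appends on the right
queue.append('order-2')
print(queue.popleft())   # consumer pops from the left in O(1): 'order-1'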

Item #72: Consider searching sorted sequences with bisect.

  • Searching sorted data contained in a list takes linear time using the index method or a for loop with simple comparisons.
  • The bisect built-in module’s bisect_left function takes logarithmic time to search for values in sorted lists, which can be orders of magnitude faster than other approaches.
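
For example:

from bisect import bisect_left

data = list(range(10**6))         # must already be sorted
index = bisect_left(data, 91234)  # logarithmic-time search
assert data[index] == 91234
assert bisect_left(data, 91234.5) == 91235  # insertion point for absent values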

Item #73: Know how to use heapq for priority queues.

  • Priority queues allow you to process items in order of importance instead of in first-in, first-out order.
  • If you try to use list operations to implement a priority queue, your program's performance will degrade superlinearly as the queue grows.
  • The heapq built-in module provides all of the functions you need to implement a priority queue that scales efficiently.
  • To use heapq, the items being prioritized must have a natural sort order, which requires special methods like __lt__ to be defined for classes.
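
A common pattern is to push (priority, item) tuples, whose natural tuple ordering satisfies heapq's requirement:

import heapq

queue = []
heapq.heappush(queue, (2, 'code the feature'))
heapq.heappush(queue, (1, 'write the design'))
heapq.heappush(queue, (3, 'ship it'))

while queue:
    priority, task = heapq.heappop(queue)  # always the smallest priority first
    print(priority, task)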

Item #74: Consider memoryview and bytearray for zero-copy interaction with bytes.

  • The memoryview built-in type provides a zero-copy interface for reading and writing slices of objects that support Python's high-performance buffer protocol.
  • The bytearray built-in type provides a mutable bytes-like type that can be used for zero-copy data reads with methods like socket.recv_into.
  • A memoryview can wrap a bytearray, allowing for received data to be spliced into an arbitrary buffer location without copying costs.
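
For example:

buffer = bytearray(b'hello world')
view = memoryview(buffer)

chunk = view[6:11]   # zero-copy slice of the underlying buffer
chunk[:] = b'there'  # writes through to the bytearray without copying
print(buffer)        # bytearray(b'hello there')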

Item #75: Use repr strings for debugging output.

  • Calling print on built-in Python types produces the human-readable string version of a value, which hides type information.
  • Calling repr on built-in Python types produces the printable string version of a value. These repr strings can often be passed to the eval built-in function to get back the original value.
  • %s in format strings produces human-readable strings like str. %r produces printable strings like repr. F-strings produce human-readable strings for replacement text expressions unless you specify the !r suffix.
  • You can define the __repr__ special method on a class to customize the printable representation of instances and provide more detailed debugging information.
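
For example:

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f'Point({self.x!r}, {self.y!r})'

p = Point(1, 2)
print(p)         # Point(1, 2) -- falls back to __repr__ since __str__ is absent
print(f'{p!r}')  # !r forces the repr in an f-string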

Effective Python 66 - 70

Click here for the first post, which contains the context of this series.

Item #66: Consider contextlib and with statements for reusable try/finally behavior.

  • The with statement allows you to reuse logic from try/finally blocks and reduce visual noise.
  • The contextlib built-in module provides a contextmanager decorator that makes it easy to use your own functions in with statements.
  • The value yielded by context managers is supplied to the as part of the with statement. It is useful for letting your code directly access the cause of a special context.
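
A minimal sketch of a contextmanager-based helper that temporarily raises the logging level:

import logging
from contextlib import contextmanager

@contextmanager
def debug_logging(level):
    logger = logging.getLogger()
    old_level = logger.getEffectiveLevel()
    logger.setLevel(level)
    try:
        yield logger                # bound to the `as` target of the with
    finally:
        logger.setLevel(old_level)  # cleanup runs even if an exception occurs

with debug_logging(logging.DEBUG) as logger:
    logger.debug('Visible only inside the with block')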

Item #67: Use datetime instead of time for local clocks.

  • Avoid using the time module for translating between different time zones.
  • Use the datetime built-in module along with the pytz community module to reliably convert between times in different time zones.
  • Always represent time in UTC and do conversions to local time as the very final step before presentation.
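
For example (pytz is a community module installed separately with pip):

from datetime import datetime, timezone
import pytz

utc_time = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
eastern = pytz.timezone('US/Eastern')
local_time = utc_time.astimezone(eastern)  # convert only at presentation time
print(local_time)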

Item #68: Make pickle reliable with copyreg.

  • The pickle built-in module is useful only for serializing and deserializing objects between trusted programs.
  • Deserializing previously pickled objects may break if the classes involved have changed over time (e.g., attributes have been added or removed).
  • Use the copyreg built-in module with pickle to ensure backward compatibility for serialized objects.
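
A minimal sketch of the pattern (GameState is a hypothetical class whose attributes may change across versions):

import copyreg
import pickle

class GameState:
    def __init__(self, level=0, lives=4):
        self.level = level
        self.lives = lives

def unpickle_game_state(kwargs):
    return GameState(**kwargs)  # missing keys fall back to constructor defaults

def pickle_game_state(state):
    return unpickle_game_state, (state.__dict__,)

copyreg.pickle(GameState, pickle_game_state)

data = pickle.dumps(GameState())
state = pickle.loads(data)  # old pickles keep working as the class evolves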

Item #69: Use decimal when precision is paramount.

  • Python has built-in types and classes in modules that can represent practically every type of numerical value.
  • The Decimal class is ideal for situations that require high precision and control over rounding behavior, such as computations of monetary values.
  • Pass str instances to the Decimal constructor instead of float instances if it's important to compute exact answers and not floating point approximations.
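
For example:

from decimal import Decimal, ROUND_UP

print(3 * 0.145)             # inexact floating point approximation
print(3 * Decimal('0.145'))  # exactly 0.435

cost = Decimal('0.004166')   # e.g. a fraction-of-a-cent rate
rounded = cost.quantize(Decimal('0.01'), rounding=ROUND_UP)
print(rounded)               # 0.01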

Item #70: Profile before optimizing.

  • It's important to profile Python programs before optimizing because the sources of slowdowns are often obscure.
  • Use the cProfile module instead of the profile module because it provides more accurate profiling information. The Profile object's runcall method provides everything you need to profile a tree of function calls in isolation.
  • The Stats object lets you select and print the subset of profiling information you need to see to understand your program's performance.
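
A minimal sketch (insertion_sort is a hypothetical function being profiled):

from cProfile import Profile
from pstats import Stats

def insertion_sort(data):
    result = []
    for value in data:
        result.append(value)
        result.sort()  # deliberately naive, so it shows up in the profile
    return result

profiler = Profile()
profiler.runcall(lambda: insertion_sort(list(range(1000, 0, -1))))

stats = Stats(profiler)
stats.strip_dirs()
stats.sort_stats('cumulative')
stats.print_stats(5)  # top entries by cumulative time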

Munkres §52: The Fundamental Group


Claim: If $A$ is star convex, then $A$ is simply connected.

Proof: $A$ is clearly path-connected. Let $a\in A$ be the star point, let $\alpha$ and $\beta$ be two loops in $A$, and define $F:I\times I\to A$ by
$$(x,t)\mapsto\begin{cases}(1-2t)\alpha(x)+2ta&t\leq1/2\\2(1-t)a+(2t-1)\beta(x)&t>1/2\end{cases}.$$
Then $F$ is a path homotopy between $\alpha$ and $\beta$, implying that $\pi_1(A,a)=0$.
$$\tag*{$\blacksquare$}$$
Claim: If $\gamma=\alpha*\beta$, then $\widehat\gamma=\widehat\beta\circ\widehat\alpha$.

Proof:
$$\begin{align}\widehat\gamma([f])&=[\overline{\alpha*\beta}]*[f]*[\alpha*\beta]\\&=[\overline\beta*\overline\alpha]*[f]*[\alpha*\beta]\\&=[\overline\beta]*[\overline\alpha]*[f]*[\alpha]*[\beta]\\&=[\overline\beta]*\widehat\alpha([f])*[\beta]\\&=\widehat\beta(\widehat\alpha([f]))\\&=(\widehat\beta\circ\widehat\alpha)([f]).\tag*{$\blacksquare$}\end{align}$$
Claim: $\pi_1(X,x_0)$ is abelian if and only if $\widehat\alpha=\widehat\beta$ for all paths $\alpha,\beta$ from $x_0$ to $x_1$, where $X$ is path-connected.

Proof: Suppose that $\pi_1(X,x_0)$ is abelian and recall that $\pi_1(X,x_1)$ is isomorphic to it. Then
$$\begin{align}\widehat\alpha([f])&=[\overline{\alpha}]*[f]*[\alpha]\\&=[\overline{\alpha}]*[f]*[\beta]*[\overline\beta*\alpha]\\&=[\overline\beta*\alpha]*[\overline{\alpha}]*[f]*[\beta]\\&=[\overline\beta]*[f]*[\beta]\\&=\widehat\beta([f]).\end{align}$$
Conversely, suppose that $\widehat\alpha=\widehat\beta$ for all paths $\alpha,\beta$ from $x_0$ to $x_1$, let $\alpha$ be a path from $x_0$ to $x_1$, let $f$ and $g$ be loops based at $x_0$, and note that $\gamma:=f*\alpha$ is a path from $x_0$ to $x_1$. Then
$$\begin{align}\widehat\gamma([g])&=[\overline\gamma]*[g]*[\gamma]\\&=[\overline{f*\alpha}]*[g]*[f*\alpha]\\&=[\overline\alpha*\overline f]*[g]*[f*\alpha]\\&=[\overline\alpha]*[\overline f*g*f]*[\alpha]\\&=[\overline\alpha]*[g]*[\alpha]\\&=\widehat\alpha([g])\end{align}$$
implies that $[\overline f*g*f]=[g]$, which in turn implies that $[g]*[f]=[f]*[g]$.
$$\tag*{$\blacksquare$}$$
Claim: If $a\in A\subset X$ and $r$ is a retraction of $X$ onto $A$, then
$$r_*:\pi_1(X,a)\to\pi_1(A,a)$$
is surjective.

Proof:
$$r_*\circ\iota_*=(r\circ\iota)_*=\mathrm{id}_{\pi_1(A,a)},$$
where $\iota:A\hookrightarrow X$, implies that $r_*$ has a right inverse (namely $\iota_*$), which in turn implies that it is surjective.
$$\tag*{$\blacksquare$}$$
Claim: If $a\in A\subset\mathbb R^n$, $y\in Y$, $h:A\to Y$ is continuous with $h(a)=y$, and $h$ is extendable to a continuous $\widetilde h:\mathbb R^n\to Y$, then $h_*$ is trivial.

Proof:
$$h=\widetilde h\circ\iota\implies h_*=\widetilde h_*\circ\iota_*,$$
where $\iota:A\hookrightarrow\mathbb R^n$. However, the domain of $\widetilde h_*$ is $\pi_1(\mathbb R^n,a)=0$.
$$\tag*{$\blacksquare$}$$
Claim: If $X$ is path-connected, $h:X\to Y$ is continuous, $h(x_0)=y_0$, $h(x_1)=y_1$, $\alpha$ is a path from $x_0$ to $x_1$, and $\beta=h\circ\alpha$, then
$$\widehat\beta\circ(h_{x_0})_*=(h_{x_1})_*\circ\widehat\alpha.$$
This is equivalent to saying that $h_*$ is independent of base point up to isomorphism.

Proof:
$$\begin{align}\widehat\beta\circ(h_{x_0})_*([f])&=[\overline\beta]*(h_{x_0})_*([f])*[\beta]\\&=[h\circ\overline\alpha]*[h\circ f]*[h\circ\alpha]\\&=(h_{x_1})_*([\overline\alpha]*[f]*[\alpha])\\&=((h_{x_1})_*\circ\widehat\alpha)([f]).\end{align}$$
$$\tag*{$\blacksquare$}$$
Let $G$ be a topological group with operation $\cdot$ and identity $x_0$, let $\Omega(G,x_0)$ be the set of loops in $G$ based at $x_0$, and let
$$(f\otimes g)(s):=f(s)\cdot g(s)$$
for all $f,g\in\Omega(G,x_0)$.

Note that $\Omega(G,x_0)$ equipped with $\otimes$ is a group whose identity is the constant loop $e_{x_0}(s):=x_0$.

Claim: $\otimes$ induces a group operation $\otimes$ on $\pi_1(G,x_0)$.

Proof: Let $[f]\otimes[g]:=[f\otimes g]$ and note that it is well-defined: if $F$ is a path homotopy from $f$ to $f'$ and $G$ is a path homotopy from $g$ to $g'$, then $(s,t)\mapsto F(s,t)\cdot G(s,t)$ is a path homotopy from $f\otimes g$ to $f'\otimes g'$.
$$\tag*{$\blacksquare$}$$
Claim: $*$ and $\otimes$ on $\pi_1(G,x_0)$ are the same.

Proof: Note that
$$\begin{align}[f]\otimes[g]&=[f*e_{x_0}]\otimes[e_{x_0}*g]\\&=[(f*e_{x_0})\otimes(e_{x_0}*g)]\\&=[f*g]\\&=[f]*[g]\end{align}$$
because
$$\begin{align}((f*e_{x_0})\otimes(e_{x_0}*g))(s)&=(f*e_{x_0})(s)\cdot(e_{x_0}*g)(s)\\&=\begin{cases}f(2s)\cdot x_0&s\leq1/2\\x_0\cdot g(2s-1)&s>1/2\end{cases}\\&=(f*g)(s).\end{align}$$
$$\tag*{$\blacksquare$}$$
Claim: $\pi_1(G,x_0)$ is abelian.

Proof:
$$\begin{align}[f]*[g]&=[f]\otimes[g]\\&=[e_{x_0}*f]\otimes[g*e_{x_0}]\\&=[(e_{x_0}*f)\otimes(g*e_{x_0})]\\&=[g*f]\\&=[g]*[f].\end{align}$$
$$\tag*{$\blacksquare$}$$

Examples of Banach Algebras

Let
$$\ell^\infty(\Omega):=\{f:\Omega\to\mathbb C:\|f\|_\infty<\infty\},$$
where $\Omega$ is a set and
$$\|f\|_\infty:=\sup_{\omega\in \Omega}|f(\omega)|.$$
Claim: $\ell^\infty(\Omega)$ is a unital Banach algebra.
 
Proof: It is clear that $\ell^\infty(\Omega)$ is unital and an algebra. Let $(f_n)_n$ be Cauchy, let $\epsilon>0$, let $N$ be such that
$$\|f_n-f_m\|_\infty=\sup_{\omega\in\Omega}|f_n(\omega)-f_m(\omega)|<\frac\epsilon2$$
for all $n,m\geq N$, define $f:\Omega\to\mathbb C$ by $\omega\mapsto\lim_nf_n(\omega)$, and note that $f$ is well-defined since $(f_n(\omega))_n$ is Cauchy for all $\omega\in\Omega$. Then
$$\lim_m|f_n(\omega)-f_m(\omega)|=|f_n(\omega)-\lim_mf_m(\omega)|=|f_n(\omega)-f(\omega)|\leq\frac\epsilon2<\epsilon$$
for all $\omega\in\Omega$, meaning that $\|f_n-f\|_\infty<\epsilon$ for all $n\geq N$. Finally,
$$\|f\|_\infty=\|f-f_n+f_n\|_\infty\leq\|f_n-f\|_\infty+\|f_n\|_\infty<\infty.\tag*{$\blacksquare$}$$
Let
$$C_b(\Omega):=\{f\in\ell^\infty(\Omega):f\text{ is continuous}\},$$
where $\Omega$ is a topological space.

Claim: $C_b(\Omega)$ is a unital Banach algebra.

Proof: We show that it is closed: recall that convergence with respect to $\|\cdot\|_\infty$ is equivalent to uniform convergence and that uniformly convergent sequences of continuous functions converge to continuous functions.
$$\tag*{$\blacksquare$}$$
Let
$$C_0(\Omega):=\{f\in C_b(\Omega):f\text{ vanishes at infinity}\},$$
where $\Omega$ is locally compact and Hausdorff.

Recall that $f$ vanishes at infinity if for all $\epsilon>0$, there is $K$ compact such that $|f(\omega)|<\epsilon$ for all $\omega\in K^c$.

Claim: $C_0(\Omega)$ is a Banach algebra.

Proof: We show that it is closed in $C_b(\Omega)$: let $(f_n)_n\subset C_0(\Omega)$ converge to $f\in C_b(\Omega)$, let $\epsilon>0$, let $n$ be such that $\|f_n-f\|_\infty<\epsilon/2$, and let $K$ compact be such that $|f_n(\omega)|<\epsilon/2$ for all $\omega\in K^c$. Then
$$|f(\omega)|\leq|f_n(\omega)-f(\omega)|+|f_n(\omega)|<\|f_n-f\|_\infty+\frac\epsilon2<\epsilon$$
for all $\omega\in K^c$.
$$\tag*{$\blacksquare$}$$
$C_0(\Omega)$ is one of the most important examples of a Banach algebra in C*-algebra theory, and it is unital if and only if $\Omega$ is compact, in which case
$$C(\Omega)=C_b(\Omega)=C_0(\Omega).$$
Let $L^\infty(\Omega,\mu)$ be the set of classes of essentially bounded, complex-valued, measurable functions on $\Omega$, where $(\Omega,\mu)$ is a measure space, equipped with the essential supremum norm.

Claim: $L^\infty(\Omega,\mu)$ is a unital Banach algebra.
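
Proof: The argument is analogous to that for $\ell^\infty(\Omega)$: replace the supremum with the essential supremum and work modulo null sets.
$$\tag*{$\blacksquare$}$$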

Let $B_\infty(\Omega):=\{f\in\ell^\infty(\Omega):f\text{ is measurable}\}$, where $\Omega$ is a measurable space.

Claim: $B_\infty(\Omega)$ is a unital Banach algebra.

Proof: Recall that a point-wise convergent sequence of measurable functions converges to a measurable function.
$$\tag*{$\blacksquare$}$$
Let $A$ be the set of continuous, complex-valued functions on the closed unit disc $D\subset\mathbb C$ that are holomorphic on the interior $D^{\mathrm{o}}$ of $D$.

Claim: $A$ is a unital Banach algebra called the disc algebra.

Proof: If $(f_n)_n$ converges to $f$ with respect to $\|\cdot\|_\infty$, then it converges uniformly, so $f$ is continuous. Moreover, since uniform convergence justifies exchanging the limit and the integral,
$$0=\lim_n\oint_\gamma f_n(z)\,\text{d}z=\oint_\gamma\lim_nf_n(z)\,\text{d}z=\oint_\gamma f(z)\,\text{d}z$$
for all closed, piece-wise $C^1$ curves $\gamma$ on $D^{\mathrm{o}}$. Therefore, by Morera's theorem, $f$ is holomorphic on $D^{\mathrm{o}}$.
$$\tag*{$\blacksquare$}$$





Effective Python 61 - 65

Click here for the first post, which contains the context of this series.

Item #61: Know how to port threaded I/O to asyncio.

  • Python provides asynchronous versions of for loops, with statements, generators, comprehensions, and library helper functions that can be used as drop-in replacements in coroutines.
  • The asyncio built-in module makes it straightforward to port existing code that uses threads and blocking I/O over to coroutines and asynchronous I/O.

Item #62: Mix threads and coroutines to ease the transition to asyncio.

  • The awaitable run_in_executor method of the asyncio event loop enables coroutines to run synchronous functions in ThreadPoolExecutor pools. This facilitates top-down migrations to asyncio.
  • The run_until_complete method of the asyncio event loop enables synchronous code to run a coroutine until it finishes. The asyncio.run_coroutine_threadsafe function provides the same functionality across thread boundaries. Together these help with bottom-up migrations to asyncio.
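
A minimal sketch of run_in_executor (blocking_work is a hypothetical synchronous function):

import asyncio
import time

def blocking_work():  # synchronous, thread-friendly function
    time.sleep(0.1)
    return 'done'

async def main():
    loop = asyncio.get_running_loop()
    # Run the synchronous function in the default thread pool executor:
    result = await loop.run_in_executor(None, blocking_work)
    print(result)

asyncio.run(main())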

Item #63: Avoid blocking the asyncio event loop to maximize responsiveness.

  • Making system calls in coroutines—including blocking I/O and starting threads—can reduce program responsiveness and increase the perception of latency.
  • Pass the debug=True parameter to asyncio.run in order to detect when certain coroutines are preventing the event loop from reacting quickly.

Item #64: Consider concurrent.futures for true parallelism.

  • Moving CPU bottlenecks to C-extension modules can be an effective way to improve performance while maximizing your investment in Python code. However, doing so has a high cost and may introduce bugs.
  • The multiprocessing module provides powerful tools that can parallelize certain types of Python computation with minimal effort.
  • The power of multiprocessing is best accessed through the concurrent.futures built-in module and its simple ProcessPoolExecutor class.
  • Avoid the advanced (and complicated) parts of the multiprocessing module until you've exhausted all other options.
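
A minimal sketch (the gcd workload is illustrative):

from concurrent.futures import ProcessPoolExecutor

def gcd(pair):
    a, b = pair
    low = min(a, b)
    for i in range(low, 0, -1):
        if a % i == 0 and b % i == 0:
            return i

if __name__ == '__main__':
    numbers = [(1963309, 2265973), (2030677, 3814172)]
    with ProcessPoolExecutor(max_workers=2) as pool:  # one worker per core
        results = list(pool.map(gcd, numbers))
    print(results)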

Item #65: Take advantage of each block in try, except, else, and finally.

  • The try/finally compound statement lets you run cleanup code regardless of whether exceptions were raised in the try block.
  • The else block helps you minimize the amount of code in try blocks and visually distinguish the success case from the try/except blocks.
  • An else block can be used to perform additional actions after a successful try block but before common cleanup in a finally block.
import json
try:
    my_file = open('my_file.json')
except FileNotFoundError as e:
    print(e)
else:
    try:
        data = json.loads(my_file.read())
    except json.decoder.JSONDecodeError as e:
        print(e)
    else:
        # Work with data here.
        pass
    finally:
        my_file.close()

Effective Python 56 - 60

Click here for the first post, which contains the context of this series.

Item #56: Know how to recognize when concurrency is necessary.

In this item, the author implements Conway's Game of Life and asks about its scalability in the context of an MMO. He summarizes it as follows: "Python provides many built-in tools for achieving fan-out and fan-in with various trade-offs. You should understand the pros and cons of each approach and choose the best tool for the job, depending on the situation."

Item #57: Avoid creating new Thread instances for on-demand fan-out.

Consider Conway's Game of Life, mentioned in the previous item, and suppose that you create a Thread instance for each cell. This will work, but there are tradeoffs:
  • Thread instances require special tools (like Lock) to coordinate among themselves.
  • Each Thread instance requires about 8 MB, which is high.
  • Starting Thread instances and context switching between them is costly.
  • Thread instances do not provide a built-in way to re-raise exceptions back to their callers.

Item #58: Understand how using Queue for concurrency requires refactoring.

In this item, the author refactors Conway's Game of Life from the previous item in an attempt to showcase the difficulty of using Queue. He summarizes it as follows:
  • Using Queue instances with a fixed number of Thread instances improves the scalability of fan-in and fan-out.
  • It is difficult to refactor existing code to use Queue.
  • Using Queue has a fundamental limit to the total amount of I/O parallelism.

Item #59: Consider ThreadPoolExecutor when threads are necessary for concurrency.

In this item, the author uses ThreadPoolExecutor from concurrent.futures to address Conway's Game of Life from the previous items. It combines the best of the two previously discussed approaches (Thread and Queue) without the boilerplate. Nevertheless, it still does not scale well in terms of fan-out.

Item #60: Achieve highly concurrent I/O with coroutines.

In this item, the author introduces the async and await keywords and the asyncio built-in library, and uses them to address Conway's Game of Life in a far more scalable way. Here is a simple code snippet that illustrates their use:

import asyncio
async def blocking_io(i):
    await asyncio.sleep(1)
    return f'My ID is {i} and I waited 1 second.'
async def func():
    results = []
    for i in range(10):
        results.append(blocking_io(i))
    return await asyncio.gather(*results)
print(asyncio.run(func()))

Effective Python 51 - 55

Click here for the first post, which contains the context of this series.

Item #51: Prefer class decorators over metaclasses.

Consider the following decorator:

from functools import wraps
def func_log(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
            result = func(*args, **kwargs)
            return result
        except Exception as exception:
            result = exception
            raise
        finally:
            print(f'{func.__name__}({args},{kwargs})->{result}')
    return wrapper

 
Suppose that you want to use it to log a dictionary:

class FuncLogDict(dict):
    @func_log
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
    @func_log
    def __getitem__(self, *args, **kwargs):
        return super().__getitem__(*args, **kwargs)
    @func_log
    def __setitem__(self, *args, **kwargs):
        super().__setitem__(*args, **kwargs)
    # ...
d = FuncLogDict()
d['foo'] = 'bar'
d['foo']

 
This is redundant. Use a class decorator instead:

import types
log_types = (
    types.MethodType,
    types.FunctionType,
    types.BuiltinMethodType,
    types.BuiltinFunctionType,
    types.MethodDescriptorType,
    types.ClassMethodDescriptorType
)
def class_log(instance):
    for key in dir(instance):
        value = getattr(instance, key)
        if isinstance(value, log_types):
            setattr(instance, key, func_log(value))
    return instance
@class_log
class ClassLogDict(dict):
    pass
d = ClassLogDict()
d['foo'] = 'bar'
d['foo']

Item #52: Use subprocess to manage child processes.

I skip this item since its details depend heavily on the operating system on which Python runs, but I recommend perusing the subprocess documentation and refreshing one's memory about pipes.

Item #53: Use threads for blocking I/O, avoid for parallelism.

Although the global interpreter lock (GIL) does not allow threads to run in parallel, they are useful for doing blocking I/O at the same time as computation.

from threading import Thread
class Factorize(Thread):
    def __init__(self, number):
        super().__init__()
        self.number = number
    def run(self):
        self.factors = [1]
        for i in range(2, self.number):
            if not self.number % i:
                self.factors.append(i)
        self.factors.append(self.number)
threads = []
for number in [2139079, 1214759, 1516637, 1852285]:
    thread = Factorize(number)
    thread.start()
    threads.append(thread)
for thread in threads:
    thread.join()
    print(f'{thread.number}: {thread.factors}')

Item #54: Use Lock to prevent data races in threads.

Consider

from threading import Thread
class Counter:
    def __init__(self):
        self.count = 0
    def increment(self):
        self.count += 1
def worker(counter, total):
    for _ in range(total):
        counter.increment()
total = 10 ** 5
counter = Counter()
threads = []
for _ in range(5):
    thread = Thread(target=worker, args=(counter, total))
    thread.start()
    threads.append(thread)
for thread in threads:
    thread.join()
print('expected:', total * 5, 'actual:', counter.count)

A run of this code gave me the output:

expected: 500000 actual: 406246

This is due to a race condition. One way to address it is to use the Lock class, which is a mutex:

# ...
from threading import Lock
class Counter:
    def __init__(self):
        self.lock = Lock()
        self.count = 0
    def increment(self):
        with self.lock:
            self.count += 1
# ...

Item #55: Use Queue to coordinate work between threads.

Suppose that you want to do something (ideally I/O bound) that can be structured as a pipeline. You can use multiple threads to significantly speed it up, and you can use Queue to coordinate them. Here is an abstract example:

from queue import Queue
from threading import Thread
class MyQueue(Queue):
    SENTINEL = object()
    def close(self):
        self.put(self.SENTINEL)
    def __iter__(self):
        while True:
            item = self.get()
            try:
                if item == self.SENTINEL:
                    return
                yield item
            finally:
                self.task_done()
class MyWorker(Thread):
    def __init__(self, func, in_queue, out_queue):
        super().__init__()
        self.func = func
        self.in_queue = in_queue
        self.out_queue = out_queue
    def run(self):
        for item in self.in_queue:
            self.out_queue.put(self.func(item))
def func_1(item):
    return item
def func_2(item):
    return item
def func_3(item):
    return item
queue_1 = MyQueue()
queue_2 = MyQueue()
queue_3 = MyQueue()
queue_4 = MyQueue()
threads = [
    MyWorker(func_1, queue_1, queue_2) for _ in range(10)
] + [
    MyWorker(func_2, queue_2, queue_3) for _ in range(10)
] + [
    MyWorker(func_3, queue_3, queue_4) for _ in range(10)
]
for thread in threads:
    thread.start()
for i in range(100):
    queue_1.put(i)
for queue in [queue_1, queue_2, queue_3]:
    for _ in range(10):
        queue.close()
    queue.join()
for thread in threads:
    thread.join()
print(queue_4.qsize())

Effective Python 46 - 50

Click here for the first post, which contains the context of this series.

Item #46: Use descriptors for reusable @property methods.

Consider

class GradeBook:
    def __init__(self, grade_1=0, grade_2=0):
        self._grade_1 = grade_1
        self._grade_2 = grade_2
    @staticmethod
    def is_valid(value):
        if not 0 <= value <= 100:
            raise ValueError
    @property
    def grade_1(self):
        return self._grade_1
    @grade_1.setter
    def grade_1(self, value):
        self.is_valid(value)
        self._grade_1 = value
    @property
    def grade_2(self):
        return self._grade_2
    @grade_2.setter
    def grade_2(self, value):
        self.is_valid(value)
        self._grade_2 = value

Adding grade_3, grade_4, ... requires duplicating code, and creating a new class with similar functionality requires also duplicating is_valid.

This can be addressed using a descriptor:

from weakref import WeakKeyDictionary
class Grade:
    def __init__(self):
        self._values = WeakKeyDictionary()
    def __get__(self, instance, instance_type):
        return self._values.get(instance, 0)
    def __set__(self, instance, value):
        if not 0 <= value <= 100:
            raise ValueError
        self._values[instance] = value
class GradeBook:
    grade_1 = Grade()
    grade_2 = Grade()

Using WeakKeyDictionary instead of a plain dict prevents memory leaks: instances used as keys can be garbage-collected once the rest of the program no longer references them.

Item #47: Use __getattr__, __getattribute__, and __setattr__ for lazy attributes.

Note that

class I47:
    def __getattr__(self, name):
        self.__setattr__(name, None)
        return None
i47 = I47()
i47.test # Calls __getattr__.
i47.test # Doesn't call __getattr__.

Also note that

class I47:
    def __getattribute__(self, name):
        super().__setattr__(name, None)  # super() avoids infinite recursion here
        return None
i47 = I47()
i47.test # Calls __getattribute__.
i47.test # Calls __getattribute__.

There are interesting use cases for these overloads, like logging. As before, be mindful of the use of super() to avoid infinite recursion.

Item #48: Validate subclasses with __init_subclass__.

Consider

class Polygon:
    sides = None
    def __init_subclass__(cls):
        super().__init_subclass__()
        if cls.sides is None or cls.sides < 3:
            raise ValueError('Polygons must have more than 2 sides.')
class Triangle(Polygon):
    sides = 3
print(1)
class Line(Polygon):
    print(2)
    sides = 2
    print(3)
print(4)

This throws an exception after printing 3 but before printing 4.

Although the use of super().__init_subclass__() is unnecessary here, it is recommended in order to handle multiple inheritance with classes that implement __init_subclass__.

This is only one use case of __init_subclass__.

Item #49: Register class existence with __init_subclass__.

Here is another interesting use case of __init_subclass__:

import json
class_registry = {}
def deserialize(data):
    params = json.loads(data)
    return class_registry[params['class']](*params['args'])
class Serializable:
    def __init__(self, *args):
        self.args = args
    def serialize(self):
        return json.dumps({
            'class': self.__class__.__name__,
            'args': self.args
        })
    def __init_subclass__(cls):
        class_registry[cls.__name__] = cls
class Point3D(Serializable):
    def __init__(self, x, y, z):
        super().__init__(x, y, z)
        self.x = x
        self.y = y
        self.z = z

For such reasons, keeping a class registry is often useful.
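
A quick round trip with the classes above:

point = Point3D(5, 9, -4)
data = point.serialize()
print(data)                    # {"class": "Point3D", "args": [5, 9, -4]}
copy = deserialize(data)
print(copy.x, copy.y, copy.z)  # 5 9 -4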

Item #50: Annotate class attributes with __set_name__.

Consider

class Grade:
    def __set_name__(self, _, name):
        self.name = name
        self.protected_name = '_' + name
    def __get__(self, instance, _):
        return getattr(instance, self.protected_name, 0)
    def __set__(self, instance, value):
        if not 0 <= value <= 100:
            raise ValueError
        setattr(instance, self.protected_name, value)
class GradeBook:
    grade_1 = Grade()
    grade_2 = Grade()
gb = GradeBook()
print(f'{gb.grade_1}, {gb.grade_2}, {gb.__dict__}')
gb.grade_1 = 91
gb.grade_2 = 98
print(f'{gb.grade_1}, {gb.grade_2}, {gb.__dict__}')

Compare this to Item 46.

Effective Python 41 - 45

Click here for the first post, which contains the context of this series.

Item #41: Consider composing functionality with mix-in classes.

Although you should avoid multiple inheritance, you can use mix-in classes: small classes that define only a handful of additional methods for child classes to inherit.

Consider the desire to represent an object as a dictionary:

class ToDictMixin:
    def to_dict(self):
        return self._traverse_dict(self.__dict__)
    
    def _traverse_dict(self, instance_dict):
        output = {}
        for key, value in instance_dict.items():
            output[key] = self._traverse(key, value)
        return output
    
    def _traverse(self, key, value):
        if isinstance(value, ToDictMixin):
            return value.to_dict()
        if isinstance(value, dict):
            return self._traverse_dict(value)
        if isinstance(value, list):
            return [self._traverse(key, i) for i in value]
        if hasattr(value, '__dict__'):
            return self._traverse_dict(value.__dict__)
        return value

and consider the desire to represent an object as a JSON string:

import json

class ToJsonMixin:
    @classmethod
    def from_json(cls, data):
        kwargs = json.loads(data)
        return cls(**kwargs)
    
    def to_json(self):
        return json.dumps(self.to_dict())

Then

class BinaryTree(ToDictMixin, ToJsonMixin):
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

has a lot of useful functionality.

Item #42: Prefer public attributes over private ones.

Consider

class MyClass:
    def __init__(self, value):
        self.__value = value


my_class = MyClass(5)

Then my_class.__value raises an AttributeError, but the value can still be accessed as my_class._MyClass__value, so Python does not truly enforce privacy.

Since double-underscore names make your code cumbersome and brittle, avoid them. Instead, do this

class MyClass:
    def __init__(self, value):
        self._value = value

and document that self._value is protected.

Item #43: Inherit from collections.abc for custom container types.

I can extend list.

class MyList(list):
    def __init__(self, elements):
        super().__init__(elements)
    
    def frequencies(self):
        count = {}
        for item in self:
            count[item] = count.get(item, 0) + 1
        return count

But what if I want to do this for something that is not inherently a list?

class BinaryNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right
    
    def _traverse(self):
        if self.left:
            yield from self.left._traverse()
        yield self
        if self.right:
            yield from self.right._traverse()
    
    def __getitem__(self, index):
        for i, item in enumerate(self._traverse()):
            if i == index:
                return item
        raise IndexError('binary tree index out of range')

However, calling len on a BinaryNode object will raise an exception. The question becomes: what is the smallest set of methods I need to implement to behave like a list? Enter collections.abc:

from collections.abc import Sequence


class BinaryNode(Sequence):
    # ...

Instantiating BinaryNode now raises a TypeError naming the abstract methods that still need to be implemented (here __len__); once they are defined, Sequence provides methods like index and count for free.

Item #44: Use plain attributes instead of setter and getter methods.

This is done in other languages:

class IWasACSharpDev:
    def __init__(self, value):
        self._value = value
    
    def get_value(self):
        return self._value
    
    def set_value(self, value):
        self._value = value

But this is not Pythonic. We use @property if we need to do this:

class IAmAPythonDev:
    def __init__(self, value):
        self._value = value

    @property    
    def value(self):
        return self._value
    
    @value.setter
    def value(self, value):
        self._value = value

Make sure property methods are fast; reserve slow or complex work for regular methods.

Item #45: Consider @property instead of refactoring attributes.

Consider

class Person:
    def __init__(self, age):
        self.age = age

suppose that the codebase sets and gets the ages of countless instances of this class, and suppose that a new law requires that a person's new age be three more than twice their old age. Only a small change needs to be made:

class Person:
    def __init__(self, age):
        self._age = age
    
    @property
    def age(self):
        return self._age * 2 + 3
    
    @age.setter
    def age(self, age):
        self._age = age