Effective Python 16 - 20

Click here for the first post, which contains the context of this series.

Item #16: Prefer get over in and KeyError to handle missing dictionary keys.

Prefer

my_result = my_dict.get(my_key, my_default_value)

over both

if my_key in my_dict:
    my_result = my_dict[my_key]
else:
    my_result = my_default_value

and

try:
    my_result = my_dict[my_key]
except KeyError:
    my_result = my_default_value

Item #17: Prefer defaultdict over setdefault to handle missing items in internal state.

Prefer

my_dict = defaultdict(set)
my_dict[my_key].add(my_value)

over

my_dict = {}
my_dict.setdefault(my_key, set()).add(my_value)

Note that setdefault always executes its second argument and is prone to difficult exception handling.

Item #18: Know how to construct key-dependent default values with __missing__.

If you need to handle a missing item by calling a function that:
  • takes at least one argument and
  • should be called only when necessary,
then defaultdict and setdefault, respectively, will not work. Instead, do something like this:

class MyDict:
    def __missing__(self, key):
        try:
            value = open(key, 'a+b')
            self[key] = value
            return value
        except OSError:
            print(f'failed to open {key}')
            raise

my_dict = MyDict()
file = my_dict[key]

Item #19: Never unpack more than three variables when functions return multiple values.

Avoid doing this since an accidental reorder is not difficult:

minimum, maximum, average, median, count = get_statistics(data)

Item #20: Prefer raising exceptions to returning None.

This looks like it works:

def careful_divide(a, b):
    return a / b if b != 0 else None

c = careful_divide(a, b)
if not c:
    print('division by 0')

But if a, b = 0, 1, then division by 0 is printed, which is undesired behavior. Instead, do something like this:

def careful_divide(a, b):
    try:
        return careful_divide(a, b)
    except ZeroDivisionError:
        raise ValueError('division by 0')

A Good Robot That Is Not So Good

You manufacture items and notice that an item is defective with probability 0.1%.

You consider purchasing a robot to help identify defective items.

The vendor claims that the robot says that
  • a defective item is defective with probability 98%, and
  • a non-defective item is non-defective with probability 99%.
Is this a good purchase?

Ultimately, you only care whether an item is defective if the robot says that it is defective.

Let \(D\) and \(N\) be the events that an item is defective and non-defective, and let \(R_D\) and \(R_N\) be the events that the robot says that an item is defective and non-defective.

Then, mathematically, you only care about \(P(D|R_D)\) and know that
  • \(P(D)=0.001\),
  • \(P(R_D|D)=0.98\), and
  • \(P(R_N|N)=0.99\).
It follows (using Bayes' theorem) that

$$\begin{align*}P(D|R_D)&=P(R_D|D)\frac{P(D)}{P(R_D)}\\\\&=P(R_D|D)\frac{P(D)}{P(R_D|D)P(D)+P(R_D|N)P(N)}\\\\&=P(R_D|D)\frac{P(D)}{P(R_D|D)P(D)+(1-P(R_N|N))(1-P(D))}\\\\&=0.98\frac{0.001}{0.98\cdot0.001+0.01\cdot0.999}\\\\&\approx\vphantom{\frac11}0.0893.\end{align*}$$

In other words, if the robot says that an item is defective, then the probability that it is defective is just under 9%.

How counter-intuitive is that?



Effective Python 11 - 15

Click here for the first post, which contains the context of this series.

Item #11: Know how to slice sequences.

>>> [1, 2, 3, 4, 5][1 : 3], [1, 2, 3, 4, 5][: -2]
([2, 3], [1, 2, 3])

Item #12: Avoid striding and slicing in a single expression.

Avoid

>>> [1, 2, 3, 4, 5][2::2]
[3, 5]

This does not work for all types, e.g., UTF-8.

Item #13: Prefer catch-all unpacking over slicing.

Let A = [1, 2, 3, 4]. Prefer

first, second, *rest = A

over

first, second, rest = A[0], A[1], A[2 :]

Item #14: Sort by complex criteria using the key parameter.

my_list.sort() does not work if the objects do not implement __lt__. This can be circumvented as follows: my_list.sort(key=lambda object: object.property), where property hopefully implements __lt__, e.g., it is an integer, a string, or user-defined.

Item #15: Be cautious when relying on dict insertion ordering.

A Python dictionary my_dict is implemented as a hash table. Therefore, for key in my_dict iterates its keys in an arbitrary order. Newer versions of Python do not have this problem.

Effective Python 06 - 10

Click here for the first post, which contains the context of this series.

Item #06: Prefer multiple assignment unpacking over indexing.

Prefer

    first_name, middle_name, last_name = full_name

over

    first_name = full_name[0]
    middle_name = full_name[1]
    last_name = full_name[2]

Item #07: Prefer enumerate over range.

Prefer

    for i, item in enumerate(my_list):
        some_function(i, item)

over

    for i in range(len(my_list)):
        some_function(i, my_list[i])

Item #08: Use zip to process iterators in parallel.

Prefer

    for expected, actual in zip(expected_results, actual_results):
        assert expected == actual

over

    for i in range(len(expected_results)):
        # error prone if len(expected_results) != len(actual_results)
        assert expected_results[i] == actual_results[i]

Note that zip(A, B) iterates at most min(len(A), len(B)) times.

Item #09: Avoid else blocks after for and while loops.

Suppose that you want to check whether a list my_list contains an item target. Although this can be accomplished as follows

    for item in my_list:
        if item == target:
            print('target found')
            break
    else:
        print('target not found')

avoid doing this (perhaps using a flag).

Item #10: Prevent repetition with assignment expressions.

Consider the following common pattern

    info = get_info()
    if has_property(info):
        do_something_with(info)
    else:
        do_something_else()

It suggests that info is used beyond the scope of if. If it is not, then remove the ambiguity as follows

    if has_property(info := get_info()):
        do_something_with(info)
    else:
        do_something_else()

This way, the scope of info is restricted to if.

Effective Python 01 - 05

I am reading the book Effective Python: 90 Specific Ways to Write Better Python (click here to see it) and will write 18 blog entries, 1 per week, each covering 5 ways. To keep everything bite-sized (no pun intended), I will write a synopsis for each way; many things will be left out, but I will try to capture the most important ones.

I will use Python 3 for the examples.

Item #01: Know which version of Python you are using.

This is achieved with the command $ python --version or at run-time:

    import sys
    print(sys.version)

Item #02: Follow the most recent PEP style guide.

It is extensive. Click here to learn more. Memorize it! A few highlights are: spaces over tabs, 4 spaces of indentation, def this_is_my_function(self):, class ThisIsMyClass:, THIS_IS_MY_GLOBAL_VARIABLE = foo, etc.

Item #03: Know the differences between bytes and str.

Python has two types that represent sequences of character data: bytes and str.

Instances of bytes contain raw, unsigned, 8-bit values that are often displayed in the ASCII encoding:

    x = b'h\x65llo'
    print(list(x), x, type(x))
    >>>
    [104, 101, 108, 108, 111] b'hello' <class, 'bytes'>

Instances of str contain Unicode code points that represent textual characters from human languages.

Item #04: Prefer interpolated f-strings over C-style format strings and str.format.

Let name = 'John Doe'. Prefer to do this: f'My name is {name}.' over 'My name is %s.' % name and 'My name is {0}.'.format(name).

This is a general rule. The documentation says that the logging library is optimized to use C-style strings; this could make a difference at large scales.

Item #05: Write helper functions instead of complex expressions.

Suppose that you have a Chess class that has a complicated function def play_best_move(self):. It is better to rewrite it as follows (to give a simple example):

    def play_best_move(self):
        legal_moves = self.get_legal_moves()
        best_move = self.rank_moves_inc(legal_moves)[-1]
        self.play_move(best_move)

Note that these are reusable functions. For example:

    def play_worst_move(self):
        legal_moves = self.get_legal_moves()
        worst_move = self.rank_moves_inc(legal_moves)[0]
        self.play_move(worst_move)

Munkres §51: Homotopy of Paths


A basic problem of topology is determining whether two spaces are homeomorphic. If so, then a continuous mapping between them with continuous inverse exists. Otherwise, if one can find a topological property that holds for one but not the other, then they cannot be homeomorphic:

  • $\left(0,1\right)$ and $\left[0,1\right]$ are not homeomorphic because the latter is compact and the former is not.
  • $\mathbb R$ and the long line are not homeomorphic because the former has a countable basis and the latter does not.
  • $\mathbb R$ and $\mathbb R^2$ are not homeomorphic because deleting a point from the latter leaves a connected space while doing so from the former does not.
However, distinguishing between $\mathbb R^2$ and $\mathbb R^3$ is more involved: deleting a point from the latter leaves a simply connected space while doing so from the former does not.

To distinguish between more spaces, a more general tool, their fundamental groups, can be used: two spaces are homeomorphic if their fundamental groups are isomorphic. For example, if a space is simply connected, then its fundamental group is trivial.

These writings build the tools necessary to talk about the fundamental group.

Claim: If $h,h':X\to Y$; $k,k':Y\to Z$; $h\simeq h'$; and $k\simeq k'$, then $k\circ h\simeq k'\circ h'$.

Proof: $(x,t)\mapsto G(F(x,t),t)$ is a homotopy.
$$\tag*{$\blacksquare$}$$

Claim: $[X,I]$ is a singleton.

Proof: $(x,t)\mapsto(1-t)f(x)+tg(x)$ is a homotopy.
$$\tag*{$\blacksquare$}$$
Claim: If $Y$ is path-connected, then $[I,Y]$ is a singleton.

Proof: If $h$ is a path from $f(1)$ to $g(0)$, then $f,g\in[I,Y]$ are closed subpaths of and thus homotopic to $f*h*g$.
$$\tag*{$\blacksquare$}$$
Claim: $I$ and $\mathbb R$ are contractible.

Proof: $(x,t)\mapsto (1-t)x$ is a homotopy, i.e., $\text{id}_I$ and $\text{id}_\mathbb R$ are nulhomotopic to $0$.
$$\tag*{$\blacksquare$}$$
Claim: If $X$ is contractible, then $X$ is path-connected.

Proof: $F(x,\cdot)$ is a path from $x$ to $x_0$, where $F$ is a homotopy between $\text{id}_X$ and $x_0$.
$$\tag*{$\blacksquare$}$$
Claim: If $Y$ is contractible, then $[X,Y]$ is a singleton.

Proof: $\text{id}_Y\circ f=f$ is homotopic to $y_0\circ f=y_0$ since $\text{id}_Y$ is homotopic to $y_0$.
$$\tag*{$\blacksquare$}$$
Claim: If $X$ is contractible and $Y$ is path-connected, then $[X,Y]$ is a singleton.

Proof: $f\circ\text{id}_X=f$ is homotopic to $f\circ x_0=f(x_0)$ since $\text{id}_X$ is homotopic to $x_0$ and all constant maps are nulhomotopic since $Y$ is path-connected.
$$\tag*{$\blacksquare$}$$

Two Pathological Examples in Functional Analysis

I have not written here for a long time, and I just realized that I never published this exercise. It was from a class in functional analysis. Its purpose was to have us explicitly demonstrate the existence of a normed vector space that is not complete, i.e., not a Banach space, and of an invertible, bounded, linear operator whose inverse is not bounded.

Equip $\mathbb N$ with the discrete topology, let $c_c:=(C_c(\mathbb N), \|\cdot\|_\sup)$, i.e., the normed, linear space over $\mathbb C$ of continuous, compactly supported functions $\mathbb N\to\mathbb C$, and note that, since $f\in c_c$ is eventually zero,$$\|f\|_\sup=\max_{l\in\mathbb N}\{|f(l)|\}.$$For every $k\in\mathbb N$, define$$\begin{align}f_k:\mathbb N&\to\mathbb C\quad\text{by}\\l&\mapsto\begin{cases}l^{-1}&\text{ if }l\leq k\text{ and}\\0&\text{ otherwise},\end{cases}\end{align}$$note that $f_k\in c_c$, let $\varepsilon>0$, let $N>\varepsilon^{-1}$, let $m,n\in\mathbb N$, and assume, without loss of generality, that $m>n>N$. Then$$\|f_m-f_n\|_\sup=m^{-1}<N^{-1}<\varepsilon,$$i.e., $(f_k)_{k=1}^\infty$ is Cauchy.

Define $f:\mathbb N\to\mathbb C$ by $l\mapsto l^{-1}$, note that $f\not\in c_c$, let $\varepsilon>0$, let $N>\varepsilon^{-1}-1$, and let $k\in\mathbb N$ such that $k>N$. Then$$\|f_k-f\|_\sup=(k+1)^{-1}<(N+1)^{-1}<\varepsilon,$$i.e., $(f_k)_{k=1}^\infty$ converges to $f$.

Therefore, $c_c$ is not complete.

Define $$\begin{align}T:c_c&\to c_c\quad\text{by}\\f=(f_1,f_2,f_3,\dots)&\mapsto(\frac{f_1}{1},\frac{f_2}{2},\frac{f_3}{3},\dots).\end{align}$$

Let $f,g\in c_c$ and let $\lambda\in\mathbb C$. Then$$\begin{align}T(\lambda f+g)&=(\frac{\lambda f_1+g_1}{1},\frac{\lambda f_2+g_2}{2},\frac{\lambda f_3+g_3}{3},\dots)\\&=(\lambda\frac{f_1}{1}+\frac{g_1}{1},\lambda\frac{f_2}{2}+\frac{g_2}{2},\lambda\frac{f_3}{3}+\frac{g_3}{3},\dots)\\&=\lambda(\frac{f_1}{1},\frac{f_2}{2},\frac{f_3}{3},\dots)+(\frac{g_1}{1},\frac{g_2}{2},\frac{g_3}{3},\dots)\\&=\vphantom{\frac11}\lambda T(f)+T(g).\end{align}$$

Let $f=(f_1,f_2,\dots)\in c_c$ and note that$$\|T(f)\|_\sup=\max_{l\in\mathbb N}\{\frac{|f_l|}{l}\}\leq\max_{l\in\mathbb N}\{\frac{\|f\|_\sup}{l}\}=\|f\|_\sup.$$

Since $T$ is linear, $T$ is injective.

Let $f=(f_1,f_2,f_3,\dots)\in c_c$ and $g=(1f_1,2f_2,3f_3,\dots)\in c_c$. Then $T(g)=f$.

Define $S:c_c\to c_c$ by $f=(f_1,f_2,f_3,\dots)\mapsto(1f_1,2f_2,3f_3,\dots)$. Then $S\circ T(f)=T\circ S(f)=f$ so that $S=T^{-1}$.

Suppose that $M$ is such that $\|T^{-1}(f)\|_\sup\leq M\|f\|_\sup$ for every $f\in c_c$, let $k\in\mathbb N$ such that $k>M$, and define $g:\mathbb N\to\mathbb C$ by $l\mapsto1$ if $l=k$ and $l\mapsto0$ otherwise. Then$$\|T^{-1}(g)\|_\sup=\max_{l\in\mathbb N}\{|lg_l|\}=k=k\|g\|_\sup>M\|g\|_\sup,$$which is a contradiction.