Effective Python 16 - 20

Click here for the first post, which contains the context of this series.

Item #16: Prefer get over in and KeyError to handle missing dictionary keys.

Prefer

my_result = my_dict.get(my_key, my_default_value)

over both

if my_key in my_dict:

my_result = my_dict[my_key]

else:

my_result = my_default_value

and

try:

my_result = my_dict[my_key]

except KeyError:

my_result = my_default_value

Item #17: Prefer defaultdict over setdefault to handle missing items in internal state.

Prefer

my_dict = defaultdict(set)

my_dict[my_key].add(my_value)

over

my_dict = {}

my_dict.setdefault(my_key, set()).add(my_value)

Note that setdefault always executes its second argument and is prone to difficult exception handling.

Item #18: Know how to construct key-dependent default values with missing.

If you need to handle a missing item by calling a function that:

takes at least one argument and
should be called only when necessary,

then defaultdict and setdefault, respectively, will not work. Instead, do something like this:

class MyDict:

def __missing__(self, key):

try:

value = open(key, 'a+b')

self[key] = value

return value

except OSError:

print(f'failed to open {key}')

raise

my_dict = MyDict()

file = my_dict[key]

Item #19: Never unpack more than three variables when functions return multiple values.

Avoid doing this since an accidental reorder is not difficult:

minimum, maximum, average, median, count = get_statistics(data)

Item #20: Prefer raising exceptions to returning None.

This looks like it works:

def careful_divide(a, b):

return a / b if b != 0 else None

c = careful_divide(a, b)

if not c:

print('division by 0')

But if a, b = 0, 1, then division by 0 is printed, which is undesired behavior. Instead, do something like this:

def careful_divide(a, b):

try:

return careful_divide(a, b)

except ZeroDivisionError:

raise ValueError('division by 0')

A Good Robot That Is Not So Good

You manufacture items and notice that an item is defective with probability 0.1%.

You consider purchasing a robot to help identify defective items.

The vendor claims that the robot says that

a defective item is defective with probability 98%, and
a non-defective item is non-defective with probability 99%.

Is this a good purchase?

Ultimately, you only care whether an item is defective if the robot says that it is defective.

Let $D$ and $N$ be the events that an item is defective and non-defective, and let $R_D$ and $R_N$ be the events that the robot says that an item is defective and non-defective.

Then, mathematically, you only care about $P(D|R_D)$ and know that

$P(D)=0.001$,
$P(R_D|D)=0.98$, and
$P(R_N|N)=0.99$.

It follows (using Bayes' theorem) that

$$\begin{align*}P(D|R_D)&=P(R_D|D)\frac{P(D)}{P(R_D)}\\\\&=P(R_D|D)\frac{P(D)}{P(R_D|D)P(D)+P(R_D|N)P(N)}\\\\&=P(R_D|D)\frac{P(D)}{P(R_D|D)P(D)+(1-P(R_N|N))(1-P(D))}\\\\&=0.98\frac{0.001}{0.98\cdot0.001+0.01\cdot0.999}\\\\&\approx\vphantom{\frac11}0.0893.\end{align*}$$

In other words, if the robot says that an item is defective, then the probability that it is defective is just under 9%.

How counter-intuitive is that?

Effective Python 11 - 15

Click here for the first post, which contains the context of this series.

Item #11: Know how to slice sequences.

>>> [1, 2, 3, 4, 5][1 : 3], [1, 2, 3, 4, 5][: -2]

([2, 3], [1, 2, 3])

Item #12: Avoid striding and slicing in a single expression.

Avoid

>>> [1, 2, 3, 4, 5][2::2]

[3, 5]

This does not work for all types, e.g., UTF-8.

Item #13: Prefer catch-all unpacking over slicing.

Let A = [1, 2, 3, 4]. Prefer

first, second, *rest = A

over

first, second, rest = A[0], A[1], A[2 :]

Item #14: Sort by complex criteria using the key parameter.

my_list.sort() does not work if the objects do not implement __lt__. This can be circumvented as follows: my_list.sort(key=lambda object: object.property), where property hopefully implements __lt__, e.g., it is an integer, a string, or user-defined.

Item #15: Be cautious when relying on dict insertion ordering.

A Python dictionary my_dict is implemented as a hash table. Therefore, for key in my_dict iterates its keys in an arbitrary order. Newer versions of Python do not have this problem.

Effective Python 06 - 10

Click here for the first post, which contains the context of this series.

Item #06: Prefer multiple assignment unpacking over indexing.

Prefer

first_name, middle_name, last_name = full_name

over

first_name = full_name[0]

middle_name = full_name[1]

last_name = full_name[2]

Item #07: Prefer enumerate over range.

Prefer

for i, item in enumerate(my_list):

some_function(i, item)

over

for i in range(len(my_list)):

some_function(i, my_list[i])

Item #08: Use zip to process iterators in parallel.

Prefer

for expected, actual in zip(expected_results, actual_results):

assert expected == actual

over

for i in range(len(expected_results)):

# error prone if len(expected_results) != len(actual_results)

assert expected_results[i] == actual_results[i]

Note that zip(A, B) iterates at most min(len(A), len(B)) times.

Item #09: Avoid else blocks after for and while loops.

Suppose that you want to check whether a list my_list contains an item target. Although this can be accomplished as follows

for item in my_list:

if item == target:

print('target found')

break

else:

print('target not found')

avoid doing this (perhaps using a flag).

Item #10: Prevent repetition with assignment expressions.

Consider the following common pattern

info = get_info()

if has_property(info):

do_something_with(info)

else:

do_something_else()

It suggests that info is used beyond the scope of if. If it is not, then remove the ambiguity as follows

if has_property(info := get_info()):

do_something_with(info)

else:

do_something_else()

This way, the scope of info is restricted to if.

Effective Python 01 - 05

I am reading the book Effective Python: 90 Specific Ways to Write Better Python (click here to see it) and will write 18 blog entries, 1 per week, each covering 5 ways. To keep everything bite-sized (no pun intended), I will write a synopsis for each way; many things will be left out, but I will try to capture the most important ones.

I will use Python 3 for the examples.

Item #01: Know which version of Python you are using.

This is achieved with the command $ python --version or at run-time:

import sys

print(sys.version)

Item #02: Follow the most recent PEP style guide.

It is extensive. Click here to learn more. Memorize it! A few highlights are: spaces over tabs, 4 spaces of indentation, def this_is_my_function(self):, class ThisIsMyClass:, THIS_IS_MY_GLOBAL_VARIABLE = foo, etc.

Item #03: Know the differences between bytes and str.

Python has two types that represent sequences of character data: bytes and str.

Instances of bytes contain raw, unsigned, 8-bit values that are often displayed in the ASCII encoding:

x = b'h\x65llo'

print(list(x), x, type(x))

>>>

[104, 101, 108, 108, 111] b'hello' <class, 'bytes'>

Instances of str contain Unicode code points that represent textual characters from human languages.

Item #04: Prefer interpolated f-strings over C-style format strings and str.format.

Let name = 'John Doe'. Prefer to do this: f'My name is {name}.' over 'My name is %s.' % name and 'My name is {0}.'.format(name).

This is a general rule. The documentation says that the logging library is optimized to use C-style strings; this could make a difference at large scales.

Item #05: Write helper functions instead of complex expressions.

Suppose that you have a Chess class that has a complicated function def play_best_move(self):. It is better to rewrite it as follows (to give a simple example):

def play_best_move(self):

legal_moves = self.get_legal_moves()

best_move = self.rank_moves_inc(legal_moves)[-1]

self.play_move(best_move)

Note that these are reusable functions. For example:

def play_worst_move(self):

legal_moves = self.get_legal_moves()

worst_move = self.rank_moves_inc(legal_moves)[0]

self.play_move(worst_move)

Munkres §51: Homotopy of Paths

◄ Munkres §52: The Fundamental Group

A basic problem of topology is determining whether two spaces are homeomorphic. If so, then a continuous mapping between them with continuous inverse exists. Otherwise, if one can find a topological property that holds for one but not the other, then they cannot be homeomorphic:

$\left(0,1\right)$ and $\left[0,1\right]$ are not homeomorphic because the latter is compact and the former is not.
$\mathbb R$ and the long line are not homeomorphic because the former has a countable basis and the latter does not.
$\mathbb R$ and $\mathbb R^2$ are not homeomorphic because deleting a point from the latter leaves a connected space while doing so from the former does not.

However, distinguishing between $\mathbb R^2$ and $\mathbb R^3$ is more involved: deleting a point from the latter leaves a simply connected space while doing so from the former does not.

To distinguish between more spaces, a more general tool, their fundamental groups, can be used: two spaces are homeomorphic if their fundamental groups are isomorphic. For example, if a space is simply connected, then its fundamental group is trivial.

These writings build the tools necessary to talk about the fundamental group.

Claim: If $h,h':X\to Y$; $k,k':Y\to Z$; $h\simeq h'$; and $k\simeq k'$, then $k\circ h\simeq k'\circ h'$.

Proof: $(x,t)\mapsto G(F(x,t),t)$ is a homotopy.

$$\tag*{$\blacksquare$}$$

Claim: $[X,I]$ is a singleton.

Proof: $(x,t)\mapsto(1-t)f(x)+tg(x)$ is a homotopy.

$$\tag*{$\blacksquare$}$$

Claim: If $Y$ is path-connected, then $[I,Y]$ is a singleton.

Proof: If $h$ is a path from $f(1)$ to $g(0)$, then $f,g\in[I,Y]$ are closed subpaths of and thus homotopic to $f*h*g$.

$$\tag*{$\blacksquare$}$$

Claim: $I$ and $\mathbb R$ are contractible.

Proof: $(x,t)\mapsto (1-t)x$ is a homotopy, i.e., $\text{id}_I$ and $\text{id}_\mathbb R$ are nulhomotopic to $0$.

$$\tag*{$\blacksquare$}$$

Claim: If $X$ is contractible, then $X$ is path-connected.

Proof: $F(x,\cdot)$ is a path from $x$ to $x_0$, where $F$ is a homotopy between $\text{id}_X$ and $x_0$.

$$\tag*{$\blacksquare$}$$

Claim: If $Y$ is contractible, then $[X,Y]$ is a singleton.

Proof: $\text{id}_Y\circ f=f$ is homotopic to $y_0\circ f=y_0$ since $\text{id}_Y$ is homotopic to $y_0$.

$$\tag*{$\blacksquare$}$$

Claim: If $X$ is contractible and $Y$ is path-connected, then $[X,Y]$ is a singleton.

Proof: $f\circ\text{id}_X=f$ is homotopic to $f\circ x_0=f(x_0)$ since $\text{id}_X$ is homotopic to $x_0$ and all constant maps are nulhomotopic since $Y$ is path-connected.

$$\tag*{$\blacksquare$}$$

Two Pathological Examples in Functional Analysis

I have not written here for a long time, and I just realized that I never published this exercise. It was from a class in functional analysis. Its purpose was to have us explicitly demonstrate the existence of a normed vector space that is not complete, i.e., not a Banach space, and of an invertible, bounded, linear operator whose inverse is not bounded.

Equip $\mathbb N$ with the discrete topology, let $c_c:=(C_c(\mathbb N), \|\cdot\|_\sup)$, i.e., the normed, linear space over $\mathbb C$ of continuous, compactly supported functions $\mathbb N\to\mathbb C$, and note that, since $f\in c_c$ is eventually zero,$$\|f\|_\sup=\max_{l\in\mathbb N}\{|f(l)|\}.$$For every $k\in\mathbb N$, define$$\begin{align}f_k:\mathbb N&\to\mathbb C\quad\text{by}\\l&\mapsto\begin{cases}l^{-1}&\text{ if }l\leq k\text{ and}\\0&\text{ otherwise},\end{cases}\end{align}$$note that $f_k\in c_c$, let $\varepsilon>0$, let $N>\varepsilon^{-1}$, let $m,n\in\mathbb N$, and assume, without loss of generality, that $m>n>N$. Then$$\|f_m-f_n\|_\sup=m^{-1}<N^{-1}<\varepsilon,$$i.e., $(f_k)_{k=1}^\infty$ is Cauchy.

Define $f:\mathbb N\to\mathbb C$ by $l\mapsto l^{-1}$, note that $f\not\in c_c$, let $\varepsilon>0$, let $N>\varepsilon^{-1}-1$, and let $k\in\mathbb N$ such that $k>N$. Then$$\|f_k-f\|_\sup=(k+1)^{-1}<(N+1)^{-1}<\varepsilon,$$i.e., $(f_k)_{k=1}^\infty$ converges to $f$.

Therefore, $c_c$ is not complete.

Define $$\begin{align}T:c_c&\to c_c\quad\text{by}\\f=(f_1,f_2,f_3,\dots)&\mapsto(\frac{f_1}{1},\frac{f_2}{2},\frac{f_3}{3},\dots).\end{align}$$

Let $f,g\in c_c$ and let $\lambda\in\mathbb C$. Then$$\begin{align}T(\lambda f+g)&=(\frac{\lambda f_1+g_1}{1},\frac{\lambda f_2+g_2}{2},\frac{\lambda f_3+g_3}{3},\dots)\\&=(\lambda\frac{f_1}{1}+\frac{g_1}{1},\lambda\frac{f_2}{2}+\frac{g_2}{2},\lambda\frac{f_3}{3}+\frac{g_3}{3},\dots)\\&=\lambda(\frac{f_1}{1},\frac{f_2}{2},\frac{f_3}{3},\dots)+(\frac{g_1}{1},\frac{g_2}{2},\frac{g_3}{3},\dots)\\&=\vphantom{\frac11}\lambda T(f)+T(g).\end{align}$$

Let $f=(f_1,f_2,\dots)\in c_c$ and note that$$\|T(f)\|_\sup=\max_{l\in\mathbb N}\{\frac{|f_l|}{l}\}\leq\max_{l\in\mathbb N}\{\frac{\|f\|_\sup}{l}\}=\|f\|_\sup.$$

Since $T$ is linear, $T$ is injective.

Let $f=(f_1,f_2,f_3,\dots)\in c_c$ and $g=(1f_1,2f_2,3f_3,\dots)\in c_c$. Then $T(g)=f$.

Define $S:c_c\to c_c$ by $f=(f_1,f_2,f_3,\dots)\mapsto(1f_1,2f_2,3f_3,\dots)$. Then $S\circ T(f)=T\circ S(f)=f$ so that $S=T^{-1}$.

Suppose that $M$ is such that $\|T^{-1}(f)\|_\sup\leq M\|f\|_\sup$ for every $f\in c_c$, let $k\in\mathbb N$ such that $k>M$, and define $g:\mathbb N\to\mathbb C$ by $l\mapsto1$ if $l=k$ and $l\mapsto0$ otherwise. Then$$\|T^{-1}(g)\|_\sup=\max_{l\in\mathbb N}\{|lg_l|\}=k=k\|g\|_\sup>M\|g\|_\sup,$$which is a contradiction.

Math Crumbs

A Blog About Mathematics and Programming • 🎉 Celebrating 10 Years 🎉

Menu

Effective Python 16 - 20

Item #16: Prefer get over in and KeyError to handle missing dictionary keys.

Item #17: Prefer defaultdict over setdefault to handle missing items in internal state.

Item #18: Know how to construct key-dependent default values with missing.

Item #19: Never unpack more than three variables when functions return multiple values.

Item #20: Prefer raising exceptions to returning None.

A Good Robot That Is Not So Good

Effective Python 11 - 15

Item #11: Know how to slice sequences.

Item #12: Avoid striding and slicing in a single expression.

Item #13: Prefer catch-all unpacking over slicing.

Item #14: Sort by complex criteria using the key parameter.

Item #15: Be cautious when relying on dict insertion ordering.

Effective Python 06 - 10

Item #06: Prefer multiple assignment unpacking over indexing.

Item #07: Prefer enumerate over range.

Item #08: Use zip to process iterators in parallel.

Item #09: Avoid else blocks after for and while loops.

Item #10: Prevent repetition with assignment expressions.

Effective Python 01 - 05

Item #01: Know which version of Python you are using.

Item #02: Follow the most recent PEP style guide.

Item #03: Know the differences between bytes and str.

Item #04: Prefer interpolated f-strings over C-style format strings and str.format.

Item #05: Write helper functions instead of complex expressions.

Munkres §51: Homotopy of Paths

Two Pathological Examples in Functional Analysis

Blog Archive

Menu

Item #16: Prefer get over in and KeyError to handle missing dictionary keys.

Item #17: Prefer defaultdict over setdefault to handle missing items in internal state.

Item #18: Know how to construct key-dependent default values with __missing__.

Item #19: Never unpack more than three variables when functions return multiple values.

Item #20: Prefer raising exceptions to returning None.

Item #11: Know how to slice sequences.

Item #12: Avoid striding and slicing in a single expression.

Item #13: Prefer catch-all unpacking over slicing.

Item #14: Sort by complex criteria using the key parameter.

Item #15: Be cautious when relying on dict insertion ordering.

Item #06: Prefer multiple assignment unpacking over indexing.

Item #07: Prefer enumerate over range.

Item #08: Use zip to process iterators in parallel.

Item #09: Avoid else blocks after for and while loops.

Item #10: Prevent repetition with assignment expressions.

Item #01: Know which version of Python you are using.

Item #02: Follow the most recent PEP style guide.

Item #03: Know the differences between bytes and str.

Item #04: Prefer interpolated f-strings over C-style format strings and str.format.

Item #05: Write helper functions instead of complex expressions.

Blog Archive

Item #18: Know how to construct key-dependent default values with missing.