Effective Python 36 - 40

Click here for the first post, which contains the context of this series.

Item #36: Consider itertools for working with iterators and generators.

These are the most important ones:
  • chain
  • repeat
  • cycle
  • tee
  • zip_longest
  • islice
  • takewhile
  • dropwhile
  • filterfalse
  • accumulate
  • product
  • permutations
  • combinations
  • combinations_with_replacement
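A quick sketch of a few of these in action:

```python
import itertools

# chain: concatenate iterables lazily
print(list(itertools.chain([1, 2], [3, 4])))  # [1, 2, 3, 4]

# islice: take a slice of an iterator without materializing it
print(list(itertools.islice(itertools.cycle('ab'), 5)))  # ['a', 'b', 'a', 'b', 'a']

# takewhile: consume items while a predicate holds
print(list(itertools.takewhile(lambda x: x < 3, [1, 2, 3, 1])))  # [1, 2]

# accumulate: running totals
print(list(itertools.accumulate([1, 2, 3, 4])))  # [1, 3, 6, 10]

# combinations from the combinatorics group
print(list(itertools.combinations('abc', 2)))  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```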

Item #37: Compose classes instead of nesting many levels of built-in types.

Consider

from collections import defaultdict


class Gradebook:
    def __init__(self):
        self._grades = {}

    def add_student(self, name):
        self._grades[name] = defaultdict(list)
    
    def report_grade(self, name, subject, score, weight, notes):
        self._grades[name][subject].append((score, weight, notes))
    
    def average_grade(self, name):
        return sum(sum(score * weight for score, weight, _ in grades) for grades in self._grades[name].values()) / len(self._grades[name])

Note that it nests dictionaries and long tuples, which quickly becomes confusing. Instead, do this:

from collections import defaultdict, namedtuple

Grade = namedtuple('Grade', 'score weight')


class Subject:
    def __init__(self):
        self._grades = []
    
    def report_grade(self, score, weight):
        self._grades.append(Grade(score, weight))
    
    def average_grade(self):
        return sum(grade.score * grade.weight for grade in self._grades)


class Student:
    def __init__(self):
        self._subjects = defaultdict(Subject)
    
    def get_subject(self, name):
        return self._subjects[name]
    
    def average_grade(self):
        return sum(subject.average_grade() for subject in self._subjects.values()) / len(self._subjects)


class Gradebook:
    def __init__(self):
        self._students = defaultdict(Student)
    
    def get_student(self, name):
        return self._students[name]

Although it is longer, it is easier to read and extend.

Item #38: Accept functions instead of classes for simple interfaces.

Python has first-class functions, which means that "functions and methods can be passed around and referenced like any other value in the language":

def my_key(x):
    return len(x)


my_list = ['Socrates', 'Archimedes', 'Plato', 'Aristotle']
my_list.sort(key=my_key)

But if you want the key function to maintain state across calls, you can use a class with a __call__ method:

class MyKey:
    def __init__(self):
        self.count = 0
    
    def __call__(self, x):
        self.count += 1
        return len(x)


my_key = MyKey()

my_list = ['Socrates', 'Archimedes', 'Plato', 'Aristotle']
my_list.sort(key=my_key)
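Since sort computes the key once per element, the callable's counter records exactly how many items were inspected. A small check, repeating the class above so the snippet runs standalone:

```python
class MyKey:
    def __init__(self):
        self.count = 0

    def __call__(self, x):
        self.count += 1
        return len(x)


my_key = MyKey()
my_list = ['Socrates', 'Archimedes', 'Plato', 'Aristotle']
my_list.sort(key=my_key)

print(my_key.count)  # 4: sort called the key once per element
print(my_list)       # ['Plato', 'Socrates', 'Aristotle', 'Archimedes']
```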

Item #39: Use @classmethod polymorphism to construct objects generically.

The following script shows the pattern: each subclass provides its own create_animals constructor, and @classmethod passes the subclass itself in as cls:

class Animal:
    def __init__(self, name):
        self.name = name
    
    def sound(self):
        raise NotImplementedError

    @classmethod
    def create_animals(cls):
        raise NotImplementedError


class Dog(Animal):
    def sound(self):
        return f'{self.name} says woof'
    
    @classmethod
    def create_animals(cls):
        return [cls(name) for name in ['Max', 'Buddy', 'Charlie']]


class Cat(Animal):
    def sound(self):
        return f'{self.name} says meow'
    
    @classmethod
    def create_animals(cls):
        return [cls(name) for name in ['Simba', 'Milo', 'Tiger']]


for animal in Dog.create_animals() + Cat.create_animals():
    print(animal.sound())

Item #40: Initialize parent classes with super.

Although multiple inheritance is best avoided, consider the following script, which exhibits diamond inheritance:

class MyBaseClass:
    def __init__(self, value):
        self.value = value


class TimesTwo(MyBaseClass):
    def __init__(self, value):
        MyBaseClass.__init__(self, value)
        self.value *= 2


class PlusFive(MyBaseClass):
    def __init__(self, value):
        MyBaseClass.__init__(self, value)
        self.value += 5


class MyClass(TimesTwo, PlusFive):
    def __init__(self, value):
        TimesTwo.__init__(self, value) # !
        PlusFive.__init__(self, value)


print(MyClass(3).value) # 8

This does not work as expected: the call to PlusFive.__init__ runs MyBaseClass.__init__ a second time, which resets self.value and wipes out the doubling from the indicated line, so the script prints 8 instead of combining both behaviors. The correct way to achieve this is to use super:

class MyBaseClass:
    def __init__(self, value):
        self.value = value


class TimesTwo(MyBaseClass):
    def __init__(self, value):
        super().__init__(value)
        self.value *= 2


class PlusFive(MyBaseClass):
    def __init__(self, value):
        super().__init__(value)
        self.value += 5


class MyClass(TimesTwo, PlusFive):
    def __init__(self, value):
        super().__init__(value)


print(MyClass(3).value) # 16: (3 + 5) * 2, each __init__ runs exactly once
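With super, the method resolution order guarantees that MyBaseClass.__init__ runs exactly once; you can inspect that order with mro(). A small check on the classes above:

```python
class MyBaseClass:
    def __init__(self, value):
        self.value = value


class TimesTwo(MyBaseClass):
    def __init__(self, value):
        super().__init__(value)
        self.value *= 2


class PlusFive(MyBaseClass):
    def __init__(self, value):
        super().__init__(value)
        self.value += 5


class MyClass(TimesTwo, PlusFive):
    def __init__(self, value):
        super().__init__(value)


# The MRO determines the order of the super() chain:
print([c.__name__ for c in MyClass.mro()])
# ['MyClass', 'TimesTwo', 'PlusFive', 'MyBaseClass', 'object']

# MyBaseClass sets 3, PlusFive adds 5, TimesTwo doubles: (3 + 5) * 2
print(MyClass(3).value)  # 16
```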

Effective Python 31 - 35


Item #31: Be defensive when iterating over arguments.

Consider

def normalize(X):
    s = sum(X)
    return [x / s for x in X]

normalize works as expected if X is a container but not if X is an iterator (such as a generator): sum(X) exhausts it, so the comprehension that follows sees no items. Address this by detecting iterators with iter(X) is X or isinstance(X, Iterator), where Iterator is imported from collections.abc.
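A minimal sketch of the defensive version of the function above:

```python
from collections.abc import Iterator


def normalize(X):
    if isinstance(X, Iterator):  # equivalently: iter(X) is X
        raise TypeError('Must supply a container, not an iterator')
    s = sum(X)
    return [x / s for x in X]


print(normalize([1, 1, 2]))  # [0.25, 0.25, 0.5]

try:
    normalize(x for x in [1, 1, 2])
except TypeError as e:
    print(e)  # Must supply a container, not an iterator
```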

Item #32: Consider generator expressions for large list comprehensions.

Let X be an extraordinarily large iterable. Then

for y in [f(x) for x in X]: pass

will load an extraordinarily large object into memory. On the other hand,

for y in (f(x) for x in X): pass

does not have this problem.

Item #33: Compose multiple generators with yield from.

def my_gen():
    yield from gen_1()
    yield from gen_2()
    yield from gen_3()

is shorthand for and performs better than

def my_gen():
    for i in gen_1():
        yield i
    for i in gen_2():
        yield i
    for i in gen_3():
        yield i
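A concrete, runnable version of the idea (gen_1 through gen_3 stand in for any generators; here they just wrap ranges):

```python
def gen_1():
    yield from range(2)      # 0, 1


def gen_2():
    yield from range(2, 4)   # 2, 3


def gen_3():
    yield from range(4, 6)   # 4, 5


def my_gen():
    yield from gen_1()
    yield from gen_2()
    yield from gen_3()


print(list(my_gen()))  # [0, 1, 2, 3, 4, 5]
```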

Item #34: Avoid injecting data into generators with send.

Consider

def double_inputs():
    while True:
        x = yield
        yield x * 2


gen = double_inputs()

next(gen)
print(gen.send(10))

next(gen)
print(gen.send(6))

next(gen)
print(gen.send(94.3))

>>>

20
12
188.6

Avoid doing this.

Item #35: Avoid causing state transitions in generators with throw.

Consider

def my_gen():
    i = 0
    while i < 10:
        try:
            i += 1
            yield i
        except GeneratorExit:
            return
        except BaseException:
            i = -1


it = my_gen()

print(next(it))
print(next(it))
print(next(it))
it.throw(BaseException())
print(next(it))

>>>

1
2
3
1

Avoid doing this.

Effective Python 26 - 30


Item #26: Define function decorators with functools.wraps.

Function decorators add functionality before and after the execution of the functions that they decorate. For example,

def fib(n):
    if n < 3:
        return 1
    return fib(n - 1) + fib(n - 2)

fib(500)

will probably never terminate. An idea is to cache:

cache = {}

def fib(n):
    if n in cache:
        return cache[n]
    if n < 3:
        return 1
    cache[n] = fib(n - 1) + fib(n - 2)
    return cache[n]

fib(500)

But what if you want to do this to multiple functions? An idea is to use decorators:

from functools import wraps

def my_cache(func):
    cache = {}
    @wraps(func)
    def wrapper(n):
        if n in cache:
            return cache[n]
        cache[n] = func(n)
        return cache[n]
    return wrapper

@my_cache # decorator
def fib(n):
    if n < 3:
        return 1
    return fib(n - 1) + fib(n - 2)

@my_cache
def new_fib(n):
    if n < 4:
        return 1
    return new_fib(n - 1) + new_fib(n - 2) + new_fib(n - 3)

fib(500)
new_fib(500)

You can apply more than one decorator to a function; note that they are applied bottom-up, so the decorator closest to the function wraps it first.
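A small sketch of stacking two decorators (shout and exclaim are hypothetical names) to show that the one closest to the function wraps first:

```python
from functools import wraps


def shout(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper


def exclaim(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs) + '!'
    return wrapper


@shout
@exclaim
def greet(name):
    return f'hello {name}'


# exclaim wraps greet first, then shout wraps the result,
# so the '!' is appended before the upper-casing.
print(greet('world'))   # HELLO WORLD!
print(greet.__name__)   # greet, thanks to functools.wraps
```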

Item #27: Use comprehensions instead of map and filter.

Let

a = [1, 2, 3, 4, 5, 6, 7, 8, 9]

Suppose that you are doing this:

b = []
for c in a:
    if c % 2:
        b.append(c ** 2)

A better way is to use a comprehension:

b = [c ** 2 for c in a if c % 2]

Avoid using map and filter:

b = list(map(lambda c: c ** 2, filter(lambda c: c % 2, a)))

Comprehensions can also be used with dict and set.
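For instance, dict and set comprehensions use the same syntax with braces (continuing with the list a from above):

```python
a = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# dict comprehension: map each odd number to its square
squares = {c: c ** 2 for c in a if c % 2}
print(squares)  # {1: 1, 3: 9, 5: 25, 7: 49, 9: 81}

# set comprehension: the distinct remainders modulo 3
remainders = {c % 3 for c in a}
print(remainders)  # {0, 1, 2}
```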

Item #28: Avoid more than two control sub-expressions in comprehensions.

You can stack comprehensions:

a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [[x, y] for x in a if x % 2 for y in a if not y % 2]

Avoid stacking more than two control sub-expressions, and note that the for clauses run left to right, like nested loops with the first for outermost.
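A small check that a stacked comprehension matches the equivalent nested loops, with the first for clause outermost:

```python
a = [1, 2, 3, 4]

comp = [[x, y] for x in a if x % 2 for y in a if not y % 2]

# Equivalent nested loops:
loops = []
for x in a:
    if x % 2:
        for y in a:
            if not y % 2:
                loops.append([x, y])

print(comp)           # [[1, 2], [1, 4], [3, 2], [3, 4]]
print(comp == loops)  # True
```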

Item #29: Avoid repeated work in comprehensions by using assignment expressions.

Consider

[expensive_function(x) for x in X if meets_condition(expensive_function(x))]

This is repeated work because expensive_function could be called twice for each x. Instead, use an assignment expression:

[result for x in X if meets_condition(result := expensive_function(x))]

Note that result leaks out of the scope of the comprehension.
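A small demonstration of the leak (expensive_function and meets_condition stand in for anything; here they are trivial):

```python
def expensive_function(x):
    return x * 10


def meets_condition(value):
    return value > 15


X = [1, 2, 3]
results = [result for x in X if meets_condition(result := expensive_function(x))]

print(results)  # [20, 30]
print(result)   # 30: the walrus target leaks into the enclosing scope
```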

Item #30: Consider generators instead of returning lists.

Consider a function that returns a list of anagrams of a word:

def get_anagrams(word, anagram='', is_free=None):
    if is_free is None:
        is_free = [True for _ in word]
    if not any(is_free):
        return [anagram]
    else:
        anagrams = []
        for i, _ in enumerate(is_free):
            if is_free[i]:
                is_free[i] = False
                anagrams += get_anagrams(word, anagram + word[i], is_free)
                is_free[i] = True
        return anagrams

Suppose that you want to print the first 10 anagrams of the word "incomprehensible":

for i, anagram in enumerate(get_anagrams('incomprehensible')):
    print(anagram)
    if i == 9:
        break

This will probably never terminate: get_anagrams('incomprehensible') attempts to build a list of all 16! ≈ 2.09e+13 anagrams of this word before the loop can even start. A generator does not have this problem:

def get_anagrams(word, anagram='', is_free=None):
    if is_free is None:
        is_free = [True for _ in word]
    if not any(is_free):
        yield anagram
    else:
        for i, _ in enumerate(is_free):
            if is_free[i]:
                is_free[i] = False
                yield from get_anagrams(word, anagram + word[i], is_free)
                is_free[i] = True

Effective Python 21 - 25


Item #21: Understand how closures interact with variable scope.

Suppose that you have a list L of numbers and a list G of important numbers. You want to sort L, giving priority to the numbers that are in G, and also want to know whether any number in L is in G:

def my_sort(L, G):
    flag = False
    def helper(x):
        if x in G:
            flag = True
            return (0, x)
        return (1, x)
    L.sort(key=helper)
    return flag

This sorts L as expected but returns False: the assignment flag = True inside helper creates a new local variable because of Python's scoping rules, leaving the outer flag untouched. The fix is the keyword nonlocal:

def my_sort(L, G):
    flag = False
    def helper(x):
        nonlocal flag
        if x in G:
            flag = True
            return (0, x)
        return (1, x)
    L.sort(key=helper)
    return flag

However, it is better practice to wrap state in a class:

class MySort:
    def __init__(self, G):
        self.G = G
        self.flag = False

    def __call__(self, x):
        if x in self.G:
            self.flag = True
            return (0, x)
        return (1, x)

def my_sort(L, G):
    sorter = MySort(G)
    L.sort(key=sorter)
    return sorter.flag

Item #22: Reduce visual noise with variable positional arguments.

Suppose that you have the following function:

def my_logger(message, items):
    return f'{message}{", ".join([str(x) for x in items])}'

numbers = [1, 2, 3]
print(my_logger('I like these numbers: ', numbers))
print(my_logger('I like no numbers.', []))

Passing an empty list is noisy. Instead, use variable positional arguments (*args):

def my_logger(message, *items):
    return f'{message}{", ".join([str(x) for x in items])}'

numbers = [1, 2, 3]
print(my_logger('I like these numbers: ', *numbers))
print(my_logger('I like no numbers.'))

Note that *numbers unpacks numbers into a tuple before the call, which means that if numbers were a massive generator, this would consume a large amount of memory.

Moreover, if you were to update the signature of my_logger to something like def my_logger(date, message, *items):, then not updating all of the calls to my_logger would introduce bugs that are hard to detect.

Item #23: Provide optional behavior with keyword arguments.

Note the use of **:

def flow_rate(weight_diff, time_diff, period=1, units_per_kg=1):
    return weight_diff * units_per_kg * period / time_diff

kwargs = {
    'weight_diff': 0.5,
    'time_diff': 3,
    'period': 3600,
    'units_per_kg': 2.2
}

print(flow_rate(**kwargs))

With optional arguments, do not do this: flow_rate(0.5, 3, 3600, 2.2). Instead, do this: flow_rate(0.5, 3, period=3600, units_per_kg=2.2).

Item #24: Use None and docstrings to specify dynamic default arguments.

Suppose that you run the following script:

def append_zero(x=[]):
    x.append(0)
    return x

a = append_zero()
b = append_zero()

It turns out that a and b are the same list, so both look like [0, 0]. This is because a default argument value is evaluated only once, when the function is defined, so every call that omits x shares the same list object.

Initialize keyword arguments that have dynamic values with None, and document this in the docstring:

def append_zero(x=None):
    '''Append a zero to a list.

    Args:
        x: list. Defaults to an empty list.
    '''
    if x is None:
        x = []
    x.append(0)
    return x

Item #25: Enforce clarity with keyword-only and positional-only arguments.

Suppose that you have the following division function:

def safe_division(number, divisor, ignore_overflow, ignore_zero_division):
    try:
        return number / divisor
    except OverflowError:
        if ignore_overflow:
            return 0
        else:
            raise
    except ZeroDivisionError:
        if ignore_zero_division:
            return float('inf')
        else:
            raise

safe_division(12, 3, True, False)

This is noisy. An improvement would be to change the signature to

def safe_division(number, divisor, ignore_overflow=False, ignore_zero_division=False):
    # ...

safe_division(12, 3, ignore_overflow=True)

The problem is that this is still possible:

safe_division(12, 3, True, False)

Keyword-only arguments cannot be passed by position:

def safe_division(number, divisor, *, ignore_overflow=False, ignore_zero_division=False):
    # ...

safe_division(12, 3, True, False) # This raises a TypeError.

Now, suppose that we change the signature to

def safe_division(numerator, denominator, *, ignore_overflow=False, ignore_zero_division=False):
    # ...

Then this name change could break multiple existing calls to the function.

Positional-only arguments cannot be passed by keyword:

def safe_division(numerator, denominator, /, *, ignore_overflow=False, ignore_zero_division=False):
    # ...

safe_division(numerator=10, denominator=2) # This raises a TypeError.