Effective Python 56 - 60

Click here for the first post, which contains the context of this series.

Item #56: Know how to recognize when concurrency is necessary.

In this item, the author creates Conway's Game of Life and asks how it would scale in the context of an MMO. He summarizes it as follows: "Python provides many built-in tools for achieving fan-out and fan-in with various trade-offs. You should understand the pros and cons of each approach and choose the best tool for the job, depending on the situation."

Item #57: Avoid creating new Thread instances for on-demand fan-out.

Consider Conway's Game of Life, mentioned in the previous item, and suppose that you create a Thread instance for each cell. This will work, but there are tradeoffs:
  • Thread instances require special tools (like Lock) to coordinate among themselves.
  • Each Thread instance requires about 8 MB of memory, which is high.
  • Starting Thread instances and the context switching between them are costly.
  • Thread instances do not provide a built-in way to re-raise exceptions back to their callers.

Item #58: Understand how using Queue for concurrency requires refactoring.

In this item, the author refactors Conway's Game of Life from the previous item in an attempt to showcase the difficulty of using Queue. He summarizes it as follows:
  • Using Queue instances with a fixed number of Thread instances improves the scalability of fan-in and fan-out.
  • It is difficult to refactor existing code to use Queue.
  • Using Queue fundamentally limits the total amount of I/O parallelism.

Item #59: Consider ThreadPoolExecutor when threads are necessary for concurrency.

In this item, the author uses ThreadPoolExecutor from concurrent.futures to address Conway's Game of Life from the previous items. It combines the best of the two previously discussed approaches (Thread and Queue) without the boilerplate. Nevertheless, it still does not scale well in terms of fan-out.
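
Here is a minimal sketch of the pattern (my own example, not the book's Game of Life code), using time.sleep as a stand-in for blocking I/O:

from concurrent.futures import ThreadPoolExecutor
import time

def blocking_io(i):
    time.sleep(1)  # stand-in for a blocking call
    return f'My ID is {i} and I waited 1 second.'

with ThreadPoolExecutor(max_workers=10) as executor:
    futures = [executor.submit(blocking_io, i) for i in range(10)]  # fan-out
    results = [future.result() for future in futures]  # fan-in; re-raises worker exceptions
print(results)

The fixed max_workers pool is what keeps memory and startup costs bounded, but it is also why fan-out does not scale: concurrency is capped at the pool size.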

Item #60: Achieve highly concurrent I/O with coroutines.

In this item, the author introduces the async and await keywords, introduces the built-in asyncio library, and uses them to address Conway's Game of Life with far better scalability than the thread-based approaches. Here is a simple code snippet that illustrates their use:

import asyncio

async def blocking_io(i):
    # A coroutine that simulates blocking I/O without blocking the event loop.
    await asyncio.sleep(1)
    return f'My ID is {i} and I waited 1 second.'

async def func():
    results = []
    for i in range(10):
        results.append(blocking_io(i))  # Fan-out: create the coroutines.
    return await asyncio.gather(*results)  # Fan-in: run them all concurrently.

print(asyncio.run(func()))  # Finishes in about 1 second, not 10.

Effective Python 51 - 55

Click here for the first post, which contains the context of this series.

Item #51: Prefer class decorators over metaclasses.

Consider the following decorator:

from functools import wraps
def func_log(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
            result = func(*args, **kwargs)
            return result
        except Exception as exception:
            result = exception
            raise
        finally:
            print(f'{func.__name__}({args},{kwargs})->{result}')
    return wrapper

Suppose that you want to use it to log a dictionary:

class FuncLogDict(dict):
    @func_log
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
    @func_log
    def __getitem__(self, *args, **kwargs):
        return super().__getitem__(*args, **kwargs)
    @func_log
    def __setitem__(self, *args, **kwargs):
        super().__setitem__(*args, **kwargs)
    # ...
d = FuncLogDict()
d['foo'] = 'bar'
d['foo']

This is redundant. Use a class decorator instead:

import types
log_types = (
    types.MethodType,
    types.FunctionType,
    types.BuiltinMethodType,
    types.BuiltinFunctionType,
    types.MethodDescriptorType,
    types.ClassMethodDescriptorType
)
def class_log(cls):
    for key in dir(cls):
        value = getattr(cls, key)
        if isinstance(value, log_types):
            setattr(cls, key, func_log(value))
    return cls
@class_log
class ClassLogDict(dict):
    pass
d = ClassLogDict()
d['foo'] = 'bar'
d['foo']

Item #52: Use subprocess to manage child processes.

I skip this item since its details depend heavily on the operating system on which Python runs, but I recommend perusing the documentation of subprocess and refreshing one's memory about pipes.
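
For reference, here is a minimal sketch of the module's core call (my own example; the echo command is an assumption and only works where such an executable exists):

import subprocess

# 'echo' is an assumption; substitute any command available on your system.
result = subprocess.run(
    ['echo', 'Hello from a child process'],
    capture_output=True,  # connect the child's stdout/stderr to pipes
    encoding='utf-8',
)
result.check_returncode()  # raises CalledProcessError on a non-zero exit status
print(result.stdout)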

Item #53: Use threads for blocking I/O, avoid for parallelism.

Although the global interpreter lock (GIL) prevents threads from running Python bytecode in parallel, threads are still useful for doing blocking I/O at the same time as computation. The factorization example below is CPU-bound, so threading it gains essentially nothing:

from threading import Thread
class Factorize(Thread):
    def __init__(self, number):
        super().__init__()
        self.number = number
    def run(self):
        self.factors = [1]
        for i in range(2, self.number):
            if not self.number % i:
                self.factors.append(i)
        self.factors.append(self.number)
threads = []
for number in [2139079, 1214759, 1516637, 1852285]:
    thread = Factorize(number)
    thread.start()
    threads.append(thread)
for thread in threads:
    thread.join()
    print(f'{thread.number}: {thread.factors}')
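
By contrast, threads do pay off when the work is blocking I/O. Here is a minimal sketch of my own, using time.sleep as a stand-in for a blocking system call; it finishes in roughly one second instead of five:

import time
from threading import Thread

def blocking_io():
    time.sleep(1)  # stand-in for a blocking system call that releases the GIL

start = time.time()
threads = [Thread(target=blocking_io) for _ in range(5)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(f'Took {time.time() - start:.2f} seconds')  # about 1 second, not 5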

Item #54: Use Lock to prevent data races in threads.

Consider

from threading import Thread
class Counter:
    def __init__(self):
        self.count = 0
    def increment(self):
        self.count += 1
def worker(counter, total):
    for _ in range(total):
        counter.increment()
total = 10 ** 5
counter = Counter()
threads = []
for _ in range(5):
    thread = Thread(target=worker, args=(counter, total))
    thread.start()
    threads.append(thread)
for thread in threads:
    thread.join()
print('expected:', total * 5, 'actual:', counter.count)

A run of this code gave me the output:

expected: 500000 actual: 406246

This is due to a race condition. One way to address it is to use the Lock class, which is a mutex:

# ...
from threading import Lock
class Counter:
    def __init__(self):
        self.lock = Lock()
        self.count = 0
    def increment(self):
        with self.lock:
            self.count += 1
# ...

Item #55: Use Queue to coordinate work between threads.

Suppose that you want to do something (ideally I/O bound) that can be structured as a pipeline. You can use multiple threads to significantly speed it up, and you can use Queue to coordinate them. Here is an abstract example:

from queue import Queue
from threading import Thread
class MyQueue(Queue):
    SENTINEL = object()  # Marks the end of the input for one worker.
    def close(self):
        self.put(self.SENTINEL)
    def __iter__(self):
        while True:
            item = self.get()
            try:
                if item is self.SENTINEL:
                    return  # Stop iterating so the worker thread can exit.
                yield item
            finally:
                self.task_done()  # Lets queue.join() track progress.
class MyWorker(Thread):
    def __init__(self, func, in_queue, out_queue):
        super().__init__()
        self.func = func
        self.in_queue = in_queue
        self.out_queue = out_queue
    def run(self):
        for item in self.in_queue:
            self.out_queue.put(self.func(item))
def func_1(item):
    return item
def func_2(item):
    return item
def func_3(item):
    return item
queue_1 = MyQueue()
queue_2 = MyQueue()
queue_3 = MyQueue()
queue_4 = MyQueue()
threads = [
    MyWorker(func_1, queue_1, queue_2) for _ in range(10)
] + [
    MyWorker(func_2, queue_2, queue_3) for _ in range(10)
] + [
    MyWorker(func_3, queue_3, queue_4) for _ in range(10)
]
for thread in threads:
    thread.start()
for i in range(100):
    queue_1.put(i)
for queue in [queue_1, queue_2, queue_3]:
    for _ in range(10):
        queue.close()
    queue.join()
for thread in threads:
    thread.join()
print(queue_4.qsize())

Effective Python 46 - 50

Click here for the first post, which contains the context of this series.

Item #46: Use descriptors for reusable @property methods.

Consider

class GradeBook:
    def __init__(self, grade_1=0, grade_2=0):
        self._grade_1 = grade_1
        self._grade_2 = grade_2
    @staticmethod
    def is_valid(value):
        if not 0 <= value <= 100:
            raise ValueError
    @property
    def grade_1(self):
        return self._grade_1
    @grade_1.setter
    def grade_1(self, value):
        self.is_valid(value)
        self._grade_1 = value
    @property
    def grade_2(self):
        return self._grade_2
    @grade_2.setter
    def grade_2(self, value):
        self.is_valid(value)
        self._grade_2 = value

Adding grade_3, grade_4, ... requires duplicating code, and creating a new class with similar functionality also requires duplicating is_valid.

This can be addressed using a descriptor:

from weakref import WeakKeyDictionary
class Grade:
    def __init__(self):
        self._values = WeakKeyDictionary()
    def __get__(self, instance, instance_type):
        if instance is None:
            return self  # Accessed on the class itself rather than an instance.
        return self._values.get(instance, 0)
    def __set__(self, instance, value):
        if not 0 <= value <= 100:
            raise ValueError
        self._values[instance] = value
class GradeBook:
    grade_1 = Grade()
    grade_2 = Grade()

Using a WeakKeyDictionary prevents memory leaks: once the last reference to a GradeBook instance goes away, its entry in _values is dropped automatically instead of keeping the instance alive forever.

Item #47: Use __getattr__, __getattribute__, and __setattr__ for lazy attributes.

Note that

class I47:
    def __getattr__(self, name):
        self.__setattr__(name, None)
        return None
i47 = I47()
i47.test # Calls __getattr__.
i47.test # Doesn't call __getattr__.

Also note that

class I47:
    def __getattribute__(self, name):
        # super() is needed here: self.__setattr__ would trigger
        # __getattribute__ again and recurse forever.
        super().__setattr__(name, None)
        return None
i47 = I47()
i47.test # Calls __getattribute__.
i47.test # Calls __getattribute__.

There are interesting use cases for these hooks, such as lazily loading attributes or logging every access. As before, be mindful of using super() to avoid infinite recursion.

Item #48: Validate subclasses with __init_subclass__.

Consider

class Polygon:
    sides = None
    def __init_subclass__(cls):
        super().__init_subclass__()
        if cls.sides is None or cls.sides < 3:
            raise ValueError('Polygons must have more than 2 sides.')
class Triangle(Polygon):
    sides = 3
print(1)
class Line(Polygon):
    print(2)
    sides = 2
    print(3)
print(4)

This raises ValueError after printing 3 but before printing 4: __init_subclass__ runs as soon as the body of class Line finishes executing.

Although the use of super().__init_subclass__() is unnecessary here, it is recommended in order to handle multiple inheritance with classes that implement __init_subclass__.

This is only one use case of __init_subclass__.

Item #49: Register class existence with __init_subclass__.

Here is another interesting use case of __init_subclass__:

import json
class_registry = {}
def deserialize(data):
    params = json.loads(data)
    return class_registry[params['class']](*params['args'])
class Serializable:
    def __init__(self, *args):
        self.args = args
    def serialize(self):
        return json.dumps({
            'class': self.__class__.__name__,
            'args': self.args
        })
    def __init_subclass__(cls):
        class_registry[cls.__name__] = cls
class Point3D(Serializable):
    def __init__(self, x, y, z):
        super().__init__(x, y, z)
        self.x = x
        self.y = y
        self.z = z

Keeping a class registry like this is often useful: deserialize finds the right class for any Serializable subclass without extra bookkeeping.
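
A quick round trip (my own usage example, reusing the definitions above) shows the registry at work:

point = Point3D(1, 2, 3)
data = point.serialize()
print(data)  # {"class": "Point3D", "args": [1, 2, 3]}
restored = deserialize(data)
print(restored.x, restored.y, restored.z)  # 1 2 3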

Item #50: Annotate class attributes with __set_name__.

Consider

class Grade:
    def __set_name__(self, _, name):
        self.name = name
        self.protected_name = '_' + name
    def __get__(self, instance, _):
        return getattr(instance, self.protected_name, 0)
    def __set__(self, instance, value):
        if not 0 <= value <= 100:
            raise ValueError
        setattr(instance, self.protected_name, value)
class GradeBook:
    grade_1 = Grade()
    grade_2 = Grade()
gb = GradeBook()
print(f'{gb.grade_1}, {gb.grade_2}, {gb.__dict__}')
gb.grade_1 = 91
gb.grade_2 = 98
print(f'{gb.grade_1}, {gb.grade_2}, {gb.__dict__}')

Compare this to Item 46.

Effective Python 41 - 45

Click here for the first post, which contains the context of this series.

Item #41: Consider composing functionality with mix-in classes.

Although you should generally avoid multiple inheritance, mix-in classes are an exception: a mix-in defines only a small set of additional methods and carries no instance state of its own.

Consider the desire to represent an object as a dictionary:

class ToDictMixin:
    def to_dict(self):
        return self._traverse_dict(self.__dict__)
    
    def _traverse_dict(self, instance_dict):
        output = {}
        for key, value in instance_dict.items():
            output[key] = self._traverse(key, value)
        return output
    
    def _traverse(self, key, value):
        if isinstance(value, ToDictMixin):
            return value.to_dict()
        if isinstance(value, dict):
            return self._traverse_dict(value)
        if isinstance(value, list):
            return [self._traverse(key, i) for i in value]
        if hasattr(value, '__dict__'):
            return self._traverse_dict(value.__dict__)
        return value

and consider the desire to represent an object as a JSON string:

import json

class ToJsonMixin:
    @classmethod
    def from_json(cls, data):
        kwargs = json.loads(data)
        return cls(**kwargs)

    def to_json(self):
        return json.dumps(self.to_dict())

Then

class BinaryTree(ToDictMixin, ToJsonMixin):
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

has a lot of useful functionality.
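
For instance (my own usage example):

tree = BinaryTree(
    10,
    left=BinaryTree(7, right=BinaryTree(9)),
    right=BinaryTree(13, left=BinaryTree(11)),
)
print(tree.to_json())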

Item #42: Prefer public attributes over private ones.

Consider

class MyClass:
    def __init__(self, value):
        self.__value = value


my_class = MyClass(5)

Then my_class.__value will raise an AttributeError, but the value can still be accessed as my_class._MyClass__value, so Python does not really enforce privacy.
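
To see this concretely (my own quick check):

try:
    my_class.__value
except AttributeError:
    print('my_class.__value is not accessible')
print(my_class._MyClass__value)  # 5: name mangling is just a convention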

Relying on this makes your code cumbersome and brittle, so avoid it. Instead, do this

class MyClass:
    def __init__(self, value):
        self._value = value

and document that self._value is protected.

Item #43: Inherit from collections.abc for custom container types.

I can extend list.

class MyList(list):
    def __init__(self, elements):
        super().__init__(elements)
    
    def frequencies(self):
        count = {}
        for item in self:
            count[item] = count.get(item, 0) + 1
        return count

But what if I want to do this for something that is not inherently a list?

class BinaryNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right
    
    def _traverse(self):
        if self.left:
            yield from self.left._traverse()
        yield self
        if self.right:
            yield from self.right._traverse()
    
    def __getitem__(self, index):
        for i, item in enumerate(self._traverse()):
            if i == index:
                return item
        raise IndexError('binary tree index out of range')

However, calling len on a BinaryNode object will raise a TypeError. The question becomes: what is the smallest set of methods I need to implement to make this behave like a list? Enter collections.abc:

from collections.abc import Sequence


class BinaryNode(Sequence):
    # ...

Defining the class succeeds, but instantiating it raises a TypeError naming the abstract methods that are still missing (here __len__, since __getitem__ is already defined). Once they are implemented, collections.abc supplies the rest of the sequence behavior, such as index and count, for free.
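
Here is a minimal sketch of that completion (my own; the class name SequenceBinaryNode is mine, not the book's): adding __len__ alongside the existing __getitem__ satisfies the Sequence ABC, and len, count, and index then work on the tree:

from collections.abc import Sequence

class SequenceBinaryNode(Sequence):
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

    def _traverse(self):
        if self.left:
            yield from self.left._traverse()
        yield self
        if self.right:
            yield from self.right._traverse()

    def __getitem__(self, index):
        for i, item in enumerate(self._traverse()):
            if i == index:
                return item
        raise IndexError('binary tree index out of range')

    def __len__(self):
        # The second abstract method required by Sequence.
        return sum(1 for _ in self._traverse())

tree = SequenceBinaryNode(
    10,
    left=SequenceBinaryNode(5, left=SequenceBinaryNode(2)),
    right=SequenceBinaryNode(15),
)
print(len(tree))            # 4
print(tree.count(tree[0]))  # 1; count comes for free from Sequence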

Item #44: Use plain attributes instead of setter and getter methods.

This is done in other languages:

class IWasACSharpDev:
    def __init__(self, value):
        self._value = value
    
    def get_value(self):
        return self._value
    
    def set_value(self, value):
        self._value = value

But this is not Pythonic. We use @property if we need to do this:

class IAmAPythonDev:
    def __init__(self, value):
        self._value = value

    @property    
    def value(self):
        return self._value
    
    @value.setter
    def value(self, value):
        self._value = value

Make sure property methods stay short and fast; anything slow or complex belongs in a regular method.

Item #45: Consider @property instead of refactoring attributes.

Consider

class Person:
    def __init__(self, age):
        self.age = age

Suppose that the codebase sets and gets the ages of countless instances of this class, and suppose that a new law requires that a person's new age be three more than twice their old age. Only a small change needs to be made:

class Person:
    def __init__(self, age):
        self._age = age
    
    @property
    def age(self):
        return self._age * 2 + 3
    
    @age.setter
    def age(self, age):
        self._age = age
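
Existing call sites keep working unchanged; a quick check of my own:

person = Person(10)
print(person.age)  # 23: the getter applies the new rule
person.age = 20
print(person.age)  # 43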