Effective Python 56 - 60

Click here for the first post, which contains the context of this series.

Item #56: Know how to recognize when concurrency is necessary.

In this item, the author creates Conway's Game of Life and asks how it would scale in the context of an MMO. He summarizes it as follows: "Python provides many built-in tools for achieving fan-out and fan-in with various trade-offs. You should understand the pros and cons of each approach and choose the best tool for the job, depending on the situation."

Item #57: Avoid creating new Thread instances for on-demand fan-out.

Consider Conway's Game of Life, mentioned in the previous item, and suppose that you create a Thread instance for each cell. This will work, but there are tradeoffs:
  • Thread instances require special tools (like Lock) to coordinate among themselves.
  • Each Thread instance requires about 8 MB of memory, which is high.
  • Starting Thread instances and the context switching between them are costly.
  • Thread instances do not provide a built-in way to re-raise exceptions back to their callers.

Item #58: Understand how using Queue for concurrency requires refactoring.

In this item, the author refactors Conway's Game of Life from the previous item in an attempt to showcase the difficulty of using Queue. He summarizes it as follows:
  • Using Queue instances with a fixed number of Thread instances improves the scalability of fan-in and fan-out.
  • It is difficult to refactor existing code to use Queue.
  • Using Queue fundamentally limits the total amount of I/O parallelism.

Item #59: Consider ThreadPoolExecutor when threads are necessary for concurrency.

In this item, the author uses ThreadPoolExecutor from concurrent.futures to address Conway's Game of Life from the previous items. It combines the best of the two previously discussed approaches (Thread and Queue) without the boilerplate. Nevertheless, it still does not scale well in terms of fan-out.
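
Here is a minimal sketch of the pattern (my own example, not the book's Game of Life code), using time.sleep as a stand-in for blocking I/O:

from concurrent.futures import ThreadPoolExecutor
import time

def blocking_io(i):
    time.sleep(1)  # stand-in for a blocking call
    return f'My ID is {i} and I waited 1 second.'

with ThreadPoolExecutor(max_workers=10) as executor:
    futures = [executor.submit(blocking_io, i) for i in range(10)]  # fan-out
    results = [future.result() for future in futures]  # fan-in; re-raises worker exceptions
print(results)

The fixed max_workers pool is what keeps memory and startup costs bounded, but it is also why fan-out does not scale: concurrency is capped at the pool size.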

Item #60: Achieve highly concurrent I/O with coroutines.

In this item, the author introduces the async and await keywords, introduces the built-in asyncio library, and uses them to address Conway's Game of Life with far better scalability than the thread-based approaches. Here is a simple code snippet that illustrates their use:

import asyncio

async def blocking_io(i):
    # A coroutine that simulates blocking I/O without blocking the event loop.
    await asyncio.sleep(1)
    return f'My ID is {i} and I waited 1 second.'

async def func():
    results = []
    for i in range(10):
        results.append(blocking_io(i))  # Fan-out: create the coroutines.
    return await asyncio.gather(*results)  # Fan-in: run them all concurrently.

print(asyncio.run(func()))  # Finishes in about 1 second, not 10.

Effective Python 51 - 55

Click here for the first post, which contains the context of this series.

Item #51: Prefer class decorators over metaclasses.

Consider the following decorator:

from functools import wraps
def func_log(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
            result = func(*args, **kwargs)
            return result
        except Exception as exception:
            result = exception
            raise
        finally:
            print(f'{func.__name__}({args},{kwargs})->{result}')
    return wrapper

Suppose that you want to use it to log a dictionary:

class FuncLogDict(dict):
    @func_log
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
    @func_log
    def __getitem__(self, *args, **kwargs):
        return super().__getitem__(*args, **kwargs)
    @func_log
    def __setitem__(self, *args, **kwargs):
        super().__setitem__(*args, **kwargs)
    # ...
d = FuncLogDict()
d['foo'] = 'bar'
d['foo']

This is redundant. Use a class decorator instead:

import types
log_types = (
    types.MethodType,
    types.FunctionType,
    types.BuiltinMethodType,
    types.BuiltinFunctionType,
    types.MethodDescriptorType,
    types.ClassMethodDescriptorType
)
def class_log(cls):
    for key in dir(cls):
        value = getattr(cls, key)
        if isinstance(value, log_types):
            setattr(cls, key, func_log(value))
    return cls
@class_log
class ClassLogDict(dict):
    pass
d = ClassLogDict()
d['foo'] = 'bar'
d['foo']

Item #52: Use subprocess to manage child processes.

I skip this item since its details depend heavily on the operating system on which Python runs, but I recommend perusing the documentation of subprocess and refreshing one's memory about pipes.
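
For reference, here is a minimal sketch of the module's core call (my own example; the echo command is an assumption and only works where such an executable exists):

import subprocess

# 'echo' is an assumption; substitute any command available on your system.
result = subprocess.run(
    ['echo', 'Hello from a child process'],
    capture_output=True,  # connect the child's stdout/stderr to pipes
    encoding='utf-8',
)
result.check_returncode()  # raises CalledProcessError on a non-zero exit status
print(result.stdout)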

Item #53: Use threads for blocking I/O, avoid for parallelism.

Although the global interpreter lock (GIL) prevents threads from running Python bytecode in parallel, threads are still useful for doing blocking I/O at the same time as computation. The factorization example below is CPU-bound, so threading it gains essentially nothing:

from threading import Thread
class Factorize(Thread):
    def __init__(self, number):
        super().__init__()
        self.number = number
    def run(self):
        self.factors = [1]
        for i in range(2, self.number):
            if not self.number % i:
                self.factors.append(i)
        self.factors.append(self.number)
threads = []
for number in [2139079, 1214759, 1516637, 1852285]:
    thread = Factorize(number)
    thread.start()
    threads.append(thread)
for thread in threads:
    thread.join()
    print(f'{thread.number}: {thread.factors}')
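
By contrast, threads do pay off when the work is blocking I/O. Here is a minimal sketch of my own, using time.sleep as a stand-in for a blocking system call; it finishes in roughly one second instead of five:

import time
from threading import Thread

def blocking_io():
    time.sleep(1)  # stand-in for a blocking system call that releases the GIL

start = time.time()
threads = [Thread(target=blocking_io) for _ in range(5)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(f'Took {time.time() - start:.2f} seconds')  # about 1 second, not 5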

Item #54: Use Lock to prevent data races in threads.

Consider

from threading import Thread
class Counter:
    def __init__(self):
        self.count = 0
    def increment(self):
        self.count += 1
def worker(counter, total):
    for _ in range(total):
        counter.increment()
total = 10 ** 5
counter = Counter()
threads = []
for _ in range(5):
    thread = Thread(target=worker, args=(counter, total))
    thread.start()
    threads.append(thread)
for thread in threads:
    thread.join()
print('expected:', total * 5, 'actual:', counter.count)

A run of this code gave me the output:

expected: 500000 actual: 406246

This is due to a race condition. One way to address it is to use the Lock class, which is a mutex:

# ...
from threading import Lock
class Counter:
    def __init__(self):
        self.lock = Lock()
        self.count = 0
    def increment(self):
        with self.lock:
            self.count += 1
# ...

Item #55: Use Queue to coordinate work between threads.

Suppose that you want to do something (ideally I/O bound) that can be structured as a pipeline. You can use multiple threads to significantly speed it up, and you can use Queue to coordinate them. Here is an abstract example:

from queue import Queue
from threading import Thread
class MyQueue(Queue):
    SENTINEL = object()  # Marks the end of the input for one worker.
    def close(self):
        self.put(self.SENTINEL)
    def __iter__(self):
        while True:
            item = self.get()
            try:
                if item is self.SENTINEL:
                    return  # Stop iterating so the worker thread can exit.
                yield item
            finally:
                self.task_done()  # Lets queue.join() track progress.
class MyWorker(Thread):
    def __init__(self, func, in_queue, out_queue):
        super().__init__()
        self.func = func
        self.in_queue = in_queue
        self.out_queue = out_queue
    def run(self):
        for item in self.in_queue:
            self.out_queue.put(self.func(item))
def func_1(item):
    return item
def func_2(item):
    return item
def func_3(item):
    return item
queue_1 = MyQueue()
queue_2 = MyQueue()
queue_3 = MyQueue()
queue_4 = MyQueue()
threads = [
    MyWorker(func_1, queue_1, queue_2) for _ in range(10)
] + [
    MyWorker(func_2, queue_2, queue_3) for _ in range(10)
] + [
    MyWorker(func_3, queue_3, queue_4) for _ in range(10)
]
for thread in threads:
    thread.start()
for i in range(100):
    queue_1.put(i)
for queue in [queue_1, queue_2, queue_3]:
    for _ in range(10):
        queue.close()
    queue.join()
for thread in threads:
    thread.join()
print(queue_4.qsize())

Effective Python 46 - 50

Click here for the first post, which contains the context of this series.

Item #46: Use descriptors for reusable @property methods.

Consider

class GradeBook:
    def __init__(self, grade_1=0, grade_2=0):
        self._grade_1 = grade_1
        self._grade_2 = grade_2
    @staticmethod
    def is_valid(value):
        if not 0 <= value <= 100:
            raise ValueError
    @property
    def grade_1(self):
        return self._grade_1
    @grade_1.setter
    def grade_1(self, value):
        self.is_valid(value)
        self._grade_1 = value
    @property
    def grade_2(self):
        return self._grade_2
    @grade_2.setter
    def grade_2(self, value):
        self.is_valid(value)
        self._grade_2 = value

Adding grade_3, grade_4, ... requires duplicating code, and creating a new class with similar functionality also requires duplicating is_valid.

This can be addressed using a descriptor:

from weakref import WeakKeyDictionary
class Grade:
    def __init__(self):
        self._values = WeakKeyDictionary()
    def __get__(self, instance, instance_type):
        if instance is None:
            return self  # Accessed on the class itself rather than an instance.
        return self._values.get(instance, 0)
    def __set__(self, instance, value):
        if not 0 <= value <= 100:
            raise ValueError
        self._values[instance] = value
class GradeBook:
    grade_1 = Grade()
    grade_2 = Grade()

Using a WeakKeyDictionary prevents memory leaks: once the last reference to a GradeBook instance goes away, its entry in _values is dropped automatically instead of keeping the instance alive forever.

Item #47: Use __getattr__, __getattribute__, and __setattr__ for lazy attributes.

Note that

class I47:
    def __getattr__(self, name):
        self.__setattr__(name, None)
        return None
i47 = I47()
i47.test # Calls __getattr__.
i47.test # Doesn't call __getattr__.

Also note that

class I47:
    def __getattribute__(self, name):
        # super() is needed here: self.__setattr__ would trigger
        # __getattribute__ again and recurse forever.
        super().__setattr__(name, None)
        return None
i47 = I47()
i47.test # Calls __getattribute__.
i47.test # Calls __getattribute__.

There are interesting use cases for these hooks, such as lazily loading attributes or logging every access. As before, be mindful of using super() to avoid infinite recursion.

Item #48: Validate subclasses with __init_subclass__.

Consider

class Polygon:
    sides = None
    def __init_subclass__(cls):
        super().__init_subclass__()
        if cls.sides is None or cls.sides < 3:
            raise ValueError('Polygons must have more than 2 sides.')
class Triangle(Polygon):
    sides = 3
print(1)
class Line(Polygon):
    print(2)
    sides = 2
    print(3)
print(4)

This raises ValueError after printing 3 but before printing 4: __init_subclass__ runs as soon as the body of class Line finishes executing.

Although the use of super().__init_subclass__() is unnecessary here, it is recommended in order to handle multiple inheritance with classes that implement __init_subclass__.

This is only one use case of __init_subclass__.

Item #49: Register class existence with __init_subclass__.

Here is another interesting use case of __init_subclass__:

import json
class_registry = {}
def deserialize(data):
    params = json.loads(data)
    return class_registry[params['class']](*params['args'])
class Serializable:
    def __init__(self, *args):
        self.args = args
    def serialize(self):
        return json.dumps({
            'class': self.__class__.__name__,
            'args': self.args
        })
    def __init_subclass__(cls):
        class_registry[cls.__name__] = cls
class Point3D(Serializable):
    def __init__(self, x, y, z):
        super().__init__(x, y, z)
        self.x = x
        self.y = y
        self.z = z

Keeping a class registry like this is often useful: deserialize finds the right class for any Serializable subclass without extra bookkeeping.
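
A quick round trip (my own usage example, reusing the definitions above) shows the registry at work:

point = Point3D(1, 2, 3)
data = point.serialize()
print(data)  # {"class": "Point3D", "args": [1, 2, 3]}
restored = deserialize(data)
print(restored.x, restored.y, restored.z)  # 1 2 3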

Item #50: Annotate class attributes with __set_name__.

Consider

class Grade:
    def __set_name__(self, _, name):
        self.name = name
        self.protected_name = '_' + name
    def __get__(self, instance, _):
        return getattr(instance, self.protected_name, 0)
    def __set__(self, instance, value):
        if not 0 <= value <= 100:
            raise ValueError
        setattr(instance, self.protected_name, value)
class GradeBook:
    grade_1 = Grade()
    grade_2 = Grade()
gb = GradeBook()
print(f'{gb.grade_1}, {gb.grade_2}, {gb.__dict__}')
gb.grade_1 = 91
gb.grade_2 = 98
print(f'{gb.grade_1}, {gb.grade_2}, {gb.__dict__}')

Compare this to Item 46.

Effective Python 41 - 45

Click here for the first post, which contains the context of this series.

Item #41: Consider composing functionality with mix-in classes.

Although you should generally avoid multiple inheritance, mix-in classes are an exception: a mix-in defines only a small set of additional methods and carries no instance state of its own.

Consider the desire to represent an object as a dictionary:

class ToDictMixin:
    def to_dict(self):
        return self._traverse_dict(self.__dict__)
    
    def _traverse_dict(self, instance_dict):
        output = {}
        for key, value in instance_dict.items():
            output[key] = self._traverse(key, value)
        return output
    
    def _traverse(self, key, value):
        if isinstance(value, ToDictMixin):
            return value.to_dict()
        if isinstance(value, dict):
            return self._traverse_dict(value)
        if isinstance(value, list):
            return [self._traverse(key, i) for i in value]
        if hasattr(value, '__dict__'):
            return self._traverse_dict(value.__dict__)
        return value

and consider the desire to represent an object as a JSON string:

import json

class ToJsonMixin:
    @classmethod
    def from_json(cls, data):
        kwargs = json.loads(data)
        return cls(**kwargs)

    def to_json(self):
        return json.dumps(self.to_dict())

Then

class BinaryTree(ToDictMixin, ToJsonMixin):
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

has a lot of useful functionality.
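
For instance (my own usage example):

tree = BinaryTree(
    10,
    left=BinaryTree(7, right=BinaryTree(9)),
    right=BinaryTree(13, left=BinaryTree(11)),
)
print(tree.to_json())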

Item #42: Prefer public attributes over private ones.

Consider

class MyClass:
    def __init__(self, value):
        self.__value = value


my_class = MyClass(5)

Then my_class.__value will raise an AttributeError, but the value can still be accessed as my_class._MyClass__value, so Python does not really enforce privacy.
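
To see this concretely (my own quick check):

try:
    my_class.__value
except AttributeError:
    print('my_class.__value is not accessible')
print(my_class._MyClass__value)  # 5: name mangling is just a convention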

Relying on this makes your code cumbersome and brittle, so avoid it. Instead, do this

class MyClass:
    def __init__(self, value):
        self._value = value

and document that self._value is protected.

Item #43: Inherit from collections.abc for custom container types.

I can extend list.

class MyList(list):
    def __init__(self, elements):
        super().__init__(elements)
    
    def frequencies(self):
        count = {}
        for item in self:
            count[item] = count.get(item, 0) + 1
        return count

But what if I want to do this for something that is not inherently a list?

class BinaryNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right
    
    def _traverse(self):
        if self.left:
            yield from self.left._traverse()
        yield self
        if self.right:
            yield from self.right._traverse()
    
    def __getitem__(self, index):
        for i, item in enumerate(self._traverse()):
            if i == index:
                return item
        raise IndexError('binary tree index out of range')

However, calling len on a BinaryNode object will raise a TypeError. The question becomes: what is the smallest set of methods I need to implement to make this behave like a list? Enter collections.abc:

from collections.abc import Sequence


class BinaryNode(Sequence):
    # ...

Defining the class succeeds, but instantiating it raises a TypeError naming the abstract methods that are still missing (here __len__, since __getitem__ is already defined). Once they are implemented, collections.abc supplies the rest of the sequence behavior, such as index and count, for free.
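
Here is a minimal sketch of that completion (my own; the class name SequenceBinaryNode is mine, not the book's): adding __len__ alongside the existing __getitem__ satisfies the Sequence ABC, and len, count, and index then work on the tree:

from collections.abc import Sequence

class SequenceBinaryNode(Sequence):
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

    def _traverse(self):
        if self.left:
            yield from self.left._traverse()
        yield self
        if self.right:
            yield from self.right._traverse()

    def __getitem__(self, index):
        for i, item in enumerate(self._traverse()):
            if i == index:
                return item
        raise IndexError('binary tree index out of range')

    def __len__(self):
        # The second abstract method required by Sequence.
        return sum(1 for _ in self._traverse())

tree = SequenceBinaryNode(
    10,
    left=SequenceBinaryNode(5, left=SequenceBinaryNode(2)),
    right=SequenceBinaryNode(15),
)
print(len(tree))            # 4
print(tree.count(tree[0]))  # 1; count comes for free from Sequence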

Item #44: Use plain attributes instead of setter and getter methods.

This is done in other languages:

class IWasACSharpDev:
    def __init__(self, value):
        self._value = value
    
    def get_value(self):
        return self._value
    
    def set_value(self, value):
        self._value = value

But this is not Pythonic. We use @property if we need to do this:

class IAmAPythonDev:
    def __init__(self, value):
        self._value = value

    @property    
    def value(self):
        return self._value
    
    @value.setter
    def value(self, value):
        self._value = value

Make sure property methods stay short and fast; anything slow or complex belongs in a regular method.

Item #45: Consider @property instead of refactoring attributes.

Consider

class Person:
    def __init__(self, age):
        self.age = age

Suppose that the codebase sets and gets the ages of countless instances of this class, and suppose that a new law requires that a person's new age be three more than twice their old age. Only a small change needs to be made:

class Person:
    def __init__(self, age):
        self._age = age
    
    @property
    def age(self):
        return self._age * 2 + 3
    
    @age.setter
    def age(self, age):
        self._age = age
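
Existing call sites keep working unchanged; a quick check of my own:

person = Person(10)
print(person.age)  # 23: the getter applies the new rule
person.age = 20
print(person.age)  # 43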