21. Object Oriented Programming#

This is an somewhat advanced section and is available for reference.

Objects are very powerful, and you can harness this power for your own designs!

21.1. Writing a class#

Writing a class should be second nature, and very natural to you - just as natural as writing a function!

def f(x: float) -> float:
    return x**2


print(f"{f(3) = }")
f(3) = 9
class F:
    def __call__(self, x: float) -> float:
        return x**2


f_inst = F()
print(f"{f_inst(3) = }")
f_inst(3) = 9

21.2. How classes work (Advanced, for self reading)#

Classes are basically bags of objects with a few special features. So what separates them from a dict?

21.2.1. The type#

All objects are “based” on a class (AKA type). You can call type(object) or object.__class__ to get the class of an object. Types can “inherit” from other types, which just means they build on the other. The type is generally the “shared” part for objects, while the object itself holds the “unique” part that makes an instance special. Types tend to have methods, and instances tend to have member data, but since everything is an object, and functions really are just callable objects, the distinction between a variable and a function is a bit fuzzy. And that’s okay.

Think of it this way: I am an instance of a Person. A Person inherits from a LivingThing. So I am an object, a Person is my class, and a Person inherits from LivingThing. You could have more levels of inheritance, like Mammal, if you want; in “real life”, we tend to use this a lot, since it helps us structure our thinking. It helps structure our programs, too!

21.2.1.1. The metaclass and how to avoid using it#

Metaclasses may sound scary, but it’s really just a feature of the Python syntax. When you write class X, normally this uses type to make the new class for you - type is the metaclass. You can customize this by setting a different metaclass (usually subclasses of type!). And, most importantly, subclasses inherit metaclasses too, so users are not exposed to metaclasses. This is how abc.ABC works - it’s really just a simple do-nothing class with abc.ABCMeta as the metaclass - and abstract base classes are implemented through a metaclass. The metaclass will check and make sure there are no remaining abstract methods when you are creating an instance of the class.

In modern Python, almost everything you would have done with a metaclass is now possible via special overloads or normal features - classes now always preserve order (thanks to dictionaries becoming ordered), __init_subclass__ lets you customize what happens when users subclass your class, and __prepare__ lets you modify the preparation of a new class namespace.

Let’s do something evil just to show how it works:

def collect(a, b, c, **kwargs):
    return a, b, c, kwargs


class Example(metaclass=collect):
    a = 1


Example
('Example',
 (),
 {'__module__': '__main__', '__qualname__': 'Example', 'a': 1},
 {})

First, notice that Example is not a class - it’s not an instance of type, it’s an “instance” of collect (which is just a function, so it’s the function return value). The first argument was the class name (very powerful feature of class definitions - they always have a name!); the second is the list of bases (we didn’t inherit from anything, so it’s empty), and the third is the new namespace (just a dict). The __module__ and the __qualname__ were sef for free for us, otherwise, it’s just the things we defined inside the class body. Finally, it can take keyword arguments - Python supports arbitrary keyword arguments when defining a class (except for metaclass=, which is taken for setting the metaclass), though that’s a very advanced usage. __prepare__ can handle these keyword arguments normally.

21.2.2. The attribute names#

There are several different ways to name attributes of your class or instance:

  • .some_item: This is a publicly accessible attribute, users are expected to see it / use it. It could be a method, instance member, or a property - properties can be settable or read-only.

  • ._some_item: This is a “hidden” method - by social construct, users should not touch this, it’s expected to be internal or “protected”. Tab completion will usually ignore things that start with an _, unless you add the underscore then press tab. A subclass can access this.

  • .__some_item: This is only (easily) accessible from inside the current class. Outside the class, including in a subclass, it’s “mangled” by adding the class name. This is not intended to be “more secure” than a hidden method, but is intended for the special case where a class and a subclass need to define the same thing but have it different between them. Very rarely used.

  • .__some_item__: This is a “dunder” attribute, and Python has many of them - they have special meaning to Python. It is not officially a good idea to add your own, as Python reserves them for future use, though some libraries (especially popular libraries) do add custom ones.

  • Extra: ._some_item_: If you need a “protocol”, that is, a method that users should implement but should not directly call, then single underscores on both sides is an occasional convention that is not discouraged by Python authors, unlike adding new dunder methods. _repr_html_ in IPython is an example of this convention in use.

21.2.3. MRO (Method Resolution Order)#

So, if you have an object obj, and you call obj.something, where does something come from, and is it a function or a variable? Hopefully, you know the answer to that last part of the question: there is no difference between a function and a variable; it’s an object, and objects might be callable. The lookup order is called the MRO, or Method Resolution Order. It is:

  1. The object itself. If the object has a "something" in __slots__ or __dict__, then that’s what you get.

  2. The class. If it does not exist in the object, then it is pulled from the class.

  3. The parents of the class. Then the parent (or parents) are checked. If there are multiple parents, the parents are checked left to right.

  4. The parent’s parents (recursively). Then the parent of parents are checked, all the way up until you hit object, which is the final parent and base for all classes.

You can see steps 2-4 using .__mro__ on the class:

F.__mro__
(__main__.F, object)

21.2.4. Descriptors#

When you find a object, what do you actually return? Python has an descriptor system that is called when you access a class member. If the object has a __get__ method (or __set__ if you are setting something), then that is called with the instance as the argument and the result is returned. This is how methods work - (all) functions have a __get__, so when they are inside a class, they get called with self first; they return a “bound” method, one that will always include the instance in the call. There is also a __delete__. While it’s technically part of class creation, __set_name__ is also very useful for descriptors.

Why have this when you already have __getattribute__, __setattr__, and __delattr__? Descriptors are defined on member of the class, rather than the class itself!

21.2.4.1. Common descriptors#

Descriptors are at work in several areas of the core Python syntax. Ever wondered why Class.x(instance) and instance.x() were the same? A function object has a __get__ that can tell if it’s running from a class or an instance. Functions inside classes are not actually special, all functions simply implement descriptors!

If you look at any function, you’ll see it has a __get__:

f.__get__
<method-wrapper '__get__' of function object at 0x7fb8c01c11b0>

Let’s just make a little descriptor that prints out what it sees when it gets used.

class CheckDescriptor:
    def __get__(self, obj, cls):
        print(f"{obj = }\n{cls = }")


class HasDescriptor:
    d = CheckDescriptor()


inst = HasDescriptor()

If we access the attribute from the class, the descriptor receives None for the instance, and the class.

HasDescriptor.d
obj = None
cls = <class '__main__.HasDescriptor'>

And if you trigger the descriptor from an instance, you instead get that instance as the first argument:

inst.d
obj = <__main__.HasDescriptor object at 0x7fb8c01caf80>
cls = <class '__main__.HasDescriptor'>

Now you can see how bound methods could be implemented via __get__ on functions.

Let’s try to make a little descriptor that returns it’s own name, but only when accessed from an instance.

class SelfName:
    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, cls):
        return self if obj is None else self.name
class Holder:
    x = SelfName()
    y = SelfName()
Holder.x
<__main__.SelfName at 0x7fb8c01cbfd0>
Holder().x
'x'

One common use for getting the name is to make a instance variable that you can store data in.

Descriptors are also how @classmethod and @staticmethod are implemented. They have different descriptors over a usual function. There are several other descriptors in Python, including the features that make __slots__, __dict__, and super() work. But there’s one more major usage of descriptors you are likely to see a lot and even use: @property.

21.2.4.2. Property#

A property is an easy way to make something that looks like a member but actually calls methods when it is accessed, set, or even deleted. Here’s how you’d make this:

class Example:
    @property
    def x(self):
        return 42

    @x.setter
    def x(self, value):
        if value != 42:
            raise ValueError("Must be 42!")

    @x.deleter
    def x(self):
        raise ValueError("You can't delete 42!")
ex = Example()
ex.x
42

Feel free to play with this - ex.x can’t be deleted or set to anything other than 42! In practice, the functions are optional - most properties don’t have a deleter, and many of them don’t have a setter. See the Python docs for a great overview of descriptors and properties, including how you could implement the decorator yourself using descriptors.

21.2.5. Special methods are accessed only on the class#

Special methods are slightly special. If you perform an action that calls a special method, it explicitly calls the class, not the instance. So for example, f(3) calls F.__call__(f, 3), not f.__call__(3). This was mostly an optimization, but it means you can’t provide special behavior for an instance of a class that is not shared by the rest of the class. Which is probably good.

21.2.6. Accessing something dynamic#

Python lets you control this lookup with two special methods. One is __getattr__, which gets called if no attribute is found using the normal lookup method described above. You could use this to make a class that has dynamic methods beyond what it normally has. The second method is __getattribute__, which is called before anything else is looked up - you can literally do anything here, but at the cost that you have to then implement or revert to normal attribute lookup yourself to make the class usable, and simply getting at things like the data members in the class is a pain, since it will call itself. This should only be used in emergencies!

For setting, you don’t need this split, so there’s just __setattr__.

21.2.7. __dict__ vs. __slots__#

Most user classes are __dict__ classes, which store a dict on every instance. This is partially because it’s what you get by default, and partially because it fits with the philosophy of Python.

class Simple:
    a = 1

    def __init__(self):
        self.b = 2
s = Simple()
s.__dict__
{'b': 2}
s.c = 3
s.__dict__
{'b': 2, 'c': 3}

In the above example, “b” is in the instance’s __dict__. And so is c, once we add it. There’s no limit to what we can add to the dictionary, really. Every instance of Simple could have a completely different __dict__. This is really bad design, of course - all instances should have the same data members (which means you should always assign everything initially in __init__.

By the way, what’s s.a? It’s a class variable. It’s shared between all instances of the class, just like a method would be (remember, they are not different). You very rarely need them; they are much more like global variables than anything else. The good news is that if you write self.a = ..., then a is added to the __dict__, and due to the look up I listed above, that’s what gets used from now on for that instance, not the class variable. But, really, just don’t do it.

Slots classes were mostly a performance/memory addition; storing the keys for every instance was a huge waste. In modern Python, there are optimizations under the hood, and __dict__ classes do share keys when they can, but __slots__ still has a nice correctness feature. Let’s try the above with slots:

class Strict:
    __slots__ = ("a", "b")


s = Strict()
s.a = 1
s.b = 2
# s.c = 3
# s.__dict__

Now try to assign anything to the instance that is not in slots. It doesn’t work! No more mistakes misspelling a member; it just won’t work. And you can read the allowed members without digging into __init__! Notice there is no __dict__ now.

The process for __slots__ is rather weird, though. To avoid a __dict__, you need to have every class in the MRO to have __slots__. If any parent doesn’t have a slots, adding a slots would break it, so Python just ignores you if you inherit from a non-slots class. And to find a classes slots, you manually have to merge all __slots__ in the MRO, it doesn’t do it for you.

Conceptual question: Does object have __slots__?

Super advanced aside: Just because I had not seen this mentioned before, and I really needed it once: You can add "__dict__" (or "__weakref__") to a __slots__ list! This allows you to have a dictionary of items that can be added to for each instance and have a fixed set of items that are not in that dictionary on each instance too. Just in case you ever need something like that, you now can’t say you’ve never heard it before.

21.3. Uses for classes#

21.3.1. Data + functions#

They allow you to bundle data with the functions that run on them. In a language like Python without (much) type overloading, this is important for design (and good for tab completion).

class Vector:
    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

    def mag(self) -> float:
        return (self.x**2 + self.y**2) ** 0.5


Vector(3, 4)
<__main__.Vector at 0x7fb8c01cab30>

Note: developing a class that represents data has a bit of boilerplate involved, so attrs (third-party) or dataclasses (built-in for Python 3.7+, installable in 3.6) has a trick to make them much more easily, and automatically add things like nice repr’s. We’ll cover the parts of the syntax here later.

# Built in for Python 3.7+, install otherwise
from dataclasses import dataclass


@dataclass
class Vector:
    x: float
    y: float

    def mag(self) -> float:
        return (self.x**2 + self.y**2) ** 0.5


Vector(3, 4)
Vector(x=3, y=4)
v = Vector(3, 4)
v.mag()
5.0

If you use dataclasses, you even get automatic features from future Python versions, like pattern matching support, for free!

21.3.2. Functors#

I just told you it was a bad idea to set something outside your local scope, and often not even a good idea to just view something outside the local scope. So how do you write something that has scope? Use a class as a functor! Compare this:

_start = 0


def incr() -> int:
    global _start
    _start += 1
    return _start
incr()
1
incr()
2

With this:

class Incr:
    def __init__(self, incr: int = 0) -> None:
        self.incr = incr

    def __call__(self) -> int:
        self.incr += 1
        return self.incr
incr = Incr()
incr()
1
incr()
2

This is explicit, clear, I can have multiple instances without having them interfere, I can see exactly what’s going on without having to trace down a global, and I can even set the default value when I make the Incr instance!

21.3.3. Modularity#

Let’s say you have an algorithm with three parts. If you make a class that calls three member functions, you can allow a user to replace just one of the functions, and use the original first two. Classes are great for organization and code reuse because of this.

class RunSomethingHard:
    def part1(self) -> None:
        print("Working hard")

    def part2(self) -> None:
        print("Working harder")

    def part3(self) -> None:
        print("That was hard!")

    def run(self) -> None:
        self.part1()
        self.part2()
        self.part3()
inst = RunSomethingHard()
inst.run()
Working hard
Working harder
That was hard!

Now, look at how I can swap out part of the calculation without rewriting from scratch!

class NewRunSomethingHard(RunSomethingHard):
    def part2(self) -> None:
        print("Nah, this is easy")
inst = NewRunSomethingHard()
inst.run()
Working hard
Nah, this is easy
That was hard!

Don’t assume because something is on this list it is the best way to solve a problem, it’s just a way, and something you’ll likely commonly see (GEANT4 uses this heavily, for example). In this case, there may be other, better ways to share code depending on your problem. Think before you write!

21.3.4. eDSL (embedded Domain Specific Language)#

You can customize almost every behavior of a class to make them very natural for whatever you are doing.

class Path(str):
    def __truediv__(self, other: str):
        return self.__class__(f"{self}/{other}")
Path(".") / "myfile" / "program.py"
'./myfile/program.py'

Just in case you want to make a Path class like the one above - don’t, use pathlib instead. We could have written self.__class__ as Path, but then this would not subclass correctly and besides, using the class name inside the class is ugly and makes it harder to rename. If you return a normal string, then you can’t keep applying /.

Also, I left off the return type annotation for this example, as to do them properly I need to use a TypeVar.

21.3.5. Mixins (advanced)#

If you follow good practices, you can even make collections of behaviors and mix them into other classes - specifically, a mixin should not have an __init__ or any new data members. (The second requirement is more important than the first, if you are careful to use super(). Let’s rewrite the last example with mixins:

class PathMixin:
    def __truediv__(self, other: str):
        return self.__class__(f"{self}/{other}")


class Path(str, PathMixin):
    pass


Path(".") / "myfile" / "program.py"
'./myfile/program.py'

Statically typing mixins requires a Protocol, so I’ve left it off of this example too.

21.3.6. Other: ABC, Protocols, and more#

Other useful things to look into are ABCs (Abstract Base Classes) and Protocols. These let you a) require certain methods be implemented by users, and b) formalize “duck typing”. ABC’s are a run-time feature, and kind of half-broken; you have to instantiate an ABC class to get the benefit of the checking. Protocols are a “type-check time” feature, and are far better and don’t require special inheritance, but only are enforced by type checkers (see a later section!).

21.4. Design considerations#

Object oriented programming has been known to make it easy to create spaghetti messes of code. The following tips will help you not fall into the trap and end up with poorly designed code. “Make it a class”, by itself, will not magically make your code better.

21.4.1. Modular design#

You should break your code into concepts, and classes should help map those concepts to the computer. Different components of a detector might be classes, with an instance for each component. A vector, a URL, a remote data source, etc. You might have a class representing a unit of an analysis, and use either inheritance (okay) or a Protocol (better, reduces coupling) to have real data processing vs. simulation generation. Etc.

21.4.2. Unit test#

We will mention testing later, and there is another course on it, but I’m focusing on the word unit. You need to be able to run your classes standalone, in unit tests, and not only in place. This keeps the design modular - you will resist the desire to make a class that needs a class to make another class inside a class that only works with the file that sits on your work laptop, etc. And you’ll be free to redesign parts without having to worry about everything breaking down.

Always use pytest for unit testing.

21.4.3. Inheritance#

You should be very cautious with inheritance. It’s a very powerful tool and should not be scary, but it’s so easy to misuse. These are the problems:

  • It is often in direct contrast with modular design (you are linking things via inheritance) and hard to truly unit test (one class might not be testable without the other).

  • It makes it harder to reason about where a method comes from when reading and debugging. Is it this class? A parent?

However, it is also very useful. Read this article (which might be a little overly-harsh on subclassing, but it has good points) and ask yourself: Do you need inheritance? Would composition or a Protocol work better? Certainly, many times, it is fine, but don’t let the other considerations get swept aside if you use it.

21.4.4. Functional design tips#

Classes with multiple states are bad: If you have classes that init, then call some function like .initialize() or .read_data(), this is bad design. It’s better to do the read in the __init__ or as a classmethod. If you have two distinct states, then you have to design every method to be aware of both possible states! It’s much better to either do the init at the beginning, or have two classes, one for each state.

Never assign new members outside of init: This is similar to the above point; if you assign a new member after init, then you now have two states; a “without” and a “with”, so now you have to check hasattr in every other method, since you can’t be sure what order the methods will be called in. Better to at least assign None in __init__. Or use dataclasses/attrs.

Consider full functional design: Python is object oriented, but it does support functional design, and it’s worth thinking about it. Functional design has functions that are stateless (no mutable state), so you can chain them. So instead of:

a = Thing()
a.init()
a.cleanup()

You would use:

a = Thing().init().cleanup()

The only reason the first example could be better is due to memory usage/copying, which usually isn’t a problem in Python. If it’s an error to call cleanup() before init(), you can enforce that statically via classes in the second example, while the first example will just crash at runtime.

Support static design wherever possible: We will cover this later, but if you add things that are not statically sound, you probably should redesign. Simply supporting static typing tends to push you toward better design, and away from problems like those listed above.