1. Intro to classes

Your first step into intermediate Python begins with classes. Classes are at the core of Python: everything is an object in Python, which means everything has a class. Even built-in objects written in C are still Python objects with classes.

1.1. What is an object?

An object is simply a collection of data and functions that operate on that data.

For example, let’s say we wanted to represent our home directory as an object. It might look something like this:

home_directory:
  string_location = "/home/me"
  exists(self) -> bool

This object holds a single data “member” (string_location), and has a function, called a “method”, to see if the directory exists.

We could produce lots of these, each with different string_location values, and we could use them in our code to track directories and see if they exist. All of these objects are interchangeable, and all of them have identical functions - only the contents of the data are different. This suggests we could make a further improvement to the model. (Unless we were in JavaScript, by the way, where this really is how objects were implemented!)

1.2. What is a class?

Now we will make a “template” for creating new objects, called a class.

import os


class Path:
    def __init__(self, string_location):
        self.string_location = string_location

    def exists(self):
        return os.path.exists(self.string_location)

usr_bin = Path("/usr/bin")
usr_bin.exists()
True

The __init__ method is special to Python: if you “call” the class, Python will create a new instance of the class, then call its __init__ method, passing the new, empty instance in as “self”. Inside this method, we add string_location to self.
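As a rough sketch of what calling the class does, the two steps can be spelled out by hand. This assumes a class that does not customize instance creation, and the class is repeated here only so the block runs on its own; in normal code you would simply write Path("/usr/bin").

```python
import os


class Path:
    def __init__(self, string_location):
        self.string_location = string_location

    def exists(self):
        return os.path.exists(self.string_location)


# Step 1: create a new, empty instance without running __init__
manual = Path.__new__(Path)
print(manual.__dict__)  # {} - nothing has been assigned yet

# Step 2: run __init__ on it; the instance is what arrives as "self"
Path.__init__(manual, "/usr/bin")
print(manual.__dict__)  # {'string_location': '/usr/bin'}
```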

Notice that Python automatically knows that calling a method on a class instance should pass the instance as the first argument. We could have written this instead, which is equivalent:

Path.exists(usr_bin)
True

But it’s a lot more convenient and concise to call it on the instance itself.

The thing called usr_bin only carries the data we’ve assigned to it:

usr_bin.__dict__
{'string_location': '/usr/bin'}

It remembers its class, though:

usr_bin.__class__
__main__.Path

When you access an attribute, Python checks the object first, then the class:

usr_bin.__class__.__dict__
mappingproxy({'__module__': '__main__',
              '__init__': <function __main__.Path.__init__(self, string_location)>,
              'exists': <function __main__.Path.exists(self)>,
              '__dict__': <attribute '__dict__' of 'Path' objects>,
              '__weakref__': <attribute '__weakref__' of 'Path' objects>,
              '__doc__': None})

There’s some autogenerated stuff in there, but you can see exists is there too!
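A small experiment can make the lookup order visible. This is only a sketch: the class is repeated so the block is self-contained, and replacing a method with a lambda on the instance is done purely for illustration, not as a recommended practice.

```python
import os


class Path:
    def __init__(self, string_location):
        self.string_location = string_location

    def exists(self):
        return os.path.exists(self.string_location)


usr_bin = Path("/usr/bin")

print("exists" in usr_bin.__dict__)  # False - not stored on the instance
print("exists" in Path.__dict__)  # True - stored on the class

# An attribute stored on the instance is found first and hides the class one
usr_bin.exists = lambda: "shadowed"
print(usr_bin.exists())  # shadowed

# Removing it again uncovers the method on the class
del usr_bin.exists
print(type(usr_bin.exists()))  # <class 'bool'>
```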

1.2.1. Advanced: subclassing

Why stop there? It’s often useful to organize code in further levels. This is accomplished by subclassing: a class can be “based on” another class. When an attribute is looked up, the most specific class is checked first, and then the search continues up the chain. The order of this search is called the MRO (method resolution order), and you can check it explicitly:

usr_bin.__class__.__mro__
(__main__.Path, object)

All classes are eventually subclasses of object - the last item in this list. That’s where the default behaviors come from.
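To see the chain with more than one level, here is a minimal sketch using a hypothetical Directory subclass (the name and its is_dir method are made up for this example and are not part of the tutorial's code).

```python
import os


class Path:
    def __init__(self, string_location):
        self.string_location = string_location

    def exists(self):
        return os.path.exists(self.string_location)


class Directory(Path):  # hypothetical subclass, for illustration only
    def is_dir(self):
        return os.path.isdir(self.string_location)


usr_bin = Directory("/usr/bin")  # __init__ is found on Path

print(Directory.__mro__ == (Directory, Path, object))  # True
print("exists" in Directory.__dict__)  # False - inherited, found further up the chain
print("is_dir" in Directory.__dict__)  # True - defined on the subclass itself
```

Directory adds one method and gets everything else from Path, which in turn gets its defaults from object.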

1.3. Simpler: dataclasses

If you come from a compiled language, the syntax for making a class might be unusual for you. You might be more used to simply listing the members and methods together, something like this:

class BadPath:
    string_location = ...

    def exists(self):
        return os.path.exists(self.string_location)

Question: Why is this wrong?

Answer: This member variable is on the class. That means all BadPath instances would share the same string_location! We also don’t know what to assign it to (an instance variable should be assigned when you make an instance).
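The sharing is easy to demonstrate. In this sketch the "..." placeholder is swapped for a concrete value ("/usr/bin", chosen arbitrarily) so that there is something to observe.

```python
import os


class BadPath:
    string_location = "/usr/bin"  # arbitrary placeholder value, stored on the class

    def exists(self):
        return os.path.exists(self.string_location)


first = BadPath()
second = BadPath()
print(first.string_location == second.string_location)  # True - both read the class attribute

BadPath.string_location = "/tmp"  # changing the class changes what every instance sees
print(first.string_location, second.string_location)  # /tmp /tmp
print(first.__dict__)  # {} - the instances hold no data of their own
```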

But… Wouldn’t it be nice if we didn’t have to be so repetitive? Well, we can have the best of both worlds:

import dataclasses


@dataclasses.dataclass
class DataPath:
    string_location: str

    def exists(self):
        return os.path.exists(self.string_location)

We just add a decorator. We’ll cover these later; for now, it’s just a marker that processes this class into the correct output (in fact, that’s more or less what decorators always are). We also add a type annotation, since Python doesn’t allow a variable declaration without at least a type annotation or a value. The dataclass decorator uses the annotation to discover the field, but the type itself is only a hint for the reader: Python does not check it, and it still doesn’t care what you actually assign.
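A quick way to confirm that the annotation is not enforced is to pass the "wrong" type on purpose. This is a sketch only; the variable name not_a_string is made up for the example.

```python
import dataclasses
import os


@dataclasses.dataclass
class DataPath:
    string_location: str

    def exists(self):
        return os.path.exists(self.string_location)


# The annotation is how the decorator discovers the field...
print([f.name for f in dataclasses.fields(DataPath)])  # ['string_location']

# ...but nothing checks the type at runtime
not_a_string = DataPath(42)
print(not_a_string)  # DataPath(string_location=42)
```

If you want the annotations checked, that is the job of an external type checker or a validation library, not of the dataclass itself.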

Now, we get an __init__ for free!

DataPath("/usr/local")
DataPath(string_location='/usr/local')

In fact, we got a lot more for free. Notice how nicely it printed out? Compare that to our old class:

Path("/usr/local")
<__main__.Path at 0x7f4c9ee69ff0>

That’s the default object repr, which just tells you the __class__.__name__ and memory location (ugh), instead of something more helpful. We would have had to do a lot more work to make a nice class with the vanilla syntax!

There are a lot of useful options in dataclasses that can help you make useful classes; here are most of them:

  • init: Make an __init__ method (default: True)

  • repr: Make a nice repr (default: True)

  • eq: Allow equality (default: True)

  • order: Allow comparisons (default: False)

  • frozen: Disallow mutation (default: False)

  • slots: Keep the class from accepting new members (Python 3.10+, default: False, slots classes have no __dict__)

  • kw_only: Do not allow passing by position (Python 3.10+, default: False, frees up subclassing a lot)

  • match_args: Support Python 3.10 pattern matching via position (Python 3.10+, default: True)

You can also control each attribute (field in dataclass terms) with options, and you can specify __post_init__, which runs after the generated __init__.
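Here is one way a few of these options might be combined. The class name FrozenPath, the extra label field, and the emptiness check are all assumptions made up for this sketch; only options available before Python 3.10 are used so that it runs on older versions too.

```python
import dataclasses


@dataclasses.dataclass(frozen=True, order=True)
class FrozenPath:  # hypothetical name, for illustration only
    string_location: str
    # field() controls one attribute: keep it out of the repr and out of comparisons
    label: str = dataclasses.field(default="", repr=False, compare=False)

    def __post_init__(self):
        # Runs after the generated __init__ has assigned the fields
        if not self.string_location:
            raise ValueError("string_location must not be empty")


p = FrozenPath("/usr/local", label="local tools")
print(p)  # FrozenPath(string_location='/usr/local')

try:
    p.string_location = "/tmp"
except dataclasses.FrozenInstanceError:
    print("frozen instances cannot be modified")

print(p == FrozenPath("/usr/local"))  # True - label is ignored because compare=False
```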

If you like dataclasses, feel free to check out attrs, which inspired dataclasses and is a little more powerful; cattrs, which handles conversions for both the stdlib dataclasses and attrs; and pydantic, which is an all-in-one solution for data conversion and validation, but less flexible.

1.4. Using classes

Let’s look at a built-in class, int:

my_int = int(3)

Since this is so common, there’s a built-in shortcut: we could have written my_int = 3 directly, because Python turns integer literals into int objects when it sees them. We can call methods, too:

my_int.bit_length()
2

It takes 2 bits to be able to represent this integer. Python uses many more than that, but this is useful information about integers.

Note: you cannot write 3.bit_length(). This is invalid syntax, because the parser reads “3.” as the start of a float. You can, however, do this with a float: 2.0.is_integer() is valid, for example, as is (2).bit_length().
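The points above can be checked in a few lines. The extra values (255 and 256) are only there to show how bit_length grows.

```python
my_int = 3

print(type(my_int) is int)  # True - the literal already produced an int
print(bin(my_int))  # 0b11 - two binary digits
print(my_int.bit_length())  # 2

# Parentheses (or a variable name) avoid the "3." float ambiguity
print((3).bit_length())  # 2
print((255).bit_length(), (256).bit_length())  # 8 9
print((2.0).is_integer())  # True
```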

1.5. Special methods

We can’t go very far without writing a special method - __init__ we’ve already seen. Python has a lot of special methods that have double underscores before and after the name - called “dunder methods”. These customize all sorts of things about the class. Let’s try to make our “plain” class look more like our “dataclass”:

class Path:
    def __init__(self, string_location):
        self.string_location = string_location

    def exists(self):
        return os.path.exists(self.string_location)

    def __repr__(self):
        return f"{self.__class__.__name__}(string_location={self.string_location!r})"

Path("/usr/local")
Path(string_location='/usr/local')

This looks like our DataPath now! We’ve customized what the “representation” of an object of this class looks like. We could also separately control what the string representation (__str__) looks like, which allows the printed form and the REPL form to differ. This is a really nice feature of Python that is missing from some other languages, like Matlab. repr is usually programmer friendly, and str is usually user friendly.
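As a sketch of how the two forms can differ, here is the same class with a __str__ added. The choice to print just the location is an assumption for this example; any user-friendly form would do.

```python
import os


class Path:
    def __init__(self, string_location):
        self.string_location = string_location

    def exists(self):
        return os.path.exists(self.string_location)

    def __repr__(self):
        return f"{self.__class__.__name__}(string_location={self.string_location!r})"

    def __str__(self):
        # One possible user-facing form: just the location
        return self.string_location


p = Path("/usr/local")
print(repr(p))  # Path(string_location='/usr/local')
print(str(p))  # /usr/local
print(p)  # /usr/local - print() uses __str__
print([p])  # [Path(string_location='/usr/local')] - containers use __repr__
```

If a class defines only __repr__, str() falls back to it, which is why the earlier version printed the same text in both situations.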

You can also control what most operators do on the class, like comparison:

DataPath("/usr/local") == DataPath("/usr/local")
True
Path("/usr/local") == Path("/usr/local")
False

Yeah, like __repr__, dataclasses generated a reasonable default __eq__ method for us, while the vanilla class just falls back on object’s __eq__, which checks to see if the objects share the same memory (which these do not).

We could add this manually. Let’s use inheritance to add it, since we are ~~lazy~~ good programmers and don’t like repeating ourselves:

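class EqPath(Path):
    def __eq__(self, other):
        # Let Python handle comparisons with unrelated types
        if not isinstance(other, Path):
            return NotImplemented
        return self.string_location == other.string_location

EqPath("/usr/local") == EqPath("/usr/local")
True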

What if we wanted to sort paths alphabetically? Neither of our class families supports it out of the box:

# sorted([EqPath("/loc/a"), EqPath("/loc/b")])
# sorted([DataPath("/loc/a"), DataPath("/loc/b")])

Adding this would require writing __lt__, or we can let dataclasses come to the rescue again with order=True:

import dataclasses


@dataclasses.dataclass(order=True)
class DataPath:
    string_location: str

    def exists(self):
        return os.path.exists(self.string_location)

print(*sorted([DataPath("/loc/b"), DataPath("/loc/a")]), sep="\n")
DataPath(string_location='/loc/a')
DataPath(string_location='/loc/b')

How does this work? It treats the fields like a tuple when comparing: the first field is compared first, and later fields only break ties.
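The tuple analogy is easier to see with more than one field. The Version class below is a hypothetical two-field example invented for this sketch; it is not part of the path classes used so far.

```python
import dataclasses


@dataclasses.dataclass(order=True)
class Version:  # hypothetical two-field example
    major: int
    minor: int


versions = [Version(1, 10), Version(0, 9), Version(1, 2)]
print(sorted(versions) == [Version(0, 9), Version(1, 2), Version(1, 10)])  # True

# The same ordering you get from comparing tuples of the fields
print((1, 2) < (1, 10))  # True
print(Version(1, 2) < Version(1, 10))  # True
```

Because the comparison follows the order in which the fields are declared, the field that matters most for sorting should come first.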