20. Python Object Model
This is an advanced section and is available for reference.
20.1. Everything in Python is an object!
Okay, technically a PyObject* in CPython - we’ll focus on CPython most of the time today. Most other implementations (especially PyPy) are pretty similar.
An object is an “instance” of a “type”, or a “class”, which describes what that sort of object does.
Types and basic objects have some optimizations in CPython, for speed and also to keep the language from being infinitely recursive - but they are still PyObject*’s.
Also, “built-in” classes are a little special.
v = 4
Even a simple int has methods:
v.bit_length()
3
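Since types are objects too, you can poke at them the same way; a minimal sketch (results shown as comments):
type(v)                  # <class 'int'>
type(int)                # <class 'type'> - the class is itself an object
isinstance(int, object)  # True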
Let’s define a function (I’ll be using proper Python type hints when they are simple to add; we will cover typing later):
def f(x: int) -> int:
    return x**2
f(4)
16
This definition really is an assignment: a new function is made and assigned to f. But since this assignment is not arbitrary (it clearly has a name), functions remember their name!
f.__name__
'f'
f, v = v, f
# What is the v now? f? How about the name?
Remember: everything is an object. Functions are “First Class” objects in Python, meaning they behave exactly like any other object. All objects are “First Class”; there are no lesser or “other” kinds of objects in Python. Functions just happen to be callable. Other objects can be callable too, depending on their class and the presence of a __call__ method on that class.
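For example, a minimal sketch of a user-defined callable (the class name is made up for illustration):
class Doubler:
    def __call__(self, x: int) -> int:
        return x * 2

d = Doubler()
d(21)  # 42 - instances are callable because the class defines __call__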
20.2. Mutability
Many built-in objects are immutable, which can hide the fact that Python does not have the concept of “pass by value”; the labels you see really are all pointing to PyObject*’s that are being managed by Python’s garbage collector.
If you write x = y, then x and y refer to the same object. Always. This can’t be overridden. However, it’s hard to see that if you can’t mutate the value, because then there is no “side effect” to observe. Side effects only happen for mutable objects.
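You can check the shared identity directly; a quick sketch:
a = [1, 2]
b = a
a is b  # True - both names point at the exact same object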
x = 3
y = x
x += 1
print("What is {y = }?")
What is {y = }?
Add an f before the string to see the answer.
bool, int, float, str, bytes, tuple, and frozenset are immutable built-ins in Python. Singletons (like None, Ellipsis, True, and False) are immutable, too.
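Trying to mutate one of them raises an error; a quick sketch:
t = (1, 2, 3)
t[0] = 99  # TypeError: 'tuple' object does not support item assignment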
Now, let’s try a mutable object. list, set, dict, and generic classes/objects are mutable.
x = [3]
y = x
x[0] += 1
print("What is {y = }?")
What is {y = }?
Why?
The problem is that an immutable object does not define in-place operations. In-place operators like += then fall back to the out-of-place operation followed by an assignment, like x = x + 1; they create new objects. A mutable object (like a list) does define in-place operations, so it can be changed in place.
Here’s a quick example, showing the fall-back behavior of in-place operations if __iadd__ is missing:
class Addable:
    def __init__(self, value: int) -> None:
        self.value = value

    # Leaving off the return type to avoid discussing it here
    def __add__(self, number: int):
        return Addable(self.value + number)
x = Addable(3)
y = x
x += 4
x is y
False
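For contrast, here is a sketch of the same idea with __iadd__ defined (the class name is made up); now += mutates the existing object:
class InplaceAddable:
    def __init__(self, value: int) -> None:
        self.value = value

    def __iadd__(self, number: int):
        self.value += number
        return self  # returning self keeps the name bound to the same object

x = InplaceAddable(3)
y = x
x += 4
x is y  # True - x and y are still the same object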
20.2.1. Advanced aside: Why did list inplace addition work?
Quick aside for advanced Pythonistas: this is tricky. x[0] returns an int. So why is this any different than before? Let’s explore, using mock:
from unittest.mock import MagicMock

import rich

ListProxy = MagicMock("ListProxy")
y = ListProxy()
y[0] += 1
rich.print(y.mock_calls)
If you run this, the recorded mock calls show that Python has special support for this syntax: it pulls the item out, applies the in-place operation, then stores the result back with __setitem__. There are other special syntax treatments in Python as well, all designed to make the language more friendly and powerful:
1 < 2 < 3
True
print(not 3 in {1, 2, 3})
print(3 not in {1, 2, 3})
False
False
Can we glean anything from the original example, though? Yes, assignment is special. There are only four forms of normal assignment:
x = y = 1 # Can't be overridden
x, y = 4, 5 # ditto, tuple assignment
x[y] = 2 # __setitem__
x.y = 3 # __setattr__
That’s it. These are not valid assignments in Python (some of them are valid in the C family, for example):
x(y) = 1 # There is no assignment for __call__
x + y = 2 # Arbitrary expressions are not allowed
(x = 2) + 2 # Can't be nested
# So on.
This was limiting, so Python 3.8 added a nestable assignment operator, := (the walrus operator), though it was somewhat controversial. It does not work anywhere that a normal = works, to avoid confusion.
In-place assignment operators follow the same rules, so they are “special” too, and cannot be nested inside expressions.
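A quick sketch of the walrus operator nested inside an expression:
if (n := 10) > 5:  # assigns n and uses the value in the same expression
    print(n)       # 10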
20.3. Scope
Python has the concept of scope, but not in very many places. Functions and class definitions create scope, and modules have scope. (Generator expressions have scope too in Python 3, though they didn’t before). That’s about it. So you can write this:
if True:
    x = 3
print(x)
3
If if had scope, then x would not be accessible outside the if. This is simple and useful, but you have to be careful to stay clean and tidy. For example, if that if was False, this would suddenly break. It’s valid to use the loop variable after a loop ends, etc. There’s really not much scope at all!
Because it shows up in so few places, and is so close to automatic, it can bite you once in a while if you don’t keep it in mind:
x = 1
def f() -> None:
    print(x)
f()
print(x)
1
1
Now let’s try an assignment inside the function:
x = 1
def f() -> None:
    x = 2
    print(x)
f()
print(x)
2
1
So x in the function is not the same x out of the function now! (Try printing before assigning to it, or try changing it inplace. Even better, what happens if it is mutable?)
If you want a rule to put in your pocket:
Accessing a variable uses the first variable it finds going up in scope
Setting a variable always uses the local scope
If this is really what you want, you can use nonlocal to assign to a variable one scope up (but not the global scope), or global to assign to a variable in the module’s global scope.
x = 1
def f() -> None:
    global x
    x = 2
    print(x)
f()
print(x)
2
2
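A matching sketch for nonlocal, which targets the enclosing function’s scope instead of the module’s:
def outer() -> None:
    x = 1

    def inner() -> None:
        nonlocal x  # assign to outer's x instead of creating a new local
        x = 2

    inner()
    print(x)

outer()  # prints 2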
Need a practical rule?
Always pass variables out explicitly, and be cautious about using anything not clearly global inside a function.
This means you should never see global, as it’s only needed for setting variables. Global read-only variables (the only safe kind) are sometimes ALL_CAPS. (Hint: for typed code, you can add Final.)
For example, a better way to write the above function:
x = 1
def f(x: int) -> int:
    return 2
x = f(x)
print(x)
2
Now it’s completely explicit, at the call site, that it’s going to modify x. And, as a bonus, you could even use it on other variables now, not just x!
20.4. Memory and the Garbage Collector
Python is a garbage collected language. All Python objects have a refcount, which tells the garbage collector how many “labels” or how many ways you can access that object. When you can’t get to the object anymore, the garbage collector has the right to remove the object. It usually, depending on settings, runs roughly once per line, but it doesn’t have to.
As a consequence, never depend on an object being deleted to perform some action at some time. More on that later, when we cover context managers.
Let’s look at it:
import gc
import sys
class Boom:
    def __del__(self) -> None:
        print("Boom!")

ob = Boom()
gc.is_tracked(ob)
True
sys.getrefcount(ob)
2
del ob
Boom!
If all went well, you probably should have seen “Boom” above. Now let’s try a variation:
ob = Boom()
ob
<__main__.Boom at 0x7f373c69edd0>
del ob
Notice anything? Probably not. Objects only get deleted when the refcount goes to 0 (okay, technically 1, since the GC holds a reference to it too; otherwise it wouldn’t know what to delete. So it goes to 0 as it gets deleted by the gc). IPython tracks outputs; you can access all of them:
Out[25]
There it is! Since we can access it, it’s not collectible. (In fact, we can now access it from two Out’s!)
That’s just one case where an object doesn’t get garbage collected until the final cleanup. You can also turn the garbage collector off, it could be running at a different setting, etc.
Also, during the cleanup phase at the end, everything starts getting deleted. So even modules might not be safe to use in cleanups. You have been warned.
Avoid __del__ in almost all cases; the exception is emergency cleanup. There is a better way if you want “action at a distance”, as I like to call it; we will see it when we get to context managers.
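As a preview (covered properly when we get to context managers), a minimal sketch of deterministic cleanup with a with block:
from contextlib import contextmanager

@contextmanager
def resource():
    print("setup")
    try:
        yield "the resource"
    finally:
        print("cleanup")  # runs as soon as the with block exits

with resource() as r:
    print(r)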
20.5. Importing
Caveat: A few minor details below will not be quite right for namespace packages, which do not contain an __init__.py. They are not heavily used, and are designed for a very specific purpose which we won’t cover (combining packages that physically sit in different places). Covering them would only make the wording below more cumbersome; the conclusions would stay the same.
20.5.1. Importing Basics
import a.b # line 1
from a import b # line 2
So, question: if line 1 is valid, is b a module (b.py), a package (b/__init__.py), or an object? How about line 2?
The rule: the left part must be a package or module. The right part of a from ... import ... statement can be anything. All packages that appear in this statement (separated by dots) have their __init__.py run unconditionally.
Suggestion: the from a import b syntax should be reserved for b being an object. It’s more confusing if b is a module/package. This syntax is usually best reserved for CamelCase classes or maybe very commonly used functions. Both syntaxes support a final as clause to rename the thing being imported - so import a.b as b will clearly tell your reader that b must be a module.
So, a second question: is this valid?
import a
a.b
If b is an object in a, then it is. If b is a package or module, this only works if it is imported in __init__.py (in which case it is an object in a).
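A sketch of making a submodule available that way (hypothetical package layout):
# a/__init__.py
from . import b  # now "import a" is enough to make a.b usable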
20.5.2. Circular imports
One dreaded message in Python is the circular import failure. This happens when you have something like this:
# chicken.py
from egg import make_egg

def make_chicken():
    return make_egg()

# egg.py
from chicken import make_chicken

def make_egg():
    return make_chicken()
Obviously, you’ll have a problem when you call this, but you can’t even load one of these files! You get a circular import error.
To solve it, import modules/packages instead of objects. This is fine:
# chicken.py
import egg

def make_chicken():
    return egg.make_egg()

# egg.py
import chicken

def make_egg():
    return chicken.make_chicken()
The bodies of the functions are not run on import, so when egg imports chicken and chicken in turn tries to import egg, the fact that egg is not a complete module yet is not a problem. Accessing anything inside egg at that point would be, but that access happens inside the function body, which doesn’t run at import time.
Now, you still have a chicken and egg problem if you run it, but at least you can import the files.
20.5.3. Where do imports come from?
If I import a, where is a? Python works down sys.path:
sys.path starts with "." - be careful about using this, and it does override the system site packages. Then the standard library follows. Then come the system site packages and user site packages.
So a package containing package/sys.py is fine; even package/__init__.py will be able to import the normal sys, and it has to use import package.sys or a relative import (from . import sys) to get the local sys.py.
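A sketch of that layout (the package name is made up):
package/
    __init__.py  # "import sys" here still gets the standard-library sys
    sys.py       # reachable as "import package.sys" or "from . import sys"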
You may see from __future__ import absolute_import. This was from Python 2; it made the import system work like Python 3, where imports are absolute by default, so a module inside your package no longer shadows a top-level module of the same name. With the default Python 2 behavior, code inside a package could not access a standard-library module if a local module had the same name!
20.5.4. Relative imports
from .sys import my_function
These allow you to avoid writing the name of your library again, but they have some drawbacks. They only work in the from ... import ... syntax above, they only work when running as part of a package (so with an __init__.py), and you can’t use them in a non-library __name__ == "__main__" file that you run directly (but python -m ... runs are fine). If you are avoiding circular imports, this can be tricky to use, since you need to import modules, and putting the module on the far right is not ideal.
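For comparison, a sketch of a relative import next to its absolute equivalent (mypkg is a made-up package name):
# inside mypkg/module.py
from .helpers import my_function        # relative
from mypkg.helpers import my_function   # absolute equivalent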
The main reason to use them is to allow your library to be renamed easily - one place this might happen is in vendoring. Most normal libraries do not need to support vendoring; only libraries that are used by a tool like pip need to support it.