20. Python Object Model
This is an advanced section and is available for reference.
20.1. Everything in Python is an object!
Okay, technically a PyObject* in CPython - we’ll focus on CPython most of the time today. Most other implementations (especially PyPy) are pretty similar.
An object is an “instance” of a “type”, or a “class”, which describes what that sort of object does.
Types and basic objects have some optimizations in CPython, for speed and also to keep the language from being infinitely recursive - but they are still PyObject*’s.
Also, “built-in” classes are a little special.
v = 4
Even a simple int has methods:
v.bit_length()
3
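Since types are objects too, you can poke at them the same way; a minimal sketch (results shown as comments):
type(v)                  # <class 'int'>
type(int)                # <class 'type'> - the class is itself an object
isinstance(int, object)  # True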
Let’s define a function (I’ll be using proper Python type hints when they are simple to add; we will cover typing later):
def f(x: int) -> int:
    return x**2
f(4)
16
This definition really is an assignment: a new function is made and assigned to f. But since this assignment is not arbitrary (it clearly has a name), functions remember their name!
f.__name__
'f'
f, v = v, f
# What is the v now? f? How about the name?
Remember: everything is an object. Functions are “First Class” objects in Python, meaning they behave exactly like any other object. All objects are “First Class”; there are no lesser or “other” kinds of objects in Python. Functions just happen to be callable. Other objects can be callable too, depending on their class and the presence of a __call__ method on that class.
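For example, a minimal sketch of a user-defined callable (the class name is made up for illustration):
class Doubler:
    def __call__(self, x: int) -> int:
        return x * 2

d = Doubler()
d(21)  # 42 - instances are callable because the class defines __call__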
20.2. Mutability
Many built-in objects are immutable, which can hide the fact that Python does not have the concept of “pass by value”; the labels you see really are all pointing to PyObject*’s that are being managed by Python’s garbage collector.
If you write x = y, then x and y refer to the same object. Always. This can’t be overridden. However, it’s hard to see that if you can’t mutate the value, because then there is no “side effect” to observe. Side effects only happen for mutable objects.
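You can check the shared identity directly; a quick sketch:
a = [1, 2]
b = a
a is b  # True - both names point at the exact same object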
x = 3
y = x
x += 1
print("What is {y = }?")
What is {y = }?
Add an f before the string to see the answer.
bool, int, float, str, bytes, tuple, and frozenset are immutable built-ins in Python. Singletons (like None, Ellipsis, True, and False) are immutable, too.
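Trying to mutate one of them raises an error; a quick sketch:
t = (1, 2, 3)
t[0] = 99  # TypeError: 'tuple' object does not support item assignment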
Now, let’s try a mutable object. list, set, dict, and generic classes/objects are mutable.
x = [3]
y = x
x[0] += 1
print("What is {y = }?")
What is {y = }?
Why?
The problem is that an immutable object does not define in-place operations. In-place operators like += then fall back to the out-of-place operation followed by an assignment, like x = x + 1; they create new objects. A mutable object (like a list) does define in-place operations, so it can be changed in place.
Here’s a quick example, showing the fall-back behavior of in-place operations if __iadd__ is missing:
class Addable:
    def __init__(self, value: int) -> None:
        self.value = value

    # Leaving off the return type to avoid discussing it here
    def __add__(self, number: int):
        return Addable(self.value + number)
x = Addable(3)
y = x
x += 4
x is y
False
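For contrast, here is a sketch of the same idea with __iadd__ defined (the class name is made up); now += mutates the existing object:
class InplaceAddable:
    def __init__(self, value: int) -> None:
        self.value = value

    def __iadd__(self, number: int):
        self.value += number
        return self  # returning self keeps the name bound to the same object

x = InplaceAddable(3)
y = x
x += 4
x is y  # True - x and y are still the same object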
20.2.1. Advanced aside: Why did list inplace addition work?
Quick aside for advanced Pythonistas: this is tricky. x[0] returns an int. So why is this any different than before? Let’s explore, using mock:
from unittest.mock import MagicMock

import rich

ListProxy = MagicMock("ListProxy")
y = ListProxy()
y[0] += 1
rich.print(y.mock_calls)
If you run this, the recorded mock calls show that Python has special support for this syntax: it pulls the item out, applies the in-place operation, then stores the result back with __setitem__. There are other special syntax treatments in Python as well, all designed to make the language more friendly and powerful:
1 < 2 < 3
True
print(not 3 in {1, 2, 3})
print(3 not in {1, 2, 3})
False
False
Can we glean anything from the original example, though? Yes, assignment is special. There are only four forms of normal assignment:
x = y = 1 # Can't be overridden
x, y = 4, 5 # ditto, tuple assignment
x[y] = 2 # __setitem__
x.y = 3 # __setattr__
That’s it. These are not valid assignments in Python (some of them are valid in the C family, for example):
x(y) = 1 # There is no assignment for __call__
x + y = 2 # Arbitrary expressions are not allowed
(x = 2) + 2 # Can't be nested
# So on.
This was limiting, so Python 3.8 added a nestable assignment operator, := (the walrus operator), though it was somewhat controversial. It does not work anywhere that a normal = works, to avoid confusion.
In-place assignment operators follow the same rules, so they are “special” too, and cannot be nested inside expressions.
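A quick sketch of the walrus operator nested inside an expression:
if (n := 10) > 5:  # assigns n and uses the value in the same expression
    print(n)       # 10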
20.3. Scope
Python has the concept of scope, but not in very many places. Functions and class definitions create scope, and modules have scope. (Generator expressions have scope too in Python 3, though they didn’t before). That’s about it. So you can write this:
if True:
    x = 3
print(x)
3
If if had scope, then x would not be accessible outside the if. This is simple and useful, but you have to be careful to stay clean and tidy. For example, if that if was False, this would suddenly break. It’s valid to use the loop variable after a loop ends, etc. There’s really not much scope at all!
Because it shows up in so few places, and is so close to automatic, it can bite you once in a while if you don’t keep it in mind:
x = 1
def f() -> None:
    print(x)
f()
print(x)
1
1
Now let’s try an assignment inside the function:
x = 1
def f() -> None:
    x = 2
    print(x)
f()
print(x)
2
1
So x in the function is not the same x out of the function now! (Try printing before assigning to it, or try changing it inplace. Even better, what happens if it is mutable?)
If you want a rule to put in your pocket:
Accessing a variable uses the first variable it finds going up in scope
Setting a variable always uses the local scope
If this is really what you want, you can use nonlocal to assign to a variable one scope up (but not the global scope), or global to assign to a variable in the module’s global scope.
x = 1
def f() -> None:
    global x
    x = 2
    print(x)
f()
print(x)
2
2
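A matching sketch for nonlocal, which targets the enclosing function’s scope instead of the module’s:
def outer() -> None:
    x = 1

    def inner() -> None:
        nonlocal x  # assign to outer's x instead of creating a new local
        x = 2

    inner()
    print(x)

outer()  # prints 2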
Need a practical rule?
Always pass variables out explicitly, and be cautious about using anything not clearly global inside a function.
This means you should never see global, as it’s only needed for setting variables. Global read-only variables (the only safe kind) are sometimes ALL_CAPS. (Hint: for typed code, you can add Final.)
For example, a better way to write the above function:
x = 1
def f(x: int) -> int:
    return 2
x = f(x)
print(x)
2
Now it’s completely explicit, at the call site, that it’s going to modify x. And, as a bonus, you could even use it on other variables now, not just x!
20.4. Memory and the Garbage Collector
Python is a garbage collected language. All Python objects have a refcount, which tells the garbage collector how many “labels” or how many ways you can access that object. When you can’t get to the object anymore, the garbage collector has the right to remove the object. It usually, depending on settings, runs roughly once per line, but it doesn’t have to.
As a consequence, never depend on an object being deleted to perform some action at some time. More on that later, when we cover context managers.
Let’s look at it:
import gc
import sys
class Boom:
    def __del__(self) -> None:
        print("Boom!")

ob = Boom()
gc.is_tracked(ob)
True
sys.getrefcount(ob)
2
del ob
Boom!
If all went well, you probably should have seen “Boom” above. Now let’s try a variation:
ob = Boom()
ob
<__main__.Boom at 0x7f373c69edd0>
del ob
Notice anything? Probably not. Objects only get deleted when the refcount goes to 0 (okay, technically 1, since the GC holds a reference to it too; otherwise it wouldn’t know what to delete. So it goes to 0 as it gets deleted by the gc). IPython tracks outputs; you can access all of them:
Out[25]
There it is! Since we can access it, it’s not collectible. (In fact, we can now access it from two Out’s!)
That’s just one case where an object doesn’t get garbage collected until the final cleanup. You can also turn the garbage collector off, it could be running at a different setting, etc.
Also, during the cleanup phase at the end, everything starts getting deleted. So even modules might not be safe to use in cleanups. You have been warned.
Avoid __del__ in almost all cases; the exception is emergency cleanup. There is a better way if you want “action at a distance”, as I like to call it; we will see it when we get to context managers.
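As a preview (covered properly when we get to context managers), a minimal sketch of deterministic cleanup with a with block:
from contextlib import contextmanager

@contextmanager
def resource():
    print("setup")
    try:
        yield "the resource"
    finally:
        print("cleanup")  # runs as soon as the with block exits

with resource() as r:
    print(r)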
20.5. Importing
Caveat: A few minor details below will not be quite right for namespace packages, which do not contain an __init__.py. They are not heavily used, and are designed for a very specific purpose which we won’t cover (combining packages that physically sit in different places). Covering them would only make the wording below more cumbersome; the conclusions would stay the same.
20.5.1. Importing Basics
import a.b # line 1
from a import b # line 2
So, question: if line 1 is valid, is b a module (b.py), a package (b/__init__.py), or an object? How about line 2?
The rule: the left part must be a package or module. The right part of a from ... import ... statement can be anything. All packages that appear in this statement (separated by dots) have their __init__.py run unconditionally.
Suggestion: the from a import b syntax should be reserved for b being an object. It’s more confusing if b is a module/package. This syntax is usually best reserved for CamelCase classes or maybe very commonly used functions. Both syntaxes support a final as clause to rename the thing being imported - so import a.b as b will clearly tell your reader that b must be a module.
So, a second question: is this valid?
import a
a.b
If b is an object in a, then it is. If b is a package or module, this only works if it is imported in __init__.py (in which case it is an object in a).
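A sketch of making a submodule available that way (hypothetical package layout):
# a/__init__.py
from . import b  # now "import a" is enough to make a.b usable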
20.5.2. Circular imports
One dreaded message in Python is the circular import failure. This happens when you have something like this:
# chicken.py
from egg import make_egg

def make_chicken():
    return make_egg()

# egg.py
from chicken import make_chicken

def make_egg():
    return make_chicken()
Obviously, you’ll have a problem when you call this, but you can’t even load one of these files! You get a circular import error.
To solve it, import modules/packages instead of objects. This is fine:
# chicken.py
import egg

def make_chicken():
    return egg.make_egg()

# egg.py
import chicken

def make_egg():
    return chicken.make_chicken()
The bodies of the functions are not run on import, so when egg imports chicken and chicken in turn tries to import egg, the fact that egg is not a complete module yet is not a problem. Accessing anything inside egg at that point would be, but that access happens inside the function body, which doesn’t run at import time.
Now, you still have a chicken and egg problem if you run it, but at least you can import the files.
20.5.3. Where do imports come from?
If I import a, where is a? Python works down sys.path:
sys.path starts with "." - be careful about using this, and it does override the system site packages. Then the standard library follows. Then come the system site packages and user site packages.
So a package containing package/sys.py is fine; even package/__init__.py will be able to import the normal sys, and it has to use import package.sys or a relative import (from . import sys) to get the local sys.py.
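A sketch of that layout (the package name is made up):
package/
    __init__.py  # "import sys" here still gets the standard-library sys
    sys.py       # reachable as "import package.sys" or "from . import sys"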
You may see from __future__ import absolute_import. This was from Python 2; it made the import system work like Python 3, where imports are absolute by default, so a module inside your package no longer shadows a top-level module of the same name. With the default Python 2 behavior, code inside a package could not access a standard-library module if a local module had the same name!
20.5.4. Relative imports
from .sys import my_function
These allow you to avoid writing the name of your library again, but they have some drawbacks. They only work in the from ... import ... syntax above, they only work when running as part of a package (so with an __init__.py), and you can’t use them in a non-library __name__ == "__main__" file that you run directly (but python -m ... runs are fine). If you are avoiding circular imports, this can be tricky to use, since you need to import modules, and putting the module on the far right is not ideal.
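For comparison, a sketch of a relative import next to its absolute equivalent (mypkg is a made-up package name):
# inside mypkg/module.py
from .helpers import my_function        # relative
from mypkg.helpers import my_function   # absolute equivalent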
The main reason to use them is to allow your library to be renamed easily - one place this might happen is in vendoring. Most normal libraries do not need to support vendoring; only libraries that are used by a tool like pip need to support it.