Previous rubric:
Style
- Include page numbers? 10 pts
- Fonts legible? 5 pts
- Some reasonable figures/images? 10 pts
- Reasonable amount of text per slide (not too much)? 5 pts
- Attractive slides? 5 pts
- Multiple obvious grammatical/spelling errors? 10 pts
Content
- Intro
- Clear to non-expert? 15 pts
- Explains why the audience should be interested? 15 pts
- Plans/results
- Clear description of what was done or will be done? 15 pts
Speaking
- Stayed on time? (UG) 5 pts
- Avoid directly reading slides? 5 pts
Final presentation:
Sign up for a time slot!
Parts
- Title slide with your name.
- Intro (similar to before, 2-3 slides)
- Procedures (discussion of what was done, 2-3 slides)
- Results (discussion of what was accomplished, 2-3 slides)
- Future plans (what can be done, 1 slide)
- Conclusion (Quick recap in 1 slide)
- References (1 slide)
- Backup (anything)
Guides
- Have at least 2 images. More is better.
- You must have slide numbers.
- Stay on time. You have 10 minutes, plus ~2 minutes for questions. If you go over, you will have to stop.
- Keep text per slide down - The 1-6-6 rule is too strict, but keep it in mind. Maybe see this page.
- Think visual - how do you make your content clear and have an impact?
Project report
Format
- You will "submit" a git repository with your code.
- It does not have to be in git, but git is recommended. If you do not use git, you should be able to send a zip file instead.
- Should be reproducible - if I have access to your data, I should be able to run your code and get your results.
- Structure should be similar to recommended structure in class
- Must have 1+ .py files with implementations
- Should have notebook(s) or Python scripts that run the library parts
- Should have a list of some sort of requirements / used packages
- Data does not have to be included
- Two documents required:
- README.md (or similar): basic instructions to someone who wants to run your code
- Writeup as notebook or document: see below
Writeup
- Report can be written in a notebook (with or without a bit of example code)
- Should contain sections similar to the slides (so see above)
- Should be 2-4 pages in length, not counting code.
Reproducing your environment
Several methods:
- setup.py: make an installable package
  - Good for libraries, but doesn't keep exact version numbers
  - Harder to set up
- environment.yml: list of packages for Conda
  - Still only one version
  - Requires Conda
- requirements.txt: list of PyPI packages
  - Directly supported by Pip
  - Only PyPI packages
- Pipfile and Pipfile.lock: list of packages for Pipenv
  - Stores "nice" and "reproducible" versions
  - Always and automatically generated by Pipenv
  - Only PyPI packages
Example: Pipenv
pipenv install numpy uproot
This creates a virtual environment, a Pipfile, and a Pipfile.lock. Pipfile will list general, explicit dependencies. Pipfile.lock will list all installed packages, including the requirements that pip pulls in for them, and will have exact versions (and is an ugly file). You can do a few extra things in the Pipfile too, like set up cross-dependencies (install this back-ported package only on old Python, for example), require a Python version, and more.
Example: PvFinder
This is a project I'm working on. Notice several things:
- Pipfile: I don't actually use this (I'm using Conda) but I'm still tracking dependencies
- README.md: A nice, markdown readme is in several directories
- model/: The Python code is in here. I picked a bad name, but I am stuck with it.
- notebooks/: Most of the "user" code is here, with explanations and plots. I clear the notebooks before committing them.
- scripts/: Files in here are run directly from the shell.
- tests/: I have only a few tests, but few is better than none.
- Data is stored in another location, but symlinked in.
- I make large changes in a branch, then make a Pull Request; this keeps a nice history and helps collaborators.
Python language features
We mostly skipped more advanced features or features that are really new.
- Lambda functions: You can just write regular functions.
- Advanced iterators/generators: Nice, but not critical for scientific code.
- Async/Await: I've been trying for years to use this in scientific code. Haven't found a use yet. Maybe IPython 7's new support will make it useful for animations? Very much a Python 3 feature.
- Static type hints: Nice tools (editors like PyCharm, static analyzer like MyPy) could make this useful, but it adds lots of extra bits of code and is very much a Python 3 feature. Can slow your code down a bit (better in Python 3.7 and will be even better in Python 4)
- How to write a decorator: Very tricky to get just right.
- Metaclasses and advanced class creation: Even if you think you need this, you probably don't.
- Threading: Python really can't do faster computation in multiple threads; you should use something like Numba to do so. Regular threading (and async) is designed for IO, networking, etc; places where you sit and wait.
- Python 2: Even though many experiments are still stuck on Python 2, it is a dying language. Next year, many more libraries will drop support from new releases. Most of what you know can be used or adapted to Python 2 if you have to.
x : int = 0 # A static code analysis tool will complain if you set x to a non-int value.
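As an illustration of why decorators are tricky to get just right, here is a minimal sketch of one. The names `logged`, `add`, and the `calls` attribute are hypothetical and only for illustration. The detail that is easy to forget is `functools.wraps`, which keeps the decorated function's name and docstring intact.

```python
import functools


def logged(func):
    """Hypothetical decorator: record the arguments of every call."""

    @functools.wraps(func)  # copies __name__, __doc__, etc. onto the wrapper
    def wrapper(*args, **kwargs):
        wrapper.calls.append((args, kwargs))
        return func(*args, **kwargs)

    wrapper.calls = []  # storage for the recorded calls
    return wrapper


@logged
def add(a, b):
    """Add two numbers."""
    return a + b


result = add(1, 2)
print(result, add.__name__, add.calls)
```

Without the `functools.wraps` line, `add.__name__` would report `wrapper`, which makes tracebacks and documentation confusing.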
Related topics
- Shells: we didn't cover shells - it's a key part of working on any Unix system.
- Several old shells exist, like SH (1979) and CSH (1978). Don't use them.
- Bash (1989) is the most common shell. It's what people mean by a shell most of the time. It's the default pretty much anywhere, including macOS.
- New shells are fancier or more user friendly. ZSH is Bash on steroids, FISH is a re-imagining of a shell for the 1990s (that's considered new for shells), and Xonsh is a shell written in Python 3.
- Containers: like Docker
- Run a pristine custom Linux environment anywhere in seconds
- Too high a learning curve to cover - with great power, ...!
- Used for everything; you can get a container for any Linux OS, Anaconda, LaTeX, ROOT, and more.
- Compiled extensions
- Would need another language, like C or C++11; and Numba covers many use cases already!
- PEPs: Python Enhancement Proposals
- This is how Python is developed
- Almost all features were a PEP at one point
- Currently bogged down in finding a new governance model after Guido van Rossum stepped down.
- Utilities
- Formatting tools can check your format against PEP 8 (formatting guidelines)
- Sphinx builds documentation for your code
- CookieCutter can make a new project from a template
- Continuous integration services test your git repository on every commit, publish docs or a website, make binaries, push to PyPI, and more.
- setup.py
- You can make an installable package
- Uses setuptools (third-party, but ubiquitous like pip) or distutils (standard library, but not recommended)
- Not that user friendly yet - see flit - new tools require pip 10+
- More libraries
- Plumbum makes writing shell scripts in Python simple (note: I'm the maintainer)
- More about Jupyter notebooks
- How to write markdown, LaTeX math, hidden text, colors, and other ways to make great notebooks
- How to make a slide show from a notebook
- Examples of Jupyter Lab instead of plain Jupyter Notebook
Takeaways
- Know what to look for
- Know where to look
- Code is a product of research, just like a paper. It should be made presentable.
- Reading code is harder than writing code
- Reading good code is easier than writing good code
- Good code is tested, clear, and simple
- Less code is easier to maintain/debug than more code most of the time
- Understand the algorithm (maybe with a toy implementation), then use the existing tools if possible
- Only make code uglier for performance if it matters! Check!
- Python 3.6 was the most exciting Python release in the last 10 years, and probably for a few years to come. (3.7 was nice, but mostly focused on performance and security, and 3.8 will be bogged down in politics).
class MyBadVector:
    x = 0
    y = 0
    z = 0

# v = MyBadVector(1,2,3) # NO!
# print(v) # Ugly!
# MyBadVector.x <- Is stored in class, not instance!
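The commented-out lines can be checked directly. The following is a small sketch (the class is repeated so the block runs on its own; the variable names are only for illustration) showing that the constructor rejects arguments and that the values live on the class until an instance shadows them.

```python
class MyBadVector:
    x = 0
    y = 0
    z = 0


# No __init__ was defined, so positional arguments are rejected.
try:
    MyBadVector(1, 2, 3)
    constructor_failed = False
except TypeError:
    constructor_failed = True

a = MyBadVector()
b = MyBadVector()

# Assigning on an instance creates an instance attribute that shadows the class one.
a.x = 5

# Changing the class attribute affects every instance that has not shadowed it.
MyBadVector.y = 7

print(a.x, b.x)  # a has its own x; b still reads the class value
print(a.y, b.y)  # both read the updated class value
```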
Python's best shot at providing something like this is a namedtuple - a very useful concept, but not really a very good class:
from collections import namedtuple
MyTupleVector = namedtuple('MyTupleVector', ('x', 'y', 'z'))
v = MyTupleVector(1,2,3)
print(v)
# Behaves like a tuple:
x,y,z = v
print(x)
# But also has names!
print(v.x)
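One way to see why a namedtuple is not really a very good class is to poke at its tuple behavior. This is a sketch only; the variable names are made up for illustration.

```python
from collections import namedtuple

MyTupleVector = namedtuple("MyTupleVector", ("x", "y", "z"))
v = MyTupleVector(1, 2, 3)

# Fields cannot be reassigned; a namedtuple is immutable like any tuple.
try:
    v.x = 10
    mutable = True
except AttributeError:
    mutable = False

# _replace builds a new object instead of changing the old one.
w = v._replace(x=10)

# It compares equal to a plain tuple with the same values,
# which may or may not be what you want from a "vector" class.
same_as_tuple = v == (1, 2, 3)

# Tuple semantics also apply to operators: + concatenates, it does not add.
combined = v + w

print(mutable, w, same_as_tuple, combined)
```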
There are a few options, but not many.
To really get what we want, we have to write a lot of boilerplate code that is always the same:
class MyProperVector:
    __slots__ = ("x", "y", "z") # optional, used to make the class faster and smaller

    def __init__(self, x=0, y=0, z=0):
        self.x = x # Each argument is listed 3 (or 4) times!
        self.y = y # Easy to make mistake
        self.z = z

    def __repr__(self):
        # All classes need something like this, all about the same
        return f"{self.__class__.__name__}(x={self.x}, y={self.y}, z={self.z})"

v = MyProperVector(1,2,3)
print(v)
The relatively recent but very popular Attrs project was designed to fix this. Here's what it looks like.
import attr # note: if you don't have it (you probably do), it's called attrs not attr on PyPI

@attr.s
class MyAttrsVector:
    x = attr.ib(0)
    y = attr.ib(0)
    z = attr.ib(0)

v = MyAttrsVector(1,2,3)
print(v)
You can optionally add auto-slots, types (as an argument or in Python 3.6 style), defaults, conversion functions, validation functions, immutability, and more!
You automatically get (but can control) __init__, comparisons, and __repr__, and can also get __slots__, __hash__, and a few more.
This was so popular that Python 3.7 has added a version of it to the standard library! A few "magical" features are not included, like __slots__ (Attrs actually creates a new class to add slots, which could in very special and rare cases cause issues).
# Requires Python 3.7:
from dataclasses import dataclass

@dataclass
class MyDataClassVector:
    x : float = 0
    y : float = 0
    z : float = 0

v = MyDataClassVector(1,2,3)
print(v)
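The standard library version also supports several of the optional extras mentioned for Attrs. The following is a sketch, assuming Python 3.7+; the class name `MyFrozenVector` and the `tags` field are hypothetical additions used only to show immutability, ordering, and a mutable default.

```python
from dataclasses import FrozenInstanceError, asdict, dataclass, field


@dataclass(frozen=True, order=True)
class MyFrozenVector:
    x: float = 0
    y: float = 0
    z: float = 0
    # Hypothetical extra field; mutable defaults need default_factory.
    # compare=False keeps it out of ==, <, and the generated hash.
    tags: list = field(default_factory=list, compare=False)


v = MyFrozenVector(1, 2, 3)
w = MyFrozenVector(1, 2, 4)

# frozen=True makes assignment after creation an error.
try:
    v.x = 10
    frozen = False
except FrozenInstanceError:
    frozen = True

print(v)
print(v == MyFrozenVector(1, 2, 3))  # generated __eq__ compares field by field
print(v < w)  # order=True compares fields in definition order
print(asdict(v))
```

Note that `frozen=True` only blocks assignment to the fields; a list stored in `tags` can still be modified in place.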