Sharing your Code

Week 14 Day 3: Sharing and documenting code

Objectives

  • Go over the final presentation
  • Go over the final project report
  • Talk about reproducible research
  • Talk about what we didn't talk about.

Presentations

Previous rubric:

Style

  • Include page numbers? 10 pts
  • Fonts legible? 5 pts
  • Some reasonable figures/images? 10 pts
  • Reasonable amount of text per slide (not too much)? 5 pts
  • Attractive slides? 5 pts
  • Multiple obvious grammatical/spelling errors? 10 pts

Content

  • Intro
    • Clear to non-expert? 15 pts
    • Explains why the audience should be interested? 15 pts
  • Plans/results
    • Clear description of what was done or will be done? 15 pts

Speaking

  • Stayed on time? (UG) 5 pts
  • Avoid directly reading slides? 5 pts

See 100 point talk rubric

Final presentation:

Sign up for a time slot!

Parts

  • Title slide with your name.
  • Intro (similar to before, 2-3 slides)
  • Procedures (discussion of what was done, 2-3 slides)
  • Results (discussion of what was accomplished, 2-3 slides)
  • Future plans (what can be done, 1 slide)
  • Conclusion (Quick recap in 1 slide)
  • References (1 slide)
  • Backup (anything)

Guides

  • Have at least 2 images. More is better.
  • You must have slide numbers.
  • Stay on time. You have 10 minutes, plus ~2 minutes for questions. If you go over, you will have to stop.
  • Keep text per slide down - The 1-6-6 rule is too strict, but keep it in mind. Maybe see this page.
  • Think visual - how do you make your content clear and have an impact?

Project report

Format

  • You will "submit" a git repository with your code.
    • It does not have to be in git, but that is recommended. If it is not, you should be able to send a zip file with it.
  • Should be reproducible - if I have access to your data, I should be able to run your code and get your results.
  • Structure should be similar to recommended structure in class
    • Must have 1+ .py files with implementations
    • Should have notebook(s) or python scripts that run the library parts
    • Should have a list of some sort of requirements / used packages
    • Data does not have to be included
  • Two documents required:
    • README.md (or similar): basic instructions to someone who wants to run your code
    • Writeup as notebook or document: see below

Writeup

  • Report can be written in a notebook (with or without a bit of example code)
  • Should contain similar sections to the slides (so see above)
  • Should be 2-4 pages in length, not counting code.

Example of common Python environments

Reproducing your environment

Several methods:

  • setup.py: make an installable package
    • Good for libraries, but doesn't keep exact version numbers
    • Harder to set up
  • environment.yml: list of packages for Conda
    • Still only one version
    • Requires Conda
  • requirements.txt: List of PyPI packages
    • Directly supported by Pip
    • Only PyPI packages
  • Pipfile and Pipfile.lock: list of packages for Pipenv
    • Stores both a "nice" and a "reproducible" version of the package list
    • Always and automatically generated by Pipenv
    • Only PyPI packages
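
The difference between a "nice" list and a "reproducible" one comes down to whether exact versions are recorded. As a rough sketch of that idea (not how any of the tools above are implemented), the standard library can list the exact versions installed in the current environment, much like a pinned requirements.txt. The function name `frozen_requirements` is made up for this example, and `importlib.metadata` needs Python 3.8 or newer.

```python
from importlib.metadata import distributions


def frozen_requirements():
    """Return sorted 'name==version' lines for the installed distributions."""
    lines = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken or missing metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


# Print something that looks like a pinned requirements.txt
for line in frozen_requirements():
    print(line)
```

The output depends entirely on what is installed, so none is shown here.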

Example: Pipenv

pipenv install numpy uproot

This creates a virtual environment, a Pipfile, and a Pipfile.lock. The Pipfile lists the general, explicit dependencies. The Pipfile.lock lists all installed packages, including the requirements that pip pulled in for them, with exact versions (and is an ugly file). You can do a few extra things in the Pipfile too, like set up cross-dependencies (install this back-ported package only on old Python, for example), require a Python version, and more.
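
Since Pipfile.lock is a JSON file, the exact versions it records can be read back with the standard library. The sketch below uses a trimmed, hand-written stand-in for a lock file: the version numbers are placeholders, a real file also stores hashes and sources, and `pinned_packages` is a helper invented for this example.

```python
import json

# Hand-written stand-in for a Pipfile.lock (placeholder versions, no hashes).
EXAMPLE_LOCK = """
{
    "_meta": {"requires": {"python_version": "3.7"}},
    "default": {
        "numpy": {"version": "==1.0.0"},
        "uproot": {"version": "==1.0.0"}
    },
    "develop": {}
}
"""


def pinned_packages(lock_text, section="default"):
    """Map package name -> exact version specifier for one lock section."""
    lock = json.loads(lock_text)
    return {
        name: info["version"]
        for name, info in lock.get(section, {}).items()
        if "version" in info
    }


for name, version in sorted(pinned_packages(EXAMPLE_LOCK).items()):
    print(f"{name}{version}")
```

With a real lock file, you would pass the contents of Pipfile.lock instead of `EXAMPLE_LOCK`.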

Why?

Once you have this:

  • Others can make a virtual environment and run your code in 1-2 lines
  • You have a history of exactly what worked if something breaks
  • You can even give access to your code in a service like Binder; no install required
  • It's easy for you to set up your code on a new computer

Example: PvFinder

This is a project I'm working on. Notice several things:

  • Pipfile: I don't actually use this (I'm using Conda) but I'm still tracking dependencies
  • README.md: A nice, markdown readme is in several directories
  • model/: The Python code is in here. I picked a bad name, but I am stuck with it.
  • notebooks/: Most of the "user" code is here, with explanations and plots. I clear the notebooks before committing them.
  • scripts/: Files in here are run directly from the shell.
  • tests/: I have only a few tests, but few is better than none.
  • Data is stored in another location, but symlinked in.
  • I make large changes in a branch, then make a Pull Request; this keeps a nice history and helps collaborators.
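
Clearing a notebook before committing means removing the stored outputs and execution counters from its code cells. The notes do not say which tool is used for this, so the following is only a sketch of the idea on the notebook's JSON structure (nbformat 4 layout); the `clear_outputs` helper and the tiny example notebook are made up for illustration.

```python
import json


def clear_outputs(notebook):
    """Remove outputs and execution counts from all code cells (in place)."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


# Tiny hand-made notebook, used only for illustration
example = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 3,
            "source": ["print(1)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["1\n"]}
            ],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 2,
}

cleaned = clear_outputs(example)
print(json.dumps(cleaned["cells"][1]["outputs"]))
```

Cleared notebooks give much smaller and more readable diffs, which is what makes them pleasant to keep in git.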

What did we miss?

Other topics we did not cover this semester because we were too short on time.

Python language features

We mostly skipped more advanced features or features that are really new.

  • Lambda functions: You can just write regular functions.
  • Advanced iterators/generators: Nice, but not critical for scientific code.
  • Async/Await: I've been trying for years to use this in scientific code. Haven't found a use yet. Maybe IPython 7's new support will make it useful for animations? Very much a Python 3 feature.
  • Static type hints: Nice tools (editors like PyCharm, static analyzer like MyPy) could make this useful, but it adds lots of extra bits of code and is very much a Python 3 feature. Can slow your code down a bit (better in Python 3.7 and will be even better in Python 4)
  • How to write a decorator: Very tricky to get just right.
  • Metaclasses and advanced class creation: Even if you think you need this, you probably don't.
  • Threading: Python really can't do faster computation in multiple threads; you should use something like Numba to do so. Regular threading (and async) is designed for IO, networking, etc; places where you sit and wait.
  • Python 2: Even though many experiments are still stuck on Python 2, it is a dying language. Next year, many more libraries will drop Python 2 support in new releases. Most of what you know can be used or adapted to Python 2 if you have to.
x : int = 0 # A static code analysis tool will complain if you set x to a non-int value.
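
Two of the items above, decorators and threading for "sit and wait" work, can be sketched together. This is a minimal illustration rather than a recipe: `timed` and `fake_download` are names invented here, and `time.sleep` stands in for real IO such as a network request.

```python
import functools
import time
from concurrent.futures import ThreadPoolExecutor


def timed(func):
    """Decorator: make func return (result, elapsed_seconds)."""

    @functools.wraps(func)  # keep func's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        return result, time.perf_counter() - start

    return wrapper


def fake_download(i):
    """Stand-in for an IO-bound task: it only waits, it does not compute."""
    time.sleep(0.2)
    return i


@timed
def run_serial(n):
    return [fake_download(i) for i in range(n)]


@timed
def run_threaded(n):
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(fake_download, range(n)))


serial_result, serial_time = run_serial(4)
threaded_result, threaded_time = run_threaded(4)
print(f"serial:   {serial_time:.2f} s")
print(f"threaded: {threaded_time:.2f} s")
```

The threaded version finishes sooner because the threads spend their time waiting, not computing. Replacing the sleep with a heavy pure-Python calculation would largely remove that advantage.
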
  • Shells: we didn't cover shells - it's a key part of working on any Unix system.
    • Several old shells exist, like SH (1979) and CSH (1978). Don't use them.
    • Bash (1989) is the most common shell. It's what people mean by a shell most of the time. It's the default pretty much anywhere, including macOS.
    • New shells are fancier or more user friendly. ZSH is Bash on steroids, FISH is a re-imagining of a shell for the 1990s (that's considered new for shells), and Xonsh is a shell written in Python 3.
  • Containers: like Docker
    • Run a pristine custom Linux environment anywhere in seconds
    • Too steep a learning curve to cover - with great power, ...!
    • Used for everything; you can get a container for any Linux OS, Anaconda, LaTeX, ROOT, and more.
  • Compiled extensions
    • Would need another language, like C or C++11; and Numba covers many use cases already!
  • PEPs: Python Enhancement Proposals
    • This is how Python is developed
    • Almost all features were a PEP at one point
    • Currently bogged down in finding a new governance model after Guido van Rossum stepped down.
  • Utilities
    • Formatting tools can check your format against PEP 8 (formatting guidelines)
    • Sphinx builds documentation for your code
    • CookieCutter can make a new project from a template
    • Continuous integration services test your git repository on every commit, publish docs or a website, make binaries, push to PyPI, and more.
  • setup.py
    • You can make an installable package
    • Uses setuptools (third-party, but ubiquitous like pip) or distutils (standard library, but not recommended)
    • Not that user friendly yet - see flit - new tools require pip 10+
  • More libraries
    • Plumbum makes writing shell scripts in Python simple (note: I'm the maintainer)
  • More about Jupyter notebooks
    • How to write markdown, LaTeX math, hidden text, colors, and other ways to make great notebooks
    • How to make a slide show from a notebook
    • Examples of Jupyter Lab instead of plain Jupyter Notebook

Takeaways

  • Know what to look for
  • Know where to look
  • Code is a product of research, just like a paper. It should be made presentable.
  • Reading code is harder than writing code
  • Reading good code is easier than writing good code
  • Good code is tested, clear, and simple
  • Less code is easier to maintain/debug than more code most of the time
  • Understand the algorithm (maybe with a toy implementation), then use the existing tools if possible
  • Only make code uglier for performance if it matters! Check!
  • Python 3.6 was the most exciting Python release in the last 10 years, and probably for a few years to come. (3.7 was nice, but mostly focused on performance and security, and 3.8 will be bogged down in politics).

Bonus feature: DataClasses/Attrs

We can't end without seeing some code! So let's look at a feature you probably thought existed when you tried to write your first class. A "data" class.

class MyBadVector:
    x = 0
    y = 0
    z = 0
    
# v = MyBadVector(1,2,3) # NO!
# print(v) # Ugly!
# MyBadVector.x <- Is stored in class, not instance!
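
To make the commented-out problems concrete, here is a small sketch that repeats the class so it runs on its own. The variable names (`constructor_ok`, `a`, `b`) are only for this illustration.

```python
class MyBadVector:
    x = 0
    y = 0
    z = 0


try:
    MyBadVector(1, 2, 3)
    constructor_ok = True
except TypeError:
    constructor_ok = False  # no __init__ that accepts arguments was defined

a = MyBadVector()
b = MyBadVector()
a.x = 5            # creates an instance attribute on a only
MyBadVector.y = 7  # changes the class attribute seen by every instance

print(constructor_ok)  # False
print(a.x, b.x)        # 5 0
print(a.y, b.y)        # 7 7
```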

Python's best shot at providing something like this is a namedtuple - a very useful concept, but not really a very good class:

from collections import namedtuple
MyTupleVector = namedtuple('MyTupleVector', ('x', 'y', 'z'))
v = MyTupleVector(1,2,3)
print(v)  # Output: MyTupleVector(x=1, y=2, z=3)

# Behaves like a tuple:
x,y,z = v
print(x)  # Output: 1

# But also has names!
print(v.x)  # Output: 1
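
A short sketch of why a namedtuple is "not really a very good class" for this purpose: it is still a tuple, so it compares equal to unrelated tuples, cannot be modified, and "adds" by concatenation. The `Color` type is invented here just for the comparison.

```python
from collections import namedtuple

MyTupleVector = namedtuple("MyTupleVector", ("x", "y", "z"))
Color = namedtuple("Color", ("r", "g", "b"))  # unrelated type, for comparison

v = MyTupleVector(1, 2, 3)

print(v == (1, 2, 3))       # True: equal to a plain tuple
print(v == Color(1, 2, 3))  # True: equal to a completely different type
print(v + v)                # (1, 2, 3, 1, 2, 3): concatenation, not vector addition

try:
    v.x = 10
    immutable = False
except AttributeError:
    immutable = True  # fields cannot be reassigned
print(immutable)            # True
```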

There are a few options, but not many.

To really get what we want, we have to write a lot of boilerplate code that is always the same:

class MyProperVector:
    __slots__ = ("x", "y", "z") # optional, used to make the class faster and smaller
    
    def __init__(self, x=0, y=0, z=0):
        self.x = x # Each argument is listed 3 (or 4) times!
        self.y = y # Easy to make mistake
        self.z = z
        
    def __repr__(self):
        # All classes need something like this, all about the same
        return f"{self.__class__.__name__}(x={self.x}, y={self.y}, z={self.z})"

v = MyProperVector(1,2,3)
print(v)  # Output: MyProperVector(x=1, y=2, z=3)

The relatively recent but very popular Attrs project was designed to fix this. Here's what it looks like.

import attr # note: if you don't have it (you probably do), it's called attrs not attr in PyPI
@attr.s
class MyAttrsVector:
    x = attr.ib(0)
    y = attr.ib(0)
    z = attr.ib(0)
v = MyAttrsVector(1,2,3)
print(v)  # Output: MyAttrsVector(x=1, y=2, z=3)

You can optionally add auto-slots, types (as an argument or in Python 3.6 style), defaults, conversion functions, validation functions, immutability, and more!

You automatically get (but can control) __init__, comparisons, and __repr__, and can also get __slots__, __hash__, and a few more.

This was so popular that Python 3.7 added a version of it to the standard library! A few "magical" features are not included, like __slots__ (Attrs actually creates a new class to add slots, which could cause issues in very special and rare cases).

# Requires Python 3.7:

from dataclasses import dataclass
@dataclass
class MyDataClassVector:
    x : float = 0
    y : float = 0
    z : float = 0  
v = MyDataClassVector(1,2,3)
print(v)  # Output: MyDataClassVector(x=1, y=2, z=3)
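
Several of the options mentioned for Attrs also exist for dataclasses. The sketch below shows immutability, ordering, and a per-instance default; `FrozenVector` and its `tags` field are invented for this example, and like the code above it requires Python 3.7.

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True, order=True)
class FrozenVector:
    x: float = 0
    y: float = 0
    z: float = 0
    # default_factory gives each instance its own list; compare=False
    # leaves this field out of ==, <, and hashing
    tags: list = field(default_factory=list, compare=False)


a = FrozenVector(1, 2, 3)
b = FrozenVector(1, 2, 3)

print(a == b)                     # True: __eq__ is generated
print(a < FrozenVector(1, 2, 4))  # True: fields are compared in order, like tuples

try:
    a.x = 10
    frozen_error = False
except FrozenInstanceError:
    frozen_error = True  # frozen=True blocks assignment
print(frozen_error)               # True
```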