By Will Thong, Wed 02 August 2023, in category Programming
For my first foray into contributing to open source projects, I thought there’d be
nothing better than making it meta by helping out the brilliant project which this blog
is built with: the Pelican static site generator! So I looked in its Issues for a
good-first-issue tag. This issue immediately jumped out at me: it was replacing an
existing feature, so there was a clear yardstick for success, and
guardrails to prevent me from inflicting too much damage. I also thought it would teach
me something about how Python handles time zones. I actually ended up learning a lot
more than I’d expected, and hopefully the following can help anyone else hoping to
replace pytz with Python’s native alternatives.
Pelican relied on a third-party dependency, pytz.
This was a popular library which defined timezones, allowing us (for instance) to
accurately work out which of an article published at 7.15pm in New York time and an
article published at 9.00pm in London time was published first. However, Python 3.9
brought this functionality into Python’s standard library through zoneinfo, and the
developers wanted the external dependency removed.
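To make the New York versus London comparison concrete, here is a minimal sketch using zoneinfo. The date is hypothetical and chosen only for illustration, and the snippet assumes Python 3.9+ with the IANA timezone database available on the system.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Hypothetical publication times; the date is an assumption for illustration.
new_york_article = datetime(2023, 8, 2, 19, 15, tzinfo=ZoneInfo("America/New_York"))
london_article = datetime(2023, 8, 2, 21, 0, tzinfo=ZoneInfo("Europe/London"))

# Aware datetimes in different zones compare by the instant they represent,
# not by the wall-clock numbers.
first = "London" if london_article < new_york_article else "New York"
print(first)  # London
```

On that date, 9.00pm in London is 20:00 UTC while 7.15pm in New York is 23:15 UTC, so the London article came first despite its later wall-clock time.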
Having forked and branched the project, I got to reading the source code. I began by
grepping[1] for pytz, and in each case working out why pytz was being used.
First, I identified that pytz was being used in contents.py. Its timezone function
was used to convert the user-selected ‘TIMEZONE’ setting from a string in the list of
TZ database time zones (see Pelican documentation) into a pytz.timezone object. This
object would then be used to specify a timezone for the timestamp for each piece of
content. Looking at the zoneinfo documentation, it seemed that its own ZoneInfo class
could be initialised using the same string. I tested this in a Python shell before
replacing the timezone function with the ZoneInfo class. Similarly, contents.py was
also using pytz to define a default timezone to work out if a draft was in the future
or the past. If the draft had a timezone in it, pytz was needed to establish the
timezone-aware current time (that is, the time when the contents were being
generated). Fixing this was a simple matter of replacing the pytz.utc timezone with
Python’s native timezone.utc.
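The two replacements described above can be sketched roughly as follows. This is not Pelican’s actual code: the settings dictionary and the is_in_future helper are hypothetical names used only for illustration, and the snippet assumes the IANA timezone database is available.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Hypothetical stand-in for Pelican's settings dictionary.
settings = {"TIMEZONE": "Europe/London"}

# ZoneInfo accepts the same TZ database key that pytz.timezone() took.
site_timezone = ZoneInfo(settings["TIMEZONE"])
published = datetime(2023, 8, 2, 21, 0, tzinfo=site_timezone)


def is_in_future(date):
    """Return True if a timezone-aware date is later than the current moment."""
    now = datetime.now(timezone.utc)  # aware "now" from the standard library
    return date > now


print(is_in_future(published))
print(is_in_future(datetime.now(timezone.utc) + timedelta(days=1)))
```

Because both sides of the comparison are timezone-aware, Python compares the underlying instants, so it does not matter that one uses the site’s zone and the other uses UTC.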
The next place pytz cropped up was in pelican_quickstart.py: when the user runs the
script after first installing Pelican, it generates a list of timezone options to offer
the user.[2] Originally, a list comprehension was used to iterate through the list of
timezones in pytz.all_timezones. It would also store the timezones in a list so that,
having compared lower-cased versions of the user’s input and the timezone, it could
store the properly-capitalised version (a UI feature which permits the user latitude in
how they type the timezone). The equivalent of pytz.all_timezones in native zoneinfo
was the available_timezones() function. However, upon replacing the list comprehension
and lookups, I encountered a bug. pytz.all_timezones is a LazyList, whereas zoneinfo’s
available_timezones() function returns a set. As sets don’t have indexes, I couldn’t
use a list to store the timezones as in the original code. Instead, I chose a
dictionary to record lowercase versions of each timezone (the key) against its proper
capitalisation (the value), the latter being finally stored in the settings.
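A minimal sketch of that dictionary approach is shown below. The function names build_timezone_lookup and canonical_timezone are hypothetical, not taken from Pelican’s source.

```python
from zoneinfo import available_timezones


def build_timezone_lookup(names):
    """Map each lower-cased timezone name to its properly capitalised form."""
    return {name.lower(): name for name in names}


def canonical_timezone(user_input, lookup):
    """Return the properly capitalised name, or None if the input is unknown."""
    return lookup.get(user_input.strip().lower())


# available_timezones() returns a set, so it cannot be indexed like a list,
# but it can be iterated to build the dictionary.
lookup = build_timezone_lookup(available_timezones())

# A small fixed set keeps this demonstration independent of the system's data.
demo = build_timezone_lookup({"Europe/London", "America/New_York"})
print(canonical_timezone("europe/LONDON", demo))  # Europe/London
```

A dictionary also turns the case-insensitive match into a single lookup, rather than a search through a list followed by an index.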
The final place pytz was used was utils.py, where the set_date_tzinfo function
converted timezone-naïve dates to timezone-aware dates. Similarly to contents.py, this
was a simple drop-in replacement.
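For illustration, converting a naïve date to an aware one with zoneinfo might look like the sketch below. The attach_timezone function is a hypothetical stand-in, not the body of Pelican’s set_date_tzinfo, and the snippet assumes the IANA timezone database is available.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo


def attach_timezone(date, tz_name=None):
    """Give a naive datetime the named timezone; leave aware ones untouched."""
    if tz_name and date.tzinfo is None:
        return date.replace(tzinfo=ZoneInfo(tz_name))
    return date


naive = datetime(2023, 8, 2, 21, 0)
aware = attach_timezone(naive, "Europe/London")
print(aware.isoformat())  # 2023-08-02T21:00:00+01:00
```

Dates that already carry a timezone pass through unchanged, so the function is safe to call on every piece of content.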
Having made my changes, I tested running pelican on my system’s default version of
Python: 3.11. The first problem was a complaint that tzlocal was missing, so I simply
imported it in the offending script (pelican_quickstart.py).
Then, conscious that Pelican also supports Python 3.7 and 3.8 (neither of which includes
zoneinfo), I tested pelican in those Python versions using pyenv to select the
appropriate Python version and poetry to install Pelican’s dependencies. Of course I
got an error when the various scripts tried to import zoneinfo. Luckily (and as
referenced in the initial issue), a backport exists, so I added it to the Poetry
configuration for Python versions earlier than 3.9 (I later learned I should’ve put it
into setup.py too). I initially used a try/except block to import either zoneinfo or
backports.zoneinfo for earlier versions of Python, and then used further try/except
blocks to call the relevant module. However, one of Pelican’s maintainers, Deniz,
helpfully pointed out that it’d be simpler and more efficient to import the backport
under the same name in the initial try/except block, thus obviating the need for any
further error handling.
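The single try/except import suggested by the maintainer can be sketched like this. The load_timezone helper is hypothetical and only there to show that later code needs no version checks.

```python
try:
    # Python 3.9+: zoneinfo is part of the standard library.
    from zoneinfo import ZoneInfo
except ModuleNotFoundError:
    # Python 3.7 and 3.8: fall back to the backports.zoneinfo package,
    # bound to the same name so the rest of the module is unaffected.
    from backports.zoneinfo import ZoneInfo


def load_timezone(name):
    """Hypothetical helper: no further try/except is needed at call sites."""
    return ZoneInfo(name)
```

Because both branches bind the same name, every later use of ZoneInfo works identically whichever import succeeded.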
Having pushed changes to my fork and made a pull request, I was surprised to notice that
my commit had changed nearly every single line in contents.py. This made it much
harder for code reviewers to see what changes I had made. I discovered that these
changes were automatically made by my editor, Neovim, upon save, according to the code
formatter I had installed, Black. The lesson was twofold. First, when working with
others’ code, it is important not to format the entire file! An easy way of avoiding
this in Vim is to save with :noa w, which temporarily skips any autocommands that run
on save. Second, it is essential to check with git status before making any commits!
Doing so would have shown me that I had inadvertently made a lot of changes.
Another challenge popped up after making my pull request. Unlike Linux and macOS,
Windows does not provide the IANA database of timezones. To fix this, I used
backports-zoneinfo[tzdata].
I really enjoyed learning more about how Pelican and Python work under the hood.
Alongside the lessons detailed in the above section, this bug fix also taught me lots
about the open source software development process. I learned how to use pyenv to
ensure compatibility across different Python versions and how to use poetry to handle
dependencies. I learned some of the nuances of importing dependencies into Python
programs. Working with GitHub Actions, meanwhile, gave me a practical understanding of
how continuous integration and continuous delivery, specifically automated testing,
ensure that code changes can be integrated back into a codebase.
Pelican’s maintainers, in particular Justin Mayer, were incredibly helpful and encouraging through the process of my first open source contribution. This was a great experience and I’d recommend it to anyone!
[1] I think we still use the grep verb even if the program we’re actually using is
the faster ripgrep!
[2] Logically, I suppose I should have dealt with this before looking at contents.py,
but in the event, as it was a marginally more complex problem, I’m happy that rg’s
alphabetic order presented them to me the ‘wrong’ way round.