Overall, the code is classic Python code: not evil, but not great either. Let's dive in.
Here is the source code organization:
First thing I notice is that it looks small, import time logic. Relying on code executing at import time is notoriously evil as it breaks various tools e.g. . Not only it breaks essential tooling, but because the code executes at import time, depending on how imports happen, the code will execute in some order instead of another without having touched the file where the code is, making the behavior of an application or worse a library unpredictable.reports 647 lines of code, so that could definitely be a single module file. Again reports 128 lines in . I recommend against having code inside and outside imports or a trivial function, because I rarely see code in those files, which in turn makes it difficult to remember to check that it is in fact trivial or empty files, and check in particular that it does not contain
Luckily, in the case of loconotioncontains only command line interface logic, it is wrapped inside a function and only executed when , so that is OK .
Let's move to. It is a very tiny file with 52 lines of code. I do not create files before refactoring. Predicting whether a file will be useful help for the reader is difficult to do before the code is written. Also it is much easier to navigate a single file than a directory and many small files. A file is not a zero-cost abstraction in terms of cognitive load for the reader or the writer (if there is such a thing such as "zero cognitive cost abstraction"). The writer needs to figure a good name for the file, and the reader must keep track of where objects definitions are located with the supplementary vocabulary that was invented ad-hoc to host in the case of two classes each of which contains one method called... ! By the way, whether you use a "code navigator" so called jump-to-definition or just a grep-like tool does not matter: one file with only 600 lines will ALWAYS beat one directory and three files.
Further reading, the thing that strikes in this particular case, but that dominates the world wide python mainstream mind share: . You might disagree with me about the broader Python ecosystem (this is not the last review!). So, let's just focus on this case:
What this function does ahem I mean to say is "check whether a page is fully loaded" because notion will load lazily a page and its content, so loconotion need the page to be fully loaded by the headless browser (in the snippet that is the variable ), before reading the complete html and writing it to a local file. Something is odd. I mean, even if you do not know the semantic of a class that inherits nothing and has a single method is "prolly evil".
My first take would be to replace thiswith one or more function, it does not loose generality or expressive power.
One thing that I do systematically: first return trivial cases, so that I do not need to think about it while reading the rest of the code, for instance:
The above is much more readable as follow:
Do not be fooled by theoperator, it does not matter whether the predicate is "reversed", deal with simple cases first. Also, it saves a level of indentation.
If you really need a boolean, it is equivalent to:
All things considered, the following is fine:
There is singlethat is around 600 lines,
The name of theis but it does much more,
Too much indentation, that can be reduced with the previous trick where first the code deal with simple cases,
Scraping the page which includes loading the complete page should be separated into different functions to make it more testable,
Recursion is not pythonic.
There is several small nitpicks to do about e.g. multi-line assignments to multiple variables, and in general multi-line statements: that hurts readability.
The code is well documented and it is easy to figure what happens. Also, it does the job!
It 280 lines of code
It useand instead of soup
It useinstead of
It useinstead of
There is no configuration file
There is no "nice urls" and some urls are definitly ugly e.g. a page inside a table.
It took me 8 hours
I do not believe the code is perfect, but that is a code base I can work with. There is one thing to change to make really testable: move the code fetching the page outsidefunction. It is not perfect, but that is what I use to render this website
The only feature I miss is an feed :)