In November of 2022 I took part in the National Novel Generation Month (NaNoGenMo) and had a blast! It was my first time joining this prestigious event after reading about it in the /r/procedural_generation subreddit. I was immediately captivated by its simple premise: “Spend the month of November writing code that generates a novel of 50k+ words.”
What constitutes a novel is pretty broad on purpose:
- Two machine-learning models, one generating nonsensical poems, and the other trying to make sense of them, writing in the persona of the distressed poet? #27
- A version of Moby Dick in which every word only appears once and is consecutively blacked-out? PDF
- “meow” repeated 50,000 times? yes
It’s a novel!
All of this really resonated with me, because I am fascinated by procedural generation, a melting pot of code, algorithms, randomness, and art. And while you can use any technology to produce basically any result (as long as it contains 50k words), you have to start and finish in the month of November, which is a nice challenge that prevents scope creep. And finally, I am a lazy software developer: I love the idea of writing code that does all the hard work for me.
On top of all that: NaNoGenMo is not a contest. You simply try to do something cool and show it to your fellow participants, no winners, no losers, just, err, art.
While scrolling through the entries of past years (which I highly recommend), I quickly came up with the idea of a “self-describing novel”: a software development project that documents its own development, as a kind of novel. Novelopment. With every commit to the git repository the novel would get longer. Bold feature development, followed by embarrassing reverts or quick fixes. That beautiful moment when the first contributor joins the original author! And finally the last commit - maybe for now, maybe forever. That’s a novel if I ever read one!
And that’s what I started on the 3rd of November. I could tell you about the development step by step, but I might as well let Novelopment itself do the talking [post continues below]:
The bitterly excellent story of novelopment
While you may have been enticed to grab this book because of its title “The bitterly excellent story of novelopment”, this is actually the story of 1 human building novelopment in 35 commits.
The saga started whilst first time contributor dubbl authored a commit with the message “setup basic project structure”, a commit with the message “add main.py and parse first repository” and a commit claiming to “add apache 2.0 license” on Thursday, November 3rd 2022. Around 3 days down the road on November 6th 2022 dubbl authored a commit described as “add more structure to novel output”, a tiny commit claiming to “use pycorpora for random adjective in title”, a tiny commit with the message “add README.md” and a tiny commit claiming to “add seeding of PRNG”.
Working on it
On Sunday, November 13th 2022 aforementioned dubbl created a tiny commit called “add handling of title for local repositories”, a commit claiming to “add and use black code formatter”, a commit claiming to “data mine the repository for events (commits) and their actors”, a tiny commit described as “introduce simplenlg to pluralize actor_word”, a tiny bug fixing commit with the message “fix linting issues”, a commit called “add initial content determiner”, a tiny commit called “update readme with new parameters” and a tiny defect fixing commit called “rename src to novelopment, fix logging”. About 1 week later on November the 20th 2022 they crafted a commit with the message “add basic document planner and realizer” and a commit claiming to “add very basic sentence aggregation in realizer”. More than 1 week later on November 28th 2022 dubbl composed a commit called “start work on aggregator”, a commit called “add complementizer handling to realizer” and a commit claiming to “handle multiple complements/objects”. The very same one wrote a commit claiming to “add aggregation on time and cue phrase support”, a tiny commit claiming to “handle multi-line commit messages” and a tiny commit claiming to “exclude merge commits” on Tuesday, November 29th 2022. Exactly 1 day down the road dubbl adds a commit with the message “add document planning for second committer and the end”, a commit described as “start referring expressions generator”, a commit called “move get_word to lexicon”, a commit called “add entity description generator”, a commit with the message “add time expressions”, a tiny commit called “conclude the sagas final sentence”, a tiny commit described as “detect and describe reverting commits”, a commit with the message “updates deps, add jinja2”, a commit called “add html rendering”, a commit claiming to “add ebooklib dependency” and a commit with the message “add epub export option” on Wednesday, November 30th 2022.
The end (for now)
On November 30th 2022 the previously mentioned dubbl composes a tiny commit with the message “set version to 1.0” and for now the coverage concludes.
Beautiful stuff, but not quite 50,000 words! To reach this threshold I eventually ran the script against the popular Python web framework Flask - a project whose 4,000 commits were enough to tell a 57,890-word novel with 359,519 characters on 139 pages (when exporting the .epub to PDF). What a read! Funnily enough, one of the admins of NaNoGenMo found himself in that novel:
On October 25th 2017 first time contributor hugovk added a tiny commit with the message “Remove IRC notifications”.
If you want to read more about my development progress during November, you can check out Novelopment’s three Dev Logs. One thing missing from them is that I had to add an “emergency security fix” a couple of days after the end of the event to correctly escape commit messages: Novelopment uses Jinja2, which does not autoescape input by default, so unescaped commit messages screwed up the layout of the epub exports. 😬
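For the curious, here is a minimal sketch of that pitfall. It is not Novelopment’s actual template or fix: the template string and the commit message are made up for illustration, while `Environment` and its `autoescape` flag are Jinja2’s own API (this assumes the `jinja2` package is installed). Since EPUB content documents are XHTML, an unescaped `<` in a commit message gets parsed as the start of a tag.

```python
from jinja2 import Environment

# Hypothetical template for one sentence of a chapter (not Novelopment's real template).
TEMPLATE = "<p>{{ sentence }}</p>"

# A made-up commit message that happens to contain markup characters.
message = "fix <title> handling & escaping"

# Jinja2 only escapes variables when autoescaping is switched on; it is off by default.
unsafe_html = Environment().from_string(TEMPLATE).render(sentence=message)

# With autoescape=True, characters such as <, > and & are replaced by entities.
safe_html = Environment(autoescape=True).from_string(TEMPLATE).render(sentence=message)

print(unsafe_html)  # <p>fix <title> handling & escaping</p>
print(safe_html)    # <p>fix &lt;title&gt; handling &amp; escaping</p>
```

For templates loaded from files, Jinja2’s `select_autoescape()` helper can switch escaping on by file extension (e.g. `select_autoescape(["html", "xhtml"])`), which is harder to forget than escaping each value by hand.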
Apart from the artistic aspect of the event, I was also motivated to use it as an excuse to learn more about Natural Language Generation (NLG). While many may immediately think of prompt-based neural network language models like GPT-3 or ChatGPT, rule-based NLG is much older and has fascinated researchers since ELIZA appeared in the 1960s. I have been into this kind of stuff since I wrote my own little chatbot B.I.L.L. in PHP when I was 16 years old. Learning from its chat partners, it quickly became pretty… NSFW. I should probably have told Microsoft and Facebook about this issue…
Anyway, today’s most popular library for so-called (surface) realization is SimpleNLG (created in 2009), which has been ported from Java to many other languages, among them Python, and which I also used for Novelopment. Novelopment tries to follow the 6 stages of NLG (adding a 0th stage of “Data mining” at the beginning), which I also cover in Dev Log entry 2. To learn more about the current state of the art of NLG, I recommend checking out the very interesting blog of Ehud Reiter, the original author of SimpleNLG.
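To make those stages a bit more concrete, here is a toy sketch of the classic six-stage pipeline described by Reiter and Dale (content determination, document planning, aggregation, lexicalisation, referring expression generation, realisation), preceded by a stage 0 that would normally mine the git history. All names (`Commit`, `determine_content`, `plan_document`, …) are hypothetical and this is not Novelopment’s code: the commits are hard-coded instead of read from git, and realisation is plain string formatting rather than SimpleNLG.

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby


@dataclass
class Commit:
    author: str
    day: date
    message: str


# Stage 0 (data mining): a real tool would read these from `git log`. Here they are
# hard-coded: mostly messages from the novel above, plus one made-up merge commit.
COMMITS = [
    Commit("dubbl", date(2022, 11, 3), "setup basic project structure"),
    Commit("dubbl", date(2022, 11, 3), "add apache 2.0 license"),
    Commit("dubbl", date(2022, 11, 6), "Merge branch 'main'"),
    Commit("dubbl", date(2022, 11, 6), "add README.md"),
]


def determine_content(commits):
    """Stage 1, content determination: decide which events are worth telling (drop merges)."""
    return [c for c in commits if not c.message.startswith("Merge")]


def plan_document(commits):
    """Stage 2, document planning: order events in time and group them by (day, author)."""
    ordered = sorted(commits, key=lambda c: c.day)
    return [list(g) for _, g in groupby(ordered, key=lambda c: (c.day, c.author))]


def aggregate(group):
    """Stage 3, aggregation: fold several same-day commits into one list of objects."""
    objects = [f"a commit called “{c.message}”" for c in group]
    if len(objects) == 1:
        return objects[0]
    return ", ".join(objects[:-1]) + " and " + objects[-1]


def lexicalise(group):
    """Stage 4, lexicalisation: pick concrete words for the verb and the date."""
    day = group[0].day
    return "authored", f"{day:%B} {day.day}"


def refer(author, mentioned):
    """Stage 5, referring expressions: introduce an actor once, then use a pronoun."""
    return "they" if author in mentioned else f"first time contributor {author}"


def realise(subject, verb, when, objects):
    """Stage 6, realisation: assemble a punctuated sentence (SimpleNLG's job in Novelopment)."""
    return f"On {when} {subject} {verb} {objects}."


def generate(commits):
    mentioned = set()
    sentences = []
    for group in plan_document(determine_content(commits)):
        author = group[0].author
        subject = refer(author, mentioned)
        mentioned.add(author)
        verb, when = lexicalise(group)
        sentences.append(realise(subject, verb, when, aggregate(group)))
    return " ".join(sentences)


print(generate(COMMITS))
```

Running it prints two sentences: the first introduces dubbl with both November 3 commits aggregated into one clause, the second refers back to them as “they”. Novelopment’s real output varies verbs, connectives and referring expressions far more, as the excerpt above shows (“composed”, “aforementioned dubbl”, “the very same one”).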
Next year I’d definitely like to join again. Maybe I’ll sprinkle some machine learning over my project - something like the smallest possible neural network that can still produce something novel-ish? Or maybe something entirely different.
Regarding this blog, I’d like to write more, smaller blog posts this year.
This blog post is mainly motivated by the fact that I boldly signed up to “Bring Back Blogging”.
Pledging to write 3 articles in January seemed easy enough in December, despite my terrible track record.
But that was promptly followed by a broken elbow at the start of January, which was quite a blow to my blogging ambitions (and maybe also an attempted divine intervention? That didn’t work!). But here we are, on January 31st, with a nearly healed elbow and at least one post. More to follow!