(en) Dev Log: decatholac MANGO

2023-05-26

It’s been 7 months since I first ran decatholac MANGO (dM). This is the first piece of software I made by myself that I “use” on a daily basis (I mean, that’s just the nature of it) and I’ve started to rely on it so much. This little bot has become a favorite of mine, and so I want to write about it here.

The Journey

It all started with my love for Bokuyaba and Yankee-kun to Hakujou Girl. These manga are published online on a system where the latest chapter (or the second-latest chapter) can be read for free, but only for a limited time (usually a week or two, until the next chapter gets released). Because of this, I need to keep up and check for new releases regularly. The problem is that it’s not just these two: I have a long list of other manga that I want to keep up with too, and I don’t want to have to check all of their pages every day.

The simple answer would be RSS, right? That’s what it’s for, after all. But sadly, some of them don’t serve updates in RSS format. But there should still be a way…

And that’s why I decided to build rssgen. It’s a Node CLI app that crawls sites by spawning a headless browser via puppeteer, taking the HTML, parsing the contents, and then building an RSS file off of it. As a bonus, it also uploads the RSS files to a free web hosting service I registered for, so I can just subscribe to those feeds in my RSS reader app. Because I wanted it to be synced between my devices, I used a reader that’s hosted on the web.

(Later on, I remade the thing in Rust for better speed, but it’s still more or less the same.)

A few months later, I had an idea to remake the thing again from scratch, this time as a Discord bot. I always have Discord running on my PC and/or my phone, so it would be great if I could get notified there. I intended to build it in Rust like last time, but the Discord bot library for Rust didn’t seem to be stable yet, so I chose to go with Go instead.

The Answer

Since I already had this “framework” of how rssgen works, I built dM to work with a similar flow. Still, because I wanted to make a Discord bot this time, some things needed to change. Let me tell you about its components…

Go Gofers

On execution, the app spawns a bunch of “gofers” that visit each manga site and bring the contents in. rssgen did the same thing by opening a bunch of headless Chrome windows and loading the HTML all at once. Running them serially would take too much time, after all.

Recreating that parallelism for dM was a cinch because Go made it easy to make parallel processes. Plus, I didn’t even have to mess with message channels because the gofer processes don’t share data with each other. Each gofer saves the chapter data into the database and then disappears. After every gofer has finished working, it’s time for the Announcer to run.
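Here is a minimal sketch of what that fan-out could look like in Go, assuming one goroutine per site and a sync.WaitGroup to know when every gofer is done. This is not dM’s actual code: Source, Store, fetchChapters, and RunGofers are hypothetical names, and the in-memory Store (guarded by a mutex) only stands in for the real database.

```go
package main

import (
	"fmt"
	"sync"
)

// Source is a hypothetical stand-in for one manga site to visit.
type Source struct {
	Name string
	URL  string
}

// Store is a hypothetical stand-in for the database. The mutex is needed
// here only because this sketch keeps everything in a shared map.
type Store struct {
	mu       sync.Mutex
	chapters map[string][]string
}

func NewStore() *Store {
	return &Store{chapters: make(map[string][]string)}
}

// Save records the chapters found for one source.
func (s *Store) Save(source string, chapters []string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.chapters[source] = append(s.chapters[source], chapters...)
}

// Count reports how many sources have saved chapters.
func (s *Store) Count() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.chapters)
}

// fetchChapters pretends to fetch a chapter list; a real gofer would make
// an HTTP request to src.URL and run a parser on the response.
func fetchChapters(src Source) []string {
	return []string{src.Name + " ch. 1"}
}

// RunGofers starts one goroutine per source and blocks until all finish.
func RunGofers(sources []Source, store *Store) {
	var wg sync.WaitGroup
	for _, src := range sources {
		wg.Add(1)
		go func(src Source) {
			defer wg.Done()
			store.Save(src.Name, fetchChapters(src))
		}(src)
	}
	wg.Wait() // only after this returns is it safe to run the Announcer
}

func main() {
	sources := []Source{
		{Name: "Manga A", URL: "https://example.com/a"},
		{Name: "Manga B", URL: "https://example.com/b"},
	}
	store := NewStore()
	RunGofers(sources, store)
	fmt.Println("sources saved:", store.Count())
}
```

No channels are involved: each gofer writes its own result and exits, and the WaitGroup is the only coordination needed.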

Bot Announces

Since I don’t use RSS anymore, the upload function was replaced by the Announcer, which sends messages to my Discord server notifying me of new chapters for each manga.

I implemented a very simple, almost primitive way of checking whether a chapter has been announced or not: just taking note of when the Announcer last announced. Chapters that were logged into the database after that time count as new and will be announced the next time the Announcer runs.

Three Parsers

I think this is the meat of the app; this is what rssgen was mainly created to do, and I strived to enhance it in dM.

Some sites serve the chapter list in pure HTML, and some fetch it from a JSON endpoint. To make things easy, I made rssgen fetch the final loaded HTML and parse the contents, so that it didn’t matter whether the chapter list was inside the HTML from the get-go or was loaded from a JSON endpoint via an AJAX request. I supplied CSS selectors to tell it which element was the chapter list, and it compiled a list of the available chapters.

But loading the pages in a browser was slow on the network and heavy on resources. On top of that, this time the final format of the chapter list is no longer an RSS file but a database table, which means I have to parse RSS files from manga sites that already serve RSS anyway. Seeing that I’d have to build at least two parsers (HTML and RSS), I decided to ditch the headless browser method and just have THREE parsers: JSON, HTML, and RSS.

This method is significantly faster because I no longer need to have my PC load the JavaScript and the images and the CSS and all that. I just have it fetch a plain text source and parse it immediately. This way, loading 20 pages AND parsing them doesn’t even take five seconds.
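One way to organize several parsers behind a common shape is a small Go interface. The sketch below is an assumption about how that could be done, not dM’s real design: Parser, JSONParser, and RSSParser are hypothetical names, and the JSON payload layout is made up. Only the JSON and RSS cases are shown because the standard library covers both; an HTML parser would need a third-party package for CSS selectors.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
)

// Chapter is the common output of every parser.
type Chapter struct {
	Title string
	URL   string
}

// Parser turns a fetched plain-text body into a chapter list.
type Parser interface {
	Parse(body []byte) ([]Chapter, error)
}

// JSONParser assumes a hypothetical payload: an array of
// {"title": ..., "url": ...} objects.
type JSONParser struct{}

func (JSONParser) Parse(body []byte) ([]Chapter, error) {
	var items []struct {
		Title string `json:"title"`
		URL   string `json:"url"`
	}
	if err := json.Unmarshal(body, &items); err != nil {
		return nil, err
	}
	chapters := make([]Chapter, 0, len(items))
	for _, it := range items {
		chapters = append(chapters, Chapter{Title: it.Title, URL: it.URL})
	}
	return chapters, nil
}

// RSSParser reads the <item> entries of an RSS 2.0 channel.
type RSSParser struct{}

func (RSSParser) Parse(body []byte) ([]Chapter, error) {
	var feed struct {
		Items []struct {
			Title string `xml:"title"`
			Link  string `xml:"link"`
		} `xml:"channel>item"`
	}
	if err := xml.Unmarshal(body, &feed); err != nil {
		return nil, err
	}
	chapters := make([]Chapter, 0, len(feed.Items))
	for _, it := range feed.Items {
		chapters = append(chapters, Chapter{Title: it.Title, URL: it.Link})
	}
	return chapters, nil
}

func main() {
	inputs := []struct {
		parser Parser
		body   string
	}{
		{JSONParser{}, `[{"title":"Chapter 1","url":"https://example.com/1"}]`},
		{RSSParser{}, `<rss><channel><item><title>Chapter 2</title><link>https://example.com/2</link></item></channel></rss>`},
	}
	for _, in := range inputs {
		chapters, err := in.parser.Parse([]byte(in.body))
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		for _, c := range chapters {
			fmt.Println(c.Title, c.URL)
		}
	}
}
```

Because every parser returns the same Chapter slice, the code that writes to the database does not need to know which format a site used.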

Oh, and Go makes it easy to write and run tests, too. Rather than trying out the parsers by building an executable and running it, I can just write a test case and run it separately without triggering actual Gofer fetching or Discord botting. I like tests. Cool!

Always Online dRM (decatholac Repetitious MANGO)

rssgen would’ve exited after the upload/announce process, but that’s not the case anymore. Since dM is a Discord bot, I made it so that it stays on standby, and will repeat the fetch and announce process after a set interval.

Not just that, because this is a Discord bot and Discord bots are interactive, I made it so that we can trigger the fetch and the announce process manually by giving it a command in the Discord server.

Same Old TOML

rssgen had the configuration and parser logic stored in .env and JS files, but I made the jump and used a “specialized format” for config files when making rsstygen: TOML. TOML seems to be the standard for Rust programs since it’s what the built-in dev toolkit uses. I did not change this for dM, and it still uses the TOML format for configuration. This format allows arrays and hashmaps; it’s pretty powerful. I think I’ll keep using this format for config files of other apps I make!

The Recollection

It works really well! I mean, it’s not that complicated, but still. Sometimes I have to go back to the code, tweak the gofers and parsers a little bit so that it can parse some new format of some new site that I want manga updates from.

Snippet of configuration for parsing Comic Pixiv.

Looking at you, Comic Pixiv. You and your fancy request headers…

Sometimes the newest chapters are paid-only, so I have to wait another week. I don’t want to write logic for that, so I just have dM announce them anyway on my Discord server, and then I manually mark them with a reaction.

Screenshot of decatholac MANGO announcing new manga chapters in a Discord channel.

✅ means I’ve read it, and 🅱️ means it’s Blocked for free readers. Simple!

The Lingua Franca

Building dM made me rethink Rust. At first I wanted to learn it because I wanted to know something more low-level than I’m used to, and it was either Rust or C#. I admit, Rust is hard for me (hehe, sorry for being a dumbass). Glad I tried Go. Jumping into Go was a smoother process than I thought it would be. For a language on par with Rust and C#, it was pretty easy to wrap my head around.

Still, there were quirks that I did not foresee that became a roadblock in the development process:

CGO and Target Machine’s Environment

There are two SQLite libraries for Go that I looked at: go-sqlite3 and sqlite. They both do the same thing, but the latter has no CGO (the calling of C code from Go) dependency.

I did not think that would matter. After all, it’s C. Any OS can run C. But surprise surprise, the C code called as a dependency by the former library isn’t packaged along with the binary I built; instead, it’s taken from the machine the binary runs on.

When I shipped the executable to a VPS I borrowed from my friend, it wouldn’t run. My machine and the VPS were both running Ubuntu, but they had different versions of the C binaries. I had to swap out the library, tweak the code a little bit, and rebuild the executable for it to run on that machine. In the future, I’ll have to remember to choose CGO-free libraries so that I can have an easier time shipping programs.

stdout/stderr Confusion

At first, I used log.Println() to output messages. I did not notice that it prints to stderr instead of stdout. When I ran the bot using pm2 on a VPS, the output log was empty, and the messages were all written to the error log.

The simplest way to output to stdout was to use fmt.Println(). Fortunately, it wasn’t hard to change all the function calls to it.