Peg and string
Photo by Yulia Ilina from Pexels

The Mystery of the Hanging ’Peg

Brett Uglow

--

I love a good IT mystery! Who broke the system? Was it an innocuous software update? New hardware? Or was it a random cosmic-ray that flipped a bit in the datacenter? But — as always — the most important question is not who, but why did the system break?

During the Christmas holidays, I stumbled into my own IT mystery… the Mystery of the Hanging ’Peg.

It all started with Picasa

Picasa was a fantastic, free, easy to use photo gallery viewing tool that also included photo editing. I loved it. And the thing I loved most about Picasa was the ability to mark photos as favourites, then view a gallery containing just the favourites.

Then Google bought Picasa, and killed it.

“Don’t be evil.” – Googles famous guiding principle/recommendation/suggestion/feel-good marketing phrase.

Over the years, my photo & video storage needs have increased. I needed to find a replacement for Picasa.

Cloud storage

When smart phones came along, vendors began to offer the convenience of backing up your photos to cloud storage, for free! This seemed like a great idea until it became clear that:

  • Free storage was not endless, especially as media sizes grew.
  • Some photos should never be put into the cloud in case the service is hacked.

“Local Cloud” Storage

Hosting the family photos on my local network seemed like the most sensible option, and most NAS devices offer some sort of photo app to do this. But none of them are as good as Picasa.

After much searching, I came across Thumbsup – a NodeJS-based static photo gallery generator that uses ffmpeg to generate thumbnails of videos. (If you’re looking for something that isn’t PHP-based and runs in Docker, check it out). The reason I decided to use Thumbsup? Support for `picasa.ini` files, which contain the information about which photos are favourites!

But Thumbsup only reads those files, and I wanted to be able to write to them while viewing the gallery. It was time to introduce an innocuous software update!

A simple plan

I have a lot of experience building websites and APIs. To be able to update a file through a website (the Thumbsup-generated site), I would need an API that I could call from the webpage. This meant I needed an API server to be running in the background, ready to update the `Picasa.ini` files whenever I pressed the ⭐️-button for a photo. Piece of cake.

To keep things simple, I decided to build a Docker image that extends from the official Thumbsup Docker image. My image would contain the API server – written as a NodeJS Express app – and a custom Thumbsup theme that would support calling the API when the ⭐️-button was pressed.

The final piece of the plan was to regenerate the website whenever the API was called. This would allow the favourite-state of a photo to be displayed correctly when the page was revisited. Thankfully, Thumbsup is really fast at rebuilding because it can detect which files have changed and just regenerate the content for the changed files. So all the API needed to do after updating the `picasa.ini` file was to call the “build” command.

The child-process module

NodeJS provides a built-in module for calling commands: child-process. In NodeJS vernacular, the module allows the NodeJS program to create a subprocess (run a command). This is exactly what I needed to do.

The child-process provides 4 different ways to start a subprocess, and each option – as I discovered – is suitable for particular use-cases. Upon a cursory reading of the options, I decided to use the exec() option. This option seemed like the best as it allowed me to pass the entire build-command as a single string-argument:

thumbsup --input /input --output /output --config /app/fixtures/config.json

The other options were either not what I wanted (fork(), execFile()) or looked more complex (spawn()).

So I wrote the API as an Express app in NodeJS, tested it locally, and everything seemed fine. After a few nights of development, I had all of the pieces together. It was time to upload the Docker image to the network-attached storage (NAS) device, and try it out.

Down the rabbit hole

After a few false starts with simple config errors and minor bugs, it actually started to work on the NAS! I could see log file entries showing ffmpeg was generating thumbnails from videos, ImageMagik was doing the same for photos, and website files were being generated. But after about 10–15 minutes the Docker image would show high RAM usage but low CPU usage, and the log file wasn’t being updated. Something was wrong.

“Must be not enough RAM,” I told myself. The NAS I had came with just 2GB of RAM. In a remarkable act of foresight, I had ordered additional RAM before embarking on this journey, but it was still a week away from being delivered. What to do? I stopped and started the Docker image.

Surprisingly, I could see more progress in the logs! Ffmpeg was processing the next video, and the next one… until it seemed to hang again.

I looked at the Thumbsup GitHub repo. We’re other people seeing the same issue? Nope. I checked the ffmpeg website. Was ffmpeg known for hanging? Not really. In fact, from what I had read, ffmpeg seemed to be pretty reliable.

Digging deeper

When less-experienced software engineers come to me with a problem, there’s a checklist I unconsciously work through to help determine where the root cause of the problem might lie. One of the items in that checklist is, “Is it likely that you are the first person in the world to have faced this problem?” 99-times-out-of-100, the answer is “no”.

The next step is to search the internet for keywords that other people would use when they came across the same problem. Most of the time, search results point to the venerable Stack Overflow, or to an issue in a git-repo. In my case, this search turned up empty. Had I really stumbled into a 1% bug? That seemed unlikely – writing an API that executes a command is not exactly a rare thing to do. It just had to be something wrong with my software, so I went back to the drawing board.

Thumbsup had an option to increase the amount of logging. I turned that on, and I could see more details about what Thumbsup was doing with each photo and video. But the Docker process still kept hanging. In fact, it seemed to take less time to hang than before I increased the logging level! Hmmmm.

The penny drops

When reading the child-process docs initially, I remembered reading something to do with a buffer-size. I re-opened the docs and checked the exec() command, and there was this option:

maxBuffer <number> Largest amount of data in bytes allowed on stdout or stderr. If exceeded, the child process is terminated and any output is truncated. See caveat at maxBuffer and Unicode. Default: 1024 * 1024.

Could it really be that simple – that Docker was hanging because the build-command was consuming the entire exec() buffer?

After a few minutes, I had changed the code to use spawn() instead, which by-default will pipe the subprocess’ output to the parent process instead of using a buffer. I rebuilt the Docker image, deployed it to the NAS, crossed my fingers and started it up. 10 minutes. 15 minutes. 20 minutes. 30 minutes without a hang! It was working!

“The Mystery of the Hanging ’Peg” was caused by choosing the wrong NodeJS child-process function to run a subprocess. When the subprocess used up its output buffer, it terminated and without a log message to explain what had happened.

--

--

Brett Uglow

Life = Faith + Family + Others + Work + Fun, Work = Software engineering + UX + teamwork + learning, Fun = travelling + hiking + games + …