Scaling Notifications On Elgg To Support Rich, Context-Aware Emails

One of the core aspects of a social networking site is its ability to notify its users. Social networks with complex access restrictions are entirely different beasts to build and scale compared to sites that are mostly open, or sites where only a handful of users can generate content.
I have been running an Elgg site, a private gated network, for an old client since 2009. Early on we ran into problems with the newsletter that had to go out to the entire user base. This was at a time when products like MailChimp were not an option, and we were also working with a fairly limited budget. As a first step, we mitigated the problem with a job queue built on MySQL.
As any engineer will tell you, a job queue based on an RDBMS that can only run one worker, or even worse depends heavily on locking to run multiple workers, is not a job queue. Eventually it causes more trouble than it is worth, and that is exactly where we ended up. Besides, as an Elgg site grows and you introduce more features to it, something that can farm out jobs and handle them asynchronously is worth its weight in gold.
Eventually, I wound up creating a simple setup using Beanstalkd. The notification handler and the generic mail handlers are overridden to add jobs to the Beanstalk queue, and a PHP worker (managed by Supervisord) processes the jobs in the background. I could go a level deeper and hand off even the creation of the individual jobs to Beanstalk itself, but the current approach is holding up well for the moment, so that next step can easily wait a while longer.
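To make the flow a little more concrete, here is a minimal sketch of the idea: the handler only queues a job, and a separate worker sends the mail later. It is written in Python with an in-memory queue standing in for Beanstalkd, so it is not the PHP code the site runs. The names `enqueue_notification` and `run_worker`, and the shape of the job payload, are assumptions made up for illustration.

```python
import json
import queue

# Stand-in for a Beanstalkd tube. A real setup would talk to the Beanstalkd
# server through a client library instead of an in-process queue.
jobs = queue.Queue()


def enqueue_notification(recipient_id, subject, body):
    """Hypothetical replacement for the notification handler: queue, do not send."""
    payload = json.dumps({"to": recipient_id, "subject": subject, "body": body})
    jobs.put(payload)


def run_worker(send):
    """Drain the queue, calling send() for each job; return the number processed."""
    processed = 0
    while True:
        try:
            payload = jobs.get_nowait()
        except queue.Empty:
            return processed
        job = json.loads(payload)
        send(job["to"], job["subject"], job["body"])
        processed += 1


# Usage: the web request returns as soon as the jobs are queued;
# the worker does the slow part in the background.
sent = []
enqueue_notification(42, "New comment", "Someone replied to your post.")
enqueue_notification(43, "New comment", "Someone replied to your post.")
count = run_worker(lambda to, subject, body: sent.append((to, subject)))
```

The point of the split is that the web request never waits on mail delivery. In the real setup the worker is a long-running process that Supervisord restarts if it dies, rather than a function that returns once the queue is empty.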
A few pitfalls to watch out for, should you attempt the same thing:
1. Content encoding. This will drive you nuts if your scripts, DB tables and CLI environment differ in how their locales and character sets are set up. Do not assume that everything that works in the browser will work the same in CLI. It won’t.
2. Access. The CLI script loads the Elgg environment and has no logged-in user. So, be aware of any functions that use sessions to return results.
3. Valid entities. PHP will error out when faced with an attempt to call a method on a non-object. If you don’t kick or bury the job that is causing the error (which is not possible once the script has exited with an invalid object error), the script will endlessly start and stop again. You have to obsessively check every object for validity before you attempt to do anything with it.
4. Use MailCatcher on your development setup. It will save you a ton of time, even though it does make the server itself a bit sluggish.
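The third pitfall is the one most worth sketching. The idea is that the worker checks everything before touching it and reports back whether the job should be deleted or buried, so that one bad job never crashes the process and gets retried forever. This is again a Python sketch and not the PHP worker itself. `handle_job`, `load_entity`, and the `entity_id` field are hypothetical names, and the returned strings only stand in for the delete and bury calls a real Beanstalkd client would make.

```python
import json


def handle_job(payload, load_entity, send):
    """Return 'delete' when the job is done, 'bury' when it cannot be processed."""
    try:
        job = json.loads(payload)
    except (TypeError, ValueError):
        return "bury"  # undecodable payload: set it aside for inspection

    if not isinstance(job, dict):
        return "bury"  # valid JSON, but not the shape the worker expects

    # load_entity is a hypothetical lookup that returns None for missing
    # or inaccessible entities, so nothing is ever called on a non-object.
    entity = load_entity(job.get("entity_id"))
    if entity is None:
        return "bury"

    try:
        send(job.get("to"), entity)
    except Exception:
        return "bury"  # delivery failed: keep the job out of the ready queue

    return "delete"


# Usage with an in-memory entity store and outbox.
entities = {7: {"title": "Hello"}}
outbox = []
payloads = ['{"to": 1, "entity_id": 7}', '{"to": 1, "entity_id": 99}', "not json"]
results = [
    handle_job(p, entities.get, lambda to, e: outbox.append((to, e["title"])))
    for p in payloads
]
```

Only the first job is delivered. The other two are buried, where they can be looked at later without blocking the rest of the queue.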
There are a few other options in the Elgg ecosystem that do the same thing, such as Jettmail and the upcoming async notifications feature in Elgg 1.9. Both have their own complexities and issues, though. I could not wait for 1.9, and I needed something that didn’t require as much fiddling as Jettmail.
It is also possible to extend this kind of setup further by using the inbound email feature of some of the transactional email services out there to post to Elgg through webhooks. There are no plans to roll that out right now, though, and I will update this post if we ever get around to doing that.
