Scaling Notifications On Elgg To Support Rich, Context-Aware Emails

One of the core aspects of a social networking site is its ability to notify its users through different channels. Social networks with complex access restrictions are entirely different beasts to build and scale compared to sites that are mostly open, or where content generation is limited to a handful of users.

I have been running an Elgg site for an old client since 2009; it is a private, gated network. Early on, we ran into problems with the newsletter that had to go out to the entire user base. This was from a time when products like MailChimp were not an option and we were also working with a fairly limited budget. Initially, we mitigated the problem with a job queue built on MySQL.

As any engineer will tell you, a job queue based on an RDBMS that can run only one worker, or, worse, depends heavily on locking to run multiple workers, is not really a job queue. Eventually, it will cause more trouble than it is worth, and that is exactly where we ended up. Besides, as an Elgg site grows and you introduce more features, something that can farm out jobs and handle them asynchronously is worth its weight in gold.

Eventually, I wound up creating a simple set-up using Beanstalkd. The notification handler and the generic mail handlers are overridden to add jobs to the Beanstalkd queue, and a PHP worker (managed by Supervisord) processes the jobs in the background. I could go a level deeper and offload even the individual job creation to Beanstalkd itself, but the current approach seems to be holding up well for the moment, so that next step can easily wait a while longer.
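To give a rough idea of the shape of this, here is a minimal sketch of the producer and worker sides, assuming the pda/pheanstalk client library; the tube name, the payload fields and the plain mail() call are illustrative stand-ins, not what the site actually runs:

<?php
// Producer: what an overridden Elgg email handler might push to Beanstalkd.
require 'vendor/autoload.php';

use Pheanstalk\Pheanstalk;

function queue_notification_email($to, $subject, $body) {
    $queue = Pheanstalk::create('127.0.0.1');
    $queue->useTube('elgg-mail');
    $queue->put(json_encode([
        'to'      => $to,
        'subject' => $subject,
        'body'    => $body,
    ]));
}

// Worker: runs forever under Supervisord, which restarts it if it dies.
function run_mail_worker() {
    $queue = Pheanstalk::create('127.0.0.1');
    $queue->watch('elgg-mail');
    while (true) {
        $job  = $queue->reserve();            // blocks until a job arrives
        $mail = json_decode($job->getData(), true);
        if (mail($mail['to'], $mail['subject'], $mail['body'])) {
            $queue->delete($job);             // done, drop it from the queue
        } else {
            $queue->bury($job);               // park it for manual inspection
        }
    }
}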

A couple of pitfalls you need to watch out for, should you attempt to do the same thing:

1. Content encoding. This will drive you nuts if your scripts, DB tables and the CLI environment differ in how their locales are set up. Do not assume that everything that works in the browser will work the same in the CLI. It won’t.

2. Access: The CLI script loads the Elgg environment but has no logged-in user. So be aware of any functions that rely on the session to return results.

3. Valid entities: PHP will error out when faced with an attempt to call a method on a non-object. If you don’t kick or bury the job that is causing the error (which is not possible when the script exits on an invalid object), the worker will endlessly die and restart. You have to obsessively check every object for validity before you attempt to do anything with it; see the sketch after this list.

4. Use MailCatcher on your development set up. It will save you a ton of time, even though it does make the server itself a bit sluggish.
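To make pitfalls 2 and 3 concrete, here is a rough sketch of the defensive part of such a worker, written as a helper the loop above could call; get_entity() is Elgg’s own loader, while the GUID field names in the payload are illustrative:

<?php
function process_notification_job($queue, $job) {
    $data = json_decode($job->getData(), true);

    // Pitfall 2: there is no logged-in user on the CLI, so avoid anything
    // session-dependent and load entities explicitly by GUID instead.
    $recipient = get_entity($data['recipient_guid']);
    $object    = get_entity($data['object_guid']);

    // Pitfall 3: verify every entity before calling methods on it, and bury
    // the bad job rather than letting the worker die and restart endlessly.
    if (!($recipient instanceof ElggUser) || !($object instanceof ElggEntity)) {
        $queue->bury($job);
        return;
    }

    // Safe to build and send the notification now, then:
    $queue->delete($job);
}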

There are a few other options available in the Elgg ecosystem to do the same, like Jettmail and the upcoming async notifications feature in Elgg 1.9. But both have their own complexities and issues; I could not wait for 1.9, and I needed something that did not require as much fiddling as Jettmail.

It is also possible to extend this kind of development further and leverage some of the transactional email services out there, using their inbound email feature to post to Elgg via webhooks. There are, though, no plans to roll that out right now and I will update this post if we ever get around to doing it.

Running 3.8.0-29 Kernel On ElementaryOS Luna

After a bit of tweaking and fiddling I have managed to get the 3.8.x kernel running on the Acer Aspire V5-431. Unlike the previous time, when I tried and failed to get bcmwl-kernel-source to compile from the package manager, this time a different approach worked. Thanks to this post on AskUbuntu, I picked up the latest bcmwl-kernel-source (6.30.223.30) and installed it.

The package installs without any issues and it enables WiFi on the machine. If you hit the problem where the driver shows up as installed and activated, yet you can’t seem to get the WiFi going, just make sure the other WiFi modules are blacklisted and disabled.

My blacklist looks something like this:

blacklist b44
blacklist b43legacy
blacklist b43
blacklist brcm80211
blacklist brcmsmac
blacklist ssb

You also have to make sure that ‘b43’ is commented out in /etc/modules if it is present there.
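For completeness, the changes can be applied and tested along these lines; the blacklist entries are assumed to live in a file under /etc/modprobe.d/, whichever one you put them in:

# Rebuild the initramfs so the blacklist also takes effect at boot
sudo update-initramfs -u
# Swap the modules without rebooting (ssb may refuse to unload if it is in use)
sudo modprobe -r b43 b43legacy brcmsmac ssb
sudo modprobe wl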

I have also been able to get the Huawei EC1260 Wireless Data Modem (Tata Photon+ being my provider) to work with this kernel. You will need to configure usb_modeswitch for that, after which the device will show up with the 12d1:140b profile.

The profile data looks like this:

DefaultVendor=0x12d1
DefaultProduct=0x140b
#HuaweiMode=1
MessageEndpoint=0x08
MessageContent="55534243123456780000000000000011062000000100000000000000000000"
NeedResponse=1
CheckSuccess=10
DisableSwitching=0
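If the switch does not happen automatically via udev, you can also trigger it by hand; the config file path below is only an assumption about where you saved the profile:

# Trigger the mode switch manually using the profile above
sudo usb_modeswitch -c /etc/usb_modeswitch.d/12d1:140b
# The modem should now enumerate; check with:
lsusb | grep 12d1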

The 3.8.x kernel seems to be pretty good. The machine runs a lot cooler than it did with the 3.2.x kernel and I am yet to run into any issues. The older kernel had the odd lock-up now and then; I have not experienced that in a day or two now. It has been a worthwhile upgrade for me.

Moving Away From OS X, Switching Over Fully To Linux

Most of the reasons for the move have already been documented in a previous post, so I’ll skip the immediate compulsions that pushed me in this direction. Even while I was writing that post, I was not very sure if it would all come together well in the end. After much experimentation (and some really frustrating times) I’m glad to say that the transition is complete and I won’t be going back to an Apple laptop for a while.

The overall Linux on desktop experience is a marked improvement over the last time I attempted it. That was during a time when I was only too glad to tinker around endlessly and when it was more than OK for me to insert a module into the kernel to get the sound card to work. That time, though, is long gone; I now prefer systems that just stay out of the way, which is why OS X and the Apple laptops were wonderful for me.

That said, I have recently been feeling that the premium you pay Apple for that experience is a bit over the top. But replicating that experience on another platform (Windows does not cut it for me because I am simply way too used to having a *nix environment to work in, rather than for any other reason) has been a painful experience every time I have tried it.

In a lot of ways, the Linux on desktop story right now resembles what the Android story was like around the time of Froyo. That comparison is meant to cover only the technical aspects; you can safely ignore the market share part of the story. Even with this marked improvement, it will be a long, long time before Linux becomes a serious player in the desktop/laptop market.

Coming back to the comparison, I find the quality of apps on Linux has improved significantly. They are still not as pretty or as consistent as OS X apps, but the story is a drastic improvement over earlier times. Then there are projects like elementaryOS, where the teams have made a concerted effort to make everything a lot more consistent and well thought out.

In the overall picture, none of that will matter. Most of the big companies that sell desktops and laptops are primarily tied to Microsoft and the ecosystem around it. There have been efforts like Dell’s Developer Edition, but those are hardly mainline efforts, and since we are living in an age where a platform is no longer simply about the hardware and the OS, without major muscle behind it the desktop Linux story will always be a minor one.

For me, the Linux story has so far been extremely positive. Save for not being able to run iTunes without virtualization or emulation (one of the sad outcomes of the demise of Flipkart’s digital music business), there is nothing that I have been unable to do on Linux that I could do on OS X. The UI/UX aspect is no longer an issue with eOS, which, surprisingly, feels a lot less like OS X once you start using it more.

There are some terrors that remind me of the good old days of desktop Linux, when everything was a lottery, but once you get a stable system in place the beast just keeps chugging on and stays out of your way. I do foresee a long and fruitful association for us this time around.

Do Not Upgrade Kernel While Using elementaryOS On Acer Aspire V5-431

Edit: Figured out a way to run the 3.8.x series kernel here. I am running 3.8.0-31 at the moment, without any issues. This, though, is not recommended by the eOS team and should something go wrong, you will be on your own.

One of the best post-installation resources for elementaryOS is the elementaryupdate.com site. They conclude their post on what more you can do to customize and update the OS after installing the current version (Luna) with a recommendation to upgrade the kernel to raring-lts. If you do this on the Acer Aspire V5-431, you will break your Broadcom BCM43228 (14e4:4359) driver, as the bcmwl-kernel-source module will not build on the 3.8.0-29-generic kernel, and many hours of frustration will follow.

In short, stick to the 3.2.x series kernels till the eOS team suggests otherwise, as they do recommend sticking to that series in this post. There are good reasons to move to the latest kernel, as a lot of things seem to work better with it (auto-dimming of the display, for one), but this kind of breakage is severe and it is a good idea to stay away from any kernel upgrades that don’t come through the software update process.

This is really one of the annoying things about using Linux on the desktop: you would expect something that worked out-of-the-box in an older version of the kernel to do the same in a much newer version. I fully understand the reasons why things work this way, but it is an extremely poor user experience, and even for someone like me, who is a bit better than the average user at figuring out these things, it is frustrating and a waste of time.

Revisiting Linux With elementaryOS, Acer Aspire V5

With the old Macbook getting on in age (it is an early 2008 MacBook4,1), the move to find a replacement for it was always on the cards. The machine had served me well, travelling with me to different parts of India, including high-altitude passes in the Himalayas. Of late, even after a complete reinstall, the machine has been showing its age, and with persistent heating problems and lock-ups, the writing was quite clearly on the wall. I could get it repaired, which I eventually will, but the board only supports DDR2 and the memory is maxed out as it is at 4GB. The only other option is to upgrade to an SSD, fix the problems and hope for the best after that.

The primary candidate for the replacement was the 13″ Macbook Air. After the millionth (failed) attempt to find a reasonably priced Linux laptop solution that just stayed out of the way, I was pretty sure that I’d have to stick to OS X and Apple, and have no choice but to gulp down the high premium that Apple charges for the fire-and-forget experience it is more than justifiably famous for. In the midst of all of this, I ran into an interesting so-called Linux laptop from Acer. It is called the Aspire V5-431 and I found a pretty decent price for it at Flipkart.

At this point, I must digress a bit about the non-Apple laptops. Dear god, some of them, especially the Lenovo ultrabooks, are such a ‘slavish’ ripoff of the Apple laptop line-up. I can understand smartphones looking much like each other these days; there are not too many different ways in which you can design a phone. But that is not the case with laptops, and it is really shameful the extent to which the copying happens here. I guess none of these copies are much of a threat to Apple in the market, so it is probably not worth suing the manufacturers over, but it still is not a great thing to see. The V5-431 also suffers from a bit of this ‘inspiration’ problem, but it is hard to mistake it for an Apple unit.

The laptop comes pre-installed with Linpus Linux, which is instantly discarded by most users. But having a Linux laptop meant that I could have some degree of certainty that most of the bits and pieces would work well should I run some other Linux distro on it. It has been a while since I have used a Linux desktop as my main platform and it seems that while the underlying platform has changed a lot (and for the better), the user experience is still ghastly and inconsistent, featuring interfaces and UX that can only be created and loved by engineers.

That was when I came upon a project called elementaryOS. It is based on Ubuntu (the current version is built on Precise: 12.04.2), but an awful lot of work has gone into making the front-end user experience clean, consistent and fast. It is hard to miss the very obvious OS X inspiration in a lot of the visual elements, but once you start using it a bit more, the differences start to show up, and they do so in a nice way. Linux on the desktop/laptop has been begging for something like this for years and I am really thrilled to see someone finally do it right. If you care to take apart the bits on top, you’ll find a familiar Ubuntu installation underneath, but you really should not bother doing that.

I have gone through some three re-installs of the OS so far, for various reasons. One thing you need to watch out for while sorting out eOS on the V5-431 is to stick to the 32-bit OS, as things get quite a bit crazy should you attempt mixing i686 and x86_64 platforms while using virtualization. The eOS 32-bit kernel is PAE-enabled, so you can use more than 4GB RAM on the machine, but I would highly recommend sticking to 32-bit for everything (OS, Virtualbox, any guest OS) and you’ll not have a reason to complain. I discovered all of this the hard way, as my primary requirement was a working Vagrant installation on the laptop, and I eventually had to redo the base box in 32-bit (the original from the Macbook was 64-bit CentOS 6.4).

The experience with the laptop has been pleasant so far. I have ordered more memory (8GB, to be precise) and even at 2GB the machine feels a lot faster and more stable than the ailing Macbook. I will hold off on getting an SSD for now, as I feel the machine is quick enough for me at the moment and the extra memory will only make things better. After many attempts at customizing the interface, what I have realized is that it is best left alone. The developers have done a great job of selecting the defaults and nine times out of ten the modifications you make are not going to improve on them. The only thing you’ll need is to install the non-free TTF fonts, enable them in your browser’s font selection and get on with the rest of it.

Other than that, the main issue is color calibration of the monitor. The default install has a blue-ish tint and blacks don’t render true, which is infuriating on a glossy screen. I finally fixed the problem by calibrating the display on a Windows installation and pulling out the ICC profile from it. I’ll share the link to the profile at the end of this post; if you have the same machine and are running Linux on it, use it. It makes a world of difference. You will have to install Gnome Color Manager to view the profiles.

After all of that, the machine seems quite a good deal for me. It does not heat up too much, is extremely quiet and weighs a bit over 2 kilos. The 14″ screen is real estate I appreciate a lot, coming from the 13″ Macbook. The external display options are standard VGA and HDMI. My primary 22″ monitor has only DVI-D and D-Sub inputs, so I’m waiting for the delivery of a converter cable to hook it up. The battery is not the best, though. Acer has cut some corners there, but you can’t have everything at such a low price. Even with the memory upgrade, the machine will still cost me less than a third of what a new Macbook Air (the base model, that is) does right now. I’m getting around 2.5 hours of real, hard-core usage, which is not bad at all.

The stack is otherwise quite stable. It reads something like below:

  • Google Chrome
  • LibreOffice
  • Virtualbox
  • Vagrant
  • Sublime Text 2
  • Skype
  • Dropbox
  • VLC
  • Darktable

I’m not exactly a power user; 90% of my work is done in a text editor, a web browser and VLC. But the combination of eOS and the Aspire V5-431 is something that I can easily suggest to a lot of people looking to break away from regular Linux/Windows/OS X, and that too at a good price. There is a newer version of the laptop out with the next generation of the chip, but I have not seen any great benefit from that upgrade, which costs a bit more. You can spend that money on getting more RAM instead.

eOS is also a nice surprise for such a young project. With time it will only get better and eventually grow into something quite distinct from the OS X it currently resembles.


Encryption Is Not The Answer

The strangest reaction to the entire privacy mess that is unraveling is the quest for even stronger encryption. If you ask me, that is trying to solve the problem at the wrong end. We can, in theory, go in for unbreakable-grade encryption and hope to keep everything away from prying eyes. That would have been fine if the problem we are dealing with were limited to the expectation that your communications will be private by default.

The question we need to ask is whether all this snooping actually delivers the results we are looking for. It is a particularly tricky one to answer, as a certain amount of force, once applied, will produce at least a minimum level of results. Any sort of enforcement, once deployed, will have at least some impact on crime anywhere in the world. So, yes, you will catch at least a few bad guys by doing things to catch the bad guys.

Thus, things turn a bit more nuanced than the binary “will it” or “won’t it”. It also becomes a question of efficiency and effectiveness. And this is where the tenuous contract between the state and its subjects comes into play. Historically, this has always been a clearly understood tradeoff. In exchange for giving up absolute freedoms and an absolute right to privacy, the state provides you a stable and secure socio-economic environment.

You Are With Us, But The Data Says You Are Against Us

The efficiency and effectiveness of any system is not always determined by how wide a coverage can the system aim to accomplish. A good example of this is prohibition. That system worked by outlawing production of alcoholic beverages. The coverage was complete, yet, it was hardly foolproof and led to other major problems. In instances like these, the contract is greatly strained and other than the exceptions of war or episodes of tyrannical rule, it inevitably breaks.

The power of any state, especially a democratic one, is drawn heavily from the majority of the population feeling that the state looks after their best interests. This keeps the state and the subjects on the same side of the divide, even though the state has always been more powerful than the individual. It works well only when the system assumes that the majority of the participants are good people, with a reasonable margin for error.

The same tradeoff, in free societies, allows you to keep knives at home without being suspected of being a killer, even though many (albeit a smaller number) have killed others using a knife. If one fine morning the state starts treating anyone who has a knife as a potential killer, the system will eventually break down. A state’s power may be considerable, but it is still power granted by the majority of its subjects. The moment a state makes almost all of its subjects suspects in crimes that may or may not happen, the contract breaks and it breaks for good.

If you concern yourself with systems, whether designing them or studying them, one truth that stands out before long is that there is no perfect system or law. The best ones aim to get it wrong the least number of times, with allowances for fair redressal, rather than aiming to get it right all the time and trying to be absolute. In a healthy system, the subjects don’t expect the state to always be right, and the state does not expect the subjects to always be wrong. This is what keeps the tradeoff viable for both parties and, like any good bargain, it requires both parties to behave within expected lines.

A healthy system is less likely to punish the innocent, even at the cost of letting more of the guilty escape punishment.

The breakdown aside, there is the question of efficiency. Systems that try to examine every interaction will always deliver the initial rounds of success. Over time, though, the participants in any evolving system (consciously or sub-consciously) adapt to the examination, and soon you have a system that tracks everything yet catches nothing, because you have now given the majority of the population an incentive to be evasive (for fear of wrongful prosecution). It is easier to find 50 bad apples in a batch of 200 than in a batch of 200,000.

In one fell swoop, you have made every subject a potentially bad person, leaving the utterly distasteful task of proving the negative as the default. Even if you ignore the issue of false positives, such systems are impossible to sustain over longer periods, as they grow more and more expensive over time while becoming less efficient.

Role Of Computing

Major developments in computing in the new millennium can be broken down into two things. First is the ability to capture vast amounts of data. Second is the ability to process them in parallel and find patterns in them. Collectively, we have come to call this “big data” inside and outside tech these days.

We have always had the ability to capture data. The concept of accounting is almost as old as human civilization. Data has always been part of our lives; it is only the extent of the data captured that has grown over time. Given enough resources, you can capture pretty much everything, but data itself is worthless if you can’t process it. This is why we never thought much of big data until now.

One of the greatest silent revolutions of computing in the past decade has been the shift from identification through the specific to identification through patterns. In the late 1990s, when the internet was taking its baby steps to becoming the giant it is today, the identification of you, as an individual, was dependent on what you chose to declare about yourself.

There were other subtle hints that were used, but most of anyone’s idea of who you were depended on what you chose to disclose. Over time, that changed to observing everything you do and figuring out who you are really likely to be, based on the actions of a known group of people whose actions match yours, even if what they have declared about themselves has nothing in common with what the system has decided they are about.

In daily life, you see this in action in contextual advertising and recommendation systems. In fact, almost the entire sub-industry of predictive analysis depends on making inferences such as these. This, aided by the vast amount of public data that we produce these days, has meant that profiling a person (provided there exists a vast amount of profiled known data) as of a particular type can now be done in seconds, compared to weeks or months earlier.

“If he looks like a traitor, walks like a traitor, and talks like a traitor, then he probably is a traitor”

The above line could easily fit how any overly suspicious state thinks of its subjects, but it is just an adaptation of the most famous example of inductive reasoning, the ‘Duck Test’. The earlier point about knives in societies makes a lot more sense when seen in the light of this test and big data.

Even in earlier times, we could collect all the information about every knife made and sold in a country, but mining useful intelligence out of it was a hard job, and doing it at a reasonable speed was even harder. After all, there was no point in finding out now that Person A, who bought a knife six months ago, was likely to commit murder, which he in fact did four months ago.

The advances in computing now enable us to predict who is likely to buy a knife in the next four months and, given the profile of activity of murderers in our records, we can also predict who, of the lot of knife buyers in the last three months, is likely to commit murder in the coming months, at what time of day and on which day of the week.

That has to be a good thing, right?

Not really.

How Wrong Does Wrong Have To Be To Be Really Wrong?

If you are smart, the truth that you quickly learn from dealing with large amounts of data is that it is an imperfect science. The science is good enough to build an advertising business that will wrongly recommend tampons to someone who is very much a male or wrongly suggest an ex-husband as a potential mate on a social networking site; but it is nowhere close to being good enough to identify potentially bad people, based on patterns and inferences.

If we go back to the earlier point about what constitutes a good system (something that gets it wrong the least number of times), systems built on aggregating data (or metadata) are terrible ones. It is not that these systems don’t get it right; they do, probably even 70-80% of the time, but they also get it terribly wrong the remaining 20-30% of the time. When an advertising or recommendation system gets it wrong, it causes a bit of embarrassment and maybe much ire; when a surveillance system gets it wrong, you wind up putting way too many innocent people behind bars and destroying their lives.

People who work with big data in advertising and other online operations will be the first ones to tell you that these systems need constant tweaking and that they’re always prone to known and unknown biases based on the sampling and collection. In working with big data sets, the first assumption you make is that you are probably seeing what you want to see as what you are collecting often has the bias of your desired outcome built into it.

The Sordid Tale Of Bad Outcomes Born Of Good Intentions

With all of these flaws, why is there such a major attraction in the intelligence and law enforcement communities to wholly embrace these flawed technologies? The answer lies in how the nature of conflict has changed in the 21st century.

Once upon a time, wars were simple affairs. A strong army would, more often than not, decimate a weak one, take over the lands, wealth and people of the defeated and expand their kingdom. These used to be pretty isolated and straightforward affairs.

Modern warfare bears little resemblance to any of that. For one, absolute might has become less relevant in these times. The fear of a lone bomber these days causes more invisible damage than an actual bomb that kills many. This asymmetry has brought about a substantial shift towards placing absolute importance on prevention rather than retaliation.

The good intention is prevention. The bad outcome is all the snooping and data collection.

Enforcement and intelligence, anywhere, love preventive measures. The fine balancing act of imprisoning 20 innocents to catch the two who are really guilty, in order to save 20 million, has always been a debate that rarely finds a conclusion agreeable to everyone.

What makes the outcome so dangerous is that such profiling is based on actions that are performed by the majority of the population who have absolutely nothing in common with a person looking to blow up something.

The problem is that drawing such inferences gives enforcement and intelligence a magical shortcut to identifying subsets of people who can be further investigated on the basis of their belonging to the same bucket. Given how the inferences are made, it is easy to be bucketed in the same group if you have the same usage profile on a handful of harmless websites as a known suspect.

And given the fact that pretty much everyone would have done something that’s not entirely right at some point in their lives, this also opens up a vast avenue for abuse by an overactive arm of enforcement, purely based on suspicion than any actual fact.

More Encryption Is Not The Answer

Coming back to where we started, the fact is that encrypting anything and everything does not keep you safe from any of this. In fact, using so much encryption will probably mark you as someone suspicious from the outset, and that suspicion can be used to procure permission that will force either you or the intermediary organizations (ISPs, web hosts; the list is endless) to cooperate.

Another reason why encryption fails is this: even on a fully encrypted loop, if the other party you are communicating with is susceptible to pressure, all that is required is for that party to silently cooperate with whoever is investigating them. That requires no brute forcing or any other fancy tech. It just requires a single weak link in the chain and, unfortunately, the chain has many weak links.

In conclusion, the problem at hand is not a quandary that is technical in nature. It is one that is about the relationship between the state and its subjects. In a rather strange twist of fate, this is exactly what the terrorists wanted the modern state to become — one that lives in fear and lets that fear oppress its subjects.

Once we reach that point, it is an endless slide down the rabbit hole and I am afraid we won’t realize the extent of that slide before a lot more damage is done.


On The Content Business

Fair disclosure: I have no idea why Jeff Bezos bought WaPo. You won’t find much about that in this post. This is going to be a rambling, ill-focused post.

Much of the discussion around the content business eventually comes around to the question of paywalls and subscriptions. I feel this is the wrong approach to finding a future for an industry that has always had a key role to play in society. The business of content has not been supported by subscriptions for a long time, and that was the case even before the internet became as big as it is right now.

The scale that the bigger content businesses achieved during their glory days was not because the consumers of the content were paying a price that was close to what it took to produce that piece of content. The scale was there because of the advertising the content producers could bring in. The majority of the damage has happened on that front and trying to repair that damage by getting subscriptions to cover for it is bound to fail.

The business of content is really quite simple:

What do you publish?

How is it consumed?

Who gets to consume what is published?

The three factors together make a publication a platform play for advertising. Yes, subscriptions are there, but they only make for a bit of nice loose change in the larger picture.

A hypothetical publication, if it attempts to explain its business of content, may wind up looking like this:

What do you publish: A weekly magazine on automobiles.

How is it consumed: Print and internet.

Who gets to consume it: 20-45 year-old, 80% male, from the top three metros in the country.

Where the internet has been destroying the old content business is in identifying the ‘who gets to consume it’ part of the business. You are not going to make up for the losses on that front by trying to fix the subscriptions part. That horse bolted long, long ago and the fact is that, as a large publication, you can’t hope to survive and revive based on how much more you can charge your subscribers.

The key question is: how can you deliver a better audience for your advertisers, without compromising the quality of what you publish? There seems to be little effort being put into addressing that crucial question. Audiences these days, like good content, need to be curated and nurtured.

It won’t, though, be an easy thing to do as traditional advertising is used to picking quantity over quality and a historical lack of instrumentation in the industry has allowed them to get away with this. So, even the newer products and models are essentially reinventing the older flawed way of doing things and a way forward that is different seems to be nowhere in sight.

Javascript Corruption In Vagrant Shared Folders

If you are serving javascript files from your typical LAMP stack in Vagrant using shared folders, you will hit a problem where the JS files will be served truncated at arbitrary lengths. Curiously, this does not seem to affect other static text file types and it could be a combination of headers and caching that is responsible for this.

By the looks of it, the problem is not new. This thread from the Virtualbox forums addresses the same issue and it goes back all the way to 2009. The last post in the thread provides the right solution, which is to turn off sendfile in the httpd config.
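For reference, the fix is a one-line directive; it can live in the main httpd.conf or in the specific vhost or directory block that serves files from the shared folder:

# Make Apache read files normally instead of using the kernel sendfile path,
# which misbehaves on VirtualBox shared folders
EnableSendfile Off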

Curiously, EnableSendFile defaults to ‘off’ in the stock installation, but disabling it specifically gets rid of the problem. This should be fun to dig into and unravel, but I will leave that for another day.

Quick Tip On Shared Folders And Logging In Vagrant

Continuing with the recent posts on Vagrant, today, we’ll look at the tricky issue of shared folders and using them as locations to store logs.

My idea with using Vagrant was to keep all development-related files, including logs, on the host machine, with the guest accessing them only through shared folders. This gives you the best of both worlds, as you can use your editor of choice on the host while the files are executed on the guest. This works fine on a set-up that has only a few shares and not more than a port or two forwarded.

For a bit of background, this is how Vagrant goes through its start-up cycle.

The first cycle is all network-related. It detects any forwarding conflicts, cleans up old forwarding settings and, once the coast looks clear, sets up all the forwards specified in the Vagrantfile.

Next cycle is the actual VM boot, where a headless instance of the VM is kicked into life.

Lastly, Vagrant loads all the shared folders.

The problem starts when the guest machine begins processing its init.d directives during the second cycle. The shared folders often take a good chunk of time to load, and depending on how badly the software started by init.d panics when it encounters files that are missing because, well, the shared folder that has them has not been mounted yet, life may move on peacefully (with adequate warnings) or the software may just error out and die.

One such piece of software is the Apache HTTPD daemon. It can start up without issues if it can’t find the documents it has to serve, but it simply throws up its hands and quits if it can’t find the log files that it is supposed to write to. And a good developer always logs everything (as she/he should).

The solution, in the case of HTTPD, is to ensure that you log to a volume that is on the guest machine and not on the host. This does mean that you can’t tail the log file from the host to see errors and requests stream by, but that is a small price compared to figuring out mysterious deaths of the HTTPD daemon, which starts up fine after you do a ‘restart’ once the VM is fully up and running.
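A minimal sketch of what that looks like in the vhost configuration; the paths are illustrative, and the only point being made is that the logs live on the guest’s own filesystem while the documents stay on the share:

<VirtualHost *:80>
    # Served from the Vagrant shared folder
    DocumentRoot /vagrant/public
    # Logged to a guest-local volume so httpd can start before the share mounts
    ErrorLog /var/log/httpd/dev-error.log
    CustomLog /var/log/httpd/dev-access.log combined
</VirtualHost>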

Port Forwarding Small Port Numbers With Vagrant On OS X

While working with a Vagrant set-up it is easy to forward ports with the forwarded_port directive.

This is accomplished by making entries in the format below in your Vagrantfile:

config.vm.network :forwarded_port, guest: _guest_port_number, host: _host_port_number

The catch here is that Vagrant won’t forward privileged port numbers (those below 1024) on the host machine without being run as root. This means that you will have to access the service on a higher port number, which is a bit of a downer considering that we are going through all of this pain to have a development environment that is nearly an exact clone of what we will find in production.

The solution is to use ipfw (the humble IP firewall on BSD-derived systems, which includes OS X) to forward the low port to a higher port on the host, and then forward that higher port to the corresponding low port on the VM.

Let us assume that you want to forward both HTTP (Port 80) and HTTPS (Port 443) to the Vagrant VM.

First, use ipfw to forward the ports with the host:

sudo ipfw add 100 fwd 127.0.0.1,8080 tcp from any to me 80

sudo ipfw add 101 fwd 127.0.0.1,8443 tcp from any to me 443

Then map those higher host ports to the corresponding low ports on the guest in the Vagrantfile:

#forward httpd
config.vm.network :forwarded_port, guest: 80, host: 8080


#forward https
config.vm.network :forwarded_port, guest: 443, host: 8443

I do realize that this is a bit of a loopy way to go about accomplishing this, but when you have to juggle port numbers in a complex deployment environment, the overhead of keeping the differences in mind (and the set-up/code changes that handle them) and the propensity to make mistakes will only keep increasing over time.

As far as I know, you can do the same with iptables on Linux, if ipfw is not your poison of choice, but I have not tested it.
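For the record, and equally untested on my part, the iptables version would look something like this, assuming the same 8080/8443 mappings:

# Redirect incoming traffic on the low ports to the Vagrant-forwarded ones
sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 8080
sudo iptables -t nat -A PREROUTING -p tcp --dport 443 -j REDIRECT --to-ports 8443
# For connections originating on the host itself (e.g. hitting localhost)
sudo iptables -t nat -A OUTPUT -p tcp -o lo --dport 80 -j REDIRECT --to-ports 8080
sudo iptables -t nat -A OUTPUT -p tcp -o lo --dport 443 -j REDIRECT --to-ports 8443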