Surprisingly little time has been spent examining how one can rethink a storage-centric infrastructure model for this kind of disappearing data model. Thinking about system architecture isn’t just relevant to engineers; it has important implications for helping services like Snapchat save — and therefore make — money. (By the way, that amount would need to be about $500 million revenue and $200 million profit to justify its $3 billion valuation in November.)
It’s very simple: If the appeal of services like Snapchat is in the photos (“the fuel that social networks run on”), then the costs are in operating that photo sharing-and-serving service, as well as running any monetization, such as ads, built on top of that infrastructure.
But I’d even go so far as to argue that making use of advanced infrastructure protocols could let Snapchat get away with paying almost no bandwidth costs for a large subset of media. How? Well, let’s begin by comparing Snapchat’s infrastructure to that of a more traditional social network: its erstwhile suitor, Facebook.
According to publicly available data, Facebook users upload 350 million images a day. Back when users were adding 220 million photos weekly in 2009, the company was serving upwards of 550,000 images per second at peak — and they did it by storing five copies of each image, downsampled to various levels, in a photo storing-and-serving infrastructure called Haystack. (For obvious reasons, the exact architecture of these systems is not known.)
That gives you a sense of the scope of the infrastructure. But the salient detail here is the total cost of all this serving-and-storage — including all-in per-byte cost of bandwidth — which I estimate to be more than $400 million a year. [If you want the details, here’s what went into my calculation, which also includes ancillary costs such as power, capital for servers, human maintenance, and redundancy. The most important variables in this cost calculation are:
- the number of images/videos uploaded (estimated at ~400M photos daily)
- the size of each image/video (estimated at 3MB)
- the share of images/videos served each month (estimated at 9.5% of all images)
- all-in per-byte bandwidth/serving cost (estimated at $5 × 10^-11)
- all-in per-byte storage cost (estimated at $5 × 10^-11)
- exponential growth rate coefficient (r, estimated at ~0.076, using P_t = P_0 · e^(rt)).
To compare Facebook’s costs to Snapchat’s, however, we also have to include these variables: the mean number of recipients of each Snapchat message (estimated conservatively at 2.5); and the fraction of total messages that are undelivered (estimated at 10 percent).]
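As a rough illustration of how the variables in the list above fit together, here is a minimal Python sketch. The numeric defaults are the estimates quoted above; the function names are hypothetical, and decimal megabytes are assumed. The text does not say what time period the per-byte costs or the growth coefficient r apply to, so this sketch does not try to reproduce the $400 million figure. It only shows the basic arithmetic.

```python
import math

# Default values are the article's estimates. The function names are
# hypothetical and exist only for this sketch.
PHOTOS_PER_DAY = 400e6   # ~400M photos uploaded daily
BYTES_PER_PHOTO = 3e6    # 3 MB per image (decimal megabytes assumed)
COST_PER_BYTE = 5e-11    # all-in per-byte cost, in dollars
GROWTH_RATE = 0.076      # r in P_t = P_0 * e^(r*t); time unit not stated


def uploaded_bytes(days, photos_per_day=PHOTOS_PER_DAY,
                   bytes_per_photo=BYTES_PER_PHOTO):
    """Total bytes uploaded over a number of days at a constant rate."""
    return days * photos_per_day * bytes_per_photo


def cost_dollars(num_bytes, cost_per_byte=COST_PER_BYTE):
    """Apply an all-in per-byte cost once to a volume of bytes."""
    return num_bytes * cost_per_byte


def projected_volume(p0, t, r=GROWTH_RATE):
    """Exponential growth: P_t = P_0 * e^(r*t)."""
    return p0 * math.exp(r * t)


one_day = uploaded_bytes(1)
print(f"bytes uploaded per day: {one_day:.3e}")
print(f"per-byte cost applied to one day of uploads: ${cost_dollars(one_day):,.0f}")
```

A full estimate would also have to account for the stored copies of each image, the accumulated backlog of older photos, and the serving share, none of which this sketch models.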
Obviously, we are comparing a much larger service that has advertising (Facebook) to one that is smaller and doesn’t have any advertising (yet). But this doesn’t really matter, in principle. Even though Facebook has to make sure its infrastructure can store and serve the data needed to sell ads, much of the information that helps advertisers target users is the metadata of user interactions (with whom, where, how, and when, as well as what they ‘like’) as opposed to the content of what those users are actually saying.
This means that, despite their differences, storing and analyzing only the metadata would still allow Snapchat to build profiles of its users similar to Facebook’s. Snapchat could thus sell ads that target users just as Facebook does (assuming, of course, that its product can attract a consistent customer base), and with one huge advantage: lower costs, since Snapchat doesn’t need to store or serve any messages after they’ve been delivered.
This kind of approach to user targeting, with its metadata-centric infrastructure and associated cost savings, is by no means unique to Snapchat. The public revelations about the NSA’s surveillance operations point to a similar architecture: storing the entire content of all intercepted communication would be prohibitive in terms of cost and space, but not so for metadata. In fact, the way the metadata is ostensibly used to target individuals and groups NSA agents deem to be a threat is not dissimilar to how advertising targeting works. But that’s another discussion.
What makes photo serving so expensive for Facebook, and for any other traditional social network, is having to keep data in a high-availability, low-latency, redundant, multi-master data store that can withstand temporary spikes in traffic load. But much of this expense is unnecessary for storing and processing metadata. Based on some additional assumptions (such as the number of recipients of each message), we can estimate that, even if its per-byte storage costs were 5x higher, Snapchat would only need to pay $35 million a year (under 9 percent of Facebook’s total estimated infrastructure costs) to handle a similar load, all while accruing a trove of data with similar targeting value.
It’s like getting a mile when you’re only giving an inch.
So how could Snapchat reduce its bandwidth and storage costs even further? The key, again, is in the seemingly mundane: infrastructure. There are a number of complicated optimizations that could make the system even cheaper to operate. For example, Snapchats between parties that are concurrently online could be delivered via peer-to-peer messaging (think Skype). Because these messages would never even flow over Snapchat’s network, this would reduce Snapchat’s delivery costs for them to nearly nothing.
It’s not just theoretical. Firewalls are an impediment, of course, but a number of solutions, including proxy servers at the edge of the network or ICE (RFC 5245), could make the above doable relatively soon. Snapchat could even store encrypted, undelivered messages on other users’ phones, ensuring availability by using erasure coding with sufficient redundancy. This technique basically involves splitting media up into many redundant pieces (only a subset of which is needed to reconstitute the entire picture); giving the data to different users (encrypted so that no one other than the recipient would be able to glean any information from it); and relying on the high probability that enough users will be online at any time to reconstruct the data.
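To make the erasure-coding idea concrete, here is a minimal Python sketch of the simplest possible erasure code: k data shards plus a single XOR parity shard, so that any k of the k + 1 pieces are enough to rebuild the message. This is an illustration only, not a description of anything Snapchat does. A scheme like the one described above would presumably need a stronger code (Reed-Solomon, for example) that tolerates many offline holders, and would encrypt the message before splitting it; both are left out here. The function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-length shards and append one XOR parity shard."""
    shard_len = -(-len(data) // k)               # ceiling division
    padded = data.ljust(shard_len * k, b"\0")    # zero-pad the last shard
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = reduce(xor_bytes, shards)
    return shards + [parity]


def decode(shards: List[Optional[bytes]], k: int, original_len: int) -> bytes:
    """Rebuild the data when at most one of the k + 1 shards is missing (None)."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("a single-parity code can recover at most one lost shard")
    shards = list(shards)
    if missing and missing[0] < k:
        # XOR of all surviving shards (data and parity) equals the lost shard.
        present = [s for s in shards if s is not None]
        shards[missing[0]] = reduce(xor_bytes, present)
    return b"".join(shards[:k])[:original_len]


# Usage: one holder is offline, and the message is still recoverable.
message = b"hypothetical photo bytes"
pieces = encode(message, k=4)
pieces[2] = None
recovered = decode(pieces, k=4, original_len=len(message))
print(recovered == message)
```

The trade-off is between redundancy and overhead: a single parity shard costs only 1/k extra storage but survives just one missing holder, which is why a real deployment on unreliable phones would want far more redundancy.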
While it’s hard to guess what fraction of messages are exchanged between parties that are online, the impact of such infrastructure design would definitely be substantial.
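Since the online fraction is unknown, one way to reason about it is to treat it as a parameter. The following minimal Python sketch does that under loudly simplified assumptions: a relayed message crosses the service's network once on upload and once per delivered recipient, and a peer-to-peer message costs the service nothing (signalling and proxy traffic are ignored). The 3MB size, 2.5 mean recipients, and 10 percent undelivered rate are the estimates used earlier; the function name and the model itself are hypothetical.

```python
def relay_bytes_per_message(size_bytes, mean_recipients,
                            undelivered_fraction, p2p_fraction):
    """Average bytes crossing the service's own network for one message.

    Assumed model (not from any real system): relayed messages are uploaded
    once and downloaded once per delivered recipient; peer-to-peer messages
    never touch the service's network.
    """
    if not 0.0 <= p2p_fraction <= 1.0:
        raise ValueError("p2p_fraction must be between 0 and 1")
    relayed = size_bytes * (1 + mean_recipients * (1 - undelivered_fraction))
    return relayed * (1 - p2p_fraction)


# Sweep the unknown: how much relay traffic remains at each P2P share.
for share in (0.0, 0.25, 0.5, 0.75):
    remaining = relay_bytes_per_message(3e6, 2.5, 0.10, share)
    print(f"P2P share {share:.0%}: {remaining / 1e6:.2f} MB per message")
```

In this model the savings scale linearly with the peer-to-peer share, so even a modest fraction of concurrently online pairs translates directly into lower relay bandwidth.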
The fact is, this new generation of messaging services can use cost-effective infrastructure to operate so much more cheaply than the Facebooks of the world and yet still effectively target ads to users. While it would seem that not storing content would be an obstacle to monetization, that design feature turns out to be an asset when working from metadata. The question that remains isn’t how they’ll monetize; it’s whether these services can make a compelling enough product to keep users coming back for more.