A few years ago I was inspired by this artice: http://viniciusvacanti.com/2010/11/01/6-things-you-need-to-learn-to-build-your-own-prototype/. Now I am able to build my own Rails apps and run them in production. I am now wondering what it takes to launch and successfully run a website with 5-10K users? All the articles I have found doing a web search were vary basic.
[+] [-] karterk|11 years ago|reply
These days RAM is cheap and SSD storage is also widely available. For a very long time, one of my side projects with 50K users was hosted in a EC2 small instance. With that out of the way, here are a few things you will need to take care of:
* Security (especially passwords) - Rails should take care of most of this for you, but you should ensure that you patch vulnerabilities when they are discovered. Also, stuff like having only key-based login to your servers etc.
* Backups - Take regular backups of all user data. It's also VERY important that you actually try restoring the data as well, as it's quite possible that backups are not occurring properly.
* One click deployment - Use Capistrano or Fabric to automate your deployments.
* A good feedback/support system - this could even be email to begin with (depending on the volume you expect), but it should be accessible.
* Unit tests - as your app grows in complexity, you will never be able to test all the features manually. I'm not a big fan of test driven development, but really, start writing unit tests as soon as you have validated your product idea.
* Alerts, monitoring and handling downtime - Downtimes are inevitable. Your host or DNS could go down, you might run out of disk space, etc. Use something like Pingdom to alert you of such failures.
* Logging, logging, logging - I can't stress on this enough. When things break, logging is crucial in piecing together what happened. Use log rotation to archive old logs so they don't hog the disk space.
[+] [-] icehawk219|11 years ago|reply
The part about testing your backups is huge. I can't count how many projects I've been on that had problems where we needed to restore and we looked only to find any number of problems. Oh, backups actually stopped last month when we ran out of space, oops the backups only backed up these 3 db's and not the one you want, things like that. I'd also stress the importance of off-site backups. If you're using AWS for everything and your account is compromised can they delete your backups (assuming they have full, 100% unlimited, admin access to AWS)?
Which is also why if you're using stuff like AWS, Heroku, or any other third party provider (hosted Mongo, hosted ElasticSearch, Stripe, NewRelic, etc.) it's very important to ensure those passwords are secured and only the people absolutely necessary have access. Also, when offered, two-factor authentication should always be used.
[+] [-] JoshTriplett|11 years ago|reply
Depending on the service you're building, you can log too much. Consider the privacy and security implications of the existence of those logs; anything you log can be subpoenaed, but logs that don't exist cannot be.
Consider anonymizing your logs from day 1, and only turning on non-anonymous logging upon a report from a user. Alternatively, give users a "report a problem" button, and save their last N minutes of otherwise-ephemeral logs only when they hit that button.
You absolutely want to log enough to help you debug the service, but do you really need to archive old logs, or should you delete them entirely?
[+] [-] porker|11 years ago|reply
+1 You can't log too much. The user who claims an important email never arrived - does your system say it was sent? This bug 3 users have reported yet no one can reproduce - what were they doing at the time and what else was going on?
No, I'm not at that stage yet (of effectively being able to rewind application state in the log files to see what was going on), but for debugging issues in production it's exceedingly useful.
[+] [-] landr0id|11 years ago|reply
How do most people manage activity logs? Currently what we have set up is the user id (if the user is logged in), IP address, URL they hit, user agent, and timestamp are all inserted into an activity logs table. For one particular site with an API that's being polled the size of the DB grew pretty large.
[+] [-] ShinyCyril|11 years ago|reply
I often found myself falling into the "I'm not using PHP, so I don't have to worry about any security holes" trap. CSRF is something you really need to watch out for if you are constructing forms manually!
[+] [-] pbreit|11 years ago|reply
The technical stuff will be pretty much trivial. Any decently constructed app on any decent framework (Rails, etc) on any decent host (AWS, DO) would be able to handle a 10k user app (probably maxing out at 1% online at same time) without breaking a sweat.
And you will have plenty of time to build out the tech because it will probably take you many months to get to even a few thousand users (depending on what kind of app it is, of course).
[+] [-] sixpenrose16|11 years ago|reply
[+] [-] ybrs|11 years ago|reply
[+] [-] danmanstx|11 years ago|reply
[+] [-] ing33k|11 years ago|reply
[+] [-] bw00d|11 years ago|reply
[+] [-] Gurkenmaster|11 years ago|reply
Ironicially it's a completly different story on the consumer market. I bought my two 4GB sticks about 2 years ago and now they cost twice as much. 4GB can cost you $40 which is not cheap at all.
[+] [-] rwhitman|11 years ago|reply
10K user records is not the issue. It's dealing with the humans who use the app on a day to day basis.
Typically getting only a small fraction of your user base to be active in the app is pretty challenging - if you can acquire them in the first place.
That said, having even a few hundred active users can tip the scales in terms of what is manageable, depending on what the app does and whether they're paying money or not. Customer support can be a full-time job or worse. In the early days your users will discover every bug and problem imaginable.
Biggest mistake I ever made was scaling up an active user base on a free product without a revenue model. Twice I managed to hit a sweet spot in acquiring active users but because I couldn't leverage the scale to achieve anything other than more work for myself, I burned out and it collapsed very quickly. If you make more money as you grow, you can afford to invest in delegating responsibilities or at least justify it. Otherwise you've got a very stressful hobby on your hands..
Quick add-on edit:
If you're launching a web app for the first time, the biggest takeaway you should get from the comments on this thread is anticipate that customer support will be a major challenge.
One of the best ways to prevent a flood of CS inquiries is aggressive logging and alerts to squash bugs or outages before they inconvenience too many users. Lots of great comments in here cover that point, so take notes.
[+] [-] SchizoDuckie|11 years ago|reply
I have zero costs though, since it's serverless and all hosted on Google and Github's infrastructure.
For privacy reasons I have made the deliberate choice to not include any log reporting tools or use any tracking services that could possibly make development a whole lot easier.
Thankfully I get some support from a couple of other people that mostly help with small dev tweaks and supporting users, finding out what goes wrong and providing excellent feedback.
I can really recommend setting up a subreddit so that people can help themselves and others, and you can provide some feedback where needed.
All in all the load of user support has been quite easily manageable, though now with Trakt.TV's debacle with the new API there is definite pressure to fix things.
[+] [-] nmcfarl|11 years ago|reply
However I’ll add that it’s my experience that free customers are the worst, by far for customer support, and that as such rwhitman probably got burned disproportionately. A percentage of free customers demand straight up magic, and get loud in public if you don’t deliver. People who are paying, particularly paying significant amounts of money, expect their money to be well spent and results in proportion with what they pay. IE they tend to be reasonable.
Still - if you where not expecting support to be a major part of this experience - you should now.
[+] [-] jsherer|11 years ago|reply
My service has a lot of moving parts, all of which are distributed among a couple dozen different servers. Keeping the technical infrastructure running smoothly requires a lot of data visualization of server stats, database stats, web request stats, worker stats, user stats, etc. I have everything piped into a nice dashboard so we can see if there is anything odd happening at a glance. When things break (and they will) you need to know where to look first.
Having 5k users also requires time to help them with support issues. Users generate a lot of bug reports, questions, and suggestions. To keep paying users happy, I offer a 1-day response time on support issues, which requires me to spend quite a bit of time sending emails.
Then, of course, if you want to grow the app, you need to spend time marketing it. We could talk for hours about this.
The list goes on and on. Feel free to shoot me an email (email in my profile) if you want to talk specifics about anything.
[1]: https://minimalreader.com/
[+] [-] martin_|11 years ago|reply
[1] http://stackexchange.com/performance#
[+] [-] inchforward|11 years ago|reply
[+] [-] ShinyCyril|11 years ago|reply
[+] [-] syllogism|11 years ago|reply
[+] [-] bw00d|11 years ago|reply
[+] [-] soheil|11 years ago|reply
[+] [-] bw00d|11 years ago|reply
[+] [-] troels|11 years ago|reply
How's the distribution of traffic? Do people use it spread out over the month or mainly within the last or first days of the month? Do they use it on work days or throughout the week? Are they from different time zones?
What do they do? Is there a lot of write activity or is it mainly read? Is the read stuff cacheable between users or is it highly individualised. etc. etc. etc.
[+] [-] akurilin|11 years ago|reply
With reasonably "low level" tooling such as Java/Clojure/Haskell/whatever and a properly configured Postgres instance you should be able to go quite far. You're very unlikely to be CPU-challenged in the web app (again, no idea what your web app is going to be doing, so it's just a guess), most of the memory and CPU will be consumed by your database server caching and running queries. You should be able to handle a good 500-1000 db transactions / sec without much hassle.
IMHO most of the challenge will be making something that 10k people will want to use daily, not actually being able to scale to that many users.
[+] [-] filearts|11 years ago|reply
That server runs happily as a single servo on http://modulus.io with absolutely no need for intervention on my part.
The rest of the application has similar requirements. I have one micro-equivalent server running the front-end, one the api and one the thumbnail generation. In general, this requires no hand-holding by me.
If your site is not processing or memory-intensive it should be feasible to scale to 10k users with a single $5/month instance on DigitalOcean or an equivalent level server on Heroku or Modulus or GCE.
Good luck attracting your first 5k users!
1: https://github.com/filearts/plunker_run/)
[+] [-] thinkcomp|11 years ago|reply
The main stress on the system is really determined by the complexity of the SQL queries on each page. I've spent a great deal of time optimizing them, and I know there are certain ones that need to be further optimized. I have the database (MySQL) on one server, the web server and documents on another, and static resources such as images on a third, which probably isn't even necessary. All three servers run Linux and the database server has 48GB of RAM. They're hardly new; you could buy all of this equipment today for under $1,000 total.
The biggest technical bottleneck is really RAM; the biggest expense for this kind of site is bandwidth.
[+] [-] kemayo|11 years ago|reply
I have http://ficwad.com/ sitting around, with Google Analytics telling me it gets daily users in the upper end of that range. It runs on the cheapest plan webfaction offers (and I'm making it even cheaper with some affiliate credit...). The only place where it's running into issues is email, which I had to write a little queue system to throttle the sending to keep it under the plan's daily limits while still making sure that the important messages go out first.
I could make it fancier and put it on pricier hosting if I bothered to monetize it in any way.
[+] [-] phacops|11 years ago|reply
And it took me nearly 4 years to get that many users. We can’t all grow like facebook!
[+] [-] wahsd|11 years ago|reply
[+] [-] animex|11 years ago|reply
[+] [-] mrbig4545|11 years ago|reply
The second problem is motivation, after a certain amount of time, it becomes far less fun and much more of a burden, at which point you have to decide if you'll power through, give up, or quit totally.
The rest is just a software/hardware problem, and easily dealt with when needed.
As for the load, it's not that busy, but not that quiet for what it is, (http://stats.thisaintnews.com) and it runs on a cheap server from http://www.kimsufi.com/uk/, has a Xeon(R) CPU E3-1225 V2 @ 3.20GHz, 16Gb ram and 2x 1tb hdd, unlimited bandwidth and 1gbps link. It only costs about £25/month iirc.
[+] [-] david_shaw|11 years ago|reply
- A reliable hosting environment. I currently have a Linode VPS (basic $20 package, with $5 monthly backups) that runs http://sleepyti.me, my personal web site, an IRC server, a Mumble server, and a bunch of other stuff -- it's not even close to being maxed out resource wise, even with all the constant traffic the site is getting. It's important to remember that consistent network connectivity is a really important aspect here: a 30-minute downtime during peak hours can easily lose a lot of users. I'd say Linode is great, and I'm very happy with their service, but I also host several Sinatra web applications on a Digital Ocean VPS that only costs $5 per month (although I do my own backups, rather than using their service). I've noticed zero load-related performance impacts. Clearly, though, there is a limit to how far that can scale.
- A production web server. This probably goes without saying, but a lot of webapp developers are used to just working on their own dev environment. For my apps, I use nginx (and thin, when necessary).
- Security. Make sure that you have the basics of application security covered in your app itself. OWASP produces some pretty great "cheat sheets" that can help out in this area. Furthermore, make sure that your server is updated frequently, using SSL correctly, etc. I work in information security -- please believe me when I say that getting hacked is not something you want to deal with when you're trying to grow.
Hope this helps, and good luck with launched your apps!
[+] [-] bw00d|11 years ago|reply
[+] [-] jakejake|11 years ago|reply
Firstly, the load of a web app is going to be dictated by what the app actually does.
Also 5k-10k users should be clarified as to whether you mean total users or concurrent users. Testing capacity can be actually tricky figuring out how the number of users equates to actual hits to your servers.
As an example, we have nearly 50k accounts but on average only a few hundred are using the service at the exact same time. I would guess that our app is fairly complex compared to the average app. We run 3 app servers, 1 DB master, 1 DB slave, and 2 cache servers. Our monthly hosting bill is around $1,200.
[+] [-] zamolsys|11 years ago|reply
2. Do not invest time and/or money in learning another programming language or framework until you are sure that for a specific component of your product, programming language X will perform at least 2 times better with 2 times less HW resources.
3. Stressing again on the app stack (I saw some really pushy comments on changing the programming language), it is rarely the bottleneck of a web app. You'll scale your storage stack way earlier and more often than the app stack.
4. Know your data. That's how you decide if it's better to use a RDBMS, document store, k/v store, graph database etc. Like I said before, you're going to scale your data before any other layer becomes a problem so choosing the right data storage solution is crucial. Don't be afraid to test various storage solutions. They usually have good -> great documentation and ruby tends to be a good friend to every technology. There's a gem for everything. :)
5. Scale proportionate to your business/product growth. You will have to scale at some point. But be careful to scale proportionate to your growth. For example, if the number of users will double, get the hardware that suffices that growth. Less HW resources will lead to a slower user experience thus user dissatisfaction. More HW resources than needed will increase your costs and the resources that are not needed will stay unused. Why waste money?!
These are my 2c. As your business gets bigger - I hope it does - other problems will occur. But usually these things will last up to 100k users.
Disclaimer: this is for a generic web app as you didn't give us any details. Depending on the app, some of my points might be inaccurate or invalid.
[+] [-] kentf|11 years ago|reply
Heroku: $279 - 400 / month
[+] [-] bw00d|11 years ago|reply
[+] [-] patio11|11 years ago|reply
If you back out the numbers, they go something like this: eight hour work day, worst hour has 25% of the user base actively logged in (we'll assume it is a very sticky app), 10 significant actions per hour implies 25k or so HTTP requests which actually hit Rails, which is less than 8 requests a second. You can, trivially, serve that off of a VPS with ~2 GB of RAM and still have enough capacity to tolerate spikes/growth.
Let's talk about the more interesting aspects of this question, which aren't mostly about capacity planning:
Monitoring: Depending on what you're doing, at some point between 0 users and 10k users, the app failing for long periods of time starts to seriously ruin peoples' days. Principally, yours. Depending on what you're doing, "long periods" can be anything from "hours" in the general case to "tens of minutes" for reasonably mission-critical B2B SaaS used in an office to "seconds" for something which could e.g. disable a customer's website if it is down (e.g. malfunctioning analytics software).
I run a business where 15 seconds of downtime means a suite of automated and semi-automated systems go into red alert mode and my phone starts blowing up. I don't do this because I love getting woken up at 4 AM in the morning, but because I hate checking my inbox at 9:30 AM in the morning and realizing that I've severely inconvenienced several hundred people.
You're going to want to build/borrow/buy sufficient reliability for whatever problem domain it is you're addressing. I wouldn't advise doing anything which requires Google-level ops skills for your first rodeo. (There is a lot to be said for making one's first business something like a WordPress plugin or ebook or whatnot where your site being down doesn't inconvenience existing customers. That way, unexpected technical issues or a SSL certificate expiring or hosting problems or what have you only cost you a fraction of a day's sales. Early on that is likely negligible. When an outage can both cost you new sales/signups and also be an emergency for 100% of your existing customer base, you have to seriously up your game with regards to reliability.)
Customer support:
Again, depending on exactly what you're doing, you will fail well in advance of your server failing on the road from 0 to 10k users. Immature apps tend to have worse support burdens than mature apps, for all the obvious reasons, and us geeks often make choices which pessimize for the ease of doing customer support.
My first business produced a tolerable rate of support requests, particularly as I got better about eliminating the things which were causing them, but I eventually burned out on it. I have a pretty good idea of what my second one would look like if it had 10k customers -- that would imply on the order of 500 tickets a day, 100+ of them requiring 20 minutes or more of remediation time. This would not be sustainable as a solo founder. (Then again, if that business had 10k customers, revenues would presumably be in the tens of millions, so I'd have some options at that point. There are many businesses which would not be able to support a dedicated CS team on only 10k customers, like e.g. many apps businesses, so you'd have to spend substantial brainsweat on making sure the per-customer support burden matched your unit economics.)
The biggest issue: selling 10,000 accounts of a SaaS app is really freaking hard.
[+] [-] pjbrunet|11 years ago|reply
[+] [-] darkarmani|11 years ago|reply
What do you mean by this? Doesn't this depend on the usage patterns? Do you have 5-10k concurrent users or are these users spread over a day?
The operations side is a whole other profession that dovetails into the technical aspects of getting it running on a good architecture.
[+] [-] arihant|11 years ago|reply
As others have mentioned, multiple of those users can be hosted on an EC2 small instance. I suggest you start there. When moving to production, a bigger challenge is security, both in terms of intrusion and data protection. Making sure you have good rollback feature built into your rollout regime, because things can be fatal with real users. If you're using something that's basic like Heroku or EC2, you can scale way beyond that user strength with a click of a button. Scaling up would be least of your worries, at least for a few weeks.
If you're unsure, go with Heroku. Once you understand your system use, you can very easily switch to AWS and reduce costs.