erikwitt's comments

erikwitt | 9 years ago | on: The AWS and MongoDB Infrastructure of Parse

That is absolutely right. You can easily write queries that can never be executed efficiently, even with great indexing. This is especially true in MongoDB, considering what people can do with the $where operator.
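As a hypothetical illustration of why $where defeats indexing: the predicate is arbitrary JavaScript evaluated against every candidate document, so no index can answer it and MongoDB has to scan the whole collection. A common workaround is precomputing an indexable field on every write (all field names below are invented for the example):

```javascript
// "Users whose balance exceeds their credit limit" -- the $where form runs
// arbitrary JavaScript per document, so no index can help:
const slowQuery = { $where: "this.balance > this.creditLimit" };

// Precomputing a flag on write turns the same question into an indexable
// equality match:
const fastQuery = { overLimit: true };

// A tiny in-memory stand-in for the collection scan $where forces:
function whereScan(docs, predicate) {
  // every document is visited, regardless of any index on the collection
  return docs.filter(predicate);
}

const docs = [
  { _id: 1, balance: 120, creditLimit: 100, overLimit: true },
  { _id: 2, balance: 50,  creditLimit: 100, overLimit: false },
];

whereScan(docs, d => d.balance > d.creditLimit); // visits all docs, matches _id 1
```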

What would in retrospect be your preferred approach to prevent users from executing inefficient queries?

We are currently investigating whether deep reinforcement learning is a good approach for detecting slow queries and making them more efficient by trying different combinations of indices.

erikwitt | 9 years ago | on: The AWS and MongoDB Infrastructure of Parse

I agree, the Parse shutdown was organized extremely well. The open-source Parse Server, a one-year migration window, and a ton of new vendors now offering to host your Parse app all made the shutdown much easier to handle. It's also great to see the community still working on the open-source server.

That said, there are a lot of upsides to having a company work full-time on your proprietary cloud solution and ensure its quality and availability. If an open-source project dies or becomes poorly maintained, you are in trouble too. Your team might not have the capacity to maintain such a complex project on top of their actual tasks.

Also, open-sourcing your platform is a big risk for a company. Take RethinkDB for example: a great database and an outstanding team, but without a working business model and, most recently, without a team working full-time, it is doomed to die eventually.

Nevertheless, we try to make migrating from and to Baqend as smooth as possible. You can import and export all your data and schemas, and your custom business logic is written in Node.js and can be executed anywhere. You can also download a community server edition (single-server setup) to host yourself.

Still, a lot of users require proprietary solutions and the maintenance and support that come with them. And they often have good reasons, from requiring a maintenance-free platform to warranty or licensing issues. After all, a lot of people are happy to lock into AWS even though solutions based on OpenStack, Eucalyptus, etc. are available.

erikwitt | 9 years ago | on: The AWS and MongoDB Infrastructure of Parse

Although MongoDB has its limits regarding consistency, there are things that we do differently from Parse to ensure consistency:

- The first thing is that we do not read from secondaries. Replicas are only used for fault tolerance, as is the default in MongoDB. This means you always get the newest object version from the server.

- Our default update operation compares object versions and rejects writes if the object was updated concurrently. This ensures consistency for single-object read-modify-write use cases. There is also an operation called "optimisticSave" that retries your update until no concurrent modification gets in the way. This approach is called optimistic concurrency control. With forced updates, however, you can override whatever version is in the database; in this case, the last writer wins.

- We also expose MongoDB's partial update operators to our clients (https://docs.mongodb.com/manual/reference/operator/update/). With these, one can increment counters, push items into arrays, add elements to sets, and let MongoDB handle concurrent updates. With these operations, we do not have to rely on optimistic retries.

- The last and most powerful tool we are currently working on is a mechanism for full ACID transactions on top of MongoDB. I've been working on this at Baqend for the last two years and also wrote my master's thesis on it. It works roughly like this:

   1. The client starts the transaction, reads objects from the server (or even from the cache using our Bloom filter strategy) and buffers all writes locally.

   2. On transaction commit, all read versions and updated objects are sent to the server to be validated.

   3. The server validates the transaction and ensures isolation using optimistic concurrency control. In essence, if there were concurrent updates, the transaction is aborted.

   4. Once the transaction is successfully validated, updates are persisted in MongoDB.

There is a lot more to the details of ensuring isolation, recovery, and scalability, and of making it all work with our caching infrastructure. The implementation is currently in the testing stage. If you are interested in the technical details, here is my master's thesis: https://vsis-www.informatik.uni-hamburg.de/getDoc.php/thesis...
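The four steps above can be sketched as a toy, in-memory commit validation. All names here are invented for the illustration, not Baqend's actual API: the client remembers the version of everything it read, buffers its writes, and the server aborts the commit if any read version has become stale.

```javascript
// Toy optimistic commit validation (invented names, in-memory only).
const store = new Map(); // id -> { version, value }  (stand-in for MongoDB)

// 1. inside the transaction the client reads objects and records their versions
function read(id) {
  const obj = store.get(id);
  return { id, version: obj.version, value: obj.value };
}

// 2.+3. at commit, the server checks that every version the client read is
// still current; any concurrent update means the transaction must abort
function commit(readSet, writeSet) {
  for (const { id, version } of readSet) {
    if (store.get(id).version !== version) {
      return { committed: false }; // concurrent update detected -> abort
    }
  }
  // 4. validation passed: persist the buffered writes and bump versions
  for (const { id, value } of writeSet) {
    const obj = store.get(id);
    store.set(id, { version: obj.version + 1, value });
  }
  return { committed: true };
}

// usage
store.set("acct", { version: 1, value: 100 });
const snapshot = read("acct");
const ok = commit([snapshot], [{ id: "acct", value: snapshot.value - 30 }]);
// ok.committed === true, "acct" is now { version: 2, value: 70 }

const stale = { id: "acct", version: 1 }; // read before the update above
const aborted = commit([stale], [{ id: "acct", value: 0 }]);
// aborted.committed === false -- the concurrent write was detected
```

The buffered-writes part is what makes the scheme cache-friendly: nothing touches the store until validation succeeds.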

erikwitt | 9 years ago | on: The AWS and MongoDB Infrastructure of Parse

That's actually exactly what Parse did. They used a slow query log to automatically create up to 5 indexes per collection. Unfortunately, this did not work that well, especially for larger apps.

I guess 5 indexes might be a little short for some apps. On the other hand, too many or too large indexes can become a bottleneck, too. In essence, you want to be quite careful when choosing indexes for large applications.

Also, some queries tend to get complicated, and choosing the best indexes to speed them up can be extremely difficult, especially if you want an algorithm to choose them automatically.
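As a deliberately simplified sketch of the idea (not Parse's actual algorithm): tally which fields the slow queries filter on and propose the most frequent ones as single-field index candidates, capped at a budget. Real index selection also has to weigh compound indexes, selectivity, and write amplification, which this toy ignores.

```javascript
// Toy index suggestion from a slow-query log (invented, illustrative only):
// count how often each field appears as a filter and keep the top candidates.
function suggestIndexes(slowQueries, maxIndexes = 5) {
  const counts = new Map();
  for (const query of slowQueries) {
    for (const field of Object.keys(query)) {
      counts.set(field, (counts.get(field) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1]) // most frequently filtered fields first
    .slice(0, maxIndexes)
    .map(([field]) => field);
}

// usage: three slow queries, "status" shows up in all of them
const log = [
  { status: "open", owner: "alice" },
  { status: "open" },
  { createdAt: { $gt: 0 }, status: "done" },
];
suggestIndexes(log, 2); // -> ["status", "owner"]
```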

erikwitt | 9 years ago | on: Building a Shop with Sub-Second Page Loads: Lessons Learned

You're right, NoSQL systems tend to be more complex, and their failure scenarios in particular are hard to comprehend. In most cases, however, this is because they are distributed datastores, where trade-offs, administration, and failure scenarios are simply much more complex. I think some NoSQL systems do an outstanding job of hiding the nasty details from their users.

If you compare using a distributed database to building sharding yourself for, say, a MySQL-backed architecture, NoSQL will most certainly be the better choice.

I'll admit, though, that dealing with NoSQL isn't easy when you come from a SQL background. Even finding the database that fits your needs is tough. We have a blog post dedicated to this challenge: https://medium.baqend.com/nosql-databases-a-survey-and-decis...

erikwitt | 9 years ago | on: Building a Shop with Sub-Second Page Loads: Lessons Learned

Considering scalability from the start does not just mean optimizing for millions of concurrent users, but choosing your software stack or your platform with scalability in mind. I get that it's important to take the next step and that premature optimization can stand in your way, but there are easy-to-use technologies (like NoSQL and caching) and (cloud) platforms with low overhead that let you scale with your customers and work whether you're big or small. This can be far superior to fixing throughput and performance from iteration to iteration.

erikwitt | 9 years ago | on: Building a Shop with Sub-Second Page Loads: Lessons Learned

You have a point there. I found JMeter really easy to use, however. I could simply let it monitor my browser (via a proxy) while I clicked through the website and the checkout process to record the requests of an average user. Then I configured the checkout process to only be executed in 20% of the cases to simulate the conversion rate. Even running the test distributed across 20 servers wasn't that hard.

Which tools would you use to generate this amount of traffic?
