Launch HN: Depict.ai (YC S20) – Product recommendations for any e-commerce store
Today, most recommender systems are based on a class of methods commonly called ‘collaborative filtering’, which generate recommendations from a user's past behavior. These methods are used successfully by Amazon and Netflix (see the https://en.wikipedia.org/wiki/Netflix_Prize). They are used far less successfully by smaller companies, which lack the critical mass of historical behavioral data those models need to work well. The result is the cold start problem (https://en.wikipedia.org/wiki/Cold_start_(recommender_system...) and a worse customer experience. We solve this by focusing not on understanding the customer but on understanding the product.
The way we do this is with machine learning techniques that create vector representations of products from the products’ images and descriptions, and recommend matching products using these vector representations. More specifically, we have found a way to scrape the web and then train massive neural networks on e-commerce products. This makes it possible to leverage large amounts of product metadata to make truly impressive recommendations for any e-commerce store.
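To make the idea concrete, here is a minimal sketch of content-based recommendation: embed each product from its text metadata, then recommend nearest neighbours by cosine similarity. This is an illustration, not Depict's actual model; the catalogue is invented and TF-IDF stands in for the trained neural embeddings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy catalogue: product id -> text metadata (all made up).
catalog = {
    "p1": "red leather running shoe, lightweight sole",
    "p2": "blue leather running shoe, cushioned sole",
    "p3": "cast iron frying pan, 28 cm",
}
ids = list(catalog)
# TF-IDF vectors stand in for learned image/description embeddings.
vectors = TfidfVectorizer().fit_transform(catalog.values())

def recommend(product_id, k=2):
    """Return the k most similar other products by cosine similarity."""
    i = ids.index(product_id)
    sims = cosine_similarity(vectors[i], vectors).ravel()
    ranked = sorted(range(len(ids)), key=lambda j: -sims[j])
    return [ids[j] for j in ranked if j != i][:k]
```

Note that nothing here uses user behavior: two shoes land next to each other purely because their metadata overlaps, which is what sidesteps the cold start problem.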
One analogy we like: just as almost no single company has enough sales or behavioral data to consistently predict, for instance, credit card fraud on its own, almost no e-commerce company has enough data to generate good recommendations from its own information alone. Stripe can build excellent fraud detection models by pooling transactions from many smaller companies, and we can do the same for personalizing e-commerce stores by pooling product metadata.
Through A/B-tests we have proven that we can increase top-line revenue by 4-6% for almost any e-commerce store. To prove our value, we offer the tests and setup 100% free. We make money by taking a cut of the revenue uplift we generate in the A/B-tests. We have also found that the sales and decision cycle gets much shorter when we're independent of the customer's user data. You can see us live at Staples Nordics and kitchentime.com, among others.
Oliver and I have several years of experience applying recommender systems within e-commerce and education respectively, and we felt uneasy about a winner-takes-all dynamic in which the largest companies could use their data supremacy to out-personalize any smaller company. Our goal is to build a company that can offer the best personalization to any e-commerce store, not just the ones with enough data.
Do you think our approach seems interesting, crazy, lazy or somewhere in the middle? We’d love any feedback - please feel free to shoot us comments below or DM, we’ll be here to answer your thoughts and gather feedback!
[+] [-] riddlemethat|5 years ago|reply
The product recommendation angle for eCommerce is a better angle, but it only works well for big companies where you have enough data at the onset to drive better recommendations. With smaller companies and lesser-known products, you must make probabilistic determinations based on image analysis and context structure, which will be mostly guesswork until you have real data, as you've surmised with your A/B testing.
Anyway, it seems you've already got some major clients under your belt and a proven track record. Hope you are able to succeed in your quest to make better recommendations work for small business with the data fabric you created.
Happy to chat through my experiences if you have interest. hn (at) strapr (dot) com is my email.
[+] [-] serendipityrecs|5 years ago|reply
A couple of questions:
- How well do your recommendations hold up against Amazon's? Since you're scraping the metadata, you should be able to generate recs for Amazon items from their own catalogue. This might be an interesting product / demo for potential customers.
- Once you hook up your system to your customer's back end, how do you learn from the behavioral data you get from them? That's straightforward for cf/mf, but can be tricky to integrate into what you already have.
- You talk about Stripe pooling the data from their customers. I think the analogue for you would be pooling the behavioral data from your customers as opposed to the metadata. Have you thought about this?
- It sounds like you're doing nearest neighbors on the vector representation. You may already know this, but LSH is a fast way to do this when you have many items.
- Do you embed all items from your different customers into the same vector space? That would be ideal from the POV of creating a pooled dataset that would be helpful for all future customers, but sounds tricky given that everyone likely has their own idiosyncratic system.
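To illustrate the LSH point above, a toy random-hyperplane sketch: the sign pattern of a vector against a few random hyperplanes becomes its bucket key, so exact search only happens inside one small bucket. All data here is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_planes = 64, 8
# Random hyperplanes; nearby vectors tend to fall on the same sides.
planes = rng.normal(size=(n_planes, dim))

def lsh_key(v):
    # 8-bit bucket key: which side of each hyperplane v lies on.
    return tuple((planes @ v > 0).astype(int))

# Hash 10k synthetic item embeddings into buckets once, up front.
items = rng.normal(size=(10_000, dim))
buckets = {}
for i, v in enumerate(items):
    buckets.setdefault(lsh_key(v), []).append(i)

# At query time, do exact cosine search only within the query's bucket.
query = items[0]
candidates = buckets[lsh_key(query)]
best = max(candidates, key=lambda i: items[i] @ query / np.linalg.norm(items[i]))
```

With 8 planes there are at most 256 buckets, so each exact search touches roughly 1/256th of the catalogue; in practice you'd use multiple hash tables to recover the neighbours that land just across a hyperplane.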
Best of luck! Lmk if you'd like to talk shop sometime, I also have several years of experience with recommender systems (my email is in my profile).
[+] [-] antonoo|5 years ago|reply
On the first question regarding comparing with Amazon: That’s a great point. We can actually personalize those kinds of demos for each specific customer, since the marginal cost of scraping yet another store is pretty low given the infrastructure we have put in place. See an example here: https://demo.depict.ai/madstyleshop
[+] [-] sanj|5 years ago|reply
What was interesting was that the naive algorithm got better over time and the incremental benefit of our new code got smaller.
Why?
Because the training data for the naive algo included user behavior from the new one. As we created better recommendations, users clicked on them and that fed into the old algo!
Coming to your product: what is to prevent a customer from using it for a few weeks, copying down the results, and then using those recommendations forever?
They’ll get most of the benefit for very small cost.
[+] [-] an_opabinia|5 years ago|reply
If I'm an online store with 100 products, couldn't I just punch the products into Amazon on a fresh account, then copy the search results? 100 products would maybe take me 20 minutes to do a day, but if you're saying there's a 4-6% lift, seems like it's worth it?
If it was 1,000 products, maybe I do this once a week for 200 minutes? Etc. etc.
Here's what'll happen: Your online store won't have most of the products on Amazon's recommended list. Isn't that the problem?
So no matter what, don't I eventually have to scale to Amazon size to get the value out of collaborative filtering?
Maybe no small business has that real supply chain. They are just front-running other stuff. But hey, that's their prerogative - to try to be Amazon without doing the stuff that actually makes Amazon successful.
> Netflix Prize
They don't even use those methods anymore. And that competition was much more about how to do IT and ensemble methods than any one particular approach, since that's how you get to #1.
Netflix Prize is sort of the opposite narrative of what you're actually doing. If you're seeking something that normal people recognize, just stick to talking about Amazon.
> Do you think our approach seems interesting, crazy, lazy or somewhere in the middle?
At the least, the premise doesn't quite add up.
Considering the data gathering, it seems easier to do user-product collaborative filtering.
Considering the math, it seems easier to do user-product collaborative filtering. You can bootstrap the weights for, e.g., a non-negative matrix factorization collaborative filter from existing recommendations.
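The matrix-factorization recipe mentioned here can be sketched with scikit-learn's NMF on a toy interaction matrix (all counts invented; this is the classic CF baseline, not anyone's production system):

```python
import numpy as np
from sklearn.decomposition import NMF

# Rows = users, columns = products; 1 = purchased/clicked, 0 = no signal.
R = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# Factor R ~ W @ H with non-negative rank-2 factors.
model = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
W = model.fit_transform(R)   # user factors
H = model.components_        # item factors
scores = W @ H               # reconstructed user-item affinities

# For user 0, item 2 (shared taste with user 1) should outscore item 3.
unseen = [j for j in range(4) if R[0, j] == 0]
best = max(unseen, key=lambda j: scores[0, j])
```

This also shows the cold start problem directly: with an all-zero row (a new user) or an all-zero column (a new product), the factorization has nothing to latch onto.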
Is there going to be something important encoded in the image or metadata that you can relate to other things? It seems easiest to just use the keywords. You don't need a picture of guacamole to know it goes with tortilla chips; it's in the keywords.
Then again, the whole point is to find serendipitous stuff in your existing user data. If you only offer 100 products, none of them will serendipitously end up in shopping carts together, because that's so few products. The catalogue is already curated to such a degree that collaborative filtering won't find anything you don't already know.
[+] [-] shoguning|5 years ago|reply
This assumes that the source and target listings definitely have accurate keywords.
There would be plenty of value in a model that can use either keywords OR images to effectively make recs.
> Considering the data gathering, it seems easier to do user-product collaborative filtering.
What user data are you talking about here? The customer doesn't necessarily have user data.
I personally think this approach makes a lot of sense. If it works, a one-size-fits all recommender would be really useful and easy to sell.
[+] [-] mlthoughts2018|5 years ago|reply
ML teams at hosting platforms like Wix or Shopify or Squarespace could offer the same as a built-in or slightly higher-tier premium feature, charging a tiny fixed cost instead of a share of revenue uplift.
This could even be basically an intern or new-grad project at tech companies like that; the technology for the model is very simple. The devil would be in the details of integrating with the data model backing those platforms' e-commerce shop products, but you could solve it once and then immediately offer it to all your customers, and out of the box for new ones.
The part of your idea that makes me skeptical is the scalability of applying your recommendation approach to bespoke customers. I'm sure you can do it, but with nowhere near the same reach, efficiency, or price point as well-capitalized major store hosting platforms.
[+] [-] antonoo|5 years ago|reply
It is about integrating both with the website and with the catalog of existing products, and this is currently easier with us than with any other provider, since we have built tooling to make it very efficient.
[+] [-] zkid18|5 years ago|reply
Correct me if I'm wrong, but AFAIU you have designed a black-box content-based recommender system for the e-commerce domain by scraping publicly available data. I love your business model, though I have a couple of questions:
1. A/B testing in RecSys is tricky to interpret. How do you choose the control and test groups? I would love to go beyond the revenue uplift percentage when evaluating the new model. Btw, do you have your own A/B testing environment?
2. Are you targeting one specific problem, like cold start or checkout recommendation or have a general solution?
3. Are you planning to open-source your model?
4. Do you have any Wordpress/Shopify plugins?
Anyway, I really like your idea and would love to contribute.
Let's stay in touch via twitter: @kidrulit.
[+] [-] antonoo|5 years ago|reply
1. In order for e-stores to trust the results, we tend to use Google Optimize for the A/B-tests (which randomly assigns 50% of users to see the store's previous recommendations).
2. Currently our main focus is recommendations on the product page, but we do aim to add our recommendations in other places too; the checkout and landing pages are clear candidates! Mostly a bandwidth question at this point.
3. We might open-source parts of it sometime in the future, though it's not something we're currently considering actively.
4. Not yet! We integrate by injecting a JS widget on their site, through which we also track the user behaviour.
Sounds great - let's follow up on Twitter
[+] [-] bartkappenburg|5 years ago|reply
How does your scraping hold up against the already pretty effective More Like This query in ES? That one is backed by years of research and gives very good results.
[0] https://www.conversify.com
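For concreteness, the More Like This baseline is a single query against an existing index; the index and field names below are placeholders, not anyone's real schema.

```python
# Elasticsearch "more_like_this" query body, expressed as a Python dict.
# It finds documents whose text fields statistically resemble a seed doc.
mlt_query = {
    "query": {
        "more_like_this": {
            # Hypothetical fields in a hypothetical "products" index.
            "fields": ["title", "description"],
            "like": [{"_index": "products", "_id": "p1"}],
            "min_term_freq": 1,
            "max_query_terms": 25,
        }
    }
}
```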
[+] [-] shoguning|5 years ago|reply
The search algorithm, as you point out, is pretty much commoditized at this point.
[+] [-] thegginthesky|5 years ago|reply
Using content-based recommendation is interesting, but it requires constant scraping for more data. Plus, the whole cost of curating the dataset and guaranteeing data quality can be extra challenging. How are you getting around these problems?
Also, your A/B test approach is interesting, but how would you do it for smaller shops? Wouldn't it take too long to give meaningful results? Or are you using a Bayesian test methodology?
[+] [-] antonoo|5 years ago|reply
Thanks! Yes, in order for shops to trust the A/B-test results, we tend to use Google Optimize, which uses Bayesian inference.
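For readers unfamiliar with the Bayesian readout: it can be approximated with a stdlib-only Beta-Binomial sketch like the one below. The counts are invented and this is not Google Optimize's exact model, just the standard textbook version.

```python
import random

random.seed(0)

# Invented A/B counts: control converts at 3.0%, variant at ~3.4%.
control = dict(visitors=5000, conversions=150)
variant = dict(visitors=5000, conversions=172)

def posterior_sample(conversions, visitors):
    # Beta(1 + successes, 1 + failures): posterior under a uniform prior.
    return random.betavariate(1 + conversions, 1 + visitors - conversions)

# Monte Carlo estimate of P(variant's true rate > control's true rate).
draws = 20_000
wins = sum(
    posterior_sample(**variant) > posterior_sample(**control)
    for _ in range(draws)
)
p_beat_control = wins / draws
```

The appeal for small shops is that this yields a direct "probability the variant is better" at any sample size, instead of a frequentist test that simply stays inconclusive until a fixed n is reached.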
[+] [-] ssharp|5 years ago|reply
I've seen lots of recommendation algorithms fail against curated recommendations. It would be really interesting to see where, and against what, this approach wins.
Would also be really curious how stores have reacted to the revenue model. Is that a one-time fee based on the A/B test results, or are you capturing a cut of the uplift in perpetuity?
[+] [-] swyx|5 years ago|reply
one nit - "we lift revenue by 4-6%" doesn't feel like a very impressive number (it may be within the bounds of normal noise for a smaller ecom site?). That said, I'm very much not an ecomm guy - is this a bigger deal than it initially reads?
I also feel like recommender systems work much better for Netflix (infinite consumption) than for e-commerce (where, if I already bought a shoe, I normally don't want another). Perhaps this tech is better applied to media than to ecomm?