Launch HN: Lumona (YC W24) – Product search based on Reddit and YouTube reviews
148 points| philena | 1 year ago
Rather than going through SEO-filled Google results or adding site:reddit.com to your search, we explain what makes a good product, show you the best products, and back it up with Reddit and YouTube reviews about the product. We’re starting with skincare products (more on that below) and plan to expand from there.
Here’s a demo: https://www.youtube.com/watch?v=C4kKjW2YkZ4&lc=Ugzl94GP9SDBO...
We started off with skincare because, growing up, we struggled with acne but had no clue what skincare products could actually help us. Going down the rabbit hole of endlessly scrolling r/SkincareAddiction and watching countless hours of videos about cystic acne was not fun.
Lumona’s skincare search index was built by first scraping the internet for listings of skincare products, along with their ingredient lists, through a combination of SERP, Amazon’s API, and web page crawling. We then use a fine-tuned Mistral LLM to parse a large number of Reddit threads and YouTube transcripts and extract the opinions expressed by users, along with the context in which those opinions were made. Each opinion is then matched with relevant products by another fine-tuned LLM: it looks at the opinion together with any products that have a high cosine similarity to the opinion’s subject and decides whether the opinion is relevant to any of them. Using a Mistral-7B fine-tune trained on GPT-4 outputs allowed us to parse hundreds of thousands of Reddit threads in a simple way with just hundreds of dollars of compute.
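To make the matching step concrete, here is a minimal sketch of the cosine-similarity shortlist that would feed the relevance model. It is an illustration only, not our production code: the toy 3-dimensional vectors, the `threshold` and `top_k` values, and the function names are all assumptions, and the final relevance decision by the fine-tuned LLM is left out.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def candidate_products(subject_vec, product_vecs, threshold=0.8, top_k=5):
    """Return up to top_k (name, score) pairs at or above threshold, best first.

    threshold and top_k are hypothetical defaults chosen for the example.
    """
    scored = [
        (name, cosine_similarity(subject_vec, vec))
        for name, vec in product_vecs.items()
    ]
    kept = [(name, score) for name, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Toy embeddings (made up); a real system would use an embedding model.
products = {
    "centella gel": [0.9, 0.1, 0.0],
    "foam cleanser": [0.1, 0.9, 0.2],
    "sunscreen": [0.0, 0.2, 0.9],
}
opinion_subject = [0.8, 0.2, 0.0]

# Only the shortlisted products would be shown to the relevance model.
shortlist = candidate_products(opinion_subject, products)
```

The point of the shortlist is cost: the second LLM only has to judge a handful of candidate products per opinion instead of the whole catalogue.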
If your query relates to a specific situation (e.g. “cleansers for my son who has inflamed acne on his forehead”), we search semantically through the opinions of Redditors and YouTubers to retrieve the products recommended by those who have dealt with a similar situation. If your query relates to a specific product (e.g. “iunik centella gel”), we instead go through the product listings themselves to return you the relevant products.
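The two paths above could be routed roughly as in the sketch below. The heuristic is an assumption made for illustration (a query counts as a product lookup when every one of its words appears in some known product name); the post does not say how the real system decides, and `route_query` and the sample product names are hypothetical.

```python
def route_query(query, product_names):
    """Pick a search path for a query.

    Returns ("product", matching_names) when every query token appears in at
    least one product name, otherwise ("opinion", []).
    This token-subset rule is a stand-in heuristic, not the real router.
    """
    q_tokens = set(query.lower().split())
    if not q_tokens:
        return ("opinion", [])
    matches = [
        name for name in product_names
        if q_tokens <= set(name.lower().split())
    ]
    if matches:
        return ("product", matches)
    return ("opinion", [])


# Hypothetical catalogue entries for the example.
catalogue = [
    "iUNIK Centella Calming Gel Cream",
    "Gentle Foaming Cleanser",
]

product_route = route_query("iunik centella gel", catalogue)
opinion_route = route_query(
    "cleansers for my son who has inflamed acne on his forehead", catalogue
)
```

A situation-style query falls through to the opinion path, where it would be searched semantically against extracted opinions rather than matched against listings.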
We also use an LLM to analyze your search query and tell you which ingredients or effects are preferable for your skin concern. For example, if you searched for “inflamed forehead acne”, properties like “Oil-Control” and “Azelaic Acid”, which are good for dealing with inflamed acne, would be explained to you, and results containing those properties would be boosted and tagged in our results. You can also try out searches like “korean cleansers under $20 with Cica” to filter for certain ingredients and price points.
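As a rough picture of the boosting and filtering described above, here is a small sketch. Everything in it is assumed for the example: the `Product` fields, the additive `boost` of 0.2 per matching property, and the sample scores and prices. It only shows the general shape of "filter on hard constraints, then boost and tag preferred properties".

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    name: str
    price: float
    base_score: float  # relevance score before boosting (made up here)
    properties: set = field(default_factory=set)


def rank(products, preferred, under_price=None, required=frozenset(), boost=0.2):
    """Filter by price and required ingredients, then boost preferred properties.

    Returns (name, score, tags) tuples, best first. tags lists the preferred
    properties the product actually has, so the UI can label them.
    """
    results = []
    for p in products:
        if under_price is not None and p.price >= under_price:
            continue  # "under $20" means strictly below the limit
        if not set(required) <= p.properties:
            continue  # missing a required ingredient such as Cica
        tags = sorted(p.properties & set(preferred))
        results.append((p.name, p.base_score + boost * len(tags), tags))
    # Sort by descending score, then by name for a stable order.
    results.sort(key=lambda r: (-r[1], r[0]))
    return results


catalogue = [
    Product("Gel Cleanser", 15.0, 0.5, {"Oil-Control", "Cica"}),
    Product("Cream Cleanser", 18.0, 0.6, {"Cica"}),
    Product("Luxury Cleanser", 35.0, 0.9, {"Oil-Control", "Azelaic Acid", "Cica"}),
]

ranked = rank(
    catalogue,
    preferred={"Oil-Control", "Azelaic Acid"},
    under_price=20,
    required={"Cica"},
)
```

In this toy run the most expensive cleanser is filtered out by price, and the gel cleanser overtakes the cream cleanser because it carries one of the preferred properties.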
While we think we’ve built a product search that would be pretty helpful for our teenage (and current!) selves, there are many improvements we’d like to make, such as getting opinions from TikTok and other social media platforms and making our opinion extraction process more robust for edge cases (e.g. by using OCR and video transcription tools). We’re also planning on allowing our users to upload their own reviews and content and to expand our search across more products.
The long-term potential is to be a go-to product for anyone looking for what other people think about anything subjective (products, restaurants, b2b products, vacation planning, etc.). We believe that the entire discovery experience can be revolutionized by making it as easy as searching on Google to find out what the people you care about think about something. On the individual level, we want to make sharing your opinions with your friends and the world as easy as posting a picture on Instagram.
For now, if you have any skincare needs, whether it be to solve a skin concern, get rid of an annoying pimple, or just to find a good sunscreen, please give us a try: https://lumona.ai (We are an Amazon and Stylevana affiliate.)
We’d love to hear your feedback on our search engine, whether that be how the skincare search performs, what you think is missing, what products you want to see there, or any technical suggestions!
huevosabio|1 year ago
My only concern is that once Reddit reviews get used at scale for product discovery, we will see an inflow of fake and paid reviews in the comments. This will further pollute Reddit and probably drive discussions to forums closed from the public eye, e.g. Discord.
Obviously, this is not your fault at all, it's just the market dynamics at hand.
Anyway, let me try it!
A_D_E_P_T|1 year ago
This is already happening.
A lot of product-related posts on Reddit are made by marketing agencies, PR firms, SEO consultants, etc. There's also a thriving secondary market for "high karma" Reddit accounts, which are bought and sold with ease. Unlike old-fashioned forums, which were difficult for outsiders to crack, Reddit is easy to game and basically it's already the most astroturfed place on the internet. Making it the basis of a product search system can only make it worse.
dawalker|1 year ago
edmundsauto|1 year ago
dns_snek|1 year ago
You're ingesting highly biased, sponsored, astroturfed content. What measures have you taken to filter Youtube reviews down to the ones that haven't been sponsored, and likewise for Reddit? Otherwise it's just garbage in, garbage out but wrapped in fancy, legitimate-looking packaging.
tootie|1 year ago
_giorgio_|1 year ago
[deleted]
sovnwnt|1 year ago
Problem is that some of the best skincare is not available over the counter, and surfacing prescription treatments dips into medical care, which is a whole other can of worms.
In the end, you are missing valuable treatments but presenting a summary of poorly researched (by Reddit users) or anecdotal information.
I love the concept though and would love to see it catch on!
qiongzhouh|1 year ago
Personally, my pediatrician told me that acne is just something that happens to teens and recommended that I go try some acne washes from the drugstore instead of prescribing something like Tretinoin which could have some pretty intense side effects.
Reading r/SkincareAddiction has been really helpful for me, especially seeing the range of experiences that people have had, and that's why we made Lumona summarize these results.
QAComet|1 year ago
During my journey using the app there were a few things I noticed
1) It seems like the intermediate page is generating text from the LLM as well, which makes the whole process quite slow on my machine. It took maybe 10 seconds before the loader finished displaying the text. If I try and perform the same query again on the same browser, the results are somewhat quicker, maybe 700-800ms of wait time, but this still seems too slow. Once I ran the query five or so times, it was as quick as the demo queries on the front page.
2) Consistent results: If I use the same query on separate browsers, I'm given different products as the "Top Recommended Product", which seems odd. I know LLMs are stochastic, but the feed starting with the "Top Recommended Product" probably shouldn't have stochasticity. This problem opens up some interesting ML cans of worms, but I believe these issues could be overcome.
3) Another issue was that if I wanted to scroll in the left column while the right column was still loading, the scrolling was very janky. This was on Firefox, and it took quite a long time (> 10s) for the app to become functional.
4) Perhaps you could move the search bar and the logo to the top, so the logo is on the top left corner and the search bar takes space to the right of it. This way there aren't overlapping elements, I'm sure there's some annoying edge cases there which would frustrate users
5) For negative ingredients (and maybe any of the ingredients) it would be nice if you kept track of an ingredient database with references. I want to know why some ingredient is bad for my skin, and what I could expect.
6) If a product has many distributors, my first thought was that the arrow scrolling through products was a slider for the distributor list. I wonder if there's a nice way to differentiate the arrow further, so its functionality is more apparent.
Anyway, this is an excellent proof of concept, I'm excited to see how this product develops.
qiongzhouh|1 year ago
As for the performance issues, we're looking into several things that could speed things up:
- Fine-tuning a small LLM for the results on the intermediate page and deploying on a provider with higher throughput and a lower time to first token
- Admittedly, there are quite a few SQL query / index optimizations we need to make on the backend, along with making parts of our pipeline async
- The frontend itself is also not very performant right now, but we're working on it.
We cache previous calls to the API, so that's why the demo queries or queries others have tried before you are faster. I'll ship a change that makes the results more consistent but not fully consistent later today.
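For anyone curious what this kind of caching looks like, here is a minimal sketch. It is not our actual implementation: the `QueryCache` class, the in-memory dict, and the lowercase/whitespace normalization are assumptions for illustration, and a real deployment would also need eviction and expiry.

```python
class QueryCache:
    """Memoize search results keyed by a normalized query string."""

    def __init__(self, search_fn):
        self._search_fn = search_fn  # the slow call (LLM + database) to wrap
        self._store = {}
        self.misses = 0

    @staticmethod
    def normalize(query):
        # Lowercase and collapse whitespace so trivially different spellings
        # of the same query share one cache entry.
        return " ".join(query.lower().split())

    def search(self, query):
        key = self.normalize(query)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._search_fn(key)
        return self._store[key]


# Stand-in for the expensive backend call; records how often it runs.
backend_calls = []


def slow_search(query):
    backend_calls.append(query)
    return [f"result for {query}"]


cache = QueryCache(slow_search)
first = cache.search("Inflamed   Forehead Acne")
second = cache.search("inflamed forehead acne")
```

A side effect of serving from the cache is that repeated queries return identical results regardless of browser, which is one possible way to tame the run-to-run variation of LLM output for queries that have been seen before.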
As for the ingredients, citing sources is definitely a next step. In the meantime, I recommend looking up the ingredients that catch your eye on a place like EWG Skin Deep if it's a huge concern for you (I used to do this to make sure my ingredients weren't comedogenic for acne).
Great point about the distributor list UI, we'll think about a better way to show it!
ilrwbwrkhv|1 year ago
philena|1 year ago
gumptionary|1 year ago
For example: I recently turned to reddit because I was looking for a foam roller to resolve some IT band issues from running, and ended up finding a stretching routine that has fixed my problem without buying anything.
Either way, I think this is really cool and bypassing the nonsense that google is becoming is a winning path.
zztop44|1 year ago
philena|1 year ago
We definitely want to expand this beyond product search and make it more of a general recommendation/opinion search (e.g. in your case, finding out what people are saying about how to fix IT band issues from running). Interested to hear what you would think about this :)
1oooqooq|1 year ago
Besides your anecdote, all the game reviews, computer part reviews, etc. are paid for by drop shippers, and some subreddits, like the mattress review ones, are exclusively shills talking among themselves.
criddell|1 year ago
Maybe once you get larger you can pivot to being a paid service like Consumer Reports. To me, they still feel more trustworthy than other services similar to yours (like Wirecutter).
frankdenbow|1 year ago
The concept of a search that is multi layered is something I see The Browser Company and others doing to make your one search a bit more impactful, so kudos for going in that direction as well. I would do restaurants and search availability as well.
More thoughts: https://www.youtube.com/watch?v=xKFDuZsdXrc
dawalker|1 year ago
The idea of restaurants, like you mentioned, would be really great to have. It's not an immediate priority, but once we get Tiktok/short form videos on the site and integrate it well, it'd be really exciting to make and use.
ji_zai|1 year ago
Great idea. These sorts of clever approaches are needed to build the kind of products that benefit from scale. When the cost of inference goes down, it enables new experiences. And finding clever ways to reduce cost before the big providers do is a massive competitive advantage that makes it tough for those who wait to compete with you.
Anyone building AI products should take note.
qiongzhouh|1 year ago
hypercube33|1 year ago
qiongzhouh|1 year ago
We'll have to dig deeper into how to filter out spammy reviews. I can imagine analyzing a user's post history or detecting if content was clearly GPT-written, but it's hard to really tell. I know there are things like Amazon review analyzers out there, but we'll have to learn more about this. I wonder if the people of HN have any suggestions on this front.
There'll probably be a lot of AI-generated reels that look like they're from real people online soon too. I wonder what platforms like TikTok and YouTube will do about this. If this ends up being a huge issue, we can probably try to use ML methods to check if the video was filmed in the real world.
hubraumhugo|1 year ago
Couple of challenges:
- Astroturfing is everywhere
- The data sources, especially social media, become more protective with their data
- Monetizing this is super hard. As an aggregator, you're always just the intermediate. The glory times of ads and affiliate marketing are over.
Vetted.ai is working on something similar and they raised $14M in 2022. For all consumers, I really hope one of you will succeed!
dawalker|1 year ago
nextworddev|1 year ago
shaoner|1 year ago
qiongzhouh|1 year ago
Ishan-002|1 year ago
I'm not sure if this type of problem is even a considerable one, but how does the search engine handle reviews from subreddits which are focussed only on a particular brand, and may potentially form a bias around such products? Does the LLM's awareness of each review's context handle that?
qiongzhouh|1 year ago
We'll have to think about good ways to handle this. Curious about your or others' thoughts on these subreddits: how do you process content on them differently?
barbazoo|1 year ago
qiongzhouh|1 year ago
avsavani|1 year ago
philena|1 year ago
kristopolous|1 year ago
I feel like something like movies or video games would be a great way to validate the approach since there's generally agreed upon sentiments regarding these products.
Skin care, I'd imagine, is fairly complicated. Goals, lifestyle, budget, habits, and individual needs and preferences can lead to different sentiments. How do you calculate, say, your loss function?
qiongzhouh|1 year ago
With less of this information, the ground truth would probably be related to how popular the product is or the average sentiment of people reviewing the product. With this information, though, you can compare each product to see which best fits the user's needs. Having compared enough products, you'll eventually figure out which one is the best.
calin_balea|1 year ago
johnfn|1 year ago
dawalker|1 year ago
compootr|1 year ago
qiongzhouh|1 year ago
and: #1 skin: #23 acne: #55 because #97 actually #263
NotYourLawyer|1 year ago
ravroid|1 year ago
dawalker|1 year ago
CSMastermind|1 year ago
https://imgur.com/a/cvT1iF8
dawalker|1 year ago
mkchoi212|1 year ago
qiongzhouh|1 year ago
p10_user|1 year ago
kiranp|1 year ago
shiredude95|1 year ago
dawalker|1 year ago
If on the other hand the top end of the spectrum is corrupted, then hopefully the masses can compensate for that. If both are corrupted and all of the data sources available are, then it really comes down to our ability to filter out LLM or promoted content which comes down to how well they can hide it. AI detection tools have been scaling alongside models, so it's also a question if that will continue over time. We'll think of some more advanced things if that becomes a bigger issue for us :)
At the end of the day, if a company can run a coordinated advertising campaign across the internet over months to drown out any negative opinion, that's a big deal for both us and the social media/data sources we pull from, and it's a challenge we will have to deal with.
epoch_100|1 year ago
qiongzhouh|1 year ago
Seems to do a good job for various types of research. I'll give it a try next time I'm curious about something and need it researched.
pj_mukh|1 year ago
dawalker|1 year ago
potatoman22|1 year ago
qiongzhouh|1 year ago
(I'm not a lawyer, so this needs to be taken with a large grain of salt)
0: https://openai.com/policies/terms-of-use
dns_snek|1 year ago
kaiomagalhaes|1 year ago
philena|1 year ago
zwaps|1 year ago
That's probably where we are right now: I have seen quite a few purpose-built and tuned AI systems for one specific use case or topic which work really well. By contrast, I have yet to see any general AI bot that does this with arbitrary data for any reasonable definition of good.
I mean, take any of these chat-with-data bots, load up a huge document, and ask it for information that is spread across many pages (like making a list of prices for every product in a catalogue). Then see it fail.
Exciting times.
qiongzhouh|1 year ago
digitcatphd|1 year ago
moneywoes|1 year ago
qiongzhouh|1 year ago
franze|1 year ago
qiongzhouh|1 year ago
theGnuMe|1 year ago
ada1981|1 year ago
- No alcohol or caffeine
- Lots of water
- Vegan diet
- Using baking soda
- Adequate sleep and time in nature
qiongzhouh|1 year ago
Perhaps we should be surfacing opinions like this beyond just products.
Linda231|1 year ago
[deleted]