
Launch HN: Lumona (YC W24) – Product search based on Reddit and YouTube reviews

148 points| philena | 1 year ago

Hey HN! We are Lumona (https://lumona.ai), a product search engine that recommends products based on what people on social media—Reddit and YouTube, for now—are saying about them.

Rather than going through SEO-filled Google results or adding site:reddit.com to your search, we explain what makes a good product, show you the best products, and back it up with Reddit and YouTube reviews about the product. We’re starting with skincare products (more on that below) and plan to expand from there.

Here’s a demo: https://www.youtube.com/watch?v=C4kKjW2YkZ4&lc=Ugzl94GP9SDBO...

We started off with skincare because, growing up, we struggled with acne but had no clue what skincare products could actually help us. Going down the rabbit hole of endlessly scrolling r/SkincareAddiction and watching countless hours of videos about cystic acne was not fun.

Lumona’s skincare search index was built by first scraping the internet for listings of skincare products, along with their ingredient lists, through a combination of SERP, Amazon’s API, and web page crawling. We then use a fine-tuned Mistral LLM to parse a large number of Reddit threads and YouTube transcripts and extract the opinions users express, along with the context in which each opinion was made. These opinions are then matched with relevant products by another fine-tuned LLM, which looks at an opinion and any products with a high cosine similarity to the opinion’s subject and decides whether the opinion is relevant to any of those products. Using a Mistral-7B FT trained on GPT-4 outputs allowed us to parse through hundreds of thousands of Reddit threads in a simple way with just hundreds of dollars of compute.
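A minimal sketch of the candidate-retrieval step described above, under stated assumptions: the vectors are hand-written toy embeddings, and `candidate_products`, the `threshold`, and `top_k` are hypothetical names and values chosen for illustration, not the actual Lumona pipeline. It only shows how cosine similarity can shortlist products before a second model judges relevance.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)

def candidate_products(opinion_vec, product_vecs, threshold=0.8, top_k=3):
    """Shortlist products whose embedding is close to the opinion's subject.

    The shortlist would then be handed to a relevance model (not shown).
    """
    scored = [
        (name, cosine_similarity(opinion_vec, vec))
        for name, vec in product_vecs.items()
    ]
    scored = [(name, score) for name, score in scored if score >= threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy 3-dimensional "embeddings" (assumed values, for illustration only).
products = {
    "centella gel": [0.9, 0.1, 0.0],
    "foaming cleanser": [0.1, 0.9, 0.1],
    "mineral sunscreen": [0.0, 0.1, 0.9],
}
opinion_subject = [0.8, 0.2, 0.0]

candidates = candidate_products(opinion_subject, products)
print(candidates)
```

With these toy vectors only the first product clears the threshold, so the relevance model would see a single candidate instead of the whole catalog.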

If your query relates to a specific situation (e.g. “cleansers for my son who has inflamed acne on his forehead”), we search semantically through the opinions of Redditors and YouTubers to retrieve the products recommended by those who have dealt with a similar situation. If your query relates to a specific product (e.g. “iunik centella gel”), we instead go through the product listings themselves to return the relevant products.
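One rough way to picture the two query paths, as a sketch only: the post does not say how a query is classified, so the word-overlap rule, the `route_query` helper, and the two-item catalog below are all assumptions made for illustration.

```python
def route_query(query, product_names):
    """Return ("product_lookup", name) or ("opinion_search", None).

    Assumed heuristic: a query counts as a product query when every word
    in it appears in the name of some product listing.
    """
    query_tokens = set(query.lower().split())
    for name in product_names:
        name_tokens = set(name.lower().split())
        if query_tokens and query_tokens <= name_tokens:
            return ("product_lookup", name)
    # Anything else is treated as a situation and sent to opinion search.
    return ("opinion_search", None)

# Toy catalog (illustrative entries, not real index data).
catalog = ["iUNIK Centella Calming Gel Cream", "Example Gentle Foaming Cleanser"]

print(route_query("iunik centella gel", catalog))
print(route_query("cleansers for my son who has inflamed acne on his forehead", catalog))
```

A production system would more plausibly use a classifier or an LLM for this decision; the sketch just makes the branch explicit.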

We also use an LLM to analyze your search query to tell you what ingredients or effects are preferable for your skin concern. For example, if you searched for “inflamed forehead acne”, properties like “Oil-Control” and “Azelaic Acid”, which are good for dealing with inflamed acne, would be explained to you, and results containing those properties would be boosted and tagged in our results. You can also try out searches like “korean cleansers under $20 with Cica” to filter for certain ingredients and price points.
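To make the filter-and-boost idea concrete, here is a small sketch. It swaps the LLM query analysis for a plain regex and a tiny ingredient vocabulary, so `parse_filters`, `rank`, `KNOWN_INGREDIENTS`, the `boost` weight, and the product records are all hypothetical stand-ins and not how Lumona actually does it.

```python
import re

# Hypothetical ingredient vocabulary; a real system would have a much larger one.
KNOWN_INGREDIENTS = {"cica", "azelaic acid", "niacinamide"}

def parse_filters(query):
    """Pull a price ceiling and known ingredients out of a free-text query."""
    q = query.lower()
    price_match = re.search(r"under \$?(\d+(?:\.\d+)?)", q)
    max_price = float(price_match.group(1)) if price_match else None
    ingredients = sorted(i for i in KNOWN_INGREDIENTS if i in q)
    return {"max_price": max_price, "ingredients": ingredients}

def rank(products, filters, boost=0.5):
    """Drop products over the price ceiling, then boost and tag ingredient matches."""
    results = []
    for product in products:
        if filters["max_price"] is not None and product["price"] > filters["max_price"]:
            continue
        matched = [i for i in filters["ingredients"] if i in product["ingredients"]]
        score = product["base_score"] + boost * len(matched)
        results.append({**product, "score": score, "tags": matched})
    return sorted(results, key=lambda r: r["score"], reverse=True)

# Toy product records (made-up names, prices, and scores).
products = [
    {"name": "Cleanser A", "price": 15.0, "ingredients": {"cica"}, "base_score": 0.6},
    {"name": "Cleanser B", "price": 12.0, "ingredients": set(), "base_score": 0.8},
    {"name": "Cleanser C", "price": 30.0, "ingredients": {"cica"}, "base_score": 0.9},
]

filters = parse_filters("korean cleansers under $20 with Cica")
ranked = rank(products, filters)
print(filters)
print([(r["name"], r["score"], r["tags"]) for r in ranked])
```

Here price acts as a hard filter while ingredient matches only raise the score, so a well-reviewed product without the requested ingredient can still appear lower in the list.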

While we think we’ve built a product search that would be pretty helpful for our teenage (and current!) selves, there are many improvements we’d like to make, such as getting opinions from TikTok and other social media platforms and making our opinion extraction process more robust for edge cases (e.g. by using OCR and video transcription tools). We’re also planning on allowing our users to upload their own reviews and content and to expand our search across more products.

The long-term potential is to be a go-to product for anyone looking for what other people think about anything subjective (products, restaurants, b2b products, vacation planning, etc.). We believe that the entire discovery experience can be revolutionized by making it as easy as searching on Google to find out what the people you care about think about something. On the individual level, we want to make sharing your opinions with your friends and the world as easy as posting a picture on Instagram.

For now, if you have any skincare needs, whether it be to solve a skin concern, get rid of an annoying pimple, or just to find a good sunscreen, please give us a try: https://lumona.ai (We are an Amazon and Stylevana affiliate.)

We’d love to hear your feedback on our search engine, whether that be how the skincare search performs, what you think is missing, what products you want to see there, or any technical suggestions!

128 comments

huevosabio|1 year ago

This is so cool. I already do this in a very ad-hoc way. Will definitely try it!

My only concern is that once Reddit reviews get used at scale for product discovery, we will see an inflow of fake and paid reviews in the comments. This will further pollute Reddit and probably drive discussions to forums closed from the public eye, e.g. Discord.

Obviously, this is not your fault at all, it's just the market dynamics at hand.

Anyway, let me try it!

A_D_E_P_T|1 year ago

> My only concern is that once Reddit reviews get used at scale for product discovery, we will see an inflow of fake and paid reviews in the comments

This is already happening.

A lot of product-related posts on Reddit are made by marketing agencies, PR firms, SEO consultants, etc. There's also a thriving secondary market for "high karma" Reddit accounts, which are bought and sold with ease. Unlike old-fashioned forums, which were difficult for outsiders to crack, Reddit is easy to game and basically it's already the most astroturfed place on the internet. Making it the basis of a product search system can only make it worse.

dawalker|1 year ago

Thanks, we really appreciate it! This is something we've been thinking about too. One of the things that we've noticed is that video reviews, which are harder to fake (for now), have a lot more effect on us than almost all text reviews. We're thinking that letting people upload their own video reviews will help solve this problem as long as we can detect video deepfakes, but that's definitely not a complete solution (like you said though, not sure anything is).

edmundsauto|1 year ago

A centralized curator could actually help by drawing a "shill" graph and excluding those signals.

dns_snek|1 year ago

Cool idea, but I don't see how it can ever possibly work with the amount of astroturfing and frequency/ubiquity of undeclared paid advertising.

You're ingesting highly biased, sponsored, astroturfed content. What measures have you taken to filter YouTube reviews down to the ones that haven't been sponsored, and likewise for Reddit? Otherwise it's just garbage in, garbage out but wrapped in fancy, legitimate-looking packaging.

tootie|1 year ago

It's the eternal September problem. Reddit was a great source of honest reviews until everyone figured it out and started to take advantage. Services trying to capitalize on it may be well-intentioned but it's only going to accelerate the enshittification.

sovnwnt|1 year ago

Strange that you chose acne as your demo topic, but none of your results mention one of the most powerful treatments, if not the most powerful: Tretinoin/Retinol, which comes up in the first search results on Google.

Problem is that some of the best skincare is not available over the counter, and surfacing prescription treatments dips into medical care, which is a whole other can of worms.

In the end, you are missing valuable treatments but presenting a summary of poorly researched (by Reddit users) or anecdotal information.

I love the concept though and would love to see it catch on!

qiongzhouh|1 year ago

You're right that there are some very effective prescription treatments that aren't shown, but it doesn't seem like prescription acne treatments are usually the appropriate / doctor-prescribed choice for most people facing mild to moderate acne.

Personally, my pediatrician told me that acne is just something that happens to teens and recommended that I go try some acne washes from the drugstore instead of prescribing something like Tretinoin which could have some pretty intense side effects.

Reading r/SkincareAddiction has been really helpful for me, especially seeing the range of experiences that people have had, and that's why we made Lumona summarize these results.

QAComet|1 year ago

This is a neat product, and I plan on trying out some of the recommendations for sunscreen.

During my journey using the app there were a few things I noticed:

1) It seems like the intermediate page is generating text from the LLM as well, which makes the whole process quite slow on my machine. It took maybe 10 seconds before the loader finished displaying the text. If I try and perform the same query again on the same browser, the results are somewhat quicker, maybe 700-800ms of wait time, but this still seems too slow. Once I ran the query five or so times, it was as quick as the demo queries on the front page.

2) Consistent results: If I use the same query on separate browsers, I'm given different products as the "Top Recommended Product", which seems odd. I know LLMs are stochastic, but the feed starting with the "Top Recommended Product" probably shouldn't have stochasticity. This problem opens up some interesting ML cans of worms, but I believe these issues could be overcome.

3) Another issue was that if I wanted to scroll in the left column while the right column was still loading, the scrolling was very janky. This was on Firefox, and it took quite a long time for the app to be functional (> 10s).

4) Perhaps you could move the search bar and the logo to the top, so the logo is on the top left corner and the search bar takes space to the right of it. This way there aren't overlapping elements; I'm sure there are some annoying edge cases there which would frustrate users.

5) For negative ingredients (and maybe any of the ingredients) it would be nice if you kept track of an ingredient database with references. I want to know why some ingredient is bad for my skin, and what I could expect.

6) If a product has many distributors, my first thought was that the arrow scrolling through products was a slider for the distributor list. I wonder if there's a nice way to differentiate the arrow further, so its functionality is more apparent.

Anyway, this is an excellent proof of concept, I'm excited to see how this product develops.

qiongzhouh|1 year ago

Thanks for trying it out!

As for the performance issues, we're looking into several things that could speed things up:

- Fine-tuning a small LLM for the results on the intermediate page and deploying it on a provider with higher throughput and a faster time to first token

- Admittedly, there are quite a few SQL query / index optimizations we need to make on the backend, along with making parts of our pipeline async

- The frontend itself is also not very performant right now, but we're working on it.

We cache previous calls to the API, so that's why the demo queries or queries others have tried before you are faster. I'll ship a change that makes the results more consistent but not fully consistent later today.
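A minimal sketch of that caching behavior, assuming an in-memory dictionary keyed by a normalized query string. `QueryCache`, `slow_search`, and the normalization rule are hypothetical and not the actual backend; the point is only that a repeated query skips the expensive call and returns the same stored result.

```python
class QueryCache:
    """Memoize an expensive search function by normalized query text."""

    def __init__(self, compute):
        self._compute = compute  # the slow function, e.g. an LLM-backed search
        self._store = {}
        self.misses = 0

    @staticmethod
    def _key(query):
        # Collapse case and whitespace so trivially different queries share an entry.
        return " ".join(query.lower().split())

    def get(self, query):
        key = self._key(query)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._compute(key)
        return self._store[key]

calls = []

def slow_search(normalized_query):
    """Stand-in for the real pipeline; records each time it is actually invoked."""
    calls.append(normalized_query)
    return f"results for {normalized_query}"

cache = QueryCache(slow_search)
first = cache.get("Best  Sunscreen")
second = cache.get("best sunscreen")
print(first == second, cache.misses, calls)
```

Because the stored result is reused, a cache like this also makes repeated queries return identical answers, which is one simple way to reduce run-to-run variation.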

As for the ingredients, citing sources is definitely a next step. In the meantime, I recommend looking up the ingredients that catch your eye on a place like EWG Skin Deep if it's a huge concern for you (I used to do this to make sure my ingredients weren't comedogenic for acne).

Great point about the distributor list UI, we'll think about a better way to show it!

ilrwbwrkhv|1 year ago

This is great. You can then start seeding products which give you a high cut and then proclaim them as the "best". Basically what Wired and all do now but without the whole article bit and you can claim "knowledge of the public".

philena|1 year ago

that's an interesting idea -- we have been seeing this play out successfully as well (like you mentioned, Wired + sponsored youtube videos + etc). though that would be useful for profitability, we're afraid that may compromise our reputability as being the knowledge of the public. instead we're looking for ways where, when we expand to more opinions and reviews, we can robustly filter out those that seem disingenuous / are sponsored. curious as to what you think about this "filtering" out + if you have any ideas of going about this :)

gumptionary|1 year ago

I understand why you went for a product search engine (gotta monetize) but I think one of the reasons mining reddit for intel is so helpful is you aren't always being sold a product.

For example: I recently turned to reddit because I was looking for a foam roller to resolve some IT band issues from running, and ended up finding a stretching routine that has fixed my problem without buying anything.

Either way, I think this is really cool and bypassing the nonsense that google is becoming is a winning path.

zztop44|1 year ago

I know this is wildly off topic, but can you please share any info about your stretching routine? I have persistent IT band issues from running…

philena|1 year ago

thank you! i completely agree -- i often go to reddit when looking for tv show recommendations because of its honest advice from the community (maybe it's because of its anonymity?)

we definitely want to expand this to outside product search and be more of a general recommendation/opinion search (e.g. in your case, finding out what people are saying about how to fix IT band issues from running), interested as to what you would think about this :)

1oooqooq|1 year ago

you'd be surprised but reddit is mostly shills.

besides your anecdote, all the games reviews, computer part reviews, etc are all paid by drop shippers. and some reddits like mattress review ones are exclusively shills talking among themselves.

criddell|1 year ago

It’s a neat idea, but I think using the affiliate model will ultimately be corrupting.

Maybe once you get larger you can pivot to being a paid service like Consumer Reports. To me, they still feel more trustworthy than other services similar to yours (like Wirecutter).

frankdenbow|1 year ago

Nice work! I kind of do this with google and reddit already sometimes, as a well written explanation for why someone likes a particular item plus the upvotes do help me make decisions. The format looks pretty good, would just like to have a view of all the products at once in a comparison if possible.

The concept of a search that is multi layered is something I see The Browser Company and others doing to make your one search a bit more impactful, so kudos for going in that direction as well. I would do restaurants and search availability as well.

More thoughts: https://www.youtube.com/watch?v=xKFDuZsdXrc

dawalker|1 year ago

Thanks! We just watched your review together in the living room, and we really appreciate your thoughts+detailed feedback. The list of items is an interesting idea that we'll think about how to fit into the ux. Comparisons are definitely something we want to add later down the line as well.

The idea of restaurants, like you mentioned, would be really great to have. It's not an immediate priority, but once we get TikTok/short form videos on the site and integrate it well, it'd be really exciting to make and use.

ji_zai|1 year ago

> Using a Mistral-7B FT trained on GPT-4 outputs allowed us to parse through hundreds of thousands of Reddit threads in a simple way with just hundreds of dollars of compute.

Great idea. These sorts of clever approaches are needed to build the sorts of products that benefit from scale. When the cost of inference goes down, it enables new experiences. And finding clever ways to reduce cost before the big providers do is a massive competitive advantage that makes it tough for those who wait to compete with you.

Anyone building AI products should take note.

qiongzhouh|1 year ago

The missing part of the story is when we made an early prototype using GPT-4, left it on overnight, and realized that we'd spent several thousand dollars of OpenAI credits...

hypercube33|1 year ago

How do you deal with bot posts to push products on either platform skewing the reviews?

qiongzhouh|1 year ago

For now, we're excluding Reddit posts that are clearly automated and making sure the YouTube content is not sponsored, which you are required to disclose by the YouTube ToS.

We'll have to dig deeper into how to filter out spammy reviews. I can imagine analyzing a user's post history or detecting if content was clearly GPT written, but it's hard to really tell. I know there are things like Amazon review analyzers out there, but we'll have to learn more about this. I wonder if the people of HN have any suggestions on this front.

There'll probably be a lot of AI-generated reels that look like they're from real people online soon too. I wonder what platforms like TikTok and YouTube will do about this. If this ends up being a huge problem, we can probably try to use ML methods to check if the video was filmed in the real world.

hubraumhugo|1 year ago

We've tried to build this in the past with Looria.com, where we aggregated and summarized reviews from the most trusted sources, e.g. Reddit: https://www.looria.com/reddit

Couple of challenges:

- Astroturfing is everywhere

- The data sources, especially social media, become more protective with their data

- Monetizing this is super hard. As an aggregator, you're always just the intermediary. The glory times of ads and affiliate marketing are over.

Vetted.ai is working on something similar and they raised $14M in 2022. For all consumers, I really hope one of you will succeed!

dawalker|1 year ago

Thanks! Super interesting how many different approaches there are to this problem. Definitely encountered these challenges, and we think there are solutions to them that we have to build towards. I'll drop a message sometime, would love to chat :)

nextworddev|1 year ago

Curious - what is your bearish case for profitability of affiliate marketing?

shaoner|1 year ago

Love the idea, my only concern is how to trust that at some point you're not going to include sponsored products?

qiongzhouh|1 year ago

As a user of our product, I'd really hate it if we were recommending crappy products. I suspect users will also feel the same and this thought will hold us accountable.

Ishan-002|1 year ago

This is so cool! The way the search engine has been built up also seems very smart. I'm honestly surprised too at the same time, that this kind of idea hasn't been worked on before (couldn't find anything similar; I could be very wrong)

I'm not sure if this type of problem is even a considerable one, but how does the search engine handle reviews from subreddits which are focussed only on a particular brand, and may potentially form a bias around such products? Does the LLM's awareness of each review's context handle that?

qiongzhouh|1 year ago

That's a really good point. I think in our current iteration of the system, if we applied it to subreddits focused specifically on a particular brand, it would not be able to account for the bias there, even if it knows which subreddit the content is from. That's probably too much to ask of the LLM.

We'll have to think about good ways to handle this. Curious about your or others' thoughts on these subreddits: how do you process content on them differently?

avsavani|1 year ago

Results don't finish loading for me. I will try again in a few hours; I am really curious to see how it compares to generalized search engines like Perplexity and You.com

philena|1 year ago

sorry about the loading issue -- we'll look into that right now!

kristopolous|1 year ago

Skincare is an interesting beachhead. How do you test if your results are good? What's the baseline?

I feel like something like movies or video games would be a great way to validate the approach since there's generally agreed upon sentiments regarding these products.

Skin care I'd imagine is fairly complicated. Goal, lifestyle, budget, habit and individual based needs and preferences can lead to different sentiments. How do you calculate say, your loss function?

qiongzhouh|1 year ago

I think that's what makes skincare interesting. We want our system to be able to understand your goal, lifestyle, budget, ... and pick out which product is the best for you, given what others who have used the product before said.

With less of this information, the ground truth would probably be related to how popular the product is or the average sentiment of people reviewing the product. With this information though, you can compare each one to see which best fits the user's needs. Having compared enough products, you'll eventually figure out which one is the best.

calin_balea|1 year ago

Interesting idea! Well done. The interaction design on the site is a bit weird IMO. I can swipe up or sideways through results. I’m not sure what’s the difference in information architecture. Also there’s no indication of how many results there are. You could display the results as a stack of cards and show a counter for the number of results. I’m happy to help you with the UI design if you’d like some help.

johnfn|1 year ago

How does this differ from https://www.looria.com/?

dawalker|1 year ago

A couple of ways from my understanding. We have different focuses in our UX and UI as we, for example, feature reviews directly next to the product and show products 1 at a time instead of a listing view. We also place more emphasis on having a semantic search where you learn about the products being offered and how they're relevant to your specific situation instead of a keyword based search. From a business standpoint, we're also affiliates of Amazon and Stylevana while Looria isn't.

compootr|1 year ago

If I had to guess, I'd say the top words on Reddit would be "actually" or "because" and probably 69

qiongzhouh|1 year ago

Word ranks out of all r/SkincareAddiction posts and comments from 2023:

and: #1, skin: #23, acne: #55, because: #97, actually: #263
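For anyone who wants to reproduce this kind of ranking on their own dump, here is a rough sketch using `collections.Counter`. The tokenizer regex, the `word_ranks` helper, and the two sample strings are assumptions for illustration; the numbers above came from the real 2023 data, not from this snippet.

```python
import re
from collections import Counter

def word_ranks(texts):
    """Map each word to its frequency rank (1 = most common) across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep runs of letters/apostrophes as words.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    ordered = [word for word, _ in counts.most_common()]
    return {word: rank for rank, word in enumerate(ordered, start=1)}

# Tiny made-up sample standing in for posts and comments.
sample = [
    "skin and acne and skin",
    "and because skin and",
]

ranks = word_ranks(sample)
print(ranks["and"], ranks["skin"])
```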

NotYourLawyer|1 year ago

Reddit and YouTube are so astroturfed that I have trouble believing there’s much signal in the marketing noise.

ravroid|1 year ago

Cool concept. Not relevant to me in its current state being limited to skin care products, but would love to use something like this for things like supplements or other products where I otherwise have to sift through Amazon reviews & reddit threads.

dawalker|1 year ago

Thanks and makes sense. Supplements+general health and beauty will probably be one of the first things that get added outside of skincare. Would be interested in seeing the reviews as well for those considering how supplements are sold+regulated.

CSMastermind|1 year ago

I sat on the before we begin page for a long time waiting for something to happen before I realized nothing would:

https://imgur.com/a/cvT1iF8

dawalker|1 year ago

Sorry that happened :( What were you searching for? We'll look into it.

mkchoi212|1 year ago

Are you paying for Reddit's API or did y'all find a way around it?

qiongzhouh|1 year ago

We are not paying for Reddit's API to get our data. There are some really good, complete, and publicly available dumps of Reddit data online. We are in contact with the folks at Reddit, which is of course a YC company, so they're aware of what we're doing.

p10_user|1 year ago

they're either paying or it was a gift from sama

shiredude95|1 year ago

How does this service deal with a coordinated advertising campaign -- most likely also driven by LLMs -- over a period of say X months? Moderators on subs can be bought out or marginalized, while YouTube reviews can also be bought out. In other words, how is an aggregated source a better and more trustworthy source of information than a single blogger who people can ascribe some amount of trustworthiness to over a period of time?

dawalker|1 year ago

Great question. This would be a bigger issue if we were only aggregating results and summarizing them, but because we both aggregate and show (in our opinion) the highest credibility reviews from YouTubers (and other sources like blogs once we add them), our idea is that while the general mass opinion can be shifted through campaigns like that, the top end of the spectrum should hopefully still remain pure.

If on the other hand the top end of the spectrum is corrupted, then hopefully the masses can compensate for that. If both are corrupted and all of the data sources available are, then it really comes down to our ability to filter out LLM or promoted content which comes down to how well they can hide it. AI detection tools have been scaling alongside models, so it's also a question if that will continue over time. We'll think of some more advanced things if that becomes a bigger issue for us :)

At the end of the day, if a company can do a coordinated advertising campaign across the internet over months to block out any negative opinion, it's a big deal for both us and the social media/data sources we pull from, and that's going to be a challenge we have to deal with.

epoch_100|1 year ago

Very cool! It reminds me of https://chord.pub/.

qiongzhouh|1 year ago

Wow, thanks for sharing this. I find it interesting that they chose to make it something that I have to wait 1-2 minutes for before I get my AI generated article.

Seems to do a good job for various types of research, will give it a try next time I'm curious about something and need it researched

pj_mukh|1 year ago

Yo, can I take a picture (of my skin) and you can suggest some solutions? Multi modal plz!

dawalker|1 year ago

For sure! We'll work on that in the next couple of updates, it's been on our minds for a while.

potatoman22|1 year ago

Doesn't fine tuning models on GPT-4 output violate OpenAI's terms of service?

qiongzhouh|1 year ago

OpenAI says in their terms that you can't "use output from the Services to develop models that compete with OpenAI" [0], and it seems that people are interpreting it as training a model that directly competes with them, which we aren't doing. There are many companies out there built on using GPT-4 outputs to do task-specific fine-tuning, so it doesn't seem like it's a problem unless we were trying to make a competing foundational model from GPT outputs.

(I'm not a lawyer, so this needs to be taken with a large grain of salt)

0: https://openai.com/policies/terms-of-use

dns_snek|1 year ago

It seems like everyone is doing it. Does anyone care? Should anyone care?

kaiomagalhaes|1 year ago

this idea is awesome, I hope you get into software products

philena|1 year ago

thank you! we've been wanting this as well while building this out haha, will do :)

zwaps|1 year ago

What I find interesting here is how far a well working QA application with LLMs (such as this one) is away from anything that can be generalized to other topics.

That's probably where we are right now: I have seen quite a few purpose built and tuned AI systems for one specific use case or topic which work really well. By contrast, I have yet to see any general AI bot that does this with arbitrary data for any reasonable definition of good.

I mean, take any of these Chat-with-data bots, load up a huge document and ask it for information that is spread on many pages (like make a list of prices for every product in a catalogue). Then see it fail.

Exciting times.

qiongzhouh|1 year ago

Definitely feel this way too. Sometimes I think to myself that it'd be really great to have an LLM give me a well researched report on, say, recent trends in undisclosed marketing online. I wish we supported that on Lumona and realize we'll have to do it eventually, but it's pretty tough with the current infrastructure.

digitcatphd|1 year ago

Really cool use of open source models

moneywoes|1 year ago

how do you index reddit cost effectively without breaking their tos

qiongzhouh|1 year ago

We're working with dumps of Reddit data, which means we don't have to use their API or do any scraping on Reddit itself for now. The data is updated monthly though, so we'll have to figure out how to get higher quality data for things that are more time sensitive. We're in contact with the folks at Reddit, so we'll try to see if there are ways to get better data later on.

franze|1 year ago

searched for best yoga mat, got strange video about sunscreen...

qiongzhouh|1 year ago

Sorry about that. We didn't make it clear initially that Lumona only has skincare products for now; the message about skincare was probably not clear enough in our post. We'll be working to scale it beyond these products soon.

theGnuMe|1 year ago

Putting dermatologists out of business… lol.

ada1981|1 year ago

Just noting that best skin care routine is:

- No alcohol or caffeine

- Lots of water

- Vegan diet

- Using baking soda

- Adequate sleep and time in nature

qiongzhouh|1 year ago

True, I think I would agree with most of this sentiment. Unfortunately I'm not doing many of these things and am still using my skincare products.

Perhaps we should be surfacing opinions like this beyond just products.