
Ask HN: If you could talk to a database in plain English, would you?

7 points | brochington | 3 years ago

For the past 6 months I’ve been working on a Natural Language Understanding (NLU) API. Essentially the request would contain a plain English sentence, and the response would include a breakdown of actions, entities, agents, location, temporal, logic, etc. My hope was that I could create a “Stripe/Twilio for NLU”, but recent feedback has been that it’s more a “technology”, and less a “product”. It would still require a lot of development work to create anything of value for an end user. While I see the value of an API, I also agree with that sentiment, and so I’ve begun exploring problems to apply my API to.

One use case that tends to pop up frequently is “text-to-database”. Similar to text-to-SQL, but with my API I could target any DB regardless of query language. This would require a large amount of work, and I’m not convinced that it’s something that users even want. The strongest feedback I’ve received has been that it would be a convenient method for managers and non-technical staff to query analytics databases.

Is this a path worth exploring? Are there industries or positions that would kill to be able to query a db with a plain English sentence? Is this something that you would use, or want to implement?

26 comments


AnimalMuppet|3 years ago

No. Absolutely not. I want to say precisely what I mean and have the database do precisely what I say, no more, no less.

But maybe you're asking the wrong question in your headline. If you could have other people in your organization able to talk to a database in plain English, would you?

This isn't something that most of the HN crowd would want for their own work. There might be a lot of people here who have, say, that upper-level manager who keeps asking for reports for which the HN person has to figure out how to get the data. Handing that manager a tool like this, and letting them run their own queries could get them out of our hair. (It could also be better for the manager, as they run the query, look at the results, and figure out that it wasn't actually the data they were looking for, and so they can iterate the query to get what they're really after.)

One caveat, though: I wouldn't want to hand anyone - even a professional - write access with this kind of a tool.

brochington|3 years ago

Ah, yes, maybe I should clarify who my intended user would be. I agree that if you are technical, you are probably better off using other query methods. I'm always surprised by which folks are on here, so I was hoping a few would find this question.

> There might be a lot of people here who have, say, that upper-level manager who keeps asking for reports for which the HN person has to figure out how to get the data.

This is 100% the use case I was thinking of.

> One caveat, though: I wouldn't want to hand anyone - even a professional - write access with this kind of a tool.

Agree! Not sure how this ever would be safe enough for edits.

xigoi|3 years ago

I guess certain kinds of edits could be safe, such as inserting a row and then asking for confirmation.
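One way this could look, as a rough Python sketch using SQLite: the insert runs inside a transaction, the new row is shown to the user, and it is committed only if they approve. The `employees` table and the `confirm` callback are hypothetical stand-ins for whatever the real tool would use.

```python
import sqlite3


def insert_with_confirmation(conn, sql, params, confirm):
    """Run an INSERT, show the new row to confirm(), and commit only on approval."""
    cur = conn.execute(sql, params)  # sqlite3 opens an implicit transaction here
    # The preview assumes the hypothetical employees table used in this sketch.
    preview = conn.execute(
        "SELECT * FROM employees WHERE rowid = ?", (cur.lastrowid,)
    ).fetchone()
    if confirm(preview):
        conn.commit()
        return True
    conn.rollback()  # discard the row if the user says no
    return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.commit()

insert_sql = "INSERT INTO employees (name) VALUES (?)"
rejected = insert_with_confirmation(conn, insert_sql, ("Ada",), lambda row: False)
count_after_reject = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
accepted = insert_with_confirmation(conn, insert_sql, ("Ada",), lambda row: True)
count_after_accept = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
```

In a real tool the `confirm` callback would be a prompt that displays the row in plain English rather than a lambda.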

drittich|3 years ago

I remember first playing with tech like this in the early 1990s. Q&A v4 from Symantec supported NLP and I was quite surprised at how potentially useful it looked, although as a developer I preferred more control. After typing your query, the app would display its interpretation of your request in more formal English to confirm it understood you. When there was ambiguity, a few options were presented. You selected the correct one, and got your answers. It worked very well for queries like, "show me all employees hired after 2020-01-01 whose salary is greater than 80,000 sorted by salary descending".

Ultimately though, I think the usefulness of these tools breaks down for both complex queries and even simple ones when the data model does not have explicit relationships defined.
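For reference, here is one plausible SQL reading of the example query quoted above, run against a small SQLite table from Python. The `employees` table, its columns, and the sample rows are assumptions made for illustration; a real tool would have to map "hired" and "salary" onto whatever the actual schema calls them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; dates are stored as ISO strings so they sort correctly.
conn.execute("CREATE TABLE employees (name TEXT, hire_date TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ana", "2021-03-15", 95000),
        ("Ben", "2019-07-01", 120000),  # hired before the cutoff
        ("Cleo", "2022-01-10", 70000),  # salary below the threshold
        ("Dev", "2020-06-30", 85000),
    ],
)

# "hired after 2020-01-01", "salary greater than 80,000", "sorted by salary descending"
query = """
    SELECT name, hire_date, salary
    FROM employees
    WHERE hire_date > ? AND salary > ?
    ORDER BY salary DESC
"""
rows = conn.execute(query, ("2020-01-01", 80000)).fetchall()
```

This single-table case is the easy one. Once the answer needs a join, the tool has to know how the tables relate, which is where the point about explicit relationships comes in.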

brochington|3 years ago

1990s, huh, wow. I totally believe it. It almost feels like a solved problem, but when I tried to find an API that parsed text to a degree that was usable by an average dev, nothing came up. Many NLP/AI/ML tools could define the entities, or VERY general relationships, but never went far enough. I'm curious if the '90s solutions used ML, or if they went straight to text analysis (which is what I'm using).

> Ultimately though, I think the usefulness of these tools breaks down for both complex queries and even simple ones when the data model does not have explicit relationships defined.

Makes sense. Do you think it would help if a developer could define the relationships in the data model ahead of time?

TylerE|3 years ago

No.

Trying to map English to formal logic is a fool's errand.

From my experience, what keeps non-technical people from writing queries isn't SQL, it's stuff like joins.

drittich|3 years ago

>Trying to map English to formal logic is a fool's errand.

Perfect description of a developer's job.

"WANTED: One Fool to serve the court at the pleasure of the CEO. Compensation is one crust of bread and a flagon of ale per day."

brochington|3 years ago

> Trying to map English to formal logic is a fool's errand.

I don't disagree, but this would not be a 1-1 mapping, if that matters.

>From my experience, what keeps non-technical people from writing queries isn't SQL, it's stuff like joins.

I've met very few non-technical folks willing to brave learning any part of SQL. In my case joins would be an implementation detail, mostly handled by the product, but possibly with an escape hatch for technical folks.

edmundsauto|3 years ago

There is something in this space that I think would have value... maybe translation from English -> SQL, maybe suggest commonly used WHERE clause filters, etc.

At the end of the day, SQL is very expressive for most of these queries, but it's not particularly discoverable and does take some knowledge. Lowering that barrier to entry is a great idea, but otherwise I'm not sure if an analyst can be certain their query will give the same data as somebody who uses slightly different phrasing. SQL gives a lot more precision and I would hate to lose that due to a layer of abstraction.

But English -> SQL (with something like GitHub Copilot, built on other analysts' queries) would be very interesting, although not "get out my wallet and purchase" compelling.

brochington|3 years ago

English->SQL is very much a thing, and has a good amount of research going into it. It does have a few weaknesses though:

1. It is ML based, and the best results I have seen put it at about 90% accurate. This might be "good enough", but not perfect. Verification and error correction are needed.

2. Knowledge of the schema needs to be passed in as part of the feature, or the model needs to be trained explicitly on the target schema.

3. Going to a different DB requires a retraining of the model, due to slight differences in SQL dialects.

4. ML takes either a lot of time (speed) or money (GPUs). This is more a general ML problem, but does affect English -> SQL.

I am no expert in English -> SQL, or in ML in general, so somebody correct me if I'm wrong on the above points. These are just what I've seen or experienced in my research.

godshatter|3 years ago

The more detailed you have to get explaining things, the less I would want to use it. If I can say something like "get the employee's pay-related data" and not have to delineate exactly what fields to get and what to call them, that would be useful. If I could say "make sure they are not also a student" (I work for a university) and have it figure that out, that would also be useful. If I have to tell it what joins to make, I'd rather just type it in SQL or whatever its underlying language is. Typing is a much more casual activity for me than speaking is, and I can type for hours at a time, but I can't talk for that long without my throat running dry. If I can say in a few words what I want and save a lot of typing, then that's great. If I end up saying as much as I would type anyway, then I'd rather just type it.

I think this is definitely something that should be looked at, but it's not a product I'm really interested in unless it wows me with its intelligence. It has to start somewhere, though. I'm probably an outlier in the sense that I think a lot of people would rather talk to their computer than type on a keyboard. I'm just not one of them. I also don't want to work in cubicle hell with everyone speaking to their computer 24/7.

brochington|3 years ago

Ah, sorry, I should clarify that when I say "talk", I didn't mean it in the literal sense. I've only worked on text as a source, though speech-to-text shouldn't be too hard to integrate if it were needed.

I like your insight about using... let's call them "key phrases"... to help guide the query. Perhaps having a way to map a key phrase to a specific query, maybe one that is parameterized, would be useful. Thank you.
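A rough sketch of what that key-phrase mapping might look like in Python with SQLite. Everything here is hypothetical: the `KEY_PHRASES` table, the schema, and the `run_key_phrase` helper are illustrations of the idea, not how the API actually works.

```python
import sqlite3

# Hypothetical phrase-to-query table that a developer would curate ahead of time.
KEY_PHRASES = {
    "pay-related data": "SELECT name, salary, pay_grade FROM employees WHERE id = ?",
    "is a student": "SELECT COUNT(*) FROM students WHERE employee_id = ?",
}


def run_key_phrase(conn, text, params):
    """Run the stored parameterized query for the first key phrase found in text."""
    lowered = text.lower()
    for phrase, sql in KEY_PHRASES.items():
        if phrase in lowered:
            return conn.execute(sql, params).fetchall()
    raise LookupError(f"no key phrase recognized in: {text!r}")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER, pay_grade TEXT)"
)
conn.execute("CREATE TABLE students (employee_id INTEGER)")
conn.execute("INSERT INTO employees VALUES (1, 'Ana', 95000, 'P3')")

pay = run_key_phrase(conn, "Get the employee's pay-related data", (1,))
student = run_key_phrase(conn, "Check whether this person is a student", (1,))
```

Plain substring matching is only a placeholder; the interesting part is that the SQL behind each phrase is written once by someone who knows the schema, and the user supplies only the parameters.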

themodelplumber|3 years ago

This seems like it could have enough value to build a customer base around? Or for someone to want to purchase the rights to the tech so they could build around it?

Selling such a thing should not be a problem given the right target. Not only has the customer space for technologies like that changed over time, but you are providing a new twist on the solution.

> “Stripe/Twilio for NLU”, but recent feedback has been that it’s more a “technology”, and less a “product”

That comment doesn't make a ton of sense to me. Are services not valuable? Stripe and Twilio seem like really helpful services and that seems...OK to me?

Personally I get excited when I hear about an ease-of-use wrapper around regex. But for a DB, in place of that regular messy query stuff with the prospect of things like multiple LEFT JOINS? That's a big deal.

And even if it doesn't tick every box, I'd guess it would have its unique applications for a given set of customers.

Like, let's say, sets of people who would like to prototype to a good-enough level using their ability to sit around and talk in English all day long, and then hand off to someone else. The average person's energy pool for trying different sentences, even considering some expected failure rate, is so much deeper than the resources available for trying and failing with different SQL statements.

This would also apply to those who are not really working with the data to work with it. Let's say they are selling data-viz tools and want a quick way to make prototypes from the potential customer's sample data. There, boom, product example. I guess.

It sounds really cool. Good luck, hope it works out for you.

brochington|3 years ago

> That comment doesn't make a ton of sense to me. Are services not valuable? Stripe and Twilio seem like really helpful services and that seems...OK to me?

Services are totally valuable! I think with Stripe and Twilio they both solve a problem a Business/PM/Owner has. The conversations go something like this in my head:

PM: I want to be able to send SMS messages to my customers.

Dev: Uh, I don't know anything about telecom...

Twilio: I do! I'm way cheaper than a dev working this problem. Just use me.

For NLU, I'm not sure I've been able to find PMs that are wanting to "understand the plain English of our users". But I know there is a decent amount of NLP usage. Do PMs just not know that they can ask for these NLP capabilities? Or do they just not need them? I'm not sure. I feel a little bit like I'm missing a piece of the puzzle.

> Personally I get excited when I hear about an ease-of-use wrapper around regex. But for a DB, in place of that regular messy query stuff with the prospect of things like multiple LEFT JOINS? That's a big deal.

A regex wrapper is an interesting idea. Maybe I'll try it out. I agree that a text to db wrapper could be a good idea, if it works really well.

> The average person's energy pool for trying different sentences, even considering some expected failure rate, is so much deeper than the resources available for trying and failing with different SQL statements.

Great point!

> It sounds really cool. Good luck, hope it works out for you.

Thank you, me too :)

toast0|3 years ago

Having used various things that claim to be natural language, I find that to use them effectively, I end up needing to learn their particular structured language. These are often poorly documented and may have pretty weird/difficult edge cases (one often seen and easy to explain case is selecting lists with plural nouns and individual records with singular nouns... But many English nouns have the same spelling and pronunciation as plural and singular).

Regardless of the details of the language, if I'm going to learn a structured language anyway, I would usually prefer to learn the underlying language, and not an imperfect abstraction. Sometimes, there's good value in the abstraction, but I usually find they get in the way and make it harder to do what I want.

brochington|3 years ago

> Having used various things that claim to be natural language, I find that to use them effectively, I end up needing to learn their particular structured language.

This makes sense. Are there any specific examples of products you have used where you had to do this?

> ...selecting lists with plural nouns and individual records with singular nouns...

Yes, this is for sure a failure case. I believe I have some means to work around this, fwiw.

> if I'm going to learn a structured language anyway, I would usually prefer to learn the underlying language, and not an imperfect abstraction.

I feel like I get what you are saying here, but some more concrete examples would help. Is there a "structured language" you are referring to?

nitwit005|3 years ago

Have you tried gathering sample queries from real people? I previously worked at a company that tried this briefly. They gave up after getting the first sample of queries real users typed in.

brochington|3 years ago

Not yet specifically for this; it is an excellent point.

If I may, were there any particular phrases that you remember that were "too much"?

It would be great to talk more about your experience, if you are open to it. Just let me know and I can DM you.

bjourne|3 years ago

This is an area that has been researched for decades and there is a wealth of prior art. If you want to pursue this idea you should narrow it down a bit. For example, creating a usable natural language interface for databases for GIS data for land surveying would in itself be a massive project. It should also be said that "plain English" is far from "plain" even for native speakers. It's not just about parsing; it is also about making it usable.

wizwit999|3 years ago

Here's a related service that Amazon launched: https://aws.amazon.com/quicksight/q/ . Google Analytics has the same technology in their dashboard; I found it kind of cool, and it works well sometimes. It lets you ask questions about your data and get answers.