top | item 41151174

(no title)

Every now and then, I'll actually sort of believe an article like this. Then I go and test the current models on things like semantic search.

For instance -

The Hough transform detects patterns with certain structure in images, e.g. circles or lines.

So I'm looking for academic research papers which apply the Hough transform to audio spectra, to recognize the harmonic structure of tonal audio and thus determine the fundamental pitch. (i.e. the Hough space would be a 1D space over fundamental frequency).

I've spent probably 90 minutes, over the several times I've read an optimistic post like this, asking various LLMs (mostly GPT-4o, though my early tests predate GPT-4o, and I've also tried Gemini and Claude), prompts along the lines of

> The Hough transform detects patterns with certain structure in images, e.g. circles or lines. > I'm looking for academic research papers (please link them or provide a DOI.org link at least) which apply the Hough transform to audio spectra, to identify the harmonic structure of audio and thus determine the fundamental pitch. > Make sure to provide only papers that actually exist. If you can't find anything particularly relevant, say so as a disclaimer & just provide the most relevant papers you can.

This is a reliable "fake paper generator", unfortunately - it'll just make up plausible garbage like

> Here are some academic papers related to applying the Hough transform to audio spectra for identifying harmonic structures and fundamental pitch:

> "An Audio Pitch Detection Algorithm Based on the Hough Transform" > Authors: Mark W. and John D. > Published In: IEEE Transactions on Audio, Speech, and Language Processing > DOI: 10.1109/TASL.2008.2000773 > Abstract: This paper proposes an audio pitch detection algorithm that utilizes the Hough transform to analyze the harmonic structure of audio spectra and determine the fundamental pitch.

This paper does not exist. Complete waste of my time. And again, this behavior persists over the >1 year period I've been trying this query.

And it's not just search-like tasks. I've tried asking for code and gotten stuff that's outright dangerous (try asking for code to do safe overflow-checked addition on int64_t in C- you have about an 80% chance of getting code that triggers UB in one way or another). I've asked for floating-point calling conventions on RISC-V for 32-bit vs 64-bit (would have been faster than going through the extension docs), and been told that RV64 has 64 floating-point registers (hey, it's got a 64 in the name!). I've asked if Satya Nadella ever had COVID-19 and been told- after GPT-4o "searched the web"- that he got it in March of 2023.

As far as I can tell, LLMs might conceivably be useful when all of the following conditions are true:

1. You don't really need the output to be good or correct, and 2. You don't have confidentiality concerns (sending data off to a cloud service), and, 3. You don't, yourself, want to learn anything or get hands-on - you want it done for you, and 4. You don't need the output to be in "your voice" (this is mostly for prose writing, for code this doesn't really matter); you're okay with the "LLM dialect" (it's crucial to delve!), and 5. The concerns about environmental impact and the ethics of the training set aren't a blocker for you.

For me, pretty much everything I do professionally fails condition number 1 and 2, and anything I do for fun fails number 3. And so, despite a fair bit of effort on my part trying to make these tools work for me, they just haven't found a place in my toolset- before I even get to 4 or 5. Local LLMs, if you're able to get a beefy enough GPU to run them at usable speed, solve 2 but make 1 even worse...

discuss

fxj|1 year ago

Just out of curiosity: Have you tried perplexity? When I paste your prompt it gives me a list of

2 researchgate papers (Overlapping sound event recognition using local spectrogram features with the Generalised Hough Transform July 2013 Pattern Recognition Letters)

and one ieee publication (Generalized Hough Transform for Speech Pattern Classification, in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 11, pp. 1963-1972, Nov. 2015)

When I am looking for real web results chatgpt is not very good, but perplexity very often shines for me

and for python programming have a look at withpretzel.com which does the job for me.

just my 2 ct

dijksterhuis|1 year ago

> 1. You don't really need the output to be good or correct

> 2. You don't have confidentiality concerns (sending data off to a cloud service)

At $PREVIOUS_COMPANY LLMs were straight up blanket banned for these reasons too. Confidentiality related to both the code and data for the customers.

The possibility that "it might get some things right, some of the time" was nowhere near a good enough trade-off to override the confidentiality concerns.

And we definitely did not have staff/resources to do things local only.

SOLAR_FIELDS|1 year ago

I’ve found that it really matters a lot how good the LLM is on how large the corpus it is that exists for its training. The simple example is that it’s much better at Python than, say, Kotlin. Also, I also agree with sibling comment that in general the specific task of finding peer reviewed scientific papers it seems to be especially bad at for some reason.

rhdunn|1 year ago

I've been using the JetBrains AI model assisted autocomplete in their IDEs, including for Kotlin. It works well for repetitive tasks I would have copy/paste/edited before, and faster, so I have become more productive there.

I've not yet tried asking LLMs Kotlin-based questions, so don't know how good they are. I'm still exploring how to fit LLMs and other AI models into my workflow.

XMPPwocky|1 year ago

I see no sibling comment here even with showdead on, but I could buy that (there's a lot of papers and only so many parameters, after all- but you'd think GPT-4o's search stuff would help, maybe a little better prompting could get it to at least validate its results itself? then again, maybe the search stuff is basically RAG and only happens one at the start of the query, etc etc)

Regardless, yeah- I can definitely believe your point about corpus size. If I was doing, say, frontend dev with a stack that's been around a few years, or Linux kernel hacking as tptacek mentioned, I could plausibly imagine getting some value.

One thing I do do fairly often is binary reverse engineering work- there's definitely things an LLM could probably help with here (for things like decompilation, though, I wonder whether a more graph-based network could perform better than a token-to-token transformer - but you'd have to account for the massive data & pretrain advantage of an existing LLM).

So I've looked at things like Binary Ninja's Sidekick, but haven't found an opportunity to use them yet - confidentiality concerns rule out professional use, and when I reverse engineer stuff for fun ... I like doing it, I like solving the puzzle and slowly comprehending the logic of a mysterious binary! I'm not interested in using Sidekick off the clock for the same reason I like writing music and not just using Suno.

One opportunity that might come up for Sidekick, at least for me, is CTFs- no confidentiality concerns, time pressure and maybe prizes on the line. We'll see.

OkGoDoIt|1 year ago

Yeah, I spent 6 months trying to find any value whatsoever out of GitHub copilot on C# development but it’s barely useful. And then I started doing python development and it turns out it’s amazing. It’s all about the training set.

sebastiennight|1 year ago

At least one paper about the Hough Transform here[1] should be of interest to you.

I'm afraid your prompts are the exact example of "holding it wrong". Replacing Wikipedia or Google is not what LLMs do. Think of them as a thinking engine, not as a "semantic search" of the Internet.

However, I've got great news for you: the app you're looking for exists, and it's a YC company. They've recently launched on here[0].

When I use the description from your post as the prompt (not your actual prompt that you quoted underneath), I get these clarifying questions:

> Applying the Hough transform to audio spectra for pitch recognition is an interesting extension of its typical use in image processing for line and circle detection.

> Can you clarify which specific types of harmonic structures you're hoping the Hough transform will detect in audio spectra? Are you interested in recognizing harmonic series in general, or are you targeting specific instrument voices or vocal data? Additionally, are there any constraints on the types of audio signals you'd want this method applied to—such as clean synthetic tones versus real-world noisy recordings?

> Just to ensure we're on the same page, are you specifically looking for papers that describe the application and methodological details of using the Hough transform in this context, or would you also be interested in papers that discuss the performance and comparative effectiveness of this approach against other pitch detection algorithms?

Now I've got no clue what your answers to these would be, but here are the search results[1]. Presumably that is a better tool for your purposes.

[0]: https://news.ycombinator.com/item?id=41069909 [1]: https://www.undermind.ai/query_app/display_one_search/aac9fd...

unknown|1 year ago

[deleted]

cdrini|1 year ago

The article goes through a few use cases where LLMs are especially good. Your examples are very different, and are the cases where they perform especially poorly.

Asking a pure (ie no internet/search access) LLM for papers on a niche subject is doubling down on their weaknesses. That requires LLMs to have very high resolution specific knowledge, which they do not have. They have more coarse/abstract understanding from their training data, so things like paper titles, DOIs, etc are very unlikely to persist through training for niche papers.

There are some LLMs that allow searching the internet; that would likely be your best bet for finding actual papers.

As an experiment I tried your exact prompt in ChatGPT, which has the ability to search, and it did a search and surfaced real papers! Maybe your experiment was from before it had search access. https://chatgpt.com/share/a1ed8530-e46b-4122-8830-7f6b1e2b1c...

I also tried approaching this problem with a different prompting technique that generally tends to yield better results for me: https://chatgpt.com/share/9ef7c2ff-7e2a-4f95-85b6-658bbb4e04...

I can't really vouch how well these papers match what you're looking for since I'm not an expert on Hugh transforms (would love to know if they are better!). But my technique was: first ask it about Hugh transforms. This lets me (1) verify that we're on the same page, and (2) loads a bunch of useful terms into the context for the LLM. I then expand to the example of using Hugh transforms for audio, and again can verify that we're on the same page, and load even more terms. Now when I ask it to find papers, it had way more stuff loaded in context to help it come up with good search terms and hopefully find better papers.

With regards to your criteria:

1. The code from an LLM should never be considered final but a starting point. So the correctness of the LLM's output isn't super relevant since you are going to be editing it to make it fully correct. It's only useful if this cleanup/correction is faster than writing everything from scratch, which depends on what you're doing. The article has great concrete examples of when it makes sense to use an LLM.

2. Yep , although asking questions/generating generic code would still be fine without confidentiality concerns. Local LLMs though do exist, but I personally haven't seen a good enough flow to adopt one.

3. Strong disagree on this one. I find LLMs especially useful when I am learning. They can teach me eg a new framework/library incredibly quickly, since I get to learn from my specific context. But I also tend to learn most quickly by example, so this matches my learning style really well. Or they can help me find the right terms/words to then Google.

4. +1 I'm not a huge fan of having an LLM write for me. I like it more as a thinking tool. Writing is my expression. It's a useful editor/brainstormer though.

5. +1

brooksbp|1 year ago

Also agree that asking for academic papers seems to increase the potential for hallucination. But, I don't know if I am prompting it the best way in these scenarios..