Show HN: I'm learning Go and built a scraper. How can I improve it?

5 points| itzami | 1 year ago

TLDR: I'm learning Go and [this](https://github.com/ItzaMi/compare-supermarket-prices) is my first project with it. What am I doing wrong and what could I improve?

In an attempt to expand my horizons and actually get into backend development, I've decided to learn Go. I picked it for no reason other than the market seems to be in a friendly state towards it, compared to Elixir, but I'm very much enjoying it.

Thankfully something unlocked in my brain and I thought of a project to do while learning the syntax and how the language works, so I've built a scraper.

Here's the link: https://github.com/ItzaMi/compare-supermarket-prices

At this point, I would like to know what I'm doing wrong and what I could improve in it, so any tips and advice would be very much appreciated!

4 comments

rudiksz|1 year ago

As far as style is concerned:

1) try to use "for ... := range" loops (e.g. "for i, v := range items") instead of the traditional three-clause for
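To make point 1 concrete, here is a minimal sketch of the same loop written both ways. The `products` slice and the `joinNames` helper are made up for illustration and are not taken from the repository.

```go
package main

import "fmt"

// joinNames is a hypothetical helper that walks a slice with range
// instead of an index-based loop.
func joinNames(products []string) string {
	out := ""
	for i, p := range products {
		if i > 0 {
			out += ", "
		}
		out += p
	}
	return out
}

func main() {
	products := []string{"milk", "eggs", "bread"}

	// Traditional three-clause loop: the index is managed by hand.
	for i := 0; i < len(products); i++ {
		fmt.Println(i, products[i])
	}

	// Equivalent range loop: index and value are handed to you.
	for i, p := range products {
		fmt.Println(i, p)
	}

	// When the index is not needed, discard it with the blank identifier.
	for _, p := range products {
		fmt.Println(p)
	}

	fmt.Println(joinNames(products))
}
```

The range form removes the bounds bookkeeping, which is usually where off-by-one mistakes come from.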

2) "var result [4]string" - really seems like this wants to be a struct with a couple of fields.

  type result struct {
    id    string
    name  string
    brand string
    price string
  }
Makes code more self documenting:

  result.id = product
  result.name = e.ChildText("h3 a")
  result.brand = " "
  result.price = e.ChildText("span.value")
3) you could introduce a "store" struct with fields like "separator", "name", "baseProductUrl", and a "Visit" method. This would entirely eliminate the need for lines 63-73, which are static code in a for loop and a performance/code smell.
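A rough sketch of what such a "store" struct could look like. The meaning of each field (especially "separator"), the example URLs, and the `fetch` callback that stands in for the real scraping call are all assumptions on my part, not the project's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// store bundles the per-store settings in one place.
// The field meanings here are assumptions for illustration.
type store struct {
	name           string
	separator      string // assumed: replaces spaces in the product name
	baseProductURL string
}

// productURL builds the URL for a single product.
func (s store) productURL(product string) string {
	return s.baseProductURL + strings.ReplaceAll(product, " ", s.separator)
}

// Visit calls fetch once per product. fetch is a stand-in for
// whatever actually performs the request and parses the page.
func (s store) Visit(products []string, fetch func(url string) error) error {
	for _, p := range products {
		if err := fetch(s.productURL(p)); err != nil {
			return fmt.Errorf("%s: %q: %w", s.name, p, err)
		}
	}
	return nil
}

func main() {
	// The per-store configuration is declared once, outside any loop.
	stores := []store{
		{name: "store-a", separator: "+", baseProductURL: "https://store-a.example/search?q="},
		{name: "store-b", separator: "%20", baseProductURL: "https://store-b.example/s/"},
	}
	products := []string{"olive oil", "rice"}

	for _, s := range stores {
		err := s.Visit(products, func(url string) error {
			fmt.Println("would visit", url)
			return nil
		})
		if err != nil {
			fmt.Println("error:", err)
		}
	}
}
```

Adding a store then means adding one entry to the slice rather than another branch of setup code inside the loop.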

3b) The "visitStore" method does a switch on "store", but it is in a hot loop. This is a good candidate for micro-optimisation-is-the-root-of-all-evil debates later down the road. It's better to avoid those, and there are a number of simple solutions here to prevent that "constant check" in the hot loop.
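One of those simple solutions, sketched under assumptions: resolve the store-specific behaviour once, before the loop, and only call the result inside it. The `parsers` map, the `parseFunc` type and the store names are hypothetical.

```go
package main

import "fmt"

// parseFunc is a hypothetical per-store function that labels a raw price.
type parseFunc func(raw string) string

// parsers maps a store name to its behaviour. The lookup replaces
// a switch statement that would otherwise run on every iteration.
var parsers = map[string]parseFunc{
	"store-a": func(raw string) string { return "A:" + raw },
	"store-b": func(raw string) string { return "B:" + raw },
}

// parseAll picks the store-specific function once, then loops.
func parseAll(storeName string, raws []string) ([]string, error) {
	parse, ok := parsers[storeName]
	if !ok {
		return nil, fmt.Errorf("unknown store %q", storeName)
	}
	out := make([]string, 0, len(raws))
	for _, r := range raws {
		out = append(out, parse(r)) // no switch on the store in here
	}
	return out, nil
}

func main() {
	prices, err := parseAll("store-a", []string{"1.99", "0.49"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(prices)
}
```

An unknown store is also reported once up front instead of silently falling through a switch on every product.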

4) Using defer in for loops is always a red flag. While it is technically correct, it's very likely not what you want here. Imagine that you have 10k products to visit, for 10 stores. The file for the first store will be flushed and closed only after your main function exits (your entire program in this case). What you would want is to open a file, visit all products, flush the writes and close the file handle, and only then move on to the next store. In other words, the lines 76-98 (the body of your main for loop) should be a separate function so your "defer"s happen when a loop iteration terminates, and not when the "main" function exits.
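A minimal sketch of point 4, assuming one output file per store. The `writeStore` and `run` functions, the file names and the sample rows are invented for the example. The point is only that the deferred close now runs at the end of each call, i.e. once per store.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
)

// writeStore writes one store's lines to its own file. The deferred
// Close runs when this function returns, so each file is closed
// before the caller moves on to the next store.
func writeStore(dir, name string, lines []string) (err error) {
	f, err := os.Create(filepath.Join(dir, name+".csv"))
	if err != nil {
		return err
	}
	defer func() {
		// Report a failed Close unless an earlier error already exists.
		if cerr := f.Close(); err == nil {
			err = cerr
		}
	}()

	w := bufio.NewWriter(f)
	for _, l := range lines {
		if _, err := fmt.Fprintln(w, l); err != nil {
			return err
		}
	}
	return w.Flush() // flush buffered writes before the file is closed
}

// run writes two hypothetical stores into a temporary directory and
// returns the contents of the first file.
func run() (string, error) {
	dir, err := os.MkdirTemp("", "prices")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	results := []struct {
		name  string
		lines []string
	}{
		{"store-a", []string{"milk,0.89", "rice,1.20"}},
		{"store-b", []string{"milk,0.95"}},
	}
	for _, r := range results {
		if err := writeStore(dir, r.name, r.lines); err != nil {
			return "", err
		}
	}

	data, err := os.ReadFile(filepath.Join(dir, "store-a.csv"))
	return string(data), err
}

func main() {
	out, err := run()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

With 10 stores this keeps at most one file handle open at a time, instead of holding all 10 until the program ends.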

itzami|1 year ago

All points make a lot of sense, thank you for your reply!

eevmanu|1 year ago

as an idea I'd suggest evaluating your codebase with the multiple LLMs available now using the free tier (deepseek V3, llama 3.3 versatile on groq, gemini-exp-.. on google ai studio and so on)

on each llm provider you can find prompting best practices or techniques to improve your eval prompt and eval your code correctly

at least the feedback loop would be faster

if you already thought about it or already did that, well ... great

itzami|1 year ago

I did, but I've been feeling that the LLMs end up giving an average solution, something that I don't fully understand, or the same solution that just looks slightly different. I think I'll end up getting influenced by their responses instead of thinking for myself or learning from others.