ptrik's comments

ptrik | 1 year ago | on: Tracking supermarket prices with Playwright

> While the supermarket that I was using to test things every step of the way worked fine, one of them didn't. The reason? It was behind Akamai and they had enabled a firewall rule which was blocking requests originating from non-residential IP addresses.

Why did you pick Tailscale as the solution for proxy vs scraping with something like AWS Lambda?

ptrik | 1 year ago | on: Tracking supermarket prices with Playwright

> My CI of choice is [Concourse](https://concourse-ci.org/) which describes itself as "a continuous thing-doer". While it has a bit of a learning curve, I appreciate its declarative model for the pipelines and how it versions every single input to ensure reproducible builds as much as it can.

What's the thought process behind using a CI server (which I thought was mainly for builds) for what is essentially a data pipeline?

ptrik | 1 year ago | on: Tracking supermarket prices with Playwright

> The data from the scraping are saved in Cloudflare's R2 where they have a pretty generous 10GB free tier which I have not hit yet, so that's another €0.00 there.

Wondering how the data from R2 gets fed into the frontend?

ptrik | 1 year ago | on: Tracking supermarket prices with Playwright

> I went from 4vCPUs and 16GB of RAM to 8vCPUs and 16GB of RAM, which reduced the duration by about ~20%, making it comparable to the performance I get on my MBP. Also, because I'm only using the scraping server for ~2h the difference in price is negligible.

Good lesson in cloud economics. Below a certain threshold you get a linear performance gain from a more expensive instance type. It's essentially the same amount of spending, but you save wall-clock time by running the same workload on a more expensive machine for a shorter period.
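A back-of-envelope sketch of the linear-scaling case the comment describes: if doubling vCPUs halves the runtime, total spend is unchanged and only wall-clock time is saved. The hourly rates here are made-up illustrative numbers, not real cloud pricing.

```python
# Hypothetical per-hour rates for a small and a 2x-priced big instance.
small_rate = 0.20   # $/hour, e.g. a 4 vCPU machine (assumed price)
big_rate = 0.40     # $/hour, e.g. an 8 vCPU machine (assumed 2x price)

small_hours = 4.0                # baseline runtime on the small machine
big_hours = small_hours / 2      # perfectly linear speedup: half the time

small_cost = small_rate * small_hours   # 0.80
big_cost = big_rate * big_hours         # 0.80

# Same spend, half the wait -- the linear-gain regime.
assert small_cost == big_cost
```

Once the speedup falls off linearity (as in the quoted ~20% gain from 2x vCPUs), the bigger machine costs more per run, but for a workload that only runs ~2h the absolute difference stays small.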

ptrik | 1 year ago | on: "We ran out of columns"

This is the main takeaway for me: the decentralized way of doing software development at large scale. It echoes microservices a lot, but it can be done with a more traditional stack as well. It's ultimately about how you empower teams to develop features in parallel, and only coordinate when patterns emerge.

ptrik | 3 years ago | on: For Want of a JOIN

This depends on the use case. SQL is king for batch processing: queries are declarative, and decades of effort have gone into optimization.

For real-time / streaming use cases, however, there is not yet a mature SQL solution. Flink SQL / Materialize are getting there, but the state-of-the-art is still the Flink / Kafka Streams approach: put your state in memory / on local disk, and mutate it as you consume messages.

This actually echoes the "Operate on data where it resides" principle in the article.
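A minimal sketch of that pattern, stripped of any real Flink / Kafka Streams API: keep mutable state locally and update it per message, rather than re-querying an external store. The message shape and aggregation are hypothetical.

```python
from collections import defaultdict

# Local mutable state: running totals keyed by product, held in memory.
# Real systems (Kafka Streams, Flink) back this with a local store
# such as RocksDB plus a changelog for fault tolerance.
state = defaultdict(float)

def consume(message):
    """Mutate the local state as each message arrives."""
    state[message["product"]] += message["price"]

# Simulated message stream (hypothetical records).
for msg in [{"product": "milk", "price": 1.2},
            {"product": "milk", "price": 1.3},
            {"product": "bread", "price": 2.0}]:
    consume(msg)

# Aggregates are available immediately, with no external query.
print(dict(state))
```

The data never leaves the process while it is being aggregated, which is exactly the "operate on data where it resides" idea.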

ptrik | 3 years ago | on: SQLite or PostgreSQL? It's Complicated

Ditto on the DuckDB point. This looks like an OLAP workload to me, and a columnar database would work wonders: DuckDB if you're going embedded, ClickHouse if you're going with a server.
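A toy, stdlib-only illustration of why the columnar layout suits OLAP: an aggregate touches one contiguous column instead of every field of every row. (DuckDB and ClickHouse do this for real, with compression and vectorized execution on top; the data here is invented.)

```python
# Row store: each record carries all its fields, so summing one field
# still walks over whole records.
rows = [
    {"store": "A", "price": 1.0},
    {"store": "B", "price": 2.0},
    {"store": "A", "price": 0.5},
]
row_total = sum(r["price"] for r in rows)

# Column store: the same data kept as one array per column, so an
# aggregate reads only the column it needs.
columns = {
    "store": ["A", "B", "A"],
    "price": [1.0, 2.0, 0.5],
}
col_total = sum(columns["price"])  # only the price column is scanned

assert row_total == col_total == 3.5
```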

ptrik | 6 years ago | on: Clojure on the Desktop

Same here. I've been trying to build something similar with GitHub Actions, and this is a nice reference.