mmyrte's comments
mmyrte | 2 years ago | on: Understanding Parquet, Iceberg and Data Lakehouses
My thinking goes as follows: I'm trying to read chunks from n-dimensional data with a minimum of skips/random reads. For user-facing analytics and drilling down into the data, these chunks tend to be relatively few, and I'd like to have them close to one another. For high-level statistics, however, I only care that the data for each chunk of work be contiguous, since I'm going to read all chunks eventually anyway.
You can reach these goals with a partitioning strategy in HDF, zarr, or parquet, but you could also reach them with blob fields in a more traditional DB, be it relational or document-based or whatever. Since storage and memory are linear anyway, I don't care whether a row-major or column-major array is populated from a 1-d vector in columnar storage (with dimensionality metadata) or from an explicitly array-based storage format; I just trust that a table with good columnar compression doesn't waste too much space on what is implicit in (dense) array storage.
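To make the "1-d vector plus dimensionality metadata" point concrete, here's a minimal pure-Python sketch (the helper names are mine, not from any particular library): a dense 2-d chunk round-trips through a flat vector and a shape tuple, which is all that distinguishes "array storage" from a blob column in a table.

```python
def flatten(chunk):
    """Row-major (C-order) flattening of a list-of-lists chunk.

    Returns the flat data vector plus the shape metadata needed
    to reinterpret it as an n-d array later.
    """
    shape = (len(chunk), len(chunk[0]))
    data = [v for row in chunk for v in row]
    return data, shape

def element(data, shape, i, j):
    """Index into the flat vector using the stored shape metadata."""
    _, ncols = shape
    return data[i * ncols + j]

chunk = [[1, 2, 3],
         [4, 5, 6]]
data, shape = flatten(chunk)
assert element(data, shape, 1, 2) == chunk[1][2]
```

Whether `data` lives in a zarr chunk, an HDF dataset, or a blob/array column in a database, the reader does the same arithmetic; only where the shape metadata lives differs.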
Often, I've found that even climatological data, _as it pertains to a specific analytic scenario_, is actually a sparse subset of an originally dense n-d array, e.g. only the cells over land. This has led me to advocate for more tabular approaches, but that is very domain-specific.
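As a toy illustration of the land-only case (the grid values here are made up): once most cells of the dense grid are empty, a coordinate-list table stores only the valid cells, which is the tabular approach in miniature.

```python
# Hypothetical 3x3 grid; None marks ocean cells with no data.
dense = [
    [None, None, 1.5],
    [None, 2.0,  2.5],
    [3.0,  None, None],
]

# Tabular (coordinate-list) representation: one (row, col, value)
# record per land cell, dropping the empty ocean cells entirely.
table = [
    (i, j, v)
    for i, row in enumerate(dense)
    for j, v in enumerate(row)
    if v is not None
]

assert len(table) == 4  # 4 land cells stored instead of 9 grid cells
```

At real sparsity levels (land is roughly 30% of Earth's surface) the row count drops accordingly, and columnar compression on the coordinate columns recovers much of what dense chunking gives you for free.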
mmyrte | 2 years ago | on: My uBlock Origin filters to remove distractions
mmyrte | 2 years ago | on: Using Lidar to map tree shadows
edit: If you mean GIS (geographical information systems/science), there are plenty of undergraduate courses strewn across GitHub. IMO, the R geospatial ecosystem is more mature than its Python counterpart, but both are very usable.