jhokanson's comments

jhokanson | 3 years ago | on: How to choose the right Python concurrency API

"Firstly, there are three main Python concurrency APIs, they are:" asyncio, threading, multiprocessing, ... oh, and concurrent.futures

All kidding aside, I used the multiprocessing module lately and it was a mess. Do I want 'map', 'starmap', 'imap', etc.? All I wanted was to run a function multiple times with different inputs (and multiple inputs per function call) and to fail as soon as any launched process failed, rather than waiting for every input variation to execute and only then telling me about the error (which honestly I didn't think was asking for too much).
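
A minimal sketch of the behavior asked for above, using concurrent.futures rather than multiprocessing.Pool (a swapped-in approach, not something from the linked article). The helper name `run_fail_fast` and its `executor_cls` parameter are hypothetical. Each input is a tuple of arguments (starmap-style), and the first failure is re-raised without waiting for the remaining queued inputs.

```python
from concurrent.futures import FIRST_EXCEPTION, ProcessPoolExecutor, wait


def run_fail_fast(func, arg_tuples, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Call func(*args) for every tuple in arg_tuples and stop at the first failure.

    Hypothetical helper: returns results in input order, or re-raises the
    first exception seen from any call.
    """
    with executor_cls(max_workers=max_workers) as pool:
        futures = [pool.submit(func, *args) for args in arg_tuples]
        # Returns as soon as one call raises, or when every call has finished.
        done, not_done = wait(futures, return_when=FIRST_EXCEPTION)
        # Calls that have not started yet can be cancelled; calls that are
        # already running are not interrupted and will run to completion.
        for future in not_done:
            future.cancel()
        # Re-raise the failure, if there was one.
        for future in done:
            future.result()
        return [future.result() for future in futures]
```

With the default ProcessPoolExecutor, `func` has to be defined at module level so it can be pickled, and on platforms that spawn worker processes the call should sit under an `if __name__ == "__main__":` guard.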

jhokanson | 4 years ago | on: Spreadsheets Are Hot–and Cranking Out Complex Code

> Spreadsheets.com, for example, lets users dump almost anything into a cell. Drop a photo or a PDF into a cell and the product will immediately create a thumbnail, which you can then expand, as if the spreadsheet were some sort of blog content-management system.

Yes please! Can I get this for Google Sheets? :/

jhokanson | 4 years ago | on: Why tensors? A beginner's perspective

Speaking of math pages on Wikipedia ... and math text more generally

Is it just me or are we horrible at teaching advanced math? Where are the examples (with actual numbers)? Where is the motivation? Where are the pictures?

jhokanson | 4 years ago | on: Failing to reach DDR4 bandwidth

Does AMD have folks that you can reach out to regarding this? I know Intel has MKL and all the work around its own compiler for maximum speed. This seems like it should be trivial for someone at AMD to put together as an example of how to do things like this correctly ...

jhokanson | 4 years ago | on: Failing to reach DDR4 bandwidth

My C++ is not great (so it is hard for me to tell what is going on) and I'm used to OpenMP where my understanding has always been that you tend to get a single thread per processor (or per hyper-thread) -- not sure if that is guaranteed with the way your code is laid out? Perhaps it really is a NUMA issue as others suggest. I will note that one other variation I had (as it looks like you are already splitting across threads) is that the chunk sizes were actually smaller than the # of threads which meant a faster thread would take more chunks rather than waiting on the slowest thread. Good luck!
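
One reading of the variation described above, sketched in Python rather than C++/OpenMP: the data is cut into many more chunks than there are threads, so a thread that finishes early simply picks up the next chunk. The function name and the default sizes are hypothetical. In standard CPython the GIL keeps this from speeding up a pure-Python sum, so the sketch only shows the scheduling structure.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked_sum_dynamic(data, chunk_size=1000, workers=4):
    """Sum data using many small chunks handed out to a fixed pool of threads."""
    # Many small chunks: idle workers pull the next one from the pool's queue,
    # so a fast thread ends up processing more chunks than a slow one.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Final accumulation over the per-chunk partial sums.
        return sum(pool.map(sum, chunks))
```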

jhokanson | 4 years ago | on: Failing to reach DDR4 bandwidth

It is not exactly clear to me what is going on with threads (I guess you are using all of them?). I haven't done too much in this space but anecdotally I've had better luck if my summation is explicitly split into sub-summation tasks. It is not clear if that is being done here. It looks like a single summation loop that the author is expecting the computer to magically split across multiple threads. I'd be interested in seeing what this looks like if instead the task were to add chunks of the original dataset into results per thread (e.g., first 8000 samples on first thread, next 8000 on 2nd thread, etc.), with a final accumulation loop across all threads. Again, the author may be trying this and this is not my area of expertise but I've had decent luck saturating the memory bus with a similar approach.
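
The split described above (one contiguous block per thread, each writing its own partial result, then a final accumulation loop) could look roughly like this. It is a structural sketch in Python, not the article's code; the function name is hypothetical, and in standard CPython the GIL means it will not saturate a memory bus the way a C++ or OpenMP version could.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked_sum_static(data, workers=4):
    """Sum data by giving each thread one contiguous block and its own result slot."""
    n = len(data)
    # Block boundaries: thread w handles data[bounds[w][0]:bounds[w][1]].
    bounds = [(w * n // workers, (w + 1) * n // workers) for w in range(workers)]
    partial = [0] * workers  # one slot per thread, so no shared accumulator

    def work(w):
        lo, hi = bounds[w]
        total = 0
        for i in range(lo, hi):
            total += data[i]
        partial[w] = total

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consume the iterator so any exception raised in a worker surfaces here.
        list(pool.map(work, range(workers)))

    # Final accumulation across the per-thread results.
    return sum(partial)
```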

jhokanson | 4 years ago | on: Improving GitHub Code Search

Yes please! I like to search for examples of how to use libraries and often times the results are all the same exact call in forks or copies of the same code in multiple places. Perhaps deduplication could be optional when searching?

jhokanson | 4 years ago | on: Nuclear waste is a solved problem

I thought this would be referencing work by Nathan Myhrvold using a new type of reactor that supposedly runs on "spent" nuclear waste and in the event of power failure just stops running safely. Not sure of the other logistic issues involved. The one thing I remember about transitioning to this approach is that Nathan said the US isn't very good about building new things, so they were going to build in China. But then it got shut down right as the anti-China trade policies started a few years ago. Not sure if there are big problems with this approach, but it sounded promising ...

jhokanson | 4 years ago | on: Beating TimSort at Merging

As someone that rarely works with Python lists, as opposed to numpy arrays, I was pleasantly surprised to see numpy does what I would expect in providing a mergesort option. I'm surprised Python doesn't, other than via heapq and only implemented in Python (according to my reading of the post and a very quick Google search).

Oops, just for fun the numpy documentation currently states: "The datatype determines which of ‘mergesort’ or ‘timsort’ is actually used, even if ‘mergesort’ is specified. User selection at a finer scale is not currently available." Awesome ...

Also, apparently mergesort may also be done using radix sort for integers "‘mergesort’ and ‘stable’ are mapped to radix sort for integer data types."
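
For plain Python lists, the heapq route mentioned above can be sketched as follows. `heapq.merge` is part of the standard library and lazily merges inputs that are already sorted; the wrapper name `merge_sorted` is hypothetical.

```python
import heapq


def merge_sorted(*runs):
    """Merge already-sorted sequences into one sorted list."""
    # heapq.merge returns a lazy iterator; list() materializes the result.
    return list(heapq.merge(*runs))
```

Each input has to be sorted already; `heapq.merge` does not check this.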

jhokanson | 5 years ago | on: Why the Wuhan lab leak theory shouldn't be dismissed

Agreed. However my reaction when first hearing about the lab leak (middle of last year?) was that the leak stories were meant to be malicious/propaganda against China. I didn't take any of this seriously until an article in Politico a week or two ago.

But here's the kicker. Let's say this was a lab leak and as a reporter (which I'm not) I thought the evidence was good enough to warrant reporting. I'm not sure I would share it. The previous occupant of the White House did a great disservice in giving this whole thing a racially charged tone. I'm genuinely scared by the increased acts of violence against southeast Asians in the US and worry that stories like this will make it worse. I'm hoping that the new US government is secretly taking steps to help prevent what may have happened in that lab -- in addition to the large effort needed elsewhere to improve our handling after things had begun to spread.

Anyway, main point is that this was the first time in a long time (ever?) where I really wondered whether it was good to share "the whole truth" (as best we know it), given that we don't know what happened and the potential real-life implications to many people in the US.

jhokanson | 5 years ago | on: GitHub Should Start an App Store

Maybe somewhat off topic ... I don't sell anything on any App stores currently but I do like the idea of having some way of creating "applications" that are a bit more discoverable. Last I checked GitHub search is pretty awful for discovery (maybe I'm wrong on that?). I could imagine an interface where it would be easier to browse "apps" that do something specific.

Mathworks (MATLAB) has a decent version of this: https://www.mathworks.com/matlabcentral/fileexchange/

People can provide feedback and ratings which makes it easy to see when projects are dead or when there are big issues that aren't being fixed. You can also reference other projects - "This project is like that project but it is faster ..."

I don't think PyPI is all that good for this. Also, I've never really invested the effort to learn how to deploy an application to PyPI whereas something that simply points to my repo may be valuable?

Anyway, the main problem I want solved is not having people discover my code (although that would be great too), but being able to find other code more easily so I don't need to constantly reinvent the wheel.

jhokanson | 5 years ago | on: Number Parsing at a Gigabyte per Second

Wow, those are big performance differences (660 MB/s for fast-double vs 1042 MB/s for the 'newer' fast-float), although most of the numbers (for the different libraries being tested) are all over the place, and even 'strtod' more than doubled in speed between the two tests (70 MB/s in the fast-double test vs 190 MB/s in the fast-float test). It wouldn't surprise me if those two code bases are essentially the same.

That highlights the complexity of benchmarking in general and the importance of comparing within the same benchmark. I haven't looked at this in a while but I thought some of the newer JSON parsers were standards compliant (maybe not?).

Anyway, that other blog post answers my question as it looks like the big insight is that you use the fast approach (that everyone uses) when you can, and fall back to slow if you really have to. From that blog link:

"The full idea requires a whole blog post to explain, but the gist of it is that we can attempt to compute the answer, optimistically using a fast algorithm, and fall back on something else (like the standard library) as needed. It turns out that for the kind of numbers we find in JSON documents, we can parse 99% of them using a simple approach. All we have to do is correctly detect the error cases and bail out."

Again, I swear I've seen this in one of the other JSON parsers but maybe I'm misremembering. And again, good for them for breaking it out into a header library for others to use.
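
The optimistic-fast-path-then-fallback pattern in the quote above can be sketched like this. It is not the library's actual algorithm: it uses the well-known exact case (decimal significand below 2^53 and a power of ten no larger than 10^22, both exactly representable as doubles) and hands everything else to Python's built-in `float`. The function name and the returned "fast"/"fallback" label are hypothetical, added only to show which path was taken.

```python
import re

# Plain decimal literals only: optional sign, digits, optional fraction and exponent.
_SIMPLE = re.compile(r"(-?)([0-9]+)(?:\.([0-9]+))?(?:[eE]([+-]?[0-9]+))?")

# 10**0 .. 10**22 are exactly representable as doubles.
_EXACT_POW10 = [float(10 ** i) for i in range(23)]


def parse_float(text):
    """Return (value, path) where path is "fast" or "fallback"."""
    match = _SIMPLE.fullmatch(text)
    if match is None:
        # Not a simple literal: let the standard parser handle (or reject) it.
        return float(text), "fallback"

    sign, int_part, frac_part, exp_part = match.groups()
    frac_part = frac_part or ""
    mantissa = int(int_part + frac_part)
    exponent = int(exp_part or 0) - len(frac_part)

    # Fast path: both operands are exact doubles, so a single multiply or
    # divide yields the correctly rounded result.
    if mantissa < 2 ** 53 and -22 <= exponent <= 22:
        value = float(mantissa)
        if exponent >= 0:
            value *= _EXACT_POW10[exponent]
        else:
            value /= _EXACT_POW10[-exponent]
        return (-value if sign else value), "fast"

    # Detected a hard case: bail out to the slow but general parser.
    return float(text), "fallback"
```

Inputs such as "0.1" or "-2.5e3" take the fast path, while something like "1e400" or a 30-digit integer falls back to `float`.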

jhokanson | 5 years ago | on: Number Parsing at a Gigabyte per Second

I'm curious as to what the biggest win in terms of speed was here (in terms of an approach, good lookup tables?). Also I'm curious how this compares to the many (?) JSON parsers that have rolled their own number parser because everyone knows the standard library is so slow ... (just more accurate?, faster?). Regardless, kudos to the authors on their work!

jhokanson | 5 years ago | on: Wireless Is a Trap

That was a great story about actually managing to debug a wireless problem. I wish there were better diagnostic tools commonly available that would clearly explain these things. I'm not sure if in the author's case there is some software that would listen for extensive polling and flag that as an issue - even better if it could log running programs at the time and try and guess which one is causing the issue.

I'm constantly having problems with my mac laptops and after many hours (10+) of internet searching I still have no idea why the wifi doesn't always work reliably. Some days or weeks (months?) it is great. Other days every hour it is disconnecting. Sometimes resetting the router helps, sometimes it doesn't. The whole situation is extremely frustrating.

Some issues: Why am I getting a DNS issue when my wired desktop never has a wireless issue?

Sometimes I swear that the laptops become much less reliable at the opposite end of my house from my router, but the signal strength is generally still excellent (4 bars).

Anyway, I'm not asking to have my particular issues solved (although that would be much appreciated!!!!). The real issue just seems to be that debugging these issues is extremely difficult and not based on principles but just random things that people can try (e.g., delete your plist files).

jhokanson | 5 years ago | on: The Hard Part of Learning a Language

My introduction to Python (a long time ago) and "batteries included" was going to some random guy's website to download all the compiled packages I needed. Thanks Christoph!

jhokanson | 5 years ago | on: The Hard Part of Learning a Language

Besides focusing on language 1 vs language 2 I think the author is also highlighting just how hard it is for people to get started in any language. You mentioned R/RStudio ....

I just wanted to work through an example in a book that I'm reading that has corresponding R code. I type require(package_name) into the command window and that doesn't work. I look at the help for "require" and there is nothing about how to install packages - I would think a reference to the install packages command would be useful (they link to how to check, but nothing obvious on how to install, maybe I missed it). After some searching I find the command and it prompts about whether I want to use the source code or not (after a few reads I understood what was going on but it could be improved). Things started compiling and then I tried to run my package again, no luck. So then I try reinstalling and this time I noticed the error - something like "exited with non 0 status". Awesome. Looking a bit closer I noticed one of the dependencies was not installing. After trying to install the dependency manually I somehow realized it was only for R version > 3.6, I'm on 3.5. So I figured updating RStudio would fix the problem, no luck. It's not like RStudio advertises what version of R they are shipping with RStudio .... Now I'm on someone's LinkedIn post (wtf?) looking at how to upgrade R from within RStudio. They indicate that on my mac I should updateR (or wait, they say that turns out to be not good, so just install from CRAN). Hopefully that works. Do I try my luck with R v4? Nothing's going to break there right ...?

This isn't meant to be a "R is bad" rant. Maybe C# or some other language really does avoid these problems, I don't know. However my experience has been the problems the author mentions are a problem everywhere.
