fwilliams's comments

fwilliams | 3 years ago | on: The US-Canada border cuts through the Haskell Free Library and Opera House

It’s crazy seeing this on the front page of HN!

I grew up in Stanstead. I have fond memories of story time as a child in the library, borrowing movies and comic books, and playing age of empires 2 with my best friend on the two shared computers in the front room.

There’s also a street in the town (aptly named Canusa St.) which is half in the US and half in Canada. Interestingly, the houses on one side have flags reminding you where you are, while there are no flags on the other. Figuring out which side is left as an exercise to the reader ;)

fwilliams | 3 years ago | on: A Case of Plagiarism in Machine Learning Research

If you look at the plagiarized language in the article, it seems as if the BM paper authors are claiming the plagiarized works' contributions as their own. Credit is a major currency in research, and it's important to give it where it is due. If someone did this with one of my papers, I'd be quite upset.

For example (Emphasis mine):

> The risks of data memorization, for example, the ability to extract sensitive data such as valid phone numbers and IRC usernames, are highlighted by Carlini et al. [41]. While their paper identifies 604 samples that GPT-2 emitted from its training set, we show that over 1% of the data most models emit is memorized training data. In computer vision, memorization of training data has been studied from various angles for both discriminative and generative models. Deduplicating training data does not hurt perplexity: models trained on deduplicated datasets have no worse perplexity compared to baseline models trained on the original datasets. In some cases, deduplication reduces perplexity by up to 10%. Further, because recent LMs are typically limited to training for just a few epochs [...]

fwilliams | 3 years ago | on: A Case of Plagiarism in Machine Learning Research

To quote the article:

> But even putting aside the fact that claiming someone else's writing as one's own is wrong, the value in survey papers is in how they re-frame the field. A survey paper that just copies directly from the prior paper hasn't contributed anything new to the field that couldn't be obtained from a list of references.

Good survey papers can be important contributions in their own right (e.g. [1]). A good survey should contextualize works within a subject area with respect to each other and identify high level trends/ideas in that subject. These connections are not only useful for learning a topic, but also for positioning novel work or identifying under-researched areas to focus on.

If the authors felt that one of the papers they plagiarized concisely expressed what they wanted to say, they could simply quote and cite that work. Otherwise, it could be construed that the authors are claiming to be the ones drawing the conclusions they wrote. Moreover, from the article, the survey in question seems to be pretty egregiously plagiarizing, which deserves to be called out/shamed.

[1] https://arxiv.org/abs/2111.11426

fwilliams | 4 years ago | on: Ask HN: How do you find funds to invest in?

I want to echo the other comments here that low expense ratio (<0.25%) funds from Vanguard, Fidelity, Schwab, etc... are all great stable investments.

My time horizon is longer than 5 years, and I buy broad market index funds split up as follows: 55% US large cap (e.g. VIIIX, VTSAX, SWTSX), 15% US mid cap (e.g. VMCPX), 10% US small cap (e.g. VSCPX), and 20% international (e.g. VTSNX, SWISX, VXUS).

I also highly recommend dollar cost averaging, i.e. buying a fixed dollar amount of your portfolio at fixed intervals. I have my bank do this automatically every 2 weeks. The benefit of dollar cost averaging is that (1) it takes the emotion out of investing, and (2) over a long time window, more of your assets will be purchased at low prices than at high prices (because you're buying a fixed dollar amount every N days, you end up buying fewer shares when prices are high and more when prices are low).
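To make point (2) concrete, here's a tiny sketch (the prices and the $100 amount are made up for illustration): buying a fixed dollar amount each period means your average cost per share is the harmonic mean of the prices, which is always at or below the plain average price.

```python
# Toy dollar-cost-averaging illustration with hypothetical prices.
prices = [10.0, 20.0, 25.0, 12.5]  # price per share in each period
amount = 100.0                     # fixed dollars invested per period

shares = sum(amount / p for p in prices)          # more shares bought when cheap
avg_cost = (amount * len(prices)) / shares        # average cost per share
avg_price = sum(prices) / len(prices)             # plain average of the prices

print(f"avg cost per share: {avg_cost:.2f}")      # ~14.81
print(f"avg price:          {avg_price:.2f}")     # 16.88
```

The gap between the two numbers is exactly the "buy more when it's cheap" effect; it holds for any sequence of prices, not just this one.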

fwilliams | 4 years ago | on: The illustrated guide to a Ph.D. (2010)

I was a software engineer briefly before starting grad school. During that time, I found I didn't have the time to sit down and learn about topics that interested me. I also wanted to be in research-y roles where I could build things that were more experimental and less well understood.

During my PhD, I got to spend time learning, and attending talks/seminars/conferences. Gaining deeper background knowledge in my field as well as learning how to quickly evaluate and explore new ideas gave me the tools to have the type of job I wanted. I'm a research scientist at an industrial lab now and quite enjoy it.

That being said, I agree with the grandparent post that doing a PhD can be a grueling experience. I had to carry the bulk of the work for many of the papers I submitted. If I took a day off, nobody would pick up the slack. Tight deadlines meant the only way to succeed was putting in long hours. My advisors were also spread very thin so it was difficult to get a lot of time with them. There were times when I felt very alone. This was a really stark contrast to how collaborative engineering in industry was and I don't think I ever fully adjusted to it. My current job feels like a happy middle ground. I publish papers alongside other people and we split the work.

fwilliams | 5 years ago | on: My 90s TV: Browse 90s Television

It’s not the site you’re looking for, but I found https://poolside.fm recently and it’s become one of those quirky corners of the internet that I have come to enjoy. I definitely miss the days of discovering weird specialty sites, and poolside gave me a bit of that new site discovery rush (also the music is great).

fwilliams | 7 years ago | on: Gradient Descent Finds Global Minima of Deep Neural Networks

It's worth noting that the primary result of this paper only concerns the error on the training data under empirical risk minimization. Zero training error does not imply a model that generalizes. For any optimization problem over a finite training set, you can always add enough parameters to achieve zero error (imagine introducing enough variables to fully memorize the map from inputs to labels).

The major contribution of the work is showing that ResNet needs a number of parameters that is polynomial in the dataset size to converge to a global optimum, in contrast to traditional neural nets, which require an exponential number of parameters.

fwilliams | 7 years ago | on: Autopsy of a deep learning paper

Okay so I went and read the paper. They discuss generative modeling in section 5 and in the appendix (section 7.2).

Section 5 claims "the corresponding CoordConv GAN model generates objects that better cover the 2D Cartesian space while using 7% of the parameters of the conv GAN". There isn't really any quantitative analysis beyond a couple of small graphs. Sections 7.2 and 7.3 visually compare the generator's outputs for interpolated noise vectors in the latent space. The results look good, but without quantitative analysis they are very preliminary.
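For readers who haven't seen it, the core CoordConv trick is just appending normalized coordinate channels to the input before a convolution, so filters can condition on position. A rough sketch of that idea (my own, not the authors' code; the shapes are arbitrary):

```python
import numpy as np

def add_coord_channels(img):
    """Append normalized y/x coordinate channels to an (H, W, C) image."""
    h, w, _ = img.shape
    ys, xs = np.meshgrid(np.linspace(-1.0, 1.0, h),
                         np.linspace(-1.0, 1.0, w),
                         indexing="ij")
    return np.concatenate([img, ys[..., None], xs[..., None]], axis=-1)

out = add_coord_channels(np.zeros((4, 4, 3)))
print(out.shape)  # (4, 4, 5): original 3 channels plus y and x
```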

Generative modeling is tricky, and I think the jump in your first comment from a few nice images to the claim that CoordConv can "significantly improve the quality of the representations" is a big one, given the sparsity of evidence in the paper. I'm not saying that you're wrong, but your original comment seemed a bit misleading to me.

fwilliams | 7 years ago | on: Autopsy of a deep learning paper

I haven't read the paper so I can't comment on the success of the method, but most applied ML research will show their best results in the publication and leave out failure cases.

These images look impressive, but without a proper in-depth analysis, more general claims of improvement on a task are hard to make. And while it's totally possible that, in this case, the improvements are significant, it's dangerous to extrapolate from just a few examples in a paper.

fwilliams | 7 years ago | on: Autopsy of a deep learning paper

Somewhat tangentially, some recent work showed that a lot of problems with images (e.g. denoising, upsampling, inpainting, etc...) can be solved very efficiently with no training set at all: https://dmitryulyanov.github.io/deep_image_prior

This work shows that the network architecture is a strong enough prior to effectively learn this set of tasks from a single image. Note that there is no pretraining here whatsoever.

More to your point, I think the big problem with toy tasks is not so much the tasks but the datasets. A lot of datasets (particularly in my field of geometry processing) have a tremendous amount of bias towards certain features.

A lot of papers will show their results trained and evaluated on some toy dataset. Maybe their claim is that using such-and-such a feature as input improves test performance on such-and-such problem and dataset.

The problem with these papers often comes when you try to generalize to data that is similar but not from the toy dataset. A lot of applied ML papers fail to even moderately generalize, and the authors almost never test or report this failure. As a result, I think we can spend a lot of time designing over-fitted solutions to certain problems and datasets.

On the flipside, there are plenty of good papers which do careful analysis of their methods' ability to generalize and solve a problem, but when digging through the literature it's important to be judicious. I've wasted time testing methods that turn out to work very poorly.

fwilliams | 7 years ago | on: Why ActivityPub is the future

You could build a PeerTube-compatible YouTube-like website. With control over the server, you could collect user data and probably stream ads over PeerTube. So there is possibly a profit incentive to hosting content.

fwilliams | 7 years ago | on: Understanding Machine Learning: From Theory to Algorithms

I took the author of this book's course during my undergrad and quite enjoyed it. It's a good theoretical introduction to machine learning principles. The video lectures are available here: https://youtu.be/b5NlRg8SjZg

As others have mentioned this is a fairly theoretical take on machine learning which may not be useful if you just want to use a deep learning library. That said, I think there is a lot of value in having a deeper theoretical grasp of a topic even when practicing.

fwilliams | 8 years ago | on: The differences between tinkering and research (2016)

Most researchers in my field make all their publications freely available online after they have been submitted.

In machine learning and computer vision, the default is to put your work on arXiv immediately upon completion. There is a lot of openly available research depending on the field. I find most fields in computer science are good for this.

In the case of the Mario example, I very much doubt that the author was not able to find other work because of closed journals.

Research can seem non-transparent to non-researchers because when problems are new, they are often poorly understood. Academic papers discuss novel problems and contextualize them based on other cutting edge work. Reading and understanding these papers requires a lot of context and takes time. Research is a skill that takes years to learn.

After some time has passed and we collectively gain a better understanding of a problem, academic papers may seem abstruse and overly complicated, but when these works were first published, this was the best way to understand them. For somebody looking for a recipe solution to a problem, an academic paper is likely not the ideal place to look, which is why we write books, blog posts, etc. as we come to better understand a problem and its solutions.
