inputcoffee's comments

inputcoffee | 6 years ago | on: I was wrong about spreadsheets (2017)

The Fidelity "minus sign mistake" didn't create a loss. The mistake was in relaying the information to the end user. It didn't actually cause a loss of that magnitude.

That is like saying that if I made a typo in a Word doc, Word created the loss.

inputcoffee | 6 years ago | on: Racket Is an Acceptable Python

I am not offended that you think I may not program. That is fine. (I mean less than some, more than others. I have coded up the examples I brought up.)

But you haven't responded to the argument. If someone urges you to use Racket, and you have a task in front of you (say, put up a website), it sort of matters whether Racket has a framework more than if it has brackets, indents or curly braces.

inputcoffee | 6 years ago | on: Racket Is an Acceptable Python

True, if we were talking about Data science, and you were bringing up Python or R or Julia, fair enough.

But if you're talking about Racket, I would want to know what you can do with it. Does it have a data science library? A web app framework?

inputcoffee | 6 years ago | on: Racket Is an Acceptable Python

Well, I assume people write mountains of code in the library. If you're making a machine learning product, that is still a lot of work.

However, writing your own Tensorflow interface would take several human lifetimes to get it right, and Google already has provided it. So it seems that is not the part you would re-write no matter how good the language is.

inputcoffee | 6 years ago | on: Racket Is an Acceptable Python

I am always confused when people talk about the language itself.

In my experience, Python is used for TensorFlow, or Pandas, or Django, or Flask, or PyTorch or something else that runs on top of it. Sometimes it is even more specialized and I need a wrapper for an API to let me talk to some web data. Maybe I need a crawler/scraper and a parser. There is a specialized language on top of the language.

So when someone says, oh this language is better with objects, or has some syntax thing or the other, or I can reason about it, I am left confused.

It's like if I were talking to a professional shoe designer and I asked for hiking boots and they told me that they're really into having at least two tones to offset the lace and the heels or something.

What am I missing? I want to reason about the language too, but doesn't that pale in comparison to being able to run a specialized library?

inputcoffee | 6 years ago | on: An opinionated view of the Tidyverse “dialect” of the R language

Agree 100%, but only if it is a one-time activity. If you have to automatically pull files and do the operation several times, it is better to go with R (or Python, or awk, sed or whatever).

inputcoffee | 6 years ago | on: An opinionated view of the Tidyverse “dialect” of the R language

I was waiting for the critique... but I never quite saw it.

Imho, the data problems Tidyverse is trying to solve are basically the ones we face in a database. So, select, join, inner join and so forth. Show me all the rows in this datatable where the 4th column is larger than the 6th column and the number itself is odd. Something like that.

There might be other ways to do it, but you want your select, filter, summarize, mutate etc functions to all work with each other, pipe to each other and be compatible.
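To make the "work with each other, pipe to each other" point concrete, here is a rough sketch in plain Python rather than R. The helpers `where`, `select` and `pipe` are made up for illustration (they are not Tidyverse or any library's API), the table is invented, and I am assuming "the number itself" means the value in the 4th column.

```python
from functools import reduce

# Hypothetical table: each row is a tuple of six integer columns.
rows = [
    (1, 2, 3, 9, 5, 4),
    (1, 2, 3, 8, 5, 4),
    (1, 2, 3, 7, 5, 9),
    (1, 2, 3, 5, 5, 2),
]

def where(predicate):
    """Return a step that keeps only the rows satisfying predicate."""
    return lambda table: [row for row in table if predicate(row)]

def select(*indices):
    """Return a step that keeps only the given (0-based) column positions."""
    return lambda table: [tuple(row[i] for i in indices) for row in table]

def pipe(table, *steps):
    """Feed the table through each step in order, like a chain of pipes."""
    return reduce(lambda acc, step: step(acc), steps, table)

result = pipe(
    rows,
    where(lambda r: r[3] > r[5]),    # 4th column larger than 6th column
    where(lambda r: r[3] % 2 == 1),  # assumption: the 4th column value is odd
    select(3, 5),                    # keep just the two columns compared
)
print(result)
```

Every step takes a table and returns a table, which is the whole trick: any step can feed any other.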

Maybe there is a better way to do all this -- I haven't seen it but I am not an expert -- but you have to show that to me.

So, in base R, walk through a set of examples of mutating, joining, filtering and so forth, and show me how they are all easier. Then I'll say, wow there is an alternative to this Tidyverse thing. But without that demo, this felt more like an intro to a complaint than an actual complaint.

Edit: Also, it's funny that Wickham is (apparently) such a nice fellow that people go out of their way to be nice to him in critiques.

inputcoffee | 6 years ago | on: Launch HN: Carry (YC S19) – We Book Travel for You on Slack

Don't you think it was policy so they can capture the information and get reporting?

inputcoffee | 6 years ago | on: Learning to Love the AI Bubble

I've been trying to explain to people why I think ML is Stats rebranded but this is the most succinct expression of that sentiment:

> Taking averages, grouped by something? That's AI now.

I think that is right. The algorithm that does the grouped averages is machine learning, and if you put error bars around it, it is stats.
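Roughly what I mean, as a sketch using only the Python standard library. The data is made up, and for "error bars" I am assuming the standard error of the mean, which is only one of several reasonable choices.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

# Hypothetical observations: (group, value) pairs.
observations = [
    ("a", 1.0), ("a", 2.0), ("a", 3.0),
    ("b", 10.0), ("b", 12.0), ("b", 14.0),
]

def grouped_means(pairs):
    """Return {group: (mean, standard error of the mean)}."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    summary = {}
    for key, values in groups.items():
        # Sample standard deviation over sqrt(n); needs at least 2 values per group.
        sem = stdev(values) / sqrt(len(values))
        summary[key] = (mean(values), sem)
    return summary

summary = grouped_means(observations)
for key, (avg, sem) in summary.items():
    print(f"{key}: {avg:.2f} +/- {sem:.2f}")
```

The first element of each tuple is the "machine learning" part; the second is the "stats" part.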

To address your concern: I wouldn't worry about the relevance of applying math and logic to the world. It has always been growing.

inputcoffee | 6 years ago | on: Ask HN: Can we create a new internet where search engines are irrelevant?

I don't think it would be questions.

Suppose you like fountain pens, and you recommend certain ones. One of your friends looks for fountain pens that their friends recommend and finds the ones you like.

That is just one example of things that don't require explicit questions.

Another one might be you have searched for books or other things and then they follow the same "path". So long as you have similar interests it might work.

People haven't solved this issue, but there is a lot of research out there on networks of connections potentially replacing certain kinds of search.

inputcoffee | 6 years ago | on: Ask HN: Can we create a new internet where search engines are irrelevant?

It was thought that one way of finding information is to ask your network (Facebook and Twitter would be examples), and then they would pass on the message and a chain of trusted sources would get the information back to you.

I am being purposefully vague because I don't think people know what an effective version of that would look like, but it's worth exploring.

If you have some data you might ask questions like:

1. Can this network reveal obscure information?

2. When -- if ever -- is it more effective than indexing by words?

inputcoffee | 6 years ago | on: Show HN: DataTau (HN clone for data science) has been down, so we cloned it

It went down a few times before and I reached out asking if he wanted help too. He did respond fairly promptly (< 2 days) and told me he was fine and about to get it back up. This happens every so often so I am glad you’re doing this.

Are the articles the same as he had? Did you get a snapshot before it went down?

inputcoffee | 6 years ago | on: Timeline of Slack’s Tech Stack Evolution

I admit, it really makes you think about iterations and MVP in a new way.

I am surprised you said you found flask lacking though, because I would have thought they were similar. Can you say more about what you found to be lacking in terms of performance/team, size/code and tooling?

inputcoffee | 6 years ago | on: Timeline of Slack’s Tech Stack Evolution

This is really interesting.

I would love to see something like this for other successful companies.

Too often, we see the tech stacks of famous firms, but not the stacks that preceded them.

It would be very interesting to note if, say, 80% of unicorns started their life as RoR, or PHP projects. It tells you one of two things:

1. Which frameworks were popular n years ago (where n is the average time it takes from launch to unicorn)

2. Which framework actually helps you get an MVP off the ground

inputcoffee | 6 years ago | on: Slack Is Going Public at a $16B Valuation

These declarations of valuation should be followed by a little note explaining what multiple of earnings (or, failing that, sales) this represents, and how fast it is growing.

Is that high? Low? About right?

Well, it depends if Slack made $100 million in sales and is flat, or if it did $2B in sales and is doubling every year.
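The arithmetic for those two scenarios, as a quick sketch. The sales figures are the hypotheticals above, not Slack's actual financials.

```python
VALUATION = 16e9  # the $16B figure from the headline

# Hypothetical scenarios, not actual reported numbers.
scenarios = {
    "flat at $100M in sales": 100e6,
    "doubling at $2B in sales": 2e9,
}

def sales_multiple(valuation, annual_sales):
    """Valuation expressed as a multiple of annual sales."""
    return valuation / annual_sales

multiples = {
    name: sales_multiple(VALUATION, sales) for name, sales in scenarios.items()
}
for name, multiple in multiples.items():
    print(f"{name}: {multiple:.0f}x sales")
```

Same headline number, but 160x flat sales and 8x fast-growing sales are very different stories.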

(I assume that it doesn't have earnings because it's still growing and plowing all that money back into the business.)

inputcoffee | 7 years ago | on: How to lose $172k per second for 45 minutes (2013)

You guys think this is bad?

In finance, I've seen people compute deals worth billions of dollars using Excel spreadsheets and a team of MBAs.

inputcoffee | 7 years ago | on: Implementing a Neural Network from Scratch in Python

There are so many little details to remember when you implement a Neural Network from "scratch". Or, I suppose, even if you do not.

You know what would be a great contribution? An extensive set of unit tests, or even just problems with solutions. That way people can write their own implementations and test them. And even if a person were to implement the net in PyTorch or TensorFlow, they could test the work.

So there would be a matrix of weights, and a vector of input nodes, and the "answer" would be the output vector. Then there would be another "answer" which is the output with a particular activation function, and so on.

This library would just be there so people who are doing their own implementation can test their work.

As I said, a unit test would work too but then it would have to be language specific. Just the matrix and answer would be language agnostic.
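Something like this is what I have in mind, as a rough sketch. The fixture format (JSON with these key names) and the numbers are invented for illustration; the point is that the data part is language-agnostic and only the small checker is language-specific.

```python
import json

# Hypothetical fixture: weights, input, and the expected outputs before and
# after a ReLU activation. Any language with a JSON parser could read it.
FIXTURE = json.loads("""
{
  "weights": [[1.0, -2.0], [0.5, 0.5]],
  "input": [3.0, 2.0],
  "expected_linear": [-1.0, 2.5],
  "expected_relu": [0.0, 2.5]
}
""")

def matvec(weights, vector):
    """Reference matrix-vector product: one output per row of weights."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]

def relu(vector):
    """Reference ReLU: clamp negative entries to zero."""
    return [max(0.0, v) for v in vector]

def close(actual, expected, tol=1e-9):
    """True when both vectors have the same length and entries within tol."""
    return len(actual) == len(expected) and all(
        abs(a - b) <= tol for a, b in zip(actual, expected)
    )

def check(fixture, layer=matvec, activation=relu):
    """Run an implementation against the fixture's two expected answers."""
    linear = layer(fixture["weights"], fixture["input"])
    return close(linear, fixture["expected_linear"]) and close(
        activation(linear), fixture["expected_relu"]
    )

print(check(FIXTURE))
```

To test your own implementation you would pass your functions in as `layer` and `activation` in place of the reference ones.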

For people who think: can't you just make up an example yourself on a sheet of paper or in Excel?

Yes, for most purposes this is fine, but if you forget one little implementation detail of a three-layer network with a ReLU, you really want an external way to check that.

inputcoffee | 7 years ago | on: Starting a Company Outside Silicon Valley Just Saved Me $1.1M

I think there is an error in the title.

It should read: "Starting a Company Outside Silicon Valley Saved Me Just $1.1M"

inputcoffee | 7 years ago | on: Nasdaq Acquires Quandl to Advance the Use of Alternative Data

Accepting the null hypothesis has utility only if you have some reason to believe it would not be accepted.

Accepting it per se has no particular value. You could generate several random datasets, and accept/reject the null hypothesis between them ad infinitum.
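A small simulation of that, as a sketch. I am assuming Gaussian noise with a known standard deviation and a two-sided z-test at the 5% level; the function names, sample sizes and seed are arbitrary choices of mine.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def rejects_null(sample_a, sample_b, sigma=1.0, alpha=0.05):
    """Two-sided z-test for equal means, assuming a known common sigma."""
    standard_error = sigma * sqrt(1 / len(sample_a) + 1 / len(sample_b))
    z = (mean(sample_a) - mean(sample_b)) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha

def rejection_rate(trials=2000, n=30, seed=0):
    """Fraction of pure-noise dataset pairs where the null gets rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Both datasets come from the same distribution, so the null is true.
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        rejections += rejects_null(a, b)
    return rejections / trials

rate = rejection_rate()
print(f"rejected the null in {rate:.1%} of pure-noise comparisons")
```

With nothing but noise, the rejection rate should hover around the chosen alpha, and every one of those accept/reject outcomes tells you nothing.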

To put it another way, it's only interesting if it's surprising.

inputcoffee | 7 years ago | on: Nasdaq Acquires Quandl to Advance the Use of Alternative Data

Alternative take: there isn't that much low hanging fruit there.

Hear me out.

"To the person who only has a hammer, everything looks like a nail."

The data in front of you is the data you want to analyze, but it doesn't follow that that is the data you ought to analyze. I predict that most of the data you look at will result in nothing. The null hypothesis will not be rejected in the vast majority of cases.

I think we -- machine learning learners -- have a fantasy that the signal is lurking and if we just employ that one very clever technique it will emerge. Sure random forests failed, and neural nets failed and the SVR failed but if I reduce the step size, plug the output of the SVR into the net and change the kernel...

Let me give an example: suppose you want to analyze the movement of the stock market using the movement of the stars. Adding more information on the stars, and more techniques, may feel like you're making progress but you aren't.

Conversely, even a simple piece of information that requires minimal analysis (this company's sales are way up and no one but you knows it) would be very useful in making that prediction.

The first data set is rich, but simply doesn't have the required signal. The second is simple, but has the required signal. The data that is widely available is unlikely to have unextracted signal left in it.