digisth's comments

digisth | 7 years ago | on: Ask HN: Resources for engineers new to system design

If one is already well-versed in multiple areas of software technology (especially development and database administration), this is an excellent book. It surveys the landscape of software data storage technologies, talks about (at a modest level of depth) some of the theory behind things like quorums in distributed database systems, resiliency/redundancy strategies during data loss, and a host of other interesting topics.

I'd consider its level of depth somewhere in the middle between specialist books and 10k foot overview books. I recommend it to anyone that has been a software developer or DBA for 5+ years, as I think they'd get the most value out of it.

digisth | 8 years ago | on: Ask HN: What are some of the best job boards you have seen (any industry)?

The newer matching services (as opposed to boards) are all worth checking out: Hired, Vettery, Underdog.io. I found many good leads through all of them, and my current position is through one of them (Vettery.)

AngelList Jobs is also a place to find interesting positions (startup-centric ones in this case, as one might expect.)

digisth | 9 years ago | on: Show HN: Kim – A Python serialization and marshaling framework

Do you know of a source that compares these different libraries in terms of capabilities, focus/use cases, size limits, performance, format support, etc.?

Googling turned up very little for me.

TIA

Edit: libraries mentioned in thread:

PMML, Arrow, Dill, marshmallow, pytables, parquet/fastparquet (and pickle, obviously)

digisth | 9 years ago | on: Arguments against JSON-driven development

The rule of thumb I've always used for when to use OO is "will there be more than one extant object at once or not?" If yes, and especially if these objects need real behavior, then use OO.

If you're essentially going through one object at a time, then discarding them, you may just be doing conduit data processing, and so there's little advantage to using objects. I think what's missing in this (well-written) analysis is this distinction; if you're slurping data from one place, making a few changes (or especially if you're not making any), then sticking it into a DB or vice versa, OO may be the wrong choice.

Ask yourself while writing the code: "are these active, behavior-driven objects that need encapsulation and relatively sophisticated behaviors, or is this just data I'm doing some relatively simple processing on?"
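A rough sketch of that distinction, not taken from the article. Every name here (`clean_record`, `Account`, the sample data) is hypothetical and only meant to illustrate the two cases: a conduit function that touches one record and forgets it, versus objects that coexist and guard their own state.

```python
import json
from dataclasses import dataclass, field


def clean_record(raw_line: str) -> dict:
    """Conduit style: parse one record, tweak it, hand it on, forget it."""
    record = json.loads(raw_line)
    record["email"] = record["email"].strip().lower()
    return record


@dataclass
class Account:
    """OO style: several of these are alive at once, each with real behavior."""
    owner: str
    balance: int = 0
    history: list = field(default_factory=list)

    def withdraw(self, amount: int) -> None:
        # Encapsulated rule: an account may not go negative.
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        self.history.append(("withdraw", amount))

    def transfer_to(self, other: "Account", amount: int) -> None:
        # Behavior that involves two live objects at the same time.
        self.withdraw(amount)
        other.balance += amount
        other.history.append(("deposit", amount))


# Conduit: a plain function and dicts are enough.
rows = [clean_record(line) for line in ['{"email": " A@Example.com "}']]

# Behavior-driven: objects earn their keep.
alice, bob = Account("alice", 100), Account("bob")
alice.transfer_to(bob, 30)
```

In the first case a class would add ceremony without buying anything; in the second, the invariant lives in one place instead of being repeated at every call site.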

digisth | 9 years ago | on: Facebook AI Research Team Open Source DeepMask and SharpMask

I have a pile of links for getting started with DL in my comment history you can use: https://news.ycombinator.com/item?id=10676455

What really helped advance my understanding from zero to knowledgeable novice was rewriting some existing code line by line (using expanded variable names and comments), and thinking about what each line does as I went. It's the software development equivalent of Hunter S. Thompson re-typing The Great Gatsby just to get the feel of writing a great novel. Here's one I did based on Denny Britz's tutorial:

Britz's Original: http://www.wildml.com/2015/09/implementing-a-neural-network-...

My version: https://gist.github.com/sthware/c47824c116e6a61a56d9

HTH

digisth | 10 years ago | on: Ask HN: Where to begin learning about Neural Nets

digisth | 10 years ago | on: Anyone Can Learn to Code an LSTM-RNN (Part 1: RNN)

If you want to know more about RNNs in general, I can't recommend watching the videos/reading the notes from this course enough:

http://cs224d.stanford.edu/syllabus.html

If you want something more basic to get your head around NNs, I recommend Denny Britz's "Neural Networks from Scratch":

http://www.wildml.com/2015/09/implementing-a-neural-network-...

I created a gist with a heavily commented version of his code:

https://gist.github.com/sthware/c47824c116e6a61a56d9

digisth | 10 years ago | on: The evolving fight against sham reviews

I'm not aware of any efforts in that direction. As for the review link, I don't think that would defeat the newer countermeasure involving 3rd party sellers shipping an empty box (which never gets returned.) It still looks like a verified purchase. Second-parties could require all items be shipped through them (meaning they receive the item and inspect it, then reship it to the customer), but the added expense seems like something they'd be unlikely to spring for.

digisth | 10 years ago | on: The evolving fight against sham reviews

It's not as easy as it seems. Many systems have already been devised, but fake reviewers adapt; it's an arms race, just like with email spam. Bing Liu, who has done work on sentiment analysis, has written about some techniques dealing with spam/fakes on review sites. Here's just a sampling of popular methods and flags:

- Duplicate checking (same user on different products with similar reviews, for example)

- Meta-reviews ("was this review helpful?" - can also be gamed)

- User rating averages (all highs or all lows sometimes considered a flag)

- Ratio of "first product reviews" to total reviews on a per-product basis

- "Super-reviewer" status (questionable, but often a flag)

- Products with low sales ranks

- Review ring detection (IP block, post times, etc.)

- Early reviews

- Users who give high ranks, while most other reviews are low

- Positive reviews for one brand's products, and negative for others

- "Verified" purchases

A lot of these already have countermeasures, and fake reviewers have already come up with counter-countermeasures (like review-time staggering, using multiple IP blocks, shipping empty boxes to defeat verified purchases, etc.)

The "how to know what's trustworthy?" is the million dollar question, and it has many answers that happen to change over time. Not easy to solve.
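To make two of the flags listed above concrete (duplicate checking and uniform rating averages), here is a minimal sketch. It is not how any real review site does it: the sample data, the 0.9 similarity cutoff, the three-review minimum, and the `flag_users` helper are all assumptions made up for illustration, and a determined faker would get around both checks easily.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Made-up sample data: (user, product, stars, text)
REVIEWS = [
    ("u1", "p1", 5, "Great product, works perfectly, fast shipping!"),
    ("u1", "p2", 5, "Great product, works perfectly, fast shipping!!"),
    ("u1", "p3", 5, "Great product, works perfectly and fast shipping"),
    ("u2", "p1", 2, "Stopped working after a week."),
    ("u2", "p4", 4, "Solid build, battery life is decent."),
]

SIMILARITY_THRESHOLD = 0.9  # arbitrary cutoff, an assumption
MIN_REVIEWS_FOR_RATING_FLAG = 3  # also arbitrary


def flag_users(reviews):
    """Return {user: set of flag names} for users that trip a heuristic."""
    by_user = defaultdict(list)
    for user, product, stars, text in reviews:
        by_user[user].append((product, stars, text))

    flags = defaultdict(set)
    for user, items in by_user.items():
        # Duplicate checking: same user, different products, near-identical text.
        for (prod_a, _, text_a), (prod_b, _, text_b) in combinations(items, 2):
            similarity = SequenceMatcher(None, text_a.lower(), text_b.lower()).ratio()
            if prod_a != prod_b and similarity >= SIMILARITY_THRESHOLD:
                flags[user].add("duplicate_text")

        # Rating averages: all highs or all lows across enough reviews.
        stars = [s for _, s, _ in items]
        if len(stars) >= MIN_REVIEWS_FOR_RATING_FLAG and (
            all(s == 5 for s in stars) or all(s == 1 for s in stars)
        ):
            flags[user].add("uniform_ratings")
    return dict(flags)


flagged = flag_users(REVIEWS)
```

Flags like these are signals to combine and weigh, not verdicts; plenty of honest users only ever leave five-star reviews.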

digisth | 10 years ago | on: Twitter CEO Dorsey Apologizes to Developers

Top of the list would be low-cost filtered API access. Much cheaper than the firehose, with commensurately less data (by keyword(s), location, etc. instead of the whole shebang.) The current pricing opacity and inability to just sign up online like you can with Parse (https://parse.com/plans) means far fewer people as customers. My understanding is that API deals are company-by-company.

A list of general suggestions for them: https://medium.com/@sthware/suggestions-for-twitter-hellowor...

Edit: I'll add that if relations get repaired now and then broken again, I don’t think they’ll be fixable a third time. They need to get this right, IMO.

digisth | 10 years ago | on: Uber launches Uber Rush, merchant delivery service, in three cities

Very interesting. Assuming that the drivers doing a "Rush" order can gather delivery items from multiple businesses in one geographical area, it could mean greater delivery efficiency (and shorter delivery times) overall. Example: orders come in to 4 businesses within a few blocks of each other. The businesses send their Rush requests in to Uber. An Uber Rush driver indicates that they are in that area and picks up all the items, then delivers them to the target area(s).

Another thing this implies is that these businesses could switch from a staffed delivery service to a completely on-demand one. One thing the businesses and their customers could see an improvement with is the case where all their delivery people are out making deliveries, and new orders come in. With this, they just make a new Rush request and get a new delivery person.

digisth | 10 years ago | on: The FBI Is Struggling to Hire Hackers Who Don't Smoke Weed (2014)

They have/had the same problem with people with tattoos. Previous story:

http://gizmodo.com/the-fbi-is-struggling-to-hire-hackers-who...

There's definitely still a struggle in certain circles with the idea that many people in security (and other fields) are also members of subcultures that tend to have members that use recreational drugs, have tattoos, piercings, dyed hair, etc. Getting people to work for these agencies instead of the private sector is going to be challenging for a mound of other reasons; the least these agencies should do is rescind some of these bans.

digisth | 10 years ago | on: Did I Just Give My Permission? The Hashtag as Consent

There's nothing wrong with it, but having the ability for people to opt-out (some professional photographers, for example, don't want any kind of usage that isn't pre-approved, even if editorial) would be a way to make it crystal clear whether that user is OK with it or not. What we have right now leaves everyone in a murky area legally, and adds friction and annoyance. "Embed controls" would remove any questions about it, which would give users a say in how their content is used without having to give permission for each usage request, and would protect the people that want to use the posts without having to make a request for each one. Everyone would be covered.

Also, there are many different kinds of uses, some commercial-y. Here are a few:

- Display on Twitter itself (retweet)

- Use in a news story, near an ad

- Display in a Twitter widget on a web site

- Embedded Instagram photos on a company gallery page

- Use in an email ad campaign

- Use on a billboard

The items above exist on a spectrum of "commercial-ness", and reasonable people can disagree on which they think are OK and which aren't as far as obtaining advance permission. What many would like to see are much more explicit rules or technical controls so that the ambiguities are removed.

Embed controls might not be ideal from a fair use perspective, but they're a lot better than the "let the courts figure it out" situation we have now.

digisth | 10 years ago | on: Did I Just Give My Permission? The Hashtag as Consent

Well, using the standard embedding tools, you do get attribution, and a link back to the source (see Twitter, Vine, and Instagram web embeds for examples), but the other parts are generally in the services' TOS. This is from Twitter:

"You retain your rights to any Content you submit, post or display on or through the Services. By submitting, posting or displaying Content on or through the Services, you grant us a worldwide, non-exclusive, royalty-free license (with the right to sublicense) to use, copy, reproduce, process, adapt, modify, publish, transmit, display and distribute such Content in any and all media or distribution methods (now known or later developed).

Tip: This license is you authorizing us to make your Tweets on the Twitter Services available to the rest of the world and to let others do the same."

Without this, all those embedded Tweets you see everywhere (like in news stories) wouldn't be available (this is why I think these services should all give you the ability to disable embedding your post(s) like Flickr/YouTube do. It would make everything much clearer and more straightforward.)

digisth | 10 years ago | on: Did I Just Give My Permission? The Hashtag as Consent

They are covered in the TOS/EULA, but based on some of the things that have happened so far and some of the things written about it, it's far from clear that this stuff would hold up in court. You might think it should, and I think it should, but who knows if a judge would agree:

http://www.jdsupra.com/legalnews/an-update-on-the-legal-impl...

http://www.buzzfeed.com/jwherrman/want-to-publish-a-twitter-...

https://pando.com/2013/01/22/how-twitters-new-embeds-will-ma...

Clear as mud.
