jud_white's comments

jud_white | 8 years ago | on: NSQ – A realtime distributed messaging platform designed to operate at scale

* Disclosure: I sometimes contribute to NSQ.

We use NSQ at Dell for the commercial side of dell.com. We've been in Production with it for about 2 years.

> what are the typical use cases for NSQ ?

In the abstract, anything that can tolerate near real-time, at-least-once delivery and does not need ordering guarantees. It also features retries and manual requeuing. It's natural to assume ordering and exactly-once semantics are important, because that's how we tend to think when we write code and work with (most) databases, and having order lets you make more assumptions and simplify your approach. Those guarantees typically come at the cost of coordination, or they only hold within a bounded window. Depending on your workload or how you frame the problem, you may find ordering and exactly-once semantics are not that important, or that they can be made unimportant (for example, by making message handling idempotent). In other cases order is important and it's worth the tradeoff; our Data Science team uses Kafka for these cases, but I'm not familiar with the details.

Here are some concrete examples of things we built using NSQ, roughly in the order they were deployed to PROD:

- Batch jobs which query services and databases to transform and store denormalized data. We process tens of millions of messages in a relatively short amount of time overnight. The queue is never the bottleneck; it's either our own code, services, or reading/writing to the database. Retries are surprisingly useful in this scenario.

- Events from other applications signaling that a targeted refresh of some data in the denormalized store is needed (for example, a user updated a setting in their store, which causes a JSON model to update).

- Purchase order message queue, both for retries and for simulating what would happen if a customer on a legacy version of the backend was migrated to the new backend; also for verifying that a set of known 'good' orders continue to be good as business logic evolves (regression testing).

- Async invoice/email generation. This is a case where you have to be careful of at-least-once delivery and need to use a correlation ID and persistence layer to define a 'point of no return' (can't process the message again beyond this point even if it fails). We don't want to email (or bill) customers twice.

- Build system for distributing requests to our build farm.

- Pre-fetching data and hydrating a cache when a user logs in or browses certain pages, anticipating the likely next page so the user doesn't have to wait on it for an expensive service call. The consumer in this case is another decoupled web application; the application emitting the event is completely separate and likely on a different deployment schedule from the consuming application. The emitted event tells us what the user did, and it's the consumer's responsibility to decide what to do. This is an interesting case where we use #ephemeral channels, which disappear when the last client disconnects. We append the application's version to the channel name so multiple running versions in the same environment each get their own copy of the message and process it according to that binary's logic. This is useful for blue/green/canary testing, and also mid-deployment when different versions are running in PROD, one customer-facing and one internal still being tested. I think I refer to this image more than any other when explaining NSQ's topics and channels: https://f.cloud.github.com/assets/187441/1700696/f1434dc8-60... (from http://nsq.io/overview/design.html).

Operationally, NSQ has been not just a pleasure to work with but an inspiration for how we develop our own systems. Being operator-friendly cannot be overrated.

Last thing, if you do monitoring with Prometheus I recommend https://github.com/lovoo/nsq_exporter.

jud_white | 10 years ago | on: Go best practices, six years in

A point of clarification for anyone skimming the original article:

> Top Tip — Libraries should never vendor their dependencies.

Peter goes on to clarify:

> You can carve out an exception for yourself if your library has hermetically sealed its dependencies, so that none of them escape to the exported (public) API layer. No dependent types referenced in any exported functions, method signatures, structures—anything.

I think this is the way to go if you're writing a library which has its own dependencies. You get a repeatable build and free yourself to change which dependencies you rely on without impacting users of your package.

There are exceptions, such as if your dependency has an init which only makes sense to run once. Loggers come to mind, where the setup should be determined by the main package. The f.Logger point in the article is friendlier to users of your package than just using log.Printf, and frees you from having to vendor logrus, for example, if you want to support structured logging.

jud_white | 10 years ago | on: Go best practices, six years in

> having no stack for errors is insanely frustrating.

Check out https://github.com/pkg/errors

If you want richer control over the output of the stack trace, there's https://github.com/go-stack/stack

jud_white | 10 years ago | on: Introduction to PostgreSQL physical storage

> Coming from pretty heavy background in MSSQL internals, this article is really great.

Do you know of any good resources for the undocumented function fn_dblog? I'm looking to understand the structure of RowLog Contents and Log Record in different Operations/Contexts to reconstruct DDL/DML.

http://www.sqlskills.com/blogs/paul/inside-the-storage-engin... is a good example but is really just an introduction.

jud_white | 10 years ago | on: Why it’s harder to forge a SHA-1 certificate than to find a SHA-1 collision

Thanks for the varying levels of explanation (thanks to viraptor too). I think part of the reason I was confused is that GitHub's webhook setup lets you supply a shared secret which, based on what I understand from above, is not as secure as it could be unless the user ensures the shared secret has sufficient entropy. If I'm still not getting it, please let me know. Thanks again.

jud_white | 10 years ago | on: Why it’s harder to forge a SHA-1 certificate than to find a SHA-1 collision

I have a question I haven't been able to find an answer to, hopefully someone here can help.

Why is HMAC+(hash) considered secure while being considerably faster than, say, bcrypt with a cost of 12? For example, if a service used a user-provided password to validate a "secret" (what would normally be the signed message), is that less secure than bcrypt? If so, what makes guessing the secret used in HMAC difficult?
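One way to frame the question is to count guesses. This is a rough worked comparison, treating the attacker's guess rate R as a hypothetical parameter rather than a measured figure: the worst-case search time is the size of the key space divided by R, for a random 256-bit key versus an 8-character lowercase password.

```latex
T_{\max} = \frac{N}{R}, \qquad
N_{\text{key}} = 2^{256} \approx 1.16 \times 10^{77}, \qquad
N_{\text{pw}} = 26^{8} \approx 2.09 \times 10^{11}

\text{with a hypothetical } R = 10^{10}\ \text{s}^{-1}:\quad
T_{\text{key}} \approx 1.16 \times 10^{67}\ \text{s}, \qquad
T_{\text{pw}} \approx 21\ \text{s}
```

bcrypt with cost c performs 2^c rounds of its expensive key setup per guess, so each increment of c roughly halves R. That slowdown matters for low-entropy inputs like N_pw; for a random key like N_key, the key space alone already makes the search infeasible, so a fast HMAC is not the weak point.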

jud_white | 10 years ago | on: Show HN: GitMark – Your GitHub Report Card

Please remove the access to Private Repositories, or make it optional.

jud_white | 10 years ago | on: Tracing JITs and Modern CPUs: Part 2

Even the wiki is a git repo, though there's no built-in search like there is for repositories and issues.

jud_white | 11 years ago | on: Time.is

One of my most often used Google queries is "time in [city]"

jud_white | 11 years ago | on: The Story of Siri, by its founder [video]

Oct 4: Apple launches Siri

Oct 5: Steve Jobs dies

  One kind of side note. On October 5th, Steve Jobs died.
  He had been involved in a lot of the process leading up to it.
  We know that he was watching this launch from his house.
  I don't know what he thought about it, but I like to project
  that he saw it, said "It is good. This is the future, Apple's
  in the middle of it. I can go now." I don't know if that's true,
  but that's a projection that I like to put onto it.

I suppose this is the kind of statement you could expect from the creator of a predictive personal assistant, but wow.

jud_white | 11 years ago | on: Go 1.4 is released

> no good IDE

LiteIDE is open source, cross-platform and pretty enjoyable. https://code.google.com/p/liteide/

jud_white | 11 years ago | on: Launching in 2015: A Certificate Authority to Encrypt the Entire Web

To know something is insecure can be acceptable. To think something is secure when it isn't can be far more dangerous. I'm taking secure to mean encrypted and with identity reasonably verified. Whatever your thoughts on the CA process, it serves a purpose.

There are plenty of other things to complain about. EV for one.