codyps's comments

codyps | 8 years ago | on: 832 TB – ZFS on Linux

When drives are never replaced, yes. In the case where the chassis lasts longer than the drives (which I'd imagine is often), extra cost in drives adds up.

codyps | 8 years ago | on: Linux tracing systems and how they fit together

The idea with having eBPF in the kernel is that we can limit the amount of trust given to a particular user-space task.

Accepting compiled stuff in the form of a kernel module requires root privileges and requires that the kernel essentially have complete trust in the code being loaded.

Loading eBPF eliminates the need to trust the process/user doing the loading to that level.

codyps | 9 years ago | on: Fewer mallocs in curl

`alloca` is a simple addition to the stack pointer, so a single instruction, presuming it isn't folded into the normal bump of the stack pointer to allocate the fixed size local variables. There isn't really much cost to doing a dynamic stack allocation rather than a fixed one. Variable length arrays (VLAs) allow the same thing but can be slightly more portable.

Normal C caveats do apply here though: alloca isn't part of the C standard (nor, strictly, POSIX; it's a widely implemented extension). VLAs are a standard C99 feature that was made optional in C11. Neither is required to actually use the stack for storage.

Not sure if there are any platforms supported by curl which would prevent its use of VLAs or alloca.

codyps | 9 years ago | on: Two frequently used system calls are ~77% slower on AWS EC2

So the only things I'm seeing in the linked circonus code that differ from illumos:

1. No use of a kernel-supplied page; it determines skew/etc. itself in userspace.
2. Stores information at a per-CPU level, and tries to execute cpuid on the same CPU as rdtsc.

I'm presuming you're talking about #2 (and #1 is just due to the linked item being a library without kernel integration)? Perhaps with some more kernel support, so that the CPU rdtsc actually ran on can be reliably determined?

This still doesn't clarify the part about a "shared page in which the time is updated" and read from. That statement appears to imply the TSC is not (necessarily) used (otherwise I'd categorize it under "uses values from a memory page to fix up the TSC", like illumos' current implementation). I'm still not sure how that can be done reasonably.

Is there just a 1-microsecond timer running whenever a user task is executing that bumps the value? Wouldn't that be quite a bit of overhead? Or some HW trick? I mean, you could generate a fault on every read and have the kernel populate the current data, but that seems just as bad as a syscall.

codyps | 9 years ago | on: Two frequently used system calls are ~77% slower on AWS EC2

For those actually curious about the implementation on Solaris/illumos, here's a quick rundown (from looking at current illumos source):

- comm_page (usr/src/uts/i86pc/ml/comm_page.s) is literally a page in kernel memory with specific variables that is mapped (usr/src/uts/intel/ia32/os/comm_page_util.c) as user|read-only (to be passed to userspace, kernel mapping is normal data, AFAICT)

- the mapped comm_page is inserted into the aux vector at AT_SUN_COMMPAGE (usr/src/uts/common/exec/elf/elf.c)

- libc scans the auxv for this entry and stashes the pointer it contains (usr/src/lib/libc/port/threads/thr.c)

- When clock_gettime is called, it looks at the values in the COMMPAGE (structure is in usr/src/uts/i86pc/sys/comm_page.h, probing in usr/src/lib/commpage/common/cp_main.c) to determine if TSC can be used.

- If TSC is usable, libc uses the information there (a bunch of values) to use tsc to read time (monotonic or realtime)

Variables within comm_page are treated like normal variables and used/updated within the kernel's internal timekeeping.

Essentially, rather than having the kernel provide an entry point and having only the kernel know what the internal data structures look like (as in the Linux case), here libc provides the code and reads the data structure the kernel exports.

So it isn't reading the time from this memory page, it's using TSC. In the case of CLOCK_REALTIME, corrections that are applied to TSC are read from this memory page (comm_page).

codyps | 9 years ago | on: Fossil SCM

I end up using it when fixing up local changes which had broken due to upstream modifications.

It's generally not seen as very good to send things upstream that conflict with existing changes, and including merges (from master, primarily) in upstream submissions is frowned upon.

When I was using mercurial, I ended up using the mq extension [1] to get a similar workflow. I actually prefer mq's workflow to rebasing in git (it simplifies some things when maintaining a set of changes), but the git programs like that are lacking (I've tried guilt and quilt).

codyps | 9 years ago | on: Sccache, Mozilla’s distributed compiler cache, now written in Rust

distcc is not a cache. It doesn't keep the compiler's output around after it builds things. It only distributes running the compiler across machines (and, to do that, handles some details of preprocessing in certain modes) and then gets the output back to the requester of the compilation task.

At that point, the requester could cache that output, should they want to.

codyps | 9 years ago | on: Tup – A file-based build system for Linux, OS X, and Windows

There is a per-user limit on the number of inotify watches available (max_user_watches), and the default value is 8192.

The limit exists because there is a ~1KiB kernel memory overhead per watch (though there should really be a way for them to take part in normal memory accounting per-process).

If one wants to watch a directory tree, one needs an inotify watch handle per subdirectory in that tree. On large trees (or if more than 1 process is using inotify), that number of watches can be exceeded.

Since lots of folks are looking for recursive watches, they aren't happy about needing to allocate & manage a bunch of handles for what they see as a single item.

That said, I'm not sure the way the kernel thinks about fs notifications internally would allow a single handle recursive watch at the moment.

In any case, the amount of info one can obtain by using fuse (or any fs like nfs or 9p) to intercept filesystem accesses is a bit larger. At the very least, one can (in most cases) directly observe the ranges of the file that were modified (though that's not quite so important for tup, afaik). There also aren't any queue overruns (which can happen in inotify) because one will just slow the filesystem operations down instead (whether this is desirable or not depends on the application).

codyps | 9 years ago | on: Tup – A file-based build system for Linux, OS X, and Windows

regarding "out of tree": I'm not quite sure about your explanation here (it just looks like a list of source files), but presuming you mean "creates output files in a separate directory from the source", it doesn't really have complete support for that. You can use "variants" to place output files in a subdirectory of the source tree, though.

> "some way for tup to manage discovering the files to build"

Well, no. It's not a "convention" build tool like rust's `cargo` where you just place things in the default locations and it figures it out.

You can use the `run ./script args` mechanism in tup to run your own script that emits tup rules, though.

The manual has details: http://gittup.org/tup/manual.html

codyps | 10 years ago | on: What's in a Build Tool?

The author also doesn't cover autoconf/automake, which operates in the same vein as cmake (generating files for use by another build tool).

I agree, it would be useful to evaluate it here as it provides a bunch of the features the other "build tools" provide.

codyps | 10 years ago | on: What's in a Build Tool?

ninja? tup?

The article doesn't note the property of being able to depend upon the entire command that generates an output (i.e., re-generate when compiler flags change). This is something that makes doing reliable builds much easier (when present). It's notably very hard to do in make (and even then is very inefficient).

Also, on "download", the author seems to presume that one takes the naive approach in each tool. In most cases, if one spends a bunch of time on it, the downloads can be done fairly efficiently (especially in make, without even much work). Most of these build systems are fully programmable, so the rating should probably focus more on the level of difficulty of doing it well (with some requirements specifying what "well" means).

codyps | 10 years ago | on: Pacman-5.0 Released

Well, the front page did recently have people complaining about a bunch of paid products (github, twitter, everything IBM does). I'd expect us to treat open source projects like any other, and part of that is by noting their failings. Incentivising fixing those (or doing the fixing) does happen in different ways, but this isn't a development forum :)

I'd say:

> don't post anything publicly if you want only [warm fuzzy feelings]

codyps | 10 years ago | on: Pacman-5.0 Released

iirc, apt has support for it (it will only download a single thing at a time from a given host, but does appear to download different things from different hosts at the same time).

I believe gentoo's emerge also has support (though I'm not as sure here as I typically have the downloads happen in the background while building).

The overall goal here is to more effectively utilize the available bandwidth on the client side, even when there are limitations on a given server.

It's great that we can customize how to download 1 package at a time. But there is room for improvement.

codyps | 10 years ago | on: Pacman-5.0 Released

I don't think it would be. Parallel downloads would simply mean that more of the client's bandwidth is consumed by connecting to multiple mirrors at once. Each individual mirror will, at most, still supply the same total bytes and bandwidth (though it's likely that being able to download from multiple mirrors at once would lower an individual mirror's overall load by more effectively spreading downloaders between mirrors).

codyps | 10 years ago | on: GitHub is undergoing a full-blown overhaul as execs and employees depart

I was interested in your comparison in #1, wrt "HN community overreacted".

I've been searching through old HN stories about Eich, but can't seem to find any where the comments generally supported firing him due to supporting Prop 8 (banning gay marriage).

Is there a particular thread you had in mind? Or are you stating the overreaction was being angry that he might have been fired over that support?

codyps | 10 years ago | on: Google Analytics Opt-Out Browser Add-On

I believe the key phrase is "being used by". If they weren't reporting it, they could use the stronger "being reported to Google Analytics". Or "being collected by".

Also note that by singling out Google Analytics, it's possible that the data might be collected and used by other Google tools, just not GA. Depends on how much mental gymnastics they are willing to do.

Of course, the best thing would be someone just looking at the actual javascript and seeing what it does.

codyps | 10 years ago | on: Avoid D-Bus bus activation

I wonder if anyone has submitted patches to dbus-daemon adding support for non-systemd "daemon managers". If the article is to be believed it should be really easy. Maybe that could have been done instead of writing a blog post? Or in addition to?

codyps | 10 years ago | on: Read-only deploy keys

Now we just need branch-restricted keys & keys that aren't allowed to force-push (both of these would make me feel a lot better about using certain 3rd-party automation in combination with my github repos).

Not that I really expect that to happen anytime soon; I believe others have been asking for the above for quite some time.
