
stym06 | 1 year ago

Thanks! Could you please point me to a reference for (1)?

etcd/wal actually does do preallocations (https://github.com/etcd-io/etcd/blob/24e05998c68f481af2bd567...)

I have yet to implement max buffer age! Any references for this would be the bomb!

Is mmap() really needed here? I came across a similar project that does this: https://github.com/jhunters/bigqueue. Really gotta dig deep here!


vlowther | 1 year ago

I can't share my references with you directly: the implementation I wrote is closed-source and heavily intermingled with other internal bits. But I can provide examples:

1. syscall.Iovec is the struct that the writev() system call takes an array of. You build that array up something like this:

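    import "syscall"

    func b2iov(bs [][]byte) []syscall.Iovec {
        res := make([]syscall.Iovec, 0, len(bs))
        for i := range bs {
            if len(bs[i]) == 0 {
                continue // &bs[i][0] would panic on an empty buffer
            }
            iov := syscall.Iovec{Base: &bs[i][0]}
            iov.SetLen(len(bs[i])) // Len is uint64 on 64-bit Linux, uint32 on 32-bit
            res = append(res, iov)
        }
        return res
    }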
Then, once you are ready to write:

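    import (
        "io"
        "os"
        "syscall"
        "unsafe"
    )

    // write seeks to offset at, writes every buffer in iov with a single
    // writev() call, then fsyncs so the batch is durable before returning.
    func write(fi *os.File, iov []syscall.Iovec, at int64) (written int64, err error) {
        if len(iov) == 0 {
            return 0, nil // nothing to write, and &iov[0] would panic
        }
        if _, err = fi.Seek(at, io.SeekStart); err != nil {
            return
        }
        wr, _, errno := syscall.Syscall(syscall.SYS_WRITEV, fi.Fd(), uintptr(unsafe.Pointer(&iov[0])), uintptr(len(iov)))
        if errno != 0 {
            err = errno
            return
        }
        written = int64(wr) // may be a short write; compare against the total you expected
        err = fi.Sync()
        return
    }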
These are not tested and omit some more advanced error checking, but the basic idea is that you use the writev() system call to do the heavy lifting of writing a bunch of byte buffers as a single unit to the backing file at a known location. writev() is POSIX standard, so if you want to target Windows you will need to find its equivalent.

2. Yeah, I also just zero-filled each new file using fallocate().

3. I handled max buffer age by feeding writes to the WAL through a channel; the main reader loop for that channel then selects on both the main channel and a time.Timer's C channel. Get clever with the timer's Reset() method and you can implement whatever timeout scheme you like.

4. No, it is not needed. My WAL implementation boiled down to a bunch of byte buffers protected by a rolling CRC64, and for me it was easier and faster to just mmap() the whole file into one big slice and sanity-check the rolling CRCs along with the other metadata.