top | item 40390646

(no title)

V1ndaar | 1 year ago

I would blame the majority of your criticism on the fact that HN is not the best place to read code. Also, syntax highlighting & basic familiarity with Nim helps.

His code is doing a few more things than necessary. The actual algorithm is inside the `uniqCEcvm` template. The `it` it receives is anything you can iterate over (a collection or an iterator). Multiple things in one line really only appear where they directly relate to the part at the beginning of the line.

The `when isMainModule` is Nim's way of Python's `if __name__ == "__main__"`. The entire part below that is really just a mini CL interface to bench different (random) examples. Final thing to note maybe, the last expression of a block (e.g. of the template here) will be returned.

And well, the style of comments is just personal preference of course. Whether you prefer to stay strictly below 80 cols or not, shrug.

I grant you that the usage of 2 sets + pointer access to them + swapping makes it harder to follow than necessary. But I assume the point of it was not on "how to write the simplest looking implementation of the algorithm as it appears in the paper". But rather to showcase a full implementation of a reasonably optimized version.

Here's a version (only the algorithm) following the paper directly:

    proc estimate[T](A: seq[T], ε, δ: float): float =
      let m = A.len
      var p = 1.0
      var thr = ceil(12.0 / ε^2 * log2(8*m.float / δ))
      var χ = initHashSet[T](thr.round.int)
      for i in 0 ..< m:
        χ.excl A[i]
        if rand(1.0) < p:
          χ.incl A[i]
        if χ.card.float >= thr:
          for el in toSeq(χ): # clean out ~half probabilistically
            if rand(1.0) < 0.5:
              χ.excl el
          p /= 2.0
          if χ.card.float >= thr:
            return -1.0
      result = χ.card.float / p

discuss

menthe|1 year ago

Thank you very much for having taken the time.. Your comments and function are both very helpful!