I've always liked scatter solutions for these kinds of problems:

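  import numpy as np
  
  def scatter_mean(index, value):
      # Per-group mean: accumulate a sum and a count at each group index.
      sums = np.zeros(max(index)+1)
      counts = np.zeros(max(index)+1)
      for i in range(len(index)):
          j = index[i]
          sums[j] += value[i]
          counts[j] += 1
      return sums / counts  # a group index that never occurs gives 0/0 = NaN
  
  def scatter_max(index, value):
      # Per-group maximum, starting from -inf so any value replaces it.
      maxs = -np.inf * np.ones(max(index)+1)
      for i in range(len(index)):
          j = index[i]
          maxs[j] = max(maxs[j], value[i])
      return maxs
  
  def scatter_count(index):
      # Number of rows in each group.
      counts = np.zeros(max(index)+1, dtype=np.int32)
      for i in range(len(index)):
          counts[index[i]] += 1
      return counts
  
  id = np.array([1, 1, 1, 2, 2, 2]) - 1  # zero-based group ids
  sales = np.array([4, 1, 2, 7, 6, 7])
  views = np.array([3, 1, 2, 8, 6, 7])
  # Expand each group's mean back to one value per row (repeat assumes the ids are sorted).
  means = scatter_mean(id, sales).repeat(scatter_count(id))
  # Max views among rows whose sales exceed their group's mean.
  print(views[sales > means].max())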

Obviously you'd need good implementations of the scatter operations rather than these naive Python for-loops, but once you have them the solution is a pretty readable two-liner.
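For what it's worth, here is a rough sketch of what loop-free versions might look like in plain NumPy. This is one possible approach, not a tuned implementation: it assumes the index is a non-empty array of non-negative integers, it leans on `np.bincount` and `np.maximum.at`, and the name `ids` (instead of `id`) plus the NaN-for-empty-groups behaviour are my own choices.

```python
import numpy as np

def scatter_count(index):
    # np.bincount counts how often each non-negative integer occurs.
    return np.bincount(index)

def scatter_mean(index, value):
    # Weighted bincount gives the per-group sums in one pass.
    sums = np.bincount(index, weights=value)
    counts = np.bincount(index)
    # Groups with no entries come out as NaN instead of triggering a 0/0 warning.
    return np.divide(sums, counts, out=np.full(sums.shape, np.nan), where=counts > 0)

def scatter_max(index, value):
    maxs = np.full(int(index.max()) + 1, -np.inf)
    # ufunc.at is unbuffered, so repeated indices are all taken into account.
    np.maximum.at(maxs, index, value)
    return maxs

ids = np.array([1, 1, 1, 2, 2, 2]) - 1  # zero-based group ids
sales = np.array([4, 1, 2, 7, 6, 7])
views = np.array([3, 1, 2, 8, 6, 7])

# Indexing with ids gathers each row's group mean, whether or not ids are sorted.
means = scatter_mean(ids, sales)[ids]
result = views[sales > means].max()
print(result)  # 8
```

Gathering with `[ids]` instead of `.repeat(scatter_count(ids))` drops the requirement that the ids be sorted, and it is still a two-liner.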
