top | item 16648295

(no title)

What’s the advantage of the file system/repo/bespoke diag database over storing the numpy arrays in the existing database infrastructure?

Doesn’t implementing this system with HDF5 cause headaches for concurrency in either direction?

discuss

joshmarlow|8 years ago

I don't know the author's use-case specifically, but that could fall over in two ways:

1) if you've got models that are re-generated periodically based on new inputs/algorithm tweaks, then you can potentially end up with quite a few of these as you scale.

2) if you want to track the details that/debug the reason your production system made a given decision, you need to log not just your model but all of the parameters that went into that decision. If that type of decision happens many times a day, then you can end up with some pretty massive logs to go through.

In either case, storing that historical data in your transactional database can be a bit of a load, so it's ideal to keep it separate if you get any kind of volume. I've actually bumped into 2) at one job.