dsies | 4 years ago
Re 24M+ records: create a batch runner that works through a list of "jobs", each performing a stripping/cleaning task. To store state (and to organize the cleaners), use a distributed store such as etcd. That way you can bookmark where you were in the cleaning process and pick up from there after a stop or crash.
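A rough sketch of what I mean, in Python. Everything named here is made up for illustration: `CheckpointStore` is just an in-memory stand-in for a real store like etcd (swap in an actual client), and the key layout, the two cleaners, and `run_job` are assumptions, not a reference implementation.

```python
from typing import Callable, Dict, List, Optional


class CheckpointStore:
    """In-memory stand-in for a distributed key-value store such as etcd.

    Hypothetical helper: a real deployment would wrap an actual client.
    """

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


def strip_whitespace(record: str) -> str:
    """Example cleaner: trim leading and trailing whitespace."""
    return record.strip()


def collapse_spaces(record: str) -> str:
    """Example cleaner: squeeze runs of whitespace into single spaces."""
    return " ".join(record.split())


# Cleaners are applied to each record in this order.
CLEANERS: List[Callable[[str], str]] = [strip_whitespace, collapse_spaces]


def run_job(
    job_id: str,
    records: List[str],
    store: CheckpointStore,
    batch_size: int = 1000,
    max_batches: Optional[int] = None,
) -> List[str]:
    """Clean records in batches, bookmarking progress after each batch.

    Resumes from the stored offset if one exists. max_batches exists only
    so an interrupted run can be simulated.
    """
    key = f"/cleaning/{job_id}/offset"  # assumed key layout
    offset = int(store.get(key) or 0)
    cleaned: List[str] = []
    batches_done = 0

    while offset < len(records):
        if max_batches is not None and batches_done >= max_batches:
            break
        batch = records[offset:offset + batch_size]
        for record in batch:
            for cleaner in CLEANERS:
                record = cleaner(record)
            cleaned.append(record)
        offset += len(batch)
        store.put(key, str(offset))  # bookmark after the batch completes
        batches_done += 1

    return cleaned
```

Calling `run_job` again with the same `job_id` and store skips whatever was already bookmarked. The bookmark is written only after a whole batch finishes, so a crash mid-batch means that batch gets redone; the cleaners should be safe to run twice on the same record.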