kadder | 5 years ago

This viewpoint is an engineering-driven way of solving the featurization problem. The same problem can be approached in a much simpler way that is more compatible with how practitioners work.

Have a look at some of the abstractions SageMaker has in place for a modeling workflow. Abstracting featurization at that level is more beneficial than approaching the problem via a core engineering-driven mechanism.

Based on my experience building a system like this, the percentage of features that are reused and searched for across models is generally a lot smaller than one might expect. A system that provides publishing and a fast, simple serving mechanism generally meets all the needs.

Metadata management, history, audit, lineage, and search are all nice to have, but they are not critical requirements for most practitioners.

E.g., compute an aggregate lookup over a time range in Spark, archive it in S3, wrap it in a fast lookup implementation inside a scikit-learn transformer, and have the transformer pickle the lookup. That gives you offline/online parity out of the box.
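
A rough sketch of that last idea in Python, for illustration only. The class name `AggregateLookupTransformer`, its `dumps`/`loads` helpers, and the hardcoded `rows` are all made up here. In practice the rows would be the output of the Spark aggregation and the pickled bytes would be archived in S3; both steps are left out.

```python
import pickle

try:
    from sklearn.base import BaseEstimator, TransformerMixin
except ImportError:
    # Fallback so the sketch still runs where scikit-learn is not installed.
    class BaseEstimator:
        pass

    class TransformerMixin:
        pass


class AggregateLookupTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: maps an entity key to a precomputed aggregate."""

    def __init__(self, lookup=None, default=0.0):
        self.lookup = lookup
        self.default = default

    def fit(self, X, y=None):
        # Nothing to learn; the aggregates were computed upstream.
        return self

    def transform(self, X):
        table = self.lookup or {}
        # One feature column: the aggregate for each key, or the default.
        return [[table.get(key, self.default)] for key in X]

    def dumps(self):
        # Pickle the lookup together with the default, so the serving side
        # restores exactly the state that was used for training.
        return pickle.dumps({"lookup": self.lookup, "default": self.default})

    @classmethod
    def loads(cls, blob):
        # Only unpickle bytes you produced yourself; pickle is not safe
        # for untrusted input.
        return cls(**pickle.loads(blob))


# Stand-in for the Spark output: (key, aggregate over the time range).
rows = [("u1", 12.5), ("u2", 3.0)]

offline = AggregateLookupTransformer(lookup=dict(rows))
blob = offline.dumps()                            # bytes you would archive
online = AggregateLookupTransformer.loads(blob)   # restored at serving time

keys = ["u1", "u3", "u2"]
assert offline.transform(keys) == online.transform(keys)
```

Because the training and serving paths go through the same `transform` and the same pickled lookup, the features match by construction, which is the parity argument above.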
