

IWantToRelocate | 4 years ago

Thank you! We will use the data for customer segmentation. For that, we will need to consolidate all of a user's sources into a single entity per user, and then link that entity to another table with their purchases so we can do the segmentation. I'm not sure about latency, but honestly that's not really important at the moment. I just need a strategy for an end-to-end solution: gathering the data, transforming it, and delivering a concise and coherent "user table" for the machine learning dude.
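
A minimal sketch of that consolidation step, assuming every source record already carries a shared `user_id` key (in practice, matching identities across sources is often the hard part). The source lists, field names, and the `build_user_table` helper are hypothetical, for illustration only.

```python
from collections import defaultdict

# Hypothetical raw records from two sources; assumes a shared user_id key.
crm_users = [
    {"user_id": 1, "email": "a@example.com", "country": "PT"},
    {"user_id": 2, "email": "b@example.com", "country": "DE"},
]
web_users = [
    {"user_id": 1, "last_login": "2022-01-10"},
    {"user_id": 3, "last_login": "2022-01-12"},
]
purchases = [
    {"user_id": 1, "amount": 30.0},
    {"user_id": 1, "amount": 12.5},
    {"user_id": 2, "amount": 99.0},
]


def build_user_table(sources, purchases):
    """Merge per-source records into one row per user, then attach purchase aggregates."""
    users = defaultdict(dict)
    for source in sources:
        for record in source:
            # Later sources fill in (or overwrite) fields for the same user_id.
            users[record["user_id"]].update(record)

    # Initialise purchase features so users without purchases still get a row.
    for row in users.values():
        row["order_count"] = 0
        row["total_spent"] = 0.0

    # Link purchases to users; .get() does not create rows for unknown user_ids.
    for p in purchases:
        row = users.get(p["user_id"])
        if row is not None:
            row["order_count"] += 1
            row["total_spent"] += p["amount"]
    return dict(users)


user_table = build_user_table([crm_users, web_users], purchases)
for user_id, row in sorted(user_table.items()):
    print(user_id, row)
```

Purchases whose `user_id` never appears in any source are dropped here; whether to keep them as "unknown user" rows is a design decision for the segmentation use case.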


stadium | 4 years ago

You're probably looking at a users dimension table. There are different "types" of update strategies for dimension tables (usually called slowly changing dimension, or SCD, types). Some keep the history (e.g. Type 2), others only keep the current state (e.g. Type 1). I'd start by figuring out which type meets your stakeholders' needs.
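
A rough sketch of the two most common strategies, Type 1 (overwrite in place) and Type 2 (keep history as versioned rows), using plain Python lists as a stand-in for the dimension table. The column names `valid_from`, `valid_to`, and `is_current` are a common convention assumed here, not a requirement.

```python
from datetime import date


def scd_type1_upsert(dim, user_id, attrs):
    """Type 1: overwrite attributes in place; no history is kept."""
    for row in dim:
        if row["user_id"] == user_id:
            row.update(attrs)
            return
    dim.append({"user_id": user_id, **attrs})


def scd_type2_upsert(dim, user_id, attrs, as_of):
    """Type 2: close the current row and append a new version when attributes change."""
    current = next(
        (r for r in dim if r["user_id"] == user_id and r["is_current"]), None
    )
    if current is not None:
        # Unchanged attributes: do nothing, so repeated loads don't create new versions.
        if all(current.get(k) == v for k, v in attrs.items()):
            return
        current["valid_to"] = as_of
        current["is_current"] = False
    dim.append(
        {"user_id": user_id, **attrs, "valid_from": as_of, "valid_to": None, "is_current": True}
    )


type1, type2 = [], []
scd_type1_upsert(type1, 1, {"country": "PT"})
scd_type1_upsert(type1, 1, {"country": "DE"})
scd_type2_upsert(type2, 1, {"country": "PT"}, date(2022, 1, 1))
scd_type2_upsert(type2, 1, {"country": "DE"}, date(2022, 2, 1))

print(type1)  # one row: the old "PT" value is gone
print(type2)  # two rows: the closed "PT" version and the current "DE" version
```

Type 2 lets the segmentation be rerun "as of" a past date, at the cost of filtering on `is_current` (or the validity range) in every query.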

Usually I'd recommend bringing the raw data into your database first, before transforming it. It's hard, if not impossible, to predict future needs, and this buys you flexibility. "ELT" (extract, load, transform) describes this approach, as opposed to ETL.
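
A minimal ELT sketch using Python's built-in `sqlite3` as a stand-in for the real database or warehouse: raw rows are landed untouched in a staging table, and the transformation happens afterwards in SQL. The table names `raw_events` and `user_summary` and the example rows are hypothetical.

```python
import sqlite3

# Extracted rows, loaded as-is (hypothetical example data).
extracted = [
    (1, "purchase", 30.0),
    (1, "purchase", 12.5),
    (2, "signup", None),
]

conn = sqlite3.connect(":memory:")

# Load: land the raw rows untouched in a staging table.
conn.execute("CREATE TABLE raw_events (user_id INTEGER, event TEXT, amount REAL)")
conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", extracted)

# Transform: build a modeled table from the raw data inside the database.
# If requirements change, drop and rebuild it from raw_events without re-extracting.
conn.execute(
    """
    CREATE TABLE user_summary AS
    SELECT user_id,
           SUM(CASE WHEN event = 'purchase' THEN 1 ELSE 0 END) AS purchases,
           COALESCE(SUM(CASE WHEN event = 'purchase' THEN amount END), 0) AS total_spent
    FROM raw_events
    GROUP BY user_id
    """
)

rows = conn.execute("SELECT * FROM user_summary ORDER BY user_id").fetchall()
print(rows)
```

Because `raw_events` is kept, a new question from the machine learning side becomes another SQL transformation over the same data rather than a new extraction job.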

Good luck!