item 16868624


eshvk | 7 years ago

I'm not really sure what your central question is, then. The general theme for building a recommender system architecture is:

1. Decide whether you are okay with a batch approach, an online learning approach, or a hybrid.

2. Start simple with a batch approach (similar to what you are doing):

a) Get features ready from your dataset (assuming you have interaction data): pre-processing via some big data framework (MapReduce, Dataflow, etc.)

b) Build a vector space and nearest-neighbor data structures.

c) Stick both into a database optimized for reads.

d) Stick a service in front of it and serve.
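For concreteness, here's a toy end-to-end sketch of steps a)-d). The interaction matrix and k are made up, and numpy stands in for both the big-data pre-processing and the read-optimized database:

```python
import numpy as np

# a) Hypothetical user-item interaction matrix (rows: users, cols: items).
interactions = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# b) Item vectors: each item is represented by the users who touched it;
#    cosine similarity gives the nearest-neighbor structure.
item_vecs = interactions.T
unit = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
sims = unit @ unit.T

# c) Precompute top-k neighbors per item; this table is what you would
#    write into a database optimized for reads.
k = 2
top_k = {}
for i in range(sims.shape[0]):
    order = np.argsort(-sims[i])
    top_k[i] = [int(j) for j in order if j != i][:k]

# d) "Serving" is then just a key lookup behind a thin service.
print(top_k[0])  # neighbors of item 0
```

The same shape scales up: swap the toy matrix for your real interaction data, the cosine loop for an ANN library, and the dict for your read store.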

Once you are happy with 2, you can try out variations, such as online updates to your recommender system, which in turn change the type of database you might want to optimize for, etc.


n_siddharth | 7 years ago

I guess I should elaborate a little more on what I am looking for. I already have a hybrid approach working where batch processing informs and improves the ALS models, and these models are stored in memory to do some (near) real-time recommendations. The attractive bit about using something like Solr is that user behavior on the app/front-end is easily modeled in terms of query parameters that could help serve better recommendations and also improve the model. It also seems to be the most commonly used approach based on what I have seen. I am wondering if there are other ways of doing this. In the broader sense, I guess my question is: what are the next steps to build on this basic recommendation system? What is a good way to serve recommendations based on user behavior in near-real time, and how do these systems take feedback to improve the models?
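The "behavior as query parameters" idea might look something like this sketch: recent clicks are turned into boost clauses for Solr's eDisMax parser. `defType` and `bq` are standard Solr parameters, but the field names (`category`) and the weighting scheme are made up for illustration:

```python
def build_solr_params(recent_clicks):
    """Turn recent user behavior into hypothetical Solr boost queries."""
    # Count which categories the user clicked recently.
    counts = {}
    for item in recent_clicks:
        cat = item["category"]
        counts[cat] = counts.get(cat, 0) + 1
    # Boost documents in frequently clicked categories, weighted by count.
    bq = [f'category:"{cat}"^{n}' for cat, n in sorted(counts.items())]
    return {
        "defType": "edismax",  # Solr's extended DisMax query parser
        "q": "*:*",
        "bq": bq,              # boost queries derived from behavior
        "rows": 10,
    }

params = build_solr_params([
    {"category": "shoes"}, {"category": "shoes"}, {"category": "bags"},
])
print(params["bq"])
```

The nice property is that the ranking reacts to behavior immediately, without retraining anything; the same click stream can still feed the batch ALS job.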

eshvk | 7 years ago

> In the broader sense, I guess my question is: what are the next steps to build on this basic recommendation system? What is a good way to serve recommendations based on user behavior in near-real time, and how do these systems take feedback to improve the models?

In the past, I have helped build Lambda architectures where we used a batch model to build a content vector space, built user estimates in batch, and updated those estimates in real time (using Pub/Sub or Kafka) based on user feedback.
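The speed-layer update can be very simple. A minimal sketch, assuming batch has already produced item vectors and an initial user estimate; each feedback event (as it would arrive off Kafka/Pub/Sub) nudges the user vector, and the learning rate is an assumed value:

```python
import numpy as np

item_vecs = {                       # from the batch content vector space
    "a": np.array([1.0, 0.0]),
    "b": np.array([0.0, 1.0]),
}
user_vec = np.array([0.5, 0.5])     # batch estimate of the user

ALPHA = 0.2  # learning rate for online updates (assumed value)

def on_feedback(user_vec, item_id, reward):
    """Move the user estimate toward (reward > 0) the item's vector."""
    return user_vec + ALPHA * reward * (item_vecs[item_id] - user_vec)

# User clicks item "a" (reward +1): the estimate shifts toward "a",
# so subsequent dot-product scoring favors similar content.
user_vec = on_feedback(user_vec, "a", 1.0)
print(user_vec.round(2))
```

The batch job periodically overwrites these incrementally-updated vectors with a clean recomputation, which is the usual Lambda reconciliation.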

Other online mechanisms could use contextual bandits: e.g., the context is the user's interactions, and the several arms of the bandit are the recommendation choices. This interaction data can be used to continuously improve your policy. The key benefit over a matrix factorization setup, where the interaction matrix is continuously rebuilt over time based on new data, is the built-in exploration, which minimizes regret.
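A minimal LinUCB sketch of that setup (one standard contextual-bandit algorithm; the two arms, the context vector, and the reward rule below are toy assumptions): each arm keeps a ridge-regression estimate of reward, and the UCB bonus is the built-in exploration:

```python
import numpy as np

class LinUCB:
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        # Per-arm ridge-regression state: A = I + sum(x x^T), b = sum(r x).
        self.A = [np.eye(dim) for _ in range(n_arms)]
        self.b = [np.zeros(dim) for _ in range(n_arms)]

    def select(self, context):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                           # reward estimate
            ucb = self.alpha * np.sqrt(context @ A_inv @ context)
            scores.append(theta @ context + ucb)        # estimate + bonus
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context

# Toy loop: arm 1 always pays off for this context, so the policy
# learns to pick it after a brief exploration phase.
bandit = LinUCB(n_arms=2, dim=2)
ctx = np.array([1.0, 0.0])
for _ in range(200):
    arm = bandit.select(ctx)
    reward = 1.0 if arm == 1 else 0.0
    bandit.update(arm, ctx, reward)
print(bandit.select(ctx))
```

In a recommender, the context would be user/session features and each arm a candidate item or ranking strategy; every served recommendation's feedback updates the policy in place, no matrix rebuild needed.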

chudi | 7 years ago

This is a really good answer; at my job we use this approach.