(no title)
sea6ear | 1 year ago
I think that given that you are not really dealing with structured data - you've said that different sites have different structures, and I assume even with processing, you may not be able to generate identical metadata structures from each entry.
I think I would go for one column of XML, plus maybe another column that holds a parsed data structure that represents the result of your processing (basically a cache holding the post-processed version of each site). Hopefully that could be re-evaluated by whatever language (Python?) you are using for your application. That way you don't have to do the full parsing each time you want to examine the entry, but you have access to something that can quickly give you whatever metadata is associated with it, but which doesn't tie you to the rigid structure of a table based database.
Once you know what you are really doing with the data, then you could add additional metadata columns that are more rigid, and which can be queried directly in SQL as you identify patterns that are useful for performance.
vivzkestrel|1 year ago