
Ask HN: Does the idea of a "reduce-map" function make any sense?

1 point | dhs | 16 years ago

Describing Google Search, I would say that it takes a formatted string of words and returns a set of (links to) documents. I want to design a program that does the opposite: take an arbitrary text document (and maybe, at some point, a set of them), compute a formatted string of words, and return that. This would work by passing the text of the input document to a hierarchy of pattern matchers, looking for and extracting certain values and their relations. Next comes a function that takes these values and relations and knows how to distribute these data to the appropriate subfunctions, each of which computes and returns a substring; the parent then puts the substrings in the right order, formatting them for the return.
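
A minimal Python sketch of the pipeline described above, under heavy assumptions: the regular expressions, the field names (title, author, year) and the functions extract, render_* and describe are all hypothetical; the hierarchy of matchers is flattened to a single level, and relations between values are left out. It only shows the shape of the idea: matchers extract values, a dispatcher routes each kind of value to its own subfunction, and the parent orders and joins the substrings.

```python
import re

# Hypothetical pattern matchers: each one extracts a single kind of value.
MATCHERS = {
    "title": re.compile(r'"([^"]+)"'),
    "author": re.compile(r"\bby ([A-Z][a-z]+ [A-Z][a-z]+)"),
    "year": re.compile(r"\b(1[89]\d\d|20\d\d)\b"),
}


def extract(text):
    """Run every matcher over the text and collect the values each one finds."""
    return {name: pattern.findall(text) for name, pattern in MATCHERS.items()}


# Hypothetical subfunctions: each turns one kind of value into a substring
# (or an empty string if nothing of that kind was found).
def render_title(values):
    return f"title:{values[0]}" if values else ""


def render_author(values):
    return f"author:{values[0]}" if values else ""


def render_year(values):
    # If several years appear, report the earliest one.
    return f"year:{min(values)}" if values else ""


# The dispatcher: which subfunction handles which kind of value, and the
# order in which the parent places the resulting substrings.
DISPATCH = [("title", render_title), ("author", render_author), ("year", render_year)]


def describe(text):
    values = extract(text)
    parts = [render(values[name]) for name, render in DISPATCH]
    return " ".join(part for part in parts if part)


print(describe('The report "Reduce and Map" by Jane Doe appeared in 2010.'))
# → title:Reduce and Map author:Jane Doe year:2010
```

Keeping the dispatch table as data rather than hard-coded calls is one way to let the parent control ordering independently of what each subfunction computes.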

I'm looking for a name for this concept, and wondered whether "reduce-map" would be a good one; maybe I could say that I reduce the document to a function, which returns a map (the formatted string). To find out whether a "reduce-map" moniker had any currency, and if it did, in which context, I googled program "reduce-map":

http://www.google.com/search?hl=en&q=program+%22reduce-map%22

But because Google doesn't search for an exact string or substring even when you put it in double quotes (which it does seem to promise; compare 'M.I.A.'s album "/\/\/\Y/\" is ungoogleable', http://news.ycombinator.com/item?id=1363489), what I got back was a slew of ordinary map-reduce tutorials. So I still don't know whether "reduce-map" would mean to other people what I want it to mean. I would be thankful for your take on that.

rarestblog | 16 years ago
It's really unclear what you're trying to achieve. My wild guess would be that you're trying to build a Markov chain generator (generating random text from a sample base text).

Other than that, here are the unclear parts:

"to a hierarchy of pattern matchers, looking for and extracting certain values and their relations"

What are the "certain values" and how are they "related"?

"takes these values and relations and knows how to distribute these data to the appropriate subfunctions"

What are those "appropriate subfunctions", and what do they do? How do they differ, so that the function can "know" where to send each piece?

The concept of "reduce-map" doesn't seem to make any sense to me. It's like taking a word frequency (output of MapReduce) and building original text with that? You just don't have data to do that.
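
To make the "you don't have the data" point concrete, here is a small Python illustration (the function name and example sentences are made up for this purpose, and splitting on whitespace is an assumed tokenization): two different sentences produce exactly the same word-frequency table, so the table alone cannot tell you which text it came from.

```python
from collections import Counter


def word_frequency(text):
    """Count words, roughly what a word-count MapReduce job outputs."""
    return Counter(text.lower().split())


original_a = "the dog bit the man"
original_b = "the man bit the dog"

# Different texts, identical frequency tables: word order is not in the data.
print(word_frequency(original_a) == word_frequency(original_b))  # True
```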

BTW, Google searches for a double-quoted "reduce map" just fine; it's just that there's no such thing as "reduce map". You collect some data and output it piece by piece ("map"), then you aggregate it by key ("reduce"). "Map" often works as a "splitter" or a "tokenizer", so it doesn't make sense to supply aggregated ("reduced") data to it, since aggregated data is already "tokenized".
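
A single-process Python sketch of the map / group-by-key / reduce pattern described above, using word count as the example. The function names are hypothetical, and in a real MapReduce framework the grouping ("shuffle") step and the distribution across machines are handled by the framework; here they are plain in-memory loops.

```python
from collections import defaultdict


def map_phase(document):
    """Map: split the input into pieces and emit one (key, value) pair per word."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group the emitted values by key (the framework does this in real MapReduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate all values emitted for one key."""
    return key, sum(values)


def map_reduce(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(map_reduce(["the dog bit the man", "the end"]))
# → {'the': 3, 'dog': 1, 'bit': 1, 'man': 1, 'end': 1}
```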

dhs | 16 years ago
Thanks a lot. Your "It's like taking a word frequency (output of MapReduce) and building original text with that? You just don't have data to do that" made me understand what doesn't make sense to you. Your last paragraph made me understand why it doesn't make sense to you. The weird thing is that, from my point of view, your "you don't have the data" statement is not really true, because in my model I have an oracle (more specifically, one or more human authors) that supplies the missing data (before compilation), so I can in fact "build original text" from the input. Now I'm looking for a name that describes this process. Any suggestions?