cheatsheet|11 years ago
Can someone knowledgeable in graphics research explain the context that this question comes from?
If I am reading the question correctly, it suggests that there exists a right way to reproduce the visual experience of reality. To me, this sounds like a question that could equally well have no answer (or many answers), as in aesthetics, art, philosophy, etc.
rasz_pl|11 years ago
It's Plato's Allegory of the Cave all the way down.
Imagine "watching" a movie compressed using your very own prior knowledge. Every scene could be described in a couple hundred lines of plain text. Today we do this by reading a book :) What if we could build an algorithm able to render movies from books?
sedachv|11 years ago
Bob Coyne has been working on a system for generating images of still scenes from text descriptions for about 15 years now:
https://www.wordseye.com/ http://www.cs.columbia.edu/~coyne/papers/wordseye_siggraph.p...
TheGrassyKnoll|11 years ago
"The world is such and such or so and so, only because we talk to ourselves about its being such and such and so and so..." Carlos Castaneda
throwawaymaroon|11 years ago
[deleted]
tel|11 years ago
For a long time statisticians wrangled over this word in a reduced context. The "art" of statistics is to build a model of the world which is sufficiently detailed to capture interesting data, but not so detailed as to make it difficult for a human decision-maker to interpret. Statisticians usually solve this problem by building a lot of models, getting lucky, presenting things to people, and seeing what sticks.
For a long time this lack of a notion of "rightness" was so powerful that it precluded advancement of the field in certain ways.
With the advent of computers we discovered a new, even more precise form of "right", however, and this formed the bedrock of Machine Learning. The "right" ML is concerned with is predictive power. A model is "right" when it leads to a training and prediction algorithm which is "probably approximately correct" (PAC), i.e. you can feed real data in and end up with something useful (with a high degree of probability).
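To make the "right = predictive power" idea concrete, here is a toy sketch (all data and models are hypothetical, invented for illustration): two models are fit on noisy linear data, and the "right" one is simply the one with lower error on held-out points.

```python
import random

random.seed(0)

# Toy data: y = 2x + noise. Split alternating points into train/test.
xs = [i / 10 for i in range(100)]
ys = [2 * x + random.gauss(0, 0.1) for x in xs]
train = list(zip(xs[::2], ys[::2]))
test = list(zip(xs[1::2], ys[1::2]))

def mse(predict, data):
    """Mean squared error of a predictor on held-out data."""
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

# Model A: always predict the training mean (too coarse to capture structure).
mean_y = sum(y for _, y in train) / len(train)
model_a = lambda x: mean_y

# Model B: ordinary least-squares line fit (captures the structure).
n = len(train)
sx = sum(x for x, _ in train); sy = sum(y for _, y in train)
sxx = sum(x * x for x, _ in train); sxy = sum(x * y for x, y in train)
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n
model_b = lambda x: slope * x + intercept

# The "right" model is the one with better held-out predictive power.
print(mse(model_a, test) > mse(model_b, test))  # True
```

Nothing here says model B is "true"; it is "right" only in the sense that it predicts unseen data better, which is the criterion the comment describes.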
So with respect to computer vision, we know that it is very difficult to build "efficient" algorithms, ones which work well while using a reasonable amount of training data. CV moved forward when it realized that there were representations of the visual field which led to better predictive power---these were originally derived from studying the visual cortex of human and animal brains, but more recently have been generated "naively" by computers.
So, there's a reasonably well-defined way we can find the "right" representation of visual scenes: if we find one which is best-in-class among all representations for any choice of ML task, then it's "right".
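A minimal sketch of why the representation, not the downstream learner, can decide predictive power (the task and features are invented for illustration): the same trivial threshold classifier fails on a raw coordinate but succeeds perfectly on a task-aligned radial feature.

```python
import math, random

random.seed(1)

# Toy task: label points by whether they fall inside the unit circle.
pts = [(random.uniform(-2, 2), random.uniform(-2, 2)) for _ in range(1000)]
labels = [math.hypot(x, y) < 1.0 for x, y in pts]

def best_threshold_accuracy(features, labels):
    """Best accuracy achievable by thresholding a single feature."""
    best = 0.0
    for t in sorted(set(features)):
        correct = sum((f < t) == l for f, l in zip(features, labels))
        best = max(best, correct / len(labels),
                   (len(labels) - correct) / len(labels))
    return best

raw = [x for x, _ in pts]                    # raw coordinate: poor representation
radial = [math.hypot(x, y) for x, y in pts]  # representation aligned with the task

print(best_threshold_accuracy(raw, labels))     # stuck well below perfect
print(best_threshold_accuracy(radial, labels))  # separates the classes exactly
```

The classifier never changes; only the representation does, which is the sense in which a representation can be "best-in-class" for a task.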
darkmighty|11 years ago
So in some sense optimal compression gives the best you could hope for, up to the limitations of the probabilistic models, which is why I like this explanation.
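The compression view can be illustrated with a toy experiment (the "scenes" are invented byte strings, and zlib stands in for whatever probabilistic model the system has): a structured description compresses dramatically, while incompressible noise barely shrinks.

```python
import hashlib
import zlib

# Two "scenes" of comparable size: one highly structured, one noise-like.
structured = b"two wooden chairs 1m apart; " * 100            # redundant description
noise = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(88))

# A model that captures the scene's structure yields a much shorter code.
print(len(structured), len(zlib.compress(structured, 9)))  # shrinks drastically
print(len(noise), len(zlib.compress(noise, 9)))            # barely shrinks
```

The compressed length is a crude proxy for how much of the scene's structure the model "understands"; a better model of scenes would yield shorter codes still.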
darkmighty|11 years ago
For example, if you can extract a 'mesh' from a 2D picture, you can generate many other viewpoints, so that mesh can be considered a good representation. If you are more sophisticated, however (and perhaps have a larger "dictionary"), you can instead extract 'There are two wooden chairs 1m from each other, ...'.
That's the sense in which the representation is fundamental to computer vision -- it distills what the system knows (or what it wants to know) about scenes. The more concise the representation without loss of information, the smarter your system is (and past a point this becomes a general AI problem).
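The "concise without loss of information" point can be sketched with hypothetical toy data: a raw point-by-point listing versus a distilled description that regenerates the same scene exactly in far fewer bytes.

```python
# Hypothetical scene: 1000 points that happen to lie exactly on a line.
xs = list(range(1000))
ys = [2 * x + 1 for x in xs]

# Representation 1: enumerate every point (a "mesh"-like raw listing).
raw_description = repr(list(zip(xs, ys)))

# Representation 2: a distilled description of the same scene.
model_description = "y = 2*x + 1 for x in range(1000)"

# The concise form regenerates the data exactly -- no information lost --
# using a tiny fraction of the bytes.
reconstructed = [2 * x + 1 for x in range(1000)]
assert reconstructed == ys
print(len(model_description), len(raw_description))
```

The gap between the two lengths is a rough measure of how much structure the system has distilled out of the scene.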
nabla9|11 years ago
See for example:
Natural Image Statistics - A probabilistic approach to early computational vision https://www.cs.helsinki.fi/u/ahyvarin/natimgsx/