(no title)
rckoepke|3 years ago
Are there more appropriate algorithms for finding convex hulls where dimensions are ~768? Or any parallelized / GPU-optimized options that I should look into?
blamestross|3 years ago
You are going to have to roll your own.
One trick you can use is that most convex hull algorithms chase O(n log n). That log n factor implies a branching step, which lowers efficiency on GPUs. At dimensions that high, the constant factors likely mean an O(n^2) branchless algorithm (e.g. gift wrapping / Jarvis march) could run faster on a GPU.
Cull points aggressively too, for what little that is worth in high dimensions.
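One concrete branchless primitive along these lines (a sketch in NumPy, not from the linked paper): for any direction vector, the input point with the maximum dot product is guaranteed to be a hull vertex, so a batch of random directions finds vertices with a single matmul plus argmax.

```python
import numpy as np

def support_vertices(points, k=64, seed=0):
    """Branchless hull-vertex sampling.

    For each of k random directions, the point maximizing the dot
    product is a guaranteed hull vertex. One matmul + argmax, so it
    maps well to a GPU. Note: this finds *some* vertices; it does
    not enumerate the full hull.
    """
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(k, points.shape[1]))  # random directions
    return np.unique(np.argmax(points @ dirs.T, axis=0))
```

Interior points can never be an argmax, so the returned indices are always true hull vertices; how many you recover depends on k.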
I found https://www.sciencedirect.com/science/article/abs/pii/S01678... which looks like it could be a starting point.
The real problem is that in dimensions that high, nearly every point is probably already a hull vertex, making the whole thing a zero-signal-gain operation.
rckoepke|3 years ago
Well, if I have 10,000 samples of a 768-dimension volume, most of those points will probably be interior to the volume, not vertices of the hull.
I’m very comfortable rolling my own solution, so thank you for pointing me to Jarvis’ algorithm!
apwheele|3 years ago
There may be more general expected-value formulas for higher dimensions; offhand I only know of examples in two dimensions, https://blogs.sas.com/content/iml/2021/12/06/expected-number....
There may be smarter reduced form embeddings though to make pretty pictures, e.g. https://www.youtube.com/watch?v=sD-uDZ8zXkc&ab_channel=Cynth....
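For the two-dimensional case, the classical Rényi–Sulanke result says the expected number of hull vertices for n uniform points in a square grows like (8/3) ln n, which is easy to sanity-check by Monte Carlo (a sketch using scipy.spatial.ConvexHull; the function name and trial count are my own choices):

```python
import numpy as np
from scipy.spatial import ConvexHull

def expected_hull_size(n, trials=100, seed=0):
    """Monte Carlo estimate of E[# hull vertices] for n points
    drawn uniformly from the unit square. In 2-D this grows
    like (8/3) * ln(n), i.e. very slowly."""
    rng = np.random.default_rng(seed)
    sizes = [ConvexHull(rng.random((n, 2))).vertices.size
             for _ in range(trials)]
    return float(np.mean(sizes))
```

The slow logarithmic growth is exactly what breaks down in high dimensions, where the expected fraction of interior points collapses toward zero.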
rcme|3 years ago
Wikipedia has more on this: https://en.wikipedia.org/wiki/Curse_of_dimensionality
> This is often cited as distance functions losing their usefulness (for the nearest-neighbor criterion in feature-comparison algorithms, for example) in high dimensions. However, recent research has shown this to only hold in the artificial scenario when the one-dimensional distributions ℝ are independent and identically distributed.[13] When attributes are correlated, data can become easier and provide higher distance contrast and the signal-to-noise ratio was found to play an important role, thus feature selection should be used.
> More recently, it has been suggested that there may be a conceptual flaw in the argument that contrast-loss creates a curse in high dimensions. Machine learning can be understood as the problem of assigning instances to their respective generative process of origin, with class labels acting as symbolic representations of individual generative processes. The curse's derivation assumes all instances are independent, identical outcomes of a single high dimensional generative process. If there is only one generative process, there would exist only one (naturally occurring) class and machine learning would be conceptually ill-defined in both high and low dimensions. Thus, the traditional argument that contrast-loss creates a curse, may be fundamentally inappropriate. In addition, it has been shown that when the generative model is modified to accommodate multiple generative processes, contrast-loss can morph from a curse to a blessing, as it ensures that the nearest-neighbor of an instance is almost-surely its most closely related instance. From this perspective, contrast-loss makes high dimensional distances especially meaningful and not especially non-meaningful as is often argued.