top | item 41226329

max_likelihood | 1 year ago

I've always thought the use of "Tensor" in the "TensorFlow" library is a misnomer. I'm not too familiar with ML theory: is there a deeper geometric meaning to the multi-dimensional arrays of numbers we are multiplying, or would "MatrixFlow" be a more appropriate name?


adrian_b|1 year ago

Since the beginning of computer technology, "array" has been the term used for any multi-dimensional collection of numbers, with "vectors" and "matrices" being special kinds of arrays. An exception was COBOL, which had a completely different terminology from the other programming languages of its time. Among the long list of differences between COBOL and the rest were e.g. "class" instead of "type" and "table" instead of "array". Some of the COBOL terminology was inherited by languages like SQL or Simula 67 (hence the use of "class" in OOP languages).

A "tensor", as used in mathematics and physics, is not just any array: it is a special kind of array, which is associated with a certain coordinate system and which is transformed by special rules whenever the coordinate system is changed.

The "tensor" in TensorFlow is a fancy name for what should be called just "array". When an array is two-dimensional, "matrix" is an appropriate name for it.

twothreeone|1 year ago

I agree. Just like NumPy's einsum. "Multi-Array Flow" doesn't sound sexy, and associating your project with a renowned physicist's name gives it that "we solve big science problems" vibe by association. Very pretentious, very predictable, and very cringe.

MathMonkeyMan|1 year ago

The joke I learned in a Physics course is "a vector is something that transforms like a vector," and "a tensor is something that transforms like a tensor." It's true, though.

The physicist's tensor is an array of functions of the coordinates, whose values transform in a prescribed way when the coordinates are transformed. It's a particular application of the chain rule from calculus.

I don't know why the word "tensor" is used in other contexts. Google says that the etymology of the word is:

> early 18th century: modern Latin, from Latin tendere ‘to stretch’.

So maybe the different senses of the word share the analogy of scaling matrices.
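The "transforms like a vector/tensor" rule above can be sketched in a few lines of NumPy for the simplest case, a linear change of coordinates (for a nonlinear coordinate map the Jacobian takes the place of the matrix A below; the specific numbers are just made up for illustration):

```python
import numpy as np

# A linear change of coordinates x' = A x (A invertible).
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# Contravariant components (e.g. a displacement) transform with A itself...
v = np.array([1.0, 2.0])
v_new = A @ v

# ...while covariant components (e.g. a gradient) transform with the
# inverse transpose, so that the scalar pairing w . v stays the same.
w = np.array([4.0, -1.0])
w_new = np.linalg.inv(A).T @ w

print(w @ v)          # pairing in the old coordinates
print(w_new @ v_new)  # pairing in the new coordinates: same number
```

The point of "transforms like a tensor" is exactly that the last two numbers agree: the components change, but the coordinate-free quantity they encode does not.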

ogogmad|1 year ago

The mathematical definition is 99% equivalent to the physical one. I find that the physical one helps to motivate the mathematical one by illustrating the numerical difference between the basis-change transformation for (1,0)- and (0,1)-tensors. The mathematical one is then simpler and more conceptual once you've understood that motivation. The concept of a tensor really belongs to linear algebra, but occurs mostly in differential geometry.

There is still a "1% difference" in meaning though. This difference allows a physicist to say "the Christoffel symbols are not a tensor", while a mathematician would say this is a conflation of terms.

TensorFlow's terminology is based on the rule of thumb that a "vector" is really a 1D array (think column vector), a "matrix" is really a 2D array, and a "tensor" is then an nD array. That's it. This is offensive to physicists especially, but ¯\_(ツ)_/¯
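That rule of thumb is easy to see with plain NumPy, whose ndim plays the same role as tf.rank (a sketch, not TF-specific; the shapes are arbitrary):

```python
import numpy as np

scalar = np.array(3.14)              # rank 0: a single number
vector = np.array([1.0, 2.0, 3.0])   # rank 1: a 1-D array
matrix = np.ones((2, 3))             # rank 2: a 2-D array
tensor = np.zeros((4, 4, 3))         # rank 3: an n-D array with n > 2

for name, a in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("tensor", tensor)]:
    print(name, a.ndim, a.shape)
```

No transformation law anywhere in sight: "rank" here is just the number of axes.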

Koshkin|1 year ago

> something that transforms

Well, they don't, it is their components that do (under a change of the coordinate system).

itishappy|1 year ago

The tensors in tensorflow are often higher dimensional. Is a 3d block of numbers (say 1920x1080x3) still a matrix? I would argue it's not. Are there transformation rules for matrices?

You're totally correct that the tensors in tensorflow drop the geometric meaning, but there's precedent there, given how CS and math folks already use "vector" differently.
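One way to make the point concrete in NumPy (whose semantics here mirror TensorFlow's, using a tiny block instead of 1920x1080x3): there is no single "matrix product" for a 3-D block of numbers, only a batched one over its 2-D slices, while elementwise operations extend to any number of axes unchanged.

```python
import numpy as np

# A tiny "image": height x width x channels, a 3-D block of numbers.
img = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)

# NumPy's @ treats the leading axis as a batch and multiplies
# the trailing 3x3 slices; multiplying by the identity returns img.
M = np.eye(3)
batched = img @ M          # shape (2, 3, 3): two independent 3x3 products
print(batched.shape)

# Elementwise operations need no such convention:
doubled = 2 * img
print(doubled.shape)
```

That a convention (batching) has to be chosen at all is the sign that a 3-D block is not a matrix.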

andrewla|1 year ago

Matrices are strictly two-dimensional arrays (together with some other properties, but for a computer scientist that's it). Tensors are the generalization to higher dimensional arrays.

blt|1 year ago

There is no geometric meaning. It's a really bad name.

dannymi|1 year ago

In the first example on https://www.tensorflow.org/api_docs/python/tf/math/multiply you can see that they use the Hadamard product (not the matrix product):

    x = tf.constant([1, 2, 3, 4])
    tf.math.multiply(x, x)
    <tf.Tensor: shape=(4,), dtype=..., numpy=array([ 1,  4,  9, 16], dtype=int32)>
I could stop right here, since that's a counterexample to x being a matrix with a matrix product defined on it (P.S. try tf.matmul(x, x)--it will fail on a rank-1 tensor; there's no .transpose method either). But that's only technically correct :)

So let's look at tensorflow some more:

If the tensorflow tensors deserve the name, they should transform correctly under a change of coordinate system.

In order to see that, let's do a change of coordinate system. To summarize the stuff below: if L1 and W12 are indeed tensors, it should be true that (W12 A^-1)(A L1) = W12 L1 for any invertible A.

Try it (in tensorflow) and see whether the new tensor obeys the tensor laws after the transformation. Interpret the changes to the nodes as covariant and the changes to the weights as contravariant:

    import tensorflow as tf
    # Initial outputs of one layer of nodes in your neural network
    L1 = tf.constant([2.5, 4, 1.2], dtype=tf.float32)
    # Our evil transformation matrix (coordinate system change)
    A = tf.constant([[2, 0, 0], [0, 1, 0], [0, 0, 0.2]], dtype=tf.float32)
    # Weights (no particular values; "random")
    W12 = tf.constant(
        [[-1, 0.4, 1.5],
         [0.8, 0.5, 0.75],
         [0.2, -0.3, 1]], dtype=tf.float32
    )
    # Covariant tensor nature; varying with the nodes
    L1_covariant = tf.matmul(A, tf.reshape(L1, [3, 1]))
    A_inverse = tf.linalg.inv(A)
    # Contravariant tensor nature; varying against the nodes
    W12_contravariant = tf.matmul(W12, A_inverse)
    # Now derive the inputs for the next layer using the transformed node outputs and weights
    L2 = tf.matmul(W12_contravariant, L1_covariant)
    # Compare to the direct way
    L2s = tf.matmul(W12, tf.reshape(L1, [3, 1]))
    # The transformed computation must agree with the direct one (up to float error)
    assert tf.reduce_all(tf.abs(L2 - L2s) < 1e-5)
A tensor (like a vector) is actually a very low-level object from the standpoint of linear algebra. It's not hard at all to make something a tensor. Think of it like geometric "assembly language".

In comparison, a matrix is rank 2 (and not all matrices represent tensors). That's it. No rank 3, rank 4, rank 1 (!!). So how far does a matrix really get you?

If you mean that the operations in tensorflow (and numpy before it) aren't beautiful or natural, I agree. It still works, though. If you want to stick to ASCII and have no indices on names, you can't do much better (otherwise, use Cadabra[1]--which is great). For example, it was really difficult to write the stuff above without using indices, and it's really not beautiful this way :(

More detail on https://medium.com/@quantumsteinke/whats-the-difference-betw...

See also http://singhal.info/ieee2001.pdf (and its references) for a primer on information retrieval, which uses the vector spaces with an inner product that are common in ML. Those are definitely geometric.

[1] https://cadabra.science/ (also in mogan or texmacs) - Einstein field equations also work there and are beautiful
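The geometry behind that information-retrieval reference boils down to one operation: cosine similarity between vectors in an inner-product space. A minimal sketch, with toy term-count vectors made up for illustration:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: <u, v> / (|u| |v|)."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy term-count vectors for two documents over a shared vocabulary.
doc_a = np.array([3.0, 0.0, 1.0, 2.0])
doc_b = np.array([1.0, 1.0, 0.0, 2.0])

print(cosine_similarity(doc_a, doc_b))  # between 0 and 1 for nonnegative counts
print(cosine_similarity(doc_a, doc_a))  # a vector has similarity 1 with itself
```

The "angle between documents" is geometry in the literal sense, even though no coordinate-change laws are involved.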

andrewla|1 year ago

In TensorFlow the tf.matmul function or the @ operator performs matrix multiplication. Element-wise multiplication ends up being useful for a lot of parallelizable computation but should not be confused with matrix multiplication.
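The distinction is easy to see in NumPy, where * is elementwise (Hadamard) and @ is matrix multiplication; tf.math.multiply and tf.matmul behave the same way on these shapes (the values below are arbitrary):

```python
import numpy as np

a = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([[5.0, 6.0],
              [7.0, 8.0]])

hadamard = a * b   # elementwise: [[ 5, 12], [21, 32]]
matmul   = a @ b   # matrix product: [[19, 22], [43, 50]]

print(hadamard)
print(matmul)
```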