What Is a Tensor? Linear Algebra for Deep Learning
Jun 24, 2026 · 5 min read
For almost everyone new to deep learning, one of the first intimidating words is tensor. It shows up on nearly every page of the PyTorch and JAX docs and sounds like something borrowed from physics. The reality is far more modest: in deep learning, a tensor is nothing more than an array of numbers with a particular shape. In this post we'll climb from scalars up to tensors one rung at a time, and along the way notice something more important: essentially all of deep learning is moving numbers through the right shapes.
From scalar to tensor: a ladder
Rather than memorizing a single definition, it helps to picture a ladder that rises by number of dimensions.
At the bottom sits the scalar: a single number, such as the 0.001 you might
use as a learning rate. One rung up is the vector, an ordered list of numbers;
a word embedding is the canonical example. The third rung is the matrix — a
table of rows and columns, which is the usual form of a neural network layer's
weights. A tensor is simply the general name for all of these: numbers
arranged on a regular grid, with however many dimensions you need.
| Object | Dims | Example shape | Typical use |
|---|---|---|---|
| Scalar | 0 | () | learning rate, loss |
| Vector | 1 | (n,) | embedding, bias |
| Matrix | 2 | (m, n) | layer weights |
| Tensor | N | (a, b, c, …) | image batch, sequence |
A concrete example grounds the abstraction: a batch of 32 RGB images at 224×224 is
a four-dimensional tensor of shape (32, 3, 224, 224) in the channels-first layout
that PyTorch uses by default, where the numbers denote batch size, channels,
height, and width. A tensor, in other words, is just a slightly more formal way
of saying "multi-dimensional container of numbers."
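The ladder can be checked directly in code. The snippet below is a minimal sketch, assuming PyTorch is installed; the variable names and the embedding width of 8 are arbitrary choices for illustration, not values from any particular model.

```python
import torch

# One tensor per rung of the ladder; names and sizes are illustrative.
learning_rate = torch.tensor(0.001)      # scalar: 0 dimensions
embedding = torch.zeros(8)               # vector: 1 dimension
weights = torch.zeros(3, 4)              # matrix: 2 dimensions
images = torch.zeros(32, 3, 224, 224)    # batch of RGB images: 4 dimensions

for name, t in [("scalar", learning_rate), ("vector", embedding),
                ("matrix", weights), ("image batch", images)]:
    # ndim is the number of dimensions; shape lists the size of each one.
    print(f"{name:12s} ndim={t.ndim} shape={tuple(t.shape)}")
```

The only thing that changes from rung to rung is how many entries the shape has.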
The real protagonist: shape
When you work with tensors, the thing to keep alive in your mind is not the values inside them but their shape. Nearly every operation in deep learning amounts to reshaping, multiplying, and adding tensors according to fixed rules, so in practice a model "working" largely means its shapes line up.
The most fundamental operation that joins two tensors is matrix multiplication,
and it is only defined when the inner dimensions agree: multiply an (m, k) matrix
by a (k, n) matrix and you get (m, n); the middle ks must match. On top of
this sit the broadcasting rules, which let tensors of different shapes be
automatically expanded and combined — a mechanism that quietly kicks in across
your code and deserves a post of its own.
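Both rules are easy to see in a few lines. The following is a hedged sketch, assuming PyTorch; the sizes 2, 5, and 3 are arbitrary stand-ins for m, k, and n, and the `matmul_failed` flag exists only to make the failure visible.

```python
import torch

A = torch.randn(2, 5)   # shape (m, k) with m=2, k=5
B = torch.randn(5, 3)   # shape (k, n) with k=5, n=3
C = A @ B               # inner dimensions (5 and 5) match
print(C.shape)          # torch.Size([2, 3])

# Mismatched inner dimensions raise an error instead of returning a result.
try:
    A @ torch.randn(4, 3)   # (2, 5) @ (4, 3): 5 != 4
    matmul_failed = False
except RuntimeError:
    matmul_failed = True
print(matmul_failed)    # True

# Broadcasting: the (3,) vector is treated as (1, 3) and reused for every row.
row = torch.tensor([10.0, 20.0, 30.0])
D = torch.zeros(2, 3) + row
print(D.shape)          # torch.Size([2, 3])
```

Note the asymmetry: a matmul mismatch fails loudly, while broadcasting succeeds silently whenever the shapes are compatible.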
Why linear algebra is the language
Written in its plainest form, a neural network layer is a matrix multiplication plus a bias term:
$$y = Wx + b$$
Here $W$ holds the learned weights, $x$ is the input, $b$ is the bias, and $y$ is the layer's output. On its own this is just a linear transformation; it acquires its real power once we stack such layers and slot a non-linearity between them. The word "deep" refers precisely to this stack.
The example below shows a single layer mapping a four-element input to a three-element output:
```python
import torch

x = torch.randn(4)     # input vector, shape (4,)
W = torch.randn(3, 4)  # weights, shape (3, 4)
b = torch.randn(3)     # bias, shape (3,)
y = W @ x + b          # output vector, shape (3,)
print(y.shape)         # torch.Size([3])
```
The W @ x term multiplies (3, 4) by (4,) to produce (3,); the only
requirement is that the inner dimensions (the two 4s) agree. Adding the bias
leaves the shape unchanged and merely shifts each component. All of deep learning,
stripped down, is this step applied again and again with the right shapes and
suitable non-linearities; what remains is to find good values for $W$ and
$b$, which is the job of gradient descent.
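To make "applied again and again" concrete, here is a minimal sketch of two such layers stacked with a non-linearity in between. It assumes PyTorch; the hidden width of 5 and the choice of ReLU are illustrative assumptions, and the weights are random rather than learned.

```python
import torch

torch.manual_seed(0)  # fixed seed so the random values are repeatable

x = torch.randn(4)                             # input, shape (4,)
W1, b1 = torch.randn(5, 4), torch.randn(5)     # first layer: 4 -> 5
W2, b2 = torch.randn(3, 5), torch.randn(3)     # second layer: 5 -> 3

h = torch.relu(W1 @ x + b1)   # linear step, then non-linearity; shape (5,)
y = W2 @ h + b2               # second linear step; shape (3,)

print(h.shape, y.shape)       # torch.Size([5]) torch.Size([3])
```

The output size of the first layer (5) has to equal the input size of the second, which is the same inner-dimension rule applied across layers.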
Beyond the numbers: geometric intuition
Treating linear algebra as mere arithmetic misses half the picture. You can think of a vector as a point or a direction in space, and a matrix as a transformation that stretches, rotates, or scales that space. Multiplying by $W$ moves the input into a new space, and each layer of a network nudges the data toward a representation where the classes are easier to separate. This geometric view will make notions like "similarity" and "distance" feel far more natural when we later reach embeddings and attention.
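A two-dimensional example makes the geometry tangible. The sketch below assumes PyTorch; the matrices `R` and `S` are hand-picked for illustration (a 90-degree rotation and an axis-wise scaling) and are not weights from any trained network.

```python
import torch

v = torch.tensor([1.0, 0.0])            # a point on the x-axis

# Rotation by 90 degrees counter-clockwise.
R = torch.tensor([[0.0, -1.0],
                  [1.0,  0.0]])
# Scaling: stretch x by 2, shrink y by half.
S = torch.tensor([[2.0, 0.0],
                  [0.0, 0.5]])

rotated = R @ v                          # lands on the y-axis: (0, 1)
scaled = S @ v                           # stays on the x-axis: (2, 0)

# Cosine similarity compares directions and ignores length.
cos = torch.nn.functional.cosine_similarity
print(cos(v, scaled, dim=0))             # same direction, so 1
print(cos(v, rotated, dim=0))            # perpendicular, so 0
```

Scaling changed the vector's length but not its direction, while rotation did the opposite; cosine similarity only registers the second kind of change.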
Common pitfalls
A surprising share of the errors you'll hit are not conceptual but shape
mismatches in disguise. Confusing a row vector with a column vector, forgetting a
transpose, or placing the batch dimension in the wrong slot all reduce to the same
class of bug. The single most useful habit, then, is to sprinkle print(x.shape)
at the critical points of your code and confirm at each step that the dimensions
are what you expect. That small reflex resolves at a glance what could otherwise be
hours of debugging.
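One way to turn that reflex into something stricter is a small assertion helper. The sketch below assumes PyTorch; `check_shape` is a hypothetical helper written for this example, not a library function, and the `pred` and `target` tensors are placeholders.

```python
import torch

def check_shape(t, expected, name="tensor"):
    """Hypothetical helper: fail loudly when a shape is not what we expect."""
    actual = tuple(t.shape)
    if actual != tuple(expected):
        raise ValueError(f"{name}: expected shape {tuple(expected)}, got {actual}")
    return t

pred = torch.zeros(3)        # shape (3,)
target = torch.zeros(3, 1)   # shape (3, 1): a column vector

# Broadcasting silently turns this into a (3, 3) grid rather than raising.
diff = pred - target
print(diff.shape)            # torch.Size([3, 3])

# Removing the extra dimension restores the intended element-wise result.
diff_fixed = check_shape(pred - target.squeeze(1), (3,), name="diff")
print(diff_fixed.shape)      # torch.Size([3])
```

The first subtraction is the row-versus-column confusion in miniature: nothing crashes, but the result has nine entries instead of three, and only a shape check reveals it.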
Where this goes next
Now that we've seen what a tensor is and why a layer is just a matrix multiplication, the natural question is how the model actually finds "good" values for those weights. The answer lives in the optimization loop at the heart of deep learning — gradient descent — which the next post takes apart, again from scratch.