Related reading: NumPy manual on Broadcasting
NEML2's broadcasting rules
NEML2's broadcasting rules are built on top of NumPy's general broadcasting rules.
When operating on two tensors, NEML2 first aligns their shapes at the final batch dimension, then compares the batch sizes pairwise, starting with the trailing (i.e., rightmost) batch dimension and working its way left. Two batch dimensions are compatible (broadcastable) when
- Their sizes are equal,
- One of them has size one, or
- One of them does not exist.
If all dimensions are compatible, the two tensors are batch-broadcastable. Many operations in NEML2 require their operands to be batch-broadcastable; if they are not, an exception is thrown (typically only in Debug mode).
Per the third rule, the operands do not need to have the same number of batch dimensions in order to be batch-broadcastable. After broadcasting, each batch dimension is effectively expanded to the larger of the two sizes, as illustrated in the sketch below.
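Plain torch tensors obey the same right-to-left comparison, so a minimal sketch (with arbitrary shapes) can exercise each of the three rules:
Python @source:src3
import torch

a = torch.rand(2)        # shape (2,)
b = torch.rand(1000, 2)  # shape (1000, 2)

# Rightmost dimensions: 2 vs 2 -- equal (first rule).
# Next dimension: 1000 vs nothing -- one does not exist (third rule).
print((a + b).shape)  # torch.Size([1000, 2])

# A dimension of size one broadcasts against any size (second rule):
print((torch.rand(1000, 1) + torch.rand(2)).shape)  # torch.Size([1000, 2])
@endsource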
Note that we have not yet said anything about broadcasting rules for base dimensions, because the requirements on the operands' base shapes largely depend on the operation itself. Moreover, since the base shapes of all primitive tensor types are determined statically, no base shape check is needed at runtime.
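For instance, a symmetric second order tensor (SR2) always carries the static base shape (6,), so every leading dimension of the underlying storage is interpreted as a batch dimension. A small sketch (the batch shape (5, 3) is arbitrary):
Python @source:src4
import torch
from neml2.tensors import SR2

# The trailing dimension of size 6 matches SR2's static base shape;
# the leading (5, 3) dimensions are treated as batch dimensions.
a = SR2(torch.zeros(5, 3, 6, dtype=torch.float64))
print(a.batch.shape)  # batch shape: (5, 3)
print(a.base.shape)   # base shape:  (6,)
@endsource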
Example
The utility of broadcasting is best demonstrated with practical examples.
Suppose we are given two samples made of different materials, and for each sample we are given a number of strain measurements at different locations. Assuming both materials are homogeneous, we can use NEML2's broadcasting rules to perform the constitutive update for all measurements of both samples in a single batched operation.
C++ @source:src1
#include <iostream>

#include <torch/torch.h>

#include "neml2/tensors/SSR4.h"
#include "neml2/tensors/SR2.h"
#include "neml2/tensors/Scalar.h"

using namespace neml2;

int
main()
{
  set_default_dtype(kFloat64);

  // Each of the two samples has its own elastic constants
  auto youngs_modulus = Scalar::create({1e5, 2e5});
  auto poissons_ratio = Scalar::create({0.1, 0.2});
  auto C = SSR4::isotropic_E_nu(youngs_modulus, poissons_ratio);

  // 1000 strain measurements for each of the two samples; the trailing
  // dimension of size 6 is the static base shape of SR2
  auto strain = SR2(torch::rand({1000, 2, 6}, torch::kFloat64));

  auto stress = C * strain;

  std::cout << "     Shape of C: " << C.batch_sizes() << C.base_sizes() << std::endl;
  std::cout << "Shape of strain: " << strain.batch_sizes() << strain.base_sizes() << std::endl;
  std::cout << "Shape of stress: " << stress.batch_sizes() << stress.base_sizes() << std::endl;
}
@endsource
Output:
     Shape of C: (2; 6, 6)
Shape of strain: (1000, 2; 6)
Shape of stress: (1000, 2; 6)
Python @source:src2
import torch
from neml2.tensors import Scalar, SR2, SSR4

ns = 2     # number of samples (materials)
nm = 1000  # number of strain measurements per sample

# Each of the two samples has its own elastic constants
youngs_modulus = Scalar(torch.tensor([1e5, 2e5], dtype=torch.float64))
poissons_ratio = Scalar(torch.tensor([0.1, 0.2], dtype=torch.float64))
C = SSR4.isotropic_E_nu(youngs_modulus, poissons_ratio)

# nm strain measurements for each of the ns samples; the trailing
# dimension of size 6 is the static base shape of SR2
strain = SR2(torch.rand(nm, ns, 6, dtype=torch.float64))

stress = C * strain

print("     Shape of C:", C.batch.shape, C.base.shape)
print("Shape of strain:", strain.batch.shape, strain.base.shape)
print("Shape of stress:", stress.batch.shape, stress.base.shape)
@endsource
Output:
     Shape of C: (2; 6, 6)
Shape of strain: (1000, 2; 6)
Shape of stress: (1000, 2; 6)
To better understand broadcasting, let us try to apply the broadcasting rules manually:
- Align the shapes at the final batch dimension (i.e., align the semicolons):
     Shape of C:       (2; 6, 6)
Shape of strain: (1000, 2; 6)
- Examine batch sizes from right to left:
     Shape of C:       (2; 6, 6)
Shape of strain: (1000, 2; 6)
                        ▲
The sizes are equal – the two operands are compatible at the final batch dimension.
- Continue to the next batch dimension:
     Shape of C:       (2; 6, 6)
Shape of strain: (1000, 2; 6)
                  ▲
One of the sizes does not exist – the two operands are compatible at the second-to-last batch dimension.
- All batch dimensions are compatible, therefore the two operands are batch-broadcastable.
- The resulting batch shapes are effectively the "expand" of the two:
     Shape of C: (1000, 2; 6, 6)
                  ▲
                  expanded (without copying)
Shape of strain: (1000, 2; 6)
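The result of this manual exercise can be double-checked with torch's shape utility, which implements the same right-to-left rules. A small sketch, separate from the example above:
Python @source:src5
import torch

# Batch shapes of C and strain from the example
batch_C = (2,)
batch_strain = (1000, 2)

print(torch.broadcast_shapes(batch_C, batch_strain))  # torch.Size([1000, 2])
@endsource
This is exactly the batch shape of stress printed earlier: the constitutive update runs for all 1000 measurements of both samples at once, and since the expansion does not copy data, only a single (2; 6, 6) elasticity tensor is ever stored.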