Automatic differentiation¶
Calibrating a material model means finding parameter values that make
the model’s predictions match observations. The gradient of a scalar
loss with respect to the parameters is what every gradient-based
optimizer (SGD, Adam, L-BFGS, …) needs to take a step. NEML2 piggy-backs
on PyTorch’s autograd engine to provide that gradient for free: every
Model is a torch.nn.Module, every trainable parameter is a
torch.nn.Parameter, and every tensor operation inside the model is
traced.
This page walks through the mechanics: enable the gradient tape, run
the model, build a scalar loss, call .backward(), and read the
per-parameter gradient off parameter.grad.
The model¶
We reuse the linear-isotropic-elastic model from Running your first model so the focus stays on autograd rather than the physics:
# Same linear-isotropic-elastic "elasticity" model used in the earlier
# tutorials, repeated here so the autograd page is self-contained.
# E = 200 GPa
# nu = 0.3
[Models]
[elasticity]
type = LinearIsotropicElasticity
coefficients = '200e3 0.3'
coefficient_types = 'YOUNGS_MODULUS POISSONS_RATIO'
[]
[]
import neml2
model = neml2.load_model("input.i", "elasticity")
model
LinearIsotropicElasticity()
Parameters are nn.Parameters¶
Because Model derives from torch.nn.Module, the model’s named
parameters show up under model.named_parameters() exactly like any
other PyTorch module:
for name, p in model.named_parameters():
print(f"{name:>4} = {p.item():<10g} requires_grad={p.requires_grad}")
E = 200000 requires_grad=True
nu = 0.3 requires_grad=True
model.E and model.nu are NEML2 Scalar wrappers around those
parameters; the underlying torch.nn.Parameter lives at model.E.data
and model.nu.data. Because parameters default to requires_grad=True,
no extra opt-in is needed — every forward pass through the model
already records the autograd graph.
Note
If you want to freeze a parameter (treat it as a constant during
calibration), set requires_grad=False on its .data:
model.nu.data.requires_grad_(False)
A scalar loss, end-to-end¶
Pick a small uniaxial-strain input and evaluate the model:
import torch
from neml2.types import SR2
strain = SR2.fill(0.01, 0.0, 0.0, 0.0, 0.0, 0.0)
stress = model(strain)
stress
SR2(data=tensor([2692.3076, 1153.8462, 1153.8462, 0.0000, 0.0000, 0.0000],
grad_fn=<AddBackward0>), sub_batch_ndim=0, sub_batch_state=(), sub_batch_meta=(), k_ndim=0, k_state=(), k_pairing=())
Notice the grad_fn=… annotation on the underlying tensor — the
forward call wired up the autograd graph that connects stress back to
model.E and model.nu. Any scalar function of stress can now be
back-propagated. As a stand-in for a real calibration objective, drop
down to the underlying torch.Tensor via stress.data so we can
call PyTorch’s tensor .norm() directly — autograd is preserved
through this access:
loss = stress.data.norm()
loss
tensor(3148.2126, grad_fn=<LinalgVectorNormBackward0>)
loss.backward()
print(f"dL/dE = {model.E.data.grad.item():.6g}")
print(f"dL/dnu = {model.nu.data.grad.item():.6g}")
dL/dE = 0.0157411
dL/dnu = 12849.5
That is the entire autograd workflow:
Load the model (parameters already track gradients).
Evaluate the model on some input.
Combine the output(s) into a scalar
loss.Call
loss.backward().Read each parameter’s gradient off
parameter.data.grad.
Per-component derivatives via torch.autograd.grad¶
.backward() accumulates into .grad, which is convenient inside a
training loop but inconvenient when you want a one-off derivative
without side effects. torch.autograd.grad returns the gradient
directly:
strain = SR2.fill(0.01, 0.0, 0.0, 0.0, 0.0, 0.0)
stress = model(strain)
# Differentiate the xx-component of stress w.r.t. both parameters.
sigma_xx = stress.data[0]
dE, dnu = torch.autograd.grad(sigma_xx, [model.E.data, model.nu.data])
print(f"d(sigma_xx)/dE = {dE.item():.6g}")
print(f"d(sigma_xx)/dnu = {dnu.item():.6g}")
d(sigma_xx)/dE = 0.0134615
d(sigma_xx)/dnu = 7544.38
For a uniaxial-strain test the closed-form prediction is \(\sigma_{xx} = (1-\nu)\,E\,\varepsilon_{xx}/[(1+\nu)(1-2\nu)]\), which gives \(\partial \sigma_{xx}/\partial E = (1-\nu)\varepsilon_{xx}/[(1+\nu)(1-2\nu)] \approx 0.01346\) at \(E=200\) GPa, \(\nu=0.3\), \(\varepsilon_{xx}=0.01\) — matching the autograd value to machine precision.
Warning
Each call to .backward() adds to .grad. In a training loop you
must zero the gradients between optimizer steps, either explicitly
(model.E.data.grad = None) or via the optimizer’s
zero_grad() method. Forgetting to do so leads to silently wrong
updates.
What about implicit models?¶
For an implicit model — one whose output comes from an internal Newton
solve, e.g. a viscoplastic update — autograd still flows back to the
parameters with no changes to your training code. The reason is that
ImplicitUpdate overrides the backward rule: at the converged
solution it applies the implicit function theorem so the gradient
comes from a single linear solve against the Jacobian at the fixed
point, instead of recording every intermediate iterate from every
step in the autograd tape. The result is the analytically exact
derivative at the converged point, and the backward pass costs one
linear solve regardless of how many Newton iterations the forward
pass took.
You don’t have to do anything special to take advantage of this — as
long as the implicit residual is wrapped in ImplicitUpdate, autograd
on the model output flows straight back to the parameters just like in
the explicit case above. See
Implicit models for how implicit models are
assembled in NEML2.
Where to go next¶
Parameter calibration puts these gradients to work in a full calibration loop with a PyTorch optimizer.
For recurrent (time-stepped) calibration where the loss depends on an entire load history, see the
pyzagadapter at Recurrent calibration with pyzag.