Ways to evaluate a model

A NEML2 model is authored once, in Python — a Model is a plain torch.nn.Module composed from smaller registered pieces, and Python is the only authoring surface. Once a model exists, the same model can be evaluated through several different runtimes, depending on whether you are iterating interactively, training, or deploying into a host application.

Note

If you are an end user of an application built on NEML2, you evaluate models through whatever interface that application exposes — you do not choose a runtime and can stop reading here. This page is for developers integrating NEML2 into their own Python, C++, or command-line workflow. The deployment guides (Python integration, C++ integration, CLI utilities) cover getting set up; the reference pages linked below cover each route’s evaluation API.

All runtimes operate on the same starting point: a HIT input file that names one or more models. The minimal example referenced throughout lives at tutorials/models/running_your_first_model/input.i:

Listing 26 input.i
# Minimal hello-world NEML2 input file.
# A single linear isotropic elastic model named "elasticity":
#   E  = 200 GPa
#   nu = 0.3
# Maps a symmetric strain tensor (SR2) to a symmetric stress tensor (SR2).
[Models]
  [elasticity]
    type = LinearIsotropicElasticity
    coefficients      = '200e3          0.3'
    coefficient_types = 'YOUNGS_MODULUS POISSONS_RATIO'
  []
[]

One authoring surface, many runtimes

The six NEML2 evaluation routes as a host × mode matrix

Fig. 1 The six routes by host (Python / C++) and mode (eager / torch.compile / AOTI). A model is authored once in Python; the empty cell is real (there is no in-process torch.compile route in C++).

The four families differ in where the model runs and what it costs to start. The AOTI family is one artifact (.pt2 package + metadata + HIT stub) consumed by two different hosts — running a compiled model from C++ and from Python is the same compile, not two.

At a glance

Each runtime has a short codename of the form host-mode (py / cpp crossed with eager / jit / aoti). Use it in bug reports and discussion threads to say which path you’re on without a paragraph of description — “this reproduces on cpp-aoti but not py-eager” is unambiguous.

Every runtime supports forward. They differ on sensitivities and on whether they accept sub-batch models (e.g. crystal plasticity, which carries a per-slip inner batch dimension):

Codename

Entry point

Compile

Host

jvp / jacobian

Sub-batch

Primarily for

py-eager

neml2.load_model

none

Python

dev, testing, autograd training

py-jit

neml2.compile

in-process JIT

Python

✓ (native)

pyzag training loops

py-aoti

neml2.aoti.Model

offline

Python (pybind)

compiled model from Python

cpp-aoti

neml2::aoti::Model

offline

C++

production C++ deployment

cpp-dispatch

neml2::aoti::DispatchedModel

offline

C++

multi-device throughput

cpp-eager

neml2::eager::Model

none

C++ + embedded Python

compile-free C++ tests

The routes

Each route has its own reference page with the loading-and-calling API; the deployment guides cover the setup (install / build / artifacts) that comes first.

Python — set up with Python integration:

C++ — set up with C++ integration:

The three compiled routes (py-aoti, cpp-aoti, cpp-dispatch) share one artifact — see AOTI packages for its format and Compilation pipeline for how neml2-compile produces it. The command line is a fourth way to drive a model with no code at all: CLI utilities.

Choosing a runtime

  • Iterating, debugging, or training in Pythonpy-eager. Reach for py-jit (neml2.compile) only inside a pyzag loop where the residual is the hot path.

  • Deploying into a C++ applicationcpp-aoti. Switch to cpp-dispatch when you need to saturate multiple GPUs (or CPU + GPU) with one batched call.

  • Calling a compiled model from Python, or reproducing C++ numerics without a NEML2 source dependency → py-aoti.

  • C++ tests that can’t pay the compile costcpp-eager.

  • Sub-batch models (crystal plasticity and friends) run on every runtime except cpp-eager. If you need a compiled sub-batch model in C++, use cpp-aoti.

Runtimes vs. consumers

Several entry points are not runtimes themselves — they are consumers layered on one of the runtimes above:

  • neml2-run <input.i> <driver> and the Driver classes (TransientDriver, ModelUnitTest, TransientRegression, Verification) step a model through a load history on py-eager. If the input file names an AOTIModel, the same driver runs on py-aoti instead.

  • The pyzag adapter (NEML2PyzagModel) drives calibration on py-eager, optionally accelerated to py-jit via neml2.compile.

  • neml2-inspect <input.i> <model> resolves and prints a model’s input/output graph but does not evaluate it.

The full tool reference is in CLI utilities.

See also