In NEML2, all tensor operations are traceable. The trace of an operation records the operator type, a stack of arguments and outputs, together with additional context. Multiple operations performed in sequence can be traced together into a graph representing the flow of data through the operators. Such a graph representation is primarily used for two purposes:
1. just-in-time (JIT) compilation and optimization of the traced operations, and
2. backward automatic differentiation (AD), e.g., to calculate parameter derivatives.
This tutorial illustrates the utility of JIT compilation of NEML2 models, and a later tutorial demonstrates the use of backward AD to calculate parameter derivatives.
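These two uses can be illustrated with a small plain-PyTorch sketch (not NEML2 code; the function forward and the example tensors below are made up for illustration): tracing records a sequence of tensor operations into a reusable graph, and backward AD propagates through the recorded operations.

```python
import torch

# Hypothetical stand-in for a model's forward operations (not an NEML2 model).
def forward(x, c):
    return (x - c) / c

x = torch.tensor([1.0, 2.0], requires_grad=True)
c = torch.tensor([0.5, 0.5])

# Tracing executes the function once with example inputs and records the
# tensor operations (here aten::sub and aten::div) into a graph.
traced = torch.jit.trace(forward, (x, c))

# Use 1: the recorded graph can be re-executed (and optimized) by the JIT.
y = traced(x, c)

# Use 2: backward AD runs through the recorded operations, e.g. to get dy/dx.
y.sum().backward()
print(x.grad)
```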
In this tutorial, let us consider the following problem
where the subscript
All three equations can be translated to ScalarVariableValue. The input file looks like
And the composed model correctly defines
NEML2 traces tensor operations lazily: no tracing is performed when the model is first loaded from the input file; instead, tracing takes place the first time the model is evaluated. The following code can be used to view the traced graph in text format.
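As a rough stand-in using plain PyTorch rather than the NEML2 API, the sketch below traces a hypothetical function f (chosen so that its trace contains two aten::sub nodes and one aten::div node, matching the operations highlighted below) and prints the graph in text form. torch.jit.last_executed_optimized_graph() is a debugging helper in PyTorch's JIT that prints the graph most recently run by the executor.

```python
import torch

# Hypothetical stand-in for the three equations above (not the actual model);
# chosen so its trace contains two aten::sub nodes and one aten::div node.
def f(x1, x2, x3, c):
    return (x1 - x2 - x3) / c

example_inputs = tuple(torch.rand(5) for _ in range(4))
traced = torch.jit.trace(f, example_inputs)

# The recorded (un-instrumented) graph, in text form.
print(traced.graph)

# After the first execution, the graph executor runs an instrumented version of
# the graph; print the graph most recently executed to inspect it.
traced(*example_inputs)
print(torch.jit.last_executed_optimized_graph())
```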
Note that the above graph is called a profiling graph. While it is not the most human-friendly format to read, let us highlight some lines of the text output and try to associate them with the equations.
The following lines cover three tensor operations, two aten::subs and one aten::div, which correspond to the equations above. Each of these operations is wrapped inside a prim::profile node. These wrappers allow the graph executor to record and analyze the runtime statistics of tensor operations, in order to identify hot spots to optimize.
With the profiling graph in place, further executions of the same traced graph allow the executor to automatically identify optimization opportunities. In summary, the following types of optimizations are enabled in NEML2 by default:
See the PyTorch JIT design document for a detailed explanation of each optimization pass.
The code below shows that, after a few forward evaluations, the traced graph can be substantially optimized.
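As a rough stand-in using plain PyTorch rather than the NEML2 API, the sketch below traces a hypothetical function containing the repeated subexpression (x1 - x2), executes it a few times so the profiling executor can gather statistics, and then prints the resulting optimized graph, where passes such as common subexpression elimination should have had a chance to run.

```python
import torch

# Hypothetical stand-in function (not the tutorial's model) containing the
# common subexpression (x1 - x2), which appears twice.
def f(x1, x2, x3):
    return (x1 - x2) / x3 + (x1 - x2)

example_inputs = (torch.rand(10), torch.rand(10), torch.rand(10))
traced = torch.jit.trace(f, example_inputs)

# The profiling executor gathers statistics over the first few runs before it
# commits to an optimized graph.
for _ in range(5):
    traced(torch.rand(10), torch.rand(10), torch.rand(10))

# Print the graph most recently executed, after the executor's optimization
# passes have run.
print(torch.jit.last_executed_optimized_graph())
```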
Note how the optimized graph successfully identifies and eliminates the common subexpression.
JIT optimization and compilation isn't the holy grail for improving the performance of all models. For tensor operations that branch based on variable data, the traced graph cannot capture the data dependency and would potentially produce wrong results, as illustrated by the sketch below. In addition, NEML2 is unable to generate traced graphs for models whose forward evaluation includes derivatives of other models when those derivatives are defined with automatic differentiation.
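As a minimal plain-PyTorch illustration of the data-dependency pitfall (the function branchy and its inputs are made up), tracing bakes in whichever branch the example input happens to take:

```python
import torch

# Hypothetical function whose result depends on the values in x, not just
# its shape and dtype.
def branchy(x):
    if x.norm() > 1.0:       # data-dependent control flow
        return x / x.norm()
    return x

x_big = torch.tensor([3.0, 4.0])
x_small = torch.tensor([0.1, 0.2])

# Tracing records only the branch taken for the example input
# (PyTorch emits a TracerWarning here).
traced = torch.jit.trace(branchy, (x_big,))

print(branchy(x_small))  # eager: the small vector is returned unchanged
print(traced(x_small))   # traced: the baked-in branch normalizes it anyway
```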
Due to these limitations, certain models disable the use of JIT compilation. The most notable case is ImplicitUpdate, because the Newton-Raphson solvers it relies on are in general data dependent. However, the portion of the complete model defining the implicit function to be solved can often still benefit from JIT compilation.
When multiple models are composed together, a single function graph is by default traced through all sub-models. However, if one of the sub-models does not allow JIT, e.g., it is of type ImplicitUpdate, then the composed model falls back to tracing each sub-model individually, except for those that explicitly disable JIT. It is therefore generally recommended to compose JIT-enabled sub-models separately from JIT-disabled ones, allowing for more optimization opportunities, as sketched below.
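As a rough plain-PyTorch analogy for this recommendation (not NEML2's composition machinery; residual and newton_solve are made-up names), the data-independent residual evaluation is traced and JIT-compiled, while the data-dependent Newton-Raphson loop remains ordinary eager code:

```python
import torch

# Data-independent residual evaluation: safe to trace and JIT-optimize.
def residual(x, forcing):
    return x**3 + x - forcing

traced_residual = torch.jit.trace(residual, (torch.rand(5), torch.rand(5)))

# Data-dependent Newton-Raphson update: the number of iterations depends on
# the values, so it stays as ordinary eager code (analogous to a JIT-disabled
# sub-model such as ImplicitUpdate).
def newton_solve(forcing, x0, tol=1e-8, max_it=20):
    x = x0.clone()
    for _ in range(max_it):
        r = traced_residual(x, forcing)      # the JIT-compiled piece
        if r.abs().max() < tol:              # data-dependent stopping criterion
            break
        x = x - r / (3 * x**2 + 1)           # analytic Jacobian of x**3 + x
    return x

print(newton_solve(torch.rand(5), torch.ones(5)))
```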