Info

This document is based on registry.pfcomputing.internal/mncore-sdk/mncore-sdk-minimal:0.3.


MLSDK documentation

class mlsdk.Context(device: MNDevice)

compile(function: Callable[[Dict[str, Tensor]], Dict[str, Tensor]], inputs: Mapping[str, Tensor | TensorProxy], codegen_dir: Path, *, options: Dict[str, Any] | None = None, cache_options: CacheOptions | None = None, num_compiler_threads: int | None = None, quiet: bool = True, exit_after_generate_codegen_dir: bool = False, optimizers: List[Optimizer] | None = None, export_kwargs: Dict[str, Any] | None = None, training: bool = True, initialize: bool | None = True, optimizer_spec: List[OptimizerSpecParamGroup] | None = None, optional_options: Set[str] | None = None, group: ProcessGroup | None = None, predefined_symbols: Dict[str, MNDeviceBuffer] | None = None) → CompiledFunction

Compile a Python callable to a function that can be executed on the device.

  • Parameters:
    • function – The Python callable to compile.

    • inputs – Sample inputs to the function.

    • codegen_dir – The directory to store intermediate and generated files.

    • options

      You can specify compile options here. Since there are many options, we provide predefined presets in the preset_options directory. If you use the MLSDK image, O0.json through O4.json should be present in the /opt/pfn/pfcomp/codegen/preset_options directory. Presets with higher numbers perform more advanced optimizations, at the cost of longer compilation time. Among all available options, setting float_dtype is also crucial to prevent unintended precision degradation. The MLSDK defaults to mixed-precision operation (mixed), in which GEMM operations use half precision; to avoid this, set float_dtype to float. This option controls which floating-point type is assigned to torch.float32 tensors, and takes the following values:

      • mixed (default): inputs and outputs of GEMM are half; all other tensors are float.
      • half, float, double: assign that type to all such tensors.

      You can specify these compile options like the following:

      context.compile(
          ...
          options={
              "option_json": "preset_options/O4.json",
              "float_dtype": "float",
          },
      )
      
    • cache_options – Options for caching. See CacheOptions for details.

    • num_compiler_threads – The number of threads to use for compilation. If None, the number of threads will automatically be determined.

    • quiet – If True, suppress output from the compiler.

    • exit_after_generate_codegen_dir – For internal use only. If True, exit after generating the codegen directory. This is useful for the decomposed-layers tests.

    • optimizers – For internal use only. A list of PyTorch optimizers to use for training.

    • export_kwargs – For internal use only. kwargs related to exporting the model to ONNX.

    • training – For internal use only. If True, the function is used for training.

    • optimizer_spec – For internal use only. The optimizer spec to use for training.

    • initialize – For internal use only. TODO (akirakawata): Add description.

    • optional_options – For internal use only. TODO (akirakawata): Add description.

    • group – For internal use only. TODO (akirakawata): Add description.

    • predefined_symbols – For internal use only. TODO (akirakawata): Add description.
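For reference, here is a minimal end-to-end sketch of compile. How an MNDevice handle is obtained is not covered by this reference, so that part is elided; the function body and sample shapes are illustrative only.

    from pathlib import Path

    import torch
    import mlsdk

    def forward(inputs):
        # The compiled callable maps a dict of named tensors to a dict
        # of named tensors.
        x = inputs["x"]
        return {"y": torch.relu(x) * 2.0}

    device = ...  # obtain an MNDevice handle (device setup not covered here)
    context = mlsdk.Context(device)

    compiled = context.compile(
        forward,
        inputs={"x": torch.zeros(4, 8)},  # sample inputs fix names and shapes
        codegen_dir=Path("./codegen"),
        options={
            "option_json": "preset_options/O0.json",
            "float_dtype": "float",       # keep torch.float32 tensors in float
        },
    )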

get_registered_value_proxy(value: Tensor) → TensorProxy

Get the TensorProxy for the given value if it is registered in the context.

  • Parameters: value – The torch.Tensor to get the proxy for.
  • Returns: The TensorProxy for the given value.

load_codegen_dir(codegen_dir: Path) → CompiledFunction

Load a function that can be executed on the device from codegen_dir without validation.

  • Parameters: codegen_dir – The directory to load compile results files.

NOTE

This method will fail if the required compiled artifact, model.app.zst, is not found within the codegen_dir. Be aware that the returned function is strict; it requires an input dictionary with the exact same keys (variable names) and tensor shapes as the input used during the original compilation.
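A sketch of reloading a previous compile result, assuming a Context named context as in the sketch above:

    from pathlib import Path

    # Fails unless ./codegen contains the compiled artifact model.app.zst.
    compiled = context.load_codegen_dir(Path("./codegen"))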

register_buffer(buffer: Tensor) → None

Registers a buffer in the context.

NOTE

Before calling this method, you must set the name of the buffer using set_tensor_name_in_module or set_tensor_name.

register_optimizer_buffers(optimizer: MNCoreOptimizer) → None

Registers optimizer buffers in the context.

NOTE

Before calling this method, you must set the name of the buffer using set_buffer_name_in_optimizer or set_tensor_name.

register_param(param: Parameter) → None

Registers a parameter in the context.

NOTE

Before calling this method, you must set the name of the parameter using set_tensor_name_in_module or set_tensor_name.
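Putting the registration methods together, a minimal sketch (assuming a Context named context; the module is illustrative):

    import torch
    import mlsdk

    model = torch.nn.BatchNorm1d(16)

    # Names must be assigned before registration so the ONNX exporter
    # (FX2ONNX) can distinguish the tensors later.
    mlsdk.set_tensor_name_in_module(model, "model")

    for param in model.parameters():
        context.register_param(param)
    for buffer in model.buffers():  # e.g. BatchNorm running stats
        context.register_buffer(buffer)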

synchronize() → None

Synchronizes the context by moving tensors to the torch framework and marks the context for initialization.

This method performs the following steps:

  1. Calls the synchronize() method of the device associated with the context.
  2. Iterates over all tensor names in the registry and moves each tensor to the torch framework.

NOTE

Unlike torch.cuda.synchronize(), which only waits for all kernels in all streams on a CUDA device to complete, this function also moves all tensors in the context’s registry from the device to the host.
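For example, to inspect or checkpoint registered tensors on the host after running on the device (a sketch; model and context come from the earlier sketches):

    # Wait for the device, then copy every registered tensor back into
    # its host-side torch tensor.
    context.synchronize()
    torch.save(model.state_dict(), "checkpoint.pt")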

class mlsdk.CompiledFunction(context: Context, code_block: _CompiledFunction, *, output_signature: ValueSignature | None = None)

allocate_input_proxy() → Dict[str, TensorProxy]

Allocate input proxies for the function.

  • Returns: A dictionary mapping input names to their corresponding TensorProxy objects.

class mlsdk.TensorProxy(context: Context, codegen_data: TensorProxyCodegenData, *, is_input: bool = False)

load_from(value: Tensor | TensorProxy, *, clone: bool = True) → None

Load data from a torch.Tensor or another TensorProxy to this TensorProxy.

  • Parameters:
    • value – The source tensor to copy data from.
    • clone – If True and value is a torch.Tensor, it will be cloned before copying, so the source tensor can be modified later without affecting this TensorProxy.
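A sketch of staging inputs through proxies, combining allocate_input_proxy and load_from (how the compiled function is subsequently invoked is not covered by this reference):

    import torch

    proxies = compiled.allocate_input_proxy()

    # With clone=True the host tensor can be reused or modified
    # afterwards without affecting the proxy's contents.
    proxies["x"].load_from(torch.randn(4, 8), clone=True)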

class mlsdk.CacheOptions(cache_dir_str: str, *, enable_app_cache: bool = True, enable_onnx_cache: bool = False, enable_codegen_cache: bool = False, enable_gpfn2obj_cache: bool = False)

__init__(cache_dir_str: str, *, enable_app_cache: bool = True, enable_onnx_cache: bool = False, enable_codegen_cache: bool = False, enable_gpfn2obj_cache: bool = False) → None

The options for specifying the cache directory and controlling cache behavior.

  • Parameters:
    • cache_dir_str – A path string of the root directory to store cache.
    • enable_app_cache – If True, cache GPFNApp files compiled from ONNX files. GPFNApp is the binary format that the MN-Core compiler uses.
    • enable_onnx_cache – If True, cache ONNX files exported from the given function.
    • enable_codegen_cache – If True, cache the codegen compilation. This option is mainly for developers.
    • enable_gpfn2obj_cache – If True, cache the GPFN object data. This option is mainly for developers.
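A sketch of enabling the caches when compiling (paths are illustrative; forward and context come from the earlier sketches):

    from pathlib import Path

    import torch
    from mlsdk import CacheOptions

    cache_options = CacheOptions(
        "/tmp/mlsdk-cache",        # root directory for all caches
        enable_app_cache=True,     # reuse compiled GPFNApp binaries
        enable_onnx_cache=True,    # reuse ONNX exports of the function
    )

    compiled = context.compile(
        forward,
        inputs={"x": torch.zeros(4, 8)},
        codegen_dir=Path("./codegen"),
        cache_options=cache_options,
    )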

class mlsdk.MNCoreOptimizer(params: Iterator[Parameter], defaults: Dict[str, Any])

zero_grad(set_to_none: bool = True) → None

Clear the gradient of the parameters.

  • Parameters: set_to_none (bool) – If True, the gradients will be set to None instead of zero.

class mlsdk.MNCoreSGD(params: Iterator[Parameter], lr: float | Tensor = 0.001, momentum: float = 0, dampening: float = 0, weight_decay: float | Tensor = 0, nesterov: bool = False, *, maximize: bool = False, foreach: bool | None = None, differentiable: bool = False, fused: bool | None = None)

step(closure=None) → None

Perform a single optimization step to update the parameters.

  • Parameters: closure (Callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.
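A sketch of the usual step/zero_grad cycle with MNCoreSGD; producing the gradients (the compiled training step) is elided:

    from mlsdk import MNCoreSGD

    optimizer = MNCoreSGD(model.parameters(), lr=0.01, momentum=0.9)

    for _ in range(100):
        ...                                   # run the compiled training step
        optimizer.step()                      # apply the SGD update
        optimizer.zero_grad(set_to_none=True)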

class mlsdk.MNCoreAdam(params: Iterator[Parameter], lr: float | Tensor = 0.001, betas: tuple[float | Tensor, float | Tensor] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0, amsgrad: bool = False, *, foreach: bool | None = None, maximize: bool = False, capturable: bool = False, differentiable: bool = False, fused: bool | None = None, decoupled_weight_decay: bool = False, chainer_use_torch: bool = True)

step(closure=None) → None

Perform a single optimization step to update the parameters.

  • Parameters: closure (Callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

class mlsdk.MNCoreAdamW(params: Iterator[Parameter], lr: float | Tensor = 0.001, betas: tuple[float | Tensor, float | Tensor] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0.01, amsgrad: bool = False, *, maximize: bool = False, foreach: bool | None = None, capturable: bool = False, differentiable: bool = False, fused: bool | None = None)

step(closure=None) → None

Perform a single optimization step to update the parameters.

  • Parameters: closure (Callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

class mlsdk.MNCoreLRScheduler(scheduler: LRScheduler, context: Context | None)

step() → None

Perform a step.
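A sketch wrapping a standard PyTorch scheduler, assuming MNCoreOptimizer instances are accepted wherever a torch optimizer is expected (optimizer and context come from the earlier sketches):

    import torch
    from mlsdk import MNCoreLRScheduler

    base = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
    scheduler = MNCoreLRScheduler(base, context)

    for _ in range(100):
        ...                # training step
        optimizer.step()
        scheduler.step()   # advance the learning-rate schedule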

mlsdk.set_buffer_name_in_optimizer(optimizer: MNCoreOptimizer, name: str) → None

Set the buffer names in the optimizer.

This function sets the names of the tensors in the optimizer (i.e. buffers) according to the optimizer’s name, so that the ONNX exporter (FX2ONNX) can distinguish them later. You need to call this function before registering the optimizer buffers in the context.

  • Parameters:
    • optimizer – The optimizer to set buffer names.
    • name – The name of the optimizer.
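A sketch of naming optimizer buffers before registering them (model and context come from the earlier sketches; the optimizer name "adam" is illustrative):

    from mlsdk import MNCoreAdam, set_buffer_name_in_optimizer

    optimizer = MNCoreAdam(model.parameters(), lr=1e-3)

    # Names must be set before the buffers are registered in the context.
    set_buffer_name_in_optimizer(optimizer, "adam")
    context.register_optimizer_buffers(optimizer)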

mlsdk.get_tensor_name(tensor: Tensor) → str | None

Get the name of the tensor.

This function returns the name of the tensor set by set_tensor_name_in_module or set_buffer_name_in_optimizer.

  • Parameters: tensor – The tensor to get the name of.
  • Returns: The name of the tensor.

mlsdk.set_tensor_name_in_module(module: Module, module_name: str | None) → None

Set the tensor names in the module.

This function sets the names of the tensors in the module (i.e. parameters and buffers such as BN stats), so that the ONNX exporter (FX2ONNX) can distinguish them later. You need to call this function before registering the module's parameters and buffers in the context.

  • Parameters:
    • module – The module to set tensor names.
    • module_name – The name of the module.

mlsdk.path(target: str) → Path

mlsdk.trace_scope(output_filename: str | Path | None, ignore_if_traced: bool = False) → Iterator[None]