PyTorch internals

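Buffers are tensors registered on a module with register_buffer(). They are part of the module’s persistent state but are not parameters: they appear in state_dict() (unless registered with persistent=False) and move with the module on .to(), but they are not returned by .parameters(), so the optimizer never updates them. They typically hold values that should not be trained, such as the running mean and variance in batch normalization layers.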
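The difference between a buffer and a parameter can be sketched with a small module. RunningMean, its momentum argument, and the attribute names below are hypothetical and chosen only for illustration; this is not how batch normalization is implemented in PyTorch itself.

```python
import torch
from torch import nn


class RunningMean(nn.Module):
    """Hypothetical module that centers inputs with a running mean."""

    def __init__(self, num_features: int, momentum: float = 0.1):
        super().__init__()
        self.momentum = momentum
        # Learnable scale: a parameter, so it is returned by .parameters().
        self.scale = nn.Parameter(torch.ones(num_features))
        # Running statistic: a buffer, saved in state_dict() but never trained.
        self.register_buffer("running_mean", torch.zeros(num_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            with torch.no_grad():
                batch_mean = x.mean(dim=0)
                # Exponential moving average, updated in place.
                self.running_mean.mul_(1 - self.momentum).add_(self.momentum * batch_mean)
        return (x - self.running_mean) * self.scale


layer = RunningMean(4)
param_names = [name for name, _ in layer.named_parameters()]
buffer_names = [name for name, _ in layer.named_buffers()]
state_keys = list(layer.state_dict().keys())

print(param_names)   # only the learnable scale
print(buffer_names)  # only the running statistic
print(state_keys)    # both are part of the saved state
```

Because the buffer is registered rather than stored as a plain attribute, it is also moved along with the parameters when the module is sent to another device.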

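Hooks are functions that can be registered on a module and are called by PyTorch during its forward or backward pass. Forward pre-hooks can inspect or replace a module’s inputs, forward hooks can inspect or replace its outputs, and backward hooks see the gradients flowing through it. This makes hooks useful for logging, debugging, and other custom operations during training without editing the module’s own code. Each registration returns a handle whose remove() method detaches the hook.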
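A minimal sketch of hook registration, assuming a toy nn.Sequential model. The hook function names and the captured dictionary are hypothetical; the registration methods (register_forward_hook, register_full_backward_hook) are the standard nn.Module ones.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(3, 2), nn.ReLU())
captured = {}


def log_output_shape(module, inputs, output):
    # Forward hook: runs after forward(); returning None leaves the output unchanged.
    captured["linear_out_shape"] = tuple(output.shape)


def double_output(module, inputs, output):
    # Returning a value from a forward hook replaces the module's output.
    return output * 2


def record_grad_norm(module, grad_input, grad_output):
    # Backward hook: receives the gradients w.r.t. the module's inputs and outputs.
    captured["grad_out_norm"] = grad_output[0].norm().item()


handles = [
    model[0].register_forward_hook(log_output_shape),
    model[1].register_forward_hook(double_output),
    model[0].register_full_backward_hook(record_grad_norm),
]

x = torch.randn(5, 3, requires_grad=True)
hooked = model(x)        # forward hooks fire here
hooked.sum().backward()  # backward hook fires here

# Detach all hooks so later calls behave like the unmodified model.
for handle in handles:
    handle.remove()
plain = model(x)
```

Keeping the handles and removing them afterwards avoids hooks silently accumulating across repeated registrations, for example in a notebook.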

Best practices

Wrap a network’s learnable tensors in torch.nn.Parameter instead of tracking plain tensors manually with .requires_grad_(). Assigning a Parameter as a module attribute registers it with the module, so it is included in .parameters() (and therefore reaches the optimizer) and in .state_dict().
PyTorch offers an object-oriented API in torch.nn for building networks and a stateless functional API in torch.nn.functional. Blend the two: use torch.nn modules for the parts of the architecture that own state, such as weights and buffers, and call functional operations such as activations directly inside your model’s forward pass.
Use the autograd profiler (torch.autograd.profiler, or the newer torch.profiler) to measure how much time each operator in your model takes.
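The three practices above can be combined in one short sketch. TinyMLP, its layer sizes, and the temperature and untracked attributes are hypothetical names used only for illustration; the profiler shown is the CPU-only torch.autograd.profiler with default settings.

```python
import torch
import torch.nn.functional as F
from torch import nn
from torch.autograd import profiler


class TinyMLP(nn.Module):
    """Hypothetical two-layer network mixing torch.nn and torch.nn.functional."""

    def __init__(self, in_features: int, hidden: int, out_features: int):
        super().__init__()
        # Stateful layers: torch.nn modules own and register their weights.
        self.fc1 = nn.Linear(in_features, hidden)
        self.fc2 = nn.Linear(hidden, out_features)
        # Wrapped in nn.Parameter: registered with the module.
        self.temperature = nn.Parameter(torch.tensor(1.0))
        # Plain tensor with requires_grad: NOT registered, shown as a counterexample.
        self.untracked = torch.tensor(1.0).requires_grad_()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Stateless activation: called through the functional API.
        x = F.relu(self.fc1(x))
        return self.fc2(x) / self.temperature


model = TinyMLP(8, 16, 4)
names = [name for name, _ in model.named_parameters()]
print(names)  # includes "temperature" but not "untracked"

# Profile one forward and backward pass, then summarize time per operator.
with profiler.profile() as prof:
    out = model(torch.randn(32, 8))
    out.sum().backward()

report = prof.key_averages().table(sort_by="cpu_time_total", row_limit=5)
print(report)
```

The untracked tensor would be skipped by an optimizer built from model.parameters() and omitted from checkpoints, which is the failure mode that nn.Parameter avoids.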