diff --git a/rfcs/20240905-dnncompat b/rfcs/20240905-dnncompat
new file mode 100644
index 00000000000..b6fcb6350cb
--- /dev/null
+++ b/rfcs/20240905-dnncompat
@@ -0,0 +1,906 @@
+# oneDNN Compatibility Layer
+
+## Summary
+
+The `dnncompat` compatibility layer in oneDNN is a thin, header-only wrapper
+around the oneDNN API designed to provide an API that closely mirrors
+vendor-specific DNN libraries. Its primary purpose is to be a drop-in
+replacement for these libraries, making it useful for auto-translation tools
+such as SYCLomatic as well as for manual conversion of user applications
+written with vendor-specific APIs to the oneDNN library.
+
+Since oneDNN is optimized for a variety of hardware vendors, `dnncompat` will
+allow users and tools to convert applications that are limited to specific
+hardware into hardware-agnostic projects, allowing the same codebase to run on
+different platforms without modification.
+
+The goals of `dnncompat` are as follows:
+
+- To offer a light-weight, header-only wrapper around oneDNN which can easily
+  be integrated into user projects and utilised by auto-translation tools.
+- To deliver an API that closely resembles vendor-specific libraries, enabling
+  API calls to be converted primarily through find-and-replace.
+- To handle differences in behaviour between oneDNN and vendor-specific
+  libraries internally, thus allowing users to convert applications without
+  having to debug differences between libraries.
+
+In addition, `dnncompat` leverages SYCL Unified Shared Memory (USM) support
+within oneDNN to provide an API that can accept USM pointers as inputs,
+similar to vendor-specific library APIs which take pointers to memory as
+inputs. This also allows `dnncompat` to work with natively allocated memory,
+such as memory allocated via `cudaMalloc`, making integration into native
+applications possible too.
+
+`dnncompat` will be shipped as a set of header files. When oneDNN is built
+from source, these headers will be installed in the directory
+`oneDNN-install-directory/include/dnncompat/`. Users will be able to integrate
+`dnncompat` into their projects by adding the oneDNN include directory to
+their include path, including the `dnncompat` header, and linking with the
+oneDNN library.
+
+## SYCLomatic Usage
+
+At present, SYCLomatic provides its own header-only wrapper for oneDNN and
+oneMKL which is used when converting applications. While this approach is
+sufficient, the code produced by SYCLomatic can sometimes be challenging for
+new oneDNN users to debug and understand, as it relies on a mixture of custom
+wrappers around oneDNN types along with direct oneDNN API calls. Additionally,
+there are several differences between the primitives and features supported by
+oneDNN and by vendor-specific libraries, making debugging incorrect output
+more difficult.
+
+To address these issues, the SYCLomatic headers can be substituted with the
+`dnncompat` wrapper. By using `dnncompat`, the conversion process will be
+simplified, reducing the complexity of the generated code and making it more
+straightforward to understand and debug. This will ensure that developers can
+convert their applications to `dnncompat` and oneDNN more easily, while also
+reducing potential integration issues.
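+
+As a concrete illustration of the find-and-replace style conversion described
+above, a cuDNN call and its intended `dnncompat` counterpart are shown below.
+The argument order is intentionally preserved; whether the `dnncompat`
+functions live in a namespace or carry a prefix is a naming detail not fixed
+by this RFC.
+
+```c++
+// Original cuDNN call in the user's application
+cudnnActivationForward(handle, activationDesc,
+        &alpha, xDesc, x, &beta, yDesc, y);
+
+// Intended dnncompat equivalent after conversion: the argument list is
+// unchanged, only the handle/descriptor types and the function name differ
+dnncompat::activationForward(handle, activationDesc,
+        &alpha, xDesc, x, &beta, yDesc, y);
+```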
+
+## Scope
+
+Initially the focus will be on a specific set of operators commonly used in
+the VGG-16 model. The following operators will be prioritised: Convolution,
+BiasAdd, Pooling, Activation, Softmax, Local Response Normalization (LRN),
+and TransformTensor (transpose). This will allow for creating a relatively
+small prototype, while showing that `dnncompat` will be useful.
+
+In this initial implementation, these operators will be supported in the
+forward direction. The supported data types will initially be `f32` and
+`f16`. In the future, support for operations, data types and directions can
+be expanded based on the needs of SYCLomatic and the community.
+
+## Implementation Details
+
+`dnncompat` will adopt an API design similar to vendor-specific libraries,
+which are often C-based. Tensors and operations will be described using
+descriptors (objects that encapsulate the necessary parameters for a given
+tensor or operation). These descriptors will then be passed, together with
+pointers to GPU memory containing the tensor data, to the relevant API
+functions which will execute the operations based on the provided parameters.
+
+Although `dnncompat` is a C++ library and will use object-oriented programming
+(OOP) features internally, the user-facing API will primarily interact with
+these descriptors through C-style free functions rather than through member
+functions of classes. This aligns with the design of vendor-specific
+libraries, such as `cuDNN`.
+
+The classes in `dnncompat` will be presented to users as opaque pointers,
+hiding implementation details and providing an API as close as possible to
+vendor-specific libraries.
+
+### Enumerations
+
+The `dnncompat` library will include the following enumerations to handle
+statuses, data types, tensor formats, and related operation attributes:
+
+- `Status_t`: Status codes returned by operations, indicating success or
+  specific types of errors.
+- `DataType_t`: Specifies the data types used in tensor operations, such as
+  `f32` or `f16`.
+- `TensorFormat_t`: Defines the layout formats for tensors such as `NCHW` or
+  `NHWC`.
+- `NanPropagation_t`: Controls how NaN values are handled in operations.
+- `MathType_t`: Used to enable or disable certain optimisations in
+  vendor-specific libraries. Not supported in oneDNN, added for completeness.
+
+### Non-Operation Classes
+
+#### Library Context
+
+Class **`detail::Handle`** serves as the context of the library, managing the
+execution environment.
+
+The `dnncompat::Handle` will internally manage the `dnnl::engine`,
+`dnnl::stream`, and the underlying `sycl::queue`.
+
+The `detail::Handle` class will have the following **private** constructors:
+
+```c++
+// Create a dnncompat::Handle with the specified engine kind
+Handle(dnnl::engine::kind engine_kind) {}
+
+// Create a dnncompat::Handle with an existing dnnl::engine
+Handle(dnnl::engine &engine) {}
+
+// Create a dnncompat::Handle with an existing sycl::queue
+Handle(sycl::queue q) {}
+```
+
+To allow users to create instances of `dnncompat::Handle`, the library will
+expose the following public functions. These functions will utilise the
+private constructors but will return a `dnncompat::Status_t`, indicating any
+potential errors during the handle's creation:
+
+```c++
+// Handle creation functions
+Status_t create(dnnl::engine::kind engine_kind);
+Status_t create(dnnl::engine &eng);
+Status_t create(sycl::queue q);
+```
+
+Additional public functions and getters will be available for users to
+interact with the handle:
+
+```c++
+// Retrieve the underlying engine, stream, or queue
+dnnl::engine &getEngine();
+dnnl::stream &getStream();
+sycl::queue getQueue();
+
+// Wait for operations to complete
+void wait();
+```
+
+The following type alias will be exposed to users:
+
+```c++
+// Alias for user convenience
+using Handle_t = dnncompat::detail::Handle *;
+```
+
+#### Tensor Descriptors
+
+##### Base Descriptor Class
+
+The `detail::DescriptorBase` class will be a base class, providing common
+functionality for both `TensorDescriptor` and `FilterDescriptor` classes. It
+will encapsulate parameters and information related to tensors such as the
+number of dimensions, dimension sizes, strides associated with each dimension,
+tensor format, data type and an internal oneDNN memory descriptor
+(`dnnl::memory::desc`). The classes `detail::TensorDescriptor` and
+`detail::FilterDescriptor` (descriptor for Convolution weights) will inherit
+from this class as both share similar functionality.
+
+`DescriptorBase` provides several public getter functions, allowing users to
+access various tensor properties:
+
+```c++
+// Retrieve the data type of the tensor
+DataType_t getDataType() const;
+
+// Get the tensor's format (layout)
+TensorFormat_t getTensorFormat() const;
+
+// Access the dimensions of the tensor
+const std::vector<int> &getDims() const;
+
+// Access the strides for each dimension
+const std::vector<int> &getStrides() const;
+
+// Get the number of dimensions in the tensor
+size_t getNumDims() const;
+
+// Retrieve the oneDNN memory descriptor associated with the tensor
+const dnnl::memory::desc &getOneDNNDesc();
+```
+
+##### Tensor Descriptor
+
+The `detail::TensorDescriptor` class will be exposed to users with the
+following type alias:
+
+```c++
+using TensorDescriptor_t = dnncompat::detail::TensorDescriptor *;
+```
+
+List of free functions related to `TensorDescriptor`:
+
+```c++
+// Initialise a tensor descriptor
+Status_t createTensorDescriptor(TensorDescriptor_t *tensorDesc);
+
+// Destroy a tensor descriptor
+Status_t destroyTensorDescriptor(TensorDescriptor_t tensorDesc);
+
+// Set the parameters of a 4D tensor descriptor
+Status_t setTensor4dDescriptor(TensorDescriptor_t tensorDesc,
+        TensorFormat_t format, DataType_t dataType, int n, int c, int h,
+        int w);
+
+// Set the parameters of an n-dimensional tensor descriptor
+Status_t setTensorNdDescriptor(TensorDescriptor_t tensorDesc,
+        DataType_t dataType, int nbDims, const int dimA[],
+        const int strideA[]);
+
+// Set the parameters of a 4D tensor descriptor. This function takes strides
+// as an argument instead of a tensor format, which allows custom layouts
+Status_t setTensor4dDescriptorEx(TensorDescriptor_t tensorDesc,
+        DataType_t dataType, int n, int c, int h, int w, int nStride,
+        int cStride, int hStride, int wStride);
+
+// Get the parameters of a 4D tensor descriptor
+Status_t getTensor4dDescriptor(const TensorDescriptor_t tensorDesc,
+        DataType_t *dataType, int *n, int *c, int *h, int *w, int *nStride,
+        int *cStride, int *hStride, int *wStride);
+
+// Get the parameters of an n-dimensional tensor descriptor
+Status_t getTensorNdDescriptor(const TensorDescriptor_t tensorDesc,
+        int nbDimsRequested, DataType_t *dataType, int *nbDims, int *dimA,
+        int *strideA);
+```
+
+##### Filter Descriptor
+
+`detail::FilterDescriptor` encapsulates the parameters of the Convolution
+operation's weights.
+
+The `detail::FilterDescriptor` class will be exposed to users with the
+following type alias:
+
+```c++
+using FilterDescriptor_t = dnncompat::detail::FilterDescriptor *;
+```
+
+List of free functions related to `FilterDescriptor`:
+
+```c++
+// Create a filter descriptor
+Status_t createFilterDescriptor(FilterDescriptor_t *filt);
+
+// Destroy a filter descriptor
+Status_t destroyFilterDescriptor(FilterDescriptor_t filt);
+
+// Set parameters of a 4-dimensional filter descriptor
+Status_t setFilter4dDescriptor(FilterDescriptor_t filterDesc,
+        DataType_t dataType, TensorFormat_t format, int k, int c, int h,
+        int w);
+
+// Set parameters of an n-dimensional filter descriptor
+Status_t setFilterNdDescriptor(FilterDescriptor_t filterDesc,
+        DataType_t dataType, TensorFormat_t format, int nbDims,
+        const int *filterDimA);
+
+// Get parameters of a 4-dimensional filter descriptor
+Status_t getFilter4dDescriptor(FilterDescriptor_t filterDesc,
+        DataType_t *dataType, TensorFormat_t *format, int *k, int *c, int *h,
+        int *w);
+
+// Get parameters of an n-dimensional filter descriptor
+Status_t getFilterNdDescriptor(FilterDescriptor_t filterDesc,
+        int nbDimsRequested, DataType_t *dataType, TensorFormat_t *format,
+        int *nbDims, int *filterDimA);
+```
+
+### Operations
+
+#### Activation
+
+The `detail::ActivationDescriptor` class encapsulates the parameters of the
+activation operation: activation mode (initially `sigmoid` and `relu` will be
+supported), NaN propagation mode, and coefficient (which has different
+behaviour based on the activation mode).
+
+Free functions related to the Activation operation:
+
+```c++
+// Create activation descriptor
+Status_t createActivationDescriptor(ActivationDescriptor_t *activationDesc);
+
+// Destroy activation descriptor
+Status_t destroyActivationDescriptor(ActivationDescriptor_t activationDesc);
+
+// Set the parameters of the activation operation
+Status_t setActivationDescriptor(ActivationDescriptor_t activationDesc,
+        ActivationMode_t mode, NanPropagation_t reluNanOpt, double coef);
+
+// Launch activation forward
+Status_t activationForward(Handle_t handle,
+        ActivationDescriptor_t activationDesc, const void *alpha,
+        const TensorDescriptor_t xDesc, const void *x, const void *beta,
+        const TensorDescriptor_t yDesc, void *y);
+```
+
+List of enumerations related to the Activation operation:
+
+```c++
+enum ActivationMode_t { ACTIVATION_SIGMOID, ACTIVATION_RELU };
+```
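+
+For illustration, a minimal sketch of the intended usage flow for a forward
+ReLU on USM memory is shown below. The header path, the namespace placement of
+the free functions, the enumeration value names (`TENSOR_NCHW`, `DATA_FLOAT`,
+`PROPAGATE_NAN`), and the handle-creation overload with an output parameter
+are illustrative assumptions; the authoritative signatures are those listed in
+the sections above.
+
+```c++
+#include <sycl/sycl.hpp>
+#include <dnncompat/dnncompat.hpp> // illustrative header path
+
+int main() {
+    sycl::queue q{sycl::gpu_selector_v};
+
+    // Create the library handle from an existing SYCL queue. An output
+    // parameter is assumed here; the Library Context section defines the
+    // actual creation functions.
+    dnncompat::Handle_t handle = nullptr;
+    dnncompat::create(&handle, q); // hypothetical out-parameter overload
+
+    // Describe a 1x64x56x56 f32 tensor in NCHW layout (same shape in and out)
+    dnncompat::TensorDescriptor_t xDesc, yDesc;
+    dnncompat::createTensorDescriptor(&xDesc);
+    dnncompat::createTensorDescriptor(&yDesc);
+    dnncompat::setTensor4dDescriptor(xDesc, dnncompat::TENSOR_NCHW,
+            dnncompat::DATA_FLOAT, 1, 64, 56, 56);
+    dnncompat::setTensor4dDescriptor(yDesc, dnncompat::TENSOR_NCHW,
+            dnncompat::DATA_FLOAT, 1, 64, 56, 56);
+
+    // USM allocations; natively allocated memory (e.g. via cudaMalloc) would
+    // also be accepted, as described in the Summary
+    size_t nelems = size_t(1) * 64 * 56 * 56;
+    float *x = sycl::malloc_device<float>(nelems, q);
+    float *y = sycl::malloc_device<float>(nelems, q);
+
+    // Describe and launch a forward ReLU
+    dnncompat::ActivationDescriptor_t actDesc;
+    dnncompat::createActivationDescriptor(&actDesc);
+    dnncompat::setActivationDescriptor(actDesc, dnncompat::ACTIVATION_RELU,
+            dnncompat::PROPAGATE_NAN, /*coef=*/0.0);
+
+    float alpha = 1.f, beta = 0.f;
+    dnncompat::activationForward(handle, actDesc, &alpha, xDesc, x, &beta,
+            yDesc, y);
+    handle->wait();
+
+    sycl::free(x, q);
+    sycl::free(y, q);
+    dnncompat::destroyActivationDescriptor(actDesc);
+    dnncompat::destroyTensorDescriptor(xDesc);
+    dnncompat::destroyTensorDescriptor(yDesc);
+    return 0;
+}
+```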
+
+#### AddTensor
+
+Used for the Convolution bias add. This operation does not have a separate
+descriptor.
+
+List of free functions related to the BiasAdd operation:
+
+```c++
+Status_t addTensor(Handle_t handle, const void *alpha,
+        const TensorDescriptor_t aDesc, const void *A, const void *beta,
+        const TensorDescriptor_t cDesc, void *C);
+```
+
+#### Convolution
+
+The `detail::ConvolutionDescriptor` class encapsulates the parameters of a
+Convolution operation: number of groups, convolution mode (`convolution` or
+`cross_correlation`), padding, convolution stride, dilation.
+
+The `ConvolutionDescriptor` class will be exposed to users with the following
+type alias:
+
+```c++
+using ConvolutionDescriptor_t = dnncompat::detail::ConvolutionDescriptor *;
+```
+
+List of free functions related to the Convolution operation:
+
+```c++
+// Create convolution descriptor
+Status_t createConvolutionDescriptor(ConvolutionDescriptor_t *convDesc);
+
+// Destroy convolution descriptor
+Status_t destroyConvolutionDescriptor(ConvolutionDescriptor_t convDesc);
+
+// Set parameters of a 2-dimensional convolution operation with 4D inputs
+Status_t setConvolution2dDescriptor(ConvolutionDescriptor_t desc, int pad_h,
+        int pad_w, int stride_h, int stride_w, int dilation_h, int dilation_w,
+        ConvolutionMode_t mode, DataType_t dataType);
+
+// Set parameters of an n-dimensional convolution operation
+// (with n-2 dimensional inputs)
+Status_t setConvolutionNdDescriptor(ConvolutionDescriptor_t desc,
+        int arrayLength, const int *padA, const int *filterStrideA,
+        const int *dilationA, ConvolutionMode_t mode, DataType_t dataType);
+
+// Set the number of groups to be used when performing convolution
+Status_t setConvolutionGroupCount(
+        ConvolutionDescriptor_t desc, int groupCount);
+
+// Computes the dimensions of the output of an n-dimensional convolution
+Status_t getConvolutionNdForwardOutputDim(
+        const ConvolutionDescriptor_t convDesc,
+        const TensorDescriptor_t inputTensorDesc,
+        const FilterDescriptor_t filterDesc, int nbDims,
+        int *tensorOutputDimA);
+
+// Computes the dimensions of the output of a 2-dimensional convolution
+Status_t getConvolution2dForwardOutputDim(
+        const ConvolutionDescriptor_t convDesc,
+        const TensorDescriptor_t inputTensorDesc,
+        const FilterDescriptor_t filterDesc, int *n, int *c, int *h, int *w);
+
+// Computes the size (in bytes) of a convolution forward operation's required
+// workspace
+Status_t getConvolutionForwardWorkspaceSize(Handle_t handle,
+        const TensorDescriptor_t xDesc, const FilterDescriptor_t wDesc,
+        const ConvolutionDescriptor_t convDesc, const TensorDescriptor_t yDesc,
+        ConvolutionFwdAlgo_t algo, size_t *workSpaceSizeInBytes);
+
+// Launch a convolution forward operation
+Status_t convolutionForward(Handle_t handle, const void *alpha,
+        const TensorDescriptor_t xDesc, const void *x,
+        const FilterDescriptor_t wDesc, const void *w,
+        const ConvolutionDescriptor_t convDesc, ConvolutionFwdAlgo_t algo,
+        void *workSpace, size_t workSpaceSizeInBytes, const void *beta,
+        const TensorDescriptor_t yDesc, void *y);
+
+// Computes the size (in bytes) of a convolution backward data operation's
+// required workspace
+Status_t getConvolutionBackwardDataWorkspaceSize(Handle_t handle,
+        const FilterDescriptor_t wDesc, const TensorDescriptor_t dyDesc,
+        const ConvolutionDescriptor_t convDesc, const TensorDescriptor_t dxDesc,
+        ConvolutionBwdDataAlgo_t algo, size_t *workSpaceSizeInBytes);
+
+// Launch a convolution backward data operation
+Status_t convolutionBackwardData(Handle_t handle, const void *alpha,
+        const FilterDescriptor_t wDesc, const void *w,
+        const TensorDescriptor_t dyDesc, const void *dy,
+        const ConvolutionDescriptor_t convDesc, ConvolutionBwdDataAlgo_t algo,
+        void *workSpace, size_t workSpaceSizeInBytes, const void *beta,
+        const TensorDescriptor_t dxDesc, void *dx);
+
+// Computes the size (in bytes) of a convolution backward weights operation's
+// required workspace
+Status_t getConvolutionBackwardFilterWorkspaceSize(Handle_t handle,
+        const TensorDescriptor_t xDesc, const TensorDescriptor_t dyDesc,
+        const ConvolutionDescriptor_t convDesc, const FilterDescriptor_t dwDesc,
+        ConvolutionBwdFilterAlgo_t algo, size_t *workSpaceSizeInBytes);
+
+// Launch a convolution backward weights operation
+Status_t convolutionBackwardFilter(Handle_t handle, const void *alpha,
+        const TensorDescriptor_t xDesc, const void *x,
+        const TensorDescriptor_t dyDesc, const void *dy,
+        const ConvolutionDescriptor_t convDesc, ConvolutionBwdFilterAlgo_t algo,
+        void *workSpace, size_t workSpaceSizeInBytes, const void *beta,
+        const FilterDescriptor_t dwDesc, void *dw);
+
+// The following APIs are used to find the best algorithm given the parameters
+// of a convolution operation. When used in dnncompat they will always return
+// `CONVOLUTION_FWD_ALGO_AUTO` which then gets mapped to
+// `dnnl::algorithm::convolution_auto` when launching the convolution
+// operation. Thus, dnncompat relies on oneDNN to select the best algorithm.
+// This behaviour is consistent with vendor-specific libraries, which may
+// return fewer algorithms than the requested number.
+Status_t findConvolutionForwardAlgorithm(Handle_t,
+        const TensorDescriptor_t, const FilterDescriptor_t,
+        const ConvolutionDescriptor_t, const TensorDescriptor_t, const int,
+        int *returnedAlgoCount, ConvolutionFwdAlgoPerf_t *perfResults);
+
+Status_t getConvolutionForwardAlgorithm_v7(Handle_t,
+        const TensorDescriptor_t, const FilterDescriptor_t,
+        const ConvolutionDescriptor_t, const TensorDescriptor_t, const int,
+        int *returnedAlgoCount, ConvolutionFwdAlgoPerf_t *perfResults);
+
+Status_t findConvolutionBackwardDataAlgorithm(Handle_t,
+        const TensorDescriptor_t, const FilterDescriptor_t,
+        const ConvolutionDescriptor_t, const TensorDescriptor_t, const int,
+        int *returnedAlgoCount, ConvolutionBwdDataAlgoPerf_t *perfResults);
+
+Status_t findConvolutionBackwardFilterAlgorithm(Handle_t,
+        const TensorDescriptor_t, const FilterDescriptor_t,
+        const ConvolutionDescriptor_t, const TensorDescriptor_t, const int,
+        int *returnedAlgoCount, ConvolutionBwdFilterAlgoPerf_t *perfResults);
+```
+
+List of enumerations related to the Convolution operation:
+
+```c++
+enum ConvolutionFwdAlgo_t {
+    CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM = 0,
+    CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM,
+    CONVOLUTION_FWD_ALGO_GEMM,
+    CONVOLUTION_FWD_ALGO_DIRECT,
+    CONVOLUTION_FWD_ALGO_FFT,
+    CONVOLUTION_FWD_ALGO_FFT_TILING,
+    CONVOLUTION_FWD_ALGO_WINOGRAD,
+    CONVOLUTION_FWD_ALGO_WINOGRAD_NONFUSED,
+    CONVOLUTION_FWD_ALGO_AUTO,
+    CONVOLUTION_FWD_ALGO_COUNT,
+};
+
+enum ConvolutionBwdDataAlgo_t {
+    CONVOLUTION_BWD_DATA_ALGO_0,
+    CONVOLUTION_BWD_DATA_ALGO_1,
+    CONVOLUTION_BWD_DATA_ALGO_FFT,
+    CONVOLUTION_BWD_DATA_ALGO_FFT_TILING,
+    CONVOLUTION_BWD_DATA_ALGO_WINOGRAD,
+    CONVOLUTION_BWD_DATA_ALGO_WINOGRAD_NONFUSED,
+    CONVOLUTION_BWD_DATA_ALGO_AUTO,
+    CONVOLUTION_BWD_DATA_ALGO_DIRECT,
+    CONVOLUTION_BWD_DATA_ALGO_COUNT,
+};
+
+enum ConvolutionBwdFilterAlgo_t {
+    CONVOLUTION_BWD_FILTER_ALGO_0,
+    CONVOLUTION_BWD_FILTER_ALGO_1,
+    CONVOLUTION_BWD_FILTER_ALGO_FFT,
+    CONVOLUTION_BWD_FILTER_ALGO_3,
+    CONVOLUTION_BWD_FILTER_ALGO_WINOGRAD_NONFUSED,
+    CONVOLUTION_BWD_FILTER_ALGO_FFT_TILING,
+    CONVOLUTION_BWD_FILTER_ALGO_AUTO,
+    CONVOLUTION_BWD_FILTER_ALGO_DIRECT,
+    CONVOLUTION_BWD_FILTER_ALGO_COUNT,
+};
+
+enum ConvolutionMode_t {
+    CONVOLUTION = 0,
+    CROSS_CORRELATION,
+};
+
+enum Determinism_t {
+    NON_DETERMINISTIC = 0,
+    DETERMINISTIC,
+};
+```
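+
+The expected calling sequence for convolution follows the same
+query-then-execute pattern. The sketch below assumes the handle, the
+descriptors (`xDesc`, `wDesc`, `convDesc`, `yDesc`) and the USM pointers `x`,
+`w`, `y` have already been set up as in the earlier Activation sketch, and
+reuses the same naming assumptions.
+
+```c++
+// Query, allocate, and pass the workspace, then launch the convolution.
+// With CONVOLUTION_FWD_ALGO_AUTO, algorithm selection is delegated to oneDNN
+// (dnnl::algorithm::convolution_auto) as described above.
+dnncompat::ConvolutionFwdAlgo_t algo = dnncompat::CONVOLUTION_FWD_ALGO_AUTO;
+
+size_t wsSize = 0;
+dnncompat::getConvolutionForwardWorkspaceSize(
+        handle, xDesc, wDesc, convDesc, yDesc, algo, &wsSize);
+
+void *workspace = wsSize > 0
+        ? sycl::malloc_device(wsSize, handle->getQueue())
+        : nullptr;
+
+float alpha = 1.f, beta = 0.f;
+dnncompat::convolutionForward(handle, &alpha, xDesc, x, wDesc, w, convDesc,
+        algo, workspace, wsSize, &beta, yDesc, y);
+
+handle->wait();
+if (workspace) sycl::free(workspace, handle->getQueue());
+```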
+
+#### LRN
+
+The `detail::LRNDescriptor` class encapsulates the parameters of the LRN
+operation: N (the normalisation window), alpha (the variance scaling
+parameter), beta (the power parameter), and K (the k parameter of the LRN
+formula).
+
+The `LRNDescriptor` will be exposed to users with the following type alias:
+
+```c++
+using LRNDescriptor_t = dnncompat::detail::LRNDescriptor *;
+```
+
+Free functions related to the LRN operation:
+
+```c++
+// Create LRN descriptor
+Status_t createLRNDescriptor(LRNDescriptor_t *lrnDesc);
+
+// Destroy LRN descriptor
+Status_t destroyLRNDescriptor(LRNDescriptor_t lrnDesc);
+
+// Set parameters of an LRN operation
+Status_t setLRNDescriptor(LRNDescriptor_t normDesc, unsigned lrnN,
+        double lrnAlpha, double lrnBeta, double lrnK);
+
+// Launch LRN operation
+Status_t lrnCrossChannelForward(Handle_t handle, const LRNDescriptor_t normDesc,
+        LRNMode_t lrnMode, const void *alpha, const TensorDescriptor_t xDesc,
+        const void *x, const void *beta, const TensorDescriptor_t yDesc,
+        void *y);
+```
+
+List of enumerations related to the LRN operation:
+
+```c++
+enum LRNMode_t { LRN_CROSS_CHANNEL_DIM1 };
+```
+
+#### Pooling
+
+The `detail::PoolingDescriptor` class encapsulates the parameters of the
+Pooling operation: number of dimensions, mode (Max, Average including padding,
+Average excluding padding), NaN propagation mode, pooling window dimensions,
+padding, and pooling stride.
+
+The `PoolingDescriptor` will be exposed to users with the following type
+alias:
+
+```c++
+using PoolingDescriptor_t = dnncompat::detail::PoolingDescriptor *;
+```
+
+Free functions related to the Pooling operation:
+
+```c++
+// Create descriptor
+Status_t createPoolingDescriptor(PoolingDescriptor_t *poolingDesc);
+
+// Destroy descriptor
+Status_t destroyPoolingDescriptor(PoolingDescriptor_t poolingDesc);
+
+// Set 2D Pooling descriptor attributes
+Status_t setPooling2dDescriptor(PoolingDescriptor_t poolingDesc,
+        PoolingMode_t mode, NanPropagation_t maxpoolingNanOpt, int windowWidth,
+        int windowHeight, int verticalPadding, int horizontalPadding,
+        int verticalStride, int horizontalStride);
+
+// Set ND Pooling descriptor attributes
+Status_t setPoolingNdDescriptor(PoolingDescriptor_t poolingDesc,
+        PoolingMode_t mode, NanPropagation_t maxpoolingNanOpt, int nbDims,
+        const int windowDimA[], const int paddingA[], const int strideA[]);
+
+// Computes the dimensions of a 2-dimensional pooling operation's output tensor
+Status_t getPooling2dForwardOutputDim(const PoolingDescriptor_t poolingDesc,
+        const TensorDescriptor_t inputDesc, int *outN, int *outC, int *outH,
+        int *outW);
+
+// Computes the dimensions of an n-dimensional pooling operation's output tensor
+Status_t getPoolingNdForwardOutputDim(const PoolingDescriptor_t poolingDesc,
+        const TensorDescriptor_t inputDesc, int nbDims, int outDimA[]);
+
+// Launch pooling operation
+Status_t poolingForward(Handle_t handle, const PoolingDescriptor_t poolingDesc,
+        const void *alpha, const TensorDescriptor_t xDesc, const void *x,
+        const void *beta, const TensorDescriptor_t yDesc, void *y);
+```
+
+List of enumerations related to the Pooling operation:
+
+```c++
+enum PoolingMode_t {
+    POOLING_MAX = 0,
+    POOLING_AVERAGE_COUNT_INCLUDE_PADDING,
+    POOLING_AVERAGE_COUNT_EXCLUDE_PADDING,
+    POOLING_MAX_DETERMINISTIC
+};
+```
+
+#### Softmax
+
+The Softmax operation does not have a separate descriptor.
+
+List of free functions related to the Softmax operation:
+
+```c++
+// Launch the Softmax forward operation
+Status_t softmaxForward(Handle_t handle, SoftmaxAlgorithm_t algorithm,
+        SoftmaxMode_t mode, const void *alpha, const TensorDescriptor_t xDesc,
+        const void *x, const void *beta, const TensorDescriptor_t yDesc,
+        void *y);
+```
+
+List of enumerations related to the Softmax operation:
+
+```c++
+// With mode INSTANCE, Softmax is computed across dimensions C,H,W
+// With mode CHANNEL, Softmax is computed across dimension C
+enum SoftmaxMode_t { SOFTMAX_MODE_INSTANCE, SOFTMAX_MODE_CHANNEL };
+
+// Note: oneDNN does not have an explicit fast Softmax algorithm
+enum SoftmaxAlgorithm_t { SOFTMAX_FAST, SOFTMAX_ACCURATE, SOFTMAX_LOG };
+```
+
+#### TransformTensor
+
+The TransformTensor operation maps to the oneDNN Reorder primitive. It does
+not have a separate descriptor.
+
+List of free functions related to the TransformTensor operation:
+
+```c++
+// Launch the Transform tensor operator
+Status_t transformTensor(Handle_t handle, const void *alpha,
+        const TensorDescriptor_t xDesc, const void *x, const void *beta,
+        const TensorDescriptor_t yDesc, void *y);
+```
+
+#### Matmul
+
+Matrix multiplication is a fundamental operation in machine learning. While it
+is commonly provided by BLAS libraries, a mapping for this operation will be
+provided in `dnncompat`. Below is an outline of the necessary enumerations,
+classes, and functions for this implementation.
+
+##### Enumerations
+
+```c++
+// Enum for specifying transpose operations on matrices
+enum class Operation_t {
+    OP_N, // No transpose
+    OP_T  // Transpose
+};
+
+// Enum defining matmul descriptor attributes, such as matrix transpose
+// settings and epilogues
+enum class MatmulDescAttributes_t {
+    MATMUL_DESC_TRANSA, // Transpose setting for matrix A
+    MATMUL_DESC_TRANSB, // Transpose setting for matrix B
+    MATMUL_DESC_TRANSC, // Transpose setting for matrix C
+    MATMUL_DESC_EPILOGUE, // Epilogue setting for post-ops, see Epilogue_t for supported epilogues
+    MATMUL_DESC_BIAS_POINTER // Pointer which stores bias data
+};
+
+// Enum for matrix layout attributes, specifying the order and batching
+// configurations
+enum class MatrixLayoutAttribute_t {
+    MATRIX_LAYOUT_ORDER, // Matrix layout
+    MATRIX_LAYOUT_BATCH_COUNT, // Number of batches for batched matrix operations
+    MATRIX_LAYOUT_STRIDED_BATCH_OFFSET // Stride between batches
+};
+
+// Enum for setting matrix layout formats, especially for IMMA
+enum class Order_t {
+    ORDER_COL, // Column-major format
+    ORDER_ROW, // Row-major format
+    ORDER_COL32, // Special format for IMMA matrix A and C on Ampere architecture
+    ORDER_COL4_4R2_8C, // Special format for matrix B on Turing architecture
+    ORDER_COL32_2R_4R4 // Special format for matrix B on Ampere architecture
+};
+
+// Enum for supported epilogues (post-ops)
+enum class Epilogue_t {
+    CUBLASLT_EPILOGUE_DEFAULT, // Default epilogue
+    CUBLASLT_EPILOGUE_RELU, // ReLU activation
+    CUBLASLT_EPILOGUE_BIAS, // Bias addition
+    CUBLASLT_EPILOGUE_RELU_BIAS // ReLU activation with bias addition
+};
+```
+
+##### Structures
+
+```c++
+// Descriptor for defining the computation and scaling data types for matmul
+// operations
+struct MatmulDesc_t {
+    ComputeType_t computeType; // Data type of computation (i.e. accumulation type)
+    DataType_t scaleType; // Data type for scaling factors
+};
+
+// Structure representing the result of a heuristic search for matmul
+// algorithms
+struct MatmulHeuristicResult_t {
+    MatmulAlgo_t algo; // Algorithm chosen for matmul operation (see Notes for Algo_t mapping details)
+    size_t workspaceSize; // Required workspace size for the algorithm
+    Status_t state; // Status of the heuristic search
+};
+
+// Descriptor for matrix layouts, including the data type and dimensional
+// properties
+struct MatrixLayout_t {
+    DataType_t type; // Data type of matrix elements
+    uint64_t rows; // Number of rows in the matrix
+    uint64_t cols; // Number of columns in the matrix
+    int64_t ld; // Leading dimension of the matrix
+};
+
+// Structure wrapping a oneDNN matmul primitive descriptor
+// See Notes for more details
+struct MatmulAlgo_t {
+    dnnl::matmul::primitive_desc desc; // oneDNN matmul primitive descriptor
+};
+```
+
+##### Functions
+
+```c++
+// Function to create a matrix layout descriptor
+// IMMA datatypes are restricted to R_8I for matrices A and B, and R_8I/R_32I
+// for D (R_32I only for CUDA >= 12)
+Status_t matrixLayoutCreate(MatrixLayout_t *matLayout,
+        DataType_t type,
+        uint64_t rows,
+        uint64_t cols,
+        int64_t ld);
+
+// Function to set attributes for a matrix layout
+Status_t matrixLayoutSetAttribute(MatrixLayout_t matLayout,
+        MatrixLayoutAttribute_t attr,
+        const void *buf,
+        size_t sizeInBytes);
+
+// Function to execute the matrix multiplication operation
+Status_t matmul(Handle_t handle,
+        MatmulDesc_t computeDesc,
+        const void *alpha,
+        const void *A,
+        MatrixLayout_t Adesc,
+        const void *B,
+        MatrixLayout_t Bdesc,
+        const void *beta,
+        const void *C,
+        MatrixLayout_t Cdesc,
+        void *D,
+        MatrixLayout_t Ddesc,
+        const MatmulAlgo_t *algo,
+        void *workspace,
+        size_t workspaceSizeInBytes,
+        Stream_t stream);
+
+// Function to initialize the matmul descriptor with specified computation and
+// scale types
+Status_t MatmulDescCreate(MatmulDesc_t *matmulDesc,
+        ComputeType_t computeType,
+        DataType_t scaleType);
+
+// Function to set attributes for the matmul descriptor
+Status_t MatmulDescSetAttribute(MatmulDesc_t matmulDesc,
+        MatmulDescAttributes_t attr,
+        const void *buf,
+        size_t sizeInBytes);
+
+// Functions for matmul preferences and heuristics (cannot be mapped to oneDNN)
+// A stub can be provided instead to make automated code translation simpler
+Status_t MatmulPreferenceCreate(MatmulPreference_t *pref);
+Status_t MatmulPreferenceSetAttribute(MatmulPreference_t pref,
+        MatmulPreferenceAttributes_t attr,
+        const void *buf,
+        size_t sizeInBytes);
+
+// Function to obtain heuristic results for viable matmul algorithms
+// Note: in dnncompat this function will always return a single heuristic
+Status_t MatmulAlgoGetHeuristic(Handle_t handle,
+        MatmulDesc_t operationDesc,
+        MatrixLayout_t Adesc,
+        MatrixLayout_t Bdesc,
+        MatrixLayout_t Cdesc,
+        MatrixLayout_t Ddesc,
+        MatmulPreference_t preference,
+        int requestedAlgoCount,
+        MatmulHeuristicResult_t heuristicResultsArray[],
+        int *returnAlgoCount);
+```
+
+###### Notes
+
+- Several features such as `MatmulPreference_t`, `MatmulPreferenceCreate`,
+  `MatmulPreferenceSetAttribute`, and `ReductionScheme_t` are not mappable to
+  oneDNN as they are internal to the implementation and not exposed to users.
+- To work around the absence of a direct `Algo_t` mapping, the heuristic
+  result structure can be adapted. Instead of directly mapping `Algo_t`, a new
+  struct containing the oneDNN primitive descriptor can be used. This allows
+  the primitive to be initialized early, the required workspace size to be
+  queried, and gives the user control over the memory allocation for the
+  workspace.
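+
+To make the `MatmulAlgo_t` adaptation above concrete, one possible internal
+shape of `MatmulAlgoGetHeuristic` is sketched below: the oneDNN matmul
+primitive descriptor is created eagerly and its scratchpad size is reported
+back as the required workspace size. Data-type mapping, transpose handling,
+epilogues and error handling are omitted, and `STATUS_SUCCESS` is an assumed
+`Status_t` value; this is a sketch, not the final implementation.
+
+```c++
+Status_t MatmulAlgoGetHeuristic(Handle_t handle, MatmulDesc_t operationDesc,
+        MatrixLayout_t Adesc, MatrixLayout_t Bdesc, MatrixLayout_t Cdesc,
+        MatrixLayout_t Ddesc, MatmulPreference_t preference,
+        int requestedAlgoCount, MatmulHeuristicResult_t heuristicResultsArray[],
+        int *returnAlgoCount) {
+    using dims = dnnl::memory::dims;
+    // Assume f32, row-major, no transpose for brevity
+    auto dt = dnnl::memory::data_type::f32;
+    dnnl::memory::desc a_md(dims{(int64_t)Adesc.rows, (int64_t)Adesc.cols}, dt,
+            dnnl::memory::format_tag::ab);
+    dnnl::memory::desc b_md(dims{(int64_t)Bdesc.rows, (int64_t)Bdesc.cols}, dt,
+            dnnl::memory::format_tag::ab);
+    dnnl::memory::desc d_md(dims{(int64_t)Ddesc.rows, (int64_t)Ddesc.cols}, dt,
+            dnnl::memory::format_tag::ab);
+
+    // Let dnncompat manage the scratchpad explicitly (see Scratchpad Memory)
+    dnnl::primitive_attr attr;
+    attr.set_scratchpad_mode(dnnl::scratchpad_mode::user);
+
+    dnnl::matmul::primitive_desc pd(handle->getEngine(), a_md, b_md, d_md, attr);
+
+    // A single "algorithm" is returned: the prepared primitive descriptor and
+    // the workspace (scratchpad) size the caller should allocate.
+    // Assumes requestedAlgoCount >= 1.
+    heuristicResultsArray[0].algo.desc = pd;
+    heuristicResultsArray[0].workspaceSize = pd.scratchpad_desc().get_size();
+    heuristicResultsArray[0].state = STATUS_SUCCESS; // assumed Status_t value
+    *returnAlgoCount = 1;
+    return STATUS_SUCCESS;
+}
+```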
+
+## Caveats
+
+While `dnncompat` aims to provide an API as close as possible to
+vendor-specific libraries, there are some caveats:
+
+- There are differences in the range of operators supported by oneDNN compared
+  to vendor-specific libraries.
+
+- There are also differences in feature support. Some operation features
+  present in vendor-specific libraries are not available or may function
+  differently in oneDNN and vice versa.
+
+- For operations such as Activation and Pooling, cuDNN allows the option to
+  disable `NaN` propagation. However, in oneDNN, `NaN` values are always
+  propagated, which could lead to different outcomes in the execution of
+  similar operations. In the initial implementation a warning will be printed
+  when `NaN` propagation is explicitly disabled.
+
+- Multiple Handle Support: cuDNN supports the creation and usage of multiple
+  handles. In `dnncompat`, while multiple handles can be created, they must
+  all be initialized with the same `sycl::queue` as the primary handle. This
+  limitation will lead to a slight difference in the usage of `dnncompat` in
+  some applications.
+
+- cuDNN includes a specialized FFT convolution implementation, whereas this
+  algorithm is not supported in oneDNN.
+
+### Scratchpad Memory
+
+Some oneDNN primitives require a separate scratchpad memory buffer to store
+temporary results during computations. By default, oneDNN handles the
+allocation and management of this scratchpad memory internally, which
+generally simplifies usage. However, in the case of `dnncompat` there is a
+performance drawback due to how it creates and executes primitives.
+Specifically, `dnncompat` creates the oneDNN primitives within the execution
+API itself. This design means that scratchpad memory allocations occur at
+runtime, potentially introducing a performance hit due to dynamic memory
+allocation during execution.
+
+To address this performance concern, a more efficient approach is required for
+handling scratchpad memory. One solution leverages the proposed [async memory
+allocation extension](https://github.com/intel/llvm/pull/14800), which enables
+scratchpad memory to be allocated asynchronously from a memory pool. The
+proposed approach is outlined below:
+
+- Upon initialization of the `dnncompat` library context, a large contiguous
+  USM memory block can be pre-allocated to accommodate all the scratchpad
+  memory needs during execution. This buffer will reduce the need for repeated
+  dynamic allocations.
+- Additionally, when the library context is created, the asynchronous memory
+  pool can be initialized with the `use_existing_memory` property. This
+  property allows the memory pool to reference the large, pre-allocated memory
+  block mentioned above, ensuring that the pool uses the same contiguous
+  memory rather than performing separate allocations.
+- `dnncompat` will set the `dnnl::scratchpad_mode::user` mode for each
+  executed primitive, which gives `dnncompat` explicit control over the
+  allocation and management of scratchpad memory. In this mode, `dnncompat`
+  will be responsible for allocating the scratchpad buffer and ensuring it is
+  passed to the oneDNN primitives when required.
+- Since the handle is passed to all execution APIs within `dnncompat`, the
+  asynchronous memory pool can be reused across different primitives. When a
+  primitive is executed, `dnncompat` will first query the required scratchpad
+  size, allocate it from the pre-allocated pool, and then pass it to the
+  primitive, as sketched after this list. This approach minimizes the overhead
+  of scratchpad memory allocation and ensures that memory is handled
+  efficiently.
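+
+A minimal sketch of this per-primitive flow is shown below, using an eltwise
+primitive as an example. The `scratchpad_pool` object and its `allocate` call
+stand in for the async memory allocation extension referenced above and are
+not an existing API, and `src_md`, `dst_md`, `src_mem`, `dst_mem` are assumed
+to have been built from the user's descriptors; the primitive attribute,
+scratchpad query, and argument handling use the existing oneDNN API.
+
+```c++
+// Executing a primitive with a user-managed scratchpad
+dnnl::primitive_attr attr;
+attr.set_scratchpad_mode(dnnl::scratchpad_mode::user);
+
+// Example primitive: ReLU eltwise (created inside the dnncompat execution API)
+dnnl::eltwise_forward::primitive_desc pd(handle->getEngine(),
+        dnnl::prop_kind::forward_inference, dnnl::algorithm::eltwise_relu,
+        src_md, dst_md, 0.f, 0.f, attr);
+
+// Query the scratchpad requirement and serve it from the pre-allocated pool
+dnnl::memory::desc scratch_md = pd.scratchpad_desc();
+void *scratch_ptr = scratchpad_pool.allocate(scratch_md.get_size()); // placeholder
+dnnl::memory scratch_mem(scratch_md, handle->getEngine(), scratch_ptr);
+
+std::unordered_map<int, dnnl::memory> args = {
+        {DNNL_ARG_SRC, src_mem},
+        {DNNL_ARG_DST, dst_mem},
+        {DNNL_ARG_SCRATCHPAD, scratch_mem},
+};
+dnnl::eltwise_forward(pd).execute(handle->getStream(), args);
+```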
+
+### Workspace Memory
+
+In oneDNN, certain primitives require additional workspace memory during the
+training phase. This workspace serves as an extra buffer passed to the forward
+training primitive, where it stores information that will be needed for the
+backward pass. The same workspace is then passed to the corresponding backward
+primitive as it is required for the computation. However, this approach
+differs from vendor-specific libraries like cuDNN, which do not rely on
+workspace memory in this way. To work around this, the following approach is
+proposed to manage the API differences between oneDNN and cuDNN and avoid
+adding extra parameters to the `dnncompat` APIs:
+
+- A map or dictionary structure can be created where the key is a tensor
+  object passed to both the forward and backward primitives (e.g. the
+  destination tensor), and the associated value is the corresponding workspace
+  memory.
+  - This map can be stored within the library handle, which is passed to all
+    primitive execution APIs.
+- During the forward pass, the `dnncompat` implementation will query the
+  primitive to determine its workspace requirements. Once identified, the
+  necessary workspace memory will be allocated from a memory pool (as
+  described in the Scratchpad Memory subsection). The key-value pair,
+  consisting of the tensor and the allocated workspace, is then stored in the
+  map.
+- When executing the backward pass, the handle, along with the associated map,
+  is passed to retrieve the previously stored workspace memory. The retrieved
+  memory is then passed to the backward oneDNN primitive to complete the
+  computation.
+
+While this method effectively addresses the API mismatch between oneDNN and
+cuDNN, there is a potential issue: if the user copies or modifies the pointer
+associated with the tensor between the forward and backward passes, the map
+lookup will fail, as the modified pointer will no longer match the original
+key. Therefore, this limitation should be clearly documented.
+
+## Unit Testing
+
+To ensure the correctness of `dnncompat`, unit tests will be implemented using
+the `googletest`-based framework in oneDNN. The tests will run each operation
+with different combinations of parameters and will compare the results against
+the results of the corresponding oneDNN operations.
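+
+A sketch of what such a test could look like is shown below. It exercises
+`activationForward` for ReLU and, for brevity, checks the output against a
+trivial host reference rather than against oneDNN directly; the real tests
+would compare against the corresponding oneDNN primitives and iterate over
+parameter combinations. The enumeration value names, `STATUS_SUCCESS`, and the
+handle-creation overload are the same illustrative assumptions used in the
+earlier usage sketch.
+
+```c++
+#include <algorithm>
+#include <vector>
+
+#include <gtest/gtest.h>
+#include <sycl/sycl.hpp>
+
+#include <dnncompat/dnncompat.hpp> // illustrative header path
+
+TEST(DnncompatActivation, ReluForwardF32) {
+    sycl::queue q{sycl::gpu_selector_v};
+    dnncompat::Handle_t handle = nullptr;
+    ASSERT_EQ(dnncompat::create(&handle, q), dnncompat::STATUS_SUCCESS);
+
+    const int n = 1, c = 2, h = 4, w = 4;
+    const size_t nelems = size_t(n) * c * h * w;
+
+    dnncompat::TensorDescriptor_t xDesc, yDesc;
+    dnncompat::createTensorDescriptor(&xDesc);
+    dnncompat::createTensorDescriptor(&yDesc);
+    dnncompat::setTensor4dDescriptor(xDesc, dnncompat::TENSOR_NCHW,
+            dnncompat::DATA_FLOAT, n, c, h, w);
+    dnncompat::setTensor4dDescriptor(yDesc, dnncompat::TENSOR_NCHW,
+            dnncompat::DATA_FLOAT, n, c, h, w);
+
+    // Mixed positive/negative inputs so ReLU actually changes some values
+    std::vector<float> host_x(nelems), host_y(nelems, 0.f);
+    for (size_t i = 0; i < nelems; ++i)
+        host_x[i] = (i % 2 == 0) ? float(i) : -float(i);
+
+    float *x = sycl::malloc_device<float>(nelems, q);
+    float *y = sycl::malloc_device<float>(nelems, q);
+    q.copy(host_x.data(), x, nelems).wait();
+
+    dnncompat::ActivationDescriptor_t actDesc;
+    dnncompat::createActivationDescriptor(&actDesc);
+    dnncompat::setActivationDescriptor(actDesc, dnncompat::ACTIVATION_RELU,
+            dnncompat::PROPAGATE_NAN, 0.0);
+
+    float alpha = 1.f, beta = 0.f;
+    EXPECT_EQ(dnncompat::activationForward(handle, actDesc, &alpha, xDesc, x,
+                      &beta, yDesc, y),
+            dnncompat::STATUS_SUCCESS);
+    handle->wait();
+    q.copy(y, host_y.data(), nelems).wait();
+
+    // Trivial host reference for ReLU; the actual tests would compare against
+    // the corresponding oneDNN primitive instead
+    for (size_t i = 0; i < nelems; ++i)
+        EXPECT_FLOAT_EQ(host_y[i], std::max(host_x[i], 0.f));
+
+    sycl::free(x, q);
+    sycl::free(y, q);
+    dnncompat::destroyActivationDescriptor(actDesc);
+    dnncompat::destroyTensorDescriptor(xDesc);
+    dnncompat::destroyTensorDescriptor(yDesc);
+}
+```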