User/sheilk/winml adapter c api (#2891)
* Create WinML adapter C API
* Fix build
* Make it build
* Move adapter into onnxruntime core/session
* Entry point not exported
* Minor changes
* Make model metadata work
* Make tests pass
* Implement all the model reflection APIs on the adapter C ABI
* Update the new ORT interface to create a Lotus environment with a logging sink
* Start adding ORT env
* Move all WinML code into the adapter folder/lib to isolate it
* Ensure a single logging manager at a time
* Start refactoring the session
* Refactor the session creation interface
* Add CPU and DML session option methods to the adapter
* Finish session init
* Stub out interfaces in the ORT lib to mirror the mechanics of IInferenceSession
* Enable profiling and schema override
* Update session graph-transformer registration
* Turn the custom registry for custom ops back on
* Add sync API
* Add last C API stubs
* Should build... but all feature values are broken since this is in flight toward moving all implementation details into IValue
* Remove EP adapter header
* Implement DML execution provider functions from adapter (#2846)
* Implement DML execution provider functions from adapter
* Use functions in OnnxruntimeEngine.cpp
* Make map/sequence type_infos freeable, and start implementing IValue
* Make it build again
* Implement value methods
* Implement remaining methods
* Remove COM adapter ABI
* Check DML session
* Cache the allocator on IValue
* Check whether a resource is CPU/GPU when accessing its mutable data
* Update tensor
* Fix mismatched parentheses
* Fix tensor base and binding object
* It evaluates tensors! Sometimes...
* Minor fixes
* Enable GPU evals
* Wrap all existing WinML adapter APIs with API_IMPL to try/catch (#2854) (see the error-handling sketch below)
* Update WinML... tensor strings are broken; TensorBase needs to be templated to handle strings differently
* Make tensor strings work with two copies in / two copies out
* Fix tensor string and allocator bug
* Make maps work again... still needs some fixes
* Make it build!
* Enable map inputs
* Enable map outputs
* Unbound outputs for sequences and maps
* User/xianz/merge windowsai (#2883)
* Packaging pipeline changes for VS 2019 (#2711)
* Tiny fix to codegen
* Simplify the cache implementation and avoid static variables that may carry over between models
* Extend DML kernels (#2641)
* Additional DML operators
* Check unsupported attributes and inputs
* Address PR comments
* Add a kernel capability function used for partitioning, and re-enable stride-based int64 support based on value range
* Fix test failures
* Build fix
* PR comments
* Update Nuphar tutorial notebook (#2721):
  1. Reflect int8 GEMV improvements for multi-threading from #2696
  2. Add notes on multi-threading control using OpenMP
  3. Add samples of running multi-ISA AOT, and show int8 GEMM differences between AVX and AVX2
  4. Add an rnn_benchmark example to resolve #1993
* Add schema for new QOps (#2611)
* Add schema for new QOps
* Add shape inference + QLinearAveragePool
* Plus review comments
* Plus review comments
* Updates per review comments
* Plus review comments
* [server] Add support for model_name and model_version as CLI parameters (#2708)
* Remove the 64-bit warning message from Python validation (#2727)
* MLAS: ARM64 build fix (#2734) — fix bad usage of vreinterpret to cast vector element types (see the NEON cast sketch below)
* Fix broken Python docs links (#2740)
* Fix build on macOS (#2731) — the macOS ld doesn't support --whole-archive; the correct option is -all_load
* Fix nGraph wheel (#2737)
* Fix nGraph wheel — the 1.1.0 onnxruntime_ngraph wheel doesn't work
* Remove libdnnl.so from the nGraph libs
* Make it easy to compare
* Split onnxruntime server into a separate folder (#2744)
* Fix build for Python 3.8 (#2747)
* Fix build for Python 3.8
* Update protobuf to 3.11.2 (#1928)
* Change default optimization level to All (from Basic) (#2745)
* Change default optimization level to All (from Basic)
* Fix test
* Fix C# test
* Update numpy to 1.18 (#2758)
* Update numpy to 1.18
* Pipeline changes for Python 3.8 (#2753):
  1. Pipeline changes for Python 3.8
  2. Fix a regression in setup.py that was introduced in the previous commit. Note that we still haven't made Python 3.8 + Windows + CUDA work.
* Add basic stacktrace output for POSIX debug builds (#2749)
* [NupharEP] Fix a race condition when multiple sessions run different models concurrently (#2772)
* Revert "Change default optimization level to All (from Basic) (#2745)" — this reverts commit 56bb503.
* Fix typo in error message (#2736)
* Rename MKL-DNN to DNNL to fix broken link (#2730)
* Fix nightly build version number issue
* Pass BUILD_BUILDNUMBER to the Linux docker
* Disable featurizers in Python packages
* Import more featurizers (#2781): make kernels non-template; add an input constraint for learnt data; add min_max_scalar_transformer, robust_scalar_transformer, inputation_marker_transfomer, label_encoder_transformer, and missing_dummies_transformer along with tests; advance the Featurizers library commit
* Implement a more stable softmax (#2715)
* Implement a more stable SoftMax: e^x is represented as infinity if x is large enough (e.g. 100.f), and infinity divided by infinity is a NaN, so softmax produces a NaN if one or more items are large enough. The transform e^xi / (e^x1 + ... + e^xn) = e^(xi - max) / (e^(x1 - max) + ... + e^(xn - max)) is leveraged to get a stable softmax; for convenience, max is forced to 0.f if all xi are negative. (See the stable-softmax sketch below.)
* Contributing: fix a typo (#2784)
* ACL EP GEMM improvements (#2780): when possible, we use a fully connected layer instead of the GEMM implementation; this lets the library choose the best implementation based on the input data
* ACL EP convolution improvements (#2774): added the optimized depthwise convolution implementation for both ACL 19.02 and ACL 19.05; pointwise convolution seems more optimal in the CPU implementation, so we opted for that instead
* Add script for release NuGet validation (#2719)
* Initial commit
* Nits
* Disable a test temporarily
* Change working directory
* Test
* Add download Python step
* Test update
* More changes
* Fix space issue
* Fix
* Verify NuGet signing
* Fix
* Spaces
* PR feedback
* Nit
* Fix
* Fix
* Remove temporary changes
* Add uint8 support to the Where op (#2792)
* Improve BERT optimization script (#2712): (1) move input int64=>int32 conversion to embed layer fusion; (2) output the epsilon attribute for LayerNormalization fusion
* Add session creation time cost (#2798)
* ML.NET team needs featurizers within a package (#2789): add AutoML featurizers to the Windows, macOS, and GPU packaging pipelines
* Initialize max of softmax with the lowest float value (#2786)
* MLAS: update SGEMM threading parameters (#2808)
* Add interface to copy batch tensors (#2807)
* Add interface to copy batch tensors
* onnxruntime
* Speed up Windows TRT CI (#2811)
* Don't run CUDA tests if building with TensorRT
* Remove unnecessary build options for the Windows TRT CI
* Refactor the Windows GPU TensorRT CI YAML
* --numpy_version=1.17
* update
* update
* azcopy and CUDA path
* Update test data (#2356)
* Add timeseries imputer transformer featurizer kernel (#2813): make kernels non-template; add an input constraint for learnt data; fix up tests; add two more featurizers along with tests (tests fail): min_max_scalar_transformer and robust_scalar_transformer; fix the tests' serialized stream by prepending version bytes; add inputation_marker_transfomer and its test; fix up float/double type designations; add label_encoder_transformer along with a test (the string_throw case is broken at the moment); fix the labelencodertransfomer_test.cc string_throw case; rename maxabsscalertransformer_test.cc; add MissingDummiesTransformer along with its test; update the manifest; add the TimeSeriesImputerTransformer definition, implementation, and tests
* Fix memory leak in TRT (#2815)
* Fix memory leak issue
* Revert EP_FAIL on enqueueV2
* Add missing comma to the manifest
* Run static code analyzer on most of our code (#2817)
* Scenario test: build Google Test and TAEF test based on preprocessor definition (#2809)
* Add WinML macro wrappers on top of Google Test macros
* Change test methods to disabled
* Add custom WinML macros for both TAEF and Google tests
* PR comments
* Update quantization doc (#2783)
* Update documentation for the quantization script
* Plus some spelling corrections
* Filter CPU case for IsFloat16Supported (#2802)
* Update default optimization level + fix gemm_activation fusion (#2791)
* Update default optimization level + fix gemm_activation fusion
* Fix typo
* Add unit test and incorporate review comments
* Fix test comment
* Fix DNNL wheel package name (#2823)
* Append '-dnnl' to the wheel package name when building with --use_dnnl
* Update build.py
* Update Ubuntu & TensorRT version in README (#2820): Dockerfile.tensorrt uses nvcr.io/nvidia/tensorrt:19.09-py3 as the base image; update the Ubuntu and TensorRT versions according to https://docs.nvidia.com/deeplearning/sdk/tensorrt-container-release-notes/rel_19-09.html#rel_19-09
* Merge fixes
* Add OneHotEncoder and HashOneHotEncoder kernels (#2830): add defs and implementations for the OneHotEncoders, adjust the date_time_transformer kernel and test; add a OneHotEncoder kernel test; add a HashOneHotVectorizerTransformer unit test; this does not link due to multiple definitions of functions that are included into a header from a CPP file
* Upgrade gtest to the latest version (#2827): WinML would like to update the googletest submodule for some newer features, namely GTEST_SKIP to skip tests programmatically and to skip entire fixtures easily (see the GTEST_SKIP sketch below). However, the new code hits a gcc bug that is fixed in the latest gcc but not in the gcc 4.8.x we use, so as a compromise we changed our code a little to make it work. The gcc bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51213
* Add int64_t support for TopK on CPU; fixes GitHub issue #2806 (#2833)
* Ignore allocator type in the ExecutionProviders allocator map; make default initialization of OrtMemoryInfo more clearly invalid (#2768)
* Remove allocator type from the key comparison in ExecutionProviders; remove usage of DummyArena as it's no longer necessary
* Fix x86 tests where the arena allocator is disabled; make initialization of OrtMemoryInfo clearer by adding an Invalid enum value
* Make OrtValueNameIdxMap::MaxIdx more intuitive
* Convert ExternalProject Featurizers into a git submodule (#2834): add a git submodule for the Featurizer library; update cmake to build the git submodule
* Add domain check for nodes + update documentation (#2831)
* Fix cgmanifest.json generating script (#2770)
* Fix protobuf submodule name
* Work around a pygit2 bug
* User/orilevari/32bit comparison warning (#2800)
* Use the correct type for the for loop
* Explicitly specify void for the parameters of OrtGetApiBase: the function is defined in C, where a bare () is interpreted as taking an unknown number of parameters, which was causing compiler warning C4276 (see the declaration sketch below)
* CMake cross-generator fixes (#2790)
* Fix compilation with non-VS CMake generators
* Fix the custom WINMD target in Ninja
* Remove usage of the msbuild .targets file
* Fix linking using DML in Ninja
* Automate the SDK kit version choice
* Clean up the DML package install
* Fix SDK version detection
* Fix comment
* Revert unittest linkage changes
* Fix latest-SDK detection
* Don't link to non-uapcore libraries
* Remove the MessageBoxA reference and unused link libs
* Fix the Linux CUDA NuGet packaging pipeline break
* Refactor WinML API tests to build both Google Test and TAEF variants based on a preprocessor definition (#2829)
* Add WinML macro wrappers on top of Google Test macros
* Change test methods to disabled
* Add custom WinML macros for both TAEF and Google tests
* PR comments
* Refactor WinML API tests
* Move additional gtest-specific macro definitions into googleTestMacros.h
* Fix test build break: winml_lib_api needs to be statically linked into the tests since winmlp::learningmodeldevice::iscpu() is used in devicehelpers.cpp (#2837)
* Enforce that each WINML_TEST_CLASS_BEGIN_* matches a WINML_TEST_CLASS_END (#2841)
* Update optimization doc for BERT-related fusions (#2819)
* Add BERT-related transformers to the doc
* Add execution provider and comments for BERT optimizations
* Add a comment about the accuracy impact of approximation
* Fix warnings that cause the build to fail
* MLAS: enable threading for quantized GEMMs (#2844)
* Fix test warnings and delay-load linking (#2843)
* OrtMemoryInfo struct changed
* Mark the camera scenario test as EdgeCore because it uses D3D11 (#2852)
* User/orilevari/pipeline FI breaks (#2853)
* Remove conflicting artifact names; decided to stop using drop-nuget-cuda since this may have implications for other dependent pipelines
* Change the job name in gpu.yml back to Windows_CI_GPU_CUDA_Dev
* Remove internal libs from tests (#2864)
* Support custom DML in onnxruntime_providers.cmake (#2867)
* Remove old winmladapter cpp

Co-authored-by: Changming Sun <[email protected]>
Co-authored-by: KeDengMS <[email protected]>
Co-authored-by: Jeff <[email protected]>
Co-authored-by: Ashwini Khade <[email protected]>
Co-authored-by: Andrey <[email protected]>
Co-authored-by: George Wu <[email protected]>
Co-authored-by: Tracy Sharpe <[email protected]>
Co-authored-by: Faith Xu <[email protected]>
Co-authored-by: zhanyi-ms <[email protected]>
Co-authored-by: Changyoung Koh <[email protected]>
Co-authored-by: Scott McKay <[email protected]>
Co-authored-by: Takeshi Watanabe <[email protected]>
Co-authored-by: Dmitri Smirnov <[email protected]>
Co-authored-by: Yufeng Li <[email protected]>
Co-authored-by: Maher Jendoubi <[email protected]>
Co-authored-by: Andrews548 <[email protected]>
Co-authored-by: Hariharan Seshadri <[email protected]>
Co-authored-by: Nathan <[email protected]>
Co-authored-by: Tianlei Wu <[email protected]>
Co-authored-by: Ke Zhang <[email protected]>
Co-authored-by: stevenlix <[email protected]>
Co-authored-by: Ryan Lai <[email protected]>
Co-authored-by: Ori Levari <[email protected]>
Co-authored-by: Yingge WAN <[email protected]>
Co-authored-by: Qing <[email protected]>
Co-authored-by: Pranav Sharma <[email protected]>
Co-authored-by: Tiago Koji Castro Shibata <[email protected]>

* Move sequence implementation into the ORT lib... still commented out... need to turn it back on...
* Begin sequence implementation
* Make maps and sequences work
* Fix broken tests
* Remove dead code
* Misc cleanup
* CR feedback
* User/xianz/winml adapter c api (#2869)
* Wrap all existing WinML adapter APIs with API_IMPL to try/catch
* Return HR or throw for WinML adapter APIs on failure
* Undo the macro wrapper in two places
* Wrap error macros around ORT APIs, too
* Address CR feedback #2
* Add more API throw/return macros
* Revert changes no longer needed
* Revert changes to the CXX API
* Format winml lib.ort and winml adapter
* Remove static phoenix singleton

Co-authored-by: Ryan Lai <[email protected]>
Co-authored-by: Xiang Zhang <[email protected]>
Co-authored-by: Changming Sun <[email protected]>
Co-authored-by: KeDengMS <[email protected]>
Co-authored-by: Jeff <[email protected]>
Co-authored-by: Ashwini Khade <[email protected]>
Co-authored-by: Andrey <[email protected]>
Co-authored-by: George Wu <[email protected]>
Co-authored-by: Tracy Sharpe <[email protected]>
Co-authored-by: Faith Xu <[email protected]>
Co-authored-by: zhanyi-ms <[email protected]>
Co-authored-by: Changyoung Koh <[email protected]>
Co-authored-by: Scott McKay <[email protected]>
Co-authored-by: Takeshi Watanabe <[email protected]>
Co-authored-by: Dmitri Smirnov <[email protected]>
Co-authored-by: Yufeng Li <[email protected]>
Co-authored-by: Maher Jendoubi <[email protected]>
Co-authored-by: Andrews548 <[email protected]>
Co-authored-by: Hariharan Seshadri <[email protected]>
Co-authored-by: Nathan <[email protected]>
Co-authored-by: Tianlei Wu <[email protected]>
Co-authored-by: Ke Zhang <[email protected]>
Co-authored-by: stevenlix <[email protected]>
Co-authored-by: Ori Levari <[email protected]>
Co-authored-by: Yingge WAN <[email protected]>
Co-authored-by: Qing <[email protected]>
Co-authored-by: Pranav Sharma <[email protected]>
Co-authored-by: Tiago Koji Castro Shibata <[email protected]>
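A note on the API_IMPL try/catch wrappers mentioned above (#2854, #2869): the adapter exposes a C ABI, so C++ exceptions must not unwind across it. Below is a minimal sketch of that pattern; the macro and helper names are illustrative stand-ins, not the actual WinML adapter definitions.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Illustrative stand-ins for the real ORT status machinery in the C API.
struct OrtStatus { int code; std::string message; };
static OrtStatus* CreateStatus(int code, const char* msg) {
  return new OrtStatus{code, msg};
}
constexpr int kRuntimeException = 6;  // stands in for ORT_RUNTIME_EXCEPTION

// Each C-ABI entry point wraps its body so any thrown exception is converted
// into a returned OrtStatus* instead of escaping across the ABI boundary.
#define API_IMPL_BEGIN try {
#define API_IMPL_END                                              \
  } catch (const std::exception& e) {                             \
    return CreateStatus(kRuntimeException, e.what());             \
  } catch (...) {                                                 \
    return CreateStatus(kRuntimeException, "unknown exception");  \
  }

// Hypothetical adapter API: nullptr means success, non-null carries the error.
OrtStatus* SomeAdapterApi(int input) {
  API_IMPL_BEGIN
  if (input < 0) throw std::invalid_argument("input must be non-negative");
  return nullptr;  // success
  API_IMPL_END
}
```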
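On the MLAS ARM64 build fix (#2734): NEON's vreinterpret intrinsics are bit-level casts between vector types of the same width, and each source/destination pair has a dedicated intrinsic name, so using a mismatched variant (or a plain cast) is the kind of misuse the fix addressed. A small illustration of correct usage, not the actual MLAS code:

```cpp
#include <arm_neon.h>

// Reinterpret the 128 bits of a uint32x4_t as a float32x4_t without any
// value conversion; the _f32_u32 suffix must match destination and source
// element types exactly.
float32x4_t BitsToFloat(uint32x4_t bits) {
  return vreinterpretq_f32_u32(bits);
}
```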
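The stable-softmax change (#2715) rests on the identity quoted above: subtracting the row maximum keeps every exponent at or below zero, so e^x never overflows to infinity and the inf/inf NaN cannot occur. A minimal stand-alone sketch of the idea, not the kernel code itself:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Assumes x is non-empty.
std::vector<float> StableSoftmax(const std::vector<float>& x) {
  // Clamp the max to 0.f, mirroring "force max to 0.f if all xi are negative".
  float max_val = std::max(*std::max_element(x.begin(), x.end()), 0.0f);
  std::vector<float> out(x.size());
  float sum = 0.0f;
  for (size_t i = 0; i < x.size(); ++i) {
    out[i] = std::exp(x[i] - max_val);  // exponent <= 0, so no overflow
    sum += out[i];
  }
  for (float& v : out) v /= sum;  // normalize
  return out;
}
```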
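On the googletest upgrade (#2827): GTEST_SKIP() is the feature WinML wanted; it marks a test as skipped at runtime, and calling it from a fixture's SetUp() skips every test in that fixture. An illustrative example (the fixture and probe names are made up):

```cpp
#include <gtest/gtest.h>

class DmlDeviceTests : public ::testing::Test {
 protected:
  void SetUp() override {
    // Skipping in SetUp() skips every TEST_F in this fixture.
    if (!DmlDeviceAvailable()) {
      GTEST_SKIP() << "no DML-capable device on this machine";
    }
  }
  static bool DmlDeviceAvailable() { return false; }  // hypothetical probe
};

TEST_F(DmlDeviceTests, EvaluatesTensor) {
  SUCCEED();  // runs only when SetUp() did not skip
}
```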
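On the (void) change in #2800: in C, a declaration with an empty parameter list () says nothing about the number of arguments, whereas (void) pins the arity to zero; the mismatch is what produced warning C4276. A sketch of the two forms (the real declaration lives in onnxruntime_c_api.h; the forward declaration here is a stand-in):

```cpp
struct OrtApiBase;  // stand-in; the real definition is in onnxruntime_c_api.h

extern "C" {
// In C, "()" declares a function taking an unspecified number of arguments,
// which is what triggered compiler warning C4276:
//   const OrtApiBase* OrtGetApiBase();
// "(void)" declares a function taking exactly zero arguments:
const OrtApiBase* OrtGetApiBase(void);
}
```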