Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xformer is built with files from tensorflow instead of lib_tflite_micro #924

Open
andresovela opened this issue Sep 6, 2024 · 7 comments

Comments

@andresovela
Copy link
Contributor

andresovela commented Sep 6, 2024

I've spent the last few days trying to integrate some TFLM patches into xformer, but I failed to add some simple changes due to what I now know is duplicated dependencies.

I'll describe how I found out that the xformer build pulls in files from the wrong repository as best as I can:

I wanted to patch in int8 bias support for CONV_2D in TFLM. So I added the following code here (ai_tools/third_party/lib_tflite_micro/lib_tflite_micro/submodules/tflite-micro/tensorflow/lite/micro/kernels/conv.cc):

+      } else if (bias->type == kTfLiteInt8) {
+        reference_integer_ops::ConvPerChannel(
+            ConvParamsQuantized(params, data),
+            data.per_channel_output_multiplier, data.per_channel_output_shift,
+            tflite::micro::GetTensorShape(input),
+            tflite::micro::GetTensorData<int16_t>(input),
+            tflite::micro::GetTensorShape(filter),
+            tflite::micro::GetTensorData<int8_t>(filter),
+            tflite::micro::GetTensorShape(bias),
+            tflite::micro::GetOptionalTensorData<std::int8_t>(bias),
+            tflite::micro::GetTensorShape(output),
+            tflite::micro::GetTensorData<int16_t>(output));

That seemed to work, but it complained about a missing int8_t implementation for MultiplyByQuantizedMultiplier() defined here (ai_tools/third_party/lib_tflite_micro/lib_tflite_micro/submodules/tflite-micro/tensorflow/lite/kernels/internal/common.h).

So I proceeded to add the missing function both to common.h and common.cc located in ai_tools/third_party/lib_tflite_micro/lib_tflite_micro/submodules/tflite-micro/tensorflow/lite/kernels/internal/, but the linker somehow does not see the implementation I just added in ai_tools/third_party/lib_tflite_micro/lib_tflite_micro/submodules/tflite-micro/tensorflow/lite/kernels/internal/common.cc.

error: undefined reference to 'tflite::MultiplyByQuantizedMultiplier(signed char, int, int)'

After some experimentation, I found out that if I removed my changes to conv.cc (the diff shown above is no longer relevant from now on), I could actually comment out the int32/int64 impls for MultiplyByQuantizedMultiplier() in common.cc, but leave the declarations in common.h untouched, and the program would still build. The linker somehow is able to find the implementations for these functions elsewhere. Note that if I remove the function prototypes from common.h, the program does not compile anymore:

external/lib_tflite_micro/lib_tflite_micro/submodules/tflite-micro/tensorflow/lite/kernels/internal/reference/integer_ops/transpose_conv.h:209:61: error: 'MultiplyByQuantizedMultiplier' was not declared in this scope

So the conclusion is that the build system is using the function prototypes from ai_tools/third_party/lib_tflite_micro/lib_tflite_micro/submodules/tflite-micro/tensorflow/lite/kernels/internal/common.h, but it is not using the implementations from ai_tools/third_party/lib_tflite_micro/lib_tflite_micro/submodules/tflite-micro/tensorflow/lite/kernels/internal/common.cc.

After two days of investigation, I found out that the implementations being used are actually from ai_tools/xformer/bazel-xformer/external/org_tensorflow/tensorflow/lite/kernels/internal/common.cc. It looks like this is a copy of the tensorflow repo that gets downloaded by Bazel when you build xformer.

I found this comment in the BUILD file:

ai_tools/xformer/BUILD

Lines 258 to 264 in 9b32da3

# Tensorflow and tflite-micro contain common tflite files in global namespace.
# Tell the compiler to allow multiple definitions when linking this.
linkopts = select({
"@org_tensorflow//tensorflow:macos": [],
"@org_tensorflow//tensorflow:windows": [],
"//conditions:default": ["-Wl,-z,muldefs"],
}),

All of this is to say that I find this duplication rather unintuitive, and it results in xformer unintentionally being built not with the source code from https://github.com/xmos/lib_tflite_micro, but rather code from https://github.com/tensorflow/tensorflow

I am not familiar with the Bazel build system, so I haven't been able to find a solution for this yet.

@panickal-xmos
Copy link
Collaborator

Hi Andres, thank you. Let me take a look. Bazel is unfortunately not that straightforward, but the project is tied to it due to the Tensorflow dependency. I didn't quite understand the 8-bit bias. Why would be bias be 8-bit?

@andresovela
Copy link
Contributor Author

I wrote you an email about the 8-bit bias issue a few days ago. However, this is not relevant to this issue. I was just trying to get the model to move past that problem to see what else needs patching.

@andresovela
Copy link
Contributor Author

andresovela commented Sep 6, 2024

Btw, this is not a suspicion. I confirmed that the symbols being linked come from the tensorflow code:

I compiled the project with debug symbols and I ran the following:

$ objdump -t xcore-opt | rg MultiplyByQuantizedMultiplier
00000000014c25d0  w    F .text	0000000000000a1c              _ZN6tflite34MultiplyByQuantizedMultiplier4RowsE11int32x4x4_tii
00000000014c9a08 g     F .text	0000000000000120              _ZN6tflite29MultiplyByQuantizedMultiplierElii
000000000053fc0b  w    F .text	0000000000000029              _ZN6tflite43MultiplyByQuantizedMultiplierGreaterThanOneEiii
000000000053fbd4  w    F .text	0000000000000037              _ZN6tflite46MultiplyByQuantizedMultiplierSmallerThanOneExpEiii
00000000014c99af g     F .text	0000000000000059              _ZN6tflite29MultiplyByQuantizedMultiplierEiii

$ addr2line -e xcore-opt 0x14c99af
/proc/self/cwd/external/org_tensorflow/tensorflow/lite/kernels/internal/common.cc:21

@andresovela
Copy link
Contributor Author

andresovela commented Sep 6, 2024

I figured this one out.

I had to add "@tflite_micro//tensorflow/lite/kernels/internal:common.cc", here:

filegroup(
name = "TFLITE_SOURCES",
srcs = [
"@tflite_micro//tensorflow/lite/core/c:common.cc",
"@tflite_micro//tensorflow/lite/core/api:error_reporter.cc",
"@tflite_micro//tensorflow/lite/core/api:tensor_utils.cc",
"@tflite_micro//tensorflow/lite/core/api:flatbuffer_conversions.cc",
"@tflite_micro//tensorflow/lite/kernels:kernel_util.cc",
"@tflite_micro//tensorflow/lite/kernels/internal:quantization_util.cc",
"@tflite_micro//tensorflow/lite/kernels/internal:portable_tensor_utils.cc",
"@tflite_micro//tensorflow/lite/kernels/internal:tensor_ctypes.cc",
"@tflite_micro//tensorflow/lite/schema:schema_utils.cc",
"@tflite_micro//tensorflow/lite/micro:micro_log.cc",
],
)

While it makes sense from a build system point-of-view, I think this is a massive footgun. In my case, I was able to notice that the wrong file was being compiled in because I was trying to add a new function. However, if I had tried to just change an existing function, I'd have wasted even more time chasing myself around the room trying to figure out why my changes are not working.

Right now, it's even hard to tell whether there currently are patches that are not being included in xformer due to the source file not being listed in lib_tflmc.BUILD, because the program would still compile using the original tensorflow sources as a fallback.

@panickal-xmos
Copy link
Collaborator

Thank you @andresovela for investigating this. Tensorflow and tflite-micro used to be the same repo, but since they split into two repos, it's difficult to manage this in a clean manner. They both use tflite namespace and functions with the same name with subtly different functionality.
One potential solution is to rename all tflite namespaces in the tflite-micro repo to tflite_micro namespace or enclose the current tflite namespace in the tflite-micro repo inside a new tflite_micro namespace. But that would be a change that would touch almost all files in the tflite-micro repo, and I doubt if we can upstream that.
Thoughts?

@andresovela
Copy link
Contributor Author

That's not a bad idea. I'll propose it upstream and see if I can get a discussion going.

@panickal-xmos
Copy link
Collaborator

#928 adds in the namespace changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants