[Inliner] Don't count a call penalty for foldable __memcpy_chk and similar #117876

Open · wants to merge 3 commits into main

Conversation

citymarina (Contributor)

When the size is an appropriate constant, __memcpy_chk will turn into a memcpy that gets folded away by InstCombine. This patch therefore stops counting such calls toward the call penalty when computing inlining costs.

This is only really relevant on platforms whose headers redirect memcpy to __memcpy_chk (such as macOS). On platforms that use intrinsics instead, memcpy and similar functions are already exempt from call penalties.
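
For illustration, a sketch of the pattern this targets (assumed IR, mirroring the shape of the added tests rather than copied from them): with a constant length of 4 and a sufficient constant object size, InstCombine simplifies the fortified call to llvm.memcpy and then to a single load/store, so charging a call penalty here would overstate the cost of inlining the caller.

declare ptr @__memcpy_chk(ptr, ptr, i64, i64)

define void @copy4(ptr %dst, ptr %src) {
  ; objsize (4) >= len (4), and len is one of 1/2/4/8, so the fortified call
  ; folds to @llvm.memcpy.p0.p0.i64 and then to an i32 load/store.
  %call = call ptr @__memcpy_chk(ptr %dst, ptr %src, i64 4, i64 4)
  ret void
}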

@llvmbot (Member) commented Nov 27, 2024

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Author: Marina Taylor (citymarina)

Changes


Patch is 20.97 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/117876.diff

4 Files Affected:

  • (modified) llvm/include/llvm/Analysis/InlineCost.h (+2)
  • (modified) llvm/lib/Analysis/InlineCost.cpp (+73-17)
  • (added) llvm/test/Transforms/Inline/AArch64/memcpy-constant-size.ll (+114)
  • (added) llvm/test/Transforms/PhaseOrdering/AArch64/memcpy-constant-size.ll (+95)
diff --git a/llvm/include/llvm/Analysis/InlineCost.h b/llvm/include/llvm/Analysis/InlineCost.h
index 22e0b1bc901a43..ed54b0c077b4a4 100644
--- a/llvm/include/llvm/Analysis/InlineCost.h
+++ b/llvm/include/llvm/Analysis/InlineCost.h
@@ -318,6 +318,7 @@ std::optional<int> getInliningCostEstimate(
     CallBase &Call, TargetTransformInfo &CalleeTTI,
     function_ref<AssumptionCache &(Function &)> GetAssumptionCache,
     function_ref<BlockFrequencyInfo &(Function &)> GetBFI = nullptr,
+    function_ref<const TargetLibraryInfo &(Function &)> GetTLI = nullptr,
     ProfileSummaryInfo *PSI = nullptr,
     OptimizationRemarkEmitter *ORE = nullptr);
 
@@ -327,6 +328,7 @@ std::optional<InlineCostFeatures> getInliningCostFeatures(
     CallBase &Call, TargetTransformInfo &CalleeTTI,
     function_ref<AssumptionCache &(Function &)> GetAssumptionCache,
     function_ref<BlockFrequencyInfo &(Function &)> GetBFI = nullptr,
+    function_ref<const TargetLibraryInfo &(Function &)> GetTLI = nullptr,
     ProfileSummaryInfo *PSI = nullptr,
     OptimizationRemarkEmitter *ORE = nullptr);
 
diff --git a/llvm/lib/Analysis/InlineCost.cpp b/llvm/lib/Analysis/InlineCost.cpp
index 22bb406c01a4ed..798358814c62f6 100644
--- a/llvm/lib/Analysis/InlineCost.cpp
+++ b/llvm/lib/Analysis/InlineCost.cpp
@@ -249,6 +249,9 @@ class CallAnalyzer : public InstVisitor<CallAnalyzer, bool> {
   /// Getter for BlockFrequencyInfo
   function_ref<BlockFrequencyInfo &(Function &)> GetBFI;
 
+  /// Getter for TargetLibraryInfo
+  function_ref<const TargetLibraryInfo &(Function &)> GetTLI;
+
   /// Profile summary information.
   ProfileSummaryInfo *PSI;
 
@@ -433,6 +436,7 @@ class CallAnalyzer : public InstVisitor<CallAnalyzer, bool> {
   bool simplifyIntrinsicCallIsConstant(CallBase &CB);
   bool simplifyIntrinsicCallObjectSize(CallBase &CB);
   ConstantInt *stripAndComputeInBoundsConstantOffsets(Value *&V);
+  bool isLoweredToCall(Function *F, CallBase &Call);
 
   /// Return true if the given argument to the function being considered for
   /// inlining has the given attribute set either at the call site or the
@@ -492,13 +496,15 @@ class CallAnalyzer : public InstVisitor<CallAnalyzer, bool> {
   bool visitUnreachableInst(UnreachableInst &I);
 
 public:
-  CallAnalyzer(Function &Callee, CallBase &Call, const TargetTransformInfo &TTI,
-               function_ref<AssumptionCache &(Function &)> GetAssumptionCache,
-               function_ref<BlockFrequencyInfo &(Function &)> GetBFI = nullptr,
-               ProfileSummaryInfo *PSI = nullptr,
-               OptimizationRemarkEmitter *ORE = nullptr)
+  CallAnalyzer(
+      Function &Callee, CallBase &Call, const TargetTransformInfo &TTI,
+      function_ref<AssumptionCache &(Function &)> GetAssumptionCache,
+      function_ref<BlockFrequencyInfo &(Function &)> GetBFI = nullptr,
+      function_ref<const TargetLibraryInfo &(Function &)> GetTLI = nullptr,
+      ProfileSummaryInfo *PSI = nullptr,
+      OptimizationRemarkEmitter *ORE = nullptr)
       : TTI(TTI), GetAssumptionCache(GetAssumptionCache), GetBFI(GetBFI),
-        PSI(PSI), F(Callee), DL(F.getDataLayout()), ORE(ORE),
+        GetTLI(GetTLI), PSI(PSI), F(Callee), DL(F.getDataLayout()), ORE(ORE),
         CandidateCall(Call) {}
 
   InlineResult analyze();
@@ -688,7 +694,8 @@ class InlineCostCallAnalyzer final : public CallAnalyzer {
       /// FIXME: if InlineCostCallAnalyzer is derived from, this may need
       /// to instantiate the derived class.
       InlineCostCallAnalyzer CA(*F, Call, IndirectCallParams, TTI,
-                                GetAssumptionCache, GetBFI, PSI, ORE, false);
+                                GetAssumptionCache, GetBFI, GetTLI, PSI, ORE,
+                                false);
       if (CA.analyze().isSuccess()) {
         // We were able to inline the indirect call! Subtract the cost from the
         // threshold to get the bonus we want to apply, but don't go below zero.
@@ -1106,10 +1113,12 @@ class InlineCostCallAnalyzer final : public CallAnalyzer {
       const TargetTransformInfo &TTI,
       function_ref<AssumptionCache &(Function &)> GetAssumptionCache,
       function_ref<BlockFrequencyInfo &(Function &)> GetBFI = nullptr,
+      function_ref<const TargetLibraryInfo &(Function &)> GetTLI = nullptr,
       ProfileSummaryInfo *PSI = nullptr,
       OptimizationRemarkEmitter *ORE = nullptr, bool BoostIndirect = true,
       bool IgnoreThreshold = false)
-      : CallAnalyzer(Callee, Call, TTI, GetAssumptionCache, GetBFI, PSI, ORE),
+      : CallAnalyzer(Callee, Call, TTI, GetAssumptionCache, GetBFI, GetTLI, PSI,
+                     ORE),
         ComputeFullInlineCost(OptComputeFullInlineCost ||
                               Params.ComputeFullInlineCost || ORE ||
                               isCostBenefitAnalysisEnabled()),
@@ -1228,8 +1237,8 @@ class InlineCostFeaturesAnalyzer final : public CallAnalyzer {
           InlineConstants::IndirectCallThreshold;
 
       InlineCostCallAnalyzer CA(*F, Call, IndirectCallParams, TTI,
-                                GetAssumptionCache, GetBFI, PSI, ORE, false,
-                                true);
+                                GetAssumptionCache, GetBFI, GetTLI, PSI, ORE,
+                                false, true);
       if (CA.analyze().isSuccess()) {
         increment(InlineCostFeatureIndex::nested_inline_cost_estimate,
                   CA.getCost());
@@ -1355,9 +1364,11 @@ class InlineCostFeaturesAnalyzer final : public CallAnalyzer {
       const TargetTransformInfo &TTI,
       function_ref<AssumptionCache &(Function &)> &GetAssumptionCache,
       function_ref<BlockFrequencyInfo &(Function &)> GetBFI,
+      function_ref<const TargetLibraryInfo &(Function &)> GetTLI,
       ProfileSummaryInfo *PSI, OptimizationRemarkEmitter *ORE, Function &Callee,
       CallBase &Call)
-      : CallAnalyzer(Callee, Call, TTI, GetAssumptionCache, GetBFI, PSI) {}
+      : CallAnalyzer(Callee, Call, TTI, GetAssumptionCache, GetBFI, GetTLI,
+                     PSI) {}
 
   const InlineCostFeatures &features() const { return Cost; }
 };
@@ -2260,6 +2271,48 @@ bool CallAnalyzer::simplifyCallSite(Function *F, CallBase &Call) {
   return false;
 }
 
+bool CallAnalyzer::isLoweredToCall(Function *F, CallBase &Call) {
+  // Calls to memcpy with a known constant size 1/2/4/8 should not incur a call
+  // penalty, as the calls will be folded away by InstCombine. This is only
+  // really relevant on platforms whose headers redirect memcpy to __memcpy_chk
+  // (e.g. Mac), as other platforms use memcpy intrinsics, which are already
+  // exempt from the call penalty.
+  if (GetTLI) {
+    auto TLI = GetTLI(*F);
+    LibFunc LF;
+    if (TLI.getLibFunc(*F, LF) && TLI.has(LF)) {
+      switch (LF) {
+      case LibFunc_memcpy_chk:
+      case LibFunc_memmove_chk:
+      case LibFunc_mempcpy_chk:
+      case LibFunc_memset_chk: {
+        auto LenOp = dyn_cast_or_null<ConstantInt>(Call.getOperand(2));
+        if (!LenOp)
+          LenOp = dyn_cast_or_null<ConstantInt>(
+              SimplifiedValues.lookup(Call.getOperand(2)));
+        auto ObjSizeOp = dyn_cast_or_null<ConstantInt>(Call.getOperand(3));
+        if (!ObjSizeOp)
+          ObjSizeOp = dyn_cast_or_null<ConstantInt>(
+              SimplifiedValues.lookup(Call.getOperand(3)));
+        if (LenOp && ObjSizeOp) {
+          auto Len = LenOp->getLimitedValue();
+          auto ObjSize = ObjSizeOp->getLimitedValue();
+          if (ObjSize >= Len &&
+              (Len == 1 || Len == 2 || Len == 4 || Len == 8)) {
+            return false;
+          }
+        }
+        break;
+      }
+      default:
+        break;
+      }
+    }
+  }
+
+  return TTI.isLoweredToCall(F);
+}
+
 bool CallAnalyzer::visitCallBase(CallBase &Call) {
   if (!onCallBaseVisitStart(Call))
     return true;
@@ -2341,7 +2394,7 @@ bool CallAnalyzer::visitCallBase(CallBase &Call) {
       return false;
   }
 
-  if (TTI.isLoweredToCall(F)) {
+  if (isLoweredToCall(F, Call)) {
     onLoweredCall(F, Call, IsIndirectCall);
   }
 
@@ -2945,6 +2998,7 @@ std::optional<int> llvm::getInliningCostEstimate(
     CallBase &Call, TargetTransformInfo &CalleeTTI,
     function_ref<AssumptionCache &(Function &)> GetAssumptionCache,
     function_ref<BlockFrequencyInfo &(Function &)> GetBFI,
+    function_ref<const TargetLibraryInfo &(Function &)> GetTLI,
     ProfileSummaryInfo *PSI, OptimizationRemarkEmitter *ORE) {
   const InlineParams Params = {/* DefaultThreshold*/ 0,
                                /*HintThreshold*/ {},
@@ -2958,7 +3012,7 @@ std::optional<int> llvm::getInliningCostEstimate(
                                /*EnableDeferral*/ true};
 
   InlineCostCallAnalyzer CA(*Call.getCalledFunction(), Call, Params, CalleeTTI,
-                            GetAssumptionCache, GetBFI, PSI, ORE, true,
+                            GetAssumptionCache, GetBFI, GetTLI, PSI, ORE, true,
                             /*IgnoreThreshold*/ true);
   auto R = CA.analyze();
   if (!R.isSuccess())
@@ -2970,9 +3024,10 @@ std::optional<InlineCostFeatures> llvm::getInliningCostFeatures(
     CallBase &Call, TargetTransformInfo &CalleeTTI,
     function_ref<AssumptionCache &(Function &)> GetAssumptionCache,
     function_ref<BlockFrequencyInfo &(Function &)> GetBFI,
+    function_ref<const TargetLibraryInfo &(Function &)> GetTLI,
     ProfileSummaryInfo *PSI, OptimizationRemarkEmitter *ORE) {
-  InlineCostFeaturesAnalyzer CFA(CalleeTTI, GetAssumptionCache, GetBFI, PSI,
-                                 ORE, *Call.getCalledFunction(), Call);
+  InlineCostFeaturesAnalyzer CFA(CalleeTTI, GetAssumptionCache, GetBFI, GetTLI,
+                                 PSI, ORE, *Call.getCalledFunction(), Call);
   auto R = CFA.analyze();
   if (!R.isSuccess())
     return std::nullopt;
@@ -3072,7 +3127,7 @@ InlineCost llvm::getInlineCost(
                           << ")\n");
 
   InlineCostCallAnalyzer CA(*Callee, Call, Params, CalleeTTI,
-                            GetAssumptionCache, GetBFI, PSI, ORE);
+                            GetAssumptionCache, GetBFI, GetTLI, PSI, ORE);
   InlineResult ShouldInline = CA.analyze();
 
   LLVM_DEBUG(CA.dump());
@@ -3263,7 +3318,8 @@ InlineCostAnnotationPrinterPass::run(Function &F,
           continue;
         OptimizationRemarkEmitter ORE(CalledFunction);
         InlineCostCallAnalyzer ICCA(*CalledFunction, *CB, Params, TTI,
-                                    GetAssumptionCache, nullptr, &PSI, &ORE);
+                                    GetAssumptionCache, nullptr, nullptr, &PSI,
+                                    &ORE);
         ICCA.analyze();
         OS << "      Analyzing call of " << CalledFunction->getName()
            << "... (caller:" << CB->getCaller()->getName() << ")\n";
diff --git a/llvm/test/Transforms/Inline/AArch64/memcpy-constant-size.ll b/llvm/test/Transforms/Inline/AArch64/memcpy-constant-size.ll
new file mode 100644
index 00000000000000..763b6d2ca39091
--- /dev/null
+++ b/llvm/test/Transforms/Inline/AArch64/memcpy-constant-size.ll
@@ -0,0 +1,114 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
+; RUN: opt %s -mtriple=arm64-apple-macosx -passes=inline -inline-threshold=2 -inline-call-penalty=5 -S | FileCheck %s
+
+declare i64 @llvm.objectsize.i64.p0(ptr, i1, i1, i1)
+declare ptr @__memcpy_chk(ptr, ptr, i64, i64)
+declare ptr @__memmove_chk(ptr, ptr, i64, i64)
+declare ptr @__mempcpy_chk(ptr, ptr, i64, i64)
+declare ptr @__memset_chk(ptr, i32, i64, i64)
+
+define void @callee(ptr %dst, ptr %src, i64 %size) {
+; CHECK-LABEL: define void @callee
+; CHECK-SAME: (ptr [[DST:%.*]], ptr [[SRC:%.*]], i64 [[SIZE:%.*]]) {
+; CHECK-NEXT:    [[OBJSIZE:%.*]] = call i64 @llvm.objectsize.i64.p0(ptr [[DST]], i1 false, i1 true, i1 false)
+; CHECK-NEXT:    [[CALL_MEMCPY:%.*]] = call ptr @__memcpy_chk(ptr [[DST]], ptr [[SRC]], i64 [[SIZE]], i64 [[OBJSIZE]])
+; CHECK-NEXT:    [[CALL_MEMMOVE:%.*]] = call ptr @__memmove_chk(ptr [[DST]], ptr [[SRC]], i64 [[SIZE]], i64 [[OBJSIZE]])
+; CHECK-NEXT:    [[CALL_MEMPCPY:%.*]] = call ptr @__mempcpy_chk(ptr [[DST]], ptr [[SRC]], i64 [[SIZE]], i64 [[OBJSIZE]])
+; CHECK-NEXT:    [[CALL_MEMSET:%.*]] = call ptr @__memset_chk(ptr [[DST]], i32 0, i64 [[SIZE]], i64 [[OBJSIZE]])
+; CHECK-NEXT:    ret void
+;
+  %objsize = call i64 @llvm.objectsize.i64.p0(ptr %dst, i1 false, i1 true, i1 false)
+  %call.memcpy = call ptr @__memcpy_chk(ptr %dst, ptr %src, i64 %size, i64 %objsize)
+  %call.memmove = call ptr @__memmove_chk(ptr %dst, ptr %src, i64 %size, i64 %objsize)
+  %call.mempcpy = call ptr @__mempcpy_chk(ptr %dst, ptr %src, i64 %size, i64 %objsize)
+  %call.memset = call ptr @__memset_chk(ptr %dst, i32 0, i64 %size, i64 %objsize)
+  ret void
+}
+
+define void @caller3(ptr %dst, ptr %src) {
+; CHECK-LABEL: define void @caller3
+; CHECK-SAME: (ptr [[DST:%.*]], ptr [[SRC:%.*]]) {
+; CHECK-NEXT:    call void @callee(ptr [[DST]], ptr [[SRC]], i64 3)
+; CHECK-NEXT:    ret void
+;
+  call void @callee(ptr %dst, ptr %src, i64 3)
+  ret void
+}
+
+define void @caller4(ptr %dst, ptr %src) {
+; CHECK-LABEL: define void @caller4
+; CHECK-SAME: (ptr [[DST:%.*]], ptr [[SRC:%.*]]) {
+; CHECK-NEXT:    [[OBJSIZE_I:%.*]] = call i64 @llvm.objectsize.i64.p0(ptr [[DST]], i1 false, i1 true, i1 false)
+; CHECK-NEXT:    [[CALL_MEMCPY_I:%.*]] = call ptr @__memcpy_chk(ptr [[DST]], ptr [[SRC]], i64 4, i64 [[OBJSIZE_I]])
+; CHECK-NEXT:    [[CALL_MEMMOVE_I:%.*]] = call ptr @__memmove_chk(ptr [[DST]], ptr [[SRC]], i64 4, i64 [[OBJSIZE_I]])
+; CHECK-NEXT:    [[CALL_MEMPCPY_I:%.*]] = call ptr @__mempcpy_chk(ptr [[DST]], ptr [[SRC]], i64 4, i64 [[OBJSIZE_I]])
+; CHECK-NEXT:    [[CALL_MEMSET_I:%.*]] = call ptr @__memset_chk(ptr [[DST]], i32 0, i64 4, i64 [[OBJSIZE_I]])
+; CHECK-NEXT:    ret void
+;
+  call void @callee(ptr %dst, ptr %src, i64 4)
+  ret void
+}
+
+define void @objsize_toosmall_callee(ptr %dst, ptr %src, i64 %size) {
+; CHECK-LABEL: define void @objsize_toosmall_callee
+; CHECK-SAME: (ptr [[DST:%.*]], ptr [[SRC:%.*]], i64 [[SIZE:%.*]]) {
+; CHECK-NEXT:    [[CALL_MEMCPY:%.*]] = call ptr @__memcpy_chk(ptr [[DST]], ptr [[SRC]], i64 [[SIZE]], i64 1)
+; CHECK-NEXT:    [[CALL_MEMMOVE:%.*]] = call ptr @__memmove_chk(ptr [[DST]], ptr [[SRC]], i64 [[SIZE]], i64 1)
+; CHECK-NEXT:    [[CALL_MEMPCPY:%.*]] = call ptr @__mempcpy_chk(ptr [[DST]], ptr [[SRC]], i64 [[SIZE]], i64 1)
+; CHECK-NEXT:    [[CALL_MEMSET:%.*]] = call ptr @__memset_chk(ptr [[DST]], i32 0, i64 [[SIZE]], i64 1)
+; CHECK-NEXT:    ret void
+;
+  %call.memcpy = call ptr @__memcpy_chk(ptr %dst, ptr %src, i64 %size, i64 1)
+  %call.memmove = call ptr @__memmove_chk(ptr %dst, ptr %src, i64 %size, i64 1)
+  %call.mempcpy = call ptr @__mempcpy_chk(ptr %dst, ptr %src, i64 %size, i64 1)
+  %call.memset = call ptr @__memset_chk(ptr %dst, i32 0, i64 %size, i64 1)
+  ret void
+}
+
+define void @objsize_toosmall_caller(ptr %dst, ptr %src) {
+; CHECK-LABEL: define void @objsize_toosmall_caller
+; CHECK-SAME: (ptr [[DST:%.*]], ptr [[SRC:%.*]]) {
+; CHECK-NEXT:    call void @objsize_toosmall_callee(ptr [[DST]], ptr [[SRC]], i64 4)
+; CHECK-NEXT:    ret void
+;
+  call void @objsize_toosmall_callee(ptr %dst, ptr %src, i64 4)
+  ret void
+}
+
+define void @intrinsics_callee(ptr %dst, ptr %src, i64 %size) {
+; CHECK-LABEL: define void @intrinsics_callee
+; CHECK-SAME: (ptr [[DST:%.*]], ptr [[SRC:%.*]], i64 [[SIZE:%.*]]) {
+; CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr [[DST]], ptr [[SRC]], i64 4, i1 false)
+; CHECK-NEXT:    call void @llvm.memmove.p0.p0.i64(ptr [[DST]], ptr [[SRC]], i64 4, i1 false)
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr [[DST]], i8 0, i64 4, i1 false)
+; CHECK-NEXT:    ret void
+;
+  call void @llvm.memcpy.p0.p0.i64(ptr %dst, ptr %src, i64 4, i1 false)
+  call void @llvm.memmove.p0.p0.i64(ptr %dst, ptr %src, i64 4, i1 false)
+  call void @llvm.memset.p0.i64(ptr %dst, i8 0, i64 4, i1 false)
+  ret void
+}
+
+define void @intrinsics_caller3(ptr %dst, ptr %src) {
+; CHECK-LABEL: define void @intrinsics_caller3
+; CHECK-SAME: (ptr [[DST:%.*]], ptr [[SRC:%.*]]) {
+; CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr [[DST]], ptr [[SRC]], i64 4, i1 false)
+; CHECK-NEXT:    call void @llvm.memmove.p0.p0.i64(ptr [[DST]], ptr [[SRC]], i64 4, i1 false)
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr [[DST]], i8 0, i64 4, i1 false)
+; CHECK-NEXT:    ret void
+;
+  call void @intrinsics_callee(ptr %dst, ptr %src, i64 3)
+  ret void
+}
+
+define void @intrinsics_caller4(ptr %dst, ptr %src) {
+; CHECK-LABEL: define void @intrinsics_caller4
+; CHECK-SAME: (ptr [[DST:%.*]], ptr [[SRC:%.*]]) {
+; CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr [[DST]], ptr [[SRC]], i64 4, i1 false)
+; CHECK-NEXT:    call void @llvm.memmove.p0.p0.i64(ptr [[DST]], ptr [[SRC]], i64 4, i1 false)
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr [[DST]], i8 0, i64 4, i1 false)
+; CHECK-NEXT:    ret void
+;
+  call void @intrinsics_callee(ptr %dst, ptr %src, i64 4)
+  ret void
+}
diff --git a/llvm/test/Transforms/PhaseOrdering/AArch64/memcpy-constant-size.ll b/llvm/test/Transforms/PhaseOrdering/AArch64/memcpy-constant-size.ll
new file mode 100644
index 00000000000000..10b07ad6e7491d
--- /dev/null
+++ b/llvm/test/Transforms/PhaseOrdering/AArch64/memcpy-constant-size.ll
@@ -0,0 +1,95 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
+; RUN: opt %s -mtriple=arm64-apple-macosx -passes='default<O3>' -inline-threshold=2 -inline-call-penalty=5 -S | FileCheck %s
+
+declare i64 @llvm.objectsize.i64.p0(ptr, i1, i1, i1)
+declare ptr @__memcpy_chk(ptr, ptr, i64, i64)
+declare ptr @__memmove_chk(ptr, ptr, i64, i64)
+declare ptr @__mempcpy_chk(ptr, ptr, i64, i64)
+declare ptr @__memset_chk(ptr, i32, i64, i64)
+
+define void @callee_memcpy(ptr %dst, ptr %src, i64 %size) {
+; CHECK-LABEL: define void @callee_memcpy
+; CHECK-SAME: (ptr [[DST:%.*]], ptr nocapture readonly [[SRC:%.*]], i64 [[SIZE:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:    tail call void @llvm.memcpy.p0.p0.i64(ptr align 1 [[DST]], ptr align 1 [[SRC]], i64 [[SIZE]], i1 false)
+; CHECK-NEXT:    ret void
+;
+  %objsize = call i64 @llvm.objectsize.i64.p0(ptr %dst, i1 false, i1 true, i1 false)
+  %call.memcpy = call ptr @__memcpy_chk(ptr %dst, ptr %src, i64 %size, i64 %objsize)
+  ret void
+}
+
+define void @callee_memmove(ptr %dst, ptr %src, i64 %size) {
+; CHECK-LABEL: define void @callee_memmove
+; CHECK-SAME: (ptr [[DST:%.*]], ptr [[SRC:%.*]], i64 [[SIZE:%.*]]) local_unnamed_addr #[[ATTR1:[0-9]+]] {
+; CHECK-NEXT:    tail call void @llvm.memmove.p0.p0.i64(ptr align 1 [[DST]], ptr align 1 [[SRC]], i64 [[SIZE]], i1 false)
+; CHECK-NEXT:    ret void
+;
+  %objsize = call i64 @llvm.objectsize.i64.p0(ptr %dst, i1 false, i1 true, i1 false)
+  %call.memmove = call ptr @__memmove_chk(ptr %dst, ptr %src, i64 %size, i64 %objsize)
+  ret void
+}
+
+define void @callee_mempcpy(ptr %dst, ptr %src, i64 %size) {
+; CHECK-LABEL: define void @callee_mempcpy
+; CHECK-SAME: (ptr [[DST:%.*]], ptr [[SRC:%.*]], i64 [[SIZE:%.*]]) local_unnamed_addr #[[ATTR1]] {
+; CHECK-NEXT:    tail call void @llvm.memcpy.p0.p0.i64(ptr align 1 [[DST]], ptr align 1 [[SRC]], i64 [[SIZE]], i1 false)
+; CHECK-NEXT:    ret void
+;
+  %objsize = call i64 @llvm.objectsize.i64.p0(ptr %dst, i1 false, i1 true, i1 false)
+  %call.mempcpy = call ptr @__mempcpy_chk(ptr %dst, ptr %src, i64 %size, i64 %objsize)
+  ret void
+}
+
+define void @callee_memset(ptr %dst, i64 %size) {
+; CHECK-LABEL: define void @callee_memset
+; CHECK-SAME: (ptr [[DST:%.*]], i64 [[SIZE:%.*]]) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:    tail call void @llvm.memset.p0.i64(ptr align 1 [[DST]], i8 0, i64 [[SIZE]], i1 false)
+; CHECK-NEXT:    ret void
+;
+  %objsize = call i64 @llvm.objectsize.i64.p0(ptr %dst, i1 false, i1 true, i1 false)
+  %call.mempcpy = call ptr @__memset_chk(ptr %dst, i32 0, i64 %size, i64 %objsize)
+  ret void
+}
+
+define void @caller_memcpy(ptr %dst, ptr %src) {
+; CHECK-LABEL: define void @caller_memcpy
+; CHECK-SAME: (ptr [[DST:%.*]], ptr nocapture readonly [[SRC:%.*]]) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = load i32, ptr [[SRC]], align 1
+; CHECK-NEXT:    store i32 [[TMP1]], ptr [[DST]], align 1
+; CHECK-NEXT:    ret void
+;
+  call void @callee_memcpy(ptr %dst, ptr %src, i64 4)
+  ret void
+}
+
+define void @caller_memmove(ptr %dst, ptr %src) {
+; CHECK-LABEL: define void @caller_memmove
+; CHE...
[truncated]

Comment on lines +2280 to +2283
  if (GetTLI) {
    auto TLI = GetTLI(*F);
    LibFunc LF;
    if (TLI.getLibFunc(*F, LF) && TLI.has(LF)) {
Contributor

Might be worth adding an early exit if GetTLI isn't available or the callee isn't a libfunc, to reduce the indent level in the function.

Contributor Author

In what way do you suggest exiting? The ultimate fallback at the end of the function is TTI, so neither a quick return true nor return false would be ideal here.

@@ -2260,6 +2271,48 @@ bool CallAnalyzer::simplifyCallSite(Function *F, CallBase &Call) {
return false;
}

bool CallAnalyzer::isLoweredToCall(Function *F, CallBase &Call) {
  // Calls to memcpy with a known constant size 1/2/4/8 should not incur a call
Contributor

Might be good to move the comment to the case in the switch that does the checking, in case other libfuncs are added to the switch in the future?

Comment on lines +2289 to +2292
        auto LenOp = dyn_cast_or_null<ConstantInt>(Call.getOperand(2));
        if (!LenOp)
          LenOp = dyn_cast_or_null<ConstantInt>(
              SimplifiedValues.lookup(Call.getOperand(2)));
Contributor
As a follow-up refactoring, it would be nice to introduce a helper that returns a constant if either the value itself is a constant or if it has an entry in SimplifiedValues that is constant. I think there are other places that could also use such a helper.
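
A minimal sketch of what such a helper could look like as a CallAnalyzer member (the name getSimplifiedConstantInt is hypothetical, not something introduced by this patch):

// Hypothetical helper, assumed to live on CallAnalyzer in InlineCost.cpp:
// return V as a ConstantInt, either because V itself is one or because the
// analyzer has already simplified it to one in the SimplifiedValues map.
ConstantInt *CallAnalyzer::getSimplifiedConstantInt(Value *V) {
  if (auto *C = dyn_cast<ConstantInt>(V))
    return C;
  return dyn_cast_or_null<ConstantInt>(SimplifiedValues.lookup(V));
}

The two lookups quoted above would then collapse to something like auto *LenOp = getSimplifiedConstantInt(Call.getOperand(2));.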

Comment on lines +2280 to +2283
  if (GetTLI) {
    auto TLI = GetTLI(*F);
    LibFunc LF;
    if (TLI.getLibFunc(*F, LF) && TLI.has(LF)) {
Contributor

Perhaps something like the suggestion below, which would need to duplicate the fallback:

Suggested change
-  if (GetTLI) {
-    auto TLI = GetTLI(*F);
-    LibFunc LF;
-    if (TLI.getLibFunc(*F, LF) && TLI.has(LF)) {
+  const TargetLibraryInfo *TLI = GetTLI ? &GetTLI(*F) : nullptr;
+  LibFunc LF;
+  if (!TLI || !TLI->getLibFunc(*F, LF) || !TLI->has(LF))
+    return TTI.isLoweredToCall(F);

          auto Len = LenOp->getLimitedValue();
          auto ObjSize = ObjSizeOp->getLimitedValue();
          if (ObjSize >= Len &&
              (Len == 1 || Len == 2 || Len == 4 || Len == 8)) {
Contributor

Do we need the check for these lengths? Even if only these are folded in the middle end, the back end will also lower fixed-length memcpys substantially larger than this to inline load+store expansions. It doesn't look like we add a penalty to large llvm.memcpy intrinsics, so we probably shouldn't be doing it here either, if we know we'll fold to llvm.memcpy.
