[CIR][CIRGen][Builtin][Neon] Lower neon_vabs_v and neon_vabsq_v #1081

Draft
wants to merge 2,033 commits into base: main

Conversation

@ghehg ghehg commented Nov 7, 2024

Now implement the same as OG CodeGen, which is to call the LLVM AArch64 intrinsic that eventually becomes an ARM64 instruction.
However, there is clearly an alternative, which is to extend CIR::AbsOp and CIR::FAbsOp to support vector types and only lower them at the LLVM lowering stage to either LLVM::FAbsOp or LLVM::AbsOp, provided the LLVM dialect can do the right target lowering by eventually translating them to the LLVM AArch64 intrinsic.

The question is whether it is worth doing it?

Anyway, putting up this diff for suggestions and ideas.
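For reference, a minimal C++ snippet that should exercise these two builtins when compiled for an AArch64 target (illustrative only; not part of this PR's tests):

```
#include <arm_neon.h>

// vabs_s16 goes through the 64-bit neon_vabs_v path and vabsq_s16 through
// the 128-bit neon_vabsq_v path when targeting AArch64.
int16x4_t abs_v4(int16x4_t v) { return vabs_s16(v); }
int16x8_t abs_v8(int16x8_t v) { return vabsq_s16(v); }
```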

bcardosolopes and others added 30 commits November 2, 2024 23:02
This is currently crashing on Linux internally but not on GitHub's CI;
disable it temporarily while we investigate.
This is a straightforward adaption of existing CodeGen logic. While I'm
here, move block comments inside their blocks, so that they look nicer.
…VM (llvm#810)

As title.
Add setAlignmentAttr for GlobalOps created from the AST. LLVM lowering
should have the LLVM GlobalOp's alignment attribute inherited from
CIR::GlobalOp.

This PR is definitely needed to fix issue
llvm#801 (comment), but
that issue's CIR ops don't have alignment and comdat attributes to begin
with, so I'll keep investigating and fix the CIR problem in another PR.
…y cast op (llvm#812)

There are two occurrences of `cir.cast(array_to_ptrdecay, ...)` that
drop address spaces unexpectedly for its result pointer type. This PR
fixes them with the source address space.

```mlir
// Before
%1 = cir.cast(array_to_ptrdecay, %0 : !cir.ptr<!cir.array<!s32i x 32>, addrspace(offload_local)>), !cir.ptr<!s32i>
// After
%1 = cir.cast(array_to_ptrdecay, %0 : !cir.ptr<!cir.array<!s32i x 32>, addrspace(offload_local)>), !cir.ptr<!s32i, addrspace(offload_local)>
```
As mentioned at
llvm#809 (comment), this
PR adds more simplification transformations for the select op (a
source-level sketch follows the list):

- `cir.select if %0 then x else x` -> `x`
- `cir.select if %0 then #true else #false` -> `%0`
- `cir.select if %0 then #false else #true` -> `cir.unary not %0`
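A rough source-level intuition for these folds, assuming the ternaries below end up represented as `cir.select` (e.g. via the later ternary-to-select transformation; function names are illustrative, not from the test suite):

```
// Each return maps onto one of the three folds listed above.
int  same(bool c, int x) { return c ? x : x; }          // -> x
bool keep(bool c)        { return c ? true : false; }   // -> c
bool flip(bool c)        { return c ? false : true; }   // -> cir.unary not
```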
…p ops (llvm#818)

This PR makes the simple lowering generate the result type lowering
logic and makes it suitable for unary fp2int operations and binary
fp2fp operations.
…ization (llvm#820)

This PR adds initial support for temporary materialization of local
temporaries with trivial cleanups.

Materialization of global temporaries and of local temporaries with
non-trivial cleanups is far trickier than I initially thought, so I
decided to submit this easy part first.
This is a straightforward adaption from CodeGen. I checked the uses of
the Delegating arg that's passed in various places, and it only appears
to be used by virtual inheritance, which should be handled by
llvm#624.
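A minimal sketch of the delegating-constructor shape this adaption covers (hypothetical example, not taken from the test suite):

```
struct Widget {
  int n;
  Widget(int v) : n(v) {}
  Widget() : Widget(42) {}  // delegating constructor: forwards to Widget(int)
};
```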
…on (llvm#813)

Currently some bitcasts silently change the address space of the
source pointer type, which hides some miscompilations of pointer types
in CIR.

llvm#812 is an example. The address space in the result pointer type is
dropped, but a later bitcast keeps the types consistent no matter what
the result type is. Such bitcasts are commonly emitted in CodeGen.

CIR bitcasts are lowered to LLVM bitcasts, which also don't allow a
mismatch between address spaces. This PR adds this verification.
We were previously printing the type alias for `struct S` as `!ty_22S22`
instead of just `!ty_S`. This was because our alias computation for a
StructType did the following:

    os << "ty_" << structType.getName()

`structType.getName()` is a `StringAttr`, and writing a `StringAttr` to
an output stream adds double quotes around the actual string [1][2].
These double quotes then get hex-encoded as 22 [3].

We can fix this by writing the actual string value instead. Aliases that
would end in a number will now receive a trailing underscore because of
MLIR's alias sanitization not allowing a trailing digit [4] (which
ironically didn't kick in even though our aliases were always previously
ending with a number, which might be a bug in the sanitization logic).
Aliases containing other special characters (e.g. `::`) will still be
escaped as before. In other words:

```
struct S {};
// before: !ty_22S22 = ...
// after:  !ty_S = ...

struct S1 {};
// before: !ty_22S122 = ...
// after:  !ty_S1_ = ...

struct std::string {};
// before: !ty_22std3A3Astring22
// after:  !ty_std3A3Astring
```

I'm not a big fan of the trailing underscore special-case, but I also
don't want to touch core MLIR logic, and I think the end result is still
nicer than before.

The tests were mechanically updated with the following command run
inside `clang/test/CIR`, and the same commands can be run to update the
tests for any in-flight patches. (These are for GNU sed; for macOS
change the `-i` to `-i ''`.)

    find . -type f | xargs sed -i -E -e 's/ty_22([A-Za-z0-9_$]+[0-9])22/ty_\1_/g' -e 's/ty_22([A-Za-z0-9_$]+)22/ty_\1/g'

clang/test/CIR/CodeGen/stmtexpr-init.c needed an additional minor fix to
swap the expected order of two type aliases in the CIR output.
clang/test/CIR/CodeGen/coro-task.cpp needed some surgery because it was
searching for `22` to find the end of a type alias; I changed it to
search for the actual alias instead.

If you run into merge conflicts after this change, you can auto-resolve
them via

smeenai@715f061

[1]
https://github.com/llvm/llvm-project/blob/b3d2d5039b9b8aa10a86c593387f200b15c02aef/mlir/lib/IR/AsmPrinter.cpp#L2295
[2]
https://github.com/llvm/llvm-project/blob/b3d2d5039b9b8aa10a86c593387f200b15c02aef/mlir/lib/IR/AsmPrinter.cpp#L2763
[3]
https://github.com/llvm/llvm-project/blob/b3d2d5039b9b8aa10a86c593387f200b15c02aef/mlir/lib/IR/AsmPrinter.cpp#L1014
[4]
https://github.com/llvm/llvm-project/blob/b3d2d5039b9b8aa10a86c593387f200b15c02aef/mlir/lib/IR/AsmPrinter.cpp#L1154
This PR fixes the lowering for BrCond. 

Consider the following code snippet: 
```
#include <stdbool.h>

bool test() {
  bool x = false;
  if (x) 
    return x;
  return x;
} 
```
Emitting the CIR to `tmp.cir` using `-fclangir-mem2reg` produces the
following CIR (truncated):
```
!s32i = !cir.int<s, 32>
#fn_attr = #cir<extra({inline = #cir.inline<no>, nothrow = #cir.nothrow, optnone = #cir.optnone})>
module { cir.func no_proto  @test() -> !cir.bool extra(#fn_attr) {
    %0 = cir.const #cir.int<0> : !s32i 
    %1 = cir.cast(int_to_bool, %0 : !s32i), !cir.bool 
    cir.br ^bb1 
  ^bb1:  // pred: ^bb0
    cir.brcond %1 ^bb2, ^bb3 
  ^bb2:  // pred: ^bb1
    cir.return %1 : !cir.bool 
  ^bb3:  // pred: ^bb1
    cir.br ^bb4 
  ^bb4:  // pred: ^bb3
    cir.return %1 : !cir.bool 
  } 
}
```
Lowering the CIR to LLVM using `cir-opt tmp.cir -cir-to-llvm` fails
with:
```
tmp.cir:5:10: error: failed to legalize operation 'llvm.zext' marked as erased
```

The CIR cast `%1 = cir.cast(int_to_bool, %0 : !s32i)` is lowered to a
CIR comparison with zero, which is then lowered to an `LLVM::ICmpOp` and
`LLVM::ZExtOp`.

In the BrCond lowering, the zext is deleted when `zext->use_empty()`,
but at this point the lowering of the CIR above is not complete yet,
and the zext will still gain uses later.

The current check for when the zext is deleted is error-prone and can be
improved.

To fix this, in addition to checking that the zext has no uses, we now
also check that the operation defining the BrCond's condition in CIR
(the cast operation in this case) is used exactly once.
We haven't been able to find the root cause of
llvm#829 just yet; the problem also does not
show up under an ASan-ified build.

Add some extra information before we crash; hopefully that sheds some
light.
…m#822)

Allow the clang driver to use lowering from CIR to the MLIR standard
dialects.
Update the test to match the real output when
`-fno-clangir-direct-lowering` is used, or with a combination of both
`-fclangir-direct-lowering` and `-fno-clangir-direct-lowering`.

---------

Co-authored-by: Bruno Cardoso Lopes <[email protected]>
Co-authored-by: Shoaib Meenai <[email protected]>
We can now get the cleanup right for other potentially throwing ctors;
LLVM lowering support is still missing.
This PR adds a new transformation that transforms suitable ternary
operations into select operations.

Currently the "suitable" ternary operations are those whose branches
each satisfy one of the following criteria:

- The branch only contains a single `cir.yield` operation;
- The branch contains a `cir.const` followed by a `cir.yield` that
yields the constant value produced by the `cir.const`.
- ~~The branch contains a `cir.load` followed by a `cir.yield` that
yields the value loaded by the `cir.load`. The load operation cannot be
volatile and must load from an alloca.~~

These criteria are hardcoded for now so that simple C/C++ ternary
expressions can eventually be lowered to a `cir.select` operation
instead.
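For intuition, a C++ ternary whose two arms are single constants, so both branches meet the `cir.const` + `cir.yield` criterion above (illustrative only):

```
// Both arms are plain constants, so the cir.ternary produced here is a
// candidate for the ternary-to-select transformation.
int pick(bool c) { return c ? 10 : 20; }
```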
This is permitted by the language, and IRGen emits traps for destructors
other than the base object destructor. Make CIRGen follow suit.
)

The first patch to fix llvm#803. This PR adds the calling convention
attribute to CallOp directly, which is similar to LLVM, rather than
adding the information to the function type, which would mimic the
Clang AST function type.

The syntax of it in CIR assembly is between the function type and extra
attributes, as follows:

```mlir
%1 = cir.call %fnptr(%a) : (!fnptr, !s32i) -> !s32i cc(spir_kernel) extra(#fn_attr)
```

The verification of direct calls is not included. It will be included in
the next patch extending CIRGen & Lowering.

---

For every builder method of CallOp, an optional parameter `callingConv`
is inserted right before the extra-attributes parameter. However,
apart from the parser / printer, this PR does not introduce any
functional changes.
FlattenCFG will soon get the necessary support for lowering to LLVM,
this is CIRGen only for now.
…`constructAttributeList` (llvm#831)

Similar to llvm#830, this PR completes the `setCIRFunctionAttributes` part
with a call to the `constructAttributeList` method, so that the func op
and the call op share the logic for handling these kinds of attributes,
which is the design of OG CodeGen.

It also includes other refactors. The function `constructAttributeList`
now uses `mlir::NamedAttrList &` rather than the immutable
`mlir::DictionaryAttr &` as the in-out result parameter, which benefits
the additive merging of attributes.
…ith OG (llvm#830)

Previously the body of `setExtraAttributesForFunc` corresponded to
`SetLLVMFunctionAttributesForDefinition`, but its call site did not
reside at the right position. This PR renames it and adjusts the calls
to it to follow OG CodeGen.

To be specific, `setExtraAttributesForFunc` was called right after the
initialization of `FuncOp`. But in OG CodeGen, the list of attributes is
constructed by several more functions: `SetLLVMFunctionAttributes` and
`SetLLVMFunctionAttributesForDefinition`.

This results in a diff in the attributes of function declarations, which
is reflected by the changes to the test files. Apart from them, there is
no functional change. In other words, the two code paths calling
`setCIRFunctionAttributesForDefinition` are tested by existing tests:

* Caller `buildGlobalFunctionDefinition`: tested by
`CIR/CodeGen/function-attrs.cpp`, ...
* Caller `codegenCXXStructor`: tested by
`CIR/CodeGen/delegating-ctor.cpp`, `defined-pure-virtual-func.cpp`, ...
The parser was looking for extra(...) before the return type while the
pretty-printer put it after the return type.
This was breaking the LSP server, for example.
Change the parser behavior accordingly.
…#836)

This PR implements the CIRGen and Lowering part of calling convention
attribute of `cir.call`-like operations. Here we have **4 kinds of
operations**: (direct or indirect) x (`call` or `try_call`).

Based on our needs and the feasibility of constructing test cases, this
PR includes:

* For CIRGen, only direct `call`. So far, the only extra calling
conventions are the SPIR ones, which cannot be set manually from source
code using attributes. Meanwhile, OpenCL C *does not allow* function
pointers or exceptions, therefore the only remaining case is the direct
call.
* For Lowering, direct and indirect `call`, but not any `try_call`.
Although it's possible to write all 4 kinds of calls with a calling
convention in ClangIR assembly, exceptions are quite hard to write and
read. I'd prefer a source-code-level test for this when it's available
in the future, for example C++ `thiscall` with exceptions.
* Extra: the verification of calling convention consistency for direct
`call` and direct `try_call`.

All unsupported cases are guarded by assertions or MLIR diags.
Consider the following code snippet `test.c`: 

```
int test(int x) {
  static int arr[10] = {0, 1, 0, 0};
  return arr[x];
}
```

When lowering from CIR to LLVM using `bin/clang test.c -Xclang -fclangir
-Xclang -emit-llvm -S -o -`, it produces:
```
clangir/mlir/lib/IR/BuiltinAttributes.cpp:1015: static mlir::DenseElementsAttr mlir::DenseElementsAttr::get(mlir::ShapedType, llvm::ArrayRef<llvm::APInt>): Assertion `hasSameElementsOrSplat(type, values)' failed.
```

I traced the bug back to `Lowering/LoweringHelpers.cpp` where we fill
trailing zeros, and I believe this PR does it the right way. I have also
added a very simple test for verification.
The main purpose of this PR is to add support for the C/C++ attribute
`annotate`. The PR involves both CIR generation and LoweringPrepare. In
the rest of this description, we first introduce the concept of the
annotate attribute, then discuss what LLVM expects for annotations, then
describe how ClangIR handles them in this PR. Finally, we list trivial
differences between the LLVM code generated by clang CodeGen and by
ClangIR CodeGen.

**The concept of the annotate attribute, and the expected LLVM IR**
The following is a C example of annotations, say in example.c:
`int *b __attribute__((annotate("withargs", "21", 12 )));`
`int *a __attribute__((annotate("oneargs", "21", )));`
`int *c __attribute__((annotate("noargs")));`
here "withargs" is the annotation string, "21" and 12 are arguments for
this annotation named "withargs". LLVM-based compiler is expected keep
these information and build a global variable capturing all annotations
used in the translation unit when emitting into LLVM IR. This global
variable itself is **not** constant, but will be initialized with
constants that are related to annotation representation, e.g. "withargs"
should be literal string variable in IR.

This global variable has a fixed name "llvm.global.annotations", and its
of array of struct type, and should be initialized with a const array of
const structs, each const struct is a representation of an annotation
site, which has 5-field.
[ptr to global var/func annotated, ptr to translation unit string const,
line_no, annotation_name, ptr to arguments const]
annotation name string and args constants, as well as this global var
should be in section "llvm.metadata".
E.g., for the above example, we would have the following in the
generated LLVM IR:
```
@b = global ptr null, align 8
@.str = private unnamed_addr constant [9 x i8] c"withargs\00", section "llvm.metadata"
@.str.1 = private unnamed_addr constant [10 x i8] c"example.c\00", section "llvm.metadata"
@.str.2 = private unnamed_addr constant [3 x i8] c"21\00", align 1
@.args = private unnamed_addr constant { ptr, i32 } { ptr @.str.2, i32 12 }, section "llvm.metadata"
@A = global ptr null, align 8
@.str.3 = private unnamed_addr constant [8 x i8] c"oneargs\00", section "llvm.metadata"
@.args.4 = private unnamed_addr constant { ptr } { ptr @.str.2 }, section "llvm.metadata"
@c = global ptr null, align 8
@.str.5 = private unnamed_addr constant [7 x i8] c"noargs\00", section "llvm.metadata"
@llvm.global.annotations = appending global [3 x { ptr, ptr, ptr, i32, ptr }] [{ ptr, ptr, ptr, i32, ptr } { ptr @b, ptr @.str, ptr @.str.1, i32 1, ptr @.args }, { ptr, ptr, ptr, i32, ptr } { ptr @A, ptr @.str.3, ptr @.str.1, i32 2, ptr @.args.4 }, { ptr, ptr, ptr, i32, ptr } { ptr @c, ptr @.str.5, ptr @.str.1, i32 3, ptr null }], section "llvm.metadata"
```
Notice that since variable c's annotation has no arguments, the last
field of its corresponding annotation entry is a null pointer.

**ClangIR's handling of annotations**
In CIR, we introduce an AnnotationAttr on GlobalOp and FuncOp to record
their annotations. That way, we can make fast queries about annotations
if a future CIR pass is interested in them.
We leave the work of generating the constant variables as well as the
global annotations variable to LLVM lowering, but at LoweringPrepare we
collect all annotations and create a module attribute
"cir.global_annotations" to facilitate LLVM lowering.
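As a hedged illustration (not from this PR's tests), the attribute also applies to functions, which is why FuncOp carries the annotation attribute as well:

```
// A function-level annotation with one string argument and one integer
// argument; CIR would record this on the FuncOp.
__attribute__((annotate("hot_path", "tier", 1)))
int fast_add(int a, int b) { return a + b; }
```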

**Some implementation details and trivial differences between clangir
generated LLVM code and vanilla LLVM code**
1. I suffix the names of constants generated for annotation purposes
with ".annotation" to avoid redefinition; clang CodeGen doesn't do this.
2. clang CodeGen seems to visit FuncDecls in a slightly different order
than CIR, so sometimes the order of the elements in the initializer
array of the llvm.global.annotations variable differs from the
clang-generated LLVM IR. This should be trivial, as I don't expect
consumers of this variable to assume a fixed order of annotations.

Otherwise, clang CodeGen and ClangIR generate pretty much the same LLVM
IR for annotations!
Now that the basics are working, start attaching cleanups to cir.call's
instead. This is necessary in order to tie the pieces (landing pads and
cleanups) together more properly, allowing multiple calls inside a
cir.try op to be connected with the right cleanup.

This is the first piece of a series; tests are coming next.
ghehg and others added 11 commits November 6, 2024 10:35
This should fix NYIs like `BI__builtin_char_memchr NYI UNREACHABLE
executed at clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp:1402`.
The test is from
[OG](https://github.com/llvm/clangir/blob/3ef67c19917ad26ed8b19d4d13a43458a952fddb/clang/test/CodeGenCXX/builtins.cpp#L64).

See the builtin's prototype:
[char *__builtin_char_memchr(const char *haystack, int needle, size_t
size);](https://clang.llvm.org/docs/LanguageExtensions.html#string-builtins)
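A minimal use of the builtin matching that prototype (illustrative; not the actual test case):

```
#include <cstddef>

// Returns a pointer to the first '.' within the first n bytes of s,
// or a null pointer if none is found.
const char *find_dot(const char *s, std::size_t n) {
  return __builtin_char_memchr(s, '.', n);
}
```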
fix llvm#1057

---------

Co-authored-by: Bruno Cardoso Lopes <[email protected]>
Co-authored-by: Sirui Mu <[email protected]>
…memory (llvm#1059)

This PR covers one more case for return values of struct type, where
`memcpy` is emitted.
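A hedged sketch of the kind of struct-returning code involved (the exact case the PR covers isn't spelled out here; names and sizes are illustrative):

```
// A by-value aggregate return that is not returned in registers.
struct Big { int data[16]; };
Big make_big();
Big forward_big() { Big b = make_big(); return b; }
```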
This PR adds LLVMIR lowering support for `cir.assume`,
`cir.assume.aligned`, and `cir.assume.separate_storage`.
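Source-level builtins that give rise to these three ops (illustrative; `__builtin_assume_separate_storage` is assumed here to be the source of `cir.assume.separate_storage`):

```
// Each builtin below corresponds to one of the lowered CIR ops.
int load16(int *p, int *q) {
  __builtin_assume(p != q);                           // cir.assume
  int *a = (int *)__builtin_assume_aligned(p, 16);    // cir.assume.aligned
  __builtin_assume_separate_storage(p, q);            // cir.assume.separate_storage
  return *a;
}
```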
In OG CodeGen, string literals have `private` linkage by default
(marked by `cir_private` in CIR assembly), but CIR uses `internal`,
which is probably an ancient typo.

This PR aligns with OG CodeGen and thus modifies the existing test
files.
…bits (llvm#1068)

This PR adds partial support for so-called indirect function arguments
for struct types with size > 128 bits on aarch64.

#### A couple of words about the implementation
The hard part is that it's not a one-to-one copy from the original
codegen, though the code is of course inspired by it.
In the original codegen, not much work is done for the indirect
arguments inside the loop in `EmitFunctionProlog`; an additional alloca
is added at the end, in the call to the `EmitParamDecl` function.

In our case, for the indirect argument (which is a pointer) we replace
the original alloca with a new one and store the pointer in there, and
we replace all uses of the old alloca with a load from the new one,
i.e. in both cases users work with the pointer to a structure.

Also, I added several missing features in `constructAttributeList` for
indirect arguments, but didn't preserve the original code structure, so
let me know if I need to do that.
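A sketch of an argument that falls into this category, i.e. a by-value aggregate wider than 128 bits, which the AArch64 ABI passes indirectly (illustrative):

```
// 4 x 64-bit = 256 bits on LP64, so this argument is passed via a
// pointer to a temporary rather than in registers.
struct Big { long v[4]; };
long sum(Big b) { return b.v[0] + b.v[1] + b.v[2] + b.v[3]; }
```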
Note that we lack two pieces of support for aliases in LLVM IR dialect
globals: the `alias` keyword and function types like `void (ptr)`. These
need to be done before we can nail this for good, but they're outside
the scope of this commit.

The behavior is slightly different under -O1, which will be addressed
next.
…m#1073)

This PR changes the naming format of string literals from `.str1` to
`.str.1`, making it easier to reuse the existing test cases of OG
CodeGen.
@ghehg ghehg marked this pull request as ready for review November 7, 2024 19:40
smeenai and others added 6 commits November 7, 2024 13:35
It's currently polluting the `cir` namespace with very generic symbols
like `Integer` and `Memory`, which is pretty confusing. `X86_64ABIInfo`
already has a `Class` alias for `X86ArgClass`, so we can use that alias
to qualify all uses.
Note that there are still missing pieces, which will be incrementally
addressed.
Just verified this is actually done by some LLVM optimization, not by
the frontend emitting it directly, so this is a non-goal now, since CIR
can also use LLVM opts to do the same once we have real global alias
support.
Also verified this does not apply anymore; we match -O0. The only
remaining part is to lower to proper LLVM globals once the LLVM IR
dialect gets global alias support.
@bcardosolopes

> The question is whether it is worth doing it?

It's always better to unify and/or map target-specific things to
something generic (easier for optimizers to understand later). As the
person who has been working on this for some time, what's your take?
When using C++ source that leads to CIR::AbsOp and CIR::FAbsOp with
vectors against OG, does the LLVM output get llvm.aarch64.neon.abs.v4i16
and llvm.aarch64.neon.abs.v8i16 if the vector size matches?

@ghehg

ghehg commented Nov 10, 2024

> The question is whether it is worth doing it?
>
> It's always better to unify and/or map target-specific things to
> something generic (easier for optimizers to understand later). As the
> person who has been working on this for some time, what's your take?

I'm thinking about the same thing (just using CIR::AbsOp and CIR::FAbsOp
and only lowering to intrinsics later).

> When using C++ source that leads to CIR::AbsOp and CIR::FAbsOp with
> vectors against OG, does the LLVM output get
> llvm.aarch64.neon.abs.v4i16 and llvm.aarch64.neon.abs.v8i16 if the
> vector size matches?

Unfortunately, I just did the experiment: using LLVM::AbsOp gives us
LLVM IR like
`%3 = call <4 x i16> @llvm.abs.v4i16(<4 x i16> %0, i1 false)`,
not really using NEON-specific intrinsics, even for the triple
"aarch64-none-linux-android24".

But it might be OK: we could just do our own smart target lowering at
LoweringPrepare and lower vector-typed CIR::AbsOp and CIR::FAbsOp to
CIR::LLVMIntrinsicCallOp if the target supports NEON, so we can take
advantage of the hardware NEON features.

@ghehg

ghehg commented Nov 10, 2024

Very interesting.
[Traditional clang codegen seems to only want to use NEON abs for the
NEON abs builtin](https://godbolt.org/z/v8Ysxdrd1).
But the same exact hardware instruction `abs v0.4s, v0.4s` is generated
for both the generic and NEON intrinsics.

@ghehg

ghehg commented Nov 11, 2024

My current plan is to extend AbsOp to take vector types (PR).
Next, extend FAbsOp along with the other FpUnaryOps to support vector
types.
Then we revisit this PR to decide whether we just use AbsOp and FAbsOp
and lower them to the generic llvm.abs/llvm.fabs, which would give us a
chance to optimize them in LLVM passes, or whether we lower them (either
at CodeGen or at a lowering stage) to llvm.aarch64.neon.abs, which would
not be optimized away in the generated assembly/machine code under -O0.

@bcardosolopes

> But the same exact hardware instruction `abs v0.4s, v0.4s` is
> generated for both the generic and NEON intrinsics

Nice - as long as the final ASM is the same, it feels like we're good to
map all those things to the same higher-level CIR operation.

> Then we revisit this PR to decide whether we just use AbsOp and FAbsOp

Sounds good, thanks

@ghehg

ghehg commented Nov 22, 2024

Back to draft; waiting for other PRs to land.
