-
Notifications
You must be signed in to change notification settings - Fork 410
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix sql failed when using replace function #9524
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: “EricZequan” <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/cc @breezewish |
@@ -1063,6 +1063,166 @@ struct ReplaceStringImpl | |||
} | |||
} | |||
|
|||
// Handle the case where `column_src` and `replace` are const | |||
static void vectorConstReplacement( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we modify existing functions instead of introducing new ones? Existing functions are already capable of handling a list of sources. It should be better to allow it handling constant source (i.e. one source), minimizing the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these functions are necessary because the first parameter of existing functions is Column
by default, which will be parsed into ColumnString
in FunctionsStringSearch.cpp
, and our implementation needs to parse the parameter into ColumnConst
. ColumnConst
does not have getChars
and getOffsets
methods, so existing functions cannot be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't make sense. Think about it. A function can now calculate a bunch of FN(A, B), why cannot calculate a single row of FN(ConstA, ConstB)? What we want to support is a subset of the current capability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I found a way to convert between the two types, so that now the existing function can be used directly, and the amount of code is greatly reduced.
if (!needle_const && replacement_const) | ||
{ | ||
executeImplConstReplacement( | ||
column_src, | ||
column_needle, | ||
column_replacement, | ||
pos, | ||
occ, | ||
match_type, | ||
column_result | ||
); | ||
}else if (!needle_const && !replacement_const) | ||
{ | ||
executeImplConstFirstParaReplacement( | ||
column_src, | ||
column_needle, | ||
column_replacement, | ||
pos, | ||
occ, | ||
match_type, | ||
column_result | ||
); | ||
}else | ||
{ | ||
throw Exception( | ||
"UnImplement function.", | ||
ErrorCodes::BAD_ARGUMENTS); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly no need for a lot of impls because we are fixing for an edge case that only discovered in tests. It is acceptable to be not performance optimal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my test case, some sql like :
explain SELECT /*+ read_from_storage(tiflash[my_table]) */ REPLACE('Hello World', my_table.col_11, my_table.col_14) FROM my_table;
will get same error. I think it may need to process, so I add these function to deal with such problems. 😂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I mean we could use a slower but more general impl for such cases.
Signed-off-by: “EricZequan” <[email protected]>
Signed-off-by: “EricZequan” <[email protected]>
Signed-off-by: “EricZequan” <[email protected]>
Signed-off-by: “EricZequan” <[email protected]>
Signed-off-by: “EricZequan” <[email protected]>
Signed-off-by: “EricZequan” <[email protected]>
Signed-off-by: “EricZequan” <[email protected]>
/retest |
1 similar comment
/retest |
Signed-off-by: “EricZequan” <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to add a const data version of vectorNonConstNeedle
, vectorNonConstReplacement
, and vectorNonConstNeedleReplacement
.
Signed-off-by: “EricZequan” <[email protected]>
The processing logic of the by the way, I have tried this method in the |
const auto & const_data = col_const->getDataColumn(); | ||
const auto * col = typeid_cast<const ColumnString *>(&const_data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's wrong. The length of this column is just 1. You will find some errors if you correctly add a test for this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right. Now I add vectorConstSrcAndReplace
、vectorConstSrcAndNeedle
、vectorConstSrc
to handle the corresponding case to ensure the same number of rows.
Signed-off-by: “EricZequan” <[email protected]>
const ColumnString::Chars_t & data, | ||
const ColumnString::Offsets & offsets, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the data is const, how about using const std::string &
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. PTAL~
Signed-off-by: “EricZequan” <[email protected]>
I'm still confused by why do we need almost 300 lines in order to make something work for a subset? It is wired. |
Signed-off-by: “EricZequan” <[email protected]>
I understand what you mean. Constant can actually be seen as a cloumn with only one line. It is reasonable to reuse the original function as a subset. |
Signed-off-by: “EricZequan” <[email protected]>
toVec({"Good Night", "Bad Afternoon", "Good Afterwhile"}), | ||
executeFunction( | ||
"replaceAll", | ||
toVec({"Good Afternoon"}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be toConst
? I suppose only const values could have different lengths (1) compared as other columns. And BTW this is exactly the case what we want to make it run correctly.
toVec({"Good Night", "Good Bad", "Good while"}), | ||
executeFunction( | ||
"replaceAll", | ||
toVec({"Good Afternoon"}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be toConst
?
toVec({"Good Night", "Night Afternoon", "Good AfterNight"}), | ||
executeFunction( | ||
"replaceAll", | ||
toVec({"Good Afternoon"}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be toConst
?
@EricZequan: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
auto data_col = ColumnString::create(); | ||
data_col->insert(data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't do this, this greatly reduce the performance.
What I mean isn't to say that separate functions should be used, you can use templates to reduce repetitive code, and of course, also maintain good performance. |
I created a new PR here: #9536 |
What problem does this PR solve?
Issue Number: ref #9522
Problem Summary:
What is changed and how it works?
Tiflash only supports the replace function with the first parameter being of ColumnString type.
So I added type judgment to
col_src
in the replace execution function and converted col_src into theColumnString
type that can be parsed correctly.At the same time, I added the following functions to handle the corresponding Const examples:
vectorConstSrcAndReplace
- handlereplace(Const, Column, Const)
vectorConstSrcAndNeedle
- handlereplace(Const, Const, Column)
vectorConstSrc
- handlereplace(Const, Column, Column)
The following are the test results of the relevant sql:
other case:
Check List
Tests
Side effects
Documentation
Release note