-
Notifications
You must be signed in to change notification settings - Fork 690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mlx5: fix a use-after-free error in mlx5_next_poll #1526
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1070,6 +1070,8 @@ static inline void _mlx5_end_poll(struct ibv_cq_ex *ibcq, | |
if (lock) | ||
mlx5_spin_unlock(&cq->lock); | ||
|
||
cq->cur_rsc = NULL; | ||
|
||
if (stall) { | ||
if (stall == POLLING_MODE_STALL_ADAPTIVE) { | ||
if (!(cq->flags & MLX5_CQ_FLAGS_FOUND_CQES)) { | ||
|
@@ -1822,6 +1824,14 @@ void __mlx5_cq_clean(struct mlx5_cq *cq, uint32_t rsn, struct mlx5_srq *srq) | |
if (!cq || cq->flags & MLX5_CQ_FLAGS_DV_OWNED) | ||
return; | ||
|
||
/* | ||
* Reset the cq->cur_rsc if it is associated with the QP to be | ||
* destroyed in order to prevent use-after-free errors in the | ||
* next ibv_next_poll(). | ||
*/ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is the scenario that you are referring to here ? a CQ which servers more than a single QP ? In addition, if the 'lock' mode was used, this code will run only after mlx5_end_poll(), so the next mlx5_start_poll() should do the work by setting the pointer to NULL upon its entrance. Note: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yeah in this case multiple QPs are attached to the same CQ. ibv_start_poll(); // get a work completion associated with the QP A
ibv_destroy_qp(); // destroy QP A here since we get a completion error
ibv_next_poll(); // try to get the next work completion for other QPs from the same CQ. UAF error is triggered here
Do you mean that destroying a QP between There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's permitted but doesn't look like a good practice, a few notes below. You are referring only to a single-threaded application, correct? In a multi-threaded application, the call to ibv_destroy_qp() will remain blocked until ibv_end_poll() is invoked, ensuring that ibv_next_poll() is safe to use. Additionally, we are only discussing a scenario where the CQ serves multiple QPs. Otherwise, it would not make sense to destroy a QP and continue polling its CQ, as this would clearly indicate incorrect application behavior. The commit log should be rephrased to clarify the exact use case that we are talking about. So, in the specific scenario that you are talking about, it can make sense to set the NULL upon __mlx5_cq_clean() while narrowing the comment near this line to be more specific as was mentioned above. In any case, no need for a change in mlx5_end_poll() as I already mentioned. |
||
if (unlikely(cq->cur_rsc && rsn == cq->cur_rsc->rsn)) | ||
cq->cur_rsc = NULL; | ||
|
||
/* | ||
* First we need to find the current producer index, so we | ||
* know where to start cleaning from. It doesn't matter if HW | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have that code (i.e. cq->cur_rsc = NULL) as part of mlx5_start_poll(), so it seems redundant.
Note:
In any case, it had to be done before line 1070 to be done under the lock, if the lock was applicable.