Send email if reproducible built fails in the CI #7897

ShahanaFarooqui · 2024-12-03T04:03:30Z

Changelog-None.

ShahanaFarooqui · 2024-12-03T04:17:48Z

Hi @s373nZ, please review this PR, which adds an email notification when the CI fails during any reproducible build step.

I have also updated the folder location for this script from /release to /repro just to avoid confusion with the release.yml/build-release.sh scripts.

I am also considering merging repro.yml with release.yml in the future, because repro.yml serves as a pre-stage for release.yml. I would appreciate your thoughts on that too.

s373nZ

I generally receive an automatic email directly from GitHub whenever a CI run I triggered fails, so I was a little curious about why custom emails are needed and did some digging. My guess is that the team is:

- drowning in CI failure emails, most of which are ignored or filtered
- unclear about who receives the notifications for scheduled workflows like the nightly repro builds

I found this documentation on workflow runs which states:

> Notifications for scheduled workflows are sent to the user who initially created the workflow. If a different user updates the cron syntax in the workflow file, subsequent notifications will be sent to that user instead. If a scheduled workflow is disabled and then re-enabled, notifications will be sent to the user who re-enabled the workflow rather than the user who last modified the cron syntax.

Disabling and re-enabling the scheduled job (if you have permissions) or committing a change to the cron syntax could shift the automatic notifications to you. These rules seem a little flaky, so I understand wanting a solid solution captured in code, which would also help others in the community. Maybe Slack notifications could be an interesting alternative?

That said, your current approach looks pretty good to me. I would consider trying to consolidate the three different action-send-mail steps into one by making it a completely separate job, something like:

jobs:
  ubuntu: [...]

  failure-notify:
    needs: ubuntu
    if: failure()
    runs-on: ubuntu-latest
    steps:
      - uses: dawidd6/action-send-mail@v3
        ...

It might not work, esp. with the matrix build, but it could be worth a shot. Inspiration here.

> I have also updated the folder location for this script from /release to /repro just to avoid confusion with the release.yml/build-release.sh scripts.

Good idea! My first thought was to suggest changing cl-repro to cl-release in release.yml as well, but we have a logical dependency in build-release.sh.

> I am also considering merging repro.yml with release.yml in the future, because repro.yml serves as a pre-stage for release.yml. I would appreciate your thoughts on that too.

We could try to reuse the repro.yml steps using a reusable workflow or a composite action. I considered trying to do this at the outset of the release automation work, but I think it belongs in a separate PR.

That is, unless you are suggesting doing away with the nightly builds in favor of detecting dirty builds only during the release process. I think there is value in early nightly detection so the release captain isn't tasked with too much triage at the last minute, but merging the two workflows is reasonable and should be possible.

Overall, LGTM pending your feedback re: the email step consolidation and the SMTP config.

ShahanaFarooqui · 2024-12-03T21:51:53Z

@s373nZ

> That said, your current approach looks pretty solid to me.

Yes, the goal is to send a customised email so that it stands out from the others and is not overlooked.

> I would suggest consolidating the three different action-send-mail steps into one, by making it a separate job.

Thanks for pushing me to avoid being lazy 😄! I was not happy about repeating the step, but wanted to capture the details of the failed step as well. It took a little time, but the email and workflow are much cleaner now. I ended up merging them into a single step at the end.

> We could try reusing the repro.yml steps by utilizing a reusable workflow or a composite action.

I would prefer to keep everything in one workflow. I plan to run the repro step on a scheduled basis, while the other steps (including the repro) will execute when a tag is pushed.

s373nZ · 2024-12-03T21:56:16Z

@ShahanaFarooqui It just occurred to me that this line (in all the Ubuntu Dockerfiles) might cause a problem with changing the folder location to /repro:

lightning/contrib/reprobuild/Dockerfile.noble

Line 74 in 2791c60

&& cp *.xz /repo/release/

Since the files are reused in both the repro build process and the release process, it might cause an error.

ShahanaFarooqui · 2024-12-03T22:27:56Z

> It just occurred to me that this line (in all the Ubuntu Dockerfiles) might cause a problem with changing the folder location to /repro:

Should this be an issue, considering that the Dockerfile.noble is used exclusively to build the cl-repro-noble image, and the next step only uses this newly created image? Isn't the repro folder mainly responsible for creating the version.txt and git.log files and changing user permissions?

s373nZ · 2024-12-03T23:13:18Z

> Should this be an issue, considering that the Dockerfile.noble is used exclusively to build the cl-repro-noble image, and the next step only uses this newly created image? Isn't the repro folder mainly responsible for creating the version.txt and git.log files and changing user permissions?

The CMD instruction in the Dockerfile sets the default command that is executed when you docker run the built image, so it does get run, and it actually builds the release, here:

lightning/.github/workflows/repro.yml

Line 40 in 2791c60

docker run --name cl-build -v $GITHUB_WORKSPACE:/repo -e FORCE_MTIME=$(date +%F) -t cl-repro-${{ matrix.version }}

That line is copying the repro build archive to the ./release directory, and the rest of the repro.yml steps look there to parse the filename here:

lightning/.github/workflows/repro.yml

Line 62 in 2791c60

releasefile=$(ls release/clightning-*)

Also, IIRC I needed to create the ./release directory in the CI because it doesn't exist after a fresh checkout.
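The directory and filename handling described above can be sketched as follows. This is only an illustration under stated assumptions: the temporary directory stands in for the checkout, and the archive name is a made-up placeholder, not a real release file. It creates release/ the way a fresh checkout would need, then resolves the archive with a glob and fails loudly unless exactly one file matches.

```shell
set -eu

# Assumption: a temporary directory stands in for $GITHUB_WORKSPACE.
workdir="$(mktemp -d)"
cd "$workdir"

# A fresh checkout has no release directory, so create it up front.
mkdir -p release

# Placeholder archive (hypothetical name); in CI the docker run step would
# copy the real build archive here.
touch release/clightning-v0.0.0-example.tar.xz

# Expand the glob into the positional parameters and insist on one match.
set -- release/clightning-*
if [ ! -e "$1" ]; then
  echo "no release archive found in release/" >&2
  exit 1
fi
if [ "$#" -ne 1 ]; then
  echo "expected exactly one release archive, found $#" >&2
  exit 1
fi
releasefile="$1"
echo "Found ${releasefile}"
```

Compared with parsing the output of ls, the glob check makes a missing or ambiguous archive an explicit error rather than an empty or multi-line variable.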

One initial idea to get around this is to add an ARG to the Dockerfile that defaults to release, and pass in repro in this case. Probably the thing to do is run the action in a test branch, with the workflow set to trigger on on.push.branches: <test-branch-name>, just to be sure.

Hope this makes sense. LMK if it doesn't, or I'm missing something. I can help or chime back in tomorrow.

ShahanaFarooqui · 2024-12-04T00:53:31Z

The action is still successfully completing with the /repro directory. However, I have reverted the change and switched back to using the /release folder for now. This will not be a concern once we merge both workflows anyways.

I also added the step Upload release artifact for easier debugging.

s373nZ · 2024-12-04T11:02:07Z

@ShahanaFarooqui I'm curious how it completed successfully w/o the release directory, but I can't see the build output from the run in the CI logs anymore. By grouping the commands and redirecting the output to log files, we would be sacrificing centralized observability in the GitHub Actions CI interface (for both successes and failures) in order to gain more context in the error email.

I think the appropriate solution is to leave the task commands as they were previously, so we still get the log output in the UI, and just have a simpler email with less context that reports there was an error and provides a link to the CI output. The recipients can view the log output from GitHub Actions in the case of an error, and others who are not on the DISTRIBUTION_LIST can also see the workflow output in cases of both success and failure. What do you think?
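A minimal sketch of how such a link could be assembled inside a run step. GITHUB_SERVER_URL, GITHUB_REPOSITORY and GITHUB_RUN_ID are default environment variables that GitHub Actions sets; the fallback values, the run_url variable name and the message wording are assumptions made for illustration, not taken from the PR.

```shell
set -eu

# GitHub Actions sets these variables on every run; the fallbacks are
# placeholders (assumptions) so the sketch also runs outside CI.
server_url="${GITHUB_SERVER_URL:-https://github.com}"
repository="${GITHUB_REPOSITORY:-owner/repo}"
run_id="${GITHUB_RUN_ID:-0}"

# Link to the workflow run page, where the full step logs stay visible.
run_url="${server_url}/${repository}/actions/runs/${run_id}"

echo "Reproducible build failed. See the CI output: ${run_url}"
```

The email then only has to carry this one line, and the logs stay in the Actions UI for everyone, including people who are not on the distribution list.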

ShahanaFarooqui · 2024-12-04T20:43:07Z

> IMHO, I think the appropriate solution is to leave task commands as they were previously so we still get the log output in the UI, and just have a simpler email with less context that reports there was an error and provides a link to the CI output. The recipients can view the log output from Github actions in the case of an error, and others who are not on the DISTRIBUTION_LIST can also see the workflow output in cases of both success or failure. What do you think?

Agree, I removed the error capturing code as it was causing more complexity than it was worth. I also tried using the tee command to write to both standard output and the file, but error handling at each step still added unnecessary complexity. It is better to rely on the receiver to check the details directly in the action itself.

Posting the tee error handling for future reference though:

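# pipefail makes the pipeline report a tar/docker failure instead of tee's exit status
set -o pipefail
sudo tar -C ${{ matrix.version }} -c . | docker import - ${{ matrix.version }} 2>&1 | tee command.log || exit_code=$?
if [ -n "$exit_code" ]; then
  # Export the captured log as a multi-line ERROR environment variable
  {
    echo "ERROR<<EOF"
    cat command.log
    echo "EOF"
  } >> "$GITHUB_ENV"
  exit 1
fi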

Logs: <pre>${{ env.ERROR }}</pre><br/>
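One caveat when piping a command into tee: by default a shell pipeline returns the exit status of its last command, so a failure earlier in the pipeline can be masked by tee succeeding. The sketch below uses false as a stand-in for a failing build command (an assumption for illustration) and compares the status seen with and without bash's pipefail option.

```shell
# Without pipefail, the pipeline's status is tee's, so the failure is hidden.
status_without=0
bash -c 'false | tee /dev/null' || status_without=$?

# With pipefail, the pipeline fails if any command in it fails.
status_with=0
bash -c 'set -o pipefail; false | tee /dev/null' || status_with=$?

echo "without pipefail: ${status_without}, with pipefail: ${status_with}"
```

The first status is 0 and the second is 1, which is why any error handling built around tee needs pipefail to be enabled for the step.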

s373nZ

Very nice! LGTM :)

ACK f28587b

s373nZ · 2024-12-04T22:11:55Z

One final observation - it looks like if all three repro builds failed, then three separate emails would be sent because the failure email step is a part of the matrix job, right? You could try to consolidate it down into one email by making that step a separate job as per #7897 (review) (using needs: ubuntu) but it might not be straightforward to add the failing step name directly into the email body.

If multiple emails per failed run are fine with you, the current code still LGTM 👍

ShahanaFarooqui · 2024-12-04T22:28:16Z

> You could try to consolidate it down into one email by making that step a separate job

For now, I would prefer to keep them separate since I am the only one on the distribution list :D. I will consider merging them if their frequency seems unnecessarily high.

> but it might not be straightforward to add the failing step name directly into the email body

Capturing the failing step name should not be a big issue, since we can set it as an output instead.
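One way this could look, as a rough sketch rather than the PR's actual code: GitHub Actions reads name=value lines appended to the file named by GITHUB_OUTPUT as step outputs. The run_step helper, the failed_step output name and the step labels below are all hypothetical, and the temp-file fallback is only there so the sketch runs outside CI. How the output then reaches a separate notification job would still need to be checked in a test run.

```shell
set -u

# GITHUB_OUTPUT is provided by GitHub Actions; fall back to a temp file
# (an assumption for local runs) when it is not set.
output_file="${GITHUB_OUTPUT:-$(mktemp)}"

# run_step is a hypothetical helper: it runs a command and, if the command
# fails, records the step label as an output named failed_step.
run_step() {
  label="$1"
  shift
  if ! "$@"; then
    echo "failed_step=${label}" >> "$output_file"
    return 1
  fi
}

# true and false stand in for real build commands.
run_step "Build environment" true
run_step "Build release" false || echo "recorded failure in ${output_file}"
```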

ShahanaFarooqui added this to the v25.02 milestone Dec 3, 2024

ShahanaFarooqui requested a review from cdecker December 3, 2024 04:03

s373nZ reviewed Dec 3, 2024

ShahanaFarooqui force-pushed the repro-step-failed-email branch 2 times, most recently from 2c84244 to d1ea50e Compare December 3, 2024 21:40

ShahanaFarooqui force-pushed the repro-step-failed-email branch 4 times, most recently from cc4fbaf to 507008f Compare December 4, 2024 00:51

ShahanaFarooqui force-pushed the repro-step-failed-email branch from 507008f to ef485cf Compare December 4, 2024 01:18

s373nZ reviewed Dec 4, 2024

ci: Send email if the reproducible build process fails

f28587b

Changelog-None.

ShahanaFarooqui force-pushed the repro-step-failed-email branch from ef485cf to f28587b Compare December 4, 2024 20:25

s373nZ approved these changes Dec 4, 2024
