Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[README]: Contributing new modules #1136

Open
HofmeisterAn opened this issue Mar 4, 2024 · 11 comments
Open

[README]: Contributing new modules #1136

HofmeisterAn opened this issue Mar 4, 2024 · 11 comments
Labels
chore A change that doesn't impact the existing functionality, e.g. internal refactorings or cleanups module An official Testcontainers module

Comments

@HofmeisterAn
Copy link
Collaborator

Problem

Hi 👋, in the past few weeks, I've encountered several issues with the GitHub-hosted runners. Due to the increasing number of modules, pulling all the images occupies too much disk space. I've implemented a temporary fix to free up some disk space, but this won't be enough as we add more new modules.

Until we find a suitable fix and implement a mechanism to handle the growing number of modules and image sizes, I prefer not to merge pull requests that involve very simple module configurations. Which can be accomplished with just a few lines using the generic container builder API. If you believe the module is valuable for the community, please consider creating an issue first and discussing it there to avoid unnecessary work on creating a pull request. OC, I will review and try to merge the remaining and outstanding module PRs.

Solution

-

Benefit

-

Alternatives

-

Would you like to help contributing this enhancement?

Yes

@HofmeisterAn HofmeisterAn added module An official Testcontainers module chore A change that doesn't impact the existing functionality, e.g. internal refactorings or cleanups labels Mar 4, 2024
@HofmeisterAn HofmeisterAn pinned this issue Mar 4, 2024
@mrudat
Copy link

mrudat commented Apr 26, 2024

Would it be worthwhile to refactor modules so that each image or class of images (e.g. Web/Database/... Servers that might want to share a common interface) has its own repository?

That would mean that exactly one docker image should be created for each image-specific repository, presuming that the same configuration used from multiple implementations in different languages results in an identical container being tested.

We could also describe the image-specific configuration in a common configuration format and use that to generate the language-specific code.

For example, username/password/... is often configured via environment variables, and the specific environment variable names are specific to individual images. However, the code for capturing the username/password is shared between multiple images across multiple languages, which suggests we have too much nearly identical custom-maintained code and are waiting for a refactoring pass to extract the common functionality.

@eddumelendez
Copy link
Member

Hi, I think this can help to free some more space for linux workers https://github.com/jlumbroso/free-disk-space

@HofmeisterAn
Copy link
Collaborator Author

Hi Eddú, we are doing this already:

# Our modules occupy too much disk space. The GitHub-hosted runners ran into the
# error: "no space left on device." The pulled images are not cleaned up between
# the test runs. One obvious approach is splitting the tests and running them on
# multiple runners. However, we need to keep in mind that running too many
# simultaneous builds has an impact on others as well. We observed that scheduled
# Dependabot builds blocked others in the Testcontainers organization.
- name: Free Disk Space
uses: jlumbroso/[email protected]
if: runner.os == 'Linux'
with:
tool-cache: true
android: true
dotnet: true
haskell: true
large-packages: true
docker-images: true
swap-storage: false

@kiview
Copy link
Member

kiview commented May 23, 2024

Would it be worthwhile to refactor modules so that each image or class of images (e.g. Web/Database/... Servers that might want to share a common interface) has its own repository?

From past experiences with tc-java, this leads to a lot of operational overhead (e.g. releases), unless we have good automation in place.

Splitting in parallelizing the CI workflow on a per-module or module-sets level can help and it is what we essentially do in tc-java as well.

@kevin0x90
Copy link

As recommended by @kiview I took a look at the way test containers Java is doing the parallel execution and would like to support in this regard whenever I got some time to work on it.

@HofmeisterAn what do you think about doing the parallel execution analogous to the Java project?

@HofmeisterAn
Copy link
Collaborator Author

As recommended by @kiview I took a look at the way test containers Java is doing the parallel execution and would like to support in this regard whenever I got some time to work on it.

@HofmeisterAn what do you think about doing the parallel execution analogous to the Java project?

Definitely something we should try out 👍. We already have the implementation to merge coverage results from different runners (for Linux and Windows), so we probably just need to group (by core-linux, core-windows, modules), split and run the test projects accordingly. I'm not sure if we need to take a look at GitHub's usage limits. IIRC, we already had an issue with Dependabot once, where we created too many runners in a short period of time, which caused issues for other Testcontainers repos because they got degraded and couldn’t create new runners for a while.

Maybe we can also reuse the GitHub composite actions (.github/workflows/actions/tests/action.yml) from this branch.

@HofmeisterAn
Copy link
Collaborator Author

HofmeisterAn commented Nov 9, 2024

I began looking into this issue and prioritizing it to address the remaining module PRs afterward. So far, my initial tests look quite promising (diff). The max-parallel configuration is helpful for adjusting the maximum number of runners to prevent degradation for the Testcontainers organization (maybe the configuration does not help here). There are a few more improvements we can make, such as building the entire project just once, but for now, I think this is already a good step forward and doesn't require too many changes.

@HofmeisterAn
Copy link
Collaborator Author

The disadvantage is that we need to maintain a list of test projects in the GitHub workflow. Ideally, we can create this list somehow during runtime. At the very least, we need to ensure that no one forgets to add the test project here.

@HofmeisterAn
Copy link
Collaborator Author

I was thinking about using the following PowerShell command/script to determine which runner belongs to which project. Instead of reading the file content, we can simply use its name. If the file does not exist, we can either throw an error or fall back to a default runner.

Get-ChildItem -Path 'tests' -Directory | % { @{ "name" = $_.Name; "runs-on" = [string](Get-Content -LiteralPath (Join-Path -Path $_.FullName -ChildPath ".runs-on") -ErrorAction SilentlyContinue) } }

The command create a JSON similar to:

[
    {
        "name":  "Testcontainers.ActiveMq.Tests",
        "runs-on":  "ubuntu-22.04"
    },
    {
        "name":  "Testcontainers.ArangoDb.Tests",
        "runs-on":  "ubuntu-22.04"
    },
    {
        "name":  "Testcontainers.Azurite.Tests",
        "runs-on":  null
    }
]

@0xced
Copy link
Contributor

0xced commented Nov 23, 2024

The disadvantage is that we need to maintain a list of test projects in the GitHub workflow. Ideally, we can create this list somehow during runtime. At the very least, we need to ensure that no one forgets to add the test project here.

I'm going to submit a pull request for this really soon. 😉

Edit: done in #1305.

@HofmeisterAn
Copy link
Collaborator Author

I put together a PowerShell script/command to generate the JSON (strategy) during build time. But it looks like that using the output of a job as an input strategy isn't as straightforward as I thought. It seems to require some hacks to make it work.

# Retrieves the GH workflow 'runs-on' configuration for each test project.
# During runtime, the strategy is passed to the CI job, allowing test projects to
# run parallel on individual runners.
$testProjects = Get-ChildItem -Path 'tests' -Directory `
    | % { $testProject = $_.Name; Join-Path -Path $_.FullName -ChildPath '.runs-on' } `
    | % { If (Test-Path -LiteralPath $_) { Get-Content -LiteralPath $_ } Else { $Null } } `
    | % { [PSCustomObject]@{ 'project' = $testProject; 'runs-on' = [string]$_ } }

# Checks if any test project does not contain a valid '.runs-on' configuration.
# If a project is missing this configuration, an error is thrown to prevent
# developers from forgetting to add the configuration.
$runsOnNotFound = $testProjects `
    | Where-Object 'runs-on' -Eq '' `
    | Select-Object -ExpandProperty 'project'

If ($runsOnNotFound)
{
    Write-Error "Please add a '.runs-on' configuration file to the test project:`n  $($runsOnNotFound -Join "`n  ")"
    Exit 1
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
chore A change that doesn't impact the existing functionality, e.g. internal refactorings or cleanups module An official Testcontainers module
Projects
None yet
Development

No branches or pull requests

6 participants