Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor sync #3312

Open
wants to merge 45 commits into
base: main
Choose a base branch
from
Open

Refactor sync #3312

wants to merge 45 commits into from

Conversation

eemeli
Copy link
Member

@eemeli eemeli commented Sep 2, 2024

Closes #2023 -- entity- and translation-level changes are logged in ActionLog
Fixes #2057
Fixes #2068
Fixes #2078
Closes #2083 -- source changes are now sync'd to targets eagerly
Fixes #2087 -- git file moves are caught, but not copies
Closes #2110 -- the remaining sync tests are largely replaced here
Closes #2129 -- refactoring the sync changes the performance characteristics completely
Fixes #2169
Fixes #2175
Fixes #2189
Fixes #2211
Fixes #2242
Closes #2285 -- not relevant after the refactor
Fixes #2641
Fixes #3302
Fixes #3449

This is effectively a rewrite of the sync_project() function that's currently here, and which ends up calling most of the code under pontoon/sync/.

The end results of the code here should be the same as currently, but the implementation is completely new, and does things in a different order.

Per-locale repositories are dropped here, as per #3303.

Explicitly left out of this PR but liable to change later:

  • What sync logging information is persisted in the database.
  • How files are read and written.
  • How localizable messages are represented in the database.
  • How aggregated stats are gathered.

Copy link
Collaborator

@mathjazz mathjazz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very high-level look, but this already looks very promising!

Currently, Pretranslation is tightly integrated with sync in order to minimize time between exposing new strings for localization and pretranslating them. What's your plan with this?

pontoon/base/models/changed_entity_locale.py Show resolved Hide resolved
pontoon/base/models/project.py Show resolved Hide resolved
paths = get_paths(project, checkouts)
except Exception as e:
log.error(f"{log_prefix} {e}")
project_sync_log.skip()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: when we'll be refactoring the sync log, we should make a distinction between skip() (no changes needed during sync) and fail() (sync failed due to an error).

pontoon/sync/sync_project.py Outdated Show resolved Hide resolved
if db_repo.last_synced_revisions is None:
self.prev_commit = None
else:
pc = db_repo.last_synced_revisions.get("single_locale", None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should drop the single_locale but if we're resolving #3303.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. But let's do it as a follow-up; I would much prefer being able to do this refactor without any db changes, so that it's easier to roll back if necessary.

pontoon/sync/sync_entities_from_repo.py Outdated Show resolved Hide resolved
pontoon/sync/sync_entities_from_repo.py Outdated Show resolved Hide resolved
removed_source_paths: set[str],
now: datetime,
) -> None:
lc_readonly = set(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: There are quite a few prefixes and acronyms used in this patch (lc_, mod_, tx etc.), that are not immediately understandable like some of the more established ones (pk, db, etc.).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. lc is short for "locale code", mod for "modified", and tx for "translation", but those are by no means obvious.

For local variables, I have a strong preference for keeping their names short, and relying on the reader to be able to figure out the meaning from where the values come from. Would documenting these somewhere be a sufficient improvement, or do you think that some or all of these are just too obscure? Or are there alternative shortenings that would be better?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me even single-letter variables are acceptable in cases like this. I'd find l and teasier to understand than lc and tx.

I don't think we should document that.

@eemeli
Copy link
Member Author

eemeli commented Sep 4, 2024

Currently, Pretranslation is tightly integrated with sync in order to minimize time between exposing new strings for localization and pretranslating them. What's your plan with this?

Ah, I'd missed that! Yeah, that needs to happen the same as before.

requirements/default.txt Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Oct 1, 2024

Codecov Report

Attention: Patch coverage is 90.24552% with 147 lines in your changes missing coverage. Please review.

Project coverage is 79.92%. Comparing base (24f37e5) to head (cebc5b2).
Report is 4 commits behind head on main.

Additional details and impacted files

@eemeli
Copy link
Member Author

eemeli commented Oct 1, 2024

This is now at the dangerous stage of looking like it works. But some verification work still remains:

  • Add tests for the fixtures used by the old sync tests.
  • Check the sync results against the active pontoon.mozilla.org projects.

Comment on lines +745 to +746
# FIXME: zip downloads should only be for projects with 2..10 resources
from pontoon.sync.utils import download_translations_zip
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a compromise reached after discussion with @mathjazz elsewhere. We'll be refactoring this code again as a part of the subsequent data model refactor, at which point we'll be able to introduce download support that won't require us to re-fetch any files.

Comment on lines +83 to +85
if project.pretranslation_enabled and changed_paths:
# Pretranslate changed and added resources for all locales
pretranslate(project, changed_paths)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new sync is retaining slightly different information than before, so it makes more sense to trigger the pretranslation for all entities in changed resources. The pretranslation will then filter out the entities and locales for which pretranslation is not needed or wanted.

@eemeli
Copy link
Member Author

eemeli commented Oct 15, 2024

I can now download files from AMO Linter, but downloading from Firefox still fails on stage (it works locally).

From my own experimentation, it looks like there's at least some timing issue due to the git clone that's now needed. Retrying got amo-frontend download to work for me.

Locally I couldn't make upload work, though. I downloaded the brand.ftl file, deleted a string in Pontoon, uploaded the file, and the deleted string didn't get imported.

Does this work even now? As in, does uploading a resource with a string removed remove it from Pontoon?

@mathjazz
Copy link
Collaborator

Does this work even now? As in, does uploading a resource with a string removed remove it from Pontoon?

I uploaded a resource which had a translation that didn't exist in Pontoon.

@mathjazz
Copy link
Collaborator

From my own experimentation, it looks like there's at least some timing issue due to the git clone that's now needed. Retrying got amo-frontend download to work for me.

Can we use git clone --depth 1 for downloading files?

@mathjazz
Copy link
Collaborator

We should update the README file.

@flodolo

This comment was marked as resolved.

@eemeli
Copy link
Member Author

eemeli commented Oct 16, 2024

The stats update should now be much more robust, and a bit simpler -- albeit written in SQL, as I couldn't figure out the corresponding Django invocations. In particular the COUNT(*) FILTER (WHERE ...) values are probably not even possible.

This also guarantees that the work is entirely done within the database, where even for the most complex projects it should take at most a few seconds even though the update is applied to the entire current project.

A "messy local setup" as mentioned above should no longer be able to produce the error that @flodolo encountered.

As a minor regression, gettext plurals are now counted as separate entries. This means that when translating from English to a locale with more plural categories may result in an approved_strings count greater than total_strings. This will get fixed by my subsequent work, which will change gettext plurals are stored in the database.

@flodolo
Copy link
Collaborator

flodolo commented Oct 16, 2024

A "messy local setup" as mentioned above should no longer be able to produce the error that @flodolo encountered.

Confirming that this worked locally 👍🏼

@flodolo
Copy link
Collaborator

flodolo commented Oct 16, 2024

Confirming that this worked locally 👍🏼

Correction: the error now shows up when you update any string (e.g. reject one), e.g.

2024-10-16 15:07:05 server-1      | django.db.utils.IntegrityError: new row for relation "base_locale" violates check constraint "base_locale_approved_strings_check"
2024-10-16 15:07:05 server-1      | DETAIL:  Failing row contains (50, 0, -1, 0, 0, 0, 1, it, it, it, it-it, , Italian, (n != 1), 1,5, Latin, ltr, 69025000, , 25173, 100, 99, , , , t).

@mathjazz
Copy link
Collaborator

mathjazz commented Oct 16, 2024

The stats update should now be much more robust, and a bit simpler -- albeit written in SQL, as I couldn't figure out the corresponding Django invocations. In particular the COUNT(*) FILTER (WHERE ...) values are probably not even possible.

As a minor regression, gettext plurals are now counted as separate entries. This means that when translating from English to a locale with more plural categories may result in an approved_strings count greater than total_strings. This will get fixed by my subsequent work, which will change gettext plurals are stored in the database.

Note that calculate_stats() is called from many places (including the translate view) and that approved_strings greater than total_strings might cause our charts (in dashboards as well as in translate view) to break (I didn't actually test). All that means we shouldn't just thoroughly test the sync module, but the entire app.

@flodolo

This comment was marked as resolved.

@flodolo
Copy link
Collaborator

flodolo commented Oct 30, 2024

Also for the same new file (not committed to the repo). Trying to download translations will crash Pontoon, because the file is not in the repo.

2024-10-30 07:56:43 [ERROR:django.request] 2024-10-30 06:56:43,698 Internal Server Error: /translations/
2024-10-30 07:56:43 Traceback (most recent call last):
2024-10-30 07:56:43   File "/usr/local/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
2024-10-30 07:56:43     response = get_response(request)
2024-10-30 07:56:43                ^^^^^^^^^^^^^^^^^^^^^
2024-10-30 07:56:43   File "/usr/local/lib/python3.11/site-packages/django/core/handlers/base.py", line 197, in _get_response
2024-10-30 07:56:43     response = wrapped_callback(request, *callback_args, **callback_kwargs)
2024-10-30 07:56:43                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-10-30 07:56:43   File "/usr/local/lib/python3.11/contextlib.py", line 81, in inner
2024-10-30 07:56:43     return func(*args, **kwds)
2024-10-30 07:56:43            ^^^^^^^^^^^^^^^^^^^
2024-10-30 07:56:43   File "/app/pontoon/base/views.py", line 768, in download_translations
2024-10-30 07:56:43     content, filename = download_translations_zip(project, locale)
2024-10-30 07:56:43                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-10-30 07:56:43   File "/app/pontoon/sync/utils.py", line 31, in download_translations_zip
2024-10-30 07:56:43     zipfile.write(lc_path, basename(tgt_path))
2024-10-30 07:56:43   File "/usr/local/lib/python3.11/zipfile.py", line 1796, in write
2024-10-30 07:56:43     zinfo = ZipInfo.from_file(filename, arcname,
2024-10-30 07:56:43             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-10-30 07:56:43   File "/usr/local/lib/python3.11/zipfile.py", line 535, in from_file
2024-10-30 07:56:43     st = os.stat(filename)
2024-10-30 07:56:43          ^^^^^^^^^^^^^^^^^
2024-10-30 07:56:43 FileNotFoundError: [Errno 2] No such file or directory: '/app/media/projects/firefox/[email protected]:flodolo/firefox-l10n.git/it/mobile/android/mobile-l10n.js'

EDIT: it actually fails also after syncing to the repo, so the file is actually there.

@flodolo
Copy link
Collaborator

flodolo commented Oct 31, 2024

Looks like I'm once again unable to set up Firefox from scratch (sync doesn't pick up translations, even with --force).
The log is full of messages like about the TOML config

[DEBUG:pontoon.sync.core.translations_from_repo] 2024-10-31 11:20:03,375 [firefox:../../[email protected]:mozilla-l10n/firefox-l10n.git/it/devtools/client/toolbox.ftl] Not an L10n target path

Update: Not sure what happened, but I think my local DB is toasted (too many tests). Creating a new project with a different slug worked 🤷🏼

lc_str = f"{len(updated_locales)} localizations"
else:
lc_str = ", ".join(f"{loc.name} ({loc.code})" for loc in updated_locales)
commit_msg = f"Pontoon/{project.name}: Update {lc_str}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of this commit message.

Pontoon/Firefox (local): Update Italian (it)

I find the existing one more readable:

Pontoon: Update Italian (it) localization of Firefox (local)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered what the message looks like if it contains changes for multiple locales? With the old format, it quickly gets much more variable and clunkier.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pontoon only commits one locale at a time now. Is that changing with the new code?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing yes, given lc_str = "1 locale" if lc_count == 1 else f"{lc_count} locales".

I feel like we have discussed this at some point, but it's been long enough that I don't know if that happened, or where 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment