While working and searching in the Parser/EmailParser.php file, I thought about something:
Instead of having things like
private $quoteHeadersRegex = array(
    '/^.{0,5}(On(?:(?!\bOn\b|\bwrote(\s|\xc2\xa0)?:).){0,1000}wrote(\s|\xc2\xa0)?:)$/ms', // On DATE, NAME <EMAIL> wrote:
    '/^.{0,5}(Le\b(?:(?!\bLe\b|\bécrit(\s|\xc2\xa0)?:).){0,1000}écrit(\s|\xc2\xa0)?:)$/ms', // Le DATE, NAME <EMAIL> a écrit :
    '/^.{0,5}(El(?:(?!\bEl\b|\bescribió\s?:).){0,1000}escribió\s?:)$/ms', // El DATE, NAME <EMAIL> escribió:
    '/^.{0,5}(El(?:(?!\bEl\b|\bha escrit\s?:).){0,1000}ha escrit\s?:)$/ms', // El DATE, NAME <EMAIL> ha escrit:
    '/^.{0,5}(Il(?:(?!\bIl\b|\bscritto(\s|\xc2\xa0)?:).){0,1000}scritto(\s|\xc2\xa0)?:)$/ms', // Il DATE, NAME <EMAIL> ha scritto:
    [...]
    '/^\s*(From\s?:.+\s?(\[|<).+(\]|>))/mu', // "From: NAME <EMAIL>" OR "From : NAME <EMAIL>" OR "From : NAME<EMAIL>" (With support whitespace before start and before <)
    '/^\s*(发件人\s?:.+\s?(\[|<).+(\]|>))/mu', // "发件人: NAME <EMAIL>" OR "发件人 : NAME <EMAIL>" OR "发件人 : NAME<EMAIL>" (With support whitespace before start and before <)
    '/^\s*(De\s?:.+\s?(\[|<).+(\]|>))/mu', // "De: NAME <EMAIL>" OR "De : NAME <EMAIL>" OR "De : NAME<EMAIL>" (With support whitespace before start and before <)
    '/^\s*(Van\s?:.+\s?(\[|<).+(\]|>))/mu', // "Van: NAME <EMAIL>" OR "Van : NAME <EMAIL>" OR "Van : NAME<EMAIL>" (With support whitespace before start and before <)
    '/^\s*(Da\s?:.+\s?(\[|<).+(\]|>))/mu', // "Da: NAME <EMAIL>" OR "Da : NAME <EMAIL>" OR "Da : NAME<EMAIL>" (With support whitespace before start and before <)
    [...]
);
couldn't we have a single parameterized line for each "type" of reply, like this (of course, it's only a draft):
private $quoteHeadersRegex = array(
    '/^.{0,5}($on(?:(?!\b$on\b|\b$wrote(\s|\xc2\xa0)?:).){0,1000}$wrote(\s|\xc2\xa0)?:)$/ms', // On DATE, NAME <EMAIL> wrote:
    [...]
    '/^\s*($from\s?:.+\s?(\[|<).+(\]|>))/mu', // "From: NAME <EMAIL>" OR "From : NAME <EMAIL>" OR "From : NAME<EMAIL>" (With support whitespace before start and before <)
    [...]
);
Then we would run these regex checks against a list of language files, so that, for example, $wrote would be tried with "wrote", then "a écrit", then "escribió", and so on.
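To make the idea concrete, here is a minimal sketch of how the expansion could work. Everything in it is an assumption for illustration only: the `{on}`, `{wrote}` and `{from}` placeholders, the `$quoteVocabulary` and `$quoteTemplates` arrays, and the `buildQuoteHeaderRegexes()` and `matchesQuoteHeader()` helpers do not exist in Parser/EmailParser.php. It uses brace placeholders rather than `$on`, because PHP does not interpolate variables inside single-quoted strings.

```php
<?php
// Hypothetical sketch: none of these names exist in Parser/EmailParser.php.

// One entry per language or variation (these could live in separate language files).
$quoteVocabulary = array(
    array('on' => 'On', 'wrote' => 'wrote',   'from' => 'From'),
    array('on' => 'Le', 'wrote' => 'a écrit', 'from' => 'De'),
    array('on' => 'Il', 'wrote' => 'scritto', 'from' => 'Da'),
);

// One template per "type" of reply header, with placeholders instead of words.
$quoteTemplates = array(
    '/^.{0,5}({on}(?:(?!\b{on}\b|\b{wrote}(\s|\xc2\xa0)?:).){0,1000}{wrote}(\s|\xc2\xa0)?:)$/ms',
    '/^\s*({from}\s?:.+\s?(\[|<).+(\]|>))/mu',
);

// Expand every template once per vocabulary entry.
function buildQuoteHeaderRegexes(array $templates, array $vocabulary)
{
    $regexes = array();
    foreach ($templates as $template) {
        foreach ($vocabulary as $words) {
            $replacements = array();
            foreach ($words as $name => $word) {
                // Escape the word so it cannot alter the surrounding pattern.
                $replacements['{' . $name . '}'] = preg_quote($word, '/');
            }
            // strtr() only replaces the listed keys, so quantifiers such as {0,5} are untouched.
            $regexes[] = strtr($template, $replacements);
        }
    }
    return $regexes;
}

// Return true as soon as one of the generated patterns matches the text.
function matchesQuoteHeader($text, array $regexes)
{
    foreach ($regexes as $regex) {
        if (preg_match($regex, $text) === 1) {
            return true;
        }
    }
    return false;
}

$regexes = buildQuoteHeaderRegexes($quoteTemplates, $quoteVocabulary);
var_dump(matchesQuoteHeader('On Tue, 5 Jan 2021, John Doe <john@example.com> wrote:', $regexes));
```

With this layout, supporting another language would mean adding one line to the vocabulary rather than one line per pattern.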
Here are the advantages I see in that modification:
- Adding a new language or variation is easier
- You don't have to duplicate the same regex X times, changing one or two words each time
- You're less likely to make a mistake in a regex
That was my two cents, thanks for reading 😉