Possible to return empty string rather than None for the zero_or_more case? #22

Open
lmmx opened this issue Dec 10, 2023 · 3 comments

lmmx commented Dec 10, 2023

Hi there, I was very pleased to find this library, since it works around a limitation of the parse library: parse can only generate (.+?) for capture groups, never (.*?). I feel it's a shame that the libraries could not be merged, but such is open source.

I've studied your docs and comments on the other repo and written out test cases for the behaviour I'm after.

So far I've only managed to make "optional strings" (nullable strings, Union[str, None]), whereas what I really want is "any-width strings" (length 0+, plain str).

Here's the code I wrote to achieve it:

from parse import with_pattern

from parse_type.cfparse import Parser


def check(parser: Parser, schema: str, expected: list[str], /) -> None:
    """Validate the parsed field values against their expected values."""
    result = parser.parse(schema)
    try:
        assert result is not None, f"Parse failed for {schema!r} ({expected=})"
        values = [result[f] or "" for f in parser.named_fields]
        assert values == expected, f"Parsed {schema!r} as {values} ({expected=})"
    except AssertionError as exc:
        print(f"  F {exc}")
    else:
        print(f"  P {schema!r} ---> {result}")


@with_pattern(r".+")
def parse_str(text: str) -> str:
    return text


extra_types = {"Stringlike": parse_str}

parser = Parser("-{content:Stringlike?}", extra_types=extra_types)
print(f"EXPR {parser._expression}")
check(parser, "-hello world", ["hello world"])
check(parser, "-", [""])

print()

parser = Parser("-{a:Stringlike?} {b:Stringlike?}", extra_types=extra_types)
print(f"EXPR {parser._expression}")
check(parser, "-A B", ["A", "B"])
check(parser, "-A ", ["A", ""])  # ["A", ""]
check(parser, "- B", ["", "B"])  # ["", "B"]
check(parser, "- ", ["", ""])  # ["", ""]

This results in:

EXPR -(?P<content>(.+)?)
  P '-hello world' ---> <Result () {'content': 'hello world'}>
  P '-' ---> <Result () {'content': None}>

EXPR -(?P<a>(.+)?) (?P<b>(.+)?)
  P '-A B' ---> <Result () {'a': 'A', 'b': 'B'}>
  P '-A ' ---> <Result () {'a': 'A', 'b': None}>
  P '- B' ---> <Result () {'a': None, 'b': 'B'}>
  P '- ' ---> <Result () {'a': None, 'b': None}>

Note that in my code I extract the field value or "" so I can test against lists of strings including the empty string rather than None.

What I would really like is to eliminate that or "" fallback: I just want strings, never None.

I suspect the place to do so is a hook into the TypeBuilder, but I'm falling very far down the rabbit hole at this point! If you could guide me, I would appreciate it greatly :-)
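For comparison, the or "" workaround can be pulled out of the test helper into a single post-processing step. This is only a sketch of that stopgap, not a fix in the library: none_to_empty is a hypothetical helper name, and it assumes the named fields are available as a plain mapping of field name to str or None.

```python
from typing import Mapping, Optional


def none_to_empty(fields: Mapping[str, Optional[str]]) -> dict[str, str]:
    """Replace None values with the empty string, leaving other values as-is.

    Hypothetical helper: it only post-processes parsed fields, it does not
    change how the parser itself converts an empty match.
    """
    return {name: "" if value is None else value for name, value in fields.items()}


# Example with the shape of the named fields shown in the output above.
print(none_to_empty({"a": "A", "b": None}))  # {'a': 'A', 'b': ''}
```

Testing with "is None" rather than "or" keeps the intent explicit: only the missing-value sentinel is rewritten.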


lmmx commented Dec 10, 2023

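I note the following difference between .+ and .* in the inner group: the named groups are empty strings either way, but the inner group is None with .+ and '' with .*. I think I read somewhere that you use search to extract the groups, so perhaps this is where I ought to make the change: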

>>> import re
>>> m=re.search(r"-(?P<a>(.+)?) (?P<b>(.+)?)", "- "); f"a={m.group('a')} b={m.group('b')}"
'a= b='
>>> m.groups()
('', None, '', None)
>>> m=re.search(r"-(?P<a>(.*)?) (?P<b>(.*)?)", "- "); f"a={m.group('a')} b={m.group('b')}"
'a= b='
>>> m.groups()
('', '', '', '')

If there is no match, search returns None, so accessing the groups raises an error:

>>> m=re.search(r"-(?P<a>(.*)?) (?P<b>(.*)?)", "x"); f"a={m.group('a')} b={m.group('b')}"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
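The same comparison can be written as a small script that guards against the no-match case instead of hitting the AttributeError. This is a sketch using only the standard library re module; groups_or_none is a hypothetical helper name, and the two patterns are the ones from the REPL session above.

```python
import re

# The two expressions compared above: .+ versus .* inside the optional group.
PLUS = re.compile(r"-(?P<a>(.+)?) (?P<b>(.+)?)")
STAR = re.compile(r"-(?P<a>(.*)?) (?P<b>(.*)?)")


def groups_or_none(pattern, text):
    """Return all captured groups, or None when the pattern does not match."""
    m = pattern.search(text)
    return None if m is None else m.groups()


print(groups_or_none(PLUS, "- "))  # ('', None, '', None)
print(groups_or_none(STAR, "- "))  # ('', '', '', '')
print(groups_or_none(STAR, "x"))   # None
```

With .+ the inner group cannot match zero characters, so it does not participate in the match and is reported as None, while the outer named group still captures the empty string.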


lmmx commented Dec 11, 2023

On closer review, it looks like the culprit is parse.convert_first. Below is a pdb session stepping through evaluate_result, where the None is produced by a convert_first object stored in the dict self._type_conversions.

This is the example being debugged:

@with_pattern(r".*")
def parse_str(text: str) -> str:
    return text


extra_types = {"Stringlike": parse_str}

parser = Parser("-{content:Stringlike?}", extra_types=extra_types)
parser.parse("-")
(Pdb) p m.groupdict()
{'content': ''}
...
(Pdb) n
> /home/louis/miniconda3/lib/python3.10/site-packages/parse.py(580)evaluate_result()
-> if k in self._type_conversions:
(Pdb) p self._type_conversions
{'content': <parse.convert_first object at 0x7f3d9deb7d90>}

convert_first is just a helper class in parse:

class convert_first:
    """Convert the first element of a pair.
    This equivalent to lambda s,m: converter(s). But unlike a lambda function, it can be pickled
    """

    def __init__(self, converter):
        self.converter = converter

    def __call__(self, string, match):
        return self.converter(string)

It works with any callable:

>>> parse.convert_first(print)(1,2)
1

i.e. it is equivalent to print(1). Stepping further:

(Pdb) n
> /home/louis/miniconda3/lib/python3.10/site-packages/parse.py(581)evaluate_result()
-> value = self._type_conversions[k](groupdict[k], m)

It turns out the wrapped converter is the one built by TypeBuilder, and the real culprit is its convert_optional closure, which I suppose is what to override to give an empty string?

(Pdb) p self._type_conversions[k].converter
<function TypeBuilder.with_zero_or_one.<locals>.convert_optional at 0x7f3d9dea7520>
(Pdb) n
> /home/louis/miniconda3/lib/python3.10/site-packages/parse.py(585)evaluate_result()
-> named_fields[korig] = value
(Pdb) p value
None
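The chain seen in the debugger can be imitated with the standard library alone. The sketch below is a simplified stand-in, not the code of parse or parse_type: ConvertFirst mirrors the convert_first class quoted above, make_optional is a hypothetical substitute for convert_optional that assumes it returns a sentinel for empty text, and evaluate only imitates the lookup in self._type_conversions.

```python
import re


class ConvertFirst:
    """Stand-in for parse.convert_first: call the converter with the text only."""

    def __init__(self, converter):
        self.converter = converter

    def __call__(self, string, match):
        return self.converter(string)


def make_optional(converter, empty=None):
    """Hypothetical stand-in for convert_optional; returns `empty` for empty text."""

    def convert_optional(text, m=None):
        return converter(text) if text else empty

    return convert_optional


def evaluate(expression, conversions, text):
    """Simplified imitation of the conversion loop seen in evaluate_result."""
    m = re.fullmatch(expression, text)
    if m is None:
        return None
    named = {}
    for name, value in m.groupdict().items():
        if name in conversions:
            value = conversions[name](value, m)
        named[name] = value
    return named


EXPR = r"-(?P<content>(.*)?)"
nullable = {"content": ConvertFirst(make_optional(str))}
any_width = {"content": ConvertFirst(make_optional(str, empty=""))}

print(evaluate(EXPR, nullable, "-"))               # {'content': None}
print(evaluate(EXPR, any_width, "-"))              # {'content': ''}
print(evaluate(EXPR, any_width, "-hello world"))   # {'content': 'hello world'}
```

The regex already captures '' for the field, so the None can only come from the converter. Swapping the sentinel in the optional converter is enough to get a plain string back.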


lmmx commented Dec 11, 2023

Update: for reference, here is the implementation I ended up with. I found this quite involved, but it appears to be robust. I think it might be a candidate for adding to the library, and I'd be interested in your thoughts.

from parse import Parser, with_pattern

from parse_type import TypeBuilder


@with_pattern(r".*")
def parse_str(text: str) -> str:
    return text


class SmolStrTypeBuilder(TypeBuilder):
    @classmethod
    def with_zero_or_more_chars(cls, converter, pattern=None):
        nullable_optional = cls.with_zero_or_one(converter=converter, pattern=pattern)

        @with_pattern(nullable_optional.pattern)
        def convert_optional(text, m=None):
            """Uses the empty string as the sentinel instead of `None`."""
            return converter(text) if text else ""

        convert_optional.regex_group_count = nullable_optional.regex_group_count
        return convert_optional


def check(parser: Parser, schema: str, expected: list[str], /) -> None:
    """Validate the parsed field values against their expected values."""
    result = parser.parse(schema)
    try:
        assert result is not None, f"Parse failed for {schema!r} ({expected=})"
        values = [result[f] for f in parser.named_fields]
        assert values == expected, f"Parsed {schema!r} as {values} ({expected=})"
    except AssertionError as exc:
        print(f"  F {exc}")
    else:
        print(f"  P {schema!r} ---> {result}")


parse_any_width_string = SmolStrTypeBuilder.with_zero_or_more_chars(parse_str)
extra_types = {"?": parse_any_width_string}

parser = Parser("-{content:?}", extra_types=extra_types)
print(f"EXPR {parser._expression}")
check(parser, "-hello world", ["hello world"])
check(parser, "-", [""])
print()
parser = Parser("-{a:?} {b:?}", extra_types=extra_types)
print(f"EXPR {parser._expression}")
check(parser, "-A B", ["A", "B"])
check(parser, "-A ", ["A", ""])
check(parser, "- B", ["", "B"])
check(parser, "- ", ["", ""])

Every check now prints "P" (passed):

EXPR -(?P<content>(.*)?)
  P '-hello world' ---> <Result () {'content': 'hello world'}>
  P '-' ---> <Result () {'content': ''}>

EXPR -(?P<a>(.*)?) (?P<b>(.*)?)
  P '-A B' ---> <Result () {'a': 'A', 'b': 'B'}>
  P '-A ' ---> <Result () {'a': 'A', 'b': ''}>
  P '- B' ---> <Result () {'a': '', 'b': 'B'}>
  P '- ' ---> <Result () {'a': '', 'b': ''}>
