Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regex support #69

Open
jonaprieto opened this issue Aug 3, 2022 · 16 comments · Fixed by #79
Open

Regex support #69

jonaprieto opened this issue Aug 3, 2022 · 16 comments · Fixed by #79

Comments

@jonaprieto
Copy link

Hey

I didn't see it any support for testing the output against a regex expression in the documentation.
Sorry if I miss it, but I consider it essential. One of the reasons I'm still using ShellTestRunner.

@jonaprieto jonaprieto changed the title Regex suuport Regex support Aug 3, 2022
@SamirTalwar
Copy link
Owner

I think this is an excellent idea.

Could you give me an example of the kind of thing you need to run a regex over? Please be as specific as possible: it would be helpful in designing such a feature to have some example input, the expected output, and the kind of regex you'd need.

@jonaprieto
Copy link
Author

jonaprieto commented Aug 4, 2022

Let's imagine I want to check the stderr output of a given command containing a particular pattern.

For example, let's say I want to check if an error in a specific location occurs while running MyTool. The error given by this tool follows a particular form. Then, I would like to write something like the following where I can check the error happens effectively at line 11, cols 7-8.

command : MyTool
inputFile: tests/negative/MyFile.X
stderr : "(.+)\/([^\/]+)\.X\:11\:7\-8\: error.*"

PS. The error pattern in the example is used by GHC compiler, if I'm not mistaken.

@SamirTalwar
Copy link
Owner

That seems quite sensible. You could run a filter over it to remove the unimportant parts but that might make a test failure message very confusing.

How do you feel about this kind of syntax?

command:
  - MyTool

tests:
  - name: negative
    args:
      - tests/negative/MyFile.X
    exit-status: 1
    stderr:
      match: '(.+)/([^/]+)\.X:11:7-8: error.*'

(You can probably get away with fewer escapes if we use Perl-compatible regular expressions and don't need to support Perl- or sed-style syntax.)

You do lose one nice feature, in that you can't show a diff when the match is a regex, but as long as you're using it on short outputs, I don't think that's an issue.

@jonaprieto
Copy link
Author

jonaprieto commented Aug 4, 2022

That syntax seems excellent. Fewer escapes for a regex, the better. Another possible option that may become handy is if we specify if the match has to occur in the whole output or a line. Maybe, something like this. But for now, the initial proposal sounds fantastic!

...
    exit-status: 1
    stderr:
      match: 
        inline: '^(.+)/([^/]+)\.X:11:7-\d+: error.*'
      
...

@SamirTalwar
Copy link
Owner

I suggest we make use of PCRE's native functionality. You can use \A and \Z to match the start and end of the whole text ("subject" in the docs), or if multiline mode is enabled, you can use ^ and $ for the start and end of a line.

I suggest a syntax along the lines of:

match: <pattern>

or:

match:
  pattern: <pattern>
  flags:
    - multiline  # or `m`
    - caseless   # or `i`
    - etc.

The flags are specified in the pcrepattern specification.

We won't have a way to match against multiple regular expressions and and the result (Smoke supports or through a choice mechanism, but not and). I think I'm OK with this until someone comes up with a use case.

I can probably take a crack at this sometime in the next week, though I make no guarantees as to when I will finish it. If you'd like to give it a go, you're more than welcome. What would you prefer?

@SamirTalwar
Copy link
Owner

Just to give you an update: in trying to implement this, I have found a few bugs in edge cases when interpreting the YAML, which are exacerbated by trying to add another type of matcher. So it's taking longer than expected. (I'm also not working on this full-time, as you might have guessed.)

Long story short, probably don't implement even a simple expression language in YAML.

I hope I can get this done soon-ish, but there is of course the family, who come first, and the day job, who pay me, so please don't hold your breath on it happening in the next couple of weeks.

@jonaprieto
Copy link
Author

Understandable. I have similar time constraints and a trip this week, so please don't feel pressured. Maybe you want to share a branch to see the progress or suggest which places in the codebase to look first for anyone to experiment.

@SamirTalwar
Copy link
Owner

This boils down to adding a new assertion type in Types/Assert.hs, handling it in Assert.hs, and parsing it in Types/Values.hs.

I have laid a little bit of the groundwork in the regex branch, which might be a decent place to start if you want to give it a shot.

@SamirTalwar
Copy link
Owner

Thinking about it, we may be able to just ignore the parsing issue that tripped me up. The main issue is that we can write:

stdout:
  contains: "hello"

But we can't write:

stdout:
  content:
    contains: "hello"

This is inconsistent with assertions that use a file. But perhaps we can just live with it for now, as long as we can write matches:.

@SamirTalwar
Copy link
Owner

I have managed to make the time to sort out those vaguely-related issues.

The next challenge is picking a regex engine. I was hoping to support Perl-compatible regular expressions (PCRE) as they're very powerful and I like them a lot. However, all the implementations require linking to a native library, which (a) means that we have a dynamic dependency we didn't have before, and (b) Windows support will be a nightmare.

I tried getting both regex-pcre and text-icu to build with no luck, but even if they worked, I think asking people to install PCRE or ICU4C on Windows via MSYS2 is a bit much.

I guess I'll be trying regex-pcre-builtin or regexpr next.

@SamirTalwar
Copy link
Owner

I left this for a few months and when I came back, Windows support magically started working.

I've now added this in #79. Please give it a shot, following the README changes, and let me know if you have any feedback. I'll merge it and ship a new release if it looks good for you (or in a few days if I get no response).

@SamirTalwar
Copy link
Owner

I ended up reverting this change because I couldn't release; there seems to be no way to use text-icu statically across operating systems and architectures.

Now I'm considering the regex-tdfa library, which supports POSIX regular expressions. They would be more limited. If you would still find this feature useful, could you let me know if that suits your purposes?

@SamirTalwar
Copy link
Owner

Gonna re-open this. In #100, @ivan19871002 points out that regex-tdfa has some memory consumption issues, so that's not really an option. I will keep looking for a pure Haskell alternative.

@SamirTalwar SamirTalwar reopened this Feb 22, 2023
@ivan19871002
Copy link

I revert this commit 888148d. re-add regex support. works well on all platform (x86_x64 linux , x86 mac, aarch64 linux)

@SamirTalwar
Copy link
Owner

I'm glad it's working well for you.

I have done the same thing and pushed the changes as the regex-icu branch, which you are welcome to use instead. I will support this branch and fix any issues until I can find a better solution, or figure out how to compile things statically on Windows and macOS.

@ivan19871002
Copy link

macOS

brew link icu4c

before compile

export LDFLAGS="-L/usr/local/opt/icu4c/lib"
export CPPFLAGS="-I/usr/local/opt/icu4c/include"
export PKG_CONFIG_PATH="/usr/local/opt/icu4c/lib/pkgconfig"

then
stack install --local-bin-path=./out/build

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants