This is a Scanner generator that works along the lines of flex, a popular C++ parser generator.
An example project is included to demonstrate the current capabilities. Some of the files need to be rearranged for better organization, but the basics are there.
Regexes currently have a rudamentary implementation. All possibilities are as follows:
*: Kleene star. Zero or more of the preceeding letters. It is best to wrap this in parentheses, as expressions like a(ba)* will allow '' and force an 'aba' pattern.
(): Parentheses. An expression in parentheses will be evaluated on its own - it's a recursive algorithm. When in doubt, use more.
[*-*]: A macro for adding a set of characters. Uses the character value to check whether it is between first and second, for example [a-z] will allow all lowercase letters.
|: Or. Will allow the entire left half of the expression or the entire right half, so it should be used with parentheses - a(a|b) allows aa or ab
There is an example of definitions included in the project. All text within a usings bracket will be placed at the top of the generated code. Return should give the return type of the expression - default is int. Error should give the error value - default is -1. Endtype should give the end of file value - default is 0.
Between begin and and are the expressions. All regex should be placed between brackets, followed by a colon and curly brackets containing your code. This code is place verbatim into the generated C# as a lambda expression. In the future I will include an example that shows how to get the value of the parsed string.
%usings
%{
using TestProject;
%}
%return {Types}
%errortype {Types.Error}
%endtype {Types.End}
%begin
[int] : { return Types.INT; }
[([1-9])([0-9]*)] : { return Types.Num; }
["((([#-Z^-~])|(\\"|\r|\n|\t|\[|\]))*)"] : { return Types.STRING; }
[([a-zA-Z])(([a-zA-Z0-9])*)] : { return Types.IDENTIFIER; }
[\w(\w*)] : { return Lex(); }
%end