Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why rex find a shift-reduce with repetition one or more ? #7

Open
mingodad opened this issue Dec 22, 2024 · 11 comments
Open

Why rex find a shift-reduce with repetition one or more ? #7

mingodad opened this issue Dec 22, 2024 · 11 comments

Comments

@mingodad
Copy link

While testing this grammar https://github.com/parinayc20/Java_Compiler/blob/main/milestone_4/src/parser.y I've noticed that ebnf-convert translates one or more rules to use a repetition operator that should be equivalent but rex doesn't seem to understand it that way.

Simplified yacc grammar:

%token CURLY_OPEN COMMA CURLY_CLOSE NUMBER

%%

ArrayInitializer :
	CURLY_OPEN VariableInitializers COMMA CURLY_CLOSE
	| CURLY_OPEN VariableInitializers CURLY_CLOSE
	| CURLY_OPEN COMMA CURLY_CLOSE
	| CURLY_OPEN CURLY_CLOSE
	;

VariableInitializers :
	VariableInitializer
	| VariableInitializers COMMA VariableInitializer
	;

VariableInitializer :
	NUMBER
	;
%%

Converted by ebnf-convert plus tokens definitions:

/* converted on Sun Dec 22, 2024, 21:19 (UTC+01) by bison-to-w3c v0.68-SNAPSHOT which is Copyright (c) 2011-2024 by Gunther Rademacher <grd@gmx.net> */

//rex -lalr 1 -main -tree -javascript ebnf-convert-bug.ebnf

ArrayInitializer
         ::= CURLY_OPEN VariableInitializers? COMMA? CURLY_CLOSE
VariableInitializers
         ::= //VariableInitializer ( COMMA VariableInitializer )*
	 VariableInitializer
	 | VariableInitializers COMMA VariableInitializer

VariableInitializer
         ::= NUMBER

<?TOKENS?>

CURLY_OPEN ::= '{'
CURLY_CLOSE ::= '}'
COMMA ::= ','
NUMBER ::= [0-9]+

When trying to generate a parser with rex using VariableInitializer ( COMMA VariableInitializer )*:

rex -lalr 1 -main -tree -javascript ebnf-convert-bug.ebnf
LALR(1) conflict #1 (shift-reduce):
    VariableInitializers ::= VariableInitializer ( COMMA VariableInitializer )* .
    VariableInitializers ::= VariableInitializer ( . COMMA VariableInitializer )*
  conflicting lookahead token:
    COMMA

1 LALR(1)-conflict
grammar fails to be LALR(1)

Without using repetition operator (as shown above):

rex -lalr 1 -main -tree -javascript ebnf-convert-bug.ebnf
grammar is LALR(1)

Is this a bug ?

@mingodad
Copy link
Author

How to see the (probably) rewritten grammar ?

@GuntherRademacher
Copy link
Owner

There is no rewriting to BNF, the LR items are based on the EBNF rules, resulting in a different setup of the parser states.

In the actual case, based on the LR item

VariableInitializers ::= ・ VariableInitializer ( COMMA VariableInitializer )*

there is a goto state on VariableInitializer containing these items:

VariableInitializers ::= VariableInitializer ( ・ COMMA VariableInitializer )*
VariableInitializers ::= VariableInitializer ( COMMA VariableInitializer )* ・

which has the shift-reduce conflict on COMMA.

In the BNF variant, the shift and reduce are in different states. But where the BNF variant does many reductions to VariableInitializers, the EBNF one does only a single one.

You can see the LR states in the XML document resulting from the -xml command line option.

LALR(2) may come to help here.

@mingodad
Copy link
Author

Thank you for reply and help !
Indeed it works with -lalr 2, I'm going to test both ways to see how they compare in terms of performance and memory usage.

@mingodad
Copy link
Author

Trying to generate a parser for the whole converted grammar doesn't work even with lalr 4 and the time and memory usage grows a lot.

/usr/bin/time rex -lalr 1 -main -tree -javascript Java_Compiler.ebnf
LALR(1) conflict #1 (shift-reduce):
    VariableInitializers ::= VariableInitializer ( COMMA VariableInitializer )* .
    VariableInitializers ::= VariableInitializer ( . COMMA VariableInitializer )*
  conflicting lookahead token:
    COMMA

LALR(1) conflict #2 (shift-reduce):
    SwitchBlockStatementGroups ::= ( SwitchBlockStatementGroup )+ .
    SwitchBlockStatementGroups ::= ( . SwitchBlockStatementGroup )+
    SwitchBlockStatementGroup ::= . SwitchLabels BlockStatements
    SwitchLabels ::= ( . SwitchLabel )+
    SwitchLabel ::= ( . 'case' ConstantExpression
    | 'default'
    ) COLON
    SwitchLabel ::= ( 'case' ConstantExpression
    | . 'default'
    ) COLON
  conflicting lookahead tokens:
    'case'
    'default'

LALR(1) conflict #3 (shift-reduce):
    DimExprs ::= ( DimExpr )+ .
    DimExprs ::= ( . DimExpr )+
    DimExpr ::= . SQUARE_OPEN Expression SQUARE_CLOSE
  conflicting lookahead token:
    SQUARE_OPEN

3 LALR(1)-conflicts
grammar fails to be LALR(1)
Command exited with non-zero status 1
0.04user 0.00system 0:00.05elapsed 98%CPU (0avgtext+0avgdata 9884maxresident)k
0inputs+0outputs (0major+1021minor)pagefaults 0swaps
/usr/bin/time rex -lalr 2 -main -tree -javascript Java_Compiler.ebnf
LALR(2) conflict #1 (shift-reduce):
    SwitchBlockStatementGroups ::= ( SwitchBlockStatementGroup )+ .
    SwitchBlockStatementGroups ::= ( . SwitchBlockStatementGroup )+
    SwitchBlockStatementGroup ::= . SwitchLabels BlockStatements
    SwitchLabels ::= ( . SwitchLabel )+
    SwitchLabel ::= ( . 'case' ConstantExpression
    | 'default'
    ) COLON
    SwitchLabel ::= ( 'case' ConstantExpression
    | . 'default'
    ) COLON
  conflicting lookahead token sequences:
    'case'
    'default'

1 LALR(2)-conflict
grammar fails to be LALR(2)
Command exited with non-zero status 1
0.24user 0.01system 0:00.26elapsed 99%CPU (0avgtext+0avgdata 13384maxresident)k
0inputs+0outputs (0major+1929minor)pagefaults 0swaps
/usr/bin/time rex -lalr 3 -main -tree -javascript Java_Compiler.ebnf
LALR(3) conflict #1 (shift-reduce):
    SwitchBlockStatementGroups ::= ( SwitchBlockStatementGroup )+ .
    SwitchBlockStatementGroups ::= ( . SwitchBlockStatementGroup )+
    SwitchBlockStatementGroup ::= . SwitchLabels BlockStatements
    SwitchLabels ::= ( . SwitchLabel )+
    SwitchLabel ::= ( . 'case' ConstantExpression
    | 'default'
    ) COLON
    SwitchLabel ::= ( 'case' ConstantExpression
    | . 'default'
    ) COLON
  conflicting lookahead token sequences:
    'case' PAR_OPEN IDENTIFIER
    'case' INCREMENT IDENTIFIER
    'case' DECREMENT IDENTIFIER
    'case' EXCLAMATION IDENTIFIER
    'case' NEGATION IDENTIFIER
    'case' PLUS IDENTIFIER
    'case' MINUS IDENTIFIER
    'case' 'new' IDENTIFIER
    'case' PAR_OPEN INTEGER_LITERAL
    'case' INCREMENT INTEGER_LITERAL
    'case' DECREMENT INTEGER_LITERAL
    'case' EXCLAMATION INTEGER_LITERAL
    'case' NEGATION INTEGER_LITERAL
    'case' PLUS INTEGER_LITERAL
    'case' MINUS INTEGER_LITERAL
    'case' PAR_OPEN FLOATING_POINT_LITERAL
    ... 413 more

1 LALR(3)-conflict
grammar fails to be LALR(3)
Command exited with non-zero status 1
5.57user 0.01system 0:05.58elapsed 100%CPU (0avgtext+0avgdata 92924maxresident)k
0inputs+0outputs (0major+21908minor)pagefaults 0swaps
/usr/bin/time rex -lalr 4 -main -tree -javascript Java_Compiler.ebnf
LALR(4) conflict #1 (shift-reduce):
    SwitchBlockStatementGroups ::= ( SwitchBlockStatementGroup )+ .
    SwitchBlockStatementGroups ::= ( . SwitchBlockStatementGroup )+
    SwitchBlockStatementGroup ::= . SwitchLabels BlockStatements
    SwitchLabels ::= ( . SwitchLabel )+
    SwitchLabel ::= ( . 'case' ConstantExpression
    | 'default'
    ) COLON
    SwitchLabel ::= ( 'case' ConstantExpression
    | . 'default'
    ) COLON
  conflicting lookahead token sequences:
    'case' PLUS IDENTIFIER IDENTIFIER
    'case' MINUS IDENTIFIER IDENTIFIER
    'case' PLUS INTEGER_LITERAL IDENTIFIER
    'case' MINUS INTEGER_LITERAL IDENTIFIER
    'case' PLUS FLOATING_POINT_LITERAL IDENTIFIER
    'case' MINUS FLOATING_POINT_LITERAL IDENTIFIER
    'case' PLUS CHARACTER_LITERAL IDENTIFIER
    'case' MINUS CHARACTER_LITERAL IDENTIFIER
    'case' PLUS STRING_LITERAL IDENTIFIER
    'case' MINUS STRING_LITERAL IDENTIFIER
    'case' PLUS TEXT_BLOCK_LITERAL IDENTIFIER
    'case' MINUS TEXT_BLOCK_LITERAL IDENTIFIER
    'case' IDENTIFIER NOT_EQUAL_TO IDENTIFIER
    'case' INTEGER_LITERAL NOT_EQUAL_TO IDENTIFIER
    'case' FLOATING_POINT_LITERAL NOT_EQUAL_TO IDENTIFIER
    'case' CHARACTER_LITERAL NOT_EQUAL_TO IDENTIFIER
    ... 8153 more

1 LALR(4)-conflict
grammar fails to be LALR(4)
Command exited with non-zero status 1
279.25user 0.39system 4:39.57elapsed 100%CPU (0avgtext+0avgdata 1921580maxresident)k
0inputs+0outputs (0major+479680minor)pagefaults 0swaps

The whole grammar:

/* converted on Sun Dec 22, 2024, 17:49 (UTC+01) by bison-to-w3c v0.68-SNAPSHOT which is Copyright (c) 2011-2024 by Gunther Rademacher <grd@gmx.net> */
//From: https://github.com/parinayc20/Java_Compiler/blob/d61abead845b35b561df9d914af4b840dc62318c/milestone_4/src/parser.y

//rex -lalr 2 -main -tree -javascript Java_Compiler.ebnf

input ::= CompilationUnit EOF

Literal  ::= INTEGER_LITERAL
           | FLOATING_POINT_LITERAL
           | CHARACTER_LITERAL
           | STRING_LITERAL
           | TEXT_BLOCK_LITERAL
           | "null"
           | "true"|"false"
Type     ::= PrimitiveType
           | ReferenceType
PrimitiveType
         ::= NumericType
           | "boolean"
           | "String"
NumericType
         ::= IntegralType
           | FloatingPointType
IntegralType
         ::= "byte"
           | "short"
           | "int"
           | "long"
           | "char"
FloatingPointType
         ::= "float"
           | "double"
ReferenceType
         ::= ClassOrInterfaceType
           | ( PrimitiveType | Name ) ( SQUARE_OPEN SQUARE_CLOSE )+
ClassOrInterfaceType
         ::= Name
ClassType
         ::= ClassOrInterfaceType
InterfaceType
         ::= ClassOrInterfaceType
Name     ::= SimpleName
           | QualifiedName
SimpleName
         ::= IDENTIFIER
QualifiedName
         ::= Name DOT IDENTIFIER
CompilationUnit
         ::= ( PackageDeclaration ImportDeclarations? | ImportDeclarations ) TypeDeclarations?
           | TypeDeclarations
ImportDeclarations
         ::= ImportDeclaration+
TypeDeclarations
         ::= TypeDeclaration+
PackageDeclaration
         ::= "package" Name SEMICOLON
ImportDeclaration
         ::= SingleTypeImportDeclaration
           | TypeImportOnDemandDeclaration
SingleTypeImportDeclaration
         ::= "import" Name SEMICOLON
TypeImportOnDemandDeclaration
         ::= "import" Name DOT MULTIPLY SEMICOLON
TypeDeclaration
         ::= ClassDeclaration
           | InterfaceDeclaration
           | SEMICOLON
MethodTypeDeclaration
         ::= ClassDeclaration
           | InterfaceDeclaration
Modifiers
         ::= Modifier+
Modifier ::= "public"
           | "protected"
           | "private"
           | "static"
           | "abstract"
           | "final"
           | "native"
           | "synchronized"
           | "transient"
           | "volatile"
ClassDeclaration
         ::= Modifiers? "class" IDENTIFIER Super? Interfaces? ClassBody
Super    ::= "extends" ClassType
Interfaces
         ::= "implements" InterfaceType ( COMMA InterfaceType )*
ClassBody
         ::= CURLY_OPEN ClassBodyDeclaration* CURLY_CLOSE
ClassBodyDeclaration
         ::= ClassMemberDeclaration
           | StaticInitializer
           | ConstructorDeclaration
           | TypeDeclaration
ClassMemberDeclaration
         ::= FieldDeclaration
           | MethodDeclaration
FieldDeclaration
         ::= Modifiers? Type VariableDeclarators SEMICOLON
VariableDeclarators
         ::= VariableDeclarator ( COMMA VariableDeclarator )*
VariableDeclarator
         ::= VariableDeclaratorId ( ASSIGN VariableInitializer )?
VariableDeclaratorId
         ::= IDENTIFIER ( SQUARE_OPEN SQUARE_CLOSE )*
VariableInitializer
         ::= Expression
           | ArrayInitializer
MethodDeclaration
         ::= MethodHeader MethodBody
MethodHeader
         ::= Modifiers? ( Type | "void" ) MethodDeclarator Throws?
MethodDeclarator
         ::= IDENTIFIER PAR_OPEN FormalParameterList? PAR_CLOSE ( SQUARE_OPEN SQUARE_CLOSE )*
FormalParameterList
         ::= FormalParameter ( COMMA FormalParameter )*
FormalParameter
         ::= VariableModifier? Type VariableDeclaratorId
Throws   ::= "throws" ClassType ( COMMA ClassType )*
MethodBody
         ::= MethodBlock
           | SEMICOLON
StaticInitializer
         ::= "static" Block
ConstructorDeclaration
         ::= Modifiers? ConstructorDeclarator Throws? ConstructorBody
ConstructorDeclarator
         ::= SimpleName PAR_OPEN FormalParameterList? PAR_CLOSE
ConstructorBody
         ::= CURLY_OPEN ExplicitConstructorInvocation? BlockStatements? CURLY_CLOSE
ExplicitConstructorInvocation
         ::= ( "this" | "super" ) PAR_OPEN ArgumentList? PAR_CLOSE SEMICOLON
InterfaceDeclaration
         ::= Modifiers? "interface" IDENTIFIER ExtendsInterfaces? InterfaceBody
ExtendsInterfaces
         ::= "extends" InterfaceType ( COMMA InterfaceType )*
InterfaceBody
         ::= CURLY_OPEN InterfaceMemberDeclaration* CURLY_CLOSE
InterfaceMemberDeclaration
         ::= ConstantDeclaration
           | AbstractMethodDeclaration
ConstantDeclaration
         ::= FieldDeclaration
AbstractMethodDeclaration
         ::= MethodHeader SEMICOLON
ArrayInitializer
         ::= CURLY_OPEN VariableInitializers? COMMA? CURLY_CLOSE
VariableInitializers
         ::= VariableInitializer ( COMMA VariableInitializer )*
	 //VariableInitializer
	 //| VariableInitializers COMMA VariableInitializer
Block    ::= CURLY_OPEN BlockStatements? CURLY_CLOSE
BlockStatements
         ::= BlockStatement+
BlockStatement
         ::= LocalVariableDeclarationStatement
           | Statement
MethodBlock
         ::= CURLY_OPEN MethodBlockStatement* CURLY_CLOSE
MethodBlockStatement
         ::= LocalVariableDeclarationStatement
           | Statement
           | MethodTypeDeclaration
LocalVariableDeclarationStatement
         ::= LocalVariableDeclaration SEMICOLON
LocalVariableDeclaration
         ::= VariableModifier? Type VariableDeclarators
VariableModifier
         ::= "final"
Statement
         ::= StatementWithoutTrailingSubstatement
           | LabeledStatement
           | IfThenStatement
           | IfThenElseStatement
           | WhileStatement
           | ForStatement
StatementNoShortIf
         ::= StatementWithoutTrailingSubstatement
           | LabeledStatementNoShortIf
           | IfThenElseStatementNoShortIf
           | WhileStatementNoShortIf
           | ForStatementNoShortIf
StatementWithoutTrailingSubstatement
         ::= Block
           | EmptyStatement
           | ExpressionStatement
           | SwitchStatement
           | DoStatement
           | BreakStatement
           | ContinueStatement
           | ReturnStatement
           | SynchronizedStatement
           | ThrowStatement
           | TryStatement
EmptyStatement
         ::= SEMICOLON
LabeledStatement
         ::= IDENTIFIER COLON Statement
LabeledStatementNoShortIf
         ::= IDENTIFIER COLON StatementNoShortIf
ExpressionStatement
         ::= StatementExpression SEMICOLON
StatementExpression
         ::= Assignment
           | PreIncrementExpression
           | PreDecrementExpression
           | PostIncrementExpression
           | PostDecrementExpression
           | MethodInvocation
           | ClassInstanceCreationExpression
IfThenStatement
         ::= "if" PAR_OPEN Expression PAR_CLOSE Statement
IfThenElseStatement
         ::= "if" PAR_OPEN Expression PAR_CLOSE StatementNoShortIf "else" Statement
IfThenElseStatementNoShortIf
         ::= "if" PAR_OPEN Expression PAR_CLOSE StatementNoShortIf "else" StatementNoShortIf
SwitchStatement
         ::= "switch" PAR_OPEN Expression PAR_CLOSE SwitchBlock
SwitchBlock
         ::= CURLY_OPEN SwitchBlockStatementGroups? SwitchLabels? CURLY_CLOSE
SwitchBlockStatementGroups
         ::= SwitchBlockStatementGroup+
	 //SwitchBlockStatementGroup
	 //| SwitchBlockStatementGroups SwitchBlockStatementGroup
SwitchBlockStatementGroup
         ::= SwitchLabels BlockStatements
SwitchLabels
         ::= SwitchLabel+
SwitchLabel
         ::= ( "case" ConstantExpression | "default" ) COLON
WhileStatement
         ::= "while" PAR_OPEN Expression PAR_CLOSE Statement
WhileStatementNoShortIf
         ::= "while" PAR_OPEN Expression PAR_CLOSE StatementNoShortIf
DoStatement
         ::= "do" Statement "while" PAR_OPEN Expression PAR_CLOSE SEMICOLON
ForStatement
         ::= "for" PAR_OPEN ForInit? SEMICOLON Expression? SEMICOLON ForUpdate? PAR_CLOSE Statement
ForStatementNoShortIf
         ::= "for" PAR_OPEN ForInit? SEMICOLON Expression? SEMICOLON ForUpdate? PAR_CLOSE StatementNoShortIf
ForInit  ::= StatementExpressionList
           | LocalVariableDeclaration
ForUpdate
         ::= StatementExpressionList
StatementExpressionList
         ::= StatementExpression ( COMMA StatementExpression )*
BreakStatement
         ::= "break" IDENTIFIER? SEMICOLON
ContinueStatement
         ::= "continue" IDENTIFIER? SEMICOLON
ReturnStatement
         ::= "return" Expression? SEMICOLON
ThrowStatement
         ::= "throw" Expression SEMICOLON
SynchronizedStatement
         ::= "synchronized" PAR_OPEN Expression PAR_CLOSE Block
TryStatement
         ::= "try" Block ( Catches Finally? | Finally )
Catches  ::= CatchClause+
CatchClause
         ::= "catch" PAR_OPEN FormalParameter PAR_CLOSE Block
Finally  ::= "finally" Block
Primary  ::= PrimaryNoNewArray
           | ArrayCreationExpression
PrimaryNoNewArray
         ::= Literal
           | "this"
           | PAR_OPEN Expression PAR_CLOSE
           | ClassInstanceCreationExpression
           | FieldAccess
           | MethodInvocation
           | ArrayAccess
ClassInstanceCreationExpression
         ::= "new" ClassType PAR_OPEN ArgumentList? PAR_CLOSE
ArgumentList
         ::= Expression ( COMMA Expression )*
ArrayCreationExpression
         ::= "new" ( PrimitiveType | ClassOrInterfaceType ) ( DimExprs Dims? | Dims ArrayInitializer )
DimExprs ::= DimExpr+
	//DimExpr
	//| DimExprs DimExpr
DimExpr  ::= SQUARE_OPEN Expression SQUARE_CLOSE
Dims     ::= ( SQUARE_OPEN SQUARE_CLOSE )+
FieldAccess
         ::= ( Primary | "super" ) DOT IDENTIFIER
MethodInvocation
         ::= ( Name | ( Primary | "super" ) DOT IDENTIFIER ) PAR_OPEN ArgumentList? PAR_CLOSE
ArrayAccess
         ::= ( Name | PrimaryNoNewArray ) SQUARE_OPEN Expression SQUARE_CLOSE
PostfixExpression
         ::= Primary
           | Name
           | PostIncrementExpression
           | PostDecrementExpression
PostIncrementExpression
         ::= PostfixExpression INCREMENT
PostDecrementExpression
         ::= PostfixExpression DECREMENT
UnaryExpression
         ::= ( PLUS | MINUS )* ( PreIncrementExpression | PreDecrementExpression | UnaryExpressionNotPlusMinus )
PreIncrementExpression
         ::= INCREMENT UnaryExpression
PreDecrementExpression
         ::= DECREMENT UnaryExpression
UnaryExpressionNotPlusMinus
         ::= PostfixExpression
           | ( NEGATION | EXCLAMATION ) UnaryExpression
           | CastExpression
CastExpression
         ::= PAR_OPEN ( PrimitiveType Dims? PAR_CLOSE UnaryExpression | ( Expression | Name Dims ) PAR_CLOSE UnaryExpressionNotPlusMinus )
MultiplicativeExpression
         ::= UnaryExpression ( ( MULTIPLY | DIVIDE | MODULO ) UnaryExpression )*
AdditiveExpression
         ::= MultiplicativeExpression ( ( PLUS | MINUS ) MultiplicativeExpression )*
ShiftExpression
         ::= AdditiveExpression ( ( LEFT_SHIFT | RIGHT_SHIFT | UNSIGNED_RIGHT_SHIFT ) AdditiveExpression )*
RelationalExpression
         ::= ShiftExpression ( ( LESS_THAN | GREATER_THAN | LESS_THAN_OR_EQUAL_TO | GREATER_THAN_OR_EQUAL_TO ) ShiftExpression | "instanceof" ReferenceType )*
EqualityExpression
         ::= RelationalExpression ( ( EQUAL_TO | NOT_EQUAL_TO ) RelationalExpression )*
AndExpression
         ::= EqualityExpression ( BITWISE_AND EqualityExpression )*
ExclusiveOrExpression
         ::= AndExpression ( BITWISE_XOR AndExpression )*
InclusiveOrExpression
         ::= ExclusiveOrExpression ( BITWISE_OR ExclusiveOrExpression )*
ConditionalAndExpression
         ::= InclusiveOrExpression ( AND_LOGICAL InclusiveOrExpression )*
ConditionalOrExpression
         ::= ConditionalAndExpression ( OR_LOGICAL ConditionalAndExpression )*
AssignmentExpression
         ::= ConditionalOrExpression ( QUESTION Expression COLON ConditionalOrExpression )*
           | Assignment
Assignment
         ::= LeftHandSide AssignmentOperator AssignmentExpression
LeftHandSide
         ::= Name
           | FieldAccess
           | ArrayAccess
AssignmentOperator
         ::= ASSIGN
           | MULTIPLY_ASSIGN
           | DIVIDE_ASSIGN
           | MOD_ASSIGN
           | PLUS_ASSIGN
           | MINUS_ASSIGN
           | LEFT_SHIFT_ASSIGN
           | RIGHT_SHIFT_ASSIGN
           | UNSIGNED_RIGHT_SHIFT_ASSIGN
           | AND_ASSIGN
           | XOR_ASSIGN
           | OR_ASSIGN
Expression
         ::= AssignmentExpression
ConstantExpression
         ::= Expression

Ignorable
         ::= WhiteSpace
           | Comment
          /* ws: definition */

<?TOKENS?>

KEYWORDS ::=
	"abstract"
	| "boolean"
	| "break"
	| "byte"
	| "case"
	| "catch"
	| "char"
	| "class"
	| "continue"
	| "default"
	| "do"
	| "double"
	| "else"
	| "extends"
	| "final"
	| "finally"
	| "float"
	| "for"
	| "if"
	| "implements"
	| "import"
	| "instanceof"
	| "int"
	| "interface"
	| "long"
	| "native"
	| "new"
	| "null"
	| "package"
	| "private"
	| "protected"
	| "public"
	| "return"
	| "short"
	| "static"
	| "String"
	| "super"
	| "switch"
	| "synchronized"
	| "this"
	| "throw"
	| "throws"
	| "transient"
	| "try"
	| "void"
	| "volatile"
	| "while"
	| "true"|"false"

IDENTIFIER ::= ([A-Za-z_][A-Za-z0-9_]+) - KEYWORDS

INTEGER_LITERAL ::= [0-9]+
FLOATING_POINT_LITERAL ::= [0-9]+"."[0-9]+
CHARACTER_LITERAL ::= "'"("\".|[^'\r\n\])"'"
STRING_LITERAL ::= '"'(("\".|[^"\r\n\]))*'"'

LF       ::= #x000A
CR       ::= #x000D
LINE_TERMINATOR ::=
	LF
	| CR
	| CR LF

TEXT_BLOCK_LITERAL ::= '"""'(([ \h\t\f]*)LINE_TERMINATOR(([^"\]|('"'[^"\])|('""'[^"\]))*))'"""'

NOT_EQUAL_TO ::= "!="
DOT ::= "."
COMMA ::= ","
MULTIPLY ::= "*"
SQUARE_OPEN ::= "["
SQUARE_CLOSE ::= "]"
PAR_OPEN ::= "("
PAR_CLOSE ::= ")"
CURLY_OPEN ::= "{"
CURLY_CLOSE ::= "}"
QUESTION ::= "?"
SEMICOLON ::= ";"
COLON ::= ":"
INCREMENT ::= "++"
DECREMENT ::= "--"
EXCLAMATION ::= "!"
NEGATION ::= "~"
PLUS ::= "+"
MINUS ::= "-"
DIVIDE ::= "/"
MODULO ::= "%"
BITWISE_AND ::= "&"
BITWISE_OR ::= "|"
BITWISE_XOR ::= "^"
AND_LOGICAL ::= "&&"
OR_LOGICAL ::= "||"
UNSIGNED_RIGHT_SHIFT ::= ">>>"
LEFT_SHIFT ::= "<<"
RIGHT_SHIFT ::= ">>"
LESS_THAN ::= "<"
GREATER_THAN ::= ">"
LESS_THAN_OR_EQUAL_TO ::= "<="
GREATER_THAN_OR_EQUAL_TO ::= ">="
EQUAL_TO ::= "=="
ASSIGN ::= "="
PLUS_ASSIGN ::= "+="
MINUS_ASSIGN ::= "-="
MULTIPLY_ASSIGN ::= "*="
DIVIDE_ASSIGN ::= "/="
MOD_ASSIGN ::= "%="
AND_ASSIGN ::= "&="
OR_ASSIGN ::= "|="
XOR_ASSIGN ::= "^="
LEFT_SHIFT_ASSIGN ::= "<<="
RIGHT_SHIFT_ASSIGN ::= ">>="
UNSIGNED_RIGHT_SHIFT_ASSIGN ::= ">>>="

WhiteSpace
         ::= ' '
           | #x09
           | #x0C
           | LINE_TERMINATOR
Comment  ::= TraditionalComment
           | EndOfLineComment
TraditionalComment
         ::= '/*' [^*] '*/'
EndOfLineComment
         ::= '//' [^#x0A]*

EOF      ::= $

@mingodad mingodad reopened this Dec 23, 2024
@mingodad
Copy link
Author

But manually fixing the converted grammar on the offending rules that uses the repetition operator can generate a -lalr 1 parser:

/usr/bin/time rex -lalr 1 -main -tree -javascript Java_Compiler2.ebnf
grammar is LALR(1)
0.08user 0.00system 0:00.08elapsed 98%CPU (0avgtext+0avgdata 13676maxresident)k
0inputs+304outputs (0major+2378minor)pagefaults 0swaps

The fixed grammar:

/* converted on Sun Dec 22, 2024, 17:49 (UTC+01) by bison-to-w3c v0.68-SNAPSHOT which is Copyright (c) 2011-2024 by Gunther Rademacher <grd@gmx.net> */
//From: https://github.com/parinayc20/Java_Compiler/blob/d61abead845b35b561df9d914af4b840dc62318c/milestone_4/src/parser.y

//rex -lalr 1 -main -tree -javascript Java_Compiler2.ebnf

input ::= CompilationUnit EOF

Literal  ::= INTEGER_LITERAL
           | FLOATING_POINT_LITERAL
           | CHARACTER_LITERAL
           | STRING_LITERAL
           | TEXT_BLOCK_LITERAL
           | "null"
           | "true"|"false"
Type     ::= PrimitiveType
           | ReferenceType
PrimitiveType
         ::= NumericType
           | "boolean"
           | "String"
NumericType
         ::= IntegralType
           | FloatingPointType
IntegralType
         ::= "byte"
           | "short"
           | "int"
           | "long"
           | "char"
FloatingPointType
         ::= "float"
           | "double"
ReferenceType
         ::= ClassOrInterfaceType
           | ( PrimitiveType | Name ) ( SQUARE_OPEN SQUARE_CLOSE )+
ClassOrInterfaceType
         ::= Name
ClassType
         ::= ClassOrInterfaceType
InterfaceType
         ::= ClassOrInterfaceType
Name     ::= SimpleName
           | QualifiedName
SimpleName
         ::= IDENTIFIER
QualifiedName
         ::= Name DOT IDENTIFIER
CompilationUnit
         ::= ( PackageDeclaration ImportDeclarations? | ImportDeclarations ) TypeDeclarations?
           | TypeDeclarations
ImportDeclarations
         ::= ImportDeclaration+
TypeDeclarations
         ::= TypeDeclaration+
PackageDeclaration
         ::= "package" Name SEMICOLON
ImportDeclaration
         ::= SingleTypeImportDeclaration
           | TypeImportOnDemandDeclaration
SingleTypeImportDeclaration
         ::= "import" Name SEMICOLON
TypeImportOnDemandDeclaration
         ::= "import" Name DOT MULTIPLY SEMICOLON
TypeDeclaration
         ::= ClassDeclaration
           | InterfaceDeclaration
           | SEMICOLON
MethodTypeDeclaration
         ::= ClassDeclaration
           | InterfaceDeclaration
Modifiers
         ::= Modifier+
Modifier ::= "public"
           | "protected"
           | "private"
           | "static"
           | "abstract"
           | "final"
           | "native"
           | "synchronized"
           | "transient"
           | "volatile"
ClassDeclaration
         ::= Modifiers? "class" IDENTIFIER Super? Interfaces? ClassBody
Super    ::= "extends" ClassType
Interfaces
         ::= "implements" InterfaceType ( COMMA InterfaceType )*
ClassBody
         ::= CURLY_OPEN ClassBodyDeclaration* CURLY_CLOSE
ClassBodyDeclaration
         ::= ClassMemberDeclaration
           | StaticInitializer
           | ConstructorDeclaration
           | TypeDeclaration
ClassMemberDeclaration
         ::= FieldDeclaration
           | MethodDeclaration
FieldDeclaration
         ::= Modifiers? Type VariableDeclarators SEMICOLON
VariableDeclarators
         ::= VariableDeclarator ( COMMA VariableDeclarator )*
VariableDeclarator
         ::= VariableDeclaratorId ( ASSIGN VariableInitializer )?
VariableDeclaratorId
         ::= IDENTIFIER ( SQUARE_OPEN SQUARE_CLOSE )*
VariableInitializer
         ::= Expression
           | ArrayInitializer
MethodDeclaration
         ::= MethodHeader MethodBody
MethodHeader
         ::= Modifiers? ( Type | "void" ) MethodDeclarator Throws?
MethodDeclarator
         ::= IDENTIFIER PAR_OPEN FormalParameterList? PAR_CLOSE ( SQUARE_OPEN SQUARE_CLOSE )*
FormalParameterList
         ::= FormalParameter ( COMMA FormalParameter )*
FormalParameter
         ::= VariableModifier? Type VariableDeclaratorId
Throws   ::= "throws" ClassType ( COMMA ClassType )*
MethodBody
         ::= MethodBlock
           | SEMICOLON
StaticInitializer
         ::= "static" Block
ConstructorDeclaration
         ::= Modifiers? ConstructorDeclarator Throws? ConstructorBody
ConstructorDeclarator
         ::= SimpleName PAR_OPEN FormalParameterList? PAR_CLOSE
ConstructorBody
         ::= CURLY_OPEN ExplicitConstructorInvocation? BlockStatements? CURLY_CLOSE
ExplicitConstructorInvocation
         ::= ( "this" | "super" ) PAR_OPEN ArgumentList? PAR_CLOSE SEMICOLON
InterfaceDeclaration
         ::= Modifiers? "interface" IDENTIFIER ExtendsInterfaces? InterfaceBody
ExtendsInterfaces
         ::= "extends" InterfaceType ( COMMA InterfaceType )*
InterfaceBody
         ::= CURLY_OPEN InterfaceMemberDeclaration* CURLY_CLOSE
InterfaceMemberDeclaration
         ::= ConstantDeclaration
           | AbstractMethodDeclaration
ConstantDeclaration
         ::= FieldDeclaration
AbstractMethodDeclaration
         ::= MethodHeader SEMICOLON
ArrayInitializer
         ::= CURLY_OPEN VariableInitializers? COMMA? CURLY_CLOSE
VariableInitializers
         ::= //VariableInitializer ( COMMA VariableInitializer )*
	 VariableInitializer
	 | VariableInitializers COMMA VariableInitializer
Block    ::= CURLY_OPEN BlockStatements? CURLY_CLOSE
BlockStatements
         ::= BlockStatement+
BlockStatement
         ::= LocalVariableDeclarationStatement
           | Statement
MethodBlock
         ::= CURLY_OPEN MethodBlockStatement* CURLY_CLOSE
MethodBlockStatement
         ::= LocalVariableDeclarationStatement
           | Statement
           | MethodTypeDeclaration
LocalVariableDeclarationStatement
         ::= LocalVariableDeclaration SEMICOLON
LocalVariableDeclaration
         ::= VariableModifier? Type VariableDeclarators
VariableModifier
         ::= "final"
Statement
         ::= StatementWithoutTrailingSubstatement
           | LabeledStatement
           | IfThenStatement
           | IfThenElseStatement
           | WhileStatement
           | ForStatement
StatementNoShortIf
         ::= StatementWithoutTrailingSubstatement
           | LabeledStatementNoShortIf
           | IfThenElseStatementNoShortIf
           | WhileStatementNoShortIf
           | ForStatementNoShortIf
StatementWithoutTrailingSubstatement
         ::= Block
           | EmptyStatement
           | ExpressionStatement
           | SwitchStatement
           | DoStatement
           | BreakStatement
           | ContinueStatement
           | ReturnStatement
           | SynchronizedStatement
           | ThrowStatement
           | TryStatement
EmptyStatement
         ::= SEMICOLON
LabeledStatement
         ::= IDENTIFIER COLON Statement
LabeledStatementNoShortIf
         ::= IDENTIFIER COLON StatementNoShortIf
ExpressionStatement
         ::= StatementExpression SEMICOLON
StatementExpression
         ::= Assignment
           | PreIncrementExpression
           | PreDecrementExpression
           | PostIncrementExpression
           | PostDecrementExpression
           | MethodInvocation
           | ClassInstanceCreationExpression
IfThenStatement
         ::= "if" PAR_OPEN Expression PAR_CLOSE Statement
IfThenElseStatement
         ::= "if" PAR_OPEN Expression PAR_CLOSE StatementNoShortIf "else" Statement
IfThenElseStatementNoShortIf
         ::= "if" PAR_OPEN Expression PAR_CLOSE StatementNoShortIf "else" StatementNoShortIf
SwitchStatement
         ::= "switch" PAR_OPEN Expression PAR_CLOSE SwitchBlock
SwitchBlock
         ::= CURLY_OPEN SwitchBlockStatementGroups? SwitchLabels? CURLY_CLOSE
SwitchBlockStatementGroups
         ::= //SwitchBlockStatementGroup+
	 SwitchBlockStatementGroup
	 | SwitchBlockStatementGroups SwitchBlockStatementGroup
SwitchBlockStatementGroup
         ::= SwitchLabels BlockStatements
SwitchLabels
         ::= SwitchLabel+
SwitchLabel
         ::= ( "case" ConstantExpression | "default" ) COLON
WhileStatement
         ::= "while" PAR_OPEN Expression PAR_CLOSE Statement
WhileStatementNoShortIf
         ::= "while" PAR_OPEN Expression PAR_CLOSE StatementNoShortIf
DoStatement
         ::= "do" Statement "while" PAR_OPEN Expression PAR_CLOSE SEMICOLON
ForStatement
         ::= "for" PAR_OPEN ForInit? SEMICOLON Expression? SEMICOLON ForUpdate? PAR_CLOSE Statement
ForStatementNoShortIf
         ::= "for" PAR_OPEN ForInit? SEMICOLON Expression? SEMICOLON ForUpdate? PAR_CLOSE StatementNoShortIf
ForInit  ::= StatementExpressionList
           | LocalVariableDeclaration
ForUpdate
         ::= StatementExpressionList
StatementExpressionList
         ::= StatementExpression ( COMMA StatementExpression )*
BreakStatement
         ::= "break" IDENTIFIER? SEMICOLON
ContinueStatement
         ::= "continue" IDENTIFIER? SEMICOLON
ReturnStatement
         ::= "return" Expression? SEMICOLON
ThrowStatement
         ::= "throw" Expression SEMICOLON
SynchronizedStatement
         ::= "synchronized" PAR_OPEN Expression PAR_CLOSE Block
TryStatement
         ::= "try" Block ( Catches Finally? | Finally )
Catches  ::= CatchClause+
CatchClause
         ::= "catch" PAR_OPEN FormalParameter PAR_CLOSE Block
Finally  ::= "finally" Block
Primary  ::= PrimaryNoNewArray
           | ArrayCreationExpression
PrimaryNoNewArray
         ::= Literal
           | "this"
           | PAR_OPEN Expression PAR_CLOSE
           | ClassInstanceCreationExpression
           | FieldAccess
           | MethodInvocation
           | ArrayAccess
ClassInstanceCreationExpression
         ::= "new" ClassType PAR_OPEN ArgumentList? PAR_CLOSE
ArgumentList
         ::= Expression ( COMMA Expression )*
ArrayCreationExpression
         ::= "new" ( PrimitiveType | ClassOrInterfaceType ) ( DimExprs Dims? | Dims ArrayInitializer )
DimExprs ::= //DimExpr+
	DimExpr
	| DimExprs DimExpr
DimExpr  ::= SQUARE_OPEN Expression SQUARE_CLOSE
Dims     ::= ( SQUARE_OPEN SQUARE_CLOSE )+
FieldAccess
         ::= ( Primary | "super" ) DOT IDENTIFIER
MethodInvocation
         ::= ( Name | ( Primary | "super" ) DOT IDENTIFIER ) PAR_OPEN ArgumentList? PAR_CLOSE
ArrayAccess
         ::= ( Name | PrimaryNoNewArray ) SQUARE_OPEN Expression SQUARE_CLOSE
PostfixExpression
         ::= Primary
           | Name
           | PostIncrementExpression
           | PostDecrementExpression
PostIncrementExpression
         ::= PostfixExpression INCREMENT
PostDecrementExpression
         ::= PostfixExpression DECREMENT
UnaryExpression
         ::= ( PLUS | MINUS )* ( PreIncrementExpression | PreDecrementExpression | UnaryExpressionNotPlusMinus )
PreIncrementExpression
         ::= INCREMENT UnaryExpression
PreDecrementExpression
         ::= DECREMENT UnaryExpression
UnaryExpressionNotPlusMinus
         ::= PostfixExpression
           | ( NEGATION | EXCLAMATION ) UnaryExpression
           | CastExpression
CastExpression
         ::= PAR_OPEN ( PrimitiveType Dims? PAR_CLOSE UnaryExpression | ( Expression | Name Dims ) PAR_CLOSE UnaryExpressionNotPlusMinus )
MultiplicativeExpression
         ::= UnaryExpression ( ( MULTIPLY | DIVIDE | MODULO ) UnaryExpression )*
AdditiveExpression
         ::= MultiplicativeExpression ( ( PLUS | MINUS ) MultiplicativeExpression )*
ShiftExpression
         ::= AdditiveExpression ( ( LEFT_SHIFT | RIGHT_SHIFT | UNSIGNED_RIGHT_SHIFT ) AdditiveExpression )*
RelationalExpression
         ::= ShiftExpression ( ( LESS_THAN | GREATER_THAN | LESS_THAN_OR_EQUAL_TO | GREATER_THAN_OR_EQUAL_TO ) ShiftExpression | "instanceof" ReferenceType )*
EqualityExpression
         ::= RelationalExpression ( ( EQUAL_TO | NOT_EQUAL_TO ) RelationalExpression )*
AndExpression
         ::= EqualityExpression ( BITWISE_AND EqualityExpression )*
ExclusiveOrExpression
         ::= AndExpression ( BITWISE_XOR AndExpression )*
InclusiveOrExpression
         ::= ExclusiveOrExpression ( BITWISE_OR ExclusiveOrExpression )*
ConditionalAndExpression
         ::= InclusiveOrExpression ( AND_LOGICAL InclusiveOrExpression )*
ConditionalOrExpression
         ::= ConditionalAndExpression ( OR_LOGICAL ConditionalAndExpression )*
AssignmentExpression
         ::= ConditionalOrExpression ( QUESTION Expression COLON ConditionalOrExpression )*
           | Assignment
Assignment
         ::= LeftHandSide AssignmentOperator AssignmentExpression
LeftHandSide
         ::= Name
           | FieldAccess
           | ArrayAccess
AssignmentOperator
         ::= ASSIGN
           | MULTIPLY_ASSIGN
           | DIVIDE_ASSIGN
           | MOD_ASSIGN
           | PLUS_ASSIGN
           | MINUS_ASSIGN
           | LEFT_SHIFT_ASSIGN
           | RIGHT_SHIFT_ASSIGN
           | UNSIGNED_RIGHT_SHIFT_ASSIGN
           | AND_ASSIGN
           | XOR_ASSIGN
           | OR_ASSIGN
Expression
         ::= AssignmentExpression
ConstantExpression
         ::= Expression

Ignorable
         ::= WhiteSpace
           | Comment
          /* ws: definition */

<?TOKENS?>

KEYWORDS ::=
	"abstract"
	| "boolean"
	| "break"
	| "byte"
	| "case"
	| "catch"
	| "char"
	| "class"
	| "continue"
	| "default"
	| "do"
	| "double"
	| "else"
	| "extends"
	| "final"
	| "finally"
	| "float"
	| "for"
	| "if"
	| "implements"
	| "import"
	| "instanceof"
	| "int"
	| "interface"
	| "long"
	| "native"
	| "new"
	| "null"
	| "package"
	| "private"
	| "protected"
	| "public"
	| "return"
	| "short"
	| "static"
	| "String"
	| "super"
	| "switch"
	| "synchronized"
	| "this"
	| "throw"
	| "throws"
	| "transient"
	| "try"
	| "void"
	| "volatile"
	| "while"
	| "true"|"false"

IDENTIFIER ::= ([A-Za-z_][A-Za-z0-9_]+) - KEYWORDS

INTEGER_LITERAL ::= [0-9]+
FLOATING_POINT_LITERAL ::= [0-9]+"."[0-9]+
CHARACTER_LITERAL ::= "'"("\".|[^'\r\n\])"'"
STRING_LITERAL ::= '"'(("\".|[^"\r\n\]))*'"'

LF       ::= #x000A
CR       ::= #x000D
LINE_TERMINATOR ::=
	LF
	| CR
	| CR LF

TEXT_BLOCK_LITERAL ::= '"""'(([ \h\t\f]*)LINE_TERMINATOR(([^"\]|('"'[^"\])|('""'[^"\]))*))'"""'

NOT_EQUAL_TO ::= "!="
DOT ::= "."
COMMA ::= ","
MULTIPLY ::= "*"
SQUARE_OPEN ::= "["
SQUARE_CLOSE ::= "]"
PAR_OPEN ::= "("
PAR_CLOSE ::= ")"
CURLY_OPEN ::= "{"
CURLY_CLOSE ::= "}"
QUESTION ::= "?"
SEMICOLON ::= ";"
COLON ::= ":"
INCREMENT ::= "++"
DECREMENT ::= "--"
EXCLAMATION ::= "!"
NEGATION ::= "~"
PLUS ::= "+"
MINUS ::= "-"
DIVIDE ::= "/"
MODULO ::= "%"
BITWISE_AND ::= "&"
BITWISE_OR ::= "|"
BITWISE_XOR ::= "^"
AND_LOGICAL ::= "&&"
OR_LOGICAL ::= "||"
UNSIGNED_RIGHT_SHIFT ::= ">>>"
LEFT_SHIFT ::= "<<"
RIGHT_SHIFT ::= ">>"
LESS_THAN ::= "<"
GREATER_THAN ::= ">"
LESS_THAN_OR_EQUAL_TO ::= "<="
GREATER_THAN_OR_EQUAL_TO ::= ">="
EQUAL_TO ::= "=="
ASSIGN ::= "="
PLUS_ASSIGN ::= "+="
MINUS_ASSIGN ::= "-="
MULTIPLY_ASSIGN ::= "*="
DIVIDE_ASSIGN ::= "/="
MOD_ASSIGN ::= "%="
AND_ASSIGN ::= "&="
OR_ASSIGN ::= "|="
XOR_ASSIGN ::= "^="
LEFT_SHIFT_ASSIGN ::= "<<="
RIGHT_SHIFT_ASSIGN ::= ">>="
UNSIGNED_RIGHT_SHIFT_ASSIGN ::= ">>>="

WhiteSpace
         ::= ' '
           | #x09
           | #x0C
           | LINE_TERMINATOR
Comment  ::= TraditionalComment
           | EndOfLineComment
TraditionalComment
         ::= '/*' [^*] '*/'
EndOfLineComment
         ::= '//' [^#x0A]*

EOF      ::= $

@mingodad
Copy link
Author

The ebnf-convert doesn't seem to taking in account the -f none -r none -noinline see bellow:

ebnf-convert -f none -r none -noinline ebnf-convert-bug.y
/* converted on Mon Dec 23, 2024, 10:37 (UTC+01) by bison-to-w3c v0.68-SNAPSHOT which is Copyright (c) 2011-2024 by Gunther Rademacher <grd@gmx.net> */

ArrayInitializer
         ::= CURLY_OPEN VariableInitializers COMMA CURLY_CLOSE
           | CURLY_OPEN VariableInitializers CURLY_CLOSE
           | CURLY_OPEN COMMA CURLY_CLOSE
           | CURLY_OPEN CURLY_CLOSE
VariableInitializers
         ::= VariableInitializer ( COMMA VariableInitializer )*
VariableInitializer
         ::= NUMBER

The input grammar :

//ebnf-convert -f none -r none -noinline ebnf-convert-bug.y
%token CURLY_OPEN COMMA CURLY_CLOSE NUMBER

%%

ArrayInitializer :
	CURLY_OPEN VariableInitializers COMMA CURLY_CLOSE
	| CURLY_OPEN VariableInitializers CURLY_CLOSE
	| CURLY_OPEN COMMA CURLY_CLOSE
	| CURLY_OPEN CURLY_CLOSE
	;

VariableInitializers :
	VariableInitializer
	| VariableInitializers COMMA VariableInitializer
	;

VariableInitializer :
	NUMBER
	;
%%

@GuntherRademacher
Copy link
Owner

Alternatively, for getting it LALR(1), you could completely drop the offending list building rules, like so:

...
ArrayInitializer
         ::= CURLY_OPEN ( VariableInitializer ( COMMA VariableInitializer )* )? COMMA? CURLY_CLOSE
...
SwitchBlock
         ::= CURLY_OPEN SwitchBlockStatementGroup* SwitchLabels? CURLY_CLOSE
...
ArrayCreationExpression
         ::= "new" ( PrimitiveType | ClassOrInterfaceType ) ( DimExpr+ Dims? | Dims ArrayInitializer )

Which is what ebnf-convert could have done, but apparently doesn't. I will have a look at that one.

@mingodad
Copy link
Author

I've done another experiment with a grammar converted with ebnf-convert and fixed the reported shift/reduce conflicts by replacing the repetition operator and the yacc/bison grammar manually converted (remove the ';' at the end of a rule, replacing : by ::= and in-lining the keywords) and the parser generated from the ebnf-converter although it generates bigger parser source code it does parse faster in both Javascript and C++ (see attached).
Java_Compiler-rex.zip

@mingodad
Copy link
Author

mingodad commented Dec 23, 2024

Thank you again for reply !
That would be nice if ebnf-convert could in cases like the ones mentioned here automatically inline then as you've pointed out and the resulting ebnf for lalr(1) grammars without conflict be usable by rex.

Also would be nice to have rex using the bison grammar as one of it's input format and accept %right, %left, %nonassoc and %precedence with that input format to be able to use already existing grammars to generate parsers with rex (here https://mingodad.github.io/parsertl-playground/playground/ I've got more than 250 yacc/bison grammars that would be nice to use to compare/test/debug with rex).

@mingodad
Copy link
Author

The last changes to ebnf-convert now generates an EBNF accepted by rex.
But it seems that still there is some issues like with this grammar (from https://github.com/mithrandie/csvq/blob/a76c52e1ee576566abfd93a39b04eb33ad163410/lib/parser/parser.y):

//csvq-parser.y: warning: 222 shift/reduce conflicts [-Wconflicts-sr]
//csvq-parser.y: warning: 31 reduce/reduce conflicts [-Wconflicts-rr]


/*Tokens*/
%token IDENTIFIER
%token STRING
%token INTEGER
%token FLOAT
%token BOOLEAN
%token TERNARY
%token DATETIME
%token VARIABLE
%token FLAG
%token ENVIRONMENT_VARIABLE
%token RUNTIME_INFORMATION
%token EXTERNAL_COMMAND
%token PLACEHOLDER
%token CONSTANT
%token TABLE_FUNCTION
%token URL
%token SELECT
%token FROM
%token UPDATE
%token SET
%token UNSET
%token DELETE
%token WHERE
%token INSERT
%token INTO
%token VALUES
%token REPLACE
%token AS
%token DUAL
%token STDIN
%token RECURSIVE
%token CREATE
%token ADD
%token DROP
%token ALTER
%token TABLE
%token FIRST
%token LAST
%token AFTER
%token BEFORE
%token DEFAULT
%token RENAME
%token TO
%token VIEW
%token ORDER
%token GROUP
%token HAVING
%token BY
%token ASC
%token DESC
%token LIMIT
%token OFFSET
%token PERCENT
%token JOIN
%token INNER
%token OUTER
%token LEFT
%token RIGHT
%token FULL
%token CROSS
%token ON
%token USING
%token NATURAL
%token LATERAL
%token UNION
%token INTERSECT
%token EXCEPT
%token ALL
%token ANY
%token EXISTS
%token IN
%token AND
%token OR
%token NOT
%token BETWEEN
%token LIKE
%token IS
%token NULL
%token DISTINCT
%token WITH
%token RANGE
%token UNBOUNDED
%token PRECEDING
%token FOLLOWING
%token CURRENT
%token ROW
%token CASE
%token IF
%token ELSEIF
%token WHILE
%token WHEN
%token THEN
%token ELSE
%token DO
%token END
%token DECLARE
%token CURSOR
%token FOR
%token FETCH
%token OPEN
%token CLOSE
%token DISPOSE
%token PREPARE
%token NEXT
%token PRIOR
%token ABSOLUTE
%token RELATIVE
%token SEPARATOR
%token PARTITION
%token OVER
%token COMMIT
%token ROLLBACK
%token CONTINUE
%token BREAK
%token EXIT
%token ECHO
%token PRINT
%token PRINTF
%token SOURCE
%token EXECUTE
%token CHDIR
%token PWD
%token RELOAD
%token REMOVE
%token SYNTAX
%token TRIGGER
%token FUNCTION
%token AGGREGATE
%token BEGIN
%token RETURN
%token IGNORE
%token WITHIN
%token VAR
%token SHOW
%token TIES
%token NULLS
%token ROWS
%token ONLY
%token CSV
%token JSON
%token JSONL
%token FIXED
%token LTSV
%token CSV_INLINE
%token JSON_INLINE
%token JSON_TABLE
%token JSON_ROW
%token SUBSTRING
%token COUNT
%token JSON_OBJECT
%token AGGREGATE_FUNCTION
%token LIST_FUNCTION
%token ANALYTIC_FUNCTION
%token FUNCTION_NTH
%token FUNCTION_WITH_INS
%token COMPARISON_OP
%token STRING_OP
%token SUBSTITUTION_OP
%token UMINUS
%token UPLUS
%token ';'
%token '='
%token '-'
%token '+'
%token '*'
%token '/'
%token '%'
%token '!'
%token '('
%token ')'
%token ','
%token '.'

%right /*1*/ SUBSTITUTION_OP
%left /*2*/ UNION EXCEPT
%left /*3*/ INTERSECT
%left /*4*/ JOIN FULL CROSS NATURAL
%left /*5*/ OR
%left /*6*/ AND
%right /*7*/ NOT
%nonassoc /*8*/ IN BETWEEN LIKE IS COMPARISON_OP '='
%left /*9*/ STRING_OP
%left /*10*/ '-' '+'
%left /*11*/ '*' '/' '%'
%right /*12*/ UMINUS UPLUS '!'

%start program

%%

program :
	/*empty*/
	| procedure_statement
	| procedure_statement ';' program
	;

loop_program :
	/*empty*/
	| loop_statement ';' loop_program
	;

function_program :
	/*empty*/
	| function_statement ';' function_program
	;

function_loop_program :
	/*empty*/
	| function_loop_statement ';' function_loop_program
	;

common_statement :
	select_query
	| select_into_query
	| insert_query
	| update_query
	| replace_query
	| delete_query
	| table_operation_statement
	| variable_statement
	| environment_variable_statement
	| cursor_statement
	| temporary_table_statement
	| prepared_statement
	| user_defined_function_statement
	| transaction_statement
	| command_statement
	| trigger_statement
	| substantial_value
	| EXTERNAL_COMMAND
	;

common_loop_flow_control_statement :
	CONTINUE
	| BREAK
	;

procedure_statement :
	common_statement
	| flow_control_statement
	;

while_statement :
	WHILE substantial_value DO loop_program END WHILE
	| WHILE variable IN /*8N*/ identifier DO loop_program END WHILE
	| WHILE variables IN /*8N*/ identifier DO loop_program END WHILE
	| WHILE while_variable_declaration variable IN /*8N*/ identifier DO loop_program END WHILE
	| WHILE while_variable_declaration variables IN /*8N*/ identifier DO loop_program END WHILE
	;

while_variable_declaration :
	VAR
	| DECLARE
	;

exit_statement :
	EXIT
	| EXIT INTEGER
	;

loop_statement :
	common_statement
	| loop_flow_control_statement
	;

flow_control_statement :
	IF substantial_value THEN program else END IF
	| IF substantial_value THEN program elseif else END IF
	| CASE case_value case_when case_else END CASE
	| while_statement
	| exit_statement
	;

loop_flow_control_statement :
	IF substantial_value THEN loop_program in_loop_else END IF
	| IF substantial_value THEN loop_program in_loop_elseif in_loop_else END IF
	| CASE case_value in_loop_case_when in_loop_case_else END CASE
	| while_statement
	| exit_statement
	| common_loop_flow_control_statement
	;

function_statement :
	common_statement
	| function_flow_control_statement
	;

function_while_statement :
	WHILE substantial_value DO function_loop_program END WHILE
	| WHILE variable IN /*8N*/ identifier DO function_loop_program END WHILE
	| WHILE variables IN /*8N*/ identifier DO function_loop_program END WHILE
	| WHILE while_variable_declaration variable IN /*8N*/ identifier DO function_loop_program END WHILE
	| WHILE while_variable_declaration variables IN /*8N*/ identifier DO function_loop_program END WHILE
	;

function_exit_statement :
	RETURN
	| RETURN substantial_value
	;

function_loop_statement :
	common_statement
	| function_loop_flow_control_statement
	;

function_flow_control_statement :
	IF substantial_value THEN function_program in_function_else END IF
	| IF substantial_value THEN function_program in_function_elseif in_function_else END IF
	| CASE case_value in_function_case_when in_function_case_else END CASE
	| function_while_statement
	| function_exit_statement
	;

function_loop_flow_control_statement :
	IF substantial_value THEN function_loop_program in_function_in_loop_else END IF
	| IF substantial_value THEN function_loop_program in_function_in_loop_elseif in_function_in_loop_else END IF
	| CASE case_value in_function_in_loop_case_when in_function_in_loop_case_else END CASE
	| function_while_statement
	| function_exit_statement
	| common_loop_flow_control_statement
	;

variable_statement :
	VAR variable_assignments
	| DECLARE variable_assignments
	| variable_substitution
	| DISPOSE variable
	;

environment_variable_statement :
	SET environment_variable '=' /*8N*/ substantial_value
	| SET environment_variable '=' /*8N*/ identifier
	| SET environment_variable TO substantial_value
	| SET environment_variable TO identifier
	| UNSET environment_variable
	;

transaction_statement :
	COMMIT
	| ROLLBACK
	;

table_operation_statement :
	CREATE TABLE if_not_exists identifier '(' identifiers ')'
	| CREATE TABLE if_not_exists identifier '(' identifiers ')' as select_query
	| CREATE TABLE if_not_exists identifier as select_query
	| ALTER TABLE updatable_table_identifier ADD column_default column_position
	| ALTER TABLE updatable_table_identifier ADD '(' column_defaults ')' column_position
	| ALTER TABLE updatable_table_identifier DROP field_reference
	| ALTER TABLE updatable_table_identifier DROP '(' field_references ')'
	| ALTER TABLE updatable_table_identifier RENAME field_reference TO identifier
	| ALTER TABLE updatable_table_identifier SET identifier TO identifier
	| ALTER TABLE updatable_table_identifier SET identifier TO substantial_value
	;

column_default :
	identifier
	| identifier DEFAULT value
	;

column_defaults :
	column_default
	| column_default ',' column_defaults
	;

column_position :
	/*empty*/
	| FIRST
	| LAST
	| AFTER field_reference
	| BEFORE field_reference
	;

cursor_statement :
	DECLARE identifier CURSOR FOR select_query
	| DECLARE identifier CURSOR FOR identifier
	| OPEN identifier
	| OPEN identifier USING replace_values
	| CLOSE identifier
	| DISPOSE CURSOR identifier
	| FETCH fetch_position identifier INTO variables
	;

temporary_table_statement :
	DECLARE identifier VIEW '(' identifiers ')'
	| DECLARE identifier VIEW '(' identifiers ')' AS select_query
	| DECLARE identifier VIEW AS select_query
	| DISPOSE VIEW identifier
	| DISPOSE VIEW STDIN
	;

replace_value :
	substantial_value
	| substantial_value AS identifier
	;

replace_values :
	replace_value
	| replace_value ',' replace_values
	;

prepared_statement :
	PREPARE identifier FROM STRING
	| EXECUTE identifier
	| EXECUTE identifier USING replace_values
	| DISPOSE PREPARE identifier
	;

parameter :
	variable
	;

parameters :
	parameter
	| parameters ',' parameter
	;

optional_parameter :
	variable DEFAULT substantial_value
	;

optional_parameters :
	optional_parameter
	| optional_parameter ',' optional_parameters
	;

function_parameters :
	parameters
	| optional_parameters
	| parameters ',' optional_parameters
	;

user_defined_function_statement :
	DECLARE identifier FUNCTION '(' ')' AS BEGIN function_program END
	| DECLARE identifier FUNCTION '(' function_parameters ')' AS BEGIN function_program END
	| DECLARE identifier AGGREGATE '(' identifier ')' AS BEGIN function_program END
	| DECLARE identifier AGGREGATE '(' identifier ',' function_parameters ')' AS BEGIN function_program END
	| DISPOSE FUNCTION identifier
	;

fetch_position :
	/*empty*/
	| NEXT
	| PRIOR
	| FIRST
	| LAST
	| ABSOLUTE substantial_value
	| RELATIVE substantial_value
	;

cursor_status :
	CURSOR identifier IS /*8N*/ negation OPEN
	| CURSOR identifier IS /*8N*/ negation IN /*8N*/ RANGE
	| CURSOR identifier COUNT
	;

command_statement :
	SET flag '=' /*8N*/ identifier
	| SET flag '=' /*8N*/ substantial_value
	| SET flag TO identifier
	| SET flag TO substantial_value
	| ADD substantial_value TO flag
	| REMOVE substantial_value FROM flag
	| SHOW flag
	| ECHO substantial_value
	| PRINT substantial_value
	| PRINTF substantial_value
	| PRINTF substantial_value ',' substantial_values
	| PRINTF substantial_value USING substantial_values
	| SOURCE identifier
	| SOURCE substantial_value
	| EXECUTE substantial_value
	| EXECUTE substantial_value USING substantial_values
	| SYNTAX
	| SYNTAX values
	| SHOW identifier
	| SHOW identifier FROM updatable_table_identifier
	| CHDIR identifier
	| CHDIR substantial_value
	| PWD
	| RELOAD identifier
	;

trigger_statement :
	TRIGGER identifier
	| TRIGGER identifier substantial_value
	| TRIGGER identifier INTEGER substantial_value
	;

select_query :
	select_entity order_by_clause limit_clause
	| select_entity order_by_clause limit_clause FOR UPDATE
	| with_clause select_entity order_by_clause limit_clause
	| with_clause select_entity order_by_clause limit_clause FOR UPDATE
	;

select_into_query :
	select_clause into_clause from_clause where_clause group_by_clause having_clause order_by_clause limit_clause
	| select_clause into_clause from_clause where_clause group_by_clause having_clause order_by_clause limit_clause FOR UPDATE
	| with_clause select_clause into_clause from_clause where_clause group_by_clause having_clause order_by_clause limit_clause
	| with_clause select_clause into_clause from_clause where_clause group_by_clause having_clause order_by_clause limit_clause FOR UPDATE
	;

select_entity :
	select_clause from_clause where_clause group_by_clause having_clause
	| select_set_entity UNION /*2L*/ all select_set_entity
	| select_set_entity INTERSECT /*3L*/ all select_set_entity
	| select_set_entity EXCEPT /*2L*/ all select_set_entity
	;

select_set_entity :
	select_entity
	| subquery
	;

select_clause :
	SELECT distinct fields
	;

into_clause :
	INTO variables
	;

from_clause :
	/*empty*/
	| FROM tables
	;

where_clause :
	/*empty*/
	| WHERE value
	;

group_by_clause :
	/*empty*/
	| GROUP BY values
	;

having_clause :
	/*empty*/
	| HAVING value
	;

order_by_clause :
	/*empty*/
	| ORDER BY order_items
	;

limit_clause :
	offset_clause
	| offset_clause FETCH limit_fetch_position substantial_value limit_fetch_unit limit_restriction
	| LIMIT substantial_value limit_unit limit_restriction offset_clause
	;

limit_restriction :
	/*empty*/
	| ONLY
	| WITH TIES
	;

limit_fetch_position :
	FIRST
	| NEXT
	;

limit_unit :
	/*empty*/
	| limit_fetch_unit
	;

limit_fetch_unit :
	PERCENT
	| ROW
	| ROWS
	;

offset_unit :
	/*empty*/
	| ROW
	| ROWS
	;

offset_clause :
	/*empty*/
	| OFFSET substantial_value offset_unit
	;

with_clause :
	/*empty*/
	| WITH inline_tables
	;

inline_table :
	recursive identifier AS '(' select_query ')'
	| recursive identifier '(' identifiers ')' AS '(' select_query ')'
	;

inline_tables :
	inline_table
	| inline_table ',' inline_tables
	;

primitive_type :
	STRING
	| INTEGER
	| FLOAT
	| ternary
	| null
	;

ternary :
	TERNARY
	;

null :
	NULL
	;

field_reference :
	identifier
	| identifier '.' identifier
	| STDIN '.' identifier
	| identifier '.' INTEGER
	| STDIN '.' INTEGER
	;

value :
	field_reference
	| substantial_value
	| '(' value ')'
	;

substantial_value :
	primitive_type
	| arithmetic
	| string_operation
	| subquery
	| function
	| aggregate_function
	| analytic_function
	| case_expr
	| comparison
	| logic
	| variable
	| variable_substitution
	| environment_variable
	| runtime_information
	| constant
	| flag
	| cursor_status
	| '(' substantial_value ')'
	| PLACEHOLDER
	;

wildcard :
	'*' /*11L*/
	;

row_value :
	'(' values ')'
	| subquery
	| JSON_ROW '(' value ',' value ')'
	;

row_values :
	row_value
	| row_value ',' row_values
	;

order_items :
	order_item
	| order_item ',' order_items
	;

order_item :
	value order_direction
	| value order_direction NULLS order_null_position
	;

order_direction :
	/*empty*/
	| ASC
	| DESC
	;

order_null_position :
	FIRST
	| LAST
	;

subquery :
	'(' select_query ')'
	;

string_operation :
	value STRING_OP /*9L*/ value
	;

matrix_value :
	'(' row_values ')'
	| subquery
	| JSON_ROW '(' value ',' value ')'
	;

comparison :
	value COMPARISON_OP /*8N*/ value
	| row_value COMPARISON_OP /*8N*/ row_value
	| value '=' /*8N*/ value
	| row_value '=' /*8N*/ row_value
	| value IS /*8N*/ negation ternary
	| value IS /*8N*/ negation null
	| value BETWEEN /*8N*/ value AND /*6L*/ value
	| value NOT /*7R*/ BETWEEN /*8N*/ value AND /*6L*/ value
	| row_value negation BETWEEN /*8N*/ row_value AND /*6L*/ row_value
	| value IN /*8N*/ row_value
	| value NOT /*7R*/ IN /*8N*/ row_value
	| row_value negation IN /*8N*/ matrix_value
	| value LIKE /*8N*/ value
	| value NOT /*7R*/ LIKE /*8N*/ value
	| value comparison_operator ANY row_value
	| row_value comparison_operator ANY matrix_value
	| value comparison_operator ALL row_value
	| row_value comparison_operator ALL matrix_value
	| EXISTS subquery
	;

arithmetic :
	value '+' /*10L*/ value
	| value '-' /*10L*/ value
	| value '*' /*11L*/ value
	| value '/' /*11L*/ value
	| value '%' /*11L*/ value
	| '-' /*10L*/ value %prec UMINUS /*12R*/
	| '+' /*10L*/ value %prec UPLUS /*12R*/
	;

logic :
	value OR /*5L*/ value
	| value AND /*6L*/ value
	| NOT /*7R*/ value
	| '!' /*12R*/ value
	;

arguments :
	/*empty*/
	| values
	;

function :
	identifier '(' arguments ')'
	| SUBSTRING '(' arguments ')'
	| SUBSTRING '(' value FROM value ')'
	| SUBSTRING '(' value FROM value FOR value ')'
	| JSON_OBJECT '(' ')'
	| JSON_OBJECT '(' fields ')'
	| IF '(' arguments ')'
	| REPLACE '(' arguments ')'
	;

aggregate_function :
	identifier '(' distinct arguments ')'
	| AGGREGATE_FUNCTION '(' distinct arguments ')'
	| VAR '(' distinct arguments ')'
	| COUNT '(' distinct arguments ')'
	| COUNT '(' distinct wildcard ')'
	| list_function
	;

list_function :
	LIST_FUNCTION '(' distinct arguments ')'
	| LIST_FUNCTION '(' distinct arguments ')' WITHIN GROUP '(' order_by_clause ')'
	;

analytic_function :
	identifier '(' arguments ')' OVER '(' analytic_clause_with_windowing ')'
	| identifier '(' distinct arguments ')' OVER '(' analytic_clause_with_windowing ')'
	| AGGREGATE_FUNCTION '(' distinct arguments ')' OVER '(' analytic_clause_with_windowing ')'
	| VAR '(' distinct arguments ')' OVER '(' analytic_clause_with_windowing ')'
	| COUNT '(' distinct arguments ')' OVER '(' analytic_clause_with_windowing ')'
	| COUNT '(' distinct wildcard ')' OVER '(' analytic_clause_with_windowing ')'
	| LIST_FUNCTION '(' distinct arguments ')' OVER '(' analytic_clause ')'
	| ANALYTIC_FUNCTION '(' arguments ')' OVER '(' analytic_clause ')'
	| FUNCTION_NTH '(' arguments ')' OVER '(' analytic_clause_with_windowing ')'
	| FUNCTION_NTH '(' arguments ')' IGNORE NULLS OVER '(' analytic_clause_with_windowing ')'
	| FUNCTION_WITH_INS '(' arguments ')' OVER '(' analytic_clause ')'
	| FUNCTION_WITH_INS '(' arguments ')' IGNORE NULLS OVER '(' analytic_clause ')'
	;

analytic_clause :
	partition_clause order_by_clause
	;

analytic_clause_with_windowing :
	analytic_clause
	| partition_clause ORDER BY order_items windowing_clause
	;

partition_clause :
	/*empty*/
	| PARTITION BY values
	;

windowing_clause :
	ROWS window_position
	| ROWS BETWEEN /*8N*/ window_frame_low AND /*6L*/ window_frame_high
	;

window_position :
	UNBOUNDED PRECEDING
	| INTEGER PRECEDING
	| CURRENT ROW
	;

window_relative_position :
	INTEGER PRECEDING
	| INTEGER FOLLOWING
	| CURRENT ROW
	;

window_frame_low :
	UNBOUNDED PRECEDING
	| window_relative_position
	;

window_frame_high :
	UNBOUNDED FOLLOWING
	| window_relative_position
	;

table_identifier :
	identifier
	| URL
	| TABLE_FUNCTION '(' arguments ')'
	| STDIN
	;

table_format :
	CSV
	| JSON
	| JSONL
	| FIXED
	| LTSV
	;

inline_table_format :
	CSV_INLINE
	| JSON_INLINE
	| JSON_TABLE
	;

format_specified_function :
	table_format '(' table_identifier ')'
	| table_format '(' table_identifier ',' arguments ')'
	| table_format '(' substantial_value ',' table_identifier ')'
	| table_format '(' substantial_value ',' table_identifier ',' arguments ')'
	;

inline_format_specified_function :
	inline_table_format '(' substantial_value ',' identifier ')'
	| inline_table_format '(' substantial_value ',' identifier ',' arguments ')'
	| inline_table_format '(' substantial_value ',' substantial_value ')'
	| inline_table_format '(' substantial_value ',' substantial_value ',' arguments ')'
	;

updatable_table_identifier :
	table_identifier
	| format_specified_function
	;

table_object :
	updatable_table_identifier
	| inline_format_specified_function
	;

laterable_query_table :
	subquery
	| subquery identifier
	| subquery AS identifier
	;

joinable_tables :
	table
	| LATERAL laterable_query_table
	| laterable_query_table ',' joinable_tables
	| LATERAL laterable_query_table ',' joinable_tables
	;

table :
	table_object
	| table_object identifier
	| table_object AS identifier
	| join
	| DUAL
	| laterable_query_table
	| '(' table ')'
	;

join :
	table CROSS /*4L*/ JOIN /*4L*/ table
	| table join_type_inner JOIN /*4L*/ table join_condition
	| table join_outer_direction join_type_outer JOIN /*4L*/ table join_condition
	| table NATURAL /*4L*/ join_type_inner JOIN /*4L*/ table
	| table NATURAL /*4L*/ join_outer_direction join_type_outer JOIN /*4L*/ table
	| table CROSS /*4L*/ JOIN /*4L*/ LATERAL laterable_query_table
	| table join_type_inner JOIN /*4L*/ LATERAL laterable_query_table join_condition
	| table join_outer_direction join_type_outer JOIN /*4L*/ LATERAL laterable_query_table join_condition
	| table NATURAL /*4L*/ join_type_inner JOIN /*4L*/ LATERAL laterable_query_table
	| table NATURAL /*4L*/ join_outer_direction join_type_outer JOIN /*4L*/ LATERAL laterable_query_table
	;

join_condition :
	ON value
	| USING '(' identifiers ')'
	;

field :
	value
	| value AS identifier
	| wildcard
	| identifier '.' wildcard
	;

case_expr :
	CASE case_value case_expr_when case_expr_else END
	;

case_value :
	/*empty*/
	| value
	;

case_expr_when :
	WHEN value THEN value
	| WHEN value THEN value case_expr_when
	;

case_expr_else :
	/*empty*/
	| ELSE value
	;

field_references :
	field_reference
	| field_reference ',' field_references
	;

values :
	value
	| value ',' values
	;

substantial_values :
	substantial_value
	| substantial_value ',' substantial_values
	;

tables :
	table
	| table ',' joinable_tables
	;

identified_tables :
	table_identifier
	| table_identifier ',' identified_tables
	;

updatable_tables :
	updatable_table_identifier
	| updatable_table_identifier ',' updatable_tables
	;

identifiers :
	identifier
	| identifier ',' identifiers
	;

fields :
	field
	| field ',' fields
	;

insert_query :
	with_clause INSERT INTO updatable_table_identifier VALUES row_values
	| with_clause INSERT INTO updatable_table_identifier '(' field_references ')' VALUES row_values
	| with_clause INSERT INTO updatable_table_identifier select_query
	| with_clause INSERT INTO updatable_table_identifier '(' field_references ')' select_query
	;

update_query :
	with_clause UPDATE updatable_tables SET update_set_list from_clause where_clause
	;

update_set :
	field_reference '=' /*8N*/ value
	;

update_set_list :
	update_set
	| update_set ',' update_set_list
	;

replace_query :
	with_clause REPLACE INTO updatable_table_identifier USING '(' field_references ')' VALUES row_values
	| with_clause REPLACE INTO updatable_table_identifier '(' field_references ')' USING '(' field_references ')' VALUES row_values
	| with_clause REPLACE INTO updatable_table_identifier USING '(' field_references ')' select_query
	| with_clause REPLACE INTO updatable_table_identifier '(' field_references ')' USING '(' field_references ')' select_query
	| REPLACE INTO updatable_table_identifier USING '(' field_references ')' VALUES row_values
	| REPLACE INTO updatable_table_identifier '(' field_references ')' USING '(' field_references ')' VALUES row_values
	| REPLACE INTO updatable_table_identifier USING '(' field_references ')' select_query
	| REPLACE INTO updatable_table_identifier '(' field_references ')' USING '(' field_references ')' select_query
	;

delete_query :
	with_clause DELETE FROM tables where_clause
	| with_clause DELETE identified_tables FROM tables where_clause
	;

elseif :
	ELSEIF substantial_value THEN program
	| ELSEIF substantial_value THEN program elseif
	;

else :
	/*empty*/
	| ELSE program
	;

in_loop_elseif :
	ELSEIF substantial_value THEN loop_program
	| ELSEIF substantial_value THEN loop_program in_loop_elseif
	;

in_loop_else :
	/*empty*/
	| ELSE loop_program
	;

in_function_elseif :
	ELSEIF substantial_value THEN function_program
	| ELSEIF substantial_value THEN function_program in_function_elseif
	;

in_function_else :
	/*empty*/
	| ELSE function_program
	;

in_function_in_loop_elseif :
	ELSEIF substantial_value THEN function_loop_program
	| ELSEIF substantial_value THEN function_loop_program in_function_in_loop_elseif
	;

in_function_in_loop_else :
	/*empty*/
	| ELSE function_loop_program
	;

case_when :
	WHEN substantial_value THEN program
	| WHEN substantial_value THEN program case_when
	;

case_else :
	/*empty*/
	| ELSE program
	;

in_loop_case_when :
	WHEN substantial_value THEN loop_program
	| WHEN substantial_value THEN loop_program in_loop_case_when
	;

in_loop_case_else :
	/*empty*/
	| ELSE loop_program
	;

in_function_case_when :
	WHEN substantial_value THEN function_program
	| WHEN substantial_value THEN function_program in_function_case_when
	;

in_function_case_else :
	/*empty*/
	| ELSE function_program
	;

in_function_in_loop_case_when :
	WHEN substantial_value THEN function_loop_program
	| WHEN substantial_value THEN function_loop_program in_function_in_loop_case_when
	;

in_function_in_loop_case_else :
	/*empty*/
	| ELSE function_loop_program
	;

identifier :
	IDENTIFIER
	| TIES
	| NULLS
	| ROWS
	| CSV
	| JSON
	| JSONL
	| FIXED
	| LTSV
	;

variable :
	VARIABLE
	;

variables :
	variable
	| variable ',' variables
	;

variable_substitution :
	variable SUBSTITUTION_OP /*1R*/ value
	;

variable_assignment :
	variable
	| variable SUBSTITUTION_OP /*1R*/ value
	;

variable_assignments :
	variable_assignment
	| variable_assignment ',' variable_assignments
	;

environment_variable :
	ENVIRONMENT_VARIABLE
	;

runtime_information :
	RUNTIME_INFORMATION
	;

constant :
	CONSTANT
	;

flag :
	FLAG
	;

distinct :
	/*empty*/
	| DISTINCT
	;

negation :
	/*empty*/
	| NOT /*7R*/
	;

join_type_inner :
	/*empty*/
	| INNER
	;

join_type_outer :
	/*empty*/
	| OUTER
	;

join_outer_direction :
	LEFT
	| RIGHT
	| FULL /*4L*/
	;

all :
	/*empty*/
	| ALL
	;

recursive :
	/*empty*/
	| RECURSIVE
	;

as :
	/*empty*/
	| AS
	;

comparison_operator :
	COMPARISON_OP /*8N*/
	| '=' /*8N*/
	;

if_not_exists :
	/*empty*/
	| IF NOT /*7R*/ EXISTS
	;

When EBNF generated by ebnf-convert is passed to rex we get this error:

rex -lalr 1  -java -nolexer csvq-parser.ebnf
duplicate definition: END

This already happened with other grammars and after renaming END to xEND then:

rex -lalr 1  -java -nolexer csvq-parser.ebnf
start symbol: program
ambiguous: optional sequence derives to empty in production select_query

The error message is a bit vague because it doesn't tell the name of the optional sequence and line/column, but on this case it's not difficult to find it because there is only one (with_clause):

select_query
         ::= with_clause? select_entity order_by_clause limit_clause ( FOR UPDATE )?

Here ebnf-convert could check if the nonterminal with_clause is already an optional rule and do not add the ?.
But for the purpose of trying the grammar to be accepted by rex I've removed the unneeded ? then I'm getting this error:

rex -lalr 1  -java -nolexer csvq-parser.ebnf
start symbol: program
ambiguous: optional sequence derives to empty in production analytic_function

On this one it's a bit hard to find witch one of the optional sequence is causing trouble:

analytic_function
         ::= ( ( ( ( identifier '(' distinct? | ( AGGREGATE_FUNCTION | VAR ) '(' distinct ) arguments | COUNT '(' distinct ( arguments | '*' ) ) ')' | FUNCTION_NTH '(' arguments ')' ( IGNORE NULLS )? ) OVER '(' analytic_clause_with_windowing | ( ( LIST_FUNCTION '(' distinct | ANALYTIC_FUNCTION '(' ) arguments ')' | FUNCTION_WITH_INS '(' arguments ')' ( IGNORE NULLS )? ) OVER '(' analytic_clause ) ')'

@mingodad
Copy link
Author

After writing the last comment I found the optional sequence (distinct) so the issue it's like the one pointed out for with_clause that ebnf-convert could detect and not emit the unneeded ?.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants