Custom Errors for RegexParsers

Could someone help me understand the following behavior? parseAll(parseIf, "If bla blablaa") should fail with "is expected". Instead I always get "string matching regex 'is\b' expected but 'b' found". I guess it has something to do with whitespace, because " If bla is blablaa" (notice the whitespace at the beginning) results in the same behavior. I tried it with StandardTokenParsers and everything worked fine, but STP unfortunately doesn't support regex. Follow-up question: how would I have to alter RegexParsers so that it uses a sequence of Strings instead of a sequence of Chars? That would make error reporting a lot easier.

lazy val parseIf = roleGiverIf ~ giverRole

lazy val roleGiverIf =
  kwIf ~> identifier | failure("""A rule must begin with if""")
lazy val giverRole =
  kwIs ~> identifier | failure("""is expected""")

lazy val keyword =
  kwIf | kwAnd | kwThen | kwOf | kwIs | kwFrom | kwTo

lazy val identifier =
  not(keyword) ~ roleEntityLiteral
// ...

def roleEntityLiteral: Parser[String] =
  """([^"\p{Cntrl}\\]|\\[\\/bfnrt]|\\u[a-fA-F0-9]{4})\S*""".r 
def kwIf: Parser[String] = "If\\b".r
def kwIs: Parser[String] = "is\\b".r

// ...

parseAll(parseIf, "If bla blablaa") match {
  case Success(result, _) => println(result) // renamed so the binding does not shadow the parseIf parser
  case Failure(msg, _) => println("Failure: " + msg)
  case Error(msg, _) => println("Error: " + msg)
}

Answers


This behavior is surprising at first. When you call | and both sides fail, the failure that happened furthest into the input is selected, with ties going to the right-hand side.

When you try to parse directly with giverRole, it produces the result you expect. If you add a successful match before the failure, though, it produces the result you are seeing.

The reason is rather subtle; I only found it by sprinkling log statements on all parsers. To understand it, you must understand how RegexParsers skips spaces. Specifically, spaces are skipped on accept. Because failure doesn't call accept, it doesn't skip spaces.

The failure of kwIs happens on b, because the space has already been skipped, whereas the failure of the failure parser happens on the space after If. Here:

If bla blablaa
   ^ kwIs fails here
  ^ failure fails here

Therefore, the error message on kwIs gets precedence by the rule I mentioned.
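
To see the two positions for yourself, here is a minimal sketch. It assumes the scala-parser-combinators library is on the classpath; the object name FailurePositions and the helper failureColumn are made up for illustration, and the grammar is cut down to the two keywords so that it reproduces the diagram above.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical demo object, not part of the original grammar.
object FailurePositions extends RegexParsers {
  def kwIf: Parser[String] = "If\\b".r
  def kwIs: Parser[String] = "is\\b".r
  def custom: Parser[String] = failure("is expected")

  // 1-based column at which `p` reports its failure, or None if it succeeds.
  def failureColumn[T](p: Parser[T], input: String): Option[Int] =
    parse(p, input) match {
      case Success(_, _)    => None
      case Failure(_, next) => Some(next.pos.column)
      case Error(_, next)   => Some(next.pos.column)
    }

  def main(args: Array[String]): Unit = {
    val input = "If bla blablaa"
    // The regex parser skips the space before it tries to match.
    println(failureColumn(kwIf ~> kwIs, input))   // Some(4), the 'b'
    // failure() reports the position it was handed, without skipping.
    println(failureColumn(kwIf ~> custom, input)) // Some(3), the space
  }
}
```

Because column 4 is further into the input than column 3, | keeps the regex message and drops the custom one.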

You can get around this problem by making the parser skip the spaces without matching anything. It is important that this pattern always match, or you'll get an even more confusing error message. Here's a suggestion I think works:

"\\b|$".r ~ failure("is expected")

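In context, the suggestion might look like the sketch below. It assumes scala-parser-combinators; RuleParser, word, expected and message are illustrative names, word is a simplified stand-in for identifier, and I use ~> rather than ~ only so that the alternative keeps the type Parser[String].

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical, cut-down version of the grammar from the question.
object RuleParser extends RegexParsers {
  def kwIf: Parser[String] = "If\\b".r
  def kwIs: Parser[String] = "is\\b".r
  def word: Parser[String] = """\S+""".r // simplified stand-in for identifier

  // Zero-width pattern: matching it makes the regex parser skip leading
  // whitespace, so the custom failure is reported at the same position
  // as the kwIs failure. It does not match before a non-word character.
  def expected(msg: String): Parser[Nothing] = "\\b|$".r ~> failure(msg)

  def giverRole: Parser[String] = kwIs ~> word | expected("is expected")
  def parseIf: Parser[String ~ String] = (kwIf ~> word) ~ giverRole

  def message(input: String): String = parseAll(parseIf, input) match {
    case Success(result, _) => "OK: " + result
    case Failure(msg, _)    => "Failure: " + msg
    case Error(msg, _)      => "Error: " + msg
  }

  def main(args: Array[String]): Unit = {
    println(message("If bla blablaa"))    // Failure: is expected
    println(message("If bla is blablaa"))
  }
}
```
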
Another solution is to use acceptIf or acceptMatch instead of using the implicit regex accept, in which case you can provide a tailored error message.
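
In RegexParsers the element type is Char, so acceptIf there checks a single character. As a rough sketch of the idea applied to whole words (which also touches on the follow-up question), the code below extends Parsers with Elem = String. WordPosition, WordReader, WordParsers and message are hypothetical names invented for this example, and the input is assumed to be split on whitespace beforehand.

```scala
import scala.util.parsing.combinator.Parsers
import scala.util.parsing.input.{Position, Reader}

// Hypothetical position type that counts words instead of characters.
case class WordPosition(index: Int) extends Position {
  def line: Int = 1
  def column: Int = index + 1
  protected def lineContents: String = ""
}

// Hypothetical reader over a list of pre-split words.
class WordReader(words: List[String], index: Int = 0) extends Reader[String] {
  def first: String = words.head
  def rest: Reader[String] =
    if (atEnd) this else new WordReader(words.tail, index + 1)
  def pos: Position = WordPosition(index)
  def atEnd: Boolean = words.isEmpty
}

object WordParsers extends Parsers {
  type Elem = String

  val keywords = Set("If", "is")

  // acceptIf takes the predicate and a function that builds the error message.
  def kwIf: Parser[String] = acceptIf(_ == "If")(_ => "A rule must begin with If")
  def kwIs: Parser[String] = acceptIf(_ == "is")(_ => "is expected")
  def identifier: Parser[String] =
    acceptIf(w => !keywords(w))(w => "identifier expected but keyword '" + w + "' found")

  def parseIf: Parser[String ~ String] = (kwIf ~> identifier) ~ (kwIs ~> identifier)

  def message(input: String): String = {
    val words = input.trim.split("\\s+").toList
    phrase(parseIf)(new WordReader(words)) match {
      case Success(result, _) => "OK: " + result
      case Failure(msg, _)    => "Failure: " + msg
      case Error(msg, _)      => "Error: " + msg
    }
  }
}
```

With this setup, WordParsers.message("If bla blablaa") should report "Failure: is expected". There is no whitespace left to skip, so the position mismatch described above cannot occur. The trade-off is that the regex conveniences of RegexParsers are gone and every token test has to be written as a predicate.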