How can you parse simple C++ typedef instructions?
I'd like to parse simple C++ typedef instructions such as:
typedef Class NewNameForClass;
typedef Class::InsideTypedef NewNameForTypedef;
typedef TemplateClass<Arg1,Arg2> AliasForObject;
I have written the corresponding grammar that I'd like to see used in parsing.
Name <- ('_'|letter)('_'|letter|digit)*
Type <- Name
Type <- Type::Name
Type <- Name Templates
Templates <- '<' Type (',' Type)* '>'
Instruction <- "typedef" Type Name ';'
Once this is parsed, all I'll want to do is generate XML with the same information (but laid out differently).
What is the most effective language for writing such a program? How can you achieve this?
EDIT: What I have come up with using Boost Spirit (it's not perfect, but it's good enough for me, at least for now):
rule<> sep_p = space_p;
rule<> name_p = (ch_p('_')|alpha_p) >> *(ch_p('_')|alpha_p|digit_p);
rule<> type_p = name_p
    >> !(*sep_p >> str_p("::") >> *sep_p >> name_p)
    >> *(*sep_p >> ch_p('*'))
    >> !(*sep_p >> str_p("const"))
    >> !(*sep_p >> ch_p('&'));
rule<> templated_type_p = name_p >> *sep_p >> ch_p('<') >> *sep_p
    >> (*sep_p >> type_p >> *sep_p) % ch_p(',')
    >> ch_p('>') >> *sep_p;
rule<> typedef_p = *sep_p >> str_p("typedef") >> +sep_p
    >> (type_p|templated_type_p) >> +sep_p
    >> name_p >> *sep_p >> ch_p(';') >> *sep_p;
rule<> typedef_list_p = *typedef_p;
I would alter the grammar slightly:
ShortName <- ('_'|letter)('_'|letter|digit)*
Name <- ShortName
Name <- Name::ShortName
Type <- Name
Type <- Name Templates
Templates <- '<' Type (',' Type)* '>'
Instruction <- "typedef" Type Name ';'
Also, your grammar leaves out the following cases:
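To make the altered grammar concrete, here is a minimal hand-written recursive-descent sketch of it in C++17. It is only one possible reading, under a few assumptions of mine: the left-recursive rule `Name <- Name::ShortName` is rewritten as the loop `ShortName ("::" ShortName)*` (a recursive-descent parser cannot handle left recursion directly), the alias is parsed as a `ShortName` rather than a full `Name`, whitespace is allowed between tokens, and the names `TypedefDecl`, `TypedefParser` and `parseAll` are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type: the aliased type as normalized text, plus the new name.
struct TypedefDecl {
    std::string type;
    std::string alias;
};

class TypedefParser {
public:
    explicit TypedefParser(std::string text) : src_(std::move(text)) {}

    // Instruction <- "typedef" Type ShortName ';'
    std::optional<TypedefDecl> parseInstruction() {
        TypedefDecl d;
        std::string kw;
        if (!parseShortName(kw) || kw != "typedef") return std::nullopt;
        if (!parseType(d.type)) return std::nullopt;
        if (!parseShortName(d.alias)) return std::nullopt;
        if (!symbol(";")) return std::nullopt;
        return d;
    }

    // True once only whitespace remains.
    bool atEnd() {
        skipSpace();
        return pos_ == src_.size();
    }

private:
    static bool isIdentStart(char c) {
        return c == '_' || std::isalpha(static_cast<unsigned char>(c));
    }
    static bool isIdentChar(char c) {
        return c == '_' || std::isalnum(static_cast<unsigned char>(c));
    }

    void skipSpace() {
        while (pos_ < src_.size() &&
               std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
    }

    // Consumes the literal s (after optional whitespace) if it is next.
    bool symbol(const std::string& s) {
        skipSpace();
        if (src_.compare(pos_, s.size(), s) != 0) return false;
        pos_ += s.size();
        return true;
    }

    // ShortName <- ('_'|letter)('_'|letter|digit)*   (appended to out)
    bool parseShortName(std::string& out) {
        skipSpace();
        if (pos_ >= src_.size() || !isIdentStart(src_[pos_])) return false;
        std::size_t start = pos_;
        while (pos_ < src_.size() && isIdentChar(src_[pos_])) ++pos_;
        out.append(src_, start, pos_ - start);
        return true;
    }

    // Name <- ShortName ("::" ShortName)*   (left recursion rewritten as a loop)
    bool parseName(std::string& out) {
        if (!parseShortName(out)) return false;
        while (symbol("::")) {
            out += "::";
            if (!parseShortName(out)) return false;
        }
        return true;
    }

    // Type <- Name Templates?     Templates <- '<' Type (',' Type)* '>'
    bool parseType(std::string& out) {
        if (!parseName(out)) return false;
        if (!symbol("<")) return true;  // no template argument list
        out += '<';
        if (!parseType(out)) return false;
        while (symbol(",")) {
            out += ',';
            if (!parseType(out)) return false;
        }
        if (!symbol(">")) return false;
        out += '>';
        return true;
    }

    std::string src_;
    std::size_t pos_ = 0;
};

// Parses instructions until the input ends or one fails to parse.
std::vector<TypedefDecl> parseAll(const std::string& text) {
    TypedefParser p(text);
    std::vector<TypedefDecl> out;
    while (!p.atEnd()) {
        auto d = p.parseInstruction();
        if (!d) break;
        out.push_back(*d);
    }
    return out;
}
```

Calling `parseAll` on the three typedefs from the question should yield three `TypedefDecl` entries. Pointers, `const`, references and multiple targets are deliberately not handled here, since the grammar above does not cover them either.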
- Multiple typedef targets.
- Pointer targets
- Function pointers (this is by far the most difficult)
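For reference, here is what the pointer and function-pointer cases look like in C++. This is just an illustrative sketch; the names `IntPtr`, `BinaryOp` and `addInts` are made up for the example. The function-pointer form shows why it is the hardest to parse: the new name sits in the middle of the declarator instead of at the end.

```cpp
#include <type_traits>

// Pointer target: the '*' is part of the declarator, not of the type name.
typedef int* IntPtr;

// Function pointer: the new name (BinaryOp) appears inside the declarator,
// between the return type and the parameter list.
typedef int (*BinaryOp)(int, int);

// Hypothetical function used only to exercise the alias.
int addInts(int a, int b) { return a + b; }

static_assert(std::is_same<IntPtr, int*>::value, "IntPtr aliases int*");
static_assert(std::is_same<BinaryOp, int (*)(int, int)>::value,
              "BinaryOp aliases a pointer-to-function type");
```

A grammar of the form `"typedef" Type Name ';'` cannot describe the second declaration, because there is no point at which a complete `Type` has been read before the name.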
Parsing a grammar (I love the irony) is a fairly straightforward operation. If you wanted to actually use the grammar in a functional way, I would say the best bet is a lex/yacc combination.
But from your question it appears that you want to spit it out to another format. There really isn't a language designed for this so I would say use whatever language you're most comfortable with.
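As a rough illustration of the "spit it out to another format" step, here is a small C++ sketch that turns already-parsed typedefs into XML. The `TypedefDecl` struct and the element and attribute names (`typedefs`, `typedef`, `name`, `type`) are assumptions of mine, since the question does not say what the XML should look like. The one thing that is not optional is escaping: template types contain `<` and `>`, which must not appear raw in an attribute value.

```cpp
#include <string>
#include <vector>

// Hypothetical parse result: the aliased type as text, plus the new name.
struct TypedefDecl {
    std::string type;
    std::string alias;
};

// Replaces the characters that are special in XML attribute values.
std::string escapeXml(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default:  out += c; break;
        }
    }
    return out;
}

// Emits one <typedef/> element per declaration (layout is an assumption).
std::string toXml(const std::vector<TypedefDecl>& decls) {
    std::string xml = "<typedefs>\n";
    for (const auto& d : decls) {
        xml += "  <typedef name=\"" + escapeXml(d.alias) +
               "\" type=\"" + escapeXml(d.type) + "\"/>\n";
    }
    xml += "</typedefs>\n";
    return xml;
}
```

Building the string by hand is fine for output this small; for anything more elaborate an XML library would probably be the safer choice.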
The OP asked about multiple typedef targets. It's perfectly legal for a typedef declaration to have more than one target. For example:
typedef struct _SomeStruct SomeStruct, *PSomeStruct;
This creates 2 typedef names.
- SomeStruct which is equivalent to "struct _SomeStruct"
- PSomeStruct which is equivalent to "struct _SomeStruct*"
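A quick way to convince yourself of this is to let the compiler check it. The sketch below uses the tag `SomeStruct_` rather than `_SomeStruct`, because in C++ identifiers that start with an underscore followed by a capital letter are reserved for the implementation; the `value` member is only there so the struct has something in it.

```cpp
#include <type_traits>

struct SomeStruct_ { int value; };

// One declaration, two declarators: a plain alias and a pointer alias.
typedef struct SomeStruct_ SomeStruct, *PSomeStruct;

static_assert(std::is_same<SomeStruct, SomeStruct_>::value,
              "SomeStruct names the struct itself");
static_assert(std::is_same<PSomeStruct, SomeStruct_*>::value,
              "PSomeStruct names a pointer to the struct");
```

For a parser this means the part after the type is a comma-separated list of declarators, each of which can carry its own `*`.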
Well, since you're apparently already working with/on C++, have you considered using Boost.Spirit? This allows you to hard-code the grammar inline in C++ as a domain-specific language and program against it in normal C++ code.