C# Regex to remove C style comments and extract text between brackets

question:

I need to automatically extract all the name properties from this JavaScript file (separately for providers_large and providers_small):

/*
    Simple OpenID Plugin
    http://code.google.com/p/openid-selector/

    This code is licensed under the New BSD License.
*/

var providers_large = {
    google : {
        name : 'Google',
        url : 'https://www.google.com/accounts/o8/id'
    },
    yahoo : {
        name : 'Yahoo',
        url : 'http://me.yahoo.com/'
    },
    aol : {
        name : 'AOL',
        label : 'Enter your AOL screenname.',
        url : 'http://openid.aol.com/{username}'
    },
    myopenid : {
        name : 'MyOpenID',
        label : 'Enter your MyOpenID username.',
        url : 'http://{username}.myopenid.com/'
    },
    openid : {
        name : 'OpenID',
        label : 'Enter your OpenID.',
        url : null
    }
};

var providers_small = {
    livejournal : {
        name : 'LiveJournal',
        label : 'Enter your Livejournal username.',
        url : 'http://{username}.livejournal.com/'
    },
    /* flickr: {
        name: 'Flickr',        
        label: 'Enter your Flickr username.',
        url: 'http://flickr.com/{username}/'
    }, */
    /* technorati: {
        name: 'Technorati',
        label: 'Enter your Technorati username.',
        url: 'http://technorati.com/people/technorati/{username}/'
    }, */
    wordpress : {
        name : 'Wordpress',
        label : 'Enter your Wordpress.com username.',
        url : 'http://{username}.wordpress.com/'
    },
    blogger : {
        name : 'Blogger',
        label : 'Your Blogger account',
        url : 'http://{username}.blogspot.com/'
    },
    verisign : {
        name : 'Verisign',
        label : 'Your Verisign username',
        url : 'http://{username}.pip.verisignlabs.com/'
    },
    /* vidoop: {
        name: 'Vidoop',
        label: 'Your Vidoop username',
        url: 'http://{username}.myvidoop.com/'
    }, */
    /* launchpad: {
        name: 'Launchpad',
        label: 'Your Launchpad username',
        url: 'https://launchpad.net/~{username}'
    }, */
    claimid : {
        name : 'ClaimID',
        label : 'Your ClaimID username',
        url : 'http://claimid.com/{username}'
    },
    clickpass : {
        name : 'ClickPass',
        label : 'Enter your ClickPass username',
        url : 'http://clickpass.com/public/{username}'
    },
    google_profile : {
        name : 'Google Profile',
        label : 'Enter your Google Profile username',
        url : 'http://www.google.com/profiles/{username}'
    }
};

openid.locale = 'en';
openid.sprite = 'en'; // reused in german& japan localization
openid.demo_text = 'In client demo mode. Normally would have submitted OpenID:';
openid.signin_text = 'Sign-In';
openid.image_title = 'log in with {provider}';

So I need to: A) Remove all the C-Style comments and B) Get all the name values for [providers_large, providers_small] (after the comments have been removed)

So far I have tried regex to remove C-Style comments (and failed) and regex to get everything between curly braces (and failed)

I subsequently tried to read it in as JSON, but this of course failed with an "invalid JSON primitive" error.

These are the Stack Overflow pages I used (linked in the code comments) and the examples I have tried so far:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;


namespace ConsoleExperiments
{

    public class Program
    {

        // http://stackoverflow.com/questions/2538279/strip-out-c-style-multi-line-comments
        // NOT working
        static string RemoveCstyleComments(string strInput)
        {
            string strPattern = @"/[*][\w\d\s]+[*]/";
            //strPattern = @"/\*.*?\*/";
            strPattern = "/\\*.*?\\*/";

            string strOutput = System.Text.RegularExpressions.Regex.Replace(strInput, strPattern, string.Empty, System.Text.RegularExpressions.RegexOptions.Multiline);
            Console.WriteLine(strOutput);
            return strOutput;
        }


        // http://stackoverflow.com/questions/413071/regex-to-get-string-between-curly-braces-i-want-whats-between-the-curly-brace
        // http://stackoverflow.com/questions/5337166/regular-expression-get-string-between-curly-braces
        // http://stackoverflow.com/questions/1904617/regex-for-removing-curly-brackets-with-nested-curly-brackets
        // http://stackoverflow.com/questions/378415/how-do-i-extract-a-string-of-text-that-lies-between-two-brackets-using-net
        static string GetCurlyValues(string strInput)
        {
            string strPattern = "/{(.*?)}/";
            strPattern = "/{([^}]*)}/";
            strPattern = @"\{(\s*?.*?)*?\}";
            strPattern = @"(?<=\{).*(?=\})";
            strPattern = "{(.*{(.*)}.*)}";
            strPattern = "{{([^}]*)}}";
            strPattern = "{{({?}?[^{}])*}}";
            strPattern = @"\(([^)]*)\)";

            System.Text.RegularExpressions.Regex rex = new System.Text.RegularExpressions.Regex(strPattern, System.Text.RegularExpressions.RegexOptions.Multiline);

            System.Text.RegularExpressions.Match mMatch = rex.Match(strInput);

            foreach (System.Text.RegularExpressions.Group g in mMatch.Groups)
            {
                Console.WriteLine("Group: " + g.Value);
                foreach (System.Text.RegularExpressions.Capture c in g.Captures)
                {
                    Console.WriteLine("Capture: " + c.Value);
                }
            }

            return "";
        }


        static void ReadFile()
        {
            try
            {
                string strFilePath = @"TestFile.txt";
                if (System.IO.File.Exists(strFilePath))
                {
                    // Create an instance of StreamReader to read from a file.
                    // The using statement also closes the StreamReader.
                    using (System.IO.StreamReader sr = new System.IO.StreamReader(strFilePath))
                    {
                        string line;
                        // Read and display lines from the file until the end of
                        // the file is reached.
                        while ((line = sr.ReadLine()) != null)
                        {
                            Console.WriteLine(line);
                        } // Whend

                        sr.Close();
                    } // End Using

                } // End if (System.IO.File.Exists(strFilePath))
                else
                    Console.WriteLine("File \"" + strFilePath + "\" does not exist.");
            } // End Try
            catch (Exception e)
            {
                // Let the user know what went wrong.
                Console.WriteLine("The file could not be read:");
                Console.WriteLine(e.Message);
            } // End Catch

        } // End Sub

        public class cProvider
        {
            public string name = "abc";
            public string label ="def";
            public string url ="url";
        }


        public class cProviders_large
        {
            public List<cProvider> foo = new List<cProvider>();
        }


        static void Main(string[] args)
        {
            string strContent = System.IO.File.ReadAllText(@"D:\UserName\Downloads\openid-selector-1.3\openid-selector\js\openid-en - Kopie.js.txt");
            Console.WriteLine(strContent);
            //RemoveCstyleComments(strContent);
            //GetCurlyValues(strContent);
            System.Web.Script.Serialization.JavaScriptSerializer js = new System.Web.Script.Serialization.JavaScriptSerializer();
            //object obj = js.DeserializeObject(strContent);

            cProviders_large xx = new cProviders_large();
            cProvider ap = new cProvider();
            xx.foo.Add(ap);
            xx.foo.Add(ap);

            string res = js.Serialize(xx);
            Console.WriteLine(res);


            Console.WriteLine(Environment.NewLine);
            Console.WriteLine(" --- Press any key to continue --- ");
            Console.ReadKey();
        } // End Sub Main

    } // End Class Program


} // End namespace ConsoleExperiments

Could anybody who understands regex better than I do provide the necessary regular expressions? Right now, it looks like I will end up doing it by hand every time the file changes, and I really hate that.

Edit: As a side note, the V8 wrapper is written in C++/CLI and thus doesn't work on Linux, although the V8 engine itself works very well on Linux.

So I'm sticking to solving the problem via JSON conversion.

Answers


You could use a JavaScript engine to run the file and read the objects back:

using System;
using System.IO;
using Noesis.Javascript;

class Program
{
    static void Main()
    {
        var context = new JavascriptContext();
        context.SetParameter("openid", new object());
        context.Run(File.ReadAllText("test.js"));
        dynamic providers_large = context.GetParameter("providers_large");
        foreach (var provider in providers_large)
        {
            Console.WriteLine(
                "name: {0}, url: {1}", 
                provider.Value["name"], 
                provider.Value["url"]
            );
        }
    }
}

This prints the following on my console:

name: Google, url: https://www.google.com/accounts/o8/id
name: Yahoo, url: http://me.yahoo.com/
name: AOL, url: http://openid.aol.com/{username}
name: MyOpenID, url: http://{username}.myopenid.com/
name: OpenID, url:
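
If a regex-only approach is preferred (for example because the wrapper is not available on Linux), the following is a minimal sketch and not a general JavaScript parser. The RemoveCstyleComments attempt in the question most likely fails because RegexOptions.Multiline only changes the meaning of ^ and $; RegexOptions.Singleline is the option that lets . match newlines. The class and method names below are hypothetical, and the sketch assumes the file layout shown in the question: single-quoted values, no /* inside string literals, and each top-level object closed by "};".

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class ProviderNameExtractor
{
    // Removes /* ... */ block comments. Singleline makes '.' match newlines,
    // so a comment spanning several lines is removed as one match.
    // "//" line comments are left alone, because a naive pattern for them
    // would also cut the 'http://...' URLs inside the string values.
    public static string RemoveBlockComments(string source)
    {
        return Regex.Replace(source, @"/\*.*?\*/", string.Empty, RegexOptions.Singleline);
    }

    // Returns the values of all "name : '...'" properties inside
    // "var <variableName> = { ... };" (assumes comments were removed first).
    public static List<string> GetNames(string source, string variableName)
    {
        var names = new List<string>();

        // Lazily match up to the first '}' that is followed by ';'.
        // Inner objects end with "}," or "}" + newline, so they are skipped.
        string blockPattern =
            @"var\s+" + Regex.Escape(variableName) + @"\s*=\s*\{(.*?)\}\s*;";
        Match block = Regex.Match(source, blockPattern, RegexOptions.Singleline);
        if (!block.Success)
        {
            return names;
        }

        // \b keeps "name" from matching inside "username" or "screenname".
        foreach (Match m in Regex.Matches(block.Groups[1].Value, @"\bname\s*:\s*'([^']*)'"))
        {
            names.Add(m.Groups[1].Value);
        }
        return names;
    }
}
```

Usage would be along the lines of calling RemoveBlockComments on the file contents first and then GetNames(cleaned, "providers_large") and GetNames(cleaned, "providers_small"). Because the comments are stripped beforehand, the commented-out providers (flickr, technorati, vidoop, launchpad) should not show up in the result. If the file format ever changes (double quotes, nested "};" sequences), the JavaScript engine approach above is the more robust choice.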
