Regex: how to get words, spaces and punctuation from a string

Basically, I want to iterate over every part of a sentence. For example:

string sentence = "How was your day - Andrew, Jane?";
string[] separated = SeparateSentence(sentence);

The expected contents of separated are the following:

[1] = "How"

[2] = " "

[3] = "was"

[4] = " "

[5] = "your"

[6] = " "

[7] = "day"

[8] = " "

[9] = "-"

[10] = " "

[11] = "Andrew"

[12] = ","

[13] = " "

[14] = "Jane"

[15] = "?"

Currently I can only grab words, using the regex "\w(?<!\d)[\w'-]*". How can I separate the sentence into smaller parts, as in the example output?

Edit: The string doesn't have any of the following:

  • i.e.

  • solid-form

  • 8th, 1st, 2nd

Answers


Check this out:

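        // Requires: using System.Collections.Generic; using System.Text.RegularExpressions;

        // A single group repeated over the whole string: each repetition
        // (whitespace run, digit run, word run, or one other character)
        // is recorded as a separate Capture of group 1.
        string pattern = @"^(\s+|\d+|\w+|[^\d\s\w])+$";
        string input = "How was your 7 day - Andrew, Jane?";

        List<string> words = new List<string>();

        Regex regex = new Regex(pattern);

        // Match once and check Success instead of calling IsMatch and then Match.
        Match match = regex.Match(input);

        if (match.Success)
        {
            foreach (Capture capture in match.Groups[1].Captures)
                words.Add(capture.Value);
        }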

I suggest implementing a simple lexer (if such a thing exists) that reads the sentence one character at a time and generates the output you are looking for. It is not the simplest solution, but it scales better if your use cases get more complicated, as @AndreCalil suggested.
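A character-by-character lexer of the kind suggested here could look roughly like the sketch below. The class name SentenceLexer, the method Tokenize and the three character classes are assumptions made for illustration, not something taken from the question. It groups runs of letters, digits and underscores into words, groups runs of whitespace together, and emits every other character as its own token.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, named for illustration only.
public static class SentenceLexer
{
    private enum Kind { Word, Space, Punctuation }

    // Assumed classification: letters, digits and '_' form words,
    // whitespace forms spaces, everything else counts as punctuation.
    private static Kind Classify(char c)
    {
        if (char.IsLetterOrDigit(c) || c == '_') return Kind.Word;
        if (char.IsWhiteSpace(c)) return Kind.Space;
        return Kind.Punctuation;
    }

    public static List<string> Tokenize(string sentence)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        Kind currentKind = Kind.Word;

        foreach (char c in sentence)
        {
            Kind kind = Classify(c);

            // Close the pending token when the character class changes.
            // Punctuation always starts a new token, so "?!" gives two tokens.
            bool startsNewToken = current.Length > 0 &&
                (kind != currentKind || kind == Kind.Punctuation);

            if (startsNewToken)
            {
                tokens.Add(current.ToString());
                current.Clear();
            }

            current.Append(c);
            currentKind = kind;
        }

        // Emit whatever is still pending at the end of the input.
        if (current.Length > 0) tokens.Add(current.ToString());

        return tokens;
    }
}
```

Calling SentenceLexer.Tokenize(sentence) on the sentence from the question should yield the fifteen parts listed there. Rules for cases such as "solid-form" or "8th" would go into Classify or into the condition that starts a new token.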


Why not something like this? It's tailored to your test case, but if you add more punctuation to the character class, this might be what you're looking for.

(\w+|[,\-?])

The hyphen is escaped because an unescaped [,-?] is read as the range from ',' to '?', which also matches digits and characters such as '.', '/', ':' and ';'.

EDIT: Ah, to steal from Andre's response, this is what I was envisioning:

string pattern = @"(\w+|[,\-?])";
string input = "How was your 7 day - Andrew, Jane?";

List<string> words = new List<string>();

Regex regex = new Regex(pattern);

// Matches returns an empty collection when nothing matches,
// so a separate IsMatch check is not needed.
foreach (Match m in regex.Matches(input))
    words.Add(m.Value);
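The pattern above skips whitespace, while the question asks for the spaces as separate elements. A minimal sketch that also returns them is shown below; it assumes three alternatives (a word run, a whitespace run, or any single other character) and borrows the method name SeparateSentence from the question. The class name SentenceSplitter is made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical wrapper class, named for illustration only.
public static class SentenceSplitter
{
    // \w+     a run of word characters
    // \s+     a run of whitespace
    // [^\w\s] any single character that is neither (punctuation, symbols)
    private static readonly Regex TokenPattern = new Regex(@"\w+|\s+|[^\w\s]");

    public static string[] SeparateSentence(string sentence)
    {
        // Every character belongs to exactly one alternative, so the
        // matches cover the input from start to end without gaps.
        return TokenPattern.Matches(sentence)
                           .Cast<Match>()
                           .Select(m => m.Value)
                           .ToArray();
    }
}
```

Because the three alternatives together cover every character, concatenating the returned array gives back the original sentence, which is a quick way to check that nothing was dropped.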
