In Perl, what is a sane way to convert a string into a list of its characters?

I have been wondering whether there is a nicer, yet still concise, way to split a string into its characters.

@characters = split //, $string

is not that hard to read, but somehow the use of a regular expression looks like overkill to me.

I have come up with this:

@characters = map { substr $string, $_, 1 } 0 .. length($string) - 1

but I find it uglier and less readable. What is your preferred way of splitting that string into its characters?

Answers


Why would using a regular expression be "overkill"? Many people worry that regexes in Perl are overkill because they assume that running one always involves a complex, slow matching algorithm. That's not always true: the implementation is highly optimized, and many simple cases are treated specially, so what looks like a regex may actually perform as well as a simple substring search. I wouldn't be surprised at all if this type of split is optimized as well.

In some tests I ran, split was faster than your map, and unpack appeared to be slightly faster than split.

I recommend split because it is the "idiomatic" way. You'll find it in perldoc, in many books, and any good Perl programmer should know it (if you are not sure your audience will understand it, you can always add a comment to the code like someone suggested.)

OTOH, if regexes are "overkill" only because the syntax is ugly, then it's too subjective for me to say anything. ;-)
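
As a small illustration of the idiom with the kind of comment suggested above, here is a minimal sketch; the sample string is only an assumption for the example:

```perl
use strict;
use warnings;

my $string = "hello";   # hypothetical sample input

# An empty pattern matches between every pair of characters,
# so split returns the string one character at a time.
my @characters = split //, $string;

print join(',', @characters), "\n";   # prints h,e,l,l,o
```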


Various examples, and speed comparisons.

I thought it might be a good idea to see how fast some of the ways are to split a string on every character.

I ran the test against several versions of Perl that I happen to have on my computer.

test.pl
use 5.010;
use Benchmark qw(:all) ;
my %bench = (
   'split' => sub{
     state $string = 'x' x 1000;
     my @chars = split //, $string;
     \@chars;
   },
   'split-string' => sub{
     state $string = 'x' x 1000;
     my @chars = split '', $string;
     \@chars;
   },
   'split-capture' => sub{
     state $string = 'x' x 1000;
     my @chars = split /(.)/, $string;
     \@chars;
   },
   'unpack' => sub{
     state $string = 'x' x 1000;
     my @chars = unpack( '(a)*', $string );
     \@chars;
   },
   'match' => sub{
     state $string = 'x' x 1000;
     my @chars = $string =~ /./gs;
     \@chars;
   },
   'match-capture' => sub{
     state $string = 'x' x 1000;
     my @chars = $string =~ /(.)/gs;
     \@chars;
   },
   'map-substr' => sub{
     state $string = 'x' x 1000;
     my @chars = map { substr $string, $_, 1 } 0 .. length($string) - 1;
     \@chars;
   },
);
# set the initial state of $string
$_->() for values %bench;
cmpthese( -10, \%bench );

Shell loop used to run it with each perl:
for perl in /usr/bin/perl /opt/perl-5.10.1/bin/perl /opt/perl-5.11.2/bin/perl;
do
  $perl -v | perl -nlE'if( /(v5\.\d+\.\d+)/ ){
    say "## Perl $1";
    say "<pre>";
    last;
  }';
  $perl test.pl;
  echo -e '</pre>\n';
done
Perl v5.10.0
               Rate split-capture match-capture map-substr match unpack split split-string
split-capture 296/s            --          -20%       -20%  -23%   -58%  -63%         -63%
match-capture 368/s           24%            --        -0%   -4%   -48%  -54%         -54%
map-substr    370/s           25%            0%         --   -3%   -48%  -53%         -54%
match         382/s           29%            4%         3%    --   -46%  -52%         -52%
unpack        709/s          140%           93%        92%   86%     --  -11%         -11%
split         793/s          168%          115%       114%  107%    12%    --          -0%
split-string  795/s          169%          116%       115%  108%    12%    0%           --
Perl v5.10.1
               Rate split-capture map-substr match-capture match unpack split split-string
split-capture 301/s            --       -31%          -41%  -47%   -60%  -65%         -66%
map-substr    435/s           45%         --          -14%  -23%   -42%  -50%         -50%
match-capture 506/s           68%        16%            --  -10%   -32%  -42%         -42%
match         565/s           88%        30%           12%    --   -24%  -35%         -35%
unpack        743/s          147%        71%           47%   32%     --  -15%         -15%
split         869/s          189%       100%           72%   54%    17%    --          -1%
split-string  875/s          191%       101%           73%   55%    18%    1%           --
Perl v5.11.2
               Rate split-capture match-capture match map-substr unpack split-string split
split-capture 300/s            --          -28%  -32%       -38%   -59%         -63%  -63%
match-capture 420/s           40%            --   -5%       -13%   -42%         -48%  -49%
match         441/s           47%            5%    --        -9%   -39%         -46%  -46%
map-substr    482/s           60%           15%    9%         --   -34%         -41%  -41%
unpack        727/s          142%           73%   65%        51%     --         -10%  -11%
split-string  811/s          170%           93%   84%        68%    12%           --   -1%
split         816/s          171%           94%   85%        69%    12%           1%    --

As you can see, split is the quickest in all three versions (roughly 12-17% faster than unpack), owing to the fact that this is a special case in the code for split.

split-capture is the slowest, probably because it has to set $1, along with several other match variables.

So I would recommend going with plain old split //, ..., or the roughly equivalent split '', ....
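
One thing worth checking about the split-capture variant: when the pattern contains a capture group, split also returns the captured separators, so the result is not the same list as with split //. That extra work may also contribute to its slower timing. A minimal sketch, using a short sample string chosen only for illustration:

```perl
use strict;
use warnings;

my $string = 'abc';   # hypothetical sample input

my @plain   = split //,    $string;
my @capture = split /(.)/, $string;

# @plain is ('a', 'b', 'c').
# @capture is ('', 'a', '', 'b', '', 'c'): an empty field before each
# separator, followed by the captured separator itself.
my @filtered = grep { length } @capture;

printf "%d plain, %d capture, %d filtered\n",
    scalar @plain, scalar @capture, scalar @filtered;
# prints: 3 plain, 6 capture, 3 filtered
```

So split /(.)/ would need a grep like the one above before it could stand in for split //.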


It doesn't get much clearer than using the split function to split a string. I suppose you could argue that the null pattern is unintuitive; though I find it clear enough. If you want a "clean" alternative wrap it in a sub:

my @characters = chars($string);
sub chars { split //, $_[0] }

For less readable and more concise (and still with regex overkill):

@characters = $string =~ /./g;

(I learned this idiom from playing code-golf.)
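
One caveat with this idiom: without the /s modifier, . does not match a newline, so newlines are silently dropped from the result. A short sketch comparing the variants on an assumed sample string:

```perl
use strict;
use warnings;

my $string = "a\nb";   # hypothetical input containing a newline

my @without_s = $string =~ /./g;     # "." skips the newline
my @with_s    = $string =~ /./gs;    # /s lets "." match "\n" too
my @via_split = split //, $string;   # split // keeps every character

printf "%d %d %d\n",
    scalar @without_s, scalar @with_s, scalar @via_split;   # prints 2 3 3
```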


You're right. The standard way to do it is split //, $string. To make code more readable you can create a simple function:

sub get_characters {
    my ($string) = @_;
    return ( split //, $string );
}

@characters = get_characters($string);

I prefer using the split technique. It is well-known, and it is documented.

Yet another way...

@characters = $string =~ /./gs;

Use split with a null pattern to break up the string into individual characters:

@characters = split //, $string;

If you just want the char codes, use unpack:

@values = unpack("C*", $string);

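Note that use utf8 does not change what unpack does; it only tells Perl that the source file itself is written in UTF-8. The C template yields octet values, so this is reliable only when every character in the string is below 256. You can also use unpack + chr to split the string into individual characters (TMTOWTDI):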

@characters = map chr, unpack("C*", $string);
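
For strings that may contain characters above 255, a hedged alternative is to take the code points with ord, or with the W pack template (available from Perl 5.10). The sample string below is an assumption made for the example:

```perl
use strict;
use warnings;

# Built with escapes so the example does not depend on the source encoding.
my $string = "caf\x{E9} \x{263A}";   # hypothetical sample: U+00E9 and U+263A

my @characters = split //, $string;
my @via_ord    = map { ord } @characters;
my @via_unpack = unpack 'W*', $string;   # W: character value, may exceed 255

# Rebuild the characters from their code points.
my @rebuilt = map { chr } @via_unpack;

print join(' ', map { sprintf 'U+%04X', $_ } @via_ord), "\n";
# prints: U+0063 U+0061 U+0066 U+00E9 U+0020 U+263A
```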
