How Can I Run a Regex that Tests Text for Characters in a Particular Alphabet or Script?

I'd like to make a regex in Perl that will test a string for characters in a particular script. This would be something like:

$text =~ .*P{'Chinese'}.*

Is there a simple way of doing this? For English it's pretty easy: just test for [a-zA-Z]. But for a script like Chinese, or one of the Japanese scripts, I can't figure out any way of doing this short of writing out every character explicitly, which would make for some very ugly code. Ideas? I can't be the first or only person who has wanted to do this.


Look at perldoc perluniprops, which provides an exhaustive list of the properties you can use with \p. You’ll be interested in \p{CJK_Unified_Ideographs} and related properties such as \p{CJK_Symbols_And_Punctuation}; these match Unicode blocks. \p{Hiragana} and \p{Katakana} give you the kana. There is also a \p{Script=...} property for a number of scripts: \p{Han} and \p{Script=Han} match Han characters (used to write Chinese, and as kanji in Japanese). There is no corresponding \p{Script=Japanese}, quite simply because Japanese is written with multiple scripts.
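As a rough sketch of how these properties might be used, the snippet below tests strings against \p{Han}, \p{Hiragana} and \p{Katakana}. The helper names scripts_in and looks_japanese and the sample strings are made up for illustration, and the code assumes the text is already a decoded character string (here via use utf8 for literals) rather than raw bytes.

```perl
use strict;
use warnings;
use utf8;                               # string literals in this file are UTF-8
binmode STDOUT, ':encoding(UTF-8)';     # so the samples print cleanly

# Hypothetical helper: list which of the tested scripts occur in $text.
sub scripts_in {
    my ($text) = @_;
    my @found;
    push @found, 'Han'      if $text =~ /\p{Han}/;
    push @found, 'Hiragana' if $text =~ /\p{Hiragana}/;
    push @found, 'Katakana' if $text =~ /\p{Katakana}/;
    return @found;
}

# Hypothetical helper: true if $text contains at least one character from
# any of the three scripts used to write Japanese. Han alone cannot tell
# Japanese from Chinese, so this is a loose check only.
sub looks_japanese {
    my ($text) = @_;
    return $text =~ /[\p{Han}\p{Hiragana}\p{Katakana}]/ ? 1 : 0;
}

# Made-up sample strings.
for my $text ("日本語のテキスト", "hello") {
    my @scripts = scripts_in($text);
    printf "%s: %s\n", $text,
        @scripts ? join(', ', @scripts) : 'none of the tested scripts';
}
```

A single character class such as [\p{Hiragana}\p{Katakana}] is one way to treat both kana as a unit. To require that the whole string be in one script rather than merely contain it, anchor the pattern, for example /\A\p{Han}+\z/.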

