Regular Expressions: Groups

In Python, you can capture groups of characters with a regular expression as follows.

>>> import re
>>> print("Match is '"
... + re.search('\\s([a-z]+)\\s',
... 'My text string.').group(1) + "'")
Match is 'text'
>>> 

This is quite straightforward. In C#, you can write something similar.
using System.Text.RegularExpressions;
namespace MhNeifer.Samples.CSharp {
    public class MyRegex {
        static void Main() {
            Regex rgx = new Regex(@"\s([a-z]+)\s");
            System.Console.WriteLine("Match is '"
                           + rgx.Matches("My text string.")[0].Groups[1].Value
                           + "'");
        }
    }
}
If you ignore that C# is wordier in general (namespace, class definition and so on), this is straightforward as well.
I thought that in Java it would be straightforward too. But it seems that there’s a catch. Or I’m too dumb to see the simple solution. I found the following.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Regex {
    public static void main(String[] args) {
        Pattern p = Pattern.compile(".*\\s([a-z]+)\\s.*");
        Matcher m = p.matcher("My text string.");
        m.matches();
        System.out.println("Match is '" + m.group(1) + "'");
    }
}
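While it looks straightforward, there is a catch. You have to perform a successful match operation, such as Matcher.matches(), before calling Matcher.group(), or you get an IllegalStateException. I was surprised by this. Please also note the '.*' at the beginning and the end of the regular expression. Matcher.matches() succeeds only if the pattern matches the whole string, so with it you have to write a regular expression that covers the entire input. For me, this took a while to remember. If you only want to search for the pattern somewhere in the string, as re.search does in Python, you can call Matcher.find() instead, and the '.*' padding is no longer needed.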
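
For comparison, here is a minimal sketch of the same lookup with Matcher.find(), which searches for the pattern anywhere in the input instead of requiring a whole-string match. The class name RegexFind and the helper firstWord are made up for this illustration; only Pattern, Matcher, find() and group() come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sample class, not part of the original post.
class RegexFind {
    // Same pattern as in the Python and C# samples, without the ".*" padding.
    private static final Pattern WORD = Pattern.compile("\\s([a-z]+)\\s");

    // Returns the first captured group, or null if the pattern occurs nowhere in the input.
    static String firstWord(String input) {
        Matcher m = WORD.matcher(input);
        // find() scans for the next occurrence; group() may only be called after it returned true.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println("Match is '" + firstWord("My text string.") + "'");
    }
}
```

Run as is, this prints Match is 'text'. Checking the return value of find() before calling group() also avoids the exception when the input contains no match at all.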
