\w* and \w+

Question

I don't quite get why we didn't receive ', Tim' from print(re.findall(r'\w+, \w+', data)), but did with print(re.findall(r'\w*, \w+', data)). I know that \w* matches zero or more Unicode word characters, and a comma isn't one, so why does the result include the comma before Tim? Shouldn't it print just Tim?

Answer 1 · 2019-08-09T16:58:32Z


Good question! Any string that is returned must match the entire pattern. The pattern "\w*, \w+" says "zero or more word characters, followed by a comma and a space, followed by one or more word characters". So the "comma space" must be part of the match and is therefore part of the returned string.

The pattern "\w+, \w+" says "one or more word characters, followed by a comma and a space, followed by one or more word characters". Without a word character immediately preceding the "comma space", the text ", Tim" does not match the pattern.
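Here is a small sketch of the difference between the two patterns. The data string below is made up for illustration (it is not the course's actual data file), so treat the names in it as assumptions.

```python
import re

# Hypothetical sample data: the second line has no last name before the comma.
data = "Love, Kenneth\n, Tim\nChalkley, Andrew"

# \w+ requires at least one word character before ", ".
strict = re.findall(r'\w+, \w+', data)

# \w* allows zero word characters before ", ",
# but the comma and space are still required parts of the match.
loose = re.findall(r'\w*, \w+', data)

print(strict)  # ['Love, Kenneth', 'Chalkley, Andrew']
print(loose)   # ['Love, Kenneth', ', Tim', 'Chalkley, Andrew']
```

Note that findall returns the whole matched text when the pattern has no groups, which is why the comma and space show up in ', Tim'.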

In later videos, you will learn about ^, which anchors the pattern to the beginning of the string. This allows catching ", Tim" while excluding other matches within the string.
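As a rough preview of anchoring, here is a sketch using made-up data. It assumes the re.MULTILINE flag, which makes ^ also match at the start of each line. Without that flag, ^ only matches at the very start of the whole string.

```python
import re

# Hypothetical sample data: each line also has a "comma space" later on.
data = "Love, Kenneth\tTeacher, Treehouse\n, Tim\tDeveloper, Treehouse"

# Unanchored: matches anywhere in the string.
unanchored = re.findall(r'\w*, \w+', data)

# Anchored with re.MULTILINE: only matches that begin a line are kept.
anchored = re.findall(r'^\w*, \w+', data, re.MULTILINE)

# Anchored without the flag: only a match at the start of the string.
string_start_only = re.findall(r'^\w*, \w+', data)

print(unanchored)
# ['Love, Kenneth', 'Teacher, Treehouse', ', Tim', 'Developer, Treehouse']
print(anchored)           # ['Love, Kenneth', ', Tim']
print(string_start_only)  # ['Love, Kenneth']
```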

Post back if you have more questions. Good Luck!!

Anthony Grodowski

Chris Freeman
