Regular expressions are one of the most overlooked tools in programming. A simple Google search for form validation returns many cookbook-style tutorials that only make things worse by showing an ugly finished expression (or an even uglier string-parsing function). Because it’s so easy to shy away from regular expressions when you only see them finished, I’ve decided to demonstrate the construction of one step by step. The expression will validate an hh:mm:ss time pattern. Here it is:
^((\d)|(0\d)|(1\d)|(2[0-3]))\:((\d)|([0-5]\d))\:((\d)|([0-5]\d))$
At first glance it looks ugly, but when you consider that it validates the whole time pattern in a single line, it starts to look better. Let’s proceed to the step-by-step construction.
First, let’s start with the body of the expression. You don’t want to allow anything before or after the time itself, so you delimit the expression with ^ (which matches the beginning of the string) and $ (which matches the end of the string):
^$
The result is an expression that matches only an empty string.
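A quick way to check this outside the browser is Python’s re module. The sketch below assumes Python 3 and uses a hypothetical helper, is_empty, to confirm that ^$ accepts only the empty string:

```python
import re

# ^ anchors at the start of the string and $ at the end; with nothing
# between them, the pattern can only succeed on an empty string.
EMPTY = re.compile(r"^$")

def is_empty(text: str) -> bool:
    return EMPTY.search(text) is not None

for sample in ["", "a", "12:30:00"]:
    print(repr(sample), is_empty(sample))
```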
If you used only ^ or only $, the expression would match any string at all, because every string has a beginning and an end. That is not what you want:
Pattern: ^
Pattern: $
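The same kind of check shows why a lone anchor is useless as a validator. In this sketch (Python 3 re module, names chosen for illustration), both patterns succeed on every input, because an anchor matches a position rather than any characters:

```python
import re

# A lone anchor is a zero-width match: every string has a start and an end,
# so these patterns succeed on any input, including the empty string.
START_ONLY = re.compile(r"^")
END_ONLY = re.compile(r"$")

samples = ["", "abc", "99:99:99"]
print([START_ONLY.search(s) is not None for s in samples])  # [True, True, True]
print([END_ONLY.search(s) is not None for s in samples])    # [True, True, True]
```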
When creating regular expressions, it helps to think in terms of cases. The first case is the hour value. It can be any value from 00 to 23, and we must consider the single-digit cases too (0 to 9). Let’s create an expression for each case and test it. The first case is 0-9:
The user may type just a single digit, so we must be prepared for this case. A regular expression that matches it is \d, which stands for “any digit”. The ^\d$ expression matches any string that consists of exactly one digit:
Pattern: ^\d$
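Here is a sketch of the single-digit check in Python 3, assuming the re module and a hypothetical is_single_digit helper. One Python-specific detail: without the re.ASCII flag, \d also matches non-ASCII decimal digits, so the flag is set here to keep \d equivalent to 0-9:

```python
import re

# re.ASCII restricts \d to 0-9; without it, Python's \d also matches other
# Unicode decimal digits (for example Arabic-Indic digits).
SINGLE_DIGIT = re.compile(r"^\d$", re.ASCII)

def is_single_digit(text: str) -> bool:
    return SINGLE_DIGIT.search(text) is not None

print([is_single_digit(s) for s in ["0", "7", "10", "a", ""]])
# [True, True, False, False, False]
```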
Besides the numbers 0-9, the user can also enter a two-digit number such as 18, but we must restrict this range to 00 through 23. Because regular expressions work character by character (and not on numeric values), we must think about which characters can vary and how they can vary. You can’t, for instance, use \d\d, because that would let numbers like 45 match. Nor can you use [0-2][0-3], because that would stop numbers like 19 from matching. It’s time to subdivide the range into groups, which are the following:
0\d – allows numbers from 00 to 09 to match
1\d – allows numbers from 10 to 19 to match
2[0-3] – allows numbers from 20 to 23 to match
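The two rejected shortcuts can be checked directly. This Python 3 sketch (pattern names are illustrative) shows \d\d accepting 45 and [0-2][0-3] rejecting 19:

```python
import re

TOO_LOOSE = re.compile(r"^\d\d$", re.ASCII)  # any two digits
TOO_STRICT = re.compile(r"^[0-2][0-3]$")     # second digit capped at 3

def matches(pattern: re.Pattern, text: str) -> bool:
    return pattern.search(text) is not None

print(matches(TOO_LOOSE, "45"))   # True, although 45 is not a valid hour
print(matches(TOO_STRICT, "19"))  # False, although 19 is a valid hour
```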
When building regular expressions, it’s good practice to test each part on its own to be sure it works:
Pattern: ^0\d$
Pattern: ^1\d$
Pattern: ^2[0-3]$
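Testing the groups one by one can be automated by feeding every two-digit string from 00 to 99 to each pattern and recording what it accepts. This is a sketch in Python 3; CASES is a hypothetical name, and the assertion fails if a group accepts anything outside its intended range:

```python
import re

# Each group paired with the numbers it is supposed to accept.
CASES = {
    r"^0\d$": range(0, 10),      # 00-09
    r"^1\d$": range(10, 20),     # 10-19
    r"^2[0-3]$": range(20, 24),  # 20-23
}

for pattern, expected in CASES.items():
    regex = re.compile(pattern, re.ASCII)
    # Try every two-digit string and collect the numbers the pattern accepts.
    accepted = [n for n in range(100) if regex.search(f"{n:02d}")]
    assert accepted == list(expected), pattern
    print(pattern, "accepts", f"{accepted[0]:02d}-{accepted[-1]:02d}")
```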
Now we can validate any hour value by grouping all the cases so far. Grouping is intuitive: (...) groups a subexpression and | alternates between groups:
^((\d)|(0\d)|(1\d)|(2[0-3]))$
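A sketch of the combined hour check in Python 3 (the is_hour helper and the sample lists are illustrative assumptions). It accepts every single-digit hour and every two-digit hour from 00 to 23, and rejects values such as 24 or three-digit input:

```python
import re

# The alternation tries each group in turn; ^ and $ force one group
# to cover the whole string.
HOUR = re.compile(r"^((\d)|(0\d)|(1\d)|(2[0-3]))$", re.ASCII)

def is_hour(text: str) -> bool:
    return HOUR.search(text) is not None

valid = [str(h) for h in range(10)] + [f"{h:02d}" for h in range(24)]
invalid = ["24", "30", "99", "123", "", "1a"]

assert all(is_hour(s) for s in valid)
assert not any(is_hour(s) for s in invalid)
```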
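Time to include the : separator. On its own, a colon has no special meaning in regular expressions, so a plain : would work too. This expression escapes it as \:, which most flavors accept as a literal colon (JavaScript’s u flag is an exception, because it rejects unnecessary escapes):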
Pattern: ^\:$
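To see that escaping makes no difference in Python’s re module, this sketch compares \: with a bare : on a few inputs (Python 3 assumed; behavior in other regex flavors may differ):

```python
import re

# In Python's re module a colon needs no escape; "\:" is accepted and
# treated as a literal colon, so both patterns behave the same.
ESCAPED = re.compile(r"^\:$")
LITERAL = re.compile(r"^:$")

for sample in [":", "::", ";", ""]:
    assert bool(ESCAPED.search(sample)) == bool(LITERAL.search(sample))

print(bool(LITERAL.search(":")))  # True
```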
Moving on to the minute and second values, the digits range from 00 to 59, and we should include the single-digit values as well. The minutes case is much easier because the second digit can be anything from 0 to 9. The [0-5]\d expression does the trick:
Pattern: ^[0-5]\d$
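A Python 3 sketch that runs [0-5]\d against every string from 00 to 99 (names are illustrative). It accepts exactly 00 through 59 and, on its own, still rejects a bare single digit:

```python
import re

TWO_DIGITS = re.compile(r"^[0-5]\d$", re.ASCII)

# Check every string from "00" to "99" and keep the ones that match.
accepted = [f"{n:02d}" for n in range(100) if TWO_DIGITS.search(f"{n:02d}")]
print(accepted[0], accepted[-1], len(accepted))  # 00 59 60

# A bare single digit is not covered yet.
print(TWO_DIGITS.search("5") is None)  # True
```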
Joining this expression with the single digit rule yields the following expression:
^((\d)|([0-5]\d))$
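Because minutes and seconds share the same range, one compiled pattern can check both fields. A sketch in Python 3, with a hypothetical is_minute_or_second helper:

```python
import re

# One pattern serves both fields, since minutes and seconds share a range.
MINUTE_OR_SECOND = re.compile(r"^((\d)|([0-5]\d))$", re.ASCII)

def is_minute_or_second(text: str) -> bool:
    return MINUTE_OR_SECOND.search(text) is not None

single = [str(n) for n in range(10)]      # "0" .. "9"
double = [f"{n:02d}" for n in range(60)]  # "00" .. "59"
assert all(is_minute_or_second(s) for s in single + double)
assert not any(is_minute_or_second(s) for s in ["60", "99", "123", "-1", ""])
```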
With this expression ready, the seconds value is handled as well, since the seconds rule is the same as the minutes rule. Putting it all together, we get the final expression:
^((\d)|(0\d)|(1\d)|(2[0-3]))\:((\d)|([0-5]\d))\:((\d)|([0-5]\d))$
The final hh:mm:ss validation pattern
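Assembling the pieces in code makes the structure of the final expression explicit. This Python 3 sketch (names such as HOUR and is_valid_time are illustrative, not from the original) builds the pattern from the hour and minute/second sub-expressions, checks that the result equals the final expression, and wraps it in a validator:

```python
import re

# Sub-patterns built in the previous steps; seconds reuse the minutes rule.
HOUR = r"((\d)|(0\d)|(1\d)|(2[0-3]))"
MINUTE_OR_SECOND = r"((\d)|([0-5]\d))"

# Anchor the pieces and join them with escaped colons.
TIME_PATTERN = "^" + HOUR + r"\:" + MINUTE_OR_SECOND + r"\:" + MINUTE_OR_SECOND + "$"
assert TIME_PATTERN == r"^((\d)|(0\d)|(1\d)|(2[0-3]))\:((\d)|([0-5]\d))\:((\d)|([0-5]\d))$"

TIME = re.compile(TIME_PATTERN, re.ASCII)

def is_valid_time(text: str) -> bool:
    # fullmatch requires the whole string to match. A plain search would
    # accept "12:30:00\n", because Python's $ also matches before a final newline.
    return TIME.fullmatch(text) is not None

for sample in ["23:59:59", "0:0:0", "09:05:30", "24:00:00", "12:60:00", "12:30", "12:30:00\n"]:
    print(repr(sample), is_valid_time(sample))
```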