There is a joke about regular expressions that goes like this: “Once, when a person was confronted with a problem, they thought, ‘I know, I’ll use regular expressions.’ Then they had two problems.” Unfortunately, this is likely true in most cases. Regular expressions are far from being a common skill, and frequently the harder a person thinks about the problem, the more exaggerated and complex the regex becomes. Taking the following steps when designing your solution will make it much simpler.
A complex Regular Expression:
^[a-zA-Z0-9\.]+@[a-zA-Z0-9\.]+\.[a-zA-Z]{2,4}$|^[a-zA-Z0-9\.]+@[a-zA-Z0-9\.]+\.[a-zA-Z]{2,4}([;,]\s[a-zA-Z0-9\.]+@[a-zA-Z0-9\.]+\.[a-zA-Z]{2,4})+$
The Steps
Clearly Define Your Objective
Understand what you’re trying to do with your regular expression, and whether it’s the correct tool for the job. If you’re trying to match balanced parentheses, a regex will not do what you need. Understand exactly what it is you need to match. This may seem obvious, but there can be an order of magnitude of difference in complexity between the objective of “Match an email address” and “Match an RFC 2822 email address”. Know where you need to draw the line.
For the example, we will be trying to validate an email field. The field can contain one email address, or more than one email address separated by semicolons or commas. Furthermore, the addresses can contain periods. So “john.smith@whateva.com” is valid, and “Phillip@whateva.com, Margret.T.Meed@otherdomain.com; JonesSmith85@some.other.com” is also valid. We won’t hold ourselves to the standard for email, just the cases we have above. The email is being entered for the person’s own benefit; the validation is just helping out with obvious typos. Know what you need to and don’t need to accomplish. As always when programming, it helps to have a clearly defined scope.
Break Your Problem Into Pieces
This is true for any problem, whether it involves programming, regular expressions, or neither. Find the smaller pieces that your problem is made of and solve it piece by piece. I’ll show this by using substitution in my example of the steps. The key is to find the correct parts so that you can design them, test them, and reuse them in the larger problem.
So we want to validate “john.smith@whateva.com” and “Phillip@whateva.com, Margret.T.Meed@otherdomain.com; JonesSmith85@some.other.com”. Step one is to see that we are trying to match emails. So let’s rewrite this with placeholders: “email” and “email, email; email”. Broken down, it’s much less daunting. Now, can we break this down any further? How about an email itself? Think charactersAndPeriods@charactersAndPeriods.domain. This produces two parts that we can reuse and a guide for how to put it all together.
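As a rough illustration of that breakdown, the two reusable parts can be written down as small named patterns and substituted into the guide. This is only a sketch in Python: the names `CHARS_AND_PERIODS`, `DOMAIN`, and `EMAIL` are hypothetical, and the character classes are assumptions based on the example addresses, not a standard for email.

```python
import re

# Hypothetical names for the two reusable parts (not from the original post).
CHARS_AND_PERIODS = r"[a-zA-Z0-9.]+"  # letters, digits, and periods
DOMAIN = r"[a-zA-Z]{2,4}"             # the trailing piece, such as "com"

# The guide: charactersAndPeriods@charactersAndPeriods.domain
EMAIL = f"{CHARS_AND_PERIODS}@{CHARS_AND_PERIODS}\\.{DOMAIN}"

print(EMAIL)  # [a-zA-Z0-9.]+@[a-zA-Z0-9.]+\.[a-zA-Z]{2,4}
```

Keeping each part in its own variable means a change to one piece, such as allowing longer domains, happens in exactly one place.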
Test the Parts of the Whole
Once you have the problem broken down into smaller parts and have regular expressions for matching them, test those parts. Testing and debugging the smaller pieces will be much faster than working out why your larger expression is failing, especially when your problem is not trivial.
So now we develop our regular expressions for the problem at hand. This fiddle shows the expressions used for each part and how they were tested.
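For readers without the fiddle at hand, here is a minimal sketch of what testing the parts in isolation might look like in Python. It is not the author’s fiddle: the pattern names, the `check` helper, and the sample inputs are all assumptions made for illustration.

```python
import re

# Hypothetical part patterns, assumed from the example addresses.
CHARS_AND_PERIODS = re.compile(r"[a-zA-Z0-9.]+")
DOMAIN = re.compile(r"[a-zA-Z]{2,4}")
SEPARATOR = re.compile(r"[;,]\s")


def check(pattern, should_match, should_not_match):
    """Return the inputs whose result differs from what was expected."""
    failures = [s for s in should_match if not pattern.fullmatch(s)]
    failures += [s for s in should_not_match if pattern.fullmatch(s)]
    return failures


# Each part gets its own small set of positive and negative cases.
part_failures = {
    "charactersAndPeriods": check(
        CHARS_AND_PERIODS,
        ["john.smith", "JonesSmith85", "some.other"],
        ["", "john smith", "john@smith"],
    ),
    "domain": check(DOMAIN, ["com", "uk", "info"], ["c", "museum", "c0m"]),
    "separator": check(SEPARATOR, [", ", "; "], [",", ": ", " "]),
}

for name, failures in part_failures.items():
    print(name, "OK" if not failures else f"unexpected results for {failures}")
```

When a part misbehaves, the failing input points directly at the small pattern responsible, which is the time saving this step is after.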
Put the Pieces Together; Then Test
Take the smaller, proven parts of your regular expression and put them together. If they are assembled the correct way and you tested them thoroughly, then your final tests should be much easier.
Once we have the smaller pieces, we can put them together to solve the full problem. Because we tested the parts of the whole, we can be confident that they will likely work when they are put together. To be sure, we test. The resulting full regular expression is not easily read or understood, but our parts are. That’s how one ends up solving a problem with a regex without ending up with two.
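A sketch of the assembly step in Python, under the same assumptions about what counts as an email here: the names `EMAIL_LIST` and `is_valid_email_field` are hypothetical, and the patterns cover only the cases described in this post, not the email standard.

```python
import re

# Hypothetical part patterns, assumed from the example addresses.
CHARS_AND_PERIODS = r"[a-zA-Z0-9.]+"
DOMAIN = r"[a-zA-Z]{2,4}"
SEPARATOR = r"[;,]\s"

# "email" is charactersAndPeriods@charactersAndPeriods.domain
EMAIL = f"{CHARS_AND_PERIODS}@{CHARS_AND_PERIODS}\\.{DOMAIN}"

# "email" or "email, email; email": one email, then zero or more
# separator-plus-email repetitions.
EMAIL_LIST = re.compile(f"{EMAIL}(?:{SEPARATOR}{EMAIL})*")


def is_valid_email_field(value: str) -> bool:
    """Check that the whole field is one email or a separated list of emails."""
    return EMAIL_LIST.fullmatch(value) is not None


print(is_valid_email_field("john.smith@whateva.com"))
print(is_valid_email_field(
    "Phillip@whateva.com, Margret.T.Meed@otherdomain.com; JonesSmith85@some.other.com"
))
print(is_valid_email_field("john.smith@whateva"))
```

Using `*` for the repeated group lets one pattern cover both the single-address and multi-address cases, so the long alternation shown at the top of the post is not needed.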
