Regex Design and Problem Solving

There is a joke about regular expressions that goes like this: “Once, when a person was confronted with a problem, they thought, ‘I know, I’ll use regular expressions.’ Then they had two problems.” Unfortunately this is likely true in most cases. Regular expressions are fairly far from being a common skill. Frequently, the harder a person thinks about the problem, the more exaggerated and complex the regex becomes. If you take these steps when designing your solution, they will make it much simpler.

A complex Regular Expression:
^[a-zA-Z0-9\.]+@[a-zA-Z0-9\.]+\.[a-zA-Z]{2,4}$|^[a-zA-Z0-9\.]+@[a-zA-Z0-9\.]+\.[a-zA-Z]{2,4}([;,]\s[a-zA-Z0-9\.]+@[a-zA-Z0-9\.]+\.[a-zA-Z]{2,4})+$

The Steps

Clearly Define Your Objective

Understand what you’re trying to do with your regular expression, and understand whether it’s the correct tool for the job. If you’re trying to match nested parentheses, then a regex will not do what you need. Understand what it is you need to match. This may seem obvious, but there can be an order of magnitude of difference in complexity between the objective of “Match an email address” and “Match an RFC 2822 email address”. Know where you need to draw the line.
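To illustrate the parentheses case: checking that brackets are balanced needs a counter that can grow without limit, which a classic regular expression cannot keep. Here is a minimal sketch of the non-regex alternative, assuming only round brackets matter. The function name isBalanced is made up for this example.

```javascript
// Hypothetical helper: returns true when every "(" has a matching ")".
function isBalanced(text) {
    let depth = 0;
    for (const ch of text) {
        if (ch === "(") {
            depth++;
        } else if (ch === ")") {
            depth--;
            if (depth < 0) return false; // closed before it was opened
        }
    }
    return depth === 0; // anything left open is unbalanced
}

console.log(isBalanced("(a (b) c)")); // true
console.log(isBalanced("(a (b c)"));  // false
```

A dozen lines of ordinary code is the simpler tool here, which is the point of deciding on the objective first.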

For the example we will be trying to validate an email field. The field can contain one email address, or more than one email address separated by semicolons or commas. Furthermore, the addresses can be decorated with periods. So “john.smith@whateva.com” is valid, and “Phillip@whateva.com, Margret.T.Meed@otherdomain.com; JonesSmith85@some.other.com” is also valid. We won’t hold ourselves to the standard for email, just the cases we have above. The email is being entered for the person’s own benefit, and the validation is just helping out with obvious typos. Know what you need to accomplish and what you don’t. As always when programming, it helps to have a clearly defined scope.

Break Your Problem Into Pieces

This is true for any problem, whether it involves programming, regular expressions or neither. Find the smaller pieces that your problem is made of and solve it piece by piece. I’ll show this by using substitution in my example of the steps. The key is to find the correct parts so that you can design them, test them and reuse them in the larger problem.

So we want to validate “john.smith@whateva.com” and “Phillip@whateva.com, Margret.T.Meed@otherdomain.com; JonesSmith85@some.other.com”. Step one is to see that we are trying to match emails. So let’s rewrite this with placeholders: “email” and “email, email; email”. Broken down, it’s much less daunting. Now can we break this down any smaller? How about an email itself? Think charactersAndPeriods@charactersAndPeriods.domain. This produces two parts that we can reuse and a guide for how to put it all together.

Test the Parts of the Whole

Once you have the problem broken down into smaller parts and have regular expressions for matching them, test those parts. Testing and debugging the smaller pieces will be much faster than working out why your larger expression is failing, especially when your problem is not trivial.

So now we develop our regular expressions for the problem at hand. This fiddle shows the expressions used for each part and how they were tested.
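The fiddle itself is not reproduced here, so the following is only a sketch of what testing the parts might look like in JavaScript. The piece names (localPart, domainPart, tld), the helper matchesWhole and the sample inputs are assumptions chosen to match the breakdown above.

```javascript
// Assumed building blocks, kept as strings so they can be reused later.
const localPart = "[a-zA-Z0-9.]+";   // characters and periods before the @
const domainPart = "[a-zA-Z0-9.]+";  // characters and periods after the @
const tld = "[a-zA-Z]{2,4}";         // the final "com", "org", ...

// Anchor a piece so it must match the whole input, not just a substring.
function matchesWhole(piece, input) {
    return new RegExp("^" + piece + "$").test(input);
}

console.log(matchesWhole(localPart, "john.smith"));   // true
console.log(matchesWhole(localPart, "john smith"));   // false (space)
console.log(matchesWhole(domainPart, "some.other"));  // true
console.log(matchesWhole(tld, "com"));                // true
console.log(matchesWhole(tld, "c0m"));                // false (digit)
```

Each piece gets at least one input that should pass and one that should fail, so a mistake shows up while the expression is still short enough to read.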

Put the Pieces Together; Then Test

Take the smaller proven parts of your regular expression and put them together. If they are assembled the correct way and you tested them thoroughly, then your final tests should be much easier.

Once we have the smaller pieces we can put them together to solve the full problem. Because we tested the parts of the whole we can be confident that they will likely work when they are put together. To be sure we test. The resulting full regular expression is not easily read or understood but our parts are. That’s how one ends up solving their problem with a regex and not ending up with two.
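As a rough sketch of the assembly step, the tested pieces can be concatenated as strings and compiled once. The variable names are assumptions. This version also uses `*` for “zero or more extra addresses” instead of the two alternatives in the long expression at the top, which is a variation on it and not the expression itself.

```javascript
// Assumed pieces, using the same character classes as the long expression.
const chars = "[a-zA-Z0-9.]+";
const email = chars + "@" + chars + "\\.[a-zA-Z]{2,4}";
const separator = "[;,]\\s";

// One email, optionally followed by any number of "separator email" groups.
const emailList = new RegExp("^" + email + "(" + separator + email + ")*$");

console.log(emailList.test("john.smith@whateva.com")); // true
console.log(emailList.test(
    "Phillip@whateva.com, Margret.T.Meed@otherdomain.com; JonesSmith85@some.other.com"
)); // true
console.log(emailList.test("john.smith@whateva")); // false (no top-level domain)
```

The compiled expression is still hard to read, but the three named strings that build it are not.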

4 Steps to Getting Started with Git

Git is a distributed version control system. If you are programming, you should know about and be using version control. Here are 4 easy steps to getting started with Git as your version control.

Now there are two assumptions going into this. The first is that you have downloaded and installed Git. The second is that you can navigate to your source from a command line.

Step 1 – Create the Repository

So this is where the prerequisites come in. Using Git Bash, that you have already installed, go to the directory that you have your code in. Tell Git to create a new repository, right where your code is.

user@computer /c/path/to/your/code/
$ git init

You should now see a message telling you that it has created an empty repository in a folder called “.git” in your directory.

Step 2 – Add Your Files

You need to add files that you want into the repository. You do this with the add command. When using add you can supply a filter like “*.cs” or “*.css”. In this example we want to add everything.

user@computer /c/path/to/your/code/ (master)
$ git add *

Step 3 – Verify Your Staged Files (Optional)

As with most things in software development, it’s not a bad thing to be a little paranoid. However, for the reckless, this step is optional. Verify that the files you want are set to be added to the repository.

user@computer /c/path/to/your/code/ (master)
$ git status

Step 4 – Check in Your Files

Now that you have all your files staged and ready, check them in to the repository.

user@computer /c/path/to/your/code/ (master)
$ git commit -m "Your comment goes here"

That’s it. You’re done! In the future, when you want to add changes you have made or new files, use the add command again. That’s something different about Git: you use the add command for both changes and new files. Then, when you are satisfied with the staged changes, call commit again.

That’s the pattern when developing with Git:

  1. Make updates.
  2. Stage the changes.
  3. Commit the changes.

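You can even combine steps two and three with the -a flag, which stages every modified or deleted file that Git already tracks (brand-new files still need the add command). However, then you couldn’t verify your staged files before the commit…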

user@computer /c/path/to/your/code/ (master)
$ git commit -a -m "Your comment goes here"

string.Format() in JavaScript

I love the string.Format pattern in .NET. It can clean up strings in your code that would otherwise be horrible to maintain. When programming in JavaScript I have often wished for the same function. This is actually fairly easy to get by modifying the prototype of the String object. Prototype? Modify an object? No really, it’s quite easy. The string.Format() I created for a personal project is quite simple. It could also be extended to support more of the string.Format() features by modifying the regex and adding switching code. Let’s not get ahead of ourselves though. I’ll save that for another post. Here is the code I wrote:

String.prototype.format = function () {
    var formatted = this;
    for (var i = 0; i < arguments.length; i++) {
        formatted = formatted.replace(
            //Double backslashes so the regex sees \{ and \}
            new RegExp("\\{" + i + "\\}", 'g'), arguments[i]);
    }
    return formatted;
};

I will break down what is happening in the code, starting with the prototype. In JavaScript, the prototype is the object that all objects of that type inherit from. By extending it we add that extension to all of those objects. Here we take String, reference its prototype, and create a method on it. format is just a member of that object, and it is assigned a reference to the function we define.

String.prototype.format = function () { };

The guts of the function are fairly simple. We replace each placeholder with the argument whose position matches the number inside the braces.

String.prototype.format = function () {
    //Copy this, the string, to a modifiable var
    var formatted = this;

    //For each argument search for place holder for it
    for (var i = 0; i < arguments.length; i++) {
        formatted = formatted.replace(

            //If a place holder exists, replace with the value.
            //Double backslashes so the regex sees \{ and \}
            new RegExp("\\{" + i + "\\}", 'g'), arguments[i]);
    }
    return formatted;
};

That’s it in a nutshell.
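For comparison, here is a sketch of the same idea written as a standalone function that leaves String.prototype alone. This is a variation, not the code from the post. The name formatString is made up, and it swaps the loop for a single pass with a replacer callback, so text inserted for one placeholder is never scanned again for another.

```javascript
// Hypothetical standalone variant: one pass, placeholders looked up by index.
function formatString(template, ...args) {
    return template.replace(/\{(\d+)\}/g, (match, index) => {
        const i = Number(index);
        // Leave unknown placeholders untouched instead of printing "undefined".
        return i < args.length ? String(args[i]) : match;
    });
}

console.log(formatString("Hello {0}, you have {1} new messages", "Ann", 3));
// Hello Ann, you have 3 new messages
console.log(formatString("{0} and {1}", "{1}", "x"));
// {1} and x
```

The second call shows the difference: the loop version would go on to replace the inserted “{1}” as well, while the single pass leaves it as typed.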

Locking, Timing, and Logging. A useful pattern to verify the performance of threaded code.

There will be times when threads end up waiting for a lock or for access to a critical section when synchronizing code. Through planning, design and the proper patterns these times can be minimized. However good the design or theory, it is always good to get real data on how fast the code is running. This is a class I came up with that handles multiple threads generating timing data at the same time. The class has lots of ways it can be improved, and I mention a few while going through it. Because it synchronizes its own calls, it cannot be used to debug a lock situation; it can only time the duration inside locks or other branches of working, or bug-light, code.

For code that deals with basic locking check out this post on a locking pattern I wrote a class around.

The basics of the class are that it calculates the Mean, Max, Min, Median, and Range of the durations between calls to Start and Stop. One thread can call both Start and Stop, or the calls can be made from different callbacks by providing an id to synchronize them on, e.g. this.GetHashCode(). There is also the option to provide an ILog object, like the one used by log4net, to provide automatic statistical logging of the performance of this object.

public Timed(int maxSample)
    :this (maxSample, null, TimeSpan.Zero, "") { }

public Timed(int maxSample, ILog log, TimeSpan reportInterval, string logIdString)
{
    _maxSample = maxSample;

    if(log != null)
    {
        _log = log;
        _logNameTag = logIdString;
        TimerCallback callBack = new TimerCallback(Log);
        _timer = new Timer(callBack, null, reportInterval, reportInterval);
    }
}

The next thing is to code the call back for the timer. The function should match the prototype of the TimerCallback. So make it return void and take an object, which you may or may not use to pass information into your call back. All my data is in the class and access is being synchronized so the use of a callback object is not needed.

private void Log(object o)
{
    _repoprtingLock.AcquireReaderLock(Timeout.Infinite);
    _log.Info(string.Format(
        "{0} - Mean: {1} / Max: {2} / Min: {3} / Median {4} / Range {5}",
         _logNameTag, Mean, Max, Min, Median, Range));
    _repoprtingLock.ReleaseLock();
}

The rest of the code is fairly straightforward. I put in lots of comments for the LINQ statements for anyone who is not familiar with them.

The Start and Stop functions were where I began. Calling them without any parameters uses the hash code of the current thread as the timing tag. This way multiple concurrent calls to the code will produce accurate timings. Otherwise one thread could call Start right after another and change the results. In cases where different threads will start and stop the timing, you can pass in a unique id. This covers the case where two different handlers running in separate threads are responsible for the start and the stop.

public void Start()
{
    Start(Thread.CurrentThread.GetHashCode());
}

public void Start(int key)
{
    lock (_timeLock)
    {
        _starts[key] = DateTime.Now;
    }
}

public TimeSpan Stop()
{
    return Stop(Thread.CurrentThread.GetHashCode());
}

public TimeSpan Stop(int key)
{
    //Record the time now so when we get the lock
    // the timing will be accurate.
    DateTime now = DateTime.Now;
    TimeSpan span = TimeSpan.Zero;

    lock (_timeLock)
    {
        span = now - _starts[key];
    }

    _repoprtingLock.AcquireWriterLock(Timeout.Infinite);
    _statsLock.AcquireWriterLock(Timeout.Infinite);
    //If the list is full of samples make room by pruning the oldest
    if (_times.Count >= _maxSample) _times.RemoveAt(0);
    _times.Add(span);
    _statsLock.ReleaseLock();
    _repoprtingLock.ReleaseLock();
    return span;
}

Max is very simple. We lock access to _times and call .Max().

public TimeSpan Max
{
    get
    {
        TimeSpan span;

        _statsLock.AcquireReaderLock(Timeout.Infinite);
        span = _times.Count > 0 ? _times.Max() : TimeSpan.Zero;
        _statsLock.ReleaseLock();

        return span;
    }
}

Min works the same way as Max.

public TimeSpan Min
{
    get
    {
        TimeSpan span;

        _statsLock.AcquireReaderLock(Timeout.Infinite);
        span = _times.Count > 0 ? _times.Min() : TimeSpan.Zero;
        _statsLock.ReleaseLock();

        return span;
    }
}

Range is basically the same as Min and Max.

public TimeSpan Range
{
    get
    {
        TimeSpan span;

        _statsLock.AcquireReaderLock(Timeout.Infinite);
        if (_times.Count == 0) span = TimeSpan.Zero;
        else span = _times.Max() - _times.Min();
        _statsLock.ReleaseLock();

        return span;
    }
}

The Mean is the average of the values. This can be found with .Average but I used .Aggregate in case you decide to use a type that does not support .Average.

public TimeSpan Mean
{
    get
    {
        TimeSpan span;

        _statsLock.AcquireReaderLock(Timeout.Infinite);
        if (_times.Count == 0) span = TimeSpan.Zero;
        else
            span = new TimeSpan(
                //Use Aggregate to sum the TimeSpans
                _times.Aggregate((total, next) => total + next)
                //Convert to ticks for division
                .Ticks / _times.Count
                //TimeSpan constructor takes result as its argument
                );
        _statsLock.ReleaseLock();

        return span;
    }
}

The Median is the middle value of the sorted list. If there is an even number of values, then it is the average of the two middle values. Before we can pick the middle we have to sort the list.

public TimeSpan Median
{
    get
    {
        TimeSpan span;

        _statsLock.AcquireReaderLock(Timeout.Infinite);
        //Sort a copy: we only hold a reader lock, and Stop
        // relies on _times staying in insertion order
        List<TimeSpan> sorted = new List<TimeSpan>(_times);
        sorted.Sort();
        //These days memory is cheap and this
        // shorthand var is local in scope
        int count = sorted.Count;
        if (count == 0) span = TimeSpan.Zero;

        //If the list has an even number we take the average
        // of the two values in the middle
        else if (count % 2 == 0)
        {
            span = new TimeSpan(
                //The addition returns a TimeSpan
                (sorted[(count / 2) - 1] + sorted[count / 2])
                //We have to convert to ticks to divide it by two
                .Ticks / 2
                //The TimeSpan constructor converts the ticks back
                );
        }
        else
        {
            //When odd, integer division lands on the middle index
            span = sorted[count / 2];
        }
        _statsLock.ReleaseLock();

        return span;
    }
}
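The index arithmetic for a median is easy to get off by one, so here is a small worked sketch of it on its own. It is written in JavaScript with plain numbers standing in for TimeSpan ticks. The function name median and the sample values are assumptions for illustration, not part of the Timed class.

```javascript
// Sketch only: plain numbers stand in for TimeSpan ticks.
function median(values) {
    if (values.length === 0) return 0;
    // Sort a copy numerically so the caller's array keeps its order.
    const sorted = [...values].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 === 0
        ? (sorted[mid - 1] + sorted[mid]) / 2  // even: average the middle pair
        : sorted[mid];                          // odd: the single middle value
}

console.log(median([5, 1, 3]));    // 3
console.log(median([4, 1, 3, 2])); // 2.5
```

With zero-based indexes, the middle of an odd-length list is at length / 2 rounded down, and the middle pair of an even-length list sits at that index and the one before it.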

There are lots of places where the code could be cleaner if I were to use my Locked class in place of the raw ReaderWriterLock calls. The class allows you to use shorthand and return in the middle of a lock without worrying about releasing the lock you acquired. So the following code:

public TimeSpan Mean
{
    get
    {
        TimeSpan span;

        _statsLock.AcquireReaderLock(Timeout.Infinite);
        if (_times.Count == 0) span = TimeSpan.Zero;
        else
            span = new TimeSpan(
                //Use Aggregate to sum the TimeSpans
                _times.Aggregate((total, next) => total + next)
                //Convert to ticks for division
                .Ticks / _times.Count
                //TimeSpan constructor takes result as its argument
                );
        _statsLock.ReleaseLock();

        return span;
    }
}

Would look like this:

public TimeSpan Mean
{
    get
    {
        using(new Locked(_statsLock, false))
        {
            if (_times.Count == 0) return TimeSpan.Zero;
            return new TimeSpan(
                //Use Aggregate to sum the TimeSpans
                _times.Aggregate((total, next) => total + next)
                //Convert to ticks for division
                .Ticks / _times.Count
                //TimeSpan constructor takes result as its argument
                );
        }
    }
}

A pattern for a blog

This being a personal blog, I will start by saying a little about myself. I think a lot. As a programmer that’s part of my job, but mostly it is because I have tons of different interests. If I won the lotto I would probably end up getting master’s degrees in Biology, Psychology / Sociology, and Math, and possibly teaching too, all while finding time to work on small engines and possibly work with wood or glass. It’s honestly exciting to think about learning it all. Knowledge is just patterns. Each field of understanding is organized around teaching and finding new patterns that are like it. Patterns also have meaning to me as a programmer. I plan on using this blog to share them with you, the reader, in almost any form they take. I’m hoping you will spread the patterns you find here and share your own as well.

I have the following goals for this blog.

  • A record of patterns I identify or have learned.
  • To share and teach these patterns to others.
  • Solidify my understanding by writing and teaching.
  • Complement my professional resume.
  • Have the site pay for its minimal hosting costs.
  • Post links to cool things I find online.

Here is to the hope that this site is useful to me in these ends as well as helpful to anyone else who reads it.