Introduction to Regular Expressions

by Barry Dysert
(last updated February 28, 2017)


Regular expressions are fairly common in computer programming, but they are also available to end users. For example, if you're working with PowerShell or the Findstr command, you can make use of them. There are also third-party utilities (such as Agent Ransack and EditPad) that support regular expressions. Since they are useful in so many different environments, I decided that a tip about them is in order.

A regular expression is a string, often containing special characters, that describes a search pattern. For example, if we're using the Findstr command, we can search for the word "document" immediately followed by either a comma or a period by specifying the search string "document[,.]". The special characters in this case are the square brackets, which mean that any one of the enclosed characters must follow the word "document".
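As a rough illustration outside of Findstr, the same pattern can be tried with Python's re module. This is only a sketch: the sample sentences are invented for the example, and Python's regular expression engine is not the one Findstr uses.

```python
import re

# Pattern from the text: "document" followed by either a comma or a period.
pattern = re.compile(r"document[,.]")

# Hypothetical sample lines, made up for this illustration.
samples = [
    "Save the document, then close it.",
    "Please print the document.",
    "The document is ready",
]

# Keep only the lines in which the pattern is found somewhere.
matching = [s for s in samples if pattern.search(s)]
for line in matching:
    print(line)
```

The third sample is skipped because "document" is followed by a space rather than one of the two characters inside the brackets.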

There are a lot of special characters that can be used in a regular expression. Many of them are listed below:

Character Description
* Match 0 or more occurrences of the preceding character. Example: he*aven matches both heaven and haven.
+ Match 1 or more occurrences of the preceding character. Example: he+aven matches heaven and heeaven but not haven.
^ Match what follows only if it's at the beginning of the string. Example: if your regular expression is ^document, it will match document in the string "document the requirements", but it will not match anything in the string "this is a requirements document".
$ Match what precedes only if it's at the end of the string. Example: if your regular expression is document$, it will match document in the string "this is a requirements document", but it will not match anything in the string "document the requirements".
\d Match a digit. Example: \d matches each digit individually in the string "my degree is from 1987". If you want to match all four digits at once, just repeat the \d like \d\d\d\d.
. Match any single character. The period is like a wildcard in that it will match anything.
(a|b) Match either the expression represented by 'a' or the expression represented by 'b'. Example: "(1|one) is the loneliest number" will match both strings "1 is the loneliest number" and "one is the loneliest number".
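The entries in this table can be checked with a short Python sketch. It assumes Python's re module, which is not the engine behind Findstr, and the helper name found is made up for the example. Note that * and + apply to the character immediately before them.

```python
import re

def found(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

words = ["haven", "heaven", "heeaven"]

# * allows zero or more of the preceding "e"; + requires at least one.
star_hits = [w for w in words if found(r"he*aven", w)]
plus_hits = [w for w in words if found(r"he+aven", w)]

# ^ anchors the match to the start of the string, $ to the end.
starts = found(r"^document", "document the requirements")
starts_late = found(r"^document", "this is a requirements document")
ends = found(r"document$", "this is a requirements document")

# \d matches one digit at a time; repeating it matches a run of four.
single_digits = re.findall(r"\d", "my degree is from 1987")
four_digits = re.findall(r"\d\d\d\d", "my degree is from 1987")

# . stands in for exactly one character, so "hven" is too short to match.
dot_hits = [w for w in ["haven", "hoven", "hven"] if found(r"h.ven", w)]

# (a|b) accepts either alternative.
alt_pattern = r"(1|one) is the loneliest number"
alt_hits = [found(alt_pattern, s) for s in (
    "1 is the loneliest number",
    "one is the loneliest number",
    "two is the loneliest number",
)]

print(star_hits, plus_hits)
print(starts, starts_late, ends)
print(single_digits, four_digits)
print(dot_hits, alt_hits)
```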

To see regular expressions in action, let's combine some and use the Findstr command to find occurrences of strings in a file. I have a file called Barry.tmp which contains the following text:

Video provides a powerful way to help you prove your point. 
When you click Online Video, you can paste in the embed code for 
the 20 or 30 videos you want to add. You can also type a keyword 
to search online for the video that best fits your document.

Let's use Findstr to find strings of digits in the file: (See Figure 1.)

Figure 1. Searching for strings of digits.
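The command shown in the figure is not reproduced here. As a stand-in, the following Python sketch performs a comparable line-by-line search over the same text. It is an assumption-laden illustration rather than the Findstr command itself: the file contents are held in a string instead of being read from Barry.tmp, and the pattern [0-9]+ is my own choice for "a string of digits".

```python
import re

# Contents of Barry.tmp as given in the text, stored in a string for the sketch.
text = """Video provides a powerful way to help you prove your point.
When you click Online Video, you can paste in the embed code for
the 20 or 30 videos you want to add. You can also type a keyword
to search online for the video that best fits your document."""

# One or more digits in a row (assumed pattern, not taken from the figure).
digit_run = re.compile(r"[0-9]+")

# Collect (line number, digit strings) for every line that contains digits.
hits = []
for number, line in enumerate(text.splitlines(), start=1):
    runs = digit_run.findall(line)
    if runs:
        hits.append((number, runs))

for number, runs in hits:
    print(number, runs)
```

Only the third line of the file contains digits, so it is the only line reported.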

Let me introduce a couple of other concepts. You can use a pair of square brackets to delimit a range of characters to be found. For example, specifying [a-z] will match any single lowercase letter (use [a-zA-Z] or a case-insensitive search to match uppercase letters as well). Also noteworthy is that you use the backslash character (\) to "escape" the character that follows and treat it literally. This is useful if you want to find a character that would otherwise be treated as part of the regular expression. Armed with these two pieces of information, we can now search our file for an alphabetic character followed by a period, i.e., find lines that contain the end of a sentence: (See Figure 2.)

Figure 2. Searching for ends of sentences.
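The same idea can be sketched in Python, again as an illustration rather than a copy of the command in the figure. The pattern [a-z]\. combines the two concepts just described: a range in square brackets and an escaped period. The file contents are held in a string for convenience.

```python
import re

# Contents of Barry.tmp as given in the text, stored in a string for the sketch.
text = """Video provides a powerful way to help you prove your point.
When you click Online Video, you can paste in the embed code for
the 20 or 30 videos you want to add. You can also type a keyword
to search online for the video that best fits your document."""

# A lowercase letter followed by a literal period; the backslash keeps the
# period from acting as the "any character" wildcard.
sentence_end = re.compile(r"[a-z]\.")

# Line numbers (starting at 1) of the lines in which a sentence ends.
sentence_lines = [
    number
    for number, line in enumerate(text.splitlines(), start=1)
    if sentence_end.search(line)
]
print(sentence_lines)
```

The second line is left out because it contains a comma but no period.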

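Let me end with a word of caution: In my experience the regular expression engine used at the Windows command line is a bit different from most regular expression engines. Findstr's built-in help (findstr /?), for instance, lists only a small set of special characters: the period, *, ^, $, square-bracket classes and ranges, the backslash escape, and the word-position markers \< and \>. Constructs such as +, \d, and (a|b) are not in that list, so regular expressions that work in other utilities may not work as expected at the command line. Your mileage may vary, so experiment with them, but don't bet your next paycheck that what you've entered will work as expected.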

 This tip (1364) applies to Windows 7, 8, and 10.

Author Bio

Barry Dysert

Barry has been a computer professional for over 35 years, working in different positions such as technical team leader, project manager, and software developer. He is currently a software engineer with an emphasis on developing custom applications under Microsoft Windows. When not working with Windows or writing Tips, Barry is an amateur writer. His first non-fiction book is titled "A Chronological Commentary of Revelation." ...


Comments


2017-02-28 02:21:54

Torgrim

Thanks for the clarification Barry. And thank you for a great article btw.
I guess what it should read for * and +, is "Match 0/1 or more OF THE PRECEDING character". :)


2017-02-27 07:43:12

Barry

No. It's correct as-is. What we're looking for is "he" plus possibly more "e"s plus "aven". So you put the "+" after the first "e" to indicate you want to match one or more "e"s.


2017-02-27 06:42:21

Torgrim

+ "Match 1 or more characters. Example: he+aven matches heaven and heeaven but not haven.", is a typo, right? Should read Example: h+aven..., right?

