Intro

Sometimes we are interested in the way something ends, whether it’s a single word or an entire line. Regex offers quite a convenient way of checking this, so let’s look at a few examples and learn how 🙂

Regex to match the last character of the line

Matching the last character of the line is quite simple. Let’s consider the following sentence:

Hello, my name is Bob. I’m in class C

If we want to match the last character on the line (C, in this particular case), we can use the following expression:

.*C$

Where:

.*  –  . stands for any character (except a line break), and * for zero or more occurrences of the preceding element, so, basically: any number of any characters

$  –  stands for the end of the line.

This means it will be matched only if the character right before the end of the line is C. Of course, we can replace C with any other character, and as long as the rest of the expression stays unchanged, it will work just the same.
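To try this out, here is a minimal sketch using Python’s built-in re module. The choice of Python and the variable names are assumptions made for illustration; the pattern itself is the one from above.

```python
import re

line = "Hello, my name is Bob. I’m in class C"

# .*C$ : any characters, then a literal C right before the end of the line
ends_with_c = re.match(r".*C$", line) is not None

# Swapping C for another character changes which lines match
ends_with_b = re.match(r".*B$", line) is not None

print(ends_with_c)  # True
print(ends_with_b)  # False
```

If all we need is a yes/no answer, checking whether the match object is None is enough; no capture group is required.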

Since we’re “consuming” basically everything until we reach the last character, the expression is quite simple, but what about matching the last character of a particular word, in the middle of the line? Let’s explore that next 🙂

Regex to match the last character of the word

Let’s consider the following sentence as an example:

Hello, my name is Alice. What’s your name?

Let’s say we’re interested in the name of the person introducing themselves (Alice, in this example).

This means we can look for the word ending with . (dot), and capture all preceding characters, like this:

.*\s([^\.]+)\..*

This expression is rather lengthy, so let’s break it down:

.*  –  as we saw before, .* will match any characters. It is followed by \s (which is any kind of white space), and since words are separated by white spaces, it will “consume” each word until it reaches the one we’re interested in

([^\.]+)  –  () stands for a capture group, meaning we want to treat what’s inside as a single item that we can extract later. In our case that would be [^\.]+, so let’s break that one down 🙂

[^\.]  –  [] stands for a character class, meaning it will match any one of the characters listed within those brackets. A leading ^ negates the class, so in this case ^\. simply means anything but the . (dot).

+  –  stands for one or more occurrences, meaning it will match every character until it reaches a . (dot).

\..*  –  means a . (dot) has to come right after the previously captured sequence of characters, followed by the rest of the line

So, let’s summarize what this all means:

.*\s will match everything up to the last space before the word we’re looking for. The word itself is defined by ([^\.]+)\., which means a sequence of characters other than . (dot), followed by a . (dot). The trailing .* then matches whatever is left of the line.

I know this one was a bit more complex, so don’t worry if all the pieces are not quite in place yet 🙂
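If it helps to see the pieces working together, here is a small sketch in Python (the re module and the variable names are my assumptions, not part of the original example). It applies the expression to our sentence and reads the capture group:

```python
import re

sentence = "Hello, my name is Alice. What’s your name?"

# Group 1 holds the run of non-dot characters sitting right before the dot
match = re.match(r".*\s([^\.]+)\..*", sentence)
name = match.group(1) if match else None

print(name)  # Alice
```

The greedy .* is what keeps the capture down to a single word: it pushes as far right as it can, so the \s ends up being the last white space from which a dot can still be reached.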

Regex to match last character that isn’t

Ok, thus far we saw how to match a line or word ending with a specific character, but what if we actually want the opposite: to match a line or word that ends with any character other than the specified one? Let’s see how to do this for the entire line first 🙂

Regex to match last character of the line that isn’t

So, for the end of the line, let’s use the sentence from our previous example:

Hello, my name is Alice. What’s your name?

And answer:

Hi Alice, I’m Bob, pleasure to meet you!

Now, let’s say we want to match the sentence not ending with a ? (question mark). We can simply do something like:

.*[^?]$

Nothing new here, really: we are matching everything if the last character is not a question mark. Let’s take a look at an example for a single word next.
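As a quick check, the sketch below runs the expression against both sentences in Python (the list and the variable names are made up for illustration):

```python
import re

lines = [
    "Hello, my name is Alice. What’s your name?",
    "Hi Alice, I’m Bob, pleasure to meet you!",
]

# [^?] must match the very last character, so lines ending in ? are rejected
pattern = re.compile(r".*[^?]$")
not_questions = [line for line in lines if pattern.match(line)]

print(not_questions)  # ['Hi Alice, I’m Bob, pleasure to meet you!']
```

One thing to keep in mind: [^?] still needs a character to match, so a completely empty line will not match this expression either.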

Regex to match last character of the word that isn’t

Ok, for this example, let’s consider listing our favorite fruits, like so:

Apples, bananas, kiwis, lemons and oranges.

Now, let’s say we want to match every word which isn’t followed by a , (comma), so, in this example: lemons, and, oranges. We can use the following expression:

\b(\w+)\b[^,]

In this expression we’ve introduced a couple of new special characters, so let’s break it down:

\b  –  stands for a word boundary, meaning a word has to start or end at that position, depending on where \b is located. In this case, it signifies the word’s start.

(\w+)  –  we already saw the capturing group () before, so nothing new there, but we didn’t see what’s inside. \w stands for the characters which belong to the word character group: letters, digits and the underscore. With +, it matches one or more of them.

\b[^,]  –  the second \b marks the word’s end, and [^,] means any character other than “,” has to follow it. Each of our words has “,”, “ “ or “.” right after it ends, so we want to exclude the ones having exactly “,”, leaving us with “lemons” and “and”, both of which have “ “ immediately afterwards, and “oranges”, which has “.”.
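Here is how this could look in Python, as a sketch with made-up variable names. The second half is my own variation rather than part of the original expression: because [^,] has to consume a character, a word sitting at the very end of the text is skipped, and a negative lookahead (?!,) is one possible way around that.

```python
import re

text = "Apples, bananas, kiwis, lemons and oranges."

# findall returns the contents of group 1 for every match
words = re.findall(r"\b(\w+)\b[^,]", text)
print(words)  # ['lemons', 'and', 'oranges']

# Without the final dot there is no character left for [^,] after "oranges"
no_dot = "lemons and oranges"
with_class = re.findall(r"\b(\w+)\b[^,]", no_dot)
print(with_class)  # ['lemons', 'and']

# Variation (an assumption, not from the article): a lookahead only checks
# the next character without consuming it, so it also works at the end
with_lookahead = re.findall(r"\b(\w+)\b(?!,)", no_dot)
print(with_lookahead)  # ['lemons', 'and', 'oranges']
```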

Regex to match last number of the line

One of the situations in which this kind of regex is particularly useful is matching the number at the end of a word or an entire line, so let’s look at the line example first:

Nice to meet you Bob, here’s my phone number: +123/456-7890

To match the number at the end, we can slightly modify our previous regex for the entire line, so that it expects a digit:

.*([0-9])$

This expression will capture a single digit, anywhere in the range from 0 to 9, right before the line ends, but we can also replace the [] range with a specific value if we need to. Now let’s look at the example of numbers at the end of words.
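A small Python sketch of this, with variable names chosen just for the example:

```python
import re

line = "Nice to meet you Bob, here’s my phone number: +123/456-7890"

# Group 1 is the single digit sitting right before the end of the line
match = re.match(r".*([0-9])$", line)
last_digit = match.group(1) if match else None
print(last_digit)  # 0

# A line that does not end with a digit gives no match at all
no_digit = re.match(r".*([0-9])$", "Nice to meet you Bob")
print(no_digit)  # None
```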

Regex to match last number of a word

For this example, let’s extract the numbers from a license number:

Yes, xxx-123 is my license number, is there a problem?

As a first step towards getting the 123 part, we can use the following expression:

\b\w*([0-9])\b

Again, nothing new. We’re looking for a sequence of word characters enclosed by word boundaries (the word, to put it simply 🙂) that has to end with a digit in the range from 0 to 9. Only that final digit is captured, so in our case, that’s the number 3. But what if we want to capture all the numbers?
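To see what exactly gets matched and what gets captured, here is a short sketch in Python (the variable names are assumptions). Note that re.search is used instead of re.match, since the word we’re after sits in the middle of the sentence:

```python
import re

sentence = "Yes, xxx-123 is my license number, is there a problem?"

match = re.search(r"\b\w*([0-9])\b", sentence)
whole = match.group(0) if match else None
digit = match.group(1) if match else None

# The dash is not a word character, so the matched "word" is 123, not xxx-123
print(whole)  # 123
print(digit)  # 3
```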

Worry not my friends, regex has you covered 😉

Regex to match multiple last numbers of a word

We can use the same example; the only difference is that this time we want to get all the numbers up to the end of the word. To achieve this, the regex should be slightly tweaked, like so:

\b\w*([0-9]{3})\b

As you can probably notice, the only part that differs is after the number range, namely {3}, where {} is a repetition operator, and 3 is the exact number of times the specified pattern is supposed to appear. But what if we are not sure how many numbers there are to match, and we need them all? Well, that’s quite simple as well, so let’s see:

\b\D*([0-9]+)\b

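You can see the expression changed only slightly: \w became \D, and {3} became +, which stands for one or more occurrences. Similar to \w, which stands for word characters, \d stands for numeric (decimal digit) characters. \W and \D are their opposites: they mean everything but word characters and everything but decimal digits, respectively. The switch matters because \w also matches digits, so a greedy \w* would consume the leading digits itself and leave only the last one for the capture group. \D* matches only characters other than digits, so it stops right where the digits start, and ([0-9]+) captures the whole sequence of digits that leads to the word end. This way, all the numbers will get matched/captured 🙂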
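The three variants can be compared side by side with a small Python sketch (the helper function and the variable names are hypothetical, added only for illustration):

```python
import re

sentence = "Yes, xxx-123 is my license number, is there a problem?"


def first_group(pattern, text):
    """Return capture group 1 of the first match, or None if nothing matches."""
    found = re.search(pattern, text)
    return found.group(1) if found else None


# Exactly three digits before the word end
exactly_three = first_group(r"\b\w*([0-9]{3})\b", sentence)
print(exactly_three)  # 123

# \w* is greedy and also matches digits, so only the last digit is captured
greedy_word = first_group(r"\b\w*([0-9]+)\b", sentence)
print(greedy_word)  # 3

# \D* cannot match digits, so the group gets the whole run of digits
all_digits = first_group(r"\b\D*([0-9]+)\b", sentence)
print(all_digits)  # 123
```

One side effect worth knowing about: \D also matches spaces and punctuation, so the overall match of the last expression starts at the beginning of the sentence (Yes, xxx-123). The capture group is not affected by this.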