
Match currencies

If going through receipts or invoices is a tedious but frequent part of your routine, regex can help tremendously by highlighting or extracting the most important part: the figures.
Let’s look at some examples, shall we?

For this example, let’s use a list of fruits as in previous lectures:

Apples: $5

Bananas: $9

Kiwis: $7

Lemons: $6

Oranges: $7.50

So, to match all the digits after the $, we can use:

\$[0-9]*

This expression will match $ and every consecutive digit that follows it.

However, there are a couple of things to notice:

  • There is a \ before the $, because $ by itself has a special meaning in regex:
    it marks the end of the line. Prefixing a special character with \ is called escaping,
    since it “escapes” the character’s special behavior and makes it a regular/literal character. The same trick works on parentheses, brackets, or any other character used in regex syntax.
  • In the case of oranges, this will only match $7, since the . breaks the sequence of digits
    (or a single digit in our case). Luckily for us, this can be easily fixed, and we’ll touch on that a bit later.
  • The last thing to notice is that not all currency symbols are prefixes (meaning they precede the sequence of digits), so let’s take a look at how to match currencies that are suffixes next 🙂
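
To see the points above in action, here is a minimal Python sketch that runs the expression over the fruit list. The variable names (fruits, matches) are only illustrative.

```python
import re

# The fruit list from the example above, one item per line.
fruits = """Apples: $5
Bananas: $9
Kiwis: $7
Lemons: $6
Oranges: $7.50"""

# \$ is a literal dollar sign; [0-9]* is zero or more digits right after it.
matches = re.findall(r"\$[0-9]*", fruits)

print(matches)  # ['$5', '$9', '$7', '$6', '$7']
```

Notice the last match: for Oranges we only get $7, because the dot stops the run of digits.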

Also, the digit range [0-9] can be abbreviated with the shorthand \d, like so:

\$\d*

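For our purposes it has the same meaning as the range, but the range is more flexible in case you need bounds other than 0 through 9. (In some engines, Python 3 included, \d can also match digits from other scripts, not just 0 through 9.)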
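
Here is a small Python sketch comparing the two spellings on part of the fruit list, plus a narrower range. The [6-9] bounds are just an example I picked, and I used + (one or more) instead of * there so that a lonely $ is not reported as a match.

```python
import re

text = "Apples: $5, Bananas: $9, Kiwis: $7"

# The range and the shorthand find the same prices in this text.
with_range = re.findall(r"\$[0-9]*", text)
with_shorthand = re.findall(r"\$\d*", text)
print(with_range == with_shorthand)  # True

# A range with different bounds: only prices made of the digits 6 through 9.
narrowed = re.findall(r"\$[6-9]+", text)
print(narrowed)  # ['$9', '$7']
```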

Match suffixed currencies

For this example we can use our old fruit list, but with the prices in euros:

Apples: 5€

Bananas: 9€

Kiwis: 7€

Lemons: 6€

Oranges: 7.50€

To match these prices, all we need to do is change our currency symbol and put it after the digits, like so:

[0-9]*€

Notice we didn’t have to escape this one, since € doesn’t hold any special function in an expression.
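
A quick Python sketch of the suffix version, run over the euro list (variable names are illustrative):

```python
import re

# The same fruit list, priced in euros.
fruits_eur = """Apples: 5€
Bananas: 9€
Kiwis: 7€
Lemons: 6€
Oranges: 7.50€"""

# Zero or more digits immediately followed by a euro sign.
euro_matches = re.findall(r"[0-9]*€", fruits_eur)

print(euro_matches)  # ['5€', '9€', '7€', '6€', '50€']
```

Watch out for the last one: since the digits have to touch the €, Oranges gives us 50€ (the part after the dot) rather than 7.50€. The decimal point needs the same fix as in the dollar example.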

This expression works fine, but what if we want to cover more currencies at once, not having to worry about which one will actually match? Worry not, regex to the rescue 😉

Match multiple currencies

To match multiple currencies (let’s say we want to introduce GBP to the currencies we covered thus far), we can change our existing expression to include this:

(\$|£)?[0-9]*€?

This expression will match a US dollar or GBP sign before the digit sequence, or a euro sign after it. Keep in mind that every part of it is optional, so it can also match bare digits, or even nothing at all.

If you want to add more prefixed currencies, just add a pipe (“|”) followed by the currency symbol to the first currency group (the first () containing US dollar and GBP). You can do the same for the suffix currencies: use the same () syntax with “|” as the delimiter 🙂
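
Here is a hedged Python sketch of the combined expression. The sample text mixing all three currencies is made up for illustration. Because every part of the expression is optional, it happily matches an empty string at almost every position, so the sketch walks the matches with finditer and keeps only the non-empty ones.

```python
import re

pattern = re.compile(r"(\$|£)?[0-9]*€?")

# Hypothetical sample text with one price per currency.
text = "Apples: $5, Bananas: £9, Kiwis: 7€"

# m.group(0) is the whole match; empty matches are filtered out.
found = [m.group(0) for m in pattern.finditer(text) if m.group(0)]

print(found)  # ['$5', '£9', '7€']
```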

Now, let’s get back to the decimal point in our numbers.

Match currency containing decimal point

To match a sequence of digits containing a decimal point, we can use the following expression:

(\$|£)?[0-9\.]*€?

This expression will include the dot as a part of the sequence (notice that our digit range now has \. after 9). The \ is used again, since outside of brackets . by itself will match any character, but we want a literal . (dot) instead. Also, if your situation is delicate and requires decimal accuracy, we can be even more specific:

(\$|£)?[0-9]*\.[0-9]{2}€?

This will match two digits after the . (dot), since that’s the most common case, but you can change it to your liking by simply changing the value between the curly braces ({n}), which is 2 in this example. Note that the dot is now required, so a whole-number price like $5 will no longer match.
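
A short Python sketch of the stricter expression on a few lines from our lists (the results list is only there to collect the output):

```python
import re

pattern = re.compile(r"(\$|£)?[0-9]*\.[0-9]{2}€?")

lines = ["Oranges: $7.50", "Oranges: 7.50€", "Kiwis: $7"]

results = []
for line in lines:
    m = pattern.search(line)
    # search() returns None when there is no decimal price on the line.
    results.append(m.group(0) if m else None)

print(results)  # ['$7.50', '7.50€', None]
```

The last line has no decimal point, so there is nothing to match there.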

Match currency with the comma

Besides decimal points, currencies can also contain commas (,) to delimit groups of every three digits. Let’s see how to include commas into our matching sequence:

(\$|£)?[0-9\,\.]*€?

This will match both values with and without comma(s).
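
A minimal Python sketch of the comma-aware expression. The helper first_price is a hypothetical name, not something from a library. It skips empty matches, because with everything optional a plain search() stops at the very first position with an empty match.

```python
import re

pattern = re.compile(r"(\$|£)?[0-9\,\.]*€?")

def first_price(line):
    """Return the first non-empty match on the line, or None."""
    for m in pattern.finditer(line):
        if m.group(0):
            return m.group(0)
    return None

# A plain search() finds an empty match at position 0.
print(repr(pattern.search("Lemons: $6").group(0)))  # ''

print(first_price("Lemons: $6"))                       # $6
print(first_price("A thousand euro item: 1,000.00€"))  # 1,000.00€
```

One way around the empty matches is to require at least one character by using + instead of *, which is what the Python example below does.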

Example in Python

Let’s say we want to calculate the total amount of money we spent shopping, and we have a list of the items we bought in a file called shopping_list.txt:

Apple: $5

Bananas: $9

Kiwis: $7

Lemons: $6

Oranges: $7.50

A thousand euro item: 1,000.00€ 


We can do this quite easily with Python:

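from re import search

total = 0

with open('shopping_list.txt') as sl:
    for item in sl:
        # Raw string, so the backslashes reach the regex engine untouched
        item_price = search(r"((?:\$|£)?([0-9\,\.]+)€?)", item)
        if item_price is None:
            continue  # blank line or no price on this line
        total += float(item_price.groups()[-1].replace(',', ''))

print(total)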

Instead of writing long comments and making the code rather cluttered, I’ll explain what each line is doing here:

On the line starting with with, we open the file using a context manager and bind it to the variable sl (for shopping list).
The for loop will iterate over each line in the file, giving us one item at a time (one item per line).
We use our expression on that particular item to get its price.
Notice that since we are using two capturing groups, the .groups() method will return both, but we are interested only in the second one (the actual value), so we’ll ask only for the last one. The (?:...) around the currency symbols is a non-capturing group, so it doesn’t count.
Also, since we saw that values can contain commas (,), we use the .replace() method to remove them so we can safely convert the value to an actual float and add it to the total.

If you run the script like this:

$ python calculate_total.py

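We’ll get this result:

1034.5

which really is the sum of all the prices on our shopping list. Note that the script simply adds the numbers up, so dollars and euros are summed together without any conversion.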
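
If you want to try this without creating a file, here is a sketch of the same idea wrapped in a function that takes a list of lines. The name total_from_lines is my own, and only the number itself is captured this time, so .group(1) is enough. Lines without a price (like the empty one below) are skipped.

```python
import re

# Optional $ or £ prefix, the number itself (captured), optional € suffix.
PRICE = re.compile(r"(?:\$|£)?([0-9,.]+)€?")

def total_from_lines(lines):
    """Sum the first price found on each line, ignoring lines without one."""
    total = 0.0
    for line in lines:
        match = PRICE.search(line)
        if match is None:
            continue  # no price on this line
        # Drop thousands separators before converting to float.
        total += float(match.group(1).replace(",", ""))
    return total

shopping_list = [
    "Apple: $5",
    "Bananas: $9",
    "Kiwis: $7",
    "Lemons: $6",
    "Oranges: $7.50",
    "",
    "A thousand euro item: 1,000.00€",
]

print(total_from_lines(shopping_list))  # 1034.5
```

This is still a simple sketch: a line containing only stray punctuation like a lone comma or dot would match and then fail in float(), so real-world input may need a stricter expression.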

Hope you found this article useful, see you in the next one! 🙂