Day 1: RegEx Module and 1st Program using RegEx

Let’s begin our Day 1.

Python has a built-in module called re, which can be used to work with regular expressions. The first step is to import this module using the following line:

1st line of code: import re

Before starting the actual programming, it’s always a good idea to compile (or, you could say, pre-compile) the pattern you are going to use.

2nd line of code: pattern = re.compile(r"")

The purpose of this step is to compile the RegEx pattern (pattern is just a variable name) that will be used for matching later. You can skip this step, but believe me, compiling is really handy when the same RegEx will be used several times in your program.
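A minimal sketch of what compiling buys you: the compiled pattern object’s sub method and the module-level re.sub give the same result, but the compiled object can be reused without repeating the pattern. (Python’s re module also keeps an internal cache of recently used patterns, so the gain is mostly readability.) The sample strings and the "#" replacement are just examples.

```python
import re

# Compile once, then reuse the pattern object as often as needed.
digit_pattern = re.compile(r"[0-9]")

texts = ["room 101", "call 555", "no digits here"]
compiled_results = [digit_pattern.sub("#", t) for t in texts]

# The module-level function looks the pattern up in re's internal cache
# (compiling it if needed) on every call.
module_results = [re.sub(r"[0-9]", "#", t) for t in texts]

print(compiled_results)                    # ['room ###', 'call ###', 'no digits here']
print(compiled_results == module_results)  # True
```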

Hey!! What’s this (r"")?

Actually, it’s a prefix. Prefixing a string literal with r makes it a raw string: backslashes inside the quotes are treated literally instead of being interpreted as Python escape sequences. This matters for RegEx, because patterns use backslashes a lot. For example:

print("\n") prints a newline character, so all you see is the cursor moving to a new (blank) line, but

print(r"\n") prints the two characters \n.
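A quick way to see the difference for yourself is to compare lengths: the normal string holds one newline character, while the raw string holds a backslash followed by n. The last line shows why this matters for RegEx: written as a raw string, a pattern such as \d reaches the re module with its backslash intact.

```python
# "\n" is one character: the newline escape sequence.
normal = "\n"
# r"\n" is two characters: a backslash followed by the letter n.
raw = r"\n"

print(len(normal), len(raw))  # 1 2
print(raw == "\\n")           # True: a raw string is like escaping the backslash yourself

# For regex patterns: r"\d" keeps the backslash that re needs to see.
print(r"\d" == "\\d")         # True
```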

So, let’s not skip the 2nd line of code while learning regular expressions; we will treat the two lines above as our standard starting point before getting into the coding logic.

What should our approach be? There are two options: the traditional way of first learning every data type, function, bla…bla…bla… all in a single run and then applying them in examples, or learning through examples as we go. I prefer the second approach, and it’s not boring either.

The first program:

Task 1: Our 1st task is to write a program that replaces all the digits in a string with an underscore (_).

Let’s talk about this task now. What should be the algorithm for this?

1. Input a string. The string should also contain some digits.

2. Find out the digits and replace them with an underscore.

3. Print the new string with underscore.

Let’s code it:

Wow!! It took only 6 lines. Yup, that’s the magic of using regular expressions.

Let’s do a postmortem of this code now.

The first two lines of code are the ones we discussed earlier. Again, a reminder: the 2nd line is not strictly required, but it’s good practice.

3rd line of code: my_string = input("Enter a string: ")

my_string is a variable that stores whatever string the user is asked to enter at run time.

5th line of code: result = pattern.sub("_", my_string)

Again, result is a variable that stores the final output. Here our input, my_string, is searched for the pattern defined in the 4th line of code (which we will analyze in a moment), and every match is replaced. Replacing can also be called substituting, and the compiled pattern provides a sub method that does exactly that: its first argument is the replacement, and its second is the string to search. The line reads like this:

Wherever my_string matches the pattern (digits), substitute an underscore.

The pattern “consists of digits” is defined in the 4th line of code, which is the heart of the program. This is where we write our first real regular expression.

4th line of code: pattern = re.compile(r"[0-9]+")

Here comes our first real piece of RegEx learning. What is this [0-9]+? The first table to learn is:

Table 1: Special Sequences

There is no need to panic. We will refer to this table only when needed, and just by referring to it again and again we will end up memorizing it. That’s damn true…
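Table 1 itself is not reproduced in this text. As a rough illustration of what special sequences do, here are a few common ones from Python’s re module (\d, \D, \s, \w) tried with re.findall, which returns every non-overlapping match as a list. The sample string is just an example.

```python
import re

sample = "john 123"

digits = re.findall(r"\d", sample)      # \d: any digit
non_digits = re.findall(r"\D", sample)  # \D: anything that is NOT a digit
spaces = re.findall(r"\s", sample)      # \s: any whitespace character
word_chars = re.findall(r"\w", sample)  # \w: a letter, digit or underscore

print(digits)      # ['1', '2', '3']
print(non_digits)  # ['j', 'o', 'h', 'n', ' ']
print(spaces)      # [' ']
print(word_chars)  # ['j', 'o', 'h', 'n', '1', '2', '3']
```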

Coming to our 4th line of code, [0-9] matches any single digit from 0 to 9. Let’s check the output of our code, first using the plain pattern r"[0-9]" (without the ‘+’):

So, the input string john123 gives the output john___, one underscore per digit.

Is there any difference from the 4th line of code shown earlier? Yes, one: [0-9]+ versus [0-9]. What does this ‘+’ sign do?

Observe the change. Without ‘+’, the output has as many underscores as there are digits in the string. With ‘+’, each run of consecutive digits is replaced by a single underscore, so john123 becomes john_.
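The two versions can be compared side by side. This small sketch runs both patterns on the same input from the text, john123:

```python
import re

text = "john123"

one_per_digit = re.compile(r"[0-9]").sub("_", text)  # every digit is a separate match
one_per_run = re.compile(r"[0-9]+").sub("_", text)   # '+' makes one match per run of digits

print(one_per_digit)  # john___
print(one_per_run)    # john_
```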

The question is: What’s the use?

This can be useful when masking sensitive text such as a password. Without ‘+’, the output reveals exactly how many digits there are, which gives an intruder a hint for trying various permutations and combinations. With ‘+’, a run of digits of any length shows up as a single underscore, so less information leaks.

6th line of code: print(result)

This will finally print the result.
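Putting the lines discussed above together gives something like the following sketch of Task 1. It is a reconstruction from the walkthrough, not the original screenshot: the empty placeholder compile from the 2nd line is left out because the 4th line reassigns pattern anyway, and the string that the 3rd line would read with input() is hard-coded (the sample agent007 is made up) so the sketch runs without waiting for the keyboard.

```python
import re

# 4th line: one or more consecutive digits.
pattern = re.compile(r"[0-9]+")

# 3rd line would be: my_string = input("Enter a string: ")
# Hard-coded here so the sketch runs non-interactively.
my_string = "agent007"

# 5th line: substitute an underscore for every match of the pattern.
result = pattern.sub("_", my_string)

# 6th line: print the result.
print(result)  # agent_
```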

Let’s refer to Table 1 again. [0-9] can also be replaced with \d. Will it give the same result?

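It works just the same!!! (For plain ASCII text, \d and [0-9] are interchangeable. Strictly speaking, in Python 3 \d also matches other Unicode decimal digits, such as Arabic-Indic digits, unless the re.ASCII flag is used.)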
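This sketch checks that equivalence, and also shows the one case where the two differ: non-ASCII decimal digits. The Arabic-Indic sample string is an example chosen for illustration.

```python
import re

ascii_text = "john123"
# The same name followed by Arabic-Indic digits one, two, three (U+0661..U+0663).
arabic_text = "john\u0661\u0662\u0663"

# On plain ASCII digits both patterns behave identically.
print(re.sub(r"[0-9]+", "_", ascii_text))  # john_
print(re.sub(r"\d+", "_", ascii_text))     # john_

# In str patterns, \d also matches other Unicode decimal digits; [0-9] does not.
print(re.sub(r"\d+", "_", arabic_text))                    # john_
print(re.sub(r"[0-9]+", "_", arabic_text) == arabic_text)  # True (nothing replaced)

# The re.ASCII flag restricts \d to [0-9].
print(re.sub(r"\d+", "_", arabic_text, flags=re.ASCII) == arabic_text)  # True
```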

Let’s do one more check. Can I write digits anywhere, or do they have to be written consecutively?

So, the digits can appear anywhere in the string: each run of consecutive digits is substituted by a single underscore. Congratulations!! That’s our 1st RegEx program and it’s successful (and it’s not a Hello World program).
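To see this with digits scattered through a string, here is a sketch using a made-up input. It also uses re.subn, a sibling of sub that returns the new string together with the number of replacements made:

```python
import re

scattered = "a1b22c333"

print(re.sub(r"[0-9]+", "_", scattered))   # a_b_c_     (one underscore per run of digits)
print(re.sub(r"[0-9]", "_", scattered))    # a_b__c___  (one underscore per digit)

# subn returns (new_string, number_of_replacements).
print(re.subn(r"[0-9]+", "_", scattered))  # ('a_b_c_', 3)
```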

Task 2: One more example with sub will help us understand the substitution function better. Here, we substitute digits with a $ sign.
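The Task 2 code is not reproduced in this text, so the following is only a guess at its shape. It assumes the same "one or more digits" pattern as Task 1 and a made-up sample sentence. One detail worth knowing: in Python’s re, $ has no special meaning inside the replacement string, so it is inserted literally (backslash escapes in the replacement, by contrast, are processed).

```python
import re

# Assumption: the same "one or more digits" pattern as in Task 1.
pattern = re.compile(r"[0-9]+")

sentence = "I paid 250 for 3 books"  # hypothetical sample input
result = pattern.sub("$", sentence)

print(result)  # I paid $ for $ books
```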

So on Day 1 we have learned one function, sub, and one special sequence, \d (digits).

Moving on, we used ‘+’. Does it have a name? Yes. It is known as a quantifier. Quantifiers specify how many times the preceding character (or group) should match. This brings us to Table 2. Again, there is no need to memorize it; we will pick it up gradually, just as we did with \d and sub.
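Table 2 is not reproduced in this text. As a rough illustration, here are a few standard quantifiers from Python’s re module tried on sample strings. re.findall lists every non-overlapping match, and re.fullmatch checks whether the whole string matches.

```python
import re

text = "1 22 333 4444"

print(re.findall(r"\d+", text))     # ['1', '22', '333', '4444']  + : one or more
print(re.findall(r"\d{3,}", text))  # ['333', '4444']             {3,} : three or more
print(re.findall(r"\d{2}", text))   # ['22', '33', '44', '44']    {2} : exactly two

# ? : zero or one of the preceding character
print(bool(re.fullmatch(r"colou?r", "color")))   # True
print(bool(re.fullmatch(r"colou?r", "colour")))  # True

# * : zero or more of the preceding character
print(bool(re.fullmatch(r"ab*", "a")))     # True
print(bool(re.fullmatch(r"ab*", "abbb")))  # True
```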

Table 2: Quantifiers

Table 3: Sets

A set is a collection of characters inside a pair of square brackets [] with a special meaning:
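Table 3 is not reproduced in this text either. As a rough illustration, here are a few common set forms in Python’s re module, including the [0-9] we already used in Task 1. The sample strings are examples.

```python
import re

print(re.findall(r"[aeiou]", "regex"))   # ['e', 'e']  any one of the listed characters
print(re.findall(r"[a-c]", "abcdcba"))   # ['a', 'b', 'c', 'c', 'b', 'a']  a range of characters
print(re.sub(r"[0-9]", "_", "john123"))  # john___  the digit range from Task 1
print(re.sub(r"[^0-9]", "", "john123"))  # 123  ^ right after [ negates the set
```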
