Day 5: String Validation – III (Email ID Validation)

After Username and Password validation, Email ID is another important parameter to validate. It matters because almost all websites mail their information to your inbox (or spam), and even if you unsubscribe, the very first time you create your ID on any login page, a verification link is sent to your email. So how do we check whether the entered Email ID is in the correct format? That’s why Email ID validation is required.

Let’s begin and end our day with this small task of Email ID validation.

# Task 1 (Day 5) Write a program to validate Email ID

Quite a small task lengthwise. And believe me, it’s even simpler to code. What should the algorithm be?

  1. An Email ID can begin with any character, alphanumeric or non-alphanumeric, e.g. abc or 123abc or #abc.
  2. An Email ID can have two or more parts, e.g. abc_def or abc.def and so on.
  3. Let steps 1 and 2 above be collectively known as Part I of the Email ID.
  4. After Part I, for our Part II, there must be a ‘@’ symbol followed by the mail server name. For example, abc@gmail or abc_def@msn.
  5. Finally, after the mail server name, there is a ‘.’ sign followed by com or edu or org etc.
  6. One problem!! Sometimes the last part goes like this: .co.uk or .gov.in, i.e. a two-part ending (loosely called a subdomain in this post). Here we need two dots.

The algorithm gives an idea of how to proceed, but it also shows that there can be numerous approaches to this solution. The problem is that there is no standard guideline for writing, at least, Part I of an Email ID. It can be cool.dude or $hunny_bunny or anything casual, or it can be absolutely formal, like a simple name. Fortunately, Part II is fixed. It has to be a company/organization/etc. name plus dot com/gov/edu/etc. (or with a subdomain).

Keeping this in mind, we will try to make a generalized solution for Part I and a standard solution for Part II. Again, I stress this point: Part I can have many possible solutions. And since you have learned quite a lot of RegEx by now, you can write your own.

Let’s code the standard part first:

4th Line of Code: It starts with a ‘@’ sign, followed by [a-zA-Z0-9], which is a sequence of alphanumeric characters, for example, gmail. A-Z is included because, as you might have noticed, upper or lowercase does not matter when writing an Email ID. You can make your validation stricter by using only a-z. One can also leave out 0-9 here, since numbers are rare in server names, although they are allowed. Then we have used ‘\.’ (the dot has to be escaped in a RegEx, because a bare ‘.’ matches any character) to reach our domain name, again followed by [a-zA-Z0-9], which will give the domain name, say, com or org. Note that we again used ‘\.’ followed by yet another [a-zA-Z0-9]. This gives scope for writing a subdomain, as in regexusingpython.wordpress.com. {1,255} lets the domain name be anywhere from 1 to 255 characters long (not infinite, but long enough). Write it without a space after the comma, otherwise Python treats the braces as literal text rather than as a repetition count. One can also write {3,6} or {2,50} or anything else, depending on need.

Also, I have used ‘*’ in place of ‘+’. The difference is that ‘*’ causes the resulting RegEx to match 0 or more repetitions of the preceding RegEx, as many repetitions as are possible. xy* will match ‘x’, ‘xy’, or ‘x’ followed by any number of ‘y’. On the other hand, ‘+’ causes the resulting RegEx to match 1 or more repetitions of the preceding RegEx. xy+ will match ‘x’ followed by any non-zero number of ‘y’; it will not match just ‘x’.
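As a rough illustration of the description above, here is a minimal sketch of a Part II pattern. It is a reconstruction from the explanation, not the original listing; the names PART_TWO and has_valid_domain are made up for this example.

```python
import re

# Hypothetical Part II pattern, rebuilt from the description in the text:
# '@', a server name, zero or more extra dot-separated parts, and a final
# dot-separated part of 1 to 255 characters at the end of the string.
PART_TWO = re.compile(r"@[a-zA-Z0-9]+(\.[a-zA-Z0-9]+)*\.[a-zA-Z0-9]{1,255}$")


def has_valid_domain(email):
    """Return True if the text from '@' onwards looks like @server.domain."""
    return PART_TWO.search(email) is not None


print(has_valid_domain("abc@gmail.com"))    # True
print(has_valid_domain("abc@yahoo.co.uk"))  # True
print(has_valid_domain("abc@gmail"))        # False, no dot part after the server name
```

re.search is used here because this pattern only describes the part from ‘@’ onwards; whatever comes before ‘@’ is ignored for now.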

In short, * matches 0 or more repetitions of the preceding RegEx, while + matches 1 or more repetitions of the preceding RegEx. For our pattern, this means that the extra dot-separated part is optional with *, whereas + would make it compulsory. Looks quite easy. Time to check the output.
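To see the * versus + difference in isolation, here is a small sketch using the xy example from above. The helper names matches_star and matches_plus are only for illustration.

```python
import re


def matches_star(text):
    """True if the whole text fits xy* (an x, then zero or more y)."""
    return re.fullmatch(r"xy*", text) is not None


def matches_plus(text):
    """True if the whole text fits xy+ (an x, then one or more y)."""
    return re.fullmatch(r"xy+", text) is not None


for text in ["x", "xy", "xyyy"]:
    print(text, matches_star(text), matches_plus(text))
# x True False
# xy True True
# xyyy True True
```

Only the bare ‘x’ separates the two: * accepts it because zero repetitions of ‘y’ are allowed, while + rejects it.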

So both the domain and subdomain parts are working correctly. This brings us to coding Part I.

Again, the important line is the 4th line of code. This long line also contains Part II, which we wrote just above. Our concern is whatever is written before the @. It starts with [_a-zA-Z0-9-]. This means that our Part I can start with any alphanumeric character, an underscore or a hyphen. This is followed by (\.[_a-zA-Z0-9]+), which means that we can have a name like bruce_wayne or bruce.wayne.
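Putting both parts together, a complete validator might look like the sketch below. Again, this is a reconstruction from the description and not the original listing; EMAIL and is_valid_email are hypothetical names, and the ‘+’ and ‘*’ quantifiers in Part I are my assumption about how the pieces repeat.

```python
import re

# Hypothetical full pattern: Part I, then the Part II described earlier.
EMAIL = re.compile(
    r"^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*"  # Part I: name, optional .name parts
    r"@[a-zA-Z0-9]+(\.[a-zA-Z0-9]+)*"      # '@', server name, optional extra parts
    r"\.[a-zA-Z0-9]{1,255}$"               # final dot-separated part
)


def is_valid_email(email):
    """Return True if the whole string fits the sketch pattern."""
    return EMAIL.match(email) is not None


print(is_valid_email("bruce_wayne@gmail.com"))     # True
print(is_valid_email("bruce.wayne@yahoo.co.uk"))   # True
print(is_valid_email("bruce..wayne@gmail.com"))    # False, two dots in a row
print(is_valid_email("#abc@gmail.com"))            # False, '#' is not in the class
```

Note that this version rejects #abc, even though step 1 of the algorithm allows it. That is exactly the sense in which Part I is not standard.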

Check the output. It’s working correctly. Now, why did I say earlier that Part I is not standard? Because one can have anything in place of the underscore. I can put a $ sign or a # or nothing at all. I can even put \W in the character class to allow any special character.
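One possible way to write the \W variant is sketched below. It is only an assumption about how such a pattern could look; LOOSE_EMAIL and is_valid_email_loose are hypothetical names.

```python
import re

# Hypothetical looser pattern: \W inside the class admits every
# non-alphanumeric character in Part I, including '$', '#', spaces and '@'.
LOOSE_EMAIL = re.compile(
    r"^[\W_a-zA-Z0-9]+"                # Part I: practically any character
    r"@[a-zA-Z0-9]+(\.[a-zA-Z0-9]+)*"  # '@', server name, optional extra parts
    r"\.[a-zA-Z0-9]{1,255}$"           # final dot-separated part
)


def is_valid_email_loose(email):
    """Return True if the whole string fits the looser sketch pattern."""
    return LOOSE_EMAIL.match(email) is not None


print(is_valid_email_loose("$hunny_bunny@gmail.com"))  # True
print(is_valid_email_loose("#abc@msn.com"))            # True
print(is_valid_email_loose("a@b@gmail.com"))           # True, '@' counts as \W too
print(is_valid_email_loose("abc@gmail"))               # False
```

The third result shows the price of being this generous: \W also lets a second ‘@’ or a space slip into Part I, so it may be safer to list only the special characters you really want to allow.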

That’s why writing Email ID validation is a bit tricky. But it’s easy once you understand the logic. You will find many ways to write Email ID validation code. This one I find quite useful in almost all situations.

I hope you also find it easy to learn and use and, of course, to adapt according to your needs. That’s it for Day 5. Have a nice time ahead.
