How to create a Domain Name Validator in Python


The problem

Create a domain name validator that is mostly compliant with RFC 1035, RFC 1123, and RFC 2181.

The following rules apply:

  • A domain name may contain subdomains (levels), hierarchically separated by the . (period) character
  • A domain name must not contain more than 127 levels, including the top level (TLD)
  • A domain name must not be longer than 253 characters (the RFC specifies 255, but 2 characters are reserved for the trailing dot and the null character for the root level)
  • Level names must be composed of lowercase and uppercase ASCII letters, digits, and the - (minus sign) character
  • Level names must not start or end with the - (minus sign) character
  • Level names must not be longer than 63 characters
  • The top level (TLD) must not be fully numerical
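
The 253-character and 127-level limits can both be traced to the 255-byte cap on an encoded name. One common way to account for the two reserved bytes is the DNS wire format, where each level is stored as a length byte followed by its characters and the name ends with a zero byte for the root. Below is a minimal sketch of that arithmetic; the helper name `wire_length` is an assumption made for illustration and is not part of the task.

```python
def wire_length(domain):
    """Bytes needed to encode an ASCII domain name in DNS wire format."""
    labels = domain.split('.')
    # One length byte plus the label's characters per level,
    # plus the single zero byte that stands for the root
    return sum(1 + len(label) for label in labels) + 1

# The encoded form is always two bytes longer than the dotted text form
print(wire_length('amazon.com'), len('amazon.com') + 2)

# 127 one-character levels: 127 letters + 126 dots = 253 characters
longest = '.'.join(['a'] * 127)
print(len(longest), wire_length(longest))
```

Under this accounting, a name with 127 one-character levels is exactly 253 characters long and fills all 255 bytes, so the two limits describe the same boundary.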

Additionally:

  • A domain name must contain at least one subdomain (level) apart from the TLD
  • Top-level validation must be naive, i.e. TLDs that do not exist in the IANA registry are still considered valid as long as they adhere to the rules given above.

The validation function accepts a string with the full domain name and returns a boolean value indicating whether the domain name is valid.

Examples:

validate('aoms') == False
validate('ao.ms') == True
validate('amazon.com') == True
validate('AMAZON.COM') == True
validate('sub.amazon.com') == True
validate('amazon.com-') == False
validate('.amazon.com') == False
validate('[email protected]') == False
validate('127.0.0.1') == False

The solution in Python

Option 1:

import re

def validate(domain):
    # bool() turns the match object (or None) into the boolean the task asks for
    return bool(re.match(r'''
        (?=^.{,253}$)          # max. length 253 chars
        (?!^.+\.\d+$)          # TLD is not fully numerical
        (?=^[^-.].+[^-.]$)     # doesn't start/end with '-' or '.'
        (?!^.+(\.-|-\.).+$)    # levels don't start/end with '-'
        (?:[a-z\d-]            # uses only allowed chars
        {1,63}(\.|$))          # max. level length 63 chars
        {2,127}                # min. 2, max. 127 levels
        $                      # nothing may follow the last level
        ''', domain, re.X | re.I))

Option 2:

def validate(domain):
    # Whole name: between 1 and 253 characters
    if len(domain) > 253 or len(domain) == 0:
        return False

    # Between 2 and 127 levels, TLD included
    els = domain.split('.')
    if len(els) > 127 or len(els) < 2:
        return False

    for x in els:
        # Each level: between 1 and 63 characters
        if len(x) > 63 or len(x) == 0:
            return False

        # A level must not start or end with '-'
        if not x[0].isalnum() or not x[-1].isalnum():
            return False

        # Only ASCII letters, digits and '-' are allowed
        for c in x:
            if (ord(c) >= 128 or not c.isalnum()) and c != '-':
                return False

    # The TLD must not be fully numerical
    if els[-1].isnumeric():
        return False

    return True

Option 3:

import re

def validLevel(lvl):
    # No leading or trailing '-', and 1 to 63 allowed characters
    return not bool(re.search(r'^-|-$', lvl)) and bool(re.match(r'[a-zA-Z0-9-]{1,63}$', lvl))

def validate(domain):
    lst = domain.split('.')
    # Parentheses let the condition span several lines
    return (len(domain) <= 253
            and 2 <= len(lst) <= 127
            and not lst[-1].isdigit()
            and all(validLevel(lvl) for lvl in lst))

Test cases to validate our solution

test.describe('Domain name validator tests')
test.expect(not validate('aoms'))
test.expect(validate('ao.ms'))
test.expect(validate('amazon.com'))
test.expect(validate('AMAZON.COM'))
test.expect(validate('sub.amazon.com'))
test.expect(not validate('amazon.com-'))
test.expect(not validate('.amazon.com'))
test.expect(not validate('[email protected]'))
test.expect(not validate('127.0.0.1'))
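
The test helper used above is not part of the Python standard library and appears to be supplied by the challenge platform, so the cases will not run as-is on a local machine. A minimal sketch of a table-driven equivalent follows; the `validate` shown here is a compact stand-in written for this example from the rules listed earlier, and `CASES` and `run_cases` are hypothetical names.

```python
import re

def validate(domain):
    # Stand-in implementation of the listed rules, so the block runs on its own
    levels = domain.split('.')
    return (len(domain) <= 253
            and 2 <= len(levels) <= 127
            and not levels[-1].isdigit()
            # (?!-) and (?<!-) reject a leading or trailing minus sign
            and all(re.fullmatch(r'(?!-)[A-Za-z0-9-]{1,63}(?<!-)', lvl) is not None
                    for lvl in levels))

# (input, expected result) pairs taken from the examples in the task
CASES = [
    ('aoms', False),
    ('ao.ms', True),
    ('amazon.com', True),
    ('AMAZON.COM', True),
    ('sub.amazon.com', True),
    ('amazon.com-', False),
    ('.amazon.com', False),
    ('127.0.0.1', False),
]

def run_cases():
    """Return the list of cases whose result differs from the expectation."""
    return [(name, expected) for name, expected in CASES
            if validate(name) != expected]

print(run_cases() or 'all cases passed')
```

Swapping in any of the three options only requires replacing the stand-in `validate`; the table and the runner stay unchanged.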