Python etc / re.compile

Published: 22 December 2020, 18:00

Always precompile regular expressions using re.compile if the expression is known in advance:

import re
from string import printable
from random import choice

# generate a random 80-character string
text = ''.join(choice(printable) for _ in range(10 * 8))

# let's find numbers
pat = r'\d(?:[\d\.]+\d)*'
rex = re.compile(pat)

%timeit re.findall(pat, text)
# 2.08 µs ± 1.89 ns per loop

# the pre-compiled pattern is about 1.6 times faster
%timeit rex.findall(text)
# 1.3 µs ± 68.8 ns per loop
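
The %timeit lines above are IPython magics. As a rough sketch of the same comparison in a plain script, the standard timeit module can be used instead. The fixed sample text, the repetition count, and the variable names below are assumptions for illustration, and the absolute timings will differ from machine to machine.

```python
import re
import timeit

# A fixed sample text (an assumption; the original uses a random string).
text = 'abc 12 def 3.14 ghi 2020.12.22 jkl 7' * 2

pat = r'\d(?:[\d\.]+\d)*'
rex = re.compile(pat)

# Both calls must find the same matches; only the per-call overhead differs.
module_level_result = re.findall(pat, text)
precompiled_result = rex.findall(text)

n = 10_000  # number of repetitions, chosen arbitrarily
module_level_time = timeit.timeit(lambda: re.findall(pat, text), number=n)
precompiled_time = timeit.timeit(lambda: rex.findall(text), number=n)

print(f're.findall:  {module_level_time / n * 1e6:.2f} µs per call')
print(f'rex.findall: {precompiled_time / n * 1e6:.2f} µs per call')
```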

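The secret is that the module-level re functions involve no extra optimization. On every call they obtain the compiled pattern (from the module's internal cache if the same pattern was used recently, otherwise by compiling it) and then call the corresponding method, so each call pays for that lookup: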

def findall(pattern, string, flags=0):
    return _compile(pattern, flags).findall(string)
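
To make the equivalence concrete, here is a small sketch showing that the module-level function and the compiled pattern's method return the same result. The sample text and variable names are hypothetical, introduced only for illustration.

```python
import re

pat = r'\d(?:[\d\.]+\d)*'
text = 'pi is 3.14159, e is 2.71828'

# Module-level call: the pattern string is resolved to a compiled pattern on every call.
via_module = re.findall(pat, text)

# Precompiled: the pattern is resolved once and the resulting object is reused.
rex = re.compile(pat)
via_method = rex.findall(text)

print(via_module == via_method)
```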

If the expression is not known in advance but may be used repeatedly, consider using functools.lru_cache:

import re
from functools import lru_cache

cached_compile = lru_cache(maxsize=64)(re.compile)

def find_all(pattern, text):
    return cached_compile(pattern).findall(text)
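
As a usage sketch, the cache_info() method that lru_cache attaches to the wrapped function shows whether repeated patterns are actually served from the cache. The sample pattern and input strings are assumptions for illustration.

```python
import re
from functools import lru_cache

cached_compile = lru_cache(maxsize=64)(re.compile)

def find_all(pattern, text):
    return cached_compile(pattern).findall(text)

# The same pattern is used twice: the first call compiles it, the second reuses it.
first = find_all(r'\d+', 'a1 b22 c333')
second = find_all(r'\d+', 'x4 y55')

# hits counts lookups answered from the cache, misses counts actual compilations.
info = cached_compile.cache_info()
print(first, second)
print(info.hits, info.misses)
```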