Regex Best Practices

How to use regex securely

Written by Noah Morse
Updated over 2 weeks ago

Regex can be a powerful tool when dealing with dynamic data. However, it is difficult to write a regular expression that does exactly what you intend for every possible input. Unexpected edge cases or carefully crafted malicious payloads can bypass regex filtering, leading to unexpected or unsafe results.
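As a rough illustration of how a filter can be bypassed, the sketch below uses a hypothetical denylist pattern that tries to block javascript: URLs. The names NAIVE_BLOCKLIST and naive_is_blocked are made up for this example. URL schemes are case-insensitive, and browsers generally ignore leading whitespace, so small variations of the payload slip past the pattern.

```python
import re

# Hypothetical denylist: block any URL that starts with "javascript:".
NAIVE_BLOCKLIST = re.compile(r"^javascript:")


def naive_is_blocked(url: str) -> bool:
    """Return True if the naive pattern flags the URL."""
    return NAIVE_BLOCKLIST.search(url) is not None


if __name__ == "__main__":
    payloads = [
        "javascript:alert(1)",    # caught by the pattern
        "JavaScript:alert(1)",    # mixed case is not caught
        " javascript:alert(1)",   # leading space is not caught
    ]
    for payload in payloads:
        print(repr(payload), "blocked:", naive_is_blocked(payload))
```

Patching the pattern for each new variant tends to turn into a game of catch-up, which is the motivation for the practices listed below.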

Regex Best Practices

  • Prefer specialized packages or libraries that are designed for the specific parsing or filtering task. Examples include parsing HTML, validating email addresses, and sanitizing user input that may influence how code behaves.

    • ex. Using Python's urllib.parse module instead of regex to filter out specific URL schemes

  • Use previously validated patterns for common use cases.

  • Avoid using Evil Regex. These are regex patterns that fall into exponential backtracking on specifically crafted inputs, causing excessive CPU usage and potential system downtime.

    • ex. (a+)+, ([a-zA-Z]+)*, (.*a){x} for x > 10

  • Use well-known and trusted regex tools for building, linting, validating, and testing regex.

    • ex. regex101, RegExr

  • Limit regex complexity. Overly complex regex can be difficult to create correctly and can lead to performance issues.

  • Use regex timeouts when your regex engine supports them.

  • Avoid using regex when the incoming data is unconstrained or from an unknown source.
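The first practice above can be sketched with the urllib.parse example it mentions. This is a minimal sketch, not a complete URL validator: the allowlist ALLOWED_SCHEMES and the helper is_allowed_url are hypothetical names chosen for illustration, and the choice to accept only http and https is an assumption. It uses an allowlist rather than a denylist, so anything the parser does not recognize as an allowed scheme is rejected by default.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist for this example; adjust to your own requirements.
ALLOWED_SCHEMES = {"http", "https"}


def is_allowed_url(url: str) -> bool:
    """Return True only for URLs with an allowed scheme and a host part."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs,
        # such as an unterminated IPv6 bracket.
        return False
    # urlsplit lowercases the scheme, so "HTTP" and "http" compare equal.
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)


if __name__ == "__main__":
    for candidate in [
        "https://example.com/page",
        "HTTP://example.com",
        "JaVaScRiPt:alert(1)",
        "data:text/html,hi",
        "http://[bad",
    ]:
        print(repr(candidate), "allowed:", is_allowed_url(candidate))
```

Because the parser extracts and normalizes the scheme, the check does not depend on anticipating every spelling of an unwanted scheme, which is the weakness of a regex denylist.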
