The importance of the environment in Regex pattern matching

Published at
10/16/2024
Categories
ruby
rails
regex
learning
Author
lcsm0n

Here's a small discovery I made about Ruby regex rules and whitespace characters that made me scratch my head for a moment.

Let's have a look at the following string, extracted from an email body:

From :     John DOE <[email protected]>

Please note that the space character between 'From' and the colon (':') is a Non-Breaking Space Character (NBSC, U+00A0 in Unicode), while the other spaces in this string are regular spaces (U+0020).
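Since both kinds of space look identical on screen, printing the code points is one way to tell them apart. The snippet below is a minimal sketch: the non-breaking space is written as an explicit `\u00A0` escape so it stays visible in the source, and the address `john.doe@example.com` is a hypothetical placeholder, not the one from the original email.

```ruby
# The NBSC is spelled as an escape; the address is a hypothetical placeholder.
header = "From\u00A0:     John DOE <john.doe@example.com>"

# Collect every whitespace-like character with its index and Unicode code point.
# [[:space:]] is used here because it also recognises U+00A0.
space_report = header.each_char.with_index
                     .select { |char, _index| char.match?(/[[:space:]]/) }
                     .map { |char, index| [index, format("U+%04X", char.ord)] }

space_report.each { |index, code_point| puts "#{index}: #{code_point}" }
```

The first entry reported is the character right after 'From', and it is the only one that is not U+0020.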

Let's now consider the following regex rule, defined in a Ruby constant:

REGEX = /(?:From)\s*:\s*(?:.*?<)?([^<>\s]+@[^>\s]+)(?:>)?/i

When testing the mentioned string against this regex in a Ruby console, we don't get any match:

REGEX.match('From :  John DOE <[email protected]>')
=> nil

Why, you may wonder?
The reason is that, in Ruby, the \s matcher only covers ASCII whitespace (space, tab, newline, carriage return, form feed and vertical tab), so it does not match Non-Breaking Space Characters. To make it work, we need to update the regex to explicitly expect NBSC characters, as follows:

REGEX = /(?:From)[\s\u00A0]*:\s*(?:.*?<)?([^<>\s]+@[^>\s]+)(?:>)?/i

Everything looks fine up to that point.
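To double-check this behaviour in a Ruby console, a comparison along these lines may help. It is a sketch under a few assumptions: the NBSC is written as the `\u00A0` escape, the address is a hypothetical placeholder, and the variable names are introduced here only for illustration. The last pattern relies on Ruby's POSIX bracket expression `[[:space:]]`, which is Unicode-aware. It is shown as a possible alternative, not as the fix used in this article.

```ruby
# Hypothetical sample header: explicit NBSC escape, placeholder address.
header = "From\u00A0:  John DOE <john.doe@example.com>"

original_regex = /(?:From)\s*:\s*(?:.*?<)?([^<>\s]+@[^>\s]+)(?:>)?/i
fixed_regex    = /(?:From)[\s\u00A0]*:\s*(?:.*?<)?([^<>\s]+@[^>\s]+)(?:>)?/i
# Alternative sketch: [[:space:]] also covers non-ASCII whitespace such as U+00A0.
posix_regex    = /(?:From)[[:space:]]*:[[:space:]]*(?:.*?<)?([^<>\s]+@[^>\s]+)(?:>)?/i

p "\u00A0".match?(/\s/)          # => false, \s is ASCII-only in Ruby
p "\u00A0".match?(/[[:space:]]/) # => true

p original_regex.match(header)        # => nil
p fixed_regex.match(header)&.[](1)    # => "john.doe@example.com"
p posix_regex.match(header)&.[](1)    # => "john.doe@example.com"
```

Listing `\u00A0` explicitly keeps the change narrow, while `[[:space:]]` would also accept other Unicode spaces that can show up in email bodies. Which one fits depends on how strict the parsing needs to be.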

However, this is where it gets weird: when testing the original regex rule (the first one, without the \u00A0 part) on the same string in an interactive visualiser (https://regexr.com/ for instance), there is a match:

Screenshot of the test made on Regexr

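My first understanding of the situation was that the interactive Regex visualiser converts the NBSC to a regular space when the string is pasted into its text input. A more likely explanation is a difference between regex engines: RegExr runs patterns with the browser's JavaScript engine by default, and in JavaScript \s matches Unicode whitespace, including U+00A0, whereas in Ruby \s only matches ASCII whitespace.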

This little experiment highlights the importance of testing regex patterns in the exact environment where they will be used. While online tools can be helpful for quick tests, they don't always accurately represent how the regex will behave in your production environment.
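One way to act on this is to keep a small check that runs in the same Ruby runtime as the production code, with the tricky characters spelled out as escapes so they survive editors and copy-pasting. The following is a minimal sketch, not an actual test suite from this project: `extract_sender` and the sample addresses are hypothetical names chosen for illustration.

```ruby
# Hypothetical helper: returns the captured address, or nil when nothing matches.
# The pattern is the updated rule from this article.
def extract_sender(line)
  pattern = /(?:From)[\s\u00A0]*:\s*(?:.*?<)?([^<>\s]+@[^>\s]+)(?:>)?/i
  line[pattern, 1]
end

# Fixtures use explicit escapes so invisible characters cannot be lost silently.
fixtures = {
  "From: John DOE <john.doe@example.com>"        => "john.doe@example.com",
  "From\u00A0:  John DOE <john.doe@example.com>" => "john.doe@example.com",
  "From\t: jane@example.org"                     => "jane@example.org",
  "Subject: hello"                               => nil
}

# Keep only the fixtures whose result differs from the expectation.
failures = fixtures.reject { |line, expected| extract_sender(line) == expected }

if failures.empty?
  puts "all fixtures pass"
else
  failures.each_key { |line| puts "unexpected result for #{line.inspect}" }
end
```

The same fixtures could be moved into whatever test framework the application already uses. The point is only that they run on the Ruby engine, not in a browser.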

PS: It is worth mentioning that the string under scrutiny was copy-pasted from the original email at every stage of this experiment, meaning that the string itself wasn't transformed by the copy-pasting operation.
