#---Regular Expressions---
Regular expressions (regexp) in Ruby provide powerful
pattern matching and text manipulation.
In Ruby, regular expressions are implemented by the Regexp class.
Ruby provides several ways to create regular expressions:
/.../: The most common way to create a regex is by enclosing
the pattern in forward slashes.
regex = /hello/
%r{...}: This is useful when your regex contains forward slashes,
as it avoids the need to escape them.
## Same as:
## regex = /https?:\/\//
regex = %r{https?://}
Regexp.new: You can also create a regex using the Regexp.new
method, which is useful when the pattern is dynamic.
pattern = "hello"
regex = Regexp.new(pattern)
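When a dynamic pattern comes from user input, characters such as + or ? would be treated as metacharacters. The sketch below assumes a hypothetical helper named literal_regex (not part of Ruby) that uses Regexp.escape so the input is matched literally:

```ruby
# Hypothetical helper: builds a case-insensitive regex that matches
# the given text literally, even if it contains metacharacters.
def literal_regex(text)
  Regexp.new(Regexp.escape(text), Regexp::IGNORECASE)
end

regex = literal_regex("1+1=2?")
p regex.match?("Is 1+1=2? Yes.")  # => true
p regex.match?("Is 11=2 true")    # => false

# Without escaping, + and ? act as quantifiers:
p Regexp.new("1+1=2?").match?("Is 11=2 true")  # => true
```

Escaping is only appropriate when the input is meant as plain text; skip it when the caller is supposed to supply a real pattern.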
The most common use of regexp is to check if a string
matches a specific pattern.
You can use the match? method or the =~ operator.
The match? method (available since Ruby 2.4) is preferred because
it returns a boolean without creating a MatchData
object, making it more efficient.
Using the Regexp#match? method:
## Check if string ends with one or more digits
if /\d+$/.match?("The answer is 42")
  p "Match found!" # => "Match found!"
else
  p "No match."
end
Using the Regexp#=~ method, which returns the index where the
match starts, or nil if there is no match:
if /hello/ =~ "The answer is 42"
  p "Match found!"
else
  p "No match." # => "No match."
end
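The difference between the two approaches shows up in their return values. This short sketch (the variable name is illustrative) prints what each call returns:

```ruby
str = "The answer is 42"

# =~ returns the index where the match starts, or nil.
p(/\d+/ =~ str)         # => 14
p(/hello/ =~ str)       # => nil

# match? only ever returns true or false.
p(/\d+/.match?(str))    # => true
p(/hello/.match?(str))  # => false
```

Both nil and false are falsy, so either form works in an if condition. Prefer match? when the position of the match is not needed.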
It’s also useful to capture the data that matches
your pattern. For example, in a text blurb
you might want to extract just the date.
To capture part of a pattern, wrap it in parentheses ().
The match method on the regex returns a MatchData
object from which you can access your captures, or nil
if nothing matches.
str = "Today's date is 2025-10-25."
re = /(\d{4})-(\d{2})-(\d{2})/
md = re.match(str)
if md
  p "Full match: #{md[0]}" # => "Full match: 2025-10-25"
  p "Year: #{md[1]}" # => "Year: 2025"
  p "Month: #{md[2]}" # => "Month: 10"
  p "Day: #{md[3]}" # => "Day: 25"
end
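MatchData#captures returns only the groups, without the full match, as an array, which is convenient for destructuring. A small sketch, assuming a hypothetical helper named extract_date_parts:

```ruby
# Hypothetical helper: returns [year, month, day] as integers,
# or nil when the text contains no date.
def extract_date_parts(text)
  md = /(\d{4})-(\d{2})-(\d{2})/.match(text)
  return nil unless md

  md.captures.map { |part| part.to_i }
end

year, month, day = extract_date_parts("Today's date is 2025-10-25.")
p [year, month, day]                  # => [2025, 10, 25]
p extract_date_parts("no date here")  # => nil
```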
With the (?<name>...) syntax, you can assign a name to
each capture.
You can then access the captures individually by name,
or get all of them with md.named_captures, which
returns a hash.
re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
md = re.match(str)
p md.named_captures
## => {"year" => "2025", "month" => "10", "day" => "25"}
p md[:year]
## => "2025"
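Because named_captures returns a plain hash, the usual Hash methods apply to it. As a sketch, the hypothetical helper date_fields below converts each captured string to an integer (DATE_RE is a name chosen for this example):

```ruby
DATE_RE = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/

# Hypothetical helper: returns a hash of integers keyed by capture
# name, or an empty hash when there is no match.
def date_fields(text)
  md = DATE_RE.match(text)
  return {} unless md

  md.named_captures.transform_values { |value| value.to_i }
end

p date_fields("Today's date is 2025-10-25.")
## => {"year" => 2025, "month" => 10, "day" => 25}
p date_fields("no date here")
## => {}
```

Note that the keys of named_captures are strings, while md[:year] also accepts a symbol.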
Let’s say you have to validate all the dates in some
lengthy text that the user entered in a form.
With just the date regex, you can extract them all
using String#scan like this:
str = %Q{
Today is 2025-02-05. It's an extraordinary
Wednesday. It's unlike 2025-02-01 because
that day is long gone. And it's also unlike
2025-02-06 because that's still out of the
horizon.
}
str.scan(re) do |y, m, d|
  puts "year: #{y}, month: #{m}, day: #{d}"
end
## => year: 2025, month: 02, day: 05
## => year: 2025, month: 02, day: 01
## => year: 2025, month: 02, day: 06
Without a block, scan returns an array of all matches
(or an array of capture arrays if the pattern contains groups).
p str.scan(/\d{4}-\d{2}-\d{2}/)
## => ["2025-02-05", "2025-02-01", "2025-02-06"]
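A date-shaped string is not necessarily a real date: the pattern also accepts something like 2025-02-30. One way to do the validation is to pass the captured parts to Date.valid_date? from the standard library. This is a sketch, where invalid_dates and ISO_DATE_RE are hypothetical names:

```ruby
require "date"

ISO_DATE_RE = /(\d{4})-(\d{2})-(\d{2})/

# Hypothetical helper: returns the date-shaped strings in the text
# that do not correspond to a real calendar date.
def invalid_dates(text)
  bad = text.scan(ISO_DATE_RE).reject do |parts|
    Date.valid_date?(*parts.map(&:to_i))
  end
  bad.map { |parts| parts.join("-") }
end

text = "Due on 2025-02-05, moved from 2025-02-30, never 2025-13-01."
p invalid_dates(text)
## => ["2025-02-30", "2025-13-01"]
```

The regex handles the shape of the input and the Date class handles calendar rules, which keeps the pattern simple.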
Options (also called modifiers) change how the pattern
matching behaves. The three most commonly used are i, x, and m:
i (ignore case): Use it to
match letters regardless of whether they’re
lowercase or uppercase.
insensitive_re = /that/i
p "she said that THAT phone is hers".scan(insensitive_re)
## => ["that", "THAT"]
m (multiline): This changes the behavior of
the . (dot) metacharacter. Normally, . matches any
character except a newline (\n).
With m, . matches newlines too.
str = "Third line\nFourth line"
p str.match?(/Third.*Fourth/) # => false
p str.match?(/Third.*Fourth/m) # => true
x (extended): Use this to write
readable regex by ignoring whitespace and comments in
the regex. Use it to split a complex regex into smaller,
annotated parts (see the url_re example further down).
The . metacharacter matches any character except a
newline:
p /./.match?("\n") # => false
p /./m.match?("\n") # => true
The ? quantifier matches zero or one of the
preceding character or group. Use it to match something
optionally:
p %w(color colour coler).map { |w| /colou?r/.match?(w) }
## => [true, true, false]
The + quantifier matches at least one of the
preceding character or group:
p %w(color coolor celor).map { |w| /co+lor/.match?(w) }
## => [true, true, false]
The * quantifier matches zero or more of the
preceding character or group:
p %w(color coolor clor).map { |w| /co*lor/.match?(w) }
## => [true, true, true]
The {min,max} quantifier matches a specific count
of the preceding character or group.
Matches words with exactly 4 chars:
p %w(john joe adler).map { |w| /^[a-z]{4}$/.match?(w) }
## => [true, false, false]
Matches words with at least 4 chars:
p %w(john joe adler).map { |w| /^[a-z]{4,}$/.match?(w) }
## => [true, false, true]
Matches words with at most 3 chars:
p %w(john joe adler).map { |w| /^[a-z]{,3}$/.match?(w) }
## => [false, true, false]
Matches only words 3 to 5 chars in length:
p %w(john joe ladler).map { |w| /^[a-z]{3,5}$/.match?(w) }
## => [true, true, false]
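Counted quantifiers are a natural fit for length checks. As a sketch, the hypothetical valid_username? below accepts 3 to 16 lowercase letters; the limits are made up for illustration. It uses \A and \z, which anchor the whole string, whereas ^ and $ anchor each line:

```ruby
# Hypothetical rule: a username is 3 to 16 lowercase ASCII letters.
USERNAME_RE = /\A[a-z]{3,16}\z/

def valid_username?(name)
  USERNAME_RE.match?(name)
end

p %w(al bob alexander).map { |name| valid_username?(name) }
## => [false, true, true]
p valid_username?("a" * 17)  # => false
```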
[] defines a character class. Any one of the characters
within the brackets matches at that position
in the pattern.
If ^ is the first char in the brackets, then only
chars other than the ones in the brackets are matched.
p %w(bad bid bed bod).map { |w| /b[aie]d/.match?(w) }
## => [true, true, true, false]
p %w(bad bid bed bod).map { |w| /b[^aie]d/.match?(w) }
## => [false, false, false, true]
You can specify a range of chars with - instead of
specifying them individually (0-9 for digits,
a-z for lowercase letters):
p %w(a1b a3b aeb).map { |w| /a[0-9]b/.match?(w) }
## => [true, true, false]
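Several ranges can be combined in a single class. The sketch below assumes a hypothetical hex_color? check that accepts a # followed by exactly six hexadecimal digits in either case:

```ruby
# Hypothetical check: "#" followed by exactly six hex digits.
HEX_COLOR_RE = /\A#[0-9a-fA-F]{6}\z/

def hex_color?(text)
  HEX_COLOR_RE.match?(text)
end

p %w(#ff8800 #FF8800 #ff88 #gg8800).map { |c| hex_color?(c) }
## => [true, true, false, false]
```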
Use \d instead of [0-9]. It matches any digit:
p /Emp ID: \d{4}/.match?('Emp ID: 3423') # => true
Use \D instead of [^0-9] to match everything except
a digit:
p /Ham \D+ Rye/.match?('Ham On Rye') # => true
Use \w instead of [0-9a-zA-Z_]. It matches any ‘word’
character (including underscore):
p /a \w+/.match?('a valid_variable_7') # => true
Use \W instead of [^0-9a-zA-Z_] to do the opposite:
p /x \W+ y/.match?('x = y') # => true
Use \s to match any kind of whitespace (space, tab,
carriage return, etc.):
p ["john", "john doe"].map { |w| /\w+\s+\w+/.match?(w) }
## => [false, true]
Use \S to match any character that is not whitespace:
p ["john", " "].map { |w| /\S+/.match?(w) }
## => [true, false]
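These shorthand classes combine well when picking apart loosely formatted text. The sketch below assumes a made-up log format of "<level> <code> <message>"; LOG_RE and parse_log_line are hypothetical names:

```ruby
# Hypothetical log format: "<level> <code> <message>"
LOG_RE = /\A(\w+)\s+(\d+)\s+(\S.*)\z/

# Returns [level, code, message], or nil if the line does not fit.
def parse_log_line(line)
  md = LOG_RE.match(line)
  md && md.captures
end

p parse_log_line("ERROR 504   Gateway timed out")
## => ["ERROR", "504", "Gateway timed out"]
p parse_log_line("malformed")
## => nil
```

Using \s+ between fields tolerates any run of spaces or tabs, and \S at the start of the last group guarantees the message does not begin with whitespace.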
Quantifiers are not limited to a single preceding char. Wrap several chars in () to apply a quantifier to the whole group:
p %w(banana bananana bnana).map { |w|
  /b(an)+ana/.match?(w)
} # => [true, true, false]
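A counted quantifier can be applied to a group in the same way. As a sketch, the hypothetical DOTTED_QUAD_RE below checks only the shape of a dotted address: a run of digits followed by exactly three ".digits" groups.

```ruby
# Loose shape check only: four runs of 1 to 3 digits separated by dots.
# It does not verify that each number is at most 255.
DOTTED_QUAD_RE = /\A\d{1,3}(\.\d{1,3}){3}\z/

p %w(192.168.0.1 10.0.1 1.2.3.4.5 999.1.1.1).map { |s|
  DOTTED_QUAD_RE.match?(s)
}
## => [true, false, false, true]
```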
\w and \d match only ASCII letters and digits
(\w also matches the underscore).
If you want to match letters and digits from other
scripts, in upper or lower case, and even emojis, then
you need to use the \p{...} Unicode property classes.
p /\d/.match?('௩') # => false
p /\p{Digit}/.match?('௩') # => true
## ('௩' is '3' in Tamil)
p /\p{Emoji}/.match?('😉') # => true
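The same applies to letters. This sketch compares \w with the \p{L} property, which matches a letter in any script; the constant names are chosen for this example and the source file is assumed to be UTF-8:

```ruby
# \p{L} matches a letter in any script; \w is limited to ASCII.
ASCII_WORD_RE   = /\A\w+\z/
UNICODE_WORD_RE = /\A\p{L}+\z/

words = %w(Munich München 日本語)
p words.map { |w| ASCII_WORD_RE.match?(w) }    # => [true, false, false]
p words.map { |w| UNICODE_WORD_RE.match?(w) }  # => [true, true, true]
```

Unlike \w, \p{L} does not match digits or the underscore, so it is not a drop-in replacement.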
re = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+\z/i
p re.match?("user@example.com") # => true
p re.match?("invalid.email@") # => false
p "hello world".gsub(/world/, 'ruby') # => "hello ruby"
p "hello world".gsub(/\w+/) { |word| word.upcase }
## => "HELLO WORLD"
p "John Doe".gsub(/(?<first>\w+)\s(?<last>\w+)/,
'\k<last>, \k<first>') # => "Doe, John"
url_re = %r{
\A
(?<protocol>https?://) # These comments
(?<domain>[^/]+) # and whitespaces
(?<path>/[^?#]*)? # are ignored.
(?<query>\?[^#]*)? # Use them to annotate
(?<fragment>\#.*)? # the regex.
\z
}x # free-spacing mode
md = url_re.match("https://example.com/path?q=1#section")
p md[:protocol] # => "https://"
p md[:domain] # => "example.com"
p md[:path] # => "/path"
p md[:query] # => "?q=1"
p md[:fragment] # => "#section"
non_word_re = /\W+/
tokens = "20 - 10 == (3 + 2) + 5".split(non_word_re)
p tokens
## => ["20", "10", "3", "2", "5"]
Be rightly scared of regex but use them with care. Read this.