Ruby Examples: Regular Expression

#---Regular Expressions---

Regular expressions (regexps) in Ruby provide powerful pattern matching and text manipulation. Every regexp is an instance of the Regexp class.

Create a Regular Expression

Ruby provides several ways to create regular expressions:

Using /.../: The most common way to create a regex is by enclosing the pattern in forward slashes.

              regex = /hello/
            

Using %r{...}: This is useful when your regex contains forward slashes, as it avoids the need to escape them.

              ## Same as:
## regex = /https?:\/\//
regex = %r{https?://}
            

Using Regexp.new: You can also create a regex using the Regexp.new method, which is useful when the pattern is dynamic.

              pattern = "hello"
regex = Regexp.new(pattern)
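
When the dynamic pattern comes from user input, any metacharacters in it are treated as regex syntax. A minimal sketch, assuming a hypothetical user_input string, of using Regexp.escape to match the text literally and of passing an option constant to Regexp.new:

```ruby
# user_input is a hypothetical value standing in for text typed by a user.
user_input = "a.b"

# Without escaping, "." is a metacharacter that matches any character.
loose_re = Regexp.new(user_input)
# Regexp.escape backslash-escapes metacharacters so the text matches literally.
strict_re = Regexp.new(Regexp.escape(user_input))

p loose_re.match?("axb")   # => true
p strict_re.match?("axb")  # => false
p strict_re.match?("a.b")  # => true

# The second argument accepts option constants such as Regexp::IGNORECASE.
hello_re = Regexp.new("hello", Regexp::IGNORECASE)
p hello_re.match?("HELLO") # => true
```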
            

Check If a String Matches a Pattern

The most common use of regexp is to check if a string matches a specific pattern.
You can use the match? method or the =~ operator.

The match? method (available since Ruby 2.4) is preferred when you only need a yes/no answer: it returns a boolean without creating a MatchData object or setting the $~ special variable, which makes it more efficient.

Using Regexp#match? method:

              ## Check if string ends with one or more digits
if /\d+$/.match?("The answer is 42")
  p "Match found!" # => "Match found!"
else
  p "No match."
end
            

Using Regexp#=~ method:

              if /hello/ =~ "The answer is 42"
  p "Match found!"
else
  p "No match." # => "No match."
end
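
The two approaches differ in what they return and in their side effects. A short sketch of that difference, reusing the sample string from above (the variable names are only for illustration):

```ruby
str = "The answer is 42"

# =~ returns the index where the match starts, or nil when nothing matches.
index = /\d+/ =~ str
p index            # => 14
p(/hello/ =~ str)  # => nil

# After a successful =~, the special variable $~ holds the MatchData.
/\d+/ =~ str
digits = $~[0]
p digits           # => "42"

# match? leaves $~ untouched, which is part of why it is cheaper.
/answer/.match?(str)
still_digits = $~[0]
p still_digits     # => "42"
```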
            

Capture Matching Data

It’s also useful to capture the data that’s matching your pattern. For example, in a text blurb you might want to extract just the date.
To capture a pattern, wrap it with ().
The match method on the regex returns a MatchData object from which you can access your captures.

              str = "Today's date is 2025-10-25."
re = /(\d{4})-(\d{2})-(\d{2})/
md = re.match(str)
if md
  p "Full match: #{md[0]}" # => "Full match: 2025-10-25"
  p "Year: #{md[1]}"       # => "Year: 2025"
  p "Month: #{md[2]}"      # => "Month: 10"
  p "Day: #{md[3]}"        # => "Day: 25"
end
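
MatchData offers a few more accessors besides indexing. A small sketch on the same sample string, showing captures, pre_match, post_match and the String#[] shortcut:

```ruby
str = "Today's date is 2025-10-25."
md = /(\d{4})-(\d{2})-(\d{2})/.match(str)

# captures returns only the groups (without the full match at index 0),
# which makes it convenient for destructuring.
year, month, day = md.captures
p [year, month, day] # => ["2025", "10", "25"]

# pre_match and post_match return the text around the match.
p md.pre_match       # => "Today's date is "
p md.post_match      # => "."

# String#[] with a regex is a shortcut when only one piece is needed.
p str[/\d{4}/]                       # => "2025"
p str[/(\d{4})-(\d{2})-(\d{2})/, 2]  # => "10"
```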
            

Naming the Captures

With the (?<name>...) syntax, you can assign names to the captures. You can then access each capture by its name, or get all of them at once with md.named_captures, which returns a hash.

              re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
md = re.match(str)
p md.named_captures
## => {"year" => "2025", "month" => "10", "day" => "25"}
p md[:year]
## => "2025"
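
Named captures have one more convenience. When a regex literal sits on the left-hand side of =~, Ruby also assigns each named capture to a local variable of the same name. A minimal sketch (the formatted variable is just for illustration):

```ruby
str = "Today's date is 2025-10-25."

# With a regex literal on the left of =~, each named capture is also
# assigned to a local variable of the same name.
if /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/ =~ str
  formatted = "#{day}/#{month}/#{year}"
end
p formatted   # => "25/10/2025"

# MatchData#names lists the capture names in order.
md = /(?<year>\d{4})-(?<month>\d{2})/.match(str)
p md.names    # => ["year", "month"]
p md["month"] # => "10" (string keys work as well as symbols)
```

This only works with a literal regex on the left; a regex stored in a variable or built with interpolation does not create the local variables.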
            

Capture All Matches with Scan

Let’s say you have to validate all the dates in some lengthy text that the user filled in a text form. With just the date’s regex, you could get them all using String#scan like this:

              str = %Q{
Today is 2025-02-05. It's an extraordinary
Wednesday. It's unlike 2025-02-01 because
that day is long gone. And it's also unlike
2025-02-06 because that's still out of the
horizon.
}
str.scan(re) do |y, m, d|
  puts "year: #{y}, month: #{m}, day: #{d}"
end
## => year: 2025, month: 02, day: 05
## => year: 2025, month: 02, day: 01
## => year: 2025, month: 02, day: 06
            

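Without a block, scan returns an array of all matches. If the pattern contains capture groups, each element is an array of that match's captures instead.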

              p str.scan(/\d{4}-\d{2}-\d{2}/)
## => ["2025-02-05", "2025-02-01", "2025-02-06"]
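
A regex can only check that a date has the right shape, not that it exists on the calendar. One way to finish the validation, sketched here with the standard library's Date.valid_date? and a made-up sample text:

```ruby
require "date"

# Hypothetical user-supplied text; 2025-02-30 has the right shape
# but is not a real date.
text = "Booked 2025-02-05 and 2025-02-30, then 2025-12-31."
date_re = /(\d{4})-(\d{2})-(\d{2})/

# With capture groups, scan returns one array of captures per match.
parts = text.scan(date_re)
p parts
# => [["2025", "02", "05"], ["2025", "02", "30"], ["2025", "12", "31"]]

# Date.valid_date? checks whether year, month and day form a real date.
valid = parts.select { |y, m, d| Date.valid_date?(y.to_i, m.to_i, d.to_i) }
valid_dates = valid.map { |ymd| ymd.join("-") }
p valid_dates
# => ["2025-02-05", "2025-12-31"]
```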
            

Regex Options

Options (also called modifiers) change how the pattern matching behaves. The three most commonly used are i, m and x:

Case-insensitive matching (i): Use it to match letters regardless of whether they’re lowercase or uppercase.

              insensitive_re = /that/i
p "she said that THAT phone is hers".scan(insensitive_re)
## => ["that", "THAT"]
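
The i option applies to the whole pattern. To ignore case for only part of it, an inline group of the form (?i:...) can be used. A small sketch:

```ruby
# (?i:...) turns on case-insensitivity for just that group.
partial_re = /(?i:ruby) rocks/
p partial_re.match?("RUBY rocks") # => true
p partial_re.match?("RUBY ROCKS") # => false

# casefold? reports whether the regex-wide i option is set.
p(/that/i.casefold?)    # => true
p partial_re.casefold?  # => false
```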
            

Multiline mode (m): This changes the behavior of the . (dot) metacharacter. Normally, . matches any character except a newline (\n). With m, . matches newlines too.
This is useful for matching text that spans several consecutive lines of a file.

              str = "Third line\nFourth line"
p str.match?(/Third.*Fourth/)  # => false
p str.match?(/Third.*Fourth/m) # => true
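
A related point that is easy to miss: in Ruby the m option affects only the dot. The anchors ^ and $ match at line boundaries regardless of m, while \A and \z match only at the very start and end of the string. A short sketch:

```ruby
text = "first line\nsecond line"

# ^ and $ match at the start and end of every line, with or without m.
p(/^second line$/.match?(text))   # => true
p(/^second line$/m.match?(text))  # => true

# \A and \z match only at the start and end of the whole string.
p(/\Asecond line\z/.match?(text)) # => false
p(/\Afirst line$/.match?(text))   # => true
```

For validating a whole string, \A and \z are therefore usually the safer choice.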
            

Free-spacing mode (x): Use this to write readable regexes. Whitespace and comments inside the pattern are ignored, so you can split a complex regex across several annotated lines.
See example below in the url-parsing regex (url_re).
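
Because whitespace is ignored in this mode, a literal space in the pattern has to be escaped or written as \s. A minimal sketch of that gotcha:

```ruby
# In free-spacing mode an unescaped space is ignored...
p(/hello world/x.match?("hello world"))   # => false
p(/hello world/x.match?("helloworld"))    # => true

# ...so a literal space has to be escaped (or written as \s).
p(/hello\ world/x.match?("hello world"))  # => true
p(/hello\sworld/x.match?("hello world"))  # => true
```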

Regex Metacharacters

. metacharacter matches any character except a newline:

              p /./.match?("\n")  # => false
p /./m.match?("\n") # => true
            

? quantifier matches zero or one of the preceding character or group. Use it to match something optionally:

              p %w(color colour coler).map { |w| /colou?r/.match?(w) }
## => [true, true, false]
            

+ quantifier matches at least one of the preceding character or group:

              p %w(color coolor celor).map { |w| /co+lor/.match?(w) }
## => [true, true, false]
            

* quantifier matches zero or more of the preceding character or group:

              p %w(color coolor clor).map { |w| /co*lor/.match?(w) }
## => [true, true, true]
            

{min,max} quantifiers match specific count of the preceding character or group.

matches words with exactly 4 chars:

              p %w(john joe adler).map { |w| /^[a-z]{4}$/.match?(w) }
## => [true, false, false]
            

matches words with at least 4 chars:

              p %w(john joe adler).map { |w| /^[a-z]{4,}$/.match?(w) }
## => [true, false, true]
            

matches words with at most 3 chars:

              p %w(john joe adler).map { |w| /^[a-z]{,3}$/.match?(w) }
## => [false, true, false]
            

matches only words 3 to 5 chars in length:

              p %w(john joe ladler).map { |w| /^[a-z]{3,5}$/.match?(w) }
## => [true, true, false]
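
All of the quantifiers above are greedy: they match as much as they can. Appending ? to a quantifier makes it lazy, so it matches as little as possible. A short sketch with a made-up HTML snippet:

```ruby
html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as it can, up to the last ">".
p html[/<.+>/]       # => "<b>bold</b> and <i>italic</i>"

# Lazy: .+? stops at the first ">" that allows a match.
p html[/<.+?>/]      # => "<b>"
p html.scan(/<.+?>/) # => ["<b>", "</b>", "<i>", "</i>"]
```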
            

[] are Character classes. Any characters within the brackets match wherever the class is in the pattern.
If ^ is the first char in the brackets, then only chars other than the ones in the brackets are matched.

              p %w(bad bid bed bod).map { |w| /b[aie]d/.match?(w) }
## => [true, true, true, false]
p %w(bad bid bed bod).map { |w| /b[^aie]d/.match?(w) }
## => [false, false, false, true]
            

You can specify a range of chars with - instead of listing them individually (0-9 for digits, a-z for lowercase letters):

              p %w(a1b a3b aeb).map { |w| /a[0-9]b/.match?(w) }
## => [true, true, false]
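
Several ranges can be combined inside one class. As an illustration, here is a sketch of a hypothetical check for 6-digit hex colors (hex_color_re is a name made up for this example):

```ruby
# Hypothetical validator for 6-digit hex colors such as "#1a2B3c".
# The class combines three ranges: digits, a-f and A-F.
hex_color_re = /\A#[0-9a-fA-F]{6}\z/

results = %w(#1a2B3c #12345 #gggggg).map { |s| hex_color_re.match?(s) }
p results
# => [true, false, false]
```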
            

Shorthand character classes make regex tolerable.

Use \d instead of [0-9]. Matches any digit:

              p /Emp ID: \d{4}/.match?('Emp ID: 3423') # => true
            

Use \D instead of [^0-9] to match everything except a digit:

              p /Ham \D+ Rye/.match?('Ham On Rye') # => true
            

Use \w instead of [0-9a-zA-Z_]. Matches any ‘word’ character (letters, digits and underscore):

              p /a \w+/.match?('a valid_variable_7') # => true
            

Use \W instead of [^0-9a-zA-Z_] to do the opposite:

              p /x \W+ y/.match?('x = y') # => true
            

Use \s to match any kind of whitespace (space, tab, newline, carriage return, etc.):

              p ["john", "john  doe"].map { |w| /\w+\s+\w+/.match?(w) }
## => [false, true]
            

Use \S to do the opposite and match any character that is not whitespace:

              p ["john", "  "].map { |w| /\S+/.match?(w) }
## => [true, false]
            

The quantifiers above are not limited to a single preceding char. Wrap several chars in () and the quantifier applies to the whole group:

              p %w(banana bananana bnana).map { |w|
  /b(an)+ana/.match?(w)
} # => [true, true, false]
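
Parentheses also create a capture, which changes what methods such as scan return. When the group is needed only for the quantifier, (?:...) groups without capturing. A small sketch:

```ruby
words = "banana bananana"

# A capturing group changes what scan returns: one array of captures
# per match. A repeated group keeps only its last repetition.
p words.scan(/b(an)+ana/)   # => [["an"], ["an"]]

# (?:...) groups without capturing, so scan returns the full matches.
p words.scan(/b(?:an)+ana/) # => ["banana", "bananana"]
```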
            

Matching Unicode Characters

\w and \d match only ASCII characters. To match letters and digits from other scripts, in upper or lower case, and even emojis, use the Unicode property classes written as \p{...}.

              p /\d/.match?('௩') # => false
p /\p{Digit}/.match?('௩') # => true
## ('௩' is '3' in Tamil)
p /\p{Emoji}/.match?('😉') # => true
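
The same applies to letters. A minimal sketch comparing \w with \p{L} and the POSIX bracket [[:alpha:]] on a string containing accented characters (written with \u escapes so the example does not depend on the file's encoding):

```ruby
# "na\u00EFve caf\u00E9" is "naïve café".
text = "na\u00EFve caf\u00E9"

# \w is ASCII-only, so accented letters split the words apart.
p text.scan(/\w+/)          # => ["na", "ve", "caf"]

# \p{L} matches a letter in any script.
p text.scan(/\p{L}+/)       # => ["naïve", "café"]

# POSIX bracket expressions such as [[:alpha:]] are Unicode-aware too.
p text.scan(/[[:alpha:]]+/) # => ["naïve", "café"]
```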
            

Common Regex Usecases

To check whether a string matches a pattern, e.g. an email or IP address:

              re = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+\z/i
p re.match?("user@example.com") # => true
p re.match?("invalid.email@") # => false
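
For an IP address, a regex is good at checking the shape but awkward at checking numeric ranges. One possible approach, sketched here with a hypothetical valid_ipv4 lambda, is to let the regex capture the four parts and do the range check in plain Ruby:

```ruby
# Four dot-separated groups of 1 to 3 digits; the regex checks shape only.
ipv4_shape_re = /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/

# valid_ipv4 is a hypothetical helper: shape check first, then range check.
valid_ipv4 = lambda do |str|
  md = ipv4_shape_re.match(str)
  !md.nil? && md.captures.all? { |octet| octet.to_i <= 255 }
end

p valid_ipv4.call("192.168.1.1") # => true
p valid_ipv4.call("256.1.1.1")   # => false
p valid_ipv4.call("1.2.3")       # => false
```

This sketch accepts leading zeros such as "01"; whether that is acceptable depends on the application.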
            

To replace parts of a string matching a pattern:

              p "hello world".gsub(/world/, 'ruby') # => "hello ruby"
p "hello world".gsub(/\w+/) { |word| word.upcase }
## => "HELLO WORLD"
p "John Doe".gsub(/(?<first>\w+)\s(?<last>\w+)/,
                '\k<last>, \k<first>') # => "Doe, John"
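
A few more replacement variants, as a short sketch: sub replaces only the first match, numbered groups can be referenced with \1, \2 and so on, and a hash can map each matched text to its replacement.

```ruby
# sub replaces only the first match; gsub replaces all of them.
p "a-b-c".sub(/-/, "+")   # => "a+b-c"
p "a-b-c".gsub(/-/, "+")  # => "a+b+c"

# Numbered groups can be referenced as \1, \2, ... in the replacement.
p "2025-10-25".sub(/(\d{4})-(\d{2})-(\d{2})/, '\3/\2/\1')
# => "25/10/2025"

# A hash as the second argument maps each matched text to its replacement.
p "cat hat".gsub(/[ch]/, { "c" => "b", "h" => "m" })
# => "bat mat"
```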
            

To parse and capture substrings:

              url_re = %r{
  \A
  (?<protocol>https?://)  # These comments
  (?<domain>[^/]+)        # and whitespaces
  (?<path>/[^?#]*)?       # are ignored.
  (?<query>\?[^#]*)?      # Use them to annotate
  (?<fragment>\#.*)?      # the regex.
  \z
}x # free-spacing mode
md = url_re.match("https://example.com/path?q=1#section")
p md[:protocol]  # => "https://"
p md[:domain]    # => "example.com"
p md[:path]      # => "/path"
p md[:query]     # => "?q=1"
p md[:fragment]  # => "#section"
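
Two edge cases are worth handling when parsing this way: optional groups that did not take part in the match come back as nil, and match itself returns nil when the whole pattern fails. A sketch, assuming a one-line copy of the pattern above under the made-up name compact_url_re:

```ruby
# Same pattern as url_re above, written on one line without the x option.
compact_url_re = %r{\A(?<protocol>https?://)(?<domain>[^/]+)(?<path>/[^?#]*)?(?<query>\?[^#]*)?(?<fragment>\#.*)?\z}

md = compact_url_re.match("https://example.com")
p md[:domain] # => "example.com"
p md[:path]   # => nil (the optional group did not take part in the match)

# Hash#compact drops the nil entries, leaving only the parts that were present.
present = md.named_captures.compact.keys
p present     # => ["protocol", "domain"]

# match returns nil when the pattern fails, so &. avoids a NoMethodError.
parts = compact_url_re.match("ftp://example.com")&.named_captures
p parts       # => nil
```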
            

To split strings based on a pattern:

              non_word_re = /\W+/
tokens = "20 - 10 == (3 + 2) + 5".split(non_word_re)
p tokens
## => ["20", "10", "3", "2", "5"]
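
Plain split throws the separators away. If they matter, as the operators do in an expression, wrapping the separator in a capture group keeps them in the result. A short sketch with a simpler expression:

```ruby
expr = "20 - 10 + 5"

# Plain split drops the separators.
p expr.split(/\s*[-+]\s*/)    # => ["20", "10", "5"]

# With a capture group, the captured separator is included in the result.
p expr.split(/\s*([-+])\s*/)  # => ["20", "-", "10", "+", "5"]

# An optional limit caps the number of pieces.
p expr.split(/\s*[-+]\s*/, 2) # => ["20", "10 + 5"]
```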
            

Official Docs

The String and Symbol classes also have the =~, match? and match methods with the same functionality.
Regexp
MatchData
String’s match

Useful Links

Rubular.com is a popular free tool where you can check your regexes online for quick feedback
Mastering Regular Expressions, a favorite book for many geeks

Regex Quotes

“The plural of regex is regrets”
“Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems”.
“they are far and away the most “obvious” (at least, to people who don’t know any better) way to get from point A to point B.”

Be rightly scared of regex but use them with care. Read this.