
#---Regular Expressions---

Regular expressions (regex) in Ruby provide powerful pattern matching and text manipulation. Each regular expression is an instance of the Regexp class.

Create a Regular Expression

Ruby provides several ways to create regular expressions:

  • Using /.../: The most common way to create a regex is by enclosing the pattern in forward slashes.
regex = /hello/
  • Using %r{...}: This is useful when your regex contains forward slashes, as it avoids the need to escape them.
## Same as:
## regex = /https?:\/\//
regex = %r{https?://}
  • Using Regexp.new: You can also create a regex using the Regexp.new method, which is useful when the pattern is dynamic.
pattern = "hello"
regex = Regexp.new(pattern)
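Because Regexp.new treats its argument as a pattern, user-supplied text keeps any regex meaning it happens to contain. A minimal sketch of escaping it first with Regexp.escape (the search value here is only an illustration):

```ruby
# A "dynamic" pattern, e.g. text typed into a search box (illustrative value).
search = "1+1"

# Passed as-is, "+" is a quantifier: the regex means "one or more 1s, then 1".
unsafe = Regexp.new(search)
# Regexp.escape backslash-escapes metacharacters, so "+" is matched literally.
safe = Regexp.new(Regexp.escape(search))

p unsafe.match?("1+1=2") # => false
p safe.match?("1+1=2")   # => true

# Escaping works the same way when interpolating into a regex literal.
p(/\A#{Regexp.escape(search)}=/.match?("1+1=2")) # => true
```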

Check If a String Matches a Pattern

The most common use of regexp is to check if a string matches a specific pattern.
You can use the match? method or the =~ operator.

Regexp#match? (available since Ruby 2.4) is preferred when you only need a yes/no answer: it returns true or false without creating a MatchData object, which makes it more efficient.

Using Regexp#match? method:

## Check if string ends with one or more digits
if /\d+$/.match?("The answer is 42")
  p "Match found!" # => "Match found!"
else
  p "No match."
end

Using Regexp#=~ method:

if /hello/ =~ "The answer is 42"
  p "Match found!"
else
  p "No match." # => "No match."
end
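Besides working as a condition, =~ returns the position of the match, and a successful match fills in Ruby's special match variables. A short illustration:

```ruby
# =~ returns the index where the first match starts, or nil if nothing matches.
p(/\d+/ =~ "The answer is 42")   # => 14
p(/hello/ =~ "The answer is 42") # => nil

# A successful =~ also sets $~ (the last MatchData) and $1, $2, ... (its groups).
if /is (\d+)/ =~ "The answer is 42"
  p $~[0] # => "is 42"
  p $1    # => "42"
end
```

Because 0 is truthy in Ruby, a match at the very start of the string still counts as found in an if.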

Capture Matching Data

It’s also useful to capture the parts of the text that match your pattern. For example, from a blurb of text you might want to extract just the date.
To capture part of a pattern, wrap it in parentheses ().
Regexp#match returns a MatchData object (or nil if there is no match), from which you can read your captures.

str = "Today's date is 2025-10-25."
re = /(\d{4})-(\d{2})-(\d{2})/
md = re.match(str)
if md
  p "Full match: #{md[0]}" # => "Full match: 2025-10-25"
  p "Year: #{md[1]}"       # => "Year: 2025"
  p "Month: #{md[2]}"      # => "Month: 10"
  p "Day: #{md[3]}"        # => "Day: 25"
end
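MatchData offers more than indexing. A small sketch of a few other methods on the same date example:

```ruby
str = "Today's date is 2025-10-25."
md = /(\d{4})-(\d{2})-(\d{2})/.match(str)

# All groups at once (without the full match), handy for destructuring.
year, month, day = md.captures
p [year, month, day] # => ["2025", "10", "25"]

# Text before and after the match, and the index where the match starts.
p md.pre_match  # => "Today's date is "
p md.post_match # => "."
p md.begin(0)   # => 16

# match returns nil when there is no match, so guard before using the result.
p(/(\d{4})-(\d{2})/.match("no date here")) # => nil
```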

Naming the Captures

With the (?<name>...) syntax, you can give each capture a name. You can then access the captures individually by name, or get all of them at once as a hash with md.named_captures.

re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
md = re.match(str)
p md.named_captures
## => {"year" => "2025", "month" => "10", "day" => "25"}
p md[:year]
## => "2025"
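Named captures have one more shortcut: when a regex literal with named groups is on the left side of =~, Ruby assigns each capture to a local variable of the same name. A sketch reusing the date pattern:

```ruby
# The regex literal must be on the LEFT of =~ (and contain no #{} interpolation)
# for year, month and day to become local variables.
if /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/ =~ "Today's date is 2025-10-25."
  p year  # => "2025"
  p month # => "10"
  p day   # => "25"
end
```

Writing str =~ re instead (string on the left, or a regex stored in a variable) does not create these variables.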

Capture All Matches with Scan

Say you need to validate all the dates in a long text that a user entered in a form. With the same date regex, you can collect them all using String#scan:

str = %Q{
Today is 2025-02-05. It's an extraordinary
Wednesday. It's unlike 2025-02-01 because
that day is long gone. And it's also unlike
2025-02-06 because that's still out of the
horizon.
}
str.scan(re) do |y, m, d|
  puts "year: #{y}, month: #{m}, day: #{d}"
end
## => year: 2025, month: 02, day: 05
## => year: 2025, month: 02, day: 01
## => year: 2025, month: 02, day: 06

Without a block, scan returns an array of all matches:

p str.scan(/\d{4}-\d{2}-\d{2}/)
## => ["2025-02-05", "2025-02-01", "2025-02-06"]
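When the pattern has capture groups, scan without a block returns the groups instead of the whole matches. A short comparison on an illustrative string:

```ruby
text = "Dates: 2025-02-05 and 2025-02-01."

# With capture groups and no block, each element is an array of the groups.
p text.scan(/(\d{4})-(\d{2})-(\d{2})/)
# => [["2025", "02", "05"], ["2025", "02", "01"]]

# Non-capturing groups (?:...) do not count, so whole matches come back as strings.
p text.scan(/\d{4}(?:-\d{2}){2}/)
# => ["2025-02-05", "2025-02-01"]
```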

Regex Options

Options (also called modifiers) change how pattern matching behaves. The three you will use most are i, m, and x:

  • Case-insensitive matching (i): Use it to match letters regardless of whether they’re lowercase or uppercase.
insensitive_re = /that/i
p "she said that THAT phone is hers".scan(insensitive_re)
## => ["that", "THAT"]
  • Multiline mode (m): This changes the behavior of the . (dot) metacharacter. Normally, . matches any character except a newline (\n). With m, . matches newlines too.
    This is useful for matching text that spans several lines. Unlike in many other regex flavors, Ruby’s ^ and $ always match at line boundaries, so m does not change them.
str = "Third line\nFourth line"
p str.match?(/Third.*Fourth/)  # => false
p str.match?(/Third.*Fourth/m) # => true
  • Free-spacing mode (x): Whitespace and # comments inside the pattern are ignored, so you can spread a complex regex over several lines and annotate each part.
    See the URL-parsing regex (url_re) below for an example.
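The options can be combined, and Regexp.new accepts them as constants. A minimal sketch:

```ruby
# Options can be combined in one literal. With x, the spaces in the
# pattern are ignored, so this behaves like /hello.world/im.
literal = /hello . world/imx
p literal.match?("HELLO\nWORLD") # => true

# Regexp.new takes the same options as constants, combined with | (bitwise OR).
flags = Regexp::IGNORECASE | Regexp::MULTILINE | Regexp::EXTENDED
built = Regexp.new("hello . world", flags)
p built.match?("HELLO\nWORLD") # => true

# Without m, the dot refuses to match the newline.
p(/hello . world/ix.match?("HELLO\nWORLD")) # => false
```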

Regex Metacharacters

  • . (dot) matches any character except a newline:
p /./.match?("\n")  # => false
p /./m.match?("\n") # => true
  • ? quantifier matches zero or one of the preceding character or group. Use it to match something optionally:
p %w(color colour coler).map { |w| /colou?r/.match?(w) }
## => [true, true, false]
  • + quantifier matches at least one of the preceding character or group:
p %w(color coolor celor).map { |w| /co+lor/.match?(w) }
## => [true, true, false]
  • * quantifier matches zero or more of the preceding character or group:
p %w(color coolor clor).map { |w| /co*lor/.match?(w) }
## => [true, true, true]
  • {min,max} quantifiers match a specific number of repetitions of the preceding character or group.

matches words with exactly 4 chars:

p %w(john joe adler).map { |w| /^[a-z]{4}$/.match?(w) }
## => [true, false, false]

matches words with at least 4 chars:

p %w(john joe adler).map { |w| /^[a-z]{4,}$/.match?(w) }
## => [true, false, true]

matches words with at most 3 chars:

p %w(john joe adler).map { |w| /^[a-z]{,3}$/.match?(w) }
## => [false, true, false]

matches only words 3 to 5 chars in length:

p %w(john joe ladler).map { |w| /^[a-z]{3,5}$/.match?(w) }
## => [true, true, false]
  • [...] defines a character class, which matches any single character listed inside the brackets.
    If ^ is the first character inside the brackets, the class is negated and matches any single character not listed.
p %w(bad bid bed bod).map { |w| /b[aie]d/.match?(w) }
## => [true, true, true, false]
p %w(bad bid bed bod).map { |w| /b[^aie]d/.match?(w) }
## => [false, false, false, true]

You can specify a range of characters with - instead of listing them individually (0-9 for digits, a-z for lowercase letters):

p %w(a1b a3b aeb).map { |w| /a[0-9]b/.match?(w) }
## => [true, true, false]
  • Shorthand character classes make regexes shorter and easier to read.

Use \d instead of [0-9]. Matches any digit:

p /Emp ID: \d{4}/.match?('Emp ID: 3423') # => true

Use \D instead of [^0-9] to match everything except a digit:

p /Ham \D+ Rye/.match?('Ham On Rye') # => true

Use \w instead of [0-9a-zA-Z_]. Matches any word character (letter, digit, or underscore):

p /a \w+/.match?('a valid_variable_7') # => true

Use \W instead of [^0-9a-zA-Z_] to do the opposite:

p /x \W+ y/.match?('x = y') # => true

Use \s to match any whitespace character (space, tab, newline, carriage return, etc.):

p ["john", "john  doe"].map { |w| /\w+\s+\w+/.match?(w) }
## => [false, true]

Use \S to match any character that is not whitespace:

p ["john", "  "].map { |w| /\S+/.match?(w) }
## => [true, false]

All of the quantifiers above can also apply to a group: wrap several characters in parentheses and the quantifier repeats the whole group:

p %w(banana bananana bnana).map { |w|
  /b(an)+ana/.match?(w)
} # => [true, true, false]
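The {min,max} examples above anchor the pattern with ^ and $. In Ruby these match at the start and end of every line, not of the whole string, which matters when validating input. A sketch comparing them with \A and \z (the input string is made up):

```ruby
input = "john\n<script>"

# ^ and $ match at line boundaries, so the first line alone satisfies the pattern.
p(/^[a-z]{4}$/.match?(input))    # => true

# \A and \z match only at the very start and end of the whole string.
p(/\A[a-z]{4}\z/.match?(input))  # => false
p(/\A[a-z]{4}\z/.match?("john")) # => true
```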

Matching Unicode Characters

\w and \d match only ASCII characters. If you want to match letters, digits, and their uppercase/lowercase variants from other languages, or even emojis, use a Unicode property class with \p{...}.

p /\d/.match?('௩') # => false
p /\p{Digit}/.match?('௩') # => true
## ('௩' is '3' in Tamil)
p /\p{Emoji}/.match?('😉') # => true
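A short comparison of \w with its Unicode-aware alternatives, \p{Word} and the POSIX bracket class [[:alpha:]] (the sample words are illustrative):

```ruby
text = "naïve café"

# \w is ASCII-only, so accented letters break the words apart.
p text.scan(/\w+/)          # => ["na", "ve", "caf"]

# \p{Word} and [[:alpha:]] also match non-ASCII letters.
p text.scan(/\p{Word}+/)    # => ["naïve", "café"]
p text.scan(/[[:alpha:]]+/) # => ["naïve", "café"]

# \p{Upper} matches uppercase letters in any script.
p "École ÉTÉ".scan(/\p{Upper}+/) # => ["É", "ÉTÉ"]
```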

Common Regex Use Cases

  • To check whether a string matches a pattern, e.g. an email or IP address:
re = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+\z/i
p re.match?("user@example.com") # => true
p re.match?("invalid.email@") # => false
  • To replace parts of a string matching a pattern:
p "hello world".gsub(/world/, 'ruby') # => "hello ruby"
p "hello world".gsub(/\w+/) { |word| word.upcase }
## => "HELLO WORLD"
p "John Doe".gsub(/(?<first>\w+)\s(?<last>\w+)/,
                '\k<last>, \k<first>') # => "Doe, John"
  • To parse and capture substrings:
url_re = %r{
  \A
  (?<protocol>https?://)  # These comments
  (?<domain>[^/]+)        # and whitespaces
  (?<path>/[^?#]*)?       # are ignored.
  (?<query>\?[^#]*)?      # Use them to annotate
  (?<fragment>\#.*)?      # the regex.
  \z
}x # free-spacing mode
md = url_re.match("https://example.com/path?q=1#section")
p md[:protocol]  # => "https://"
p md[:domain]    # => "example.com"
p md[:path]      # => "/path"
p md[:query]     # => "?q=1"
p md[:fragment]  # => "#section"
  • To split strings based on a pattern:
non_word_re = /\W+/
tokens = "20 - 10 == (3 + 2) + 5".split(non_word_re)
p tokens
## => ["20", "10", "3", "2", "5"]
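The first bullet above mentions IP addresses. A rough sketch of an IPv4 check, assuming a regex for the overall shape plus a numeric range check (ipv4? and IPV4_RE are hypothetical names, not a library API):

```ruby
# Hypothetical helper. Rough IPv4 shape: four dot-separated groups of 1 to 3 digits.
# (?:...) groups without capturing; it exists only so {3} can repeat ".ddd".
IPV4_RE = /\A\d{1,3}(?:\.\d{1,3}){3}\z/

# The regex alone would accept "999.1.1.1", so each number is range-checked too.
def ipv4?(str)
  IPV4_RE.match?(str) && str.split(".").all? { |octet| octet.to_i <= 255 }
end

p ipv4?("192.168.1.10") # => true
p ipv4?("192.168.1")    # => false
p ipv4?("999.1.1.1")    # => false
```

This sketch still accepts leading zeros such as "01.1.1.1"; for real input, Ruby's standard ipaddr library (IPAddr.new) is likely the sturdier choice.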

Official Docs

  • The String and Symbol classes also have =~, match? and match methods with the same functionality.
  • Regexp
  • MatchData
  • String’s match

Regex Quotes

  • “The plural of regex is regrets”
  • “Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems.”
  • “they are far and away the most ‘obvious’ (at least, to people who don’t know any better) way to get from point A to point B.”

A healthy fear of regexes is justified, so use them with care. Read this.
