By Marek Zaluski
Sourcegraph lets you search code across repositories, supporting three kinds of search patterns: literal patterns, regular expression patterns, and structural patterns. In this article, we’ll take a look at regular expression patterns and how to use them in Sourcegraph.
Regular expressions, often shortened to regex, help you find code that matches a pattern (including classes of characters like letters, numbers, and whitespace), and can restrict the results to anchors like the start of a line, the end of a line, or a word boundary.
Start searching with regular expression patterns by toggling the dot asterisk (.*) button towards the right-hand side of the search box. When you mouse over it, you’ll see a tooltip that reads “Enable regular expression.”
Once it is highlighted, you're ready to search with regular expressions.
Regular expressions are useful when you're looking for any strings that match a particular pattern. A common pattern in CSS is the RGB hex color code, like #6495ED (cornflower blue) or #663399 (Rebecca purple).
To match RGB hex color codes, we can write a regular expression that has three parts:
Combining these parts into one regular expression, we can use this pattern in a Sourcegraph search. To make the results more relevant, we can add the lang:css filter so that we target only CSS files.
Regular expressions are useful for finding patterns like this, where certain classes of characters are repeated a certain number of times.
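The three parts of the pattern are not spelled out above. One plausible reading, stated here as an assumption, is a literal #, a character class that matches one hexadecimal digit, and a repetition count of exactly six. Here is a minimal Python sketch of that assumed pattern. Python's re module is not RE2, but this particular syntax behaves the same in both, and the sample CSS is made up for illustration.

```python
import re

# Assumed pattern: "#" + one hex digit class + "repeated exactly six times".
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")

# Hypothetical CSS snippet, used only for illustration.
css = """
a { color: #6495ED; }
h1 { color: #663399; border-color: red; }
"""

# Collect every substring that looks like a six-digit hex color.
colors = HEX_COLOR.findall(css)
print(colors)
```

Under the same assumption, the Sourcegraph query would be this pattern followed by the lang:css filter.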
One use case for regular expression search is finding examples of file system function calls. You may be interested in functions that read or write files: readFile and writeFile. While you could search for them individually, it can be useful to perform one search that includes results for both functions.
readFile and writeFile have a pattern in common: they both end in File. We can write a regular expression that expresses this pattern like so:
The above search query uses the regular expression syntax of a pipe (|) character, which signifies “or”. We can read the query as a search for “read” or “write” followed by “File”.
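The query itself is not reproduced here, so the sketch below assumes it has the form (read|write)File, which fits the description of a pipe meaning “or” followed by File. The sample lines are hypothetical, and Python's re module stands in for RE2.

```python
import re

# Assumed query: a group of two alternatives, followed by the shared suffix.
pattern = re.compile(r"(read|write)File")

# Hypothetical lines of code to search through.
code = """
fs.readFile(path, callback)
fs.writeFile(path, data, callback)
fs.appendFile(path, data, callback)
"""

# group(0) is the whole match, not just the text captured by the parentheses.
found = [m.group(0) for m in pattern.finditer(code)]
print(found)
```

Only the first two calls match: appendFile ends in File, but its prefix is neither of the two alternatives.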
If you would like to narrow down the scope of the search, you can modify this pattern further. To require the file system prefix fs, as in fs.writeFile, you can add that prefix to the search. Because the dot character (.) has a special meaning in regex (it matches any single character), we need to escape it with a backslash (\).
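To see why the escape matters, here is a small comparison in Python. The pattern fs\.(read|write)File is an assumed form of the query, and the second sample line is a deliberately contrived identifier.

```python
import re

# Hypothetical lines; the second one has an "x" where the dot would be.
lines = ["fs.readFile(path)", "fsxwriteFile(path)"]

# Escaped dot: only a literal "." is accepted after "fs".
escaped = [l for l in lines if re.search(r"fs\.(read|write)File", l)]

# Unescaped dot: any single character is accepted after "fs".
unescaped = [l for l in lines if re.search(r"fs.(read|write)File", l)]

print(escaped)
print(unescaped)
```

The unescaped version also matches the contrived fsxwriteFile line, which is the kind of false positive the backslash avoids.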
Regular expressions are also useful if you’re looking for a variable that can contain a mix of alphabetic and numeric characters, like
id3 and so forth. To narrow down the results, we can add the
In this case, \d+ matches one or more digits.
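As a quick illustration of \d+, the sketch below assumes a pattern of the form id\d+ (the literal text id followed by one or more digits) and checks it against a few hypothetical variable names.

```python
import re

# Assumed pattern: "id" followed by one or more digits.
pattern = re.compile(r"id\d+")

# Hypothetical variable names.
names = ["id1", "id42", "id", "idx"]

# fullmatch requires the whole name to fit the pattern.
matched = [n for n in names if pattern.fullmatch(n)]
print(matched)
```

The bare name id is rejected because + requires at least one digit.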
It’s common to be looking for a pattern with two keywords, separated by any other text in between.
When using regular expressions in Sourcegraph, the space character matches any characters between keywords. So if you search for two words in regular expression mode, like auth service, you’ll get results where auth and service are found on the same line, with any number of other characters (not limited to spaces) in between.
When you use spaces in regular expressions in Sourcegraph, each space is automatically interpreted as the .* pattern. This pattern matches any number of characters on the same line (including none). When you’re looking for two words appearing together in a code base, but not necessarily right next to each other, regular expression mode is a useful way to find relevant results.
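The behavior described above can be imitated in Python by rewriting the query before compiling it. The helper name spaces_to_wildcard is hypothetical, and this is only a sketch of the described behavior, not Sourcegraph's implementation.

```python
import re

def spaces_to_wildcard(query: str) -> str:
    """Hypothetical helper: turn each run of spaces into '.*'."""
    return re.sub(r" +", ".*", query)

pattern = re.compile(spaces_to_wildcard("auth service"))

# Hypothetical lines of code.
lines = [
    "auth_service = connect()",
    "auth = get_token(); service = connect()",
    "service = connect(); auth = get_token()",
]

# Keep the lines where "auth" is followed, somewhere later, by "service".
matched = [l for l in lines if pattern.search(l)]
print(pattern.pattern)
print(matched)
```

The last line is not matched because the order of the keywords matters: service appears before auth, not after it.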
By default, Sourcegraph finds all occurrences of your search pattern, even when it occurs inside a longer word. Sometimes when you’re searching for a pattern like count, you’re only interested in functions or variables called count and not countItems. In those cases, you should specify that you’re looking for an exact keyword. You can do this with the regular expression \b, which stands for word boundary.
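A short Python sketch shows the effect of wrapping a keyword in \b on both sides. The sample lines are hypothetical.

```python
import re

# Hypothetical lines of code.
lines = ["count = 0", "countItems(cart)", "discount = 5"]

# Without boundaries, "count" is found inside longer words too.
loose = [l for l in lines if re.search(r"count", l)]

# With \b on both sides, only the standalone word matches.
exact = [l for l in lines if re.search(r"\bcount\b", l)]

print(loose)
print(exact)
```

A word boundary sits between a word character (a letter, digit, or underscore) and a non-word character, so the letters that follow count in countItems, or precede it in discount, prevent a match.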
We can use this to improve the search from our earlier example. You may have noticed that the search for readFile and writeFile returned other functions that started with those patterns, like readFileAsync. By using a word boundary, we can restrict the search to only match the exact function name.
In the above search we’re adding \b at the end of the query, but not at the beginning. This way, we can express that we’re looking for matches that end with this pattern, but may have an alternate prefix.
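As a sketch of a trailing word boundary, assume the query is (read|write)File\b. The sample lines below are hypothetical.

```python
import re

# Assumed query: the earlier alternation with \b added only at the end.
pattern = re.compile(r"(read|write)File\b")

# Hypothetical lines of code.
lines = [
    "fs.readFile(path, callback)",
    "fs.readFileAsync(path)",
    "fs.writeFileSync(path, data)",
    "await writeFile(path, data)",
]

# Lines where the function name stops right after "File".
matched = [l for l in lines if pattern.search(l)]
print(matched)
```

The Async and Sync variants are excluded because a letter follows File, so there is no word boundary at that position.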
We’ve looked at ways to find matches that are located anywhere in a given line of code. What if you want to narrow down your search to find only the instances when the word is located at the start of the line?
You can use the line start character, ^, and the line end character, $, to anchor your search. For example, here is a search that matches the word let when it occurs at the start of the line.
This won’t match lines where there’s anything before let on the line. That means if there’s whitespace at the beginning of the line, like tabs or spaces for indentation, then that line won’t be considered a match. Since indentation is common in code, you may want to modify the search to include results where let can be optionally preceded by any amount of whitespace.
This is where the \s* pattern is useful; it matches any number of whitespace characters (zero or more). By adding it in front of let, you can now include all results where the line has indentation whitespace.
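The difference between the two anchored searches can be sketched in Python, assuming the patterns are ^let and ^\s*let. The source text is hypothetical, and each line is tested on its own so that ^ refers to the start of that line.

```python
import re

# Hypothetical source text with one indented declaration.
source = "let a = 1;\n    let b = 2;\nconst c = 3; let d = 4;"
lines = source.splitlines()

# "let" must be the very first thing on the line.
strict = [l for l in lines if re.search(r"^let", l)]

# Leading whitespace (zero or more characters) is allowed before "let".
relaxed = [l for l in lines if re.search(r"^\s*let", l)]

print(strict)
print(relaxed)
```

Neither pattern matches the third line, because there let appears after other code rather than at the start of the line.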
Regular expressions are a powerful syntax for searching code, and in this tutorial we’ve only covered some fundamental features. Sourcegraph uses the RE2 style of regular expressions, which you can learn more about by reading the RE2 documentation.