Select-String and Grep


Dustin Marx has a blog entry where he compares Unix/Linux, PowerShell and DOS commands.  In it he says, "If there is one Unix command I would love to have in PowerShell, it is the grep command with its regular expression support."  Well Dustin, your wish is our command.  Select-String command to be precise:

PS> Get-Help Select-String

NAME
    Select-String

SYNOPSIS
    Identifies patterns in strings.

SYNTAX
    Select-String [-pattern] <string[]> -inputObject <psobject>
        [-include <string[]>] [-exclude <string[]>] [-simpleMatch]
        [-caseSensitive] [-quiet] [-list] [<CommonParameters>]

    Select-String [-pattern] <string[]> [-path] <string[]>
        [-include <string[]>] [-exclude <string[]>] [-simpleMatch]
        [-caseSensitive] [-quiet] [-list] [<CommonParameters>]

DETAILED DESCRIPTION
    Identifies patterns in strings. By default, Select-String interprets the
    value of the Pattern parameter as a regular expression and matches input
    against it. To learn more about regular expressions in Windows PowerShell,
    type get-help about_regular_expression. You can suppress the regular
    expression match by using the SimpleMatch parameter. A simple match
    attempts to find the string specified in the Pattern parameter as a
    substring of the input.

    The cmdlet makes it easy to search string content from files. It includes
    a Path parameter that supports wildcards and when that parameter is used,
    the contents of the referenced files are retrieved and matched against the
    value of the Pattern parameter.

    Output from the cmdlet is, by default, a MatchInfo object which includes
    detailed information about the matches. The information is most useful
    when the input to the cmdlet is retrieved from files. The object includes
    properties like Filename and Line, which have the value 'InputStream' when
    the input was not from a file. You can use the Quiet parameter to suppress
    the output of MatchInfo objects. In that case, the resulting output becomes
    a boolean value that is true if a match occurred and false otherwise.

    When matching file content, you can use the List parameter to stop after
    the first match in each input file. You should use this parameter if you
    only require a single match, because it will result in faster matching
    commands.
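
To make the switches described in the help text concrete, here is a minimal sketch that contrasts the default regular-expression match with -SimpleMatch, -List and -Quiet. The scratch directory, the file name notes.txt and its contents are invented for illustration; only the Select-String parameters come from the help output above.

```powershell
# Build a scratch directory so the example is self-contained.
# The directory name, file name and contents are made up for this sketch.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ("ss-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $dir | Out-Null
$file = Join-Path $dir 'notes.txt'
Set-Content -Path $file -Value 'a.b', 'axb', 'a.b again'

# Default: the pattern is a regular expression, so '.' matches any character.
$regexHits = @(Select-String -Pattern 'a.b' -Path $file)

# -SimpleMatch: the pattern is treated as a literal substring.
$simpleHits = @(Select-String -Pattern 'a.b' -Path $file -SimpleMatch)

# -List: stop after the first match in each file.
$firstOnly = @(Select-String -Pattern 'a.b' -Path $file -List)

# -Quiet: return a Boolean instead of MatchInfo objects.
$found = Select-String -Pattern 'axb' -Path $file -Quiet

"regex: $($regexHits.Count), simple: $($simpleHits.Count), list: $($firstOnly.Count), quiet: $found"

# Clean up the scratch directory.
Remove-Item -Recurse -Force $dir
```

The regular-expression search should report more hits than the literal one here, because 'axb' satisfies the pattern a.b only when the dot is a wildcard.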

There are a ton of great scenarios but here are some of the more common usages:

PS> dir . -recurse |%{ "`n*** $($_.name)"; cat $_}

*** animals.txt
dog
cat
horse
cow

*** fruit.txt
orange
apple
cherry

*** trees.txt
Elm
Maple
Oak
Dogwood
Apple

PS> Set-Alias ss Select-String

PS> ss Dog *

animals.txt:1:dog
trees.txt:4:Dogwood

PS> ss Dog * -CaseSensitive

trees.txt:4:Dogwood

PS> ss ^[cd]o *

animals.txt:1:dog
animals.txt:4:cow
trees.txt:4:Dogwood

PS> ss ^[cd]o -path * -Exclude *an*.txt

trees.txt:4:Dogwood

 

We've expanded Select-String in the next version with a number of additional features.  One of my favorites is -Context, which lets you specify the number of lines you want displayed before and after a match.  Check it out:

PS> ss oak *

trees.txt:3:Oak

PS> ss oak * -Context 1,0

  trees.txt:2:Maple
> trees.txt:3:Oak

PS> ss oak * -Context 0,1

> trees.txt:3:Oak
  trees.txt:4:Dogwood

PS> ss oak * -Context 2,1

  trees.txt:1:Elm
  trees.txt:2:Maple
> trees.txt:3:Oak
  trees.txt:4:Dogwood
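
The context lines are also available programmatically. The following is a minimal sketch, assuming a PowerShell version that supports -Context (2.0 or later), where each MatchInfo object exposes a Context property with PreContext and PostContext arrays. The temporary file name is invented; its contents mirror trees.txt above.

```powershell
# Recreate trees.txt in a temporary location (the path is made up for this sketch).
$file = Join-Path ([System.IO.Path]::GetTempPath()) ("trees-" + [guid]::NewGuid() + ".txt")
Set-Content -Path $file -Value 'Elm', 'Maple', 'Oak', 'Dogwood', 'Apple'

# Ask for two lines before and one line after each match.
$hit = Select-String -Pattern 'oak' -Path $file -Context 2,1

# The surrounding lines travel with the match object.
$before = $hit.Context.PreContext    # lines preceding the match
$after  = $hit.Context.PostContext   # lines following the match

"before: $($before -join ', ')"
"match:  $($hit.Line)"
"after:  $($after -join ', ')"

# Clean up the temporary file.
Remove-Item -Force $file
```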

And last but not least, this is PowerShell so of course we are not going to just emit text; we emit objects, which carry the details of each match:

PS> ss dog * |fl *

IgnoreCase : True
LineNumber : 1
Line       : dog
Filename   : animals.txt
Path       : C:\temp\ss\animals.txt
Pattern    : dog

IgnoreCase : True
LineNumber : 4
Line       : Dogwood
Filename   : trees.txt
Path       : C:\temp\ss\trees.txt
Pattern    : dog
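
Because the results are objects, the properties listed above can be used directly instead of parsing text. Here is a minimal sketch of that idea; the scratch directory is invented, and the two files mirror animals.txt and trees.txt from the earlier examples.

```powershell
# Recreate two of the sample files in a scratch directory (path is made up).
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ("ss-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -Path (Join-Path $dir 'animals.txt') -Value 'dog', 'cat', 'horse', 'cow'
Set-Content -Path (Join-Path $dir 'trees.txt') -Value 'Elm', 'Maple', 'Oak', 'Dogwood', 'Apple'

# Search every file and sort by file name so the order is predictable.
$hits = @(Select-String -Pattern 'dog' -Path (Join-Path $dir '*') | Sort-Object Filename)

# Format the properties however you like; no text parsing required.
$summary = $hits | ForEach-Object { '{0} line {1}: {2}' -f $_.Filename, $_.LineNumber, $_.Line }

# Ordinary object cmdlets work on the results, e.g. keep only matches past line 2.
$late = @($hits | Where-Object { $_.LineNumber -gt 2 })

$summary
"matches after line 2: $($late.Count)"

# Clean up the scratch directory.
Remove-Item -Recurse -Force $dir
```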

We should have produced an alias from grep to Select-String.

Enjoy!

Jeffrey Snover [MSFT]
Windows Management Partner Architect
Visit the Windows PowerShell Team blog at:    http://blogs.msdn.com/PowerShell
Visit the Windows PowerShell ScriptCenter at:  http://www.microsoft.com/technet/scriptcenter/hubs/msh.mspx

  • I'm looking forward to v2 of PowerShell with the context option (I think that's what it is) so that I can get not just the line that was found but also a line before and after it.

    Also, I've seen some scripts that do highlighting (via Write-Host with a foreground color), but any chance we'll see official support for that?

  • The best part about the emitted object is that you can get full access to the match details, match groups, etc.  Having had to do subsequent string parsing on the regex'd match, finding out things like the string position of the regex match, or what particular substring was matched, was great!

  • Dave,

    The coloring you mention, via write-host, is directly built into the cmdlet, and you can use it with v1.

    PSH>write-host -fore yellow -back red "hello"

  • Jeffrey,

    Does that mean one could use get-content -wait and pipe that to Select-String to watch a file and trigger an event when something matches?

    thanks,

    mike

    axel::foley

  • > Does that mean one could use get-content -wait and pipe that to Select-String to watch a file and trigger an event when something matches?

    I have not tried that but I think you'll be able to connect those dots.  You'd need to run it in the background and then register for output events.

    Jeffrey Snover [MSFT]

  • The behaviour of select-string is very different from grep. Take this example using a pipe. In both a Linux OS and a Windows OS, I have three files, 2.txt, found.txt and test.txt, each containing the string "found":

    UNIX/Linux bash shell:

    # ls

    2.txt  found.txt  test.txt

    Windows PowerShell:

    > ls

       Directory: Microsoft.PowerShell.Core\FileSystem::C:\aa\monad_doc\test_out

    Mode                LastWriteTime     Length Name
    ----                -------------     ------ ----
    -a---         8/09/2008  10:25 AM         49 2.txt
    -a---         8/09/2008  10:25 AM         49 found.txt
    -a---         8/09/2008  10:25 AM         49 test.txt

    Now I want to find out which files in the folder include the string "found" in their file names:

    UNIX/Linux bash shell:

    # ls | grep found

    found.txt

    Windows PowerShell:

    > ls | select-string found

    2.txt:1:found
    2.txt:2:his found
    2.txt:3:hisfound
    2.txt:4:her found
    2.txt:5:herfound
    found.txt:1:found
    found.txt:2:my found
    found.txt:3:myfound
    found.txt:4:your found
    found.txt:5:yourfound
    test.txt:1:found
    test.txt:2:my found
    test.txt:3:myfound
    test.txt:4:your found
    test.txt:5:yourfound

    bash interprets the command to mean "from all the file names returned by ls, return every file name containing the string 'found'".

    PowerShell interprets the command to mean "in all the files returned by ls, return the file name, a colon, the line number (counting from the head of the file), a colon, and the line containing the string 'found'".

    Now let's dispense with the pipe and see what happens when we operate on one file only:

    UNIX/Linux bash shell:

    # grep found test.txt

    found
    my found
    myfound
    your found
    yourfound

    Windows PowerShell:

    > select-string found test.txt

    test.txt:1:found
    test.txt:2:my found
    test.txt:3:myfound
    test.txt:4:your found
    test.txt:5:yourfound

    bash interprets the command to mean "return all lines containing the string 'found' in the file test.txt".

    PowerShell interprets the command to mean "return the file name, a colon, the line number (counting from the head of the file), a colon, and the line containing the string 'found' in the file test.txt".

    This is _just_the_beginning_ of the differences between grep and select-string.

  • What I find myself constantly wanting is a way to get the output equivalent of the --only-matching flag of grep.

    The documentation tantalizes you by saying...

    Pattern: the string that was actually matched

    But it's not true: it is the expression you used, which is what you'd expect.

    I'd really like to have the actual match text as a property.

    e.g.

    ipconfig | select-string "IPv4 Address" | foreach { $_.Line | select-string "(\d{1,3}\.){3}\d{1,3}" | foreach { $_.Match } } > list-of-ip-addresses.txt

  • I'm going to answer my own question

    ipconfig | select-string "IPv4 Address" | foreach { $_.Line | where { $_ -match "(\d{1,3}\.){3}\d{1,3}" } | foreach { $matches[0] } }

    That is what I was looking for, but it's slightly more cumbersome than a property.

  • Grep is such a powerful and oft-used tool that I should think excellent support in v1 would be at the top of people's list.  I guess the focus was just on management more than processing text, but please note that processing text will be VERY IMPORTANT until everything is PS.  Until then, please give us a nice transition story.  I'll echo the request for the --only-matching flag, and add the --max-count flag, the --no-filename flag, the --count flag, and the --invert-match flag.  Most of the features of grep are useful and I don't think supporting them would cost much relative to the benefit.

    As an aside, is there discussion somewhere of the poor performance when, say, tailing the last million lines of a file and piping it to grep?  I've resorted to cmd /c "tail ... | grep ...", but that's kludgy, due to nesting of escapes and all that fun.  I tried a hack with lambdas [1], but it has two problems: it doesn't do string interpolation and some unknown problem I have yet to fully characterize.  Do you guys have any story for this?

    [1] http://luke.breuer.com/time/item/PowerShell_better_cmd_handling/481.aspx

  • Here are some performance numbers; when processing lines of text, PS appears to max out at around 4 lines/ms, whereas cmd.exe hits almost 3000 lines/ms for large files:

    http://luke.breuer.com/time/item/PowerShell_pipeline_performance/527.aspx

  • I want an md5sum for PowerShell, to integrate with my scripts.

  • I want an md5sum to use with my scripts.

  • Please see my howto for basic grep functionality under PowerShell:

    clintboessen.blogspot.com/.../how-to-grep-in-powershell.html

  • How do I get the equivalent of grep -o '\d+', which outputs only the matching part of a line?
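
Several comments above ask for grep's --only-matching behaviour. What follows is a minimal sketch of one way to get it, assuming PowerShell 2.0 or later, where MatchInfo objects carry a Matches property and Select-String accepts -AllMatches. Neither appears in the v1 help text shown in the post, and the sample strings are invented stand-ins for real ipconfig output.

```powershell
# Sample input invented for this sketch; in practice this could be ipconfig output.
$lines = 'IPv4 Address. . . : 192.168.1.10',
         'Subnet Mask . . . : 255.255.255.0',
         'no numbers here'

# Matches holds the regex match objects for each line; Value is the matched text.
$ips = $lines |
    Select-String -Pattern '(\d{1,3}\.){3}\d{1,3}' |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Value }

# -AllMatches returns every match on a line rather than only the first,
# which is closer to what grep -o '\d+' prints.
$numbers = 'a1b22c333' |
    Select-String -Pattern '\d+' -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Value }

$ips
$numbers
```

Without -AllMatches, only the first match on each line is reported, so the switch matters whenever a line can contain the pattern more than once.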
