Grep and Unicode

04/20/09

I really like grep. The “Containing text” search in Windows seldom seems to give me the results I want. Grep works, it’s fast, and it lets me use regular expressions.

I’m used to an old version of Borland’s Turbo GREP that shipped with Delphi. It doesn’t cope well with files that have long lines, and it doesn’t support Unicode (UTF-16 in particular). Microsoft’s SQL Server Management Studio has a nasty habit of saving SQL text files as UTF-16, so I don’t always find the saved query I’m looking for.

I found out that Windows XP (and up) has a utility called FINDSTR that acts much like grep (type “help findstr” in a command prompt for more info). Unfortunately, it doesn’t support Unicode either. See http://stackoverflow.com/questions/408079/findstr-or-grep-that-autodetects-chararacter-encoding-utf-16.

PowerShell comes to the rescue. It appears to support Unicode/UTF-16, at least if the byte order mark is present. See http://kevin-berridge.blogspot.com/2008/06/powershell-grep.html. I think the first comment on that post is half right; the issue is that “ls” in PowerShell is an alias for Get-ChildItem, which returns a collection of FileInfo and DirectoryInfo objects. In UNIX, the output of ls is just text, so that’s all grep can operate on. PowerShell pipes objects, so it is more powerful (albeit sometimes trickier).
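
To show why the byte order mark matters, here is a minimal sketch in Python (not PowerShell, and not how any of the tools above are actually implemented). It reads a file as bytes, picks a decoder from the BOM if one is present, and then runs a regular expression over each line. The names detect_encoding and grep_file are my own, and falling back to cp1252 when there is no BOM is an assumption that suits my Windows text files, not a rule.

```python
import codecs
import re

# BOMs are checked longest-first so that a UTF-32 LE file (FF FE 00 00)
# is not mistaken for UTF-16 LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_encoding(raw, default="cp1252"):
    """Return a codec name based on the BOM, or `default` if there is none.

    The default is an assumption; without a BOM this sketch cannot tell
    UTF-16 from an 8-bit encoding.
    """
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return default


def grep_file(pattern, path):
    """Return (line_number, line) pairs for lines matching the regex."""
    with open(path, "rb") as f:
        raw = f.read()
    # The chosen codecs consume the BOM, so it never shows up in line 1.
    text = raw.decode(detect_encoding(raw), errors="replace")
    regex = re.compile(pattern)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), 1)
        if regex.search(line)
    ]
```

Something like grep_file(r"FROM\s+Orders", "saved_query.sql") (a made-up file name) would then find the query whether Management Studio saved it as UTF-16 or not, provided the BOM is there. A byte-oriented search misses it because every ASCII character in a UTF-16 file is followed or preceded by a zero byte.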


Your Host: webmaster@truewill.net
Copyright © 2000-2013 by William Sorensen. All rights reserved.