Wilson Mar

Find, grep, sed stuff on your hard drives using regular expressions


Overview

Here are my notes on finding (and replacing) stuff within a Mac, including searching inside files and file metadata.

Spotlight = mdfind

macOS comes with a GUI search tool called Spotlight. Running automatically in the background, it maintains a database that indexes every file and its metadata (such as the date modified).

Spotlight’s database and functionality are available from the command line through mdfind. Anything Spotlight can find, mdfind can find too. The command to search for the word “essay” only within the Documents folder:

mdfind -onlyin ~/Documents essay
   

If you don’t use Spotlight or mdfind, turn off its indexing overhead on all volumes (-a):

sudo mdutil -a -i off

If Spotlight’s indexing isn’t working the way it should, erase the index of the startup volume so that it is rebuilt from scratch:

sudo mdutil -E /

Terminal commands:

  1. cd to the folder you want searched. For example:

    cd ~/gits/wilsonmar/wilsonmar.github.io/_posts

    PROTIP: Drill down into the lowest folder you can. If you are too high in the folder hierarchy, you’ll encounter messages like these about protected folders and files not processed:

    find: ./.DocumentRevisions-V100: Permission denied
    find: ./.fseventsd: Permission denied
    find: ./.MobileBackups: Permission denied
    find: ./.Spotlight-V100: Permission denied
    find: ./.Trashes: Permission denied
    find: ./dev/fd/3: Not a directory
    find: ./dev/fd/4: Not a directory
    find: fts_read: Permission denied
    

    Grep Utilities

  2. See the version of the Grep utility installed:

    grep --version

    example response:

    grep (BSD grep) 2.5.1-FreeBSD

    PROTIP: macOS is built on a base of BSD (Berkeley Software Distribution), so it ships the BSD versions of utilities such as grep rather than the GNU versions found on Linux.

  3. Display just the names (-l) of files containing the word “foo”, searching recursively (-r) from the current folder down:

    
    grep -r -l "foo" .
    

    https://www.cyberciti.biz/faq/howto-recursively-search-all-files-for-words/

    grep [options] PATTERN [FILE...]
    
    

-F, --fixed-strings Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched.

-x, --line-regexp Select only those matches that exactly match the whole line.

-q, --quiet, --silent Quiet; do not write anything to standard output. Exit immediately with zero status if any match is found, even if an error was detected. Also see the -s or --no-messages option.
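
To make the three options above concrete, here is a minimal sketch that runs them against a throwaway file. The file name sample.txt and its contents are made up for illustration.

```shell
# Build a small sample file in a temporary folder (hypothetical contents).
tmpdir=$(mktemp -d)
printf '%s\n' 'a.b' 'axb' 'a.b.c' > "$tmpdir/sample.txt"

# Without -F the dot is a regex wildcard, so "axb" matches as well.
regex_count=$(grep -c 'a.b' "$tmpdir/sample.txt")

# With -F the pattern is a literal string, so "axb" no longer matches.
fixed_count=$(grep -c -F 'a.b' "$tmpdir/sample.txt")

# Adding -x requires the literal string to be the entire line.
exact_count=$(grep -c -F -x 'a.b' "$tmpdir/sample.txt")

# -q prints nothing; the exit status alone reports whether a match exists.
if grep -q -F 'a.b.c' "$tmpdir/sample.txt"; then
  quiet_result="found"
else
  quiet_result="missing"
fi

echo "regex=$regex_count fixed=$fixed_count exact=$exact_count quiet=$quiet_result"
rm -rf "$tmpdir"
```

The -q form is the one to reach for inside scripts, where only the yes/no answer matters.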

Use grep and find to show text within files

  1. For example, find the text “>>>>>> HEAD” within files by searching recursively from the current folder (represented by a period) down:

    grep -ri ">>>>>> HEAD" .
    

    TIP: Remove the i option to make the search case-sensitive.

  2. To find the text “>>>>>> HEAD” within files of “.md” type under the current folder path:

    
    find . -name "*.md" -print | xargs grep ">>>>>> HEAD"
    
  3. Use the -print0 option of find together with the -0 option of xargs to handle filenames that contain spaces or other metacharacters:

    
    find /path/to/dir -type f -print0 | xargs -0 grep -l "foo"
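
The difference -print0 makes is easiest to see with a file whose name contains a space. This is a small sketch using a temporary folder; the file names are invented for the demonstration.

```shell
# Create a temporary folder holding one file whose name contains a space.
tmpdir=$(mktemp -d)
printf 'foo\n' > "$tmpdir/my notes.txt"
printf 'bar\n' > "$tmpdir/other.txt"

# Plain xargs splits "my notes.txt" into two arguments, so grep cannot open it.
# Error messages are discarded here, which leaves the result empty.
unsafe=$(find "$tmpdir" -type f -print | xargs grep -l "foo" 2>/dev/null || true)

# NUL-separated names survive intact, so the file is found.
safe=$(find "$tmpdir" -type f -print0 | xargs -0 grep -l "foo")

echo "without -print0: '${unsafe}'"
echo "with -print0:    '${safe}'"
rm -rf "$tmpdir"
```
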
    

    Find file names

    See 15 Practical Linux Find Command Examples

  4. Using the find command to search the whole drive for a file named “postgresql”:

    
    find / -type f -name postgresql 2> /dev/null
    

    The “2>/dev/null” sends errors to a null device so you don’t see them on the Console.

  5. Find files by file name, ignoring case (-iname):

    
    find . -iname "MyCProgram.c"
    
  6. Execute commands on files found by the find command:

    
    find . -iname "MyCProgram.c" -exec md5sum {} \;
    

    If md5sum is not installed on your Mac, substitute the built-in md5 command.

  7. Find all empty files in the home directory:

    
    find ~ -type f -empty
    

    WARNING: There are a lot of these.

  8. Find the word “server”, ignoring case, in a file:

    grep -i Server /etc/ntp.conf
    

    The “-i” makes the search case-insensitive.

    The response contains the word “server” searched:

    server time.apple.com.
    
  9. Find lines that don’t (-v inverts the search) begin with # (^ anchors to the start of the line) and aren’t blank (^ immediately followed by the end of line $):

    grep -v -e '^#' -e '^$' /etc/ntp.conf
    

    The response:

    server time.apple.com.
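
One thing to watch with the comment filter above: '^#' only catches comments that start in the first column. The sketch below shows the effect on a made-up config file (sample.conf and the example.com host names are hypothetical) and one way to also drop indented comments.

```shell
# Write a hypothetical config file with comments, a blank line, and settings.
tmpdir=$(mktemp -d)
conf="$tmpdir/sample.conf"
printf '%s\n' '# time servers' '' 'server time.example.com' \
  '  # indented comment' 'Server backup.example.com' > "$conf"

# -v inverts the match; each -e adds a pattern. The indented comment survives
# because '^#' requires the # to be the first character on the line.
active=$(grep -v -e '^#' -e '^$' "$conf")
active_count=$(printf '%s\n' "$active" | wc -l | tr -d ' ')

# Allowing leading whitespace also removes indented comments and
# lines that contain only spaces or tabs.
strict=$(grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$conf")
strict_count=$(printf '%s\n' "$strict" | wc -l | tr -d ' ')

echo "simple filter keeps $active_count lines, strict filter keeps $strict_count"
rm -rf "$tmpdir"
```
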
    

Last 10 files modified anywhere

Time utility

  1. List the 10 most recently modified files under the current folder, timing how long the command takes:

    
    time find . -xdev -type f -print0 | xargs -0 stat -f "%m%t%Sm %N" | sort -rn | head -n 10 | cut -f2-
    

    The time utility captures how long the command takes to run. That is worth knowing because “find .” visits every file under the current folder, which is the whole drive when run from the root folder.

    real  1m1.537s
    user  0m32.894s
    sys   0m29.795s
    

    Stat utility

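  2. NOTE: Use the BSD stat command that comes with macOS. (The -f format option shown here is BSD syntax; GNU stat on Linux takes its format through -c, with different format codes.)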

    
    stat -f "%m%t%Sm %N" /tmp/* | sort -rn | head -10 | cut -f2-
    

    which returns, for example:

    Mar 25 16:38:45 2018 /tmp/wbxgpc.wbt
    
  3. This is such a useful command that you can make an alias of it in ~/.bash_profile:

    alias last10='stat -f "%m%t%Sm %N" /tmp/* | sort -rn | head -10 | cut -f2-'
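
An alias is fixed to one folder. If you want to pass the folder as an argument, a shell function is one option. The sketch below is an alternative, not the same command: the name last_modified is made up, and it uses "ls -t" instead of stat so that it also runs where the BSD "stat -f" syntax is unavailable. It prints names only, without timestamps.

```shell
# last_modified DIR [COUNT]
# List the most recently modified entries in DIR (default: current folder),
# newest first, limited to COUNT entries (default: 10).
last_modified() {
  ls -t "${1:-.}" | head -n "${2:-10}"
}

# Demonstration on a temporary folder with three files of known ages.
tmpdir=$(mktemp -d)
touch -t 202001010000 "$tmpdir/old.txt"
touch -t 202101010000 "$tmpdir/newer.txt"
touch -t 202201010000 "$tmpdir/newest.txt"

newest_two=$(last_modified "$tmpdir" 2)
first_entry=$(printf '%s\n' "$newest_two" | head -n 1)
echo "$newest_two"
rm -rf "$tmpdir"
```
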
    

Regular Expressions

We’ll use the spelling dictionary of English words that comes with macOS and most Linux distributions, at /usr/share/dict/words.

  1. Search for words ending with “fine”:

    grep 'fine$' /usr/share/dict/words
    

    $ (Shift+4) anchors the match to the end of the line.

    Responses include “refine”.

  2. Search for words beginning with “fine”:

    grep '^fine' /usr/share/dict/words
    

    ^ (Shift+6) anchors the match to the beginning of the line.

    Responses include “finery”.

  3. Search for “fine” anywhere within the line:

    grep 'fine' /usr/share/dict/words
    
  4. Search for lines containing “fine” anywhere within words:

    grep 'fine' /usr/share/dict/words
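
To see the anchors side by side without scrolling through the full dictionary, here is a small sketch against a five-word list. The word list is made up for illustration.

```shell
# A tiny stand-in for /usr/share/dict/words (hypothetical contents).
tmpdir=$(mktemp -d)
words="$tmpdir/words.txt"
printf '%s\n' 'fine' 'finery' 'refine' 'refinement' 'define' > "$words"

ends_with=$(grep -c 'fine$' "$words")     # fine, refine, define
starts_with=$(grep -c '^fine' "$words")   # fine, finery
anywhere=$(grep -c 'fine' "$words")       # all five words
whole_line=$(grep -c '^fine$' "$words")   # only "fine"

echo "ends=$ends_with starts=$starts_with anywhere=$anywhere whole=$whole_line"
rm -rf "$tmpdir"
```

Combining both anchors, as in the last pattern, restricts the match to lines that consist of the pattern and nothing else.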
    

    Whitespace

  5. Search for a whitespace character before “system” in lines, such as “file system” (but not “system” at the start of a line):

    grep '\ssystem' /etc/ntp.conf
    
  6. Search for “server” followed by a word boundary, which matches the word “server” but not “servers” or “serverless”:

    grep 'server\b' /etc/ntp.conf
    

    PROTIP: The response “server time.apple.com” names the Network Time Protocol server used to update your machine’s clock.

  7. Return lines that do not start with # for comment:

    grep -v '^\s*#' /etc/hosts
    
  8. Search for lines containing either of the specific characters C or c:

    grep '[Cc]' /usr/share/dict/words
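
The \s and \b escapes are extensions that not every grep accepts. As a hedged alternative, the sketch below expresses the whitespace and word-boundary searches with the POSIX character class [[:space:]] and the -w option. The sample lines and example.com host name are hypothetical.

```shell
# Sample lines to search (made up for this demonstration).
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.txt"
printf '%s\n' 'server time.example.com' 'servers are listed below' \
  'a serverless setup' 'file system check' 'systemd unit' > "$sample"

# [[:space:]] is the POSIX character class corresponding to \s.
# Only "file system check" has whitespace directly before "system".
space_before=$(grep -c '[[:space:]]system' "$sample")

# -w matches the pattern only as a whole word, much like \bserver\b.
# "servers" and "serverless" are therefore skipped.
whole_word=$(grep -c -w 'server' "$sample")

echo "space_before=$space_before whole_word=$whole_word"
rm -rf "$tmpdir"
```
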
    

    Locate command database

    sudo locate whatever

    WARNING: The locate database (/var/db/locate.database) does not exist.
    To create the database, run the following command:
     
      sudo launchctl load -w /System/Library/LaunchDaemons/com.apple.locate.plist
     
    Please be aware that the database can take some time to generate; once
    the database has been created, this message will no longer appear.
    

    Quantifiers

  9. Search for “color” or “colour” in any line, using ? to make the preceding character u optional. The -E option (extended regular expressions) is needed for ? to act as a quantifier rather than a literal question mark:

    grep -E '\b[Cc]olou?r\b' /usr/share/dict/words
    

    u+ matches one or more occurrences.

    u* matches zero or more times.

    u{4} matches exactly four occurrences.
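
The four quantifiers can be compared on a handful of spellings. This is a minimal sketch using grep -E and an invented word list, anchored with ^ and $ so that each count reflects whole-line matches only.

```shell
# Hypothetical spellings, including two deliberate misspellings.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.txt"
printf '%s\n' 'color' 'colour' 'colouur' 'colr' > "$sample"

optional=$(grep -E -c '^colou?r$' "$sample")       # color, colour
one_or_more=$(grep -E -c '^colou+r$' "$sample")    # colour, colouur
zero_or_more=$(grep -E -c '^colou*r$' "$sample")   # color, colour, colouur
exactly_two=$(grep -E -c '^colou{2}r$' "$sample")  # colouur

echo "?=$optional +=$one_or_more *=$zero_or_more {2}=$exactly_two"
rm -rf "$tmpdir"
```
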

    Enhanced grep:

  10. Return words with five consecutive vowel characters. The -E option enables extended regular expressions, which include the {5} repetition count:

    grep -E '[aeiou]{5}' /usr/share/dict/words
    

    BTW, “euouae” (pronounced “your-you-ee”) consists only of vowels.

    Regex ranges

  11. Search for characters, upper and lower case ranges from A to Z, plus underscores:

    grep '[A-Za-z_]' ???
    
  12. Search for just digits in the range 0 through 9:

    grep '[0-9]' ???
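
Because the two range commands above have no target file, here is a sketch that applies the same bracket expressions to a made-up sample file, plus two variations (an anchored digits-only match and a negated class) that follow from the same syntax.

```shell
# Hypothetical sample lines: identifiers, mixed text, digits, punctuation.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.txt"
printf '%s\n' 'user_name' 'Port 8080' '1234' '---' > "$sample"

has_letter=$(grep -c '[A-Za-z_]' "$sample")     # user_name, Port 8080
has_digit=$(grep -c '[0-9]' "$sample")          # Port 8080, 1234
digits_only=$(grep -E -c '^[0-9]+$' "$sample")  # 1234
no_digit=$(grep -c '^[^0-9]*$' "$sample")       # user_name, ---

echo "letter=$has_letter digit=$has_digit digits_only=$digits_only no_digit=$no_digit"
rm -rf "$tmpdir"
```

A leading ^ inside the brackets negates the class, which is different from the ^ anchor outside them.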
    

Sed

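The sed (stream editor) utility replaces contents within a file.

To delete comment lines and empty lines:


   sed -i.bak '/^[[:space:]]*#/d;/^$/d' somefile
   

The -i option edits the file in place; the suffix after it (.bak in this case) keeps a backup copy of the original as somefile.bak.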

The semicolon separates multiple specifications.
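
To try the in-place edit safely, run it on a scratch file first. The sketch below uses a made-up sample.conf in a temporary folder and then checks both the edited file and the backup.

```shell
# Write a hypothetical config file with comments, a blank line, and settings.
tmpdir=$(mktemp -d)
conf="$tmpdir/sample.conf"
printf '%s\n' '# header comment' '' 'server time.example.com' \
  '   # indented comment' 'driftfile /var/db/ntp.drift' > "$conf"

# -i.bak edits in place and keeps the original as sample.conf.bak.
# The two delete commands are separated by a semicolon.
sed -i.bak '/^[[:space:]]*#/d;/^$/d' "$conf"

remaining=$(wc -l < "$conf" | tr -d ' ')       # lines left after the edit
backup_lines=$(wc -l < "$conf.bak" | tr -d ' ') # lines in the untouched backup

echo "edited file: $remaining lines, backup: $backup_lines lines"
rm -rf "$tmpdir"
```

Writing the suffix directly against -i (no space) is the form accepted by both the BSD sed on macOS and GNU sed.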

Windows

https://www.makeuseof.com/tag/search-file-contents-windows/

Hidden files

If you are searching for hidden files:

How to Search the Content of Your Files on Windows

Using Utility programs

KDiff3

P4Merge

Using IDE

Within Eclipse, press ctrl+F for the Find dialog.

Press ctrl+H for Find & replace.

Searches are multi-line by default in Eclipse when you are using regex:

(\@Length\(max = L_255)\)([\r\n\s]+private)