bash shell script to find the closest parent directory of several files

Suppose the input arguments are the FULL paths of several files. Say,

/abc/def/file1
/abc/def/ghi/file2
/abc/def/ghi/file3
  1. How can I obtain the directory name /abc/def in a bash shell script?
  2. How can I obtain only file1, /ghi/file2, and /ghi/file3?

Answers


Given the answer for part 1 (the common prefix), the answer for part 2 is straightforward: you slice the prefix off each name, which could be done with sed, amongst other options.
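As a minimal sketch of the sed option, assuming the prefix is already known and contains no characters that are special in a sed regular expression (and no `|`, which is used as the delimiter here), the slicing might look like this. The function name strip_prefix_sed is made up for illustration.

```shell
#!/usr/bin/env bash

prefix="/abc/def"

# Hypothetical helper: strip "$prefix/" from the start of each name.
# Using | as the s-command delimiter means the slashes need no escaping.
# ASSUMPTION: $prefix has no regex metacharacters and no | character.
strip_prefix_sed()
{
    printf '%s\n' "$@" | sed "s|^$prefix/||"
}

strip_prefix_sed /abc/def/file1 /abc/def/ghi/file2 /abc/def/ghi/file3
# prints:
#   file1
#   ghi/file2
#   ghi/file3
```

Names that do not start with the prefix pass through unchanged, because the substitution simply fails to match.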

The interesting part, then, is finding the common prefix. The minimum common prefix is / (for /etc/passwd and /bin/sh, for example). The maximum common prefix is (by definition) present in all the strings, so we simply need to split one of the strings into segments, and compare possible prefixes against the other strings. In outline:

split the directory part of name A into components
known_prefix=""
for each extra component from A
do
    possible_prefix="$known_prefix/$extra"
    for each name
    do
        if "$possible_prefix/" is not a prefix of $name
        then ...all done...break outer loop...
        fi
    done
    ...got here...possible prefix is a prefix!
    known_prefix=$possible_prefix
done
report $known_prefix, or / if it is still empty
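The outline can be turned into a short bash function along the following lines. This is only a sketch: the name common_prefix_forward is invented here, and it assumes every argument is an absolute path name with no embedded newlines and no doubled slashes.

```shell
#!/usr/bin/env bash

# Hypothetical function: grow the prefix one component at a time,
# shortest first, and stop at the first candidate that fails.
common_prefix_forward()
{
    local first="$1" known="" candidate component name dir
    local -a components

    # Split the directory part of the first name on "/".
    # ASSUMPTION: absolute names, no newlines, no "//" sequences.
    dir="${first%/*}"
    IFS=/ read -r -a components <<< "${dir#/}"

    for component in "${components[@]}"
    do
        candidate="$known/$component"
        for name in "$@"
        do
            # The quoted candidate is matched literally; only * is a glob.
            case "$name" in
                "$candidate"/*) ;;
                *) break 2 ;;
            esac
        done
        known="$candidate"
    done

    # An empty result means only the root directory is shared.
    printf '%s\n' "${known:-/}"
}

common_prefix_forward /abc/def/file1 /abc/def/ghi/file2 /abc/def/ghi/file3
# prints: /abc/def
```

Working from the shortest candidate upwards means the loop can stop at the first mismatch, and it needs no external commands.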

There are some administrivial details to deal with, such as spaces in names. There is also the matter of permitted weaponry: the question is tagged bash, but which external commands (Perl, for example) are allowed?

One undefined issue — suppose the list of names was:

/abc/def/ghi
/abc/def/ghi/jkl
/abc/def/ghi/mno

Is the longest common prefix /abc/def or /abc/def/ghi? I'm going to assume that the longest common prefix here is /abc/def. (If you really wanted it to be /abc/def/ghi, then use /abc/def/ghi/. for the first of the names.)

Also, there are invocation details:

  • How is this function or command invoked?
  • How are the values returned?
  • Is this one or two functions or commands (longest_common_prefix and path_without_prefix)?

Two commands are easier:

  • prefix=$(longest_common_prefix name1 [name2 ...])
  • suffix=$(path_without_prefix /pre/fix /pre/fix/to/file [...])

The path_without_prefix command removes the prefix if it is present, leaving the argument unchanged if the prefix does not start the name.

longest_common_prefix
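longest_common_prefix()
{
    local -a names=("$@")
    local -a parts=()
    local name="$1" x prefix

    # Collect the ancestor directories of the first name, longest first.
    # Stop at "/" (absolute names) or "." (relative names; without this
    # check the loop never ends, because dirname . is . again).
    while x=$(dirname "$name"); [ "$x" != "/" ] && [ "$x" != "." ]
    do
        parts+=("$x")
        name="$x"
    done

    # Try each candidate, longest first. The final empty candidate stands
    # for the root: "$prefix/" is then just "/", so it matches any
    # absolute name.
    for prefix in "${parts[@]}" ""
    do
        for name in "${names[@]}"
        do
            # Quote the prefix so glob characters in it match literally.
            if [ "${name#"$prefix"/}" = "${name}" ]
            then continue 2
            fi
        done
        echo "${prefix:-/}"
        break
    done
}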

Test:

# Names containing spaces, all under /abc/def.
set -- "/abc/def/file 0" /abc/def/file1 /abc/def/ghi/file2 /abc/def/ghi/file3 "/abc/def/ghi/file 4"
echo "Test: $*"
longest_common_prefix "$@"
# Add a relative name: there is no common prefix, so nothing is printed.
echo "Test: $* abc/def"
longest_common_prefix "$@" abc/def
# One of the names is itself a directory above the others.
set -- /abc/def/ghi/jkl /abc/def/ghi /abc/def/ghi/mno
echo "Test: $*"
longest_common_prefix "$@"
# The names from the question.
set -- /abc/def/file1 /abc/def/ghi/file2 /abc/def/ghi/file3
echo "Test: $*"
longest_common_prefix "$@"
# Spaces in the directory components.
set -- "/a c/d f/file1" "/a c/d f/ghi/file2" "/a c/d f/ghi/file3"
echo "Test: $*"
longest_common_prefix "$@"

Output:

Test: /abc/def/file 0 /abc/def/file1 /abc/def/ghi/file2 /abc/def/ghi/file3 /abc/def/ghi/file 4
/abc/def
Test: /abc/def/file 0 /abc/def/file1 /abc/def/ghi/file2 /abc/def/ghi/file3 /abc/def/ghi/file 4 abc/def
Test: /abc/def/ghi/jkl /abc/def/ghi /abc/def/ghi/mno
/abc/def
Test: /abc/def/file1 /abc/def/ghi/file2 /abc/def/ghi/file3
/abc/def
Test: /a c/d f/file1 /a c/d f/ghi/file2 /a c/d f/ghi/file3
/a c/d f

path_without_prefix
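path_without_prefix()
{
    local prefix="$1/"
    shift
    local arg
    for arg in "$@"
    do
        # Quote the prefix so it is matched literally, not as a glob pattern;
        # printf copes with names that echo might treat as options.
        printf '%s\n' "${arg#"$prefix"}"
    done
}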

Test:

for name in /pre/fix/abc /pre/fix/def/ghi /usr/bin/sh
do
    path_without_prefix /pre/fix "$name"
done

Output:

abc
def/ghi
/usr/bin/sh
