Fast load 70,000 files into array and insert into database

A few days ago, I had a problem with a script dealing with a multitude of files on our EMC mass filers. Here's my current working code:

unset xP_Array
declare -a xP_Array
export LOG=$HOME/BIN/somelogfile
export OUT=/path/to/device
echo "`date '+%m/%d/%y %T:'` START -- MEM"                              >> $LOG

echo "`date '+%m/%d/%y %T:'` Go to work directory."                     >> $LOG
cd ${OUT}

echo "`date '+%m/%d/%y %T:'` Fill the array."                           >> $LOG
for f in "$OUT"/*XML; do
    xP_Array+=( "${f#$OUT/}" )
done

echo "`date '+%m/%d/%y %T:'` Get array length."                         >> $LOG
Plen=${#xP_Array[@]}

echo "`date '+%m/%d/%y %T:'` MEM: $Plen FILES TO PROCESS."              >> $LOG

echo "`date '+%m/%d/%y %T:'` Check if zero files."                      >> $LOG
date_fmt='%m/%d/%y %T'
if (( Plen == 0 )); then
    printf "%($date_fmt)T: ZERO FILES\n" $(date +%s)                    >> $LOG
fi

echo "`date '+%m/%d/%y %T:'` Loop."                                     >> $LOG
for i in "${xP_Array[@]}"; do
        echo "`date '+%m/%d/%y %T:'` Move file to run directory."       >> $LOG
        mv "$OUT/$i" RUN/
        echo "`date '+%m/%d/%y %T:'` PROCESSING $i."                    >> $LOG
        # (DB load of "$i" runs here; command omitted from the post)
        echo "`date '+%m/%d/%y %T:'` Check DB LOAD return value."       >> $LOG
        EXIT=`echo $?`
        case $EXIT in
                0) echo "`date '+%m/%d/%y %T:'` COMPLETE."              >> $LOG
                   mv RUN/"$i" "$ARCH"
                   ;;
                *) echo "`date '+%m/%d/%y %T:'` ERROR. $i MOVED TO RECON." >> $LOG
                   mv RUN/"$i" "$RECON"
                   ;;
        esac
done

echo "`date '+%m/%d/%y %T:'` END -- MEM"                                >> $LOG

I wonder if it could work faster. I am already working with my DBA to see if the database inserts can be sped up but I wonder if the loop itself could run faster.

Btw, all the echo statements are redirected to a log file that I email to myself when the script completes. Are they slowing the script down?

Can this script be optimized to run faster?


for f in "$OUT"/*XML; do
    xP_Array+=( "${f#$OUT/}" )
done

could be replaced with

pushd "$OUT"
xP_Array=( *XML )

but I don't think that's a big bottleneck in your script.
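Spelled out fully, the single-glob fill might look like this. A sketch only: the mktemp directory and dummy files stand in for your real $OUT so the snippet is self-contained.

```shell
#!/bin/bash
# Stand-in setup so the sketch runs on its own; in the real script
# OUT is already exported as /path/to/device.
OUT=$(mktemp -d)
touch "$OUT"/a.XML "$OUT"/b.XML

shopt -s nullglob        # an empty match yields an empty array, not a literal "*XML"
pushd "$OUT" > /dev/null
xP_Array=( *XML )        # bare filenames; no "${f#$OUT/}" stripping needed
popd > /dev/null

Plen=${#xP_Array[@]}
echo "$Plen files"       # → "2 files" for the two dummy files above
```

The `nullglob` option is worth setting either way: without it, a directory with no matches leaves the literal string `*XML` in the array.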

In your other loop, the only real overheads I see are the repeated calls to date and the command substitution used to assign $? to EXIT, which can be done directly (EXIT=$?). I don't think there is anything else to optimize there except the actual DB load.
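The direct form of the return-value check looks like this. A sketch: `db_load` is a hypothetical stand-in for your real DB load command, stubbed out so the snippet runs on its own.

```shell
#!/bin/bash
# db_load is a hypothetical placeholder for the real DB load; it
# succeeds here so the snippet demonstrates the capture cleanly.
db_load() { echo "loading $1" > /dev/null; }

db_load somefile.XML
EXIT=$?          # direct assignment; no `echo $?` command substitution needed
echo "$EXIT"     # → 0
```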

If you were willing to switch from human readable dates, you could assign the current time (as a UNIX timestamp) to SECONDS, then just reference that variable for the log lines instead of calling date.

SECONDS=$(date +%s)
for i in "${xP_Array[@]}"; do
    echo "$SECONDS: Move file to run directory." >> $LOG
    # etc
done

With a new-enough bash (4.2 or later, I think), printf can format the UNIX timestamp as a readable time:

printf "%(%m/%d/%Y %T)T" $SECONDS
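For instance, a small log helper built on that feature could replace every `` `date ...` `` call. A sketch: the `log` function name and the reuse of a `date_fmt` variable are my own illustration, and it assumes bash 4.2+, where the argument -1 means "current time".

```shell
#!/bin/bash
# Sketch of a log helper using printf's %()T (bash 4.2+); "log" and
# "date_fmt" are illustrative names, not part of the original script.
date_fmt='%m/%d/%y %T'
log() {
    # -1 means "now", so no external date call (and no fork) per log line
    printf "%($date_fmt)T: %s\n" -1 "$*"
}
log "Move file to run directory."
```

In the script, each call would still be redirected with `>> $LOG` as before.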

The only things I can think of here are the echo "`date ...`" statements and the case statement, but the execution time for those should be really negligible. I would concentrate on tuning the SQL rather than fine-tuning the bash script here. If you really want to do something in the script, you can use the tips mentioned by @chepner already.

The other way would be, if your file formats are the same, to merge all the files (or 100 files at a time) into a CSV file and then use the merged CSV file to load external tables. You can then use the external table to load the actual table. Start by loading one file at a time through an external table and see how much performance difference is observed.

External tables are quite handy when it comes to loading data files (if properly formatted), since there is literally no INSERT or DELETE effort required: if the data file is present in the designated directory, you will find data in the table; otherwise you won't (if the data file is absent, issuing a SELECT on the table raises an error and generates some log files, which can be easily taken care of).
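The "100 files at a time" batching could be driven from bash like this. A sketch only: `merge_batch` is a hypothetical stand-in for whatever XML-to-CSV merge step fits your format, stubbed out here so the loop itself can run, and the mktemp directory stands in for your real $OUT.

```shell
#!/bin/bash
# merge_batch is a hypothetical placeholder for the real XML-to-CSV merge.
merge_batch() { echo "merging $# files into one CSV"; }

# Stand-in directory with 250 dummy files so the sketch is self-contained.
OUT=$(mktemp -d)
for n in $(seq 1 250); do touch "$OUT/f$n.XML"; done

batch_size=100
files=( "$OUT"/*XML )
for (( i = 0; i < ${#files[@]}; i += batch_size )); do
    batch=( "${files[@]:i:batch_size}" )   # next slice of up to 100 names
    merge_batch "${batch[@]}"              # one external-table load per merged CSV
done
```

With 250 files this produces two full batches of 100 and a final batch of 50.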
