Best way to extract a string from a batch of files
I have to process 300+ HTML files, extract a string from each one, and place it in a separate text file for import downstream. Upside: the string format is identical in every file, and it sits within two lines of the same position in each.
I thought about using Python, but then figured Perl might be a better fit, since this is right in its backyard.
Sadly, I have no access to Unix/Linux, or I'd just grep it...
This is such an odd client request that I'm a bit goggle-eyed at the moment.
So: what is the best way to extract a target string from a batch of files?
If you give us more details (e.g. the path and names of the files, the string you want to extract, etc.), I could write a Windows Batch .BAT file to do this.
To write a Batch file that runs correctly I need some additional data, so I made some assumptions. You can help me fix the details. This is my method:
- Search for the line that contains ">Text link<". I assume there is just one; this can be adjusted.
- Read the next line. I assumed that each td is on its own line; this can be adjusted.
- In that line, remove the text from the beginning of the line up to and including the string "value".
- Replace the quotes with $ (the next step cannot process quotes).
- Get the text between the $ characters; this is the result.
Note: the for /F "skip=..." command may read the wrong line if thefile.htm contains empty lines, because for /F ignores empty lines while findstr /n counts them; this can be fixed.
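@echo off
setlocal DisableDelayedExpansion
rem Find the number of the marker line. /c: makes findstr treat the text,
rem space included, as one literal string instead of two separate words.
findstr /n /c:">Text link<" thefile.htm > linefound.tmp
for /F "delims=:" %%a in (linefound.tmp) do set lineNo=%%a
rem Skip up to and including the marker line and keep the first line after it.
for /F "skip=%lineNo% delims=" %%a in (thefile.htm) do (
    set "theLine=%%a"
    goto continue
)
:continue
setlocal EnableDelayedExpansion
rem Remove everything up to and including "value".
set theLine=!theLine:*value=!
rem Replace quotes with $ so the next for /F can split on them.
set theLine=!theLine:"=$!
for /F "tokens=2 delims=$" %%a in ("!theLine!") do set URL=%%a
rem Delayed expansion keeps characters such as & in the URL from breaking echo.
echo Result: !URL!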
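Since the question mentions Python and 300+ files, here is a rough sketch of the same steps applied to a whole folder. It is only an illustration under the same assumptions as above: one ">Text link<" marker per file, and the wanted text in double quotes after "value" on the very next line. The function names, the "*.htm*" pattern, and the tab-separated output format are my own inventions, not anything from the original request.

```python
from pathlib import Path

MARKER = ">Text link<"  # marker text assumed in the steps above


def extract_value(lines, marker=MARKER):
    """Return the quoted text after 'value' on the line following the marker, or None."""
    for i, line in enumerate(lines[:-1]):
        if marker in line:
            next_line = lines[i + 1]
            # Drop everything up to and including "value".
            _, found, rest = next_line.partition("value")
            if not found:
                return None
            # The wanted text sits between the first pair of double quotes.
            parts = rest.split('"')
            return parts[1] if len(parts) >= 3 else None
    return None


def process_folder(folder, out_file, pattern="*.htm*"):
    """Extract one value per matching file and write 'name<TAB>value' lines to out_file."""
    results = []
    for path in sorted(Path(folder).glob(pattern)):
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        value = extract_value(lines)
        if value is not None:
            results.append(f"{path.name}\t{value}")
    Path(out_file).write_text("\n".join(results) + "\n", encoding="utf-8")
    return results
```

Calling process_folder with the folder that holds the HTML files and the name of the output text file would give one line per file. Unlike the for /F approach, empty lines in the input do not shift the line count here.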
EDIT no. 2
You are confusing me. Did the first code work or not? The second example you posted in the comments does not seem to be related to the first one (is the data within the second <td> or after [url=http://?). Is it the same problem or a different one? Please don't assume I know about the HTML file format (I don't). I DO know about Batch files, but I can't guess what to do if I don't have complete details...
The following Batch file shows everything between square brackets on each line that contains the [url=http:// string, in the file given as the first parameter:
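@echo off
rem /c: makes findstr search for the literal text. /n prefixes each hit with
rem "lineNo:", so token 1 is always that prefix (plus any text before the
rem first bracket) and token 2 is the text between the brackets.
for /F "tokens=2 delims=[]" %%a in ('findstr /n /c:"[url=http://" "%~1"') do echo %%a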
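For comparison, roughly the same extraction could be sketched in Python with a regular expression. This is only a sketch: the function name bracketed_urls is made up for the example, and it differs from the Batch version in that it returns every bracket that starts with url=http://, not just the first bracketed text on a matching line.

```python
import re

# Text between "[" and "]" where the bracket starts with url=http://
URL_TAG = re.compile(r"\[(url=http://[^\]]*)\]")


def bracketed_urls(text):
    """Return the contents of every [url=http://...] bracket found in text."""
    return URL_TAG.findall(text)
```

Reading each file's contents and passing them to bracketed_urls would then give the list of matches for that file.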
As you're already familiar with Grep, why not use a Windows port, such as the Grep in GnuWin32?
Another great way to get a ton of *nix functionality in Windows is Cygwin http://www.cygwin.com