Replace a structure with a different one and preserve some values

I would like to transform the input below into the output below in bash. I tried to use sed, but it didn't work; I probably have it wrong. So far I have this (just to see whether I can extract the id), but it does not work:

sed 's;id="([a-zA-Z:]+)";\\1;p' input

Input

<mediaobject>
    <imageobject id="fig:deployment">
        <caption>Application deployment</caption>
        <imagedata fileref="images/deployment.png" width="90%" />
    </imageobject>
</mediaobject>

Output

<img src="images/deployment.png" width="90%" id="fig:deployment" title="Application deployment" />
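For reference, the `sed` attempt above leaves the input unchanged. In basic regular expressions `(`, `)` and `+` are literal characters, so the pattern never matches. In addition, `\\1` inside single quotes reaches `sed` as an escaped backslash followed by `1`, not as a back-reference. A minimal sketch of a working id extraction, assuming a `sed` that accepts `-E` (GNU and BSD `sed` both do); the function name `extract_id` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of id="..." for every line on stdin that has one.
#  -E  use extended regexes, so ( ) groups and + repeats
#      (in basic regexes they match literal characters)
#  -n  do not print every line; the trailing p prints only lines where the
#      substitution succeeded (without -n those lines would be printed twice)
#  .* on both sides makes the whole line be replaced by the captured group \1
extract_id() {
    sed -nE 's/.*id="([a-zA-Z:]+)".*/\1/p'
}

# Usage: extract_id < input
```

Run against the sample input, this prints only `fig:deployment`.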

Answers


awk is available virtually everywhere bash is installed, and it can avoid some of the pitfalls you might run into with sed (for example, if the elements in the XML are not always in the same order).

awk '
    ## set a flag to mark that we are inside a mediaobject block
    $1=="<mediaobject>" { object=1 }

    ## mark that we have left the mediaobject block
    $1=="</mediaobject>" { object=0 }

    ## if we are in a mediaobject block and find an imageobject
    $1=="<imageobject" && object==1 {
        iobject=1                          ## record that we are in an imageobject
        id = substr($2, 5, length($2) - 6) ## the id, without quotation marks or the closing >
    }

    ## if we have a line with image data
    $1~/<imagedata/ && iobject==1 {
        fileref=substr($2,9,length($2)-8)  ## the path, including the quotation marks
        width=$3                           ## the whole width="..." attribute
    }

    ## if we have a caption line
    $1~/<caption>/ && iobject==1 {
        gsub("(</?caption>|^[ \t]*|[ \t]*$)", "")  ## remove the tags and leading/trailing whitespace
        caption=$0                                 ## record the modified line as the caption
    }

    ## when we arrive at the end of an imageobject
    $1=="</imageobject>" && object==1 {
        iobject=0                                                                       ## record it
        printf("<img src=%s %s id=\"%s\" title=\"%s\" />\n", fileref, width, id, caption)  ## print the record
    }

' input

Although, as mentioned above, this code works no matter what order the elements appear in, it will fail if the attributes within a line change order (which is less likely). If you run into that problem, you can do something like:

## use match() to find the attribute; it sets RSTART and RLENGTH
## then use substr() to skip the 8 characters of fileref= and keep the value (with quotation marks)
if (match($0, /fileref="[^"]*"/))
    fileref = substr($0, RSTART + 8, RLENGTH - 8)
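Taking the `match()` idea further, here is a minimal sketch that pulls every attribute by name, so attribute order within a line no longer matters. It assumes each element sits on its own line, attribute values are double-quoted, and each `<imageobject>` holds one caption and one `<imagedata>`; the names `mediaobject_to_img` and `attr` are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the XML on stdin and print one <img> tag per <imageobject>.
# Assumes one element per line and double-quoted attribute values.
mediaobject_to_img() {
    awk '
    ## attr(name): value of name="..." on the current line, without quotes, or ""
    function attr(name,    re) {
        re = "[ \t]" name "=\"[^\"]*\""   ## whitespace, name, =, quoted value
        if (!match($0, re))
            return ""
        ## skip the whitespace, the name, = and the opening quote; drop the closing quote
        return substr($0, RSTART + length(name) + 3, RLENGTH - length(name) - 4)
    }

    /<imageobject/ { id = attr("id") }
    /<imagedata/   { src = attr("fileref"); width = attr("width") }
    /<caption>/ {
        caption = $0
        sub(/.*<caption>/, "", caption)    ## drop everything up to the opening tag
        sub(/<\/caption>.*/, "", caption)  ## drop the closing tag and what follows
    }
    /<\/imageobject>/ {
        printf("<img src=\"%s\" width=\"%s\" id=\"%s\" title=\"%s\" />\n", src, width, id, caption)
        id = src = width = caption = ""    ## reset before the next block
    }
    '
}

# Usage: mediaobject_to_img < input
```

Because each value is located by its attribute name rather than its field position, `<imagedata width="90%" fileref="images/deployment.png" />` produces the same `<img>` tag as the original ordering.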
