Building an HTML document with content from another

I have a document A and want to build a new document B from the values of A's nodes.

Suppose A looks like this:

    <div id="section0">
      <h1>Section 0</h1>
      <p>Some <b>important</b> info here</p>
      <div>Some unimportant info here</div>
    </div>
    <div id="section1">
      <h1>Section 1</h1>
      <p>Some <i>important</i> info here</p>
      <div>Some unimportant info here</div>
    </div>

When building document B, I call a.at_css("#section#{n} h1").text to grab the text of A's h1 tags, like this:

    require 'nokogiri'

    a = Nokogiri::HTML(html)  # html holds the markup of document A

    b = Nokogiri::HTML::Builder.new do |doc|
      # n is the index of the section being copied
      doc.h1 a.at_css("#section#{n} h1").text
    end

So there are three questions:

  1. How do I grab the content of <p> tags preserving tags inside <p>?

    Currently, a.at_css("#section#{n} p").text returns plain text, which is not what I need.

    If I call .to_html or .inner_html instead of .text, the HTML comes out escaped: I get, for example, &lt;p&gt; instead of <p>.

  2. Is there any known true way of assigning nodes at the document building stage, so that I don't have to go through the text method at all? That is, how do I give the doc.h1 node the value of the a.at_css("#section#{n} h1") node while building?

  3. What's the profit of Nokogiri::Builder.with(...) method? I wonder whether I can make use of it.


  1. How do I grab the content of <p> tags preserving tags inside <p>?

    Use .inner_html. The markup is not escaped when you read it. It gets escaped when you hand it to the builder as text content, as in builder.node_name raw_html. Instead, append it to the current parent node:

    require 'nokogiri'
    para = Nokogiri.HTML( '<p id="foo">Hello <b>World</b>!</p>' ).at('#foo')
    doc = Nokogiri::HTML::Builder.new do |d|
      d.body do
        d.div(id:'content') do
          # d.parent is the node currently being built (the div)
          d.parent << para.inner_html
        end
      end
    end
    puts doc.to_html
    #=> <body><div id="content">Hello <b>World</b>!</div></body>

  2. Is there any known true way of assigning nodes at the document building stage?

    Similar to the above, one way is:

    puts Nokogiri::HTML::Builder.new{ |d| d.body{ d.parent << para } }.to_html
    #=> <body><p id="foo">Hello <b>World</b>!</p></body>

    Voila! The node has moved from one document to the other.

  3. What's the profit of Nokogiri::Builder.with(...) method?

    That's rather unrelated to the rest of your question. As the documentation says:

    Create a builder with an existing root object. This is for use when you have an existing document that you would like to augment with builder methods. The builder context created will start with the given root node.

    I don't think it would be useful to you here.

In general, I find the Builder convenient when writing a large number of custom nodes from scratch with a known hierarchy. When you are not doing that, you may find it simpler to create a new document and add nodes with DOM methods as appropriate. It's hard to tell how much of your document's node hierarchy will be hard-coded and how much will be created procedurally.

One alternative suggestion: perhaps you could create a template XML document and then augment it with details from the other, scraped HTML.
