Ruby Regex Help

I want to extract the members' home site links from a site. A link looks like this:

<a href="http://www.ptop.se" target="_blank">

I tested it on this site:

http://www.rubular.com/

with this regex:

<a href="(.*?)" target="_blank">

It should output http://www.ptop.se.

Here is the code:

    require 'open-uri'
    url = "http://itproffs.se/forumv2/showprofile.aspx?memid=2683"
    open(url) { |page| content = page.read()
    links = content.scan(/<a href="(.*?)" target="_blank">/)
    links.each {|link| puts #{link} 
    }
    }

If you run this, it doesn't work. Why not?

Answers


I would suggest that you use one of the good Ruby HTML/XML parsing libraries, e.g. Hpricot or Nokogiri.

If you need to log in to the site, you might be interested in a library like WWW::Mechanize.

Code example:

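    require "open-uri"
    require "hpricot"   # unmaintained; drop this line if you only use Nokogiri
    require "nokogiri"

    url = "http://itproffs.se/forumv2"

    # Using Hpricot
    # URI.open is the open-uri entry point; plain open(url) no longer
    # opens URLs on Ruby 3.0 and later.
    doc = Hpricot(URI.open(url))
    doc.search("//a[@target='_blank']").each { |user| puts "found #{user.inner_html}" }

    # Using Nokogiri (user["href"] would give the link target instead of the text)
    doc = Nokogiri::HTML(URI.open(url))
    doc.xpath("//a[@target='_blank']").each { |user| puts "found #{user.text}" }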
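
One reason a parser is often preferred: the regex from the question depends on the exact attribute order and quoting. The following is a minimal stdlib-only sketch of that brittleness. The markup variants are made up for illustration (only the first form appears in the question), and regex_finds_link? is a hypothetical helper name.

```ruby
# The regex from the question.
QUESTION_REGEX = /<a href="(.*?)" target="_blank">/

# Made-up variants of the same link; only the first form is from the question.
VARIANTS = [
  '<a href="http://www.ptop.se" target="_blank">',  # form from the question
  '<a target="_blank" href="http://www.ptop.se">',  # attributes swapped
  "<a href='http://www.ptop.se' target='_blank'>"   # single quotes
]

# Hypothetical helper: true when the regex finds at least one link.
def regex_finds_link?(markup)
  !markup.scan(QUESTION_REGEX).empty?
end

VARIANTS.each do |markup|
  status = regex_finds_link?(markup) ? "matched" : "missed"
  puts "#{status}: #{markup}"
end
```

Only the first variant matches. The XPath query //a[@target='_blank'] does not care about attribute order or quote style, which is why it tends to hold up better on real pages.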

Several issues with your code:

  1. It's not clear what puts #{link} is meant to do. Outside a string, # starts a comment, so Ruby runs a bare puts and prints an empty line. If you meant string interpolation, wrap it in double quotes, i.e. puts "#{link}", or simply write puts link.
  2. String#scan accepts a block. Use it to loop through the matches.
  3. The page you are trying to access does not return any links that the regex would match anyway.
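
The first two points can be checked offline with a minimal sketch. The sample HTML below is made up around the link from the question (it is not the real page), and extract_links is a hypothetical helper name.

```ruby
# Made-up sample input; the real page may be structured differently.
SAMPLE_HTML = <<~HTML
  <a href="http://www.ptop.se" target="_blank">Home page</a>
  <a href="/forumv2/default.aspx">Forum</a>
HTML

LINK_PATTERN = /<a href="(.*?)" target="_blank">/

# Hypothetical helper: collects the captured href of every match.
def extract_links(html)
  links = []
  # With one capture group, scan yields a one-element array per match,
  # so the block parameter |(href)| destructures it.
  html.scan(LINK_PATTERN) { |(href)| links << href }
  links
end

extract_links(SAMPLE_HTML).each do |link|
  # Inside double quotes, #{link} is interpolation. Outside a string,
  # everything after # is a comment, so a bare puts would run instead.
  puts "found #{link}"
end
```

On this sample, the helper returns only the link that carries target="_blank"; the second anchor is skipped because the pattern requires that attribute right after href.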

Here's something that would work:

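    require 'open-uri'
    url = "http://itproffs.se/forumv2/"
    # URI.open comes from open-uri; plain open(url) no longer opens URLs on Ruby 3.0+
    URI.open(url) do |page|
      content = page.read
      # With a capture group, scan yields an array of captures for each match
      content.scan(/<a href="(.*?)" target="_blank">/) do |match|
        match.each { |link| puts link }
      end
    end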

There are better ways to do it, I am sure, but this should work.

Hope it helps

