Wednesday, September 21, 2011

 

Image sizing and scraping with jQuery and Rails


I know, I know, long time no type. Not my fault, I swear! First I went travelling, then I had a buncha work to do, then I went to San Francisco TechCrunch Disrupt, all of which occupied my time. Well, I suppose, on reflection, all those things were my fault, but, um ... look, I'm back now, OK?

Back with a li'l discussion of image sizing and scraping with Rails and jQuery, for a project I can't really talk about yet. Suffice to say that sometimes I want to calculate the size of various remote images - quite a lot of them, in fact, which makes performance important - and sometimes I want to scrape all the images from a set of web pages.

I thought the first task was going to be tricky. And maybe it is. But fortunately, someone has solved it for me, via the totally awesome FastImage Ruby gem, for which praise should be heaped upon one sdsykes. It works exactly as advertised, which is to say, like this:


  # Returns [width, height] for the image at +url+, courtesy of FastImage.
  def self.picture_info_for(url)
    return nil if url.blank?
    begin
      FastImage.size(url)  # => [width, height]
    rescue => e
      logger.info "Error getting info for picture at #{url}: #{e.message}"
      [0, 0]  # this makes sense for my app, but maybe not yours
    end
  end
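
In case it's useful, here's roughly how it gets called - the class name and URLs below are made up, but the shape is right. FastImage only reads as much of each file as it needs to work out the dimensions, which is why it stays quick even across a big pile of remote images:

  # Hypothetical caller: Picture stands in for whatever class holds the
  # method above, and the URLs are purely illustrative.
  width, height = Picture.picture_info_for("http://example.com/logo.png")

  # With lots of images, a thread per URL stops the (small) header fetches
  # from running strictly one after another.
  urls  = ["http://example.com/a.jpg", "http://example.com/b.png"]
  sizes = urls.map { |u| Thread.new { Picture.picture_info_for(u) } }.map(&:value)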


So, hurrah! This meant the scraping bit was actually the trickier one. Sure, I could have done it all in Rails, but it's user-facing, and I didn't want the user to have to wait for a bunch of potentially sequential HTTP requests to complete without seeing any results. I could have done it all on the client side, but parsing HTML with JavaScript, even with jQuery, sounded painful and fraught with difficulties compared to using the dead-easy Hpricot gem. So I came up with a compromise I quite like:

1. On the client side (written in HAML, which I mostly adore):

-# One link plus an empty placeholder div per page; the Ajax calls below
-# fill each div with that page's scraped images as the responses come back.
- @image_urls.each_with_index do |url, idx|
  = link_to url, url
  %div{:id => "page_#{idx}"}

%script
  $(function() {
  - @image_urls.each_with_index do |url, idx|
    $.ajax({
    url: '/stories/scrape_images?url=#{CGI::escape(url)}',
    success: function(msg){ $('#page_#{idx}').html(msg); },
    error: function(msg){ $('#page_#{idx}').html(msg); }
    });
  });
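
If the HAML-inside-%script business looks odd, this is roughly what reaches the browser for a single URL (the URL is made up, and I've shown only one iteration of each loop):

  <a href="http://example.com/page">http://example.com/page</a>
  <div id="page_0"></div>

  <script>
    $(function() {
      $.ajax({
        url: '/stories/scrape_images?url=http%3A%2F%2Fexample.com%2Fpage',
        success: function(msg){ $('#page_0').html(msg); },
        error: function(msg){ $('#page_0').html(msg); }
      });
    });
  </script>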

2. On the server side, first the action that builds that page, and then the one that answers its Ajax calls:
  def popup_scraped_images
    # Gather the page URLs the view above iterates over.
    @seed = Seed.find(params[:seed_id])
    @image_urls = @seed.active_signals.map(&:main_url).reject(&:blank?)
  end

  def scrape_images
    url = params[:url]
    # Grab everything up to the first lone "/" - e.g. "http://example.com" -
    # so root-relative image paths can be made absolute.
    slash = url =~ /[^\/]\/[^\/]/
    host = slash.nil? ? "" : url[0, slash + 1]
    html = ''

    page = HTTParty.get(url, :timeout => 5)
    Hpricot(page).search("//img").each do |element|
      img_src = element.attributes["src"]
      next if img_src.blank?
      img_src = host + img_src if img_src.match(/^\//)
      html << '<img src="' + img_src + '" />' if img_src.match(/^http/)
    end
    render :text => html  # just the <img> fragment, for the Ajax caller to drop into its div
  end
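
For completeness, the routes wiring this up look something like the snippet below. The controller name is my inference from the '/stories/scrape_images' path used in the Ajax call, so adjust to whatever your app actually calls it:

  # config/routes.rb (Rails 3 style) - names inferred from the paths above
  resources :stories do
    collection do
      get :popup_scraped_images
      get :scrape_images
    end
  end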


Hopefully how the pieces interact is self-explanatory. Et voilà - semi-asynchronous Rails/jQuery image scraping, handled on the server side for easy caching if need be later on.
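
On that caching point: nothing is cached yet, but because each page's scrape happens in its own server-side action, it would be easy to drop in something like the sketch below later. The image_tags_for helper is hypothetical - it's just the Hpricot loop from scrape_images pulled out into its own method:

  def scrape_images
    url  = params[:url]
    # Cache the scraped <img> fragment per page URL, so repeat popups are cheap.
    html = Rails.cache.fetch("scraped_images/#{url}", :expires_in => 1.hour) do
      image_tags_for(url)  # hypothetical helper wrapping the Hpricot loop above
    end
    render :text => html
  end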
