Dave Pearson (beta tester)


Hack: Script to export an art list

Warning: This is a solution looking for a problem. It’s also aimed at people who have the ability to execute ruby scripts.

Edit: I’ve now used this tool to knock up a quick and dirty website as a simple test of the idea.

I’ve seen a few people ask for this, and I’ve been after a way to do this myself: a facility on RedBubble that lets you export a full list of your works (or, at the very least, your art) so that you can do something else with it elsewhere.

Today I got to thinking that it should be possible to scrape the data from my public art page. A bit of hacking with ruby later (using its “net/http” and “rexml/document” modules) and I had something that produced a very simple tab-delimited text file containing the work ID and the work title (given these two items it’s pretty easy to infer everything else).
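
As a rough illustration of the approach described above (not the actual script), the sketch below parses a page with “rexml/document” and emits tab-delimited ID/title lines. The sample markup, the `/art/<id>-<slug>` link shape, and the helper names `extract_works` and `to_tsv` are all assumptions made for the example; the real page structure isn’t shown in this post, and the “net/http” download step is left out so the sketch runs offline.

```ruby
require "rexml/document"

# Hypothetical stand-in for a downloaded art page. The real RedBubble
# markup is not shown in the post, so this structure is an assumption.
SAMPLE_PAGE = <<XHTML
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <a href="/people/someone/art/1234-sunset">Sunset</a>
    <a href="/people/someone/art/5678-red-door">Red Door</a>
    <a href="/people/someone">Profile</a>
  </body>
</html>
XHTML

# Return [id, title] pairs for every link that looks like a work link.
def extract_works(xhtml)
  doc = REXML::Document.new(xhtml)
  works = []
  # Walk every element below the root and pick out <a> tags by name.
  doc.root.each_recursive do |el|
    next unless el.name == "a"
    href = el.attributes["href"].to_s
    match = href.match(%r{/art/(\d+)})
    works << [match[1], el.text.to_s.strip] if match
  end
  works
end

# One work per line: the ID, a tab, then the title.
def to_tsv(works)
  works.map { |id, title| "#{id}\t#{title}" }.join("\n")
end

puts to_tsv(extract_works(SAMPLE_PAGE))
```

Matching tags by name while walking the tree is a deliberate simplification here; it avoids relying on how XPath treats the page’s XML namespace.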

Like I say: this is a bit of a solution looking for a problem right now, but a quick test with Google Docs shows that the file imports nicely as a spreadsheet.

I’m also thinking that such data could be handy as the starting point for writing a tool that generates some sort of front-end (on one of my own sites) into my works on RedBubble.
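
To give a feel for that front-end idea, here is a minimal sketch that turns the tab-delimited output into an HTML list of links. The URL pattern in `WORK_URL` is purely a guess for illustration (the post doesn’t say how work URLs are built), and `works_to_html` and `h` are made-up helper names.

```ruby
# Hypothetical URL pattern; the post does not say how work URLs are built.
WORK_URL = "http://www.redbubble.com/people/%s/art/%s"

HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

# Escape the characters that are unsafe inside HTML text and attributes.
def h(text)
  text.gsub(/[&<>"]/, HTML_ESCAPES)
end

# Turn "id<TAB>title" lines into an HTML list of links, skipping any
# line that does not contain both fields.
def works_to_html(tsv, user)
  items = tsv.each_line.map do |line|
    id, title = line.chomp.split("\t", 2)
    next if id.nil? || title.nil?
    url = format(WORK_URL, user, id)
    %(<li><a href="#{h(url)}">#{h(title)}</a></li>)
  end
  "<ul>\n#{items.compact.join("\n")}\n</ul>"
end

puts works_to_html("1234\tSunset\n5678\tFish & Chips\n", "someone")
```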

If you think this sounds interesting, and if you’re able to run ruby scripts, pop over here and grab a copy.

  • littleredplanet

    Hey Dave, pretty nifty idea. What format was your art page – html, php generated, etc??

  • Dave Pearson (beta tester) replied

    Not sure I quite follow the question. Which “art page” are you asking about?

  • littleredplanet

    Ah, I see what you mean now. I just had a look at your link to the script. I like this idea. I often wonder what I have uploaded here and what I have left out. It’d be nice to have a convenient list to keep track of things.

    I downloaded Ruby onto my Slackware machine and tried your script. I’m a Ruby noob and may not be doing something correctly. Here’s what was returned when I ran it:
    dale[MyFiles]$ red_bubble_art -h www.redbubble.com -u littleredplanet
    ./red_bubble_art:46:in `get_page_count': undefined method `text' for nil:NilClass (NoMethodError)
    from ./red_bubble_art:151
    from /usr/lib/ruby/1.8/net/http.rb:547:in `start'
    from /usr/lib/ruby/1.8/net/http.rb:440:in `start'
    from ./red_bubble_art:148

    I changed the Ruby path to /usr/bin/ruby to see if that helped. No luck. Any ideas?

    cheers
    dale

  • Dave Pearson (beta tester) replied

    Interesting one. In true hacking fashion it WorksForMe™:

    davep@hagbard:~$ getrbart -u littleredplanet | wc -l
    124

    Looking at the line it’s failing on, and the error message, I’m wondering if there’s a difference in the version of REXML or something (although I’m running ruby 1.8 here too).

    I’ll do a little more playing and see if I can come up with a way of having it fail for me in the same way it fails for you.

  • Dave Pearson (beta tester) replied

    Actually, scratch that a little. I didn’t see the obvious. For whatever reason it appears that it isn’t finding the ‘a’ tag in the downloaded page (this is where it gets the max page number from), hence the fact that it’s reporting that `nil’ doesn’t have that method (`text’, which it won’t).

    I should probably add more error checking (there’s almost none) but, even then, I can’t imagine why it would fail like that. I know I’m not the only person to have run the script without any problems.
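
The kind of error checking mentioned above might look something like the sketch below. It is a hypothetical reconstruction, not the script’s real `get_page_count`: the “pagination” class, the page markup, and the `PageCountError` name are assumptions. The point is only that a missing ‘a’ tag produces a readable error instead of a NoMethodError on nil.

```ruby
require "rexml/document"

# Raised when the page lacks the pagination link we expect.
class PageCountError < StandardError; end

# Hypothetical reconstruction: the last <a> inside a (made-up)
# "pagination" element is assumed to carry the highest page number.
def get_page_count(xhtml)
  root = REXML::Document.new(xhtml).root
  raise PageCountError, "no document returned" if root.nil?

  last = nil
  root.each_recursive do |el|
    last = el if el.name == "a" && el.parent.attributes["class"] == "pagination"
  end

  # Without this guard, calling last.text on nil raises NoMethodError.
  raise PageCountError, "no pagination link found (check the user name)" if last.nil?

  Integer(last.text.to_s.strip)
end
```

A caller could rescue `PageCountError` and print its message rather than a stack trace.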

  • Dave Pearson (beta tester) replied

    dale[MyFiles]$ red_bubble_art -h www.redbubble.com -u littleredplanet

    Silly question time: Is that exactly what you executed? Is that a copy and paste? The reason I ask is that I can get the same error message if, for example, I incorrectly spell your user name:

    davep@hagbard:~$ getrbart -u litleredplanet -v
    Getting art page count…
    getrbart:48:in `get_page_count': undefined method `text' for nil:NilClass (NoMethodError)

  • Dave Pearson (beta tester) replied

    Okay, I’ve uploaded a new version of the script. While this probably won’t solve your problem it should be a little more user-friendly in how it reports problems. Also, verbose mode now emits a little more information as to what’s happening.

    Can you download it and try the following:

    getrbart -u littleredplanet -v > works.txt

    and, if you get a problem, report exactly what you ran and exactly what it says as a result?

    Thanks.

  • PigleT

    I’m wondering if it failed to retrieve any text around line 44-45. Maybe a check for last.length is in order?

  • Dave Pearson (beta tester) replied

    Indeed (see the lack of error checking mentioned above). The interesting thing here, however, is why it isn’t pulling down any HTML (or the right sort of HTML).

  • littleredplanet

    Dave, that was a copy and paste. I’ve had a few attempts and am sure it’s not a misspelling.

  • Dave Pearson (beta tester) replied

    Ahh. Okay, so it can’t be the obvious then.

    In that case, on the surface, it looks like your box isn’t able to correctly see RedBubble or, at the very least, the script isn’t managing to get a proper page of art from RedBubble.

  • Dave Pearson (beta tester)

    As a quick test: do you have lynx? If so, can you try this:

    lynx -dump http://www.redbubble.com/people/littleredplanet/art

    and let me know what happens?

  • littleredplanet

    bummer! I downloaded the revised version.
    Here’s a copy and paste of the result:

    dale[MyFiles]$ getrbart -u littleredplanet -v > works.txt
    Getting art page count…
    ./getrbart:46:in `get_page_count': undefined method `text' for nil:NilClass (NoMethodError)
    from ./getrbart:151
    from /usr/lib/ruby/1.8/net/http.rb:547:in `start'
    from /usr/lib/ruby/1.8/net/http.rb:440:in `start'
    from ./getrbart:148

  • Dave Pearson (beta tester) replied

    Are you sure that you’re running the revised version? That error can’t be on line 46 any more and, besides, you’re missing some of the extra verbose output. Can you check the version please, like this:

    davep@hagbard:~$ getrbart --help
    getrbart v1.4
    Copyright© 2008 by Dave Pearson
    http://www.davep.org/
    ...

    It should be v1.4.

    And, no, this doesn’t need rails. It should work with an “out of the box” ruby 1.8.

  • Dave Pearson (beta tester) replied

    For what it’s worth:

    I’ve got it running just fine on three different boxen. Ruby versions are:
    ruby 1.8.4 (2005-12-24) [i586-linux]
    ruby 1.8.5 (2006-08-25) [i486-linux]
    ruby 1.8.6 (2007-12-03 patchlevel 113) [i686-linux]

    On the other hand: PigleT has a box to hand that displays the same problem.

  • Dave Pearson (beta tester) replied

    Okay, there’s a new version of the script up (v1.5; if you’re not seeing that, shift-reload it in your browser first to make sure you’ve got the right version).

    This version includes a nasty but crafty kludge that should make it work on your box. Thanks go to PigleT for identifying the problem (he was able to reproduce it on one of his boxes) and coming up with the kludge.

    Give it a go and let us know how you get on.

  • littleredplanet

    Dave,
    Lynx gives me the file list.

  • littleredplanet

    Should I have rails installed too?

  • PigleT

    I have the same error, for my valid username, on an ubuntu gutsy box that I believe has no rails installed.

    Can I suggest you run through these lines by hand in irb?

    zsh, sauce 1:21PM ruby/ % irb
    irb(main):001:0> require "net/http"
    => false
    irb(main):002:0> require "rexml/document"
    => true
    irb(main):003:0> require "getoptlong"
    => true
    irb(main):004:0>

    The fact that the first is `false’ here worries me slightly. What, no net/http on my box?? No wonder it fails…

  • Dave Pearson (beta tester) replied

    davep@hagbard:~$ irb
    irb(main):001:0> require "net/http"
    => true
    irb(main):002:0> require "net/http"
    => false
    irb(main):003:0>

    Which should suggest that your IRB has net/http required by default.
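
The same behaviour can be checked outside irb with a small sketch like this one. The choice of “shellwords” is arbitrary (any standard library file would do): `require` returns true when it actually loads a file and false when that file has already been loaded, so a false result doesn’t mean the library is missing (a missing library raises LoadError instead).

```ruby
# The first call returns true unless something has already loaded the file.
first = require "shellwords"

# A repeated require of the same file is a no-op and returns false.
second = require "shellwords"

puts "first require:  #{first.inspect}"
puts "second require: #{second.inspect}"
```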

  • PigleT

    For the record, the problem here seemed to be a mismatch of XML namespaces – the webpage document is in the well-known XHTML namespace, and XPath.match() allows specifying namespaces to search, but REXML’s to_a method doesn’t actually avail itself of the possibility – so it was loading the page into one xmlns but only searching the default namespace. Very silly ;)
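
For anyone curious about what searching with an explicit namespace looks like, here is a minimal sketch. It is not necessarily the kludge that went into v1.5; the sample page, the `x` prefix, and the `xhtml_links` helper are assumptions for illustration. It passes a prefix-to-namespace hash as the third argument to `REXML::XPath.match`.

```ruby
require "rexml/document"

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical page fragment in the XHTML namespace.
PAGE = <<XHTML
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <a href="?page=1">1</a>
    <a href="?page=2">2</a>
  </body>
</html>
XHTML

# Find every <a> element in the XHTML namespace by binding a prefix of
# our own choosing ("x") to that namespace and using it in the path.
def xhtml_links(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//x:a", { "x" => XHTML_NS })
end

puts xhtml_links(PAGE).map(&:text).join(", ")
```

How an unprefixed path such as `//a` behaves against a namespaced document has varied between REXML versions, which fits the “works on some boxes, not on others” symptoms above; binding a prefix explicitly avoids depending on that.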

  • littleredplanet

    v1.5 worked like a charm!

  • Dave Pearson (beta tester) replied

    That’s good to hear.

    Thanks for reporting this and sticking with it. Much appreciated.

  • Xavier Shay (works here)

    works for me

    ruby 1.8.6 (2007-09-23 patchlevel 110) [i686-darwin8.11.1]

    Maybe you’ve already heard of it, but I personally find Hpricot easier to work with for screen scraping than rexml.

  • Dave Pearson (beta tester) replied

    Cheers Xacier.

    No, I hadn’t seen Hpricot before. That looks very handy and, by the looks of it, results in nice terse code too. Thanks for the pointer, that’s one to go play with.

  • Dave Pearson (beta tester) replied

    Xacier? Yikes! Sorry about the typo on your name Xavier (of all things!).

    Has anyone ever mentioned that comments need an edit option? ;)

  • owlspook

    if I can find the time I’m game for a try at it .. running Linux Mint (debian/ubuntu family) and can get ruby .. my programming experience is limited to early MUDs way back in the dark ages lol … and oh my html/CSS (grin) but never an expert … regardless I found this little place Try Ruby!

    also was hoping W3Schools had Ruby but alas no .. still a good resource if someone hasn’t seen it …

    will try and keep an eye on the development of your script … with time crunches abounding and so many projects ah never enough time .. your script interests me since I’m always looking for a faster way to get things done lol … keep up the good work! (big smile)

  • Dave Pearson (beta tester) replied

    Note that you don’t need to know ruby to run this—not that knowing ruby is ever a bad thing.

  • owlspook

    well sure ya don’t but I always find myself curious about how to do it (big grin) ... the time I spoke of was downloading, installing and making sure it works .. then actually using it (grin) ...

    the tryruby site is cool … I ended up working through the tutorial and it actually was fun (big smile) ... I bookmarked the links and will probably go back and play more … thank you for pointing a direction (big smile)
