_ hpricot.com
| blog | demos | contact | hpricot alternatives

What is Hpricot?

Hpricot is no longer maintained, but it is historically significant.

I also see evidence that it is still used:

http://rubygems.org/search?query=hpricot

Hpricot is a Ruby library which connects your software to someone else's webpage by parsing its HTML.

For example, suppose you want to copy all the links from http://apartments.oodle.com into a spreadsheet.

How would you do this?

Hpricot is well suited for this type of task.
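Here is a rough sketch of that links-to-spreadsheet task. It is not Hpricot code: Hpricot is a Ruby library, and this sketch uses Python's standard html.parser and csv modules as a stand-in. The names LinkCollector and links_to_csv, and the sample page, are made up for illustration. A real run would first download the HTML from the site.

```python
import csv
import io
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from every <a href="..."> tag."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> we are currently inside
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def links_to_csv(html):
    """Return CSV text with a header row and one row per link."""
    collector = LinkCollector()
    collector.feed(html)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["href", "text"])
    writer.writerows(collector.links)
    return out.getvalue()


# Made-up sample page, standing in for a downloaded listing page.
SAMPLE_PAGE = """
<html><body>
  <a href="/listing/1">Sunny studio</a>
  <a href="/listing/2">Two-bedroom <b>loft</b></a>
  <a name="top">not a link</a>
</body></html>
"""

csv_text = links_to_csv(SAMPLE_PAGE)
print(csv_text)
```

The CSV text can be saved to a file and opened in any spreadsheet program.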

How would you design technology like Hpricot?

I would start by asking, "How do I perceive a webpage?"

I perceive it as a rectangular, two-dimensional, page-like object which has both text and images.

This is similar to how I understand a page in a magazine, newspaper, or book.

But suppose I want to read and understand the sentences on a page of text; how is that done?

Many years ago, when I learned how to read, my parents and teachers taught me to start in the upper left corner and slowly move my vision to the right.

I collect words in my mind from the page.

Then, after I collect four or five words, I compare the phrase to other phrases I have heard, spoken, or read.

Something magical then happens and I "understand" the phrase.

I add this understanding to my short-term memory.

Then I move to the next set of words on the page.

Eventually I collect enough phrases from the page into my mind. Perhaps at some point I then understand the page.

What's obvious about the above scenario is that my mind transforms the two-dimensional page into a one-dimensional stream of words, and then I apply my reading ability to understand the page.

Hpricot can be used in a similar way.

I can use Hpricot to read a webpage as a one-dimensional stream of HTML tokens and turn that stream into a collection of tagged tokens.
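To make the idea of a token stream concrete, here is a small sketch. It is not how Hpricot is implemented; it uses Python's standard html.parser as a stand-in, and the Tokenizer class and the (kind, value, attributes) token shape are my own assumptions for illustration.

```python
from html.parser import HTMLParser


class Tokenizer(HTMLParser):
    """Turns HTML text into a flat list of (kind, value, attrs) tokens."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag, {}))

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only runs between tags
            self.tokens.append(("text", text, {}))


def tokenize(html):
    """Read the HTML left to right and return its tagged tokens."""
    tokenizer = Tokenizer()
    tokenizer.feed(html)
    tokenizer.close()
    return tokenizer.tokens


tokens = tokenize('<p>Hello <a href="/about">world</a></p>')
for token in tokens:
    print(token)
```

Each token carries a tag ("start", "end", or "text"), so later code can ask only for the kinds it cares about.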

I, the Hpricot operator, can then search through the tagged tokens and ask for tokens which interest me.

For example, I could ask for a list of all the images displayed at http://www.artnet.com/
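A request like that might look roughly like the sketch below. Again this is Python's standard library standing in for Hpricot, not Hpricot itself. ImageFinder, find_images, the sample HTML, and the example.com base URL are all hypothetical; a real run would feed in the downloaded page and its own URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageFinder(HTMLParser):
    """Collects the absolute URL of every <img src="..."> it sees."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # ignore <img> tags that have no src attribute
                # Resolve relative paths against the page's own URL.
                self.image_urls.append(urljoin(self.base_url, src))


def find_images(html, base_url):
    finder = ImageFinder(base_url)
    finder.feed(html)
    return finder.image_urls


# Made-up HTML, standing in for a downloaded gallery page.
SAMPLE = (
    '<div><img src="/art/one.jpg" alt="One">'
    '<p>some text</p>'
    '<img src="two.png"><img alt="no source"></div>'
)

images = find_images(SAMPLE, "http://example.com/gallery/")
print(images)
```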

Once I have the interesting images, I could then pivot my use of Hpricot.

You see, Hpricot can be used as both a reader AND a writer of HTML.

For example, I could use Hpricot to build a slide-show type page which has the images as the slides.
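The writing half can be sketched in the same spirit. This is a minimal illustration in Python, not Hpricot's own HTML-building API; build_slideshow and the example.com image URLs are made up. It only produces the markup, one figure per image, and leaves the actual slide-show behavior (styling, next/previous buttons) out.

```python
from html import escape


def build_slideshow(title, image_urls):
    """Return a complete HTML page with one <figure> per image URL."""
    slides = "\n".join(
        f'  <figure class="slide">'
        f'<img src="{escape(url, quote=True)}" alt="Slide {n}">'
        f'</figure>'
        for n, url in enumerate(image_urls, start=1)
    )
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{escape(title)}</title></head>\n"
        f"<body>\n{slides}\n</body></html>\n"
    )


# Hypothetical image URLs, e.g. the result of an earlier image search.
page = build_slideshow(
    "My slides",
    ["http://example.com/a.jpg", "http://example.com/b.jpg?x=1&y=2"],
)
print(page)
```

Escaping the title and the URLs matters here because they came from someone else's webpage and may contain characters such as & or quotes.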

The use cases for Hpricot are numerous.

For example, Google uses similar technology to transform webpages into streams of tokens.

Does Google use Hpricot?

I assume that instead of Hpricot, Google uses proprietary software which is faster (but perhaps more difficult to use).

Google reads from the web, and then Google uses technology like Hpricot to write to the web.

For example, the word Tiger would be interesting on the page:

http://en.wikipedia.org/wiki/Big_cat

Google then builds an object, object-Tiger, which holds the URL http://en.wikipedia.org/wiki/Big_cat inside.

Next, Google goes looking for Tiger at other URLs using software like Hpricot.

If Tiger is found, the corresponding URL is added to object-Tiger using software like Hpricot.

Eventually Google knows all the URLs which correspond to Tiger.
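The object-Tiger idea is what is usually called an inverted index: a mapping from each word to the set of URLs where it appears. The sketch below is my own toy version in Python and says nothing about how Google actually does it. The pages, their example.com URLs, and the helper names are invented for illustration.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keeps only the text between tags and drops the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def words_in(html):
    """Return the set of lowercase words found in the page's text."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return set(re.findall(r"[a-z]+", " ".join(extractor.chunks).lower()))


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, html in pages.items():
        for word in words_in(html):
            index[word].add(url)
    return index


# Made-up pages keyed by made-up URLs; a crawler would supply real ones.
PAGES = {
    "http://example.com/big-cats": "<h1>Big cats</h1><p>The tiger and the lion.</p>",
    "http://example.com/zoo": "<p>Our zoo has a <b>Tiger</b> exhibit.</p>",
    "http://example.com/dogs": "<p>Dogs are not cats.</p>",
}

index = build_index(PAGES)
print(sorted(index["tiger"]))
```

Here index["tiger"] plays the role of object-Tiger: it grows by one URL each time the word is found on a new page.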

Then if I search for Tiger, Google will use software like Hpricot to write a webpage which has object-Tiger wrapped in some useful HTML.

Then Google will serve that page to my browser.
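As a toy version of that last step, the sketch below wraps the URLs for a search word in HTML and hands the page to a web server through Python's standard WSGI interface. Everything in it is an assumption made for illustration: the hand-written INDEX, the example.com URLs, the q query parameter, and the function names.

```python
from html import escape
from urllib.parse import parse_qs

# Hypothetical, hand-written index: word -> URLs where it was found.
INDEX = {
    "tiger": ["http://example.com/big-cats", "http://example.com/zoo"],
}


def render_results(query, urls):
    """Wrap the URLs for one query in a small HTML page."""
    items = "\n".join(
        f'<li><a href="{escape(url, quote=True)}">{escape(url)}</a></li>'
        for url in urls
    )
    return (
        f"<html><body><h1>Results for {escape(query)}</h1>\n"
        f"<ul>\n{items}\n</ul></body></html>"
    )


def app(environ, start_response):
    """WSGI application: looks up ?q=... in INDEX and serves the page."""
    query = parse_qs(environ.get("QUERY_STRING", "")).get("q", [""])[0]
    page = render_results(query, INDEX.get(query.lower(), []))
    body = page.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it in a browser, the app can be passed to the standard library's wsgiref.simple_server.make_server("", 8000, app) and started with serve_forever(); a request for /?q=Tiger then returns the results page.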

Can you think of a way to pull data from your favorite website(s), enhance the data somehow, and then serve it?

If yes, Hpricot is for you.
