About Hpricot

Hpricot:
http://github.com/hpricot
http://rdoc.info/projects/hpricot/hpricot

The Competition:
http://rdoc.info/projects/brynary/nokogiri
http://rdoc.info/projects/tenderlove/nokogiri
http://nokogiri.org/tutorials/parsing_an_html_xml_document.html

This site was built by Dan Bikle (http://bikle.com).
Demonstrations Of Hpricot

Simple Search
Sample Syntax:
    hpricot_object.search("a").to_html

Simple Remove
Sample Syntax:
    hpricot_object.search("a[@href*=maps]").remove
I use hpricot_object.to_html to see the effect. Removing comments takes a bit more work; see "Removing HTML comments" further down.

Stacked Search
Sample Syntax:
    hpricot_object.search("a").search("img").to_html

"at" Search
.at() is similar to Simple Search, but it returns only the first matching element.
Sample Syntax:
    hpricot_object.at("img[@src*=jpg]").to_html
.at() is useful if I need just 1 element. If I want more than 1, I use .search() instead.

If I need to "peel off" an outer tag, I combine .inner_html with .at().
Sample Syntax:
    hpricot_object.at("html/head").inner_html

If .search() gives me several elements and I want the first, I could use .first (I'd probably use .at() though).
Sample Syntax:
    hpricot_object.search("img[@src*=jpg]").first.to_html

Ruby is well suited for working with enumerable objects, and .search() returns an enumerable object.
Sample Syntax:
    hpricot_object.search("a").map{|e| "<hr />#{e.to_html}" }.sort.to_s
Here, I loop through the object and attach an <hr /> to the front of each element. Then I sort the resulting strings, which in effect sorts them by href.
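The sample above ends with .to_s. On Ruby 1.9 and later, Array#to_s returns the inspect form (brackets and quoted strings) rather than the concatenated HTML, so .join may be the safer call there. A minimal sketch, using made-up plain strings in place of the e.to_html output so that it runs without Hpricot:

```ruby
# Hypothetical stand-ins for what e.to_html would return for each <a> element.
links = [
  '<a href="http://nytimes.com">NYT</a>',
  '<a href="http://bikle.com">Bikle</a>',
  '<a href="http://google.com">Google</a>'
]

# Same shape as the sample: prefix each element with <hr />, then sort.
rows = links.map { |html| "<hr />#{html}" }.sort

# Array#join concatenates the strings into one HTML fragment.
joined = rows.join

# On Ruby 1.9+, Array#to_s gives the inspect form instead.
inspected = rows.to_s

puts joined
puts inspected
```

If the page is rendered from the result, joined is the string to send to the browser; inspected is only useful for debugging.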
I can burrow into an hpricot_object and prepend some HTML to the inside of each element matched by the .search() method.
Sample Syntax, which prepends an <hr /> to the inside of every <a> element:
    hpricot_object.search("a").prepend("<hr />").to_html

I can burrow into an hpricot_object and wrap some HTML around the outside of each element matched by the .search() method.
Sample Syntax, which wraps an <h1> around the outside of every <a> element:
    hpricot_object.search("a").wrap("<h1>")
An easy way to see the effect is to just run hpricot_object.to_html
Or, I could run the original search and then use .search("..") to go up a level:
    hpricot_object.search("a").search("..").to_html

If I want to replace an element with some HTML of my choice, I can use .at() combined with .swap().
Sample Syntax:
    hpricot_object.at("div/a").swap("<h1>hpricot.com</h1>")
An easy way to see the effect is to just run hpricot_object.to_html
If I know the parent of the new element, I can search for the parent:
    hpricot_object.search("div").to_html

I can use Simple Search to find element attributes. Perhaps I need a list of href attributes?
Sample Syntax:
    hpricot_object.search("a[@href*=nytimes.com]").map {|e| '<hr />' + e.get_attribute(attrname) }.sort.to_s
Here, attrname stands for the name of the attribute I want ("href" for a list of hrefs).
The above search is a bit loose. It will match if 'nytimes.com' appears anywhere in href. If I need an exact match, my call to search would look like this:
    .search("a[@href='http://www.google.com']")

Perhaps I want to visualize the enumerable returned by .search()?
Sample Syntax:
    i = -1; hpricot_object.search("a").map{|e| i+=1;"<hr />#{i}#{e}"}.to_s
Here, I use .map() to create an array of numbered HTML strings.

Perhaps I want to see just a slice of the enumerable returned by .search()?
Sample Syntax:
    i = -1; hpricot_object.search("a").map{|e| i+=1;"<hr />#{i}#{e}"}[5,11].to_s
Here, I get 11 elements from it, starting at index 5.
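The numbering and slicing in the last two samples are plain Ruby Array behavior, so they can be tried without Hpricot. A small sketch, assuming twenty made-up strings in place of the elements that .search("a") would return, and using each_with_index as one alternative to the external i = -1 counter:

```ruby
# Hypothetical placeholder strings standing in for the matched <a> elements.
elements = (0...20).map { |n| "<a href=\"/page#{n}\">page #{n}</a>" }

# each_with_index supplies the counter, so no separate i variable is needed.
numbered = elements.each_with_index.map { |e, i| "<hr />#{i}#{e}" }

# Array#[start, length]: begin at index 5 and take up to 11 items (indexes 5..15).
slice = numbered[5, 11]

puts slice.first
puts slice.length
```

Note that the second number in [5,11] is a length, not an end index; numbered[5..15] would select the same items using a range.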
Working with HTML comments
Sample Syntax:
    hpricot_object.search("body").search("*").map{|e| "<hr />#{e}" if e.comment?}.to_s
I cannot use a .search() selector to locate HTML comments directly. I can, however, use .search("*") to get a list of all the nodes. Then, I loop through the list and ask, "Is this a comment?" So, one way to display HTML comments is to use .map() to create an array of HTML strings from a stacked .search().

Removing HTML comments
Sample Syntax:
    hpricot_object.search("body").search("*").each{|e| (lst=e.parent.children;e.parent=nil;lst.delete(e)) if e.comment?}
Removing HTML comments is similar to displaying them. I remove HTML comments using a stacked search and a loop. I use hpricot_object.to_html to see the effect.

Searching For Text Nodes
Sample Syntax:
    hpricot_object.search("a[text()*='Washington']").to_html
Notice that this has the same form as Simple Search. I just need to know this format:
    [text()*='Washington']
If I'm looking for an EXACT match, I use this format:
    [text()='Washington']
This is similar to searching for an element by its attributes rather than its name. See the href searching example above.

Altering Text Nodes
Sample Syntax:
    hpricot_object.search("*").each {|e| e.content=e.content().gsub(Regexp.new('bikle.com'), 'bikle.com IS MY SITE!') if e.text? }
The "getter" method for a text node's content is: .content()
The "setter" method for a text node's content is: .content=
I use hpricot_object.to_html to see the effect.

Removing Attributes
Sometimes I want to remove a JavaScript onclick attribute from an <a> tag.
Sample Syntax:
    hpricot_object.search("a[@onclick]").remove_attr("onclick")
I use hpricot_object.to_html to see the effect.
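One detail in the text-altering sample is worth a closer look: Regexp.new('bikle.com') treats the dot as "any character", so it would also match a string such as "biklexcom". The sketch below shows the difference on a made-up sentence (no Hpricot needed); the replacement text 'SITE' is only a placeholder.

```ruby
# Made-up input containing both the real domain and a near miss.
text = 'Visit bikle.com or biklexcom today'

# An unescaped "." matches any character, so both spots are rewritten.
loose = text.gsub(Regexp.new('bikle.com'), 'SITE')

# Regexp.escape turns the dot into a literal dot.
strict = text.gsub(Regexp.new(Regexp.escape('bikle.com')), 'SITE')

# A String pattern passed to gsub is always matched literally.
literal = text.gsub('bikle.com', 'SITE')

puts loose
puts strict
puts literal
```

For a fixed piece of text like a domain name, passing the plain string to gsub is probably the simplest choice.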