Mailing list archives

From Markus Thielen
Subject Parsing open graph tags with nutch
Date Wed, 21 Dec 2016 08:00:41 GMT
I have a running Nutch 2.3.1/HBase installation that parses and indexes web pages just fine. Now
I need to parse Open Graph tags (namely og:image and og:description). From several fragments
found on the web I learned that Tika basically supports parsing Open Graph tags, but I am
lost trying to figure out how to integrate this into Nutch.

Can someone point me in the right direction? Maybe with an example?

