Mailing list archives

From Sebastian Nagel <wastl.na...@googlemail.com>
Subject Re: Need help on getting HTML content
Date Fri, 16 Dec 2016 15:57:34 GMT
Hi,

the only way is to transform the DOM subtree below the <math> element
back to HTML and then save this HTML string in parse metadata and write
it via an indexing filter as an extra field to the index.

See, e.g., o.a.n.util.DomUtil.saveDom(OutputStream, Element)
for how to "serialize" a DOM subtree.
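
As a rough illustration of the serialization step, here is a minimal sketch that
uses only the standard JDK XML APIs (javax.xml.transform), not Nutch's DomUtil.
The class and method names (MathSubtreeSerializer, findFirst, serialize) are
made up for this example. The tag name is matched case-insensitively on the
assumption that the HTML parser in use may upper-case element names. The
resulting string is what would then be stored in parse metadata and passed on
by an indexing filter, as described above.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of Nutch.
public class MathSubtreeSerializer {

  /** Depth-first search for the first element with the given tag name, ignoring case. */
  public static Element findFirst(Node root, String tagName) {
    if (root.getNodeType() == Node.ELEMENT_NODE
        && tagName.equalsIgnoreCase(root.getNodeName())) {
      return (Element) root;
    }
    NodeList children = root.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Element hit = findFirst(children.item(i), tagName);
      if (hit != null) {
        return hit;
      }
    }
    return null;
  }

  /** Serializes the node and everything below it to a markup string. */
  public static String serialize(Node node) throws Exception {
    Transformer t = TransformerFactory.newInstance().newTransformer();
    t.setOutputProperty(OutputKeys.METHOD, "html");
    t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    t.setOutputProperty(OutputKeys.INDENT, "no");
    StringWriter out = new StringWriter();
    // A DOMSource may wrap any node, so only the subtree is written.
    t.transform(new DOMSource(node), new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    // Small well-formed sample page, parsed here with the JDK's XML parser
    // purely to obtain a DOM to work on.
    String page = "<html><body><p>Sum:</p>"
        + "<math><mi>x</mi><mo>+</mo><mn>1</mn></math></body></html>";
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new ByteArrayInputStream(page.getBytes(StandardCharsets.UTF_8)));

    Element math = findFirst(doc, "math");
    if (math != null) {
      System.out.println(serialize(math));
    }
  }
}
```

Inside a Nutch parse filter the DOM would come from the HTML parser rather than
from DocumentBuilder, and DomUtil.saveDom could take the place of serialize().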

Best,
Sebastian

On 12/16/2016 07:27 AM, AshokRaj.Lourdusamy@cognizant.com wrote:
> Hi,
> 
> 
> For a particular tag (<math>), I need to save the entire HTML of the tag.
> 
> At the moment I can save only the text content, which is extracted by getText() as called in HTMLParser.java.
> 
> But there is no way to store the HTML content.
> 
> 
> Please share your thoughts on this.
> 
> [math tag.png]
> 
> 
> Thanks in advance,
> 
> -Ashok.

