Mailing list archives

From vickyk <>
Subject Dynamic Crawling, URL with query parameters.
Date Wed, 04 Jan 2017 17:20:20 GMT
Hey Guys,

I am crawling a URL that contains a few query parameters, e.g.

I got the crawl working in Nutch for various combinations of the query
parameters by simply injecting a new URL each time a parameter value
changes. There are many possible combinations of the query parameters, so
the number of injected URLs could explode.
Is there a way to avoid entering a separate URL for each combination of
query parameters? Is something like this available out of the box?
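
To illustrate the scale concern, here is a minimal sketch (plain Python, not a
Nutch feature) of how one seed URL per combination multiplies. The base URL and
the parameter names and values are made up for illustration.

```python
from itertools import product
from math import prod
from urllib.parse import urlencode


def seed_urls(base_url, params):
    """Yield one URL for every combination of query parameter values."""
    names = list(params)
    for values in product(*(params[name] for name in names)):
        yield base_url + "?" + urlencode(list(zip(names, values)))


def combination_count(params):
    """Number of seed URLs: the product of the value counts per parameter."""
    return prod(len(values) for values in params.values())


# Hypothetical base URL and parameters, for illustration only.
BASE = "http://example.com/search"
PARAMS = {
    "category": ["books", "music"],
    "page": [1, 2, 3],
}

urls = list(seed_urls(BASE, PARAMS))
for url in urls:
    print(url)
print(combination_count(PARAMS), "seed URLs")
```

Two parameters with 2 and 3 values already give 6 seed URLs, and every new
parameter multiplies that count again. The generated lines could be written one
per line into a seed file for injection, but that only automates the explosion
rather than avoiding it.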

It would be great if anyone who has had a similar use case could share their
experience in handling such a scenario. I am particularly concerned about
scale, as we anticipate that the number of query parameters will increase
over time.


Sent from the Nutch - User mailing list archive at
