Mailing list archives

From Michael Coffey <mcof...@yahoo.com.INVALID>
Subject Re: nutch 1.12 and Solr 5.4.1
Date Thu, 22 Dec 2016 19:44:33 GMT
Thank you very much for replying. I know it's holiday season and you probably have a million
things to do!
OMG, it is working now that I am using the version of SolrUtils you pointed to. I had previously
focused on a version where it uses SystemDefaultHttpClient but not as a static. It seems that
making it static made a critical difference. So this is awesome.
For the record, I would say I am using solrj 5.4.1, based on the presence of the following
files in my Nutch directories.
./apache-nutch-1.12/runtime/local/plugins/indexer-solr/solr-solrj-5.4.1.jar
./apache-nutch-1.12/build/plugins/indexer-solr/solr-solrj-5.4.1.jar

For httpclient, I have a lot of jars within the Nutch 1.12 directories.
./apache-nutch-1.12/runtime/local/lib/httpclient-4.3.5.jar
./apache-nutch-1.12/runtime/local/lib/commons-httpclient-3.1.jar
./apache-nutch-1.12/runtime/local/plugins/protocol-httpclient/protocol-httpclient.jar
./apache-nutch-1.12/runtime/local/plugins/indexer-solr/httpclient-4.4.1.jar
./apache-nutch-1.12/runtime/local/plugins/lib-htmlunit/httpclient-4.3.4.jar
./apache-nutch-1.12/runtime/local/plugins/lib-selenium/httpclient-4.5.1.jar
./apache-nutch-1.12/runtime/local/plugins/indexer-cloudsearch/httpclient-4.3.6.jar
./apache-nutch-1.12/build/protocol-httpclient/protocol-httpclient.jar
./apache-nutch-1.12/build/lib/httpclient-4.3.5.jar
./apache-nutch-1.12/build/lib/commons-httpclient-3.1.jar
./apache-nutch-1.12/build/plugins/protocol-httpclient/protocol-httpclient.jar
./apache-nutch-1.12/build/plugins/indexer-solr/httpclient-4.4.1.jar
./apache-nutch-1.12/build/plugins/lib-htmlunit/httpclient-4.3.4.jar
./apache-nutch-1.12/build/plugins/lib-selenium/httpclient-4.5.1.jar
./apache-nutch-1.12/build/plugins/indexer-cloudsearch/httpclient-4.3.6.jar
The hadoop directory has the following httpclient-related jars:
/posix/hadoop-2.7.2/share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/httpclient-4.2.5.jar
/posix/hadoop-2.7.2/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/httpclient-4.2.5.jar
/posix/hadoop-2.7.2/share/hadoop/tools/lib/httpclient-4.2.5.jar
/posix/hadoop-2.7.2/share/hadoop/tools/lib/commons-httpclient-3.1.jar
/posix/hadoop-2.7.2/share/hadoop/common/lib/httpclient-4.2.5.jar
/posix/hadoop-2.7.2/share/hadoop/common/lib/commons-httpclient-3.1.jar

Over on the Solr5 machine, we have
./solr-5.4.1/dist/solrj-lib/httpclient-4.4.1.jar
./solr-5.4.1/server/solr-webapp/webapp/WEB-INF/lib/httpclient-4.4.1.jar
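
A listing like the one above can be summarized by version to make a mismatch easier to spot. The following is a minimal sketch, not part of Nutch or Hadoop: the function name `jar_versions` and the file-name pattern are assumptions made for illustration, and it only looks at jar file names, not at what the jars contain.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches names such as "httpclient-4.3.5.jar" -> ("httpclient", "4.3.5").
# Jars without a version suffix (e.g. "protocol-httpclient.jar") do not match.
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d+(?:\.\d+)*)\.jar$")


def jar_versions(roots, artifact="httpclient"):
    """Map each version of `artifact` to the sorted jar paths found under `roots`.

    Hypothetical helper: `roots` is any iterable of directories to search
    recursively, e.g. the Nutch runtime directory and the Hadoop share directory.
    """
    found = defaultdict(list)
    for root in roots:
        for path in Path(root).rglob("*.jar"):
            match = JAR_NAME.match(path.name)
            # Exact artifact comparison keeps "commons-httpclient" separate
            # from "httpclient".
            if match and match.group("artifact") == artifact:
                found[match.group("version")].append(str(path))
    return {version: sorted(paths) for version, paths in found.items()}
```

Calling `jar_versions(["./apache-nutch-1.12/runtime/local", "/posix/hadoop-2.7.2/share/hadoop"])` would return one entry per distinct httpclient version; more than one key means several versions are present in the searched directories, although it does not say which one a running job actually loads.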

thanks again
      From: Furkan KAMACI <furkankamaci@gmail.com>
 To: Michael Coffey <mcoffey@yahoo.com> 
Cc: "user@nutch.apache.org" <user@nutch.apache.org>
 Sent: Thursday, December 22, 2016 10:29 AM
 Subject: Re: nutch 1.12 and Solr 5.4.1
   
Hi Michael,

The dependencies you sent are from the Ivy cache. I need to know the versions
of Solr and HTTP Client. Your problem is probably a jar mismatch between
Hadoop and Solr. Nutch 1.12 should work with Solr 5.4.1, as you can check
from here:
https://github.com/apache/nutch/blob/release-1.12/src/plugin/indexer-solr/ivy.xml

So there may be a bug in Nutch. Here is a workaround from the issue you mentioned:
https://issues.apache.org/jira/browse/NUTCH-2267 Could you apply it to
SolrUtils.java (
https://github.com/sjwoodard/nutch/blob/master/src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter/solr/SolrUtils.java)
and check again? If you still get that error, I can try to fix it.
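
One way to investigate a suspected jar mismatch is to look inside the jars and see which ones bundle the classes named in the error. The sketch below is an illustration only: `jars_containing` is a hypothetical helper, and it assumes the jars are ordinary zip archives that can be read from the local filesystem. As far as I know, `CloseableHttpClient` was introduced in HttpClient 4.3, so an older jar would be expected to contain `DefaultHttpClient` but not `CloseableHttpClient`; treat that as an assumption to verify against your own jars.

```python
import zipfile
from pathlib import Path


def jars_containing(roots, class_name):
    """Return the sorted paths of jars under `roots` that bundle `class_name`.

    `class_name` is a fully qualified Java class name, for example
    "org.apache.http.impl.client.CloseableHttpClient".
    """
    # Inside a jar, classes are stored as slash-separated ".class" entries.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for root in roots:
        for path in Path(root).rglob("*.jar"):
            try:
                with zipfile.ZipFile(path) as jar:
                    if entry in jar.namelist():
                        hits.append(str(path))
            except zipfile.BadZipFile:
                continue  # skip files that are not valid zip archives
    return sorted(hits)
```

Comparing the result for `org.apache.http.impl.client.DefaultHttpClient` with the result for `org.apache.http.impl.client.CloseableHttpClient` shows which jars provide one class but not the other. This only reports what is on disk; which jar wins at run time depends on the classpath order of the job.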

Kind Regards,
Furkan KAMACI

On Thu, Dec 22, 2016 at 6:26 PM, Michael Coffey <mcoffey@yahoo.com> wrote:

> Is it possible to get around this problem by using an older version of
> Solr or Nutch or both?
>
>
> ------------------------------
> *From:* Michael Coffey <mcoffey@yahoo.com.INVALID>
> *To:* "user@nutch.apache.org" <user@nutch.apache.org>; Furkan KAMACI <
> furkankamaci@gmail.com>; Michael Coffey <mcoffey@yahoo.com>
> *Sent:* Tuesday, December 20, 2016 8:41 PM
> *Subject:* Re: nutch 1.12 and Solr 5.4.1
>
> This should work, shouldn't it? But it is not working. I am using Nutch
> 1.12 with the recommended version of Solr (5.4.1) and Hadoop 2.7.2. I
> haven't changed any Java code, but I get a low-level Java error when trying
> to write to the index. Is this not a tested configuration? Based on web
> searching, I know that others have had similar problems, going back several
> months, but I haven't seen any solutions. I did try a couple of variations
> on the patch posted for NUTCH-2267 (a slightly different manifestation) and
> that did not help. I notice that the 2267 patch has been reverted in the
> master branch.
> I am willing to work on some Java code, if necessary, to help resolve
> this. At this point, I don't know what to try next, other than switching to
> ElasticSearch.
>
>      From: Michael Coffey <mcoffey@yahoo.com.INVALID>
>
> To: "user@nutch.apache.org" <user@nutch.apache.org>; Furkan KAMACI <
> furkankamaci@gmail.com>; Michael Coffey <mcoffey@yahoo.com>
> Sent: Monday, December 19, 2016 7:13 PM
> Subject: Re: nutch 1.12 and Solr 5.4.1
>
> Some additional info: I am using solr.server.type=http, not cloud. I have
> tried plugin.includes with protocol-http and also with protocol-httpclient.
> My current settings are listed below. Also, I am using hadoop 2.7.2, in
> case that matters.
> <property>
>  <name>plugin.includes</name>
>  <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> </property>
>
> <property>
>  <name>solr.server.type</name>
>  <value>http</value>
>  <description>
>    Specifies the SolrServer implementation to use. This is a string value
>    of one of the following 'cloud', 'concurrent', 'http' or 'lb'.
>    The values represent CloudSolrServer, ConcurrentUpdateSolrServer,
>    HttpSolrServer or LBHttpSolrServer respectively.
>  </description>
> </property>
>
> <property>
>  <name>solr.server.url</name>
>  <value>http://solr5-00:8983/solr/nutch-0</value>
>  <description>
>      Defines the Solr URL into which data should be indexed using the
>      indexer-solr plugin.
>  </description>
> </property>
>
>      From: Michael Coffey <mcoffey@yahoo.com.INVALID>
> To: Furkan KAMACI <furkankamaci@gmail.com>; "user@nutch.apache.org" <
> user@nutch.apache.org>
> Sent: Monday, December 19, 2016 5:10 PM
> Subject: Re: nutch 1.12 and Solr 5.4.1
>
> I'm not sure how to do that. According to a find command, I have more than
> one solrj on the nutch machine.
> ./hadass/apache-nutch-1.12/runtime/local/plugins/indexer-solr/solr-solrj-5.4.1.jar
> ./hadass/apache-nutch-1.12/build/plugins/indexer-solr/solr-solrj-5.4.1.jar
> ./.ivy2/cache/org.apache.solr/solr-solrj
> ./.ivy2/cache/org.apache.solr/solr-solrj/jars/solr-solrj-5.4.1.jar
> ./.ivy2/cache/org.apache.solr/solr-solrj/jars/solr-solrj-4.6.0.jar
> On the solr machine, I have
> ./solr-5.4.1/dist/solrj-lib
> ./solr-5.4.1/server/solr-webapp/webapp/WEB-INF/lib/solr-solrj-5.4.1.jar
> ./solr-5.4.1/docs/solr-solrj
> ./solr-5.4.1/docs/solr-solrj/org/apache/solr/client/solrj
> ./solr-5.4.1/docs/solr-core/org/apache/solr/client/solrj
>
> Should I make the change to SolrUtils.java, mentioned in
> https://issues.apache.org/jira/browse/NUTCH-2267
> Lewis and Stephen might know about this.
>
>      From: Furkan KAMACI <furkankamaci@gmail.com>
> To: Michael Coffey <mcoffey@yahoo.com>; user@nutch.apache.org
> Sent: Monday, December 19, 2016 4:13 PM
> Subject: Re: nutch 1.12 and Solr 5.4.1
>
> Hi Michael,
> Could you check the version of solrj in your Nutch installation and compare
> it with the version of Solr on your server?
> Kind Regards,
> Furkan KAMACI
> On Dec 20, 2016 1:01 AM, "Michael Coffey" <mcoffey@yahoo.com.invalid>
> wrote:
>
> What is the recommended fix (or workaround) for the "bad return type"
> error related to "Type 'org/apache/http/impl/client/DefaultHttpClient'
> (current frame, stack[0]) is not assignable to
> 'org/apache/http/impl/client/CloseableHttpClient'"?
> It seems that switching to different versions of Solr has not helped
> (6.3.0, 5.5.3, 5.4.1). FWIW, I have the same version of Java on both machines.
>
> OpenJDK Runtime Environment (IcedTea 2.6.8) (7u121-2.6.8-1ubuntu0.14.04.1)
> OpenJDK 64-Bit Server VM (build 24.121-b00, mixed mode)
>
>
>
>      From: Michael Coffey <mcoffey@yahoo.com.INVALID>
>  To: "user@nutch.apache.org" <user@nutch.apache.org>; Michael Coffey <
> mcoffey@yahoo.com>
>  Sent: Saturday, November 19, 2016 8:05 AM
>  Subject: Re: nutch 1.12 and Solr 6.3.0
>
> I think this is what Lewis and Furkan know as NUTCH-2267. I get the same
> problem with Solr 5.5.3.
>
> I really would like to know which versions of Nutch/Solr work together
> "out of the box".
>
>      From: Michael Coffey <mcoffey@yahoo.com.INVALID>
>  To: "user@nutch.apache.org" <user@nutch.apache.org>
>  Sent: Friday, November 18, 2016 2:04 PM
>  Subject: nutch 1.12 and Solr 6.3.0
>
> I decided to plunge ahead with Solr indexing, but so far it doesn't work.
> The first error I got is listed below. Could it be because I am running JDK 7
> on the nutch server and JDK 8 on the Solr server? As far as I know, Nutch
> 1.x won't work with JDK 8 and Solr 6.3 won't work with a JDK older than 8. Any
> suggestions or advice?
>
> 16/11/18 13:59:52 INFO mapreduce.Job: Task Id : attempt_1479499237600_0021_r_000000_0, Status : FAILED
> Error: Bad return type
> Exception Details:
>  Location:
>    org/apache/solr/client/solrj/impl/HttpClientUtil.createClient(Lorg/apache/solr/common/params/SolrParams;Lorg/apache/http/conn/ClientConnectionManager;)Lorg/apache/http/impl/client/CloseableHttpClient; @58: areturn
>  Reason:
>    Type 'org/apache/http/impl/client/DefaultHttpClient' (current frame, stack[0]) is not assignable to 'org/apache/http/impl/client/CloseableHttpClient' (from method signature)
>  Current Frame:
>    bci: @58
>    flags: { }
>    locals: { 'org/apache/solr/common/params/SolrParams', 'org/apache/http/conn/ClientConnectionManager', 'org/apache/solr/common/params/ModifiableSolrParams', 'org/apache/http/impl/client/DefaultHttpClient' }
>    stack: { 'org/apache/http/impl/client/DefaultHttpClient' }
>  Bytecode:
>    0000000: bb00 0359 2ab7 0004 4db2 0005 b900 0601
>    0000010: 0099 001e b200 05bb 0007 59b7 0008 1209
>    0000020: b600 0a2c b600 0bb6 000c b900 0d02 002b
>    0000030: b800 104e 2d2c b800 0f2d b0
>  Stackmap Table:
>    append_frame(@47,Object[#143])
>
> Container killed by the ApplicationMaster.