From Michael Coffey <mcof...@yahoo.com.INVALID>
Subject Re: indexing to Solr
Date Sat, 17 Dec 2016 21:43:34 GMT
Here is another issue with the official Nutch tutorial.
In the section "Integrate Solr with Nutch", it says to back up the original Solr schema.xml
and replace it with the one from Nutch. It says that the original schema.xml is in the directory
example/solr/collection1/conf, but there is no such directory. When I search for schema.xml,
I get the following:
./solr-5.4.1/example/example-DIH/solr/solr/conf/schema.xml
./solr-5.4.1/example/example-DIH/solr/db/conf/schema.xml
./solr-5.4.1/example/example-DIH/solr/mail/conf/schema.xml
./solr-5.4.1/example/example-DIH/solr/rss/conf/schema.xml
./solr-5.4.1/example/example-DIH/solr/tika/conf/schema.xml
./solr-5.4.1/server/solr/configsets/sample_techproducts_configs/conf/schema.xml
./solr-5.4.1/server/solr/configsets/basic_configs/conf/schema.xml

It's not obvious that any one of these is the right one to use.
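A quick way to reproduce the search above and narrow it to the configsets under server/solr/configsets is sketched below. It assumes the Solr 5.x directory layout shown in the listing; the function names are illustrative and not part of Solr or Nutch.

```python
from pathlib import Path


def find_schema_files(solr_home):
    """Return every schema.xml under solr_home as sorted, POSIX-style relative paths."""
    root = Path(solr_home)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("schema.xml"))


def configset_schemas(solr_home):
    """Keep only the schema.xml files that sit under server/solr/configsets/."""
    # Assumes the Solr 5.x layout seen above: server/solr/configsets/<name>/conf/schema.xml
    return [p for p in find_schema_files(solr_home)
            if p.split("/")[:3] == ["server", "solr", "configsets"]]
```

Based on the listing above, configset_schemas("solr-5.4.1") would keep only the basic_configs and sample_techproducts_configs entries; the example-DIH files belong to the DataImportHandler examples.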



From: lewis john mcgibbney <lewismc@apache.org>
To: "user@nutch.apache.org" <user@nutch.apache.org>
Sent: Monday, November 21, 2016 10:34 AM
Subject: Re: indexing to Solr

Hi Michael,

On Sat, Nov 19, 2016 at 8:09 AM, <user-digest-help@nutch.apache.org> wrote:

> From: Michael Coffey <mcoffey@yahoo.com.invalid>
> To: "user@nutch.apache.org" <user@nutch.apache.org>
> Cc:
> Date: Fri, 18 Nov 2016 21:15:14 +0000 (UTC)
> Subject: indexing to Solr
> Where can I find up-to-date information on indexing to Solr?


http://wiki.apache.org/nutch/NutchTutorial
in particular
https://wiki.apache.org/nutch/NutchTutorial#Step-by-Step:_Indexing_into_Apache_Solr
If you find any issues with this tutorial then please let us know. Thank
you.


> When I search the web, I find tutorials that use the deprecated solrindex
> command. I also find questions where people want to know why it doesn't
> work.
>

That is because the only official documentation resides at
http://wiki.apache.org/nutch/NutchTutorial


> I have a good nutch 1.12 installation on a working hadoop cluster and a
> Solr 6.3.0 installation which works for their gettingstarted example.
>

You should use the specified version of Solr for the Nutch release. This is
Solr 5.4.1, as defined in the indexer-solr plugin's ivy.xml.
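To check which Solr version a given Nutch release was built against, one can read the SolrJ revision out of that ivy.xml. The sketch below assumes the dependency is declared with org="org.apache.solr", name="solr-solrj" and a rev attribute; the sample XML is trimmed and illustrative, not copied from the Nutch source.

```python
import xml.etree.ElementTree as ET


def dependency_revision(ivy_xml_text, org, name):
    """Return the rev attribute of the first <dependency> matching org and name, or None."""
    root = ET.fromstring(ivy_xml_text)
    for dep in root.iter("dependency"):
        if dep.get("org") == org and dep.get("name") == name:
            return dep.get("rev")
    return None


# Trimmed, illustrative ivy.xml (assumed shape); the real file has more elements.
SAMPLE_IVY = """<ivy-module version="1.0">
  <dependencies>
    <dependency org="org.apache.solr" name="solr-solrj" rev="5.4.1"/>
  </dependencies>
</ivy-module>"""

print(dependency_revision(SAMPLE_IVY, "org.apache.solr", "solr-solrj"))  # 5.4.1
```

In a Nutch source checkout the file is assumed to live at src/plugin/indexer-solr/ivy.xml; pass its contents (for example via Path(...).read_text()) instead of the sample string.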


> I have questions like:
> Do I need to create a core and a collection in Solr?


Yes, I would. This is explained at
https://wiki.apache.org/nutch/NutchTutorial#Setup_Solr_for_search
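A core can be created from the Solr admin UI, with Solr 5.x's bin/solr create -c command, or through the CoreAdmin HTTP API. As a sketch, assuming a local Solr at http://localhost:8983/solr and a hypothetical core named nutch, the CoreAdmin CREATE request URL can be built like this (nothing is sent):

```python
from urllib.parse import urlencode


def core_create_url(solr_base, core_name, instance_dir=None, config_set=None):
    """Build a CoreAdmin API CREATE request URL; the caller decides whether to send it."""
    params = {
        "action": "CREATE",
        "name": core_name,
        # Default the instance directory to the core name (an illustrative choice).
        "instanceDir": instance_dir or core_name,
    }
    if config_set:
        # Optional: base the new core on a named configset instead of instanceDir/conf.
        params["configSet"] = config_set
    return solr_base.rstrip("/") + "/admin/cores?" + urlencode(params)


print(core_create_url("http://localhost:8983/solr/", "nutch"))
# http://localhost:8983/solr/admin/cores?action=CREATE&name=nutch&instanceDir=nutch
```

Without a configSet, the instance directory should already contain a conf/ directory holding solrconfig.xml and the Nutch schema.xml before the request is sent.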


> Do I need an http or cloud type server?
> Do I need solr.zookeeper.url?
>

This is not a Nutch question; it depends on your preferred Solr configuration.
If you are just starting out, I would say it is not a big deal...
experiment and go with what works best for your requirements and resource
capacity.


> What else needs to be set in nutch-site.xml?
>

Not much. For reference though, here are the Solr configuration options.
https://github.com/apache/nutch/blob/master/conf/nutch-default.xml#L1750-L1826
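On the nutch-site.xml side, the choice between an http and a cloud server mostly decides which URL property has to be set. The sketch below assumes the property names solr.server.type, solr.server.url and solr.zookeeper.url from the nutch-default.xml range linked above, handles only the http and cloud types, and renders the overrides in the Hadoop configuration/property format that nutch-site.xml uses. The helper names are hypothetical; check nutch-default.xml for the exact defaults and any further settings cloud mode may need.

```python
import xml.etree.ElementTree as ET


def solr_props(server_type="http", server_url=None, zookeeper_url=None):
    """Pick the Solr overrides for nutch-site.xml (property names assumed from nutch-default.xml)."""
    if server_type == "http":
        if not server_url:
            raise ValueError("http mode needs solr.server.url")
        return {"solr.server.type": "http", "solr.server.url": server_url}
    if server_type == "cloud":
        if not zookeeper_url:
            raise ValueError("cloud mode needs solr.zookeeper.url")
        return {"solr.server.type": "cloud", "solr.zookeeper.url": zookeeper_url}
    raise ValueError(f"server type {server_type!r} is not handled in this sketch")


def nutch_site_xml(props):
    """Render a Hadoop-style <configuration> document from a dict of name -> value."""
    conf = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(conf, encoding="unicode")


print(nutch_site_xml(solr_props("http", server_url="http://localhost:8983/solr/nutch")))
```

Raising early on a missing URL keeps a half-configured nutch-site.xml from being written; the http/cloud split mirrors the question above rather than every server type Nutch may support.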


> What about schema?
>

This is covered in
https://wiki.apache.org/nutch/NutchTutorial#Setup_Solr_for_search
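The tutorial step of backing up Solr's schema.xml and replacing it with the one shipped by Nutch can be sketched as follows. Both paths are assumptions passed in by the caller (for example Nutch's conf/schema.xml and the conf directory of whichever configset or core is chosen); the function name and the .orig suffix are hypothetical.

```python
import shutil
from pathlib import Path


def install_nutch_schema(nutch_schema, solr_conf_dir, backup_suffix=".orig"):
    """Back up solr_conf_dir/schema.xml (if present), then copy Nutch's schema over it.

    Returns the backup path, or None when there was no schema.xml to back up.
    """
    conf = Path(solr_conf_dir)
    target = conf / "schema.xml"
    backup = None
    if target.exists():
        backup = target.with_name(target.name + backup_suffix)
        # Never clobber an earlier backup: it may be the only copy of the original schema.
        if backup.exists():
            raise FileExistsError(f"refusing to overwrite existing backup {backup}")
        shutil.copy2(target, backup)
    shutil.copy2(nutch_schema, target)
    return backup
```

Solr only picks up the new schema after the core is created or reloaded.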


>
> Thanks for all the help so far!
>
>
No problems. Any more issues, ping us here and we will help.
Ta


   
Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message