
Using the Apache Solr REST API in CDP Public Cloud


An Apache Solr cluster is available in CDP Public Cloud through the “Data exploration and analytics” data hub template. In this article we look at how to connect to the Solr REST API running in the Public Cloud, and highlight the performance impact of session cookie configuration when Apache Knox Gateway is used to proxy the traffic to the Solr servers. The information in this blog post can be useful for engineers developing Apache Solr client applications.

The Apache Solr servers in the Cloudera Data Platform (CDP) expose a REST API, protected by Kerberos authentication. When the Solr cluster runs in distributed mode, all of the Solr server instances can usually handle traffic. The Solr server that receives a request from the client forwards the query to all the servers handling shards for the collection and combines the results before sending the response back to the client. For scalability, it is best to distribute the queries among the Solr servers in a round-robin fashion.
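Round-robin distribution on the client side can be sketched as follows. This is only an illustration: the host names, the port, and the collection name `demo` are made-up placeholders, and the snippet only builds the request URLs for Solr's `/select` handler without sending anything.

```python
from itertools import cycle
from urllib.parse import urlencode

# Hypothetical Solr worker hosts and port; replace with the values of your own cluster.
SOLR_SERVERS = [
    "https://solr-worker0.example.com:8985",
    "https://solr-worker1.example.com:8985",
    "https://solr-worker2.example.com:8985",
]


def round_robin_urls(servers, collection, params):
    """Yield an endless sequence of select URLs, rotating over the servers."""
    query = urlencode(params)
    for base in cycle(servers):
        yield f"{base}/solr/{collection}/select?{query}"


# Each call to next() targets the next server in the list, wrapping around at the end.
urls = round_robin_urls(SOLR_SERVERS, "demo", {"q": "*:*", "rows": 10})
first_four = [next(urls) for _ in range(4)]
```

With three servers, the fourth URL points at the first server again. A real client would also need to handle authentication and skip servers that are unavailable.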

When Solr is deployed in the public cloud using the “Data exploration and analytics” data hub template, there are two ways to reach the Solr cluster from a separate client host. The first, easier approach is to reach Solr using Knox Gateway as a proxy. The Apache Knox Gateway is a system that provides a single point of authentication and access for Apache Hadoop services in a cluster. In the CDP Data Hub cluster, Knox accepts HTTP basic authentication, so CDP users can authenticate with their workload or machine user credentials. Based on these credentials, Knox forwards the requests to the Solr servers in round-robin, using Kerberos and the Simple and Protected GSSAPI Negotiation Mechanism (SPNEGO) on behalf of the authenticated end user. (See Figure 1)

Figure 1. Sending Solr queries to the Solr cluster through Knox Gateway

When we connect to Solr through Knox, the Knox Gateway sets the KNOXSESSIONID cookie in the HTTPS response. This cookie can be reused and sent with each subsequent request, which drastically improves the performance of handling Solr requests.
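A minimal sketch of cookie reuse through Knox, using only the Python standard library. The Knox URL below is a placeholder, and its path layout is an assumption; take the actual endpoint from your own data hub cluster. The helper names `make_knox_opener` and `has_knox_session` are ours, introduced for illustration.

```python
import base64
import urllib.request
from http.cookiejar import CookieJar

# Placeholder URL (assumption): copy the real Solr endpoint of your Knox Gateway here.
KNOX_SOLR_URL = "https://knox.example.com/my-cluster/cdp-proxy-api/solr/demo/select?q=*:*"


def make_knox_opener(user, password):
    """Return (opener, jar); the opener sends basic auth and replays stored cookies."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    # Sent with every request made through this opener.
    opener.addheaders.append(("Authorization", f"Basic {token}"))
    return opener, jar


def has_knox_session(jar):
    """True once a KNOXSESSIONID cookie has been stored in the jar."""
    return any(cookie.name == "KNOXSESSIONID" for cookie in jar)


# Usage sketch (needs a reachable Knox Gateway, so it is left commented out):
# opener, jar = make_knox_opener("workload_user", "workload_password")
# for _ in range(100):
#     with opener.open(KNOX_SOLR_URL) as response:
#         response.read()
# print(has_knox_session(jar))
```

The important design point is that one opener, and therefore one cookie jar, is shared by all requests. Creating a fresh opener per request would discard the cookie and force a new authentication every time.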

Another approach is to connect to any Solr server instance directly, using HTTPS with SPNEGO authentication. In this case the Knox Gateway is not used. Setting up this connection can be more challenging, because basic authentication is not possible and Kerberos credentials are required. Also, if the Solr client host is outside of the CDP environment, then all Solr server ports on the worker hosts need to be exposed. (See Figure 2)
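For a direct connection, one option is to let `curl` perform the SPNEGO negotiation. The sketch below only assembles the command line; it assumes a `curl` build with SPNEGO support and a valid Kerberos ticket on the client host (for example obtained with `kinit`). The function name, the Solr URL, and the cookie file path are hypothetical.

```python
import shlex


def spnego_curl_command(url, cookie_jar=None):
    """Build a curl command that authenticates with SPNEGO (Kerberos ticket required)."""
    # "--user :" tells curl to take the identity from the Kerberos ticket cache.
    cmd = ["curl", "--silent", "--negotiate", "--user", ":"]
    if cookie_jar:
        # Read cookies from the file and write received cookies back to it.
        cmd += ["--cookie", cookie_jar, "--cookie-jar", cookie_jar]
    cmd.append(url)
    return cmd


# Hypothetical direct Solr URL; replace host, port and collection with your own.
command = spnego_curl_command(
    "https://solr-worker0.example.com:8985/solr/demo/select?q=*:*",
    cookie_jar="/tmp/solr-cookies.txt",
)
printable = shlex.join(command)  # shell-quoted form, handy for logging
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a host that can reach the Solr port.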

Figure 2. Sending Solr queries directly to a Solr Server instance


To measure the performance of the Solr API, we developed a small performance benchmark script and executed it from a gateway node of the data hub cluster. The benchmark script is available under the Apache 2.0 license in this repository.

The following table and graph present our benchmark results. We executed short Solr queries on a very small Solr collection. We varied the number of parallel threads (1..10), and on each thread we executed 100 Solr REST calls using the “curl” command. We tested the Solr API both directly (connecting to a single given Solr server without load balancing) and through Knox (connecting to Solr through a Knox Gateway instance). We repeated the tests both with and without reusing the cookies sent back in the HTTPS responses. In all cases, the benchmark script was running on the gateway host of the Solr data hub cluster.
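The measurement itself can be sketched in a few lines. This is not the benchmark script from the repository, which uses “curl”; it is a simplified harness under our own assumptions that shows how average response time and throughput can be derived from a number of parallel threads each issuing a fixed number of calls. The `call` argument stands for any function that performs one Solr request.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_benchmark(call, threads, calls_per_thread=100):
    """Run `call` on several threads; return (average seconds per call, calls per second)."""

    def worker(thread_index):
        durations = []
        for _ in range(calls_per_thread):
            start = time.perf_counter()
            call()
            durations.append(time.perf_counter() - start)
        return durations

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        per_thread = list(pool.map(worker, range(threads)))
    wall_time = time.perf_counter() - wall_start

    durations = [d for chunk in per_thread for d in chunk]
    average = sum(durations) / len(durations)
    throughput = len(durations) / wall_time
    return average, throughput
```

Running it for 1 to 10 threads, once with a `call` that reuses cookies and once with a `call` that does not, mirrors the structure of the test matrix described above.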

Table 1: Performance benchmark results (average response time and throughput) showing the effect of cookie reuse between subsequent Solr API calls. Colors of the cells correspond to the lines visualized in Figure 3.


Figure 3: Performance benchmark results (average response time) showing the effect of cookie reuse between subsequent Solr API calls. Colors of the lines correspond to the colors used in Table 1.

Our results clearly show how important it is to use the KNOXSESSIONID cookie when connecting to Solr through the Knox Gateway. When the cookie is set, the performance is basically the same as with a direct connection, suggesting that the Knox Gateway is not the bottleneck for this particular benchmark. Without KNOXSESSIONID, however, we see a very significant performance degradation. The cause is that the Knox Gateway has to authenticate each HTTPS request one by one, whereas with the cookie set Knox can rely on the earlier authentication.


We described two ways to connect to the Solr REST API in the CDP Public Cloud; hopefully the information in this blog post will help you choose the best one for your project. Connecting through Knox is preferable, as the Knox Gateway provides load balancing and also eases authentication by eliminating the need for client-side Kerberos configuration. Direct connection to the Solr server instances is also possible and can be a good approach if the Knox Gateway becomes a bottleneck or if the extra routing step made by Knox adds too much latency to the traffic. Still, in most cases we suggest starting the project by using the Knox Gateway to reach Solr, mainly because setting up a secure connection and load balancing for direct Solr access can be more challenging. Using the KNOXSESSIONID cookie can help to reach performance similar to the direct setup.

