Multi-threaded concurrency with Java's HttpClient
Note: the code below is based on httpclient 4.5.2.
Fetching a web page with a GET request through Java's HttpClient is fairly easy:
public static String get(String url) {
    CloseableHttpResponse response = null;
    BufferedReader in = null;
    String result = "";
    try {
        // A new client is created on every call
        CloseableHttpClient httpclient = HttpClients.createDefault();
        HttpGet httpGet = new HttpGet(url);
        response = httpclient.execute(httpGet);
        // Read the response body line by line
        in = new BufferedReader(new InputStreamReader(response.getEntity().getContent()));
        StringBuffer sb = new StringBuffer("");
        String line = "";
        String NL = System.getProperty("line.separator");
        while ((line = in.readLine()) != null) {
            sb.append(line + NL);
        }
        in.close();
        result = sb.toString();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            // Closing the response releases the underlying connection
            if (null != response) response.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return result;
}
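The method above is also usable when GET requests are issued from multiple threads. It only works, however, because every call to get creates its own HttpClient instance, which is used once and then discarded. That is clearly not the best implementation.
HttpClient offers its own solution for multi-threaded requests; see the "Pooling connection manager" section of the official documentation. Multi-threaded requests in HttpClient are built on its internal connection pool, and the key class is PoolingHttpClientConnectionManager, which manages that pool. It provides two key methods: setMaxTotal and setDefaultMaxPerRoute. setMaxTotal sets the maximum number of connections in the whole pool, and setDefaultMaxPerRoute sets the default maximum number of connections per route. There is also setMaxPerRoute, which sets the maximum number of connections for one particular site, like this: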
HttpHost host = new HttpHost("localhost");
cm.setMaxPerRoute(new HttpRoute(host), 50);
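To make the interplay of the three settings concrete, here is a toy model written with the JDK only. It is a sketch of the idea, not HttpClient's actual implementation: the class RouteLimitSketch and its methods tryLease and release are hypothetical names, routes are plain strings, and a request over the limit simply gets false, whereas the real pool makes the caller wait for a free connection.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of maxTotal, defaultMaxPerRoute and maxPerRoute (hypothetical, JDK only). */
class RouteLimitSketch {
    private final int maxTotal;
    private final int defaultMaxPerRoute;
    private final Map<String, Integer> maxPerRoute = new HashMap<>();
    private final Map<String, Integer> leased = new HashMap<>();
    private int totalLeased = 0;

    RouteLimitSketch(int maxTotal, int defaultMaxPerRoute) {
        this.maxTotal = maxTotal;
        this.defaultMaxPerRoute = defaultMaxPerRoute;
    }

    /** Overrides the default cap for a single route. */
    synchronized void setMaxPerRoute(String route, int max) {
        maxPerRoute.put(route, max);
    }

    /** Leases a connection only if both the pool-wide cap and the route cap allow it. */
    synchronized boolean tryLease(String route) {
        int routeCap = maxPerRoute.getOrDefault(route, defaultMaxPerRoute);
        int routeLeased = leased.getOrDefault(route, 0);
        if (totalLeased >= maxTotal || routeLeased >= routeCap) {
            return false;
        }
        leased.put(route, routeLeased + 1);
        totalLeased++;
        return true;
    }

    /** Returns a previously leased connection to the pool. */
    synchronized void release(String route) {
        int routeLeased = leased.getOrDefault(route, 0);
        if (routeLeased > 0) {
            leased.put(route, routeLeased - 1);
            totalLeased--;
        }
    }

    public static void main(String[] args) {
        RouteLimitSketch pool = new RouteLimitSketch(3, 2); // maxTotal = 3, default per route = 2
        System.out.println(pool.tryLease("site-a")); // true
        System.out.println(pool.tryLease("site-a")); // true
        System.out.println(pool.tryLease("site-a")); // false: route cap of 2 reached
        System.out.println(pool.tryLease("site-b")); // true
        System.out.println(pool.tryLease("site-b")); // false: pool-wide cap of 3 reached
    }
}
```

The last line of main shows why maxTotal matters even when a route is below its own cap: site-b holds only one connection, but the pool as a whole is full.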
Following the documentation, we adjust our GET implementation slightly:
package com.zhyea.robin;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class HttpUtil {

    // One client, backed by a connection pool, shared by all threads
    private static CloseableHttpClient httpClient;

    static {
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        // Maximum number of connections in the whole pool
        cm.setMaxTotal(200);
        // Default maximum number of connections per route
        cm.setDefaultMaxPerRoute(20);
        httpClient = HttpClients.custom().setConnectionManager(cm).build();
    }

    public static String get(String url) {
        CloseableHttpResponse response = null;
        BufferedReader in = null;
        String result = "";
        try {
            HttpGet httpGet = new HttpGet(url);
            response = httpClient.execute(httpGet);
            // Read the response body line by line
            in = new BufferedReader(new InputStreamReader(response.getEntity().getContent()));
            StringBuffer sb = new StringBuffer("");
            String line = "";
            String NL = System.getProperty("line.separator");
            while ((line = in.readLine()) != null) {
                sb.append(line + NL);
            }
            in.close();
            result = sb.toString();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                // Closing the response hands the connection back to the pool
                if (null != response) response.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(get("https://www.baidu.com/"));
    }
}
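The class above only becomes concurrent once several threads call get at the same time. One way to drive it, sketched here with the JDK's ExecutorService, is to submit one task per URL to a fixed thread pool. The class ConcurrentGetSketch, the method fetchAll and the fetcher parameter are hypothetical names introduced for illustration; the fetcher is a stand-in so that the sketch runs without network access, and in the article's setting it would be HttpUtil::get.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ConcurrentGetSketch {

    /**
     * Runs the fetcher for every URL on a fixed-size thread pool and returns
     * the results in the same order as the input list.
     */
    static List<String> fetchAll(List<String> urls, int threads, Function<String, String> fetcher) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String url : urls) {
                Callable<String> task = () -> fetcher.apply(url);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // blocks until that request has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in for HttpUtil::get so the sketch needs no network
        Function<String, String> stub = url -> "body of " + url;
        List<String> urls = Arrays.asList("https://example.com/a", "https://example.com/b");
        System.out.println(fetchAll(urls, 4, stub));
    }
}
```

When the worker threads outnumber the per-route limit configured on the connection manager and all target the same host, the extra requests should simply wait for a pooled connection, so it usually makes sense to size the thread pool with that limit in mind.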
That is more or less all it takes. Personally, though, I prefer HttpClient's fluent API; the HTTP GET request we just implemented can be written as simply as this:
package com.zhyea.robin;

import org.apache.http.client.fluent.Request;

import java.io.IOException;

public class HttpUtil {

    public static String get(String url) {
        String result = "";
        try {
            result = Request.Get(url)
                    .connectTimeout(1000)
                    .socketTimeout(1000)
                    .execute().returnContent().asString();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(get("https://www.baidu.com/"));
    }
}
All we have to do is replace the previous httpclient dependency with the fluent-hc dependency:
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>fluent-hc</artifactId>
    <version>4.5.2</version>
</dependency>
Better still, the fluent implementation uses PoolingHttpClientConnectionManager out of the box. It sets maxTotal to 200 and defaultMaxPerRoute to 100:
CONNMGR = new PoolingHttpClientConnectionManager(sfr);
CONNMGR.setDefaultMaxPerRoute(100);
CONNMGR.setMaxTotal(200);
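The one annoyance is that Executor provides no method for adjusting these two values. The defaults are perfectly adequate for most uses, though. If they really are not enough, you can rewrite Executor (or hand Executor.newInstance(HttpClient) a client built on your own connection manager) and then run the GET request through the Executor directly: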
Executor.newInstance().execute(Request.Get(url))
        .returnContent().asString();
That's it!