Note: this post is a translation of a question from stackoverflow.com.
httpclient java jsoup optimization httpurlconnection

java - What makes Jsoup faster than HttpURLConnection and HttpClient in most cases?

Posted on 2020-04-07 11:05:23

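I want to compare the performance of the three implementations mentioned in the title, so I wrote a Java program to help me do that. The main method contains three test blocks, each of which looks like this: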

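        nb = 0; time = 0;                 // reset the counters for this block
        for (int i = 0; i < 7; i++) {
            double v = methodX(url);      // methodX is one of the three test methods below
            if (v > 0) {                  // -1 signals a failed request, which is not counted
                nb++;
                time += v;
            }
        }
        if (nb == 0) nb = 1;              // avoid dividing by zero when every request failed
        System.out.println("HttpClient : " + (time / ((double) nb)) + ". Tries " + nb + "/7");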

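The variable nb counts only successful requests, so failed ones are left out of the average. methodX is one of the following methods: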

    private static double testWithNativeHUC(String url) {
        try {
            HttpURLConnection httpURLConnection = (HttpURLConnection) new URL(url).openConnection();
            httpURLConnection.addRequestProperty("User-Agent", UA);
            long before = System.currentTimeMillis();
            // getInputStream() sends the request; the body is read line by line and discarded.
            // try-with-resources closes the stream once the body has been consumed.
            try (BufferedReader bufferedReader = new BufferedReader(
                    new InputStreamReader(httpURLConnection.getInputStream()))) {
                while (bufferedReader.readLine() != null) {
                    // discard the line
                }
            }
            return System.currentTimeMillis() - before;
        } catch (IOException e) {
            e.printStackTrace();
            return -1;
        }
    }

    private static double testWithHC(String url) {
        // A new client is built (and closed) on every call, so no pooled connection is reused
        try (CloseableHttpClient httpClient = HttpClientBuilder.create().setUserAgent(UA).build()) {
            BasicResponseHandler basicResponseHandler = new BasicResponseHandler();
            long before = System.currentTimeMillis();
            try (CloseableHttpResponse response = httpClient.execute(new HttpGet(url))) {
                // Reads the whole body into a String; throws for status codes >= 300
                basicResponseHandler.handleResponse(response);
            }
            return System.currentTimeMillis() - before;
        } catch (IOException e) {
            e.printStackTrace();
            return -1;
        }
    }

    private static double testWithJsoup(String url) {
        try {
            long before = System.currentTimeMillis();
            // No custom User-Agent here; unlike the other two methods, the body is also parsed into a Document
            Jsoup.connect(url).execute().parse();
            return System.currentTimeMillis() - before;
        } catch (IOException e) {
            e.printStackTrace();
            return -1;
        }
    }

This is the output I got (average time per successful request, in milliseconds):

URL: https://stackoverflow.com

    HttpUrlConnection : 325.85714285714283. Tries 7/7
    HttpClient : 299.0. Tries 7/7
    Jsoup : 172.42857142857142. Tries 7/7

URL: https://online.vfsglobal.dz

    HttpUrlConnection : 104.57142857142857. Tries 7/7
    HttpClient : 181.0. Tries 7/7
    Jsoup : 57.857142857142854. Tries 7/7

URL: https://google.com/

    HttpUrlConnection : 251.28571428571428. Tries 7/7
    HttpClient : 259.57142857142856. Tries 7/7
    Jsoup : 299.85714285714283. Tries 7/7

URL: https://algeria.blsspainvisa.com/book_appointment.php

    HttpUrlConnection : 112.57142857142857. Tries 7/7
    HttpClient : 194.85714285714286. Tries 7/7
    Jsoup : 67.42857142857143. Tries 7/7

URL: https://tunisia.blsspainvisa.com/book_appointment.php

    HttpUrlConnection : 439.2857142857143. Tries 7/7
    HttpClient : 283.42857142857144. Tries 7/7
    Jsoup : 144.71428571428572. Tries 7/7

Repeating the tests gives the same results. To get results quickly I did not sleep between requests; I don't believe this affects the results much.
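The timings above use System.currentTimeMillis() with no warm-up runs, so the first calls also pay for class loading, JIT compilation and DNS/TLS setup. A minimal sketch of a harness that discards a few warm-up runs and times the rest with System.nanoTime() might look like this; RequestTimer and averageMillis are hypothetical names, and the demo uses a stand-in task instead of a real HTTP call so it runs offline.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of the original program.
public final class RequestTimer {

    private RequestTimer() {}

    /**
     * Runs the task `warmups` times untimed, then times `runs` executions with
     * System.nanoTime(). Returns the mean duration in milliseconds of the
     * successful timed runs, or Double.NaN if every timed run failed.
     */
    public static double averageMillis(Callable<?> task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            try {
                task.call();               // warm-up: result and timing are discarded
            } catch (Exception ignored) {
                // a failed warm-up run is simply skipped
            }
        }
        long totalNanos = 0;
        int successes = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            try {
                task.call();
            } catch (Exception e) {
                continue;                  // like the question's nb counter: failures are not averaged
            }
            totalNanos += System.nanoTime() - start;
            successes++;
        }
        return successes == 0 ? Double.NaN : totalNanos / 1_000_000.0 / successes;
    }

    public static void main(String[] args) {
        // Stand-in task (no network): sleeps about 5 ms per call.
        double avg = averageMillis(() -> { Thread.sleep(5); return null; }, 2, 7);
        System.out.println("average: " + avg + " ms");
    }
}
```

With the question's methods, a call could look like `averageMillis(() -> Jsoup.connect(url).execute().parse(), 2, 7)`. For serious measurements a dedicated harness such as JMH also handles forking and statistics.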

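Edit: I actually analyzed Jsoup's source code, and it uses HttpURLConnection with a BufferedInputStream. I tried applying the same approach in my HttpURLConnection test, but as you can see the results are still clearly different: Jsoup seems faster than HttpURLConnection even though it uses HttpURLConnection itself!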

Thanks in advance.


Asked by younes zeboudj (viewed 96 times)
Answer by Matthias, 2020-02-05 20:35

Your benchmark doesn't make sense.

I wrote a microbenchmark (JMH) for the three libraries, and there is no big difference in the results.

Benchmark                                     Mode  Cnt    Score   Error  Units
HttpBenchmark.httpClientGoogle                avgt    2  151.162          ms/op
HttpBenchmark.httpClientStackoverflow         avgt    2  151.086          ms/op
HttpBenchmark.httpUrlConnectionGoogle         avgt    2  235.869          ms/op
HttpBenchmark.httpUrlConnectionStackoverflow  avgt    2  145.162          ms/op
HttpBenchmark.jsoupGoogle                     avgt    2  391.162          ms/op
HttpBenchmark.jsoupStackoverflow              avgt    2  188.059          ms/op

There are only a few small differences between your test and mine:

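  • JSoup sets the header "Accept-Encoding: gzip", which reduces bandwidth
  • JSoup uses a larger buffer (32 KB)
  • The HttpClient instance should be reused (your test builds a new one for every request)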
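The first two differences can be reproduced with plain HttpURLConnection. The sketch below is a hypothetical helper (GzipFetch.fetch), not Jsoup's actual code: it sends "Accept-Encoding: gzip", reads through a 32 KB buffer, and decompresses with GZIPInputStream, because HttpURLConnection does not decode a gzip body on its own. A throwaway local server built on the JDK's com.sun.net.httpserver package stands in for a real site so the example runs offline.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public final class GzipFetch {

    private static final int BUFFER_SIZE = 32 * 1024; // 32 KB, the size the answer attributes to Jsoup

    private GzipFetch() {}

    /** Fetches the body of `url`, asking for gzip and decompressing it if the server complies. */
    public static byte[] fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("Accept-Encoding", "gzip");
        InputStream in = new BufferedInputStream(conn.getInputStream(), BUFFER_SIZE);
        if ("gzip".equalsIgnoreCase(conn.getContentEncoding())) {
            in = new GZIPInputStream(in, BUFFER_SIZE); // HttpURLConnection leaves the body compressed
        }
        try (InputStream body = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[BUFFER_SIZE];
            int read;
            while ((read = body.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return out.toByteArray();
        }
    }

    /** Starts a local demo server that gzips `payload` when the client accepts gzip. */
    public static HttpServer startDemoServer(byte[] payload) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String accept = exchange.getRequestHeaders().getFirst("Accept-Encoding");
            byte[] out = payload;
            if (accept != null && accept.contains("gzip")) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
                    gz.write(payload);
                }
                out = buf.toByteArray();
                exchange.getResponseHeaders().set("Content-Encoding", "gzip");
            }
            exchange.sendResponseHeaders(200, out.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(out);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "<p>hello</p>".repeat(1000).getBytes(StandardCharsets.UTF_8);
        HttpServer server = startDemoServer(payload);
        try {
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
            System.out.println("decoded " + fetch(url).length + " bytes");
        } finally {
            server.stop(0);
        }
    }
}
```

Pointing fetch at the URLs from the question would let the HttpURLConnection test transfer the same compressed bytes as Jsoup, although it still leaves out Jsoup's parsing step.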

In my test, JSoup is the slowest. Of course, JSoup is also the only one that parses the response.

My benchmark:

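import java.io.BufferedInputStream;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.TimeUnit;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.BasicResponseHandler;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.jsoup.Jsoup;
import org.openjdk.jmh.annotations.*;

@Warmup(iterations = 1, time = 3, timeUnit = TimeUnit.SECONDS)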
@Measurement(iterations = 2, time = 5, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
@Threads(1)
public class HttpBenchmark {

    private static final String GOOGLE          = "https://google.com/";
    private static final String STACKOVERFLOW   = "https://stackoverflow.com";

    private final CloseableHttpClient httpClient = HttpClientBuilder.create().build();

    @Benchmark
    public void httpClientGoogle() throws Exception {
        httpClient(GOOGLE);
    }

    @Benchmark
    public void httpClientStackoverflow() throws Exception {
        httpClient(STACKOVERFLOW);
    }

    @Benchmark
    public void httpUrlConnectionGoogle() throws Exception {
        httpUrlConnection(GOOGLE);
    }

    @Benchmark
    public void httpUrlConnectionStackoverflow() throws Exception {
        httpUrlConnection(STACKOVERFLOW);
    }

    @Benchmark
    public void jsoupGoogle() throws Exception {
        jsoup(GOOGLE);
    }

    @Benchmark
    public void jsoupStackoverflow() throws Exception {
        jsoup(STACKOVERFLOW);
    }

    private void httpClient(final String url) throws Exception {
        // try-with-resources closes the response even if handleResponse throws
        try (CloseableHttpResponse response = httpClient.execute(new HttpGet(url))) {
            new BasicResponseHandler().handleResponse(response); // reads the whole body into a String
        }
    }

    private void httpUrlConnection(final String url) throws Exception {
        final HttpURLConnection httpURLConnection = (HttpURLConnection) new URL(url).openConnection();
        // Same header Jsoup sends; the gzip body is read as raw bytes and not decompressed here
        httpURLConnection.addRequestProperty("Accept-Encoding", "gzip");
        try (final BufferedInputStream r = new BufferedInputStream(httpURLConnection.getInputStream())) {
            final byte[] tmp = new byte[1024 * 32]; // 32 KB read buffer, as in Jsoup
            while (r.read(tmp) != -1) {
                // discard the data; only the transfer is measured
            }
        }
    }

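    private void jsoup(final String url) throws Exception {
        // Unlike the other two benchmarks, this also parses the HTML into a Document
        Jsoup.connect(url).execute().parse();
    }

    @TearDown
    public void tearDown() throws IOException {
        httpClient.close(); // release the connections held by the shared client
    }

}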