
Python Multithreading vs. Coroutines

Updated: 6/28/2019

To improve pocsuite3's concurrency performance I considered switching to coroutines, so I ran the tests below.

Test 1

Local test

Test code

python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# @Time    : 2019/6/21 10:40 AM
# @Author  : w8ay
# @File    : xiechenga.py

# Coroutines vs. threads
import asyncio
import queue
import threading
import time
import aiohttp
import requests


class threadpool:

    def __init__(self, threadnum):
        self.thread_count = self.thread_nums = threadnum
        self.queue = queue.Queue()
        self.isContinue = True
        self.thread_count_lock = threading.Lock()

    def push(self, payload):
        self.queue.put(payload)

    def changeThreadCount(self, num):
        self.thread_count_lock.acquire()
        self.thread_count += num
        self.thread_count_lock.release()

    def stop(self):
        self.isContinue = False

    def run(self):
        th = []
        for i in range(self.thread_nums):
            t = threading.Thread(target=self.scan)
            t.daemon = True
            t.start()
            th.append(t)

        # It can quit with Ctrl-C
        try:
            while 1:
                if self.thread_count > 0 and self.isContinue:
                    time.sleep(0.01)
                else:
                    break
        except KeyboardInterrupt:
            exit("User Quit")

    def scan(self):
        while self.isContinue:
            try:
                # non-blocking get avoids the qsize()/get() race between workers
                p = self.queue.get_nowait()
            except queue.Empty:
                break
            try:
                resp = requests.get(p)
                a = resp.text
            except requests.RequestException:
                pass

        self.changeThreadCount(-1)


async def request(url):
    try:
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as resp:
                content = await resp.text()
                a = content
    except Exception:
        pass


def aiorequests(url_list):
    loop = asyncio.get_event_loop()
    tasks = []
    for url in url_list:
        task = loop.create_task(request(url))
        tasks.append(task)
    loop.run_until_complete(asyncio.wait(tasks))


if __name__ == '__main__':
    url_list = ["http://www.baidu.com", "http://www.hacking8.com", "http://www.seebug.org", "https://0x43434343.com/",
                "https://lorexxar.cn/", "https://0x48.pw/", "https://github.com"]
    print("测试url:{}".format(repr(url_list)))
    url_list = url_list * 100
    print("数据总量:{}".format(len(url_list)))

    start_time = time.time()
    aiorequests(url_list)
    print("协程 cost time", time.time() - start_time)

    start_time = time.time()
    http_pool = threadpool(50)
    for i in url_list:
        http_pool.push(i)
    http_pool.run()
    print("多线程 cost time", time.time() - start_time)

System and environment

Mac Mojave 10.14.5, Python 3.7.2

requests          2.21.0
aiohttp           3.5.4

Other: 50 threads

Bandwidth

(screenshot: local bandwidth test)

Total requests: 210

(screenshot: results for 210 requests)

The two finished in roughly the same time; with 50 threads and only 210 URLs, much of the elapsed time likely went into starting the threads.

Total requests: 700

(screenshot: results for 700 requests)

Test URLs: ['http://www.baidu.com', 'http://www.hacking8.com', 'http://www.seebug.org', 'https://0x43434343.com/', 'https://lorexxar.cn/', 'https://0x48.pw/', 'https://github.com']
Total requests: 700
Coroutine cost time 94.01872515678406
Thread cost time 44.3960919380188

Total requests: 3500

(screenshot: results for 3500 requests)

bash
Test URLs: ['http://www.baidu.com', 'http://www.hacking8.com', 'http://www.seebug.org', 'https://0x43434343.com/', 'https://lorexxar.cn/', 'https://0x48.pw/', 'https://github.com']
Total requests: 3500
Fatal read error on socket transport
protocol: <asyncio.sslproto.SSLProtocol object at 0x105ad5ac8>
transport: <_SelectorSocketTransport fd=604 read=polling write=<idle, bufsize=0>>
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/selector_events.py", line 801, in _read_ready__data_received
    data = self._sock.recv(self.max_size)
TimeoutError: [Errno 60] Operation timed out
(the same "Fatal read error on socket transport" / TimeoutError traceback repeated six more times)
Coroutine cost time 301.6471061706543
Thread cost time 96.64304375648499

Here multithreading clearly beats the coroutines.

Modified code, 700 requests

The coroutine runs hit intermittent errors, mostly around SSL reads and writes, so I tried capping coroutine concurrency by adding an asyncio semaphore.
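The semaphore pattern itself can be seen in isolation with plain asyncio and no network. The counters here (illustrative names) record how many coroutines are inside the guarded section at once:

```python
import asyncio

async def worker(sem, state):
    async with sem:                # at most `limit` workers pass this point
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0)     # yield, letting other tasks run
        state["active"] -= 1

async def main(tasks=50, limit=5):
    sem = asyncio.Semaphore(limit)
    state = {"active": 0, "peak": 0}
    await asyncio.gather(*(worker(sem, state) for _ in range(tasks)))
    return state["peak"]

print(asyncio.run(main()))  # the peak never exceeds the limit of 5
```

The same guard wraps the HTTP call in the modified script below.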

python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# @Time    : 2019/6/21 10:40 AM
# @Author  : w8ay
# @File    : xiechenga.py

# Coroutines vs. threads
import asyncio
import queue
import threading
import time
import aiohttp
import requests


class threadpool:

    def __init__(self, threadnum):
        self.thread_count = self.thread_nums = threadnum
        self.queue = queue.Queue()
        self.isContinue = True
        self.thread_count_lock = threading.Lock()

    def push(self, payload):
        self.queue.put(payload)

    def changeThreadCount(self, num):
        self.thread_count_lock.acquire()
        self.thread_count += num
        self.thread_count_lock.release()

    def stop(self):
        self.isContinue = False

    def run(self):
        th = []
        for i in range(self.thread_nums):
            t = threading.Thread(target=self.scan)
            t.daemon = True
            t.start()
            th.append(t)

        # It can quit with Ctrl-C
        try:
            while 1:
                if self.thread_count > 0 and self.isContinue:
                    time.sleep(0.01)
                else:
                    break
        except KeyboardInterrupt:
            exit("User Quit")

    def scan(self):
        while self.isContinue:
            try:
                # non-blocking get avoids the qsize()/get() race between workers
                p = self.queue.get_nowait()
            except queue.Empty:
                break
            try:
                resp = requests.get(p)
                a = resp.text
            except requests.RequestException:
                pass

        self.changeThreadCount(-1)


async def request(url, semaphore):
    async with semaphore:
        try:
            async with aiohttp.ClientSession() as session:
                async with session.get(url) as resp:
                    content = await resp.text()
                    a = content
        except Exception:
            pass


def aiorequests(url_list):
    loop = asyncio.get_event_loop()
    semaphore = asyncio.Semaphore(200)
    tasks = []
    for url in url_list:
        task = loop.create_task(request(url, semaphore))
        tasks.append(task)
    loop.run_until_complete(asyncio.wait(tasks))


if __name__ == '__main__':
    url_list = ["http://www.baidu.com", "http://www.hacking8.com", "http://www.seebug.org", "https://0x43434343.com/",
                "https://lorexxar.cn/", "https://0x48.pw/", "https://github.com"]
    print("测试url:{}".format(repr(url_list)))
    url_list = url_list * 100
    print("数据总量:{}".format(len(url_list)))

    start_time = time.time()
    aiorequests(url_list)
    print("协程 cost time", time.time() - start_time)

    start_time = time.time()
    http_pool = threadpool(50)
    for i in url_list:
        http_pool.push(i)
    http_pool.run()
    print("多线程 cost time", time.time() - start_time)

The output:

bash
Test URLs: ['http://www.baidu.com', 'http://www.hacking8.com', 'http://www.seebug.org', 'https://0x43434343.com/', 'https://lorexxar.cn/', 'https://0x48.pw/', 'https://github.com']
Total requests: 700
Fatal read error on socket transport
protocol: <asyncio.sslproto.SSLProtocol object at 0x1093470f0>
transport: <_SelectorSocketTransport fd=129 read=polling write=<idle, bufsize=0>>
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/selector_events.py", line 801, in _read_ready__data_received
    data = self._sock.recv(self.max_size)
TimeoutError: [Errno 60] Operation timed out
Fatal read error on socket transport
protocol: <asyncio.sslproto.SSLProtocol object at 0x109181a90>
transport: <_SelectorSocketTransport fd=84 read=polling write=<idle, bufsize=0>>
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/selector_events.py", line 801, in _read_ready__data_received
    data = self._sock.recv(self.max_size)
TimeoutError: [Errno 60] Operation timed out
Coroutine cost time 125.1102237701416
Thread cost time 37.17827892303467

(screenshot: results for 700 requests with the semaphore)

Total requests: 7000

Test URLs: ['http://www.baidu.com', 'http://www.hacking8.com', 'http://www.seebug.org', 'https://0x43434343.com/', 'https://lorexxar.cn/', 'https://0x48.pw/', 'https://github.com']
Total requests: 7000
Coroutine cost time 820.6436007022858
Thread cost time 197.8958179950714

(screenshot: results for 7000 requests)

Remote test

Provider: Vultr

Since this server is overseas, the test URLs were changed to baidu, github, and google, and the thread count was lowered to 20.

Test code

Same as the modified code above.

Total requests: 30

root@vultr:~# python3 test.py
Test URLs: ['http://www.baidu.com', 'http://github.com', 'http://google.com']
Total requests: 30
Coroutine cost time 6.938291788101196
Thread cost time 0.7714250087738037
root@vultr:~#

(screenshot: results for 30 requests)

Oddly, after removing baidu:

(screenshot: results without baidu)

root@vultr:~# python3 test.py
Test URLs: ['http://github.com', 'http://google.com']
Total requests: 20
Coroutine cost time 0.5997216701507568
Thread cost time 0.6937429904937744

the two are nearly identical.

Total requests: 600

With baidu excluded from the test URLs:

root@vultr:~# python3 test.py
Test URLs: ['http://github.com', 'http://google.com']
Total requests: 600
Coroutine cost time 3.049583673477173
Thread cost time 12.464868545532227

(screenshot: results for 600 requests)

Now the coroutines are clearly more efficient.

Total requests: 18000

Still without baidu, I raised the total to 18000.

(screenshot: run in progress)

bash
root@vultr:~# cat result1
Test URLs: ['http://github.com', 'http://google.com']
Total requests: 18000
Coroutine cost time 91.383061170578
Thread cost time 322.98390197753906
root@vultr:~#

Again the coroutines look noticeably faster.

After adding baidu.com back, the program ran for nearly 50 minutes and still had not finished...

(screenshot: run still in progress)

When it finally finished:

(screenshot: final results)

this time multithreading was a full 10x slower.

Go test

For a fuller comparison of coroutine-style concurrency, I also wrote a Go demo, capping concurrency at 200. The code:

go
package main

import (
    "crypto/tls"
    "fmt"
    "io/ioutil"
    "net/http"
    "sync"
    "time"
)

func Get(url string, ch chan int) (content []byte, err error) {
    defer func() { <-ch }() // release the concurrency slot even when the request fails
    transCfg := &http.Transport{
        TLSClientConfig: &tls.Config{InsecureSkipVerify: true}, // disable verify
    }

    client := &http.Client{
        Timeout:   100 * time.Second,
        Transport: transCfg,
    }
    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        return nil, err
    }
    resp, err := client.Do(req)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    bytes, _ := ioutil.ReadAll(resp.Body)

    return bytes, nil
}

func main() {
    var s1 = [...]string{"http://www.baidu.com", "http://www.hacking8.com", "http://www.seebug.org", "https://0x43434343.com/", "https://0x48.pw/", "https://lorexxar.cn/", "https://github.com"}
    number := 100 // multiplier
    fmt.Printf("Total requests %d\n", number*len(s1))
    wg := sync.WaitGroup{}
    t1 := time.Now() // get current time
    ch := make(chan int, 200)

    for i := 0; i < number; i++ {
        for _, v := range s1 {
            wg.Add(1)
            ch <- 1 // take a concurrency slot; Get drains it
            go func(url string, c chan int) {
                defer wg.Done()
                _, err := Get(url, c)
                if err != nil {
                    fmt.Println(err)
                }
            }(v, ch)
        }
    }
    wg.Wait()
    elapsed := time.Since(t1)
    fmt.Println("elapsed: ", elapsed)

}
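The buffered channel ch above works as a counting semaphore. The equivalent cap on the Python thread side would be threading.Semaphore; this sketch uses illustrative counters in place of real requests:

```python
import threading

limit = threading.Semaphore(3)  # analogue of make(chan int, 3)
lock = threading.Lock()
active = 0
peak = 0

def job():
    global active, peak
    with limit:  # like `ch <- 1` on entry and `<-ch` on exit
        with lock:
            active += 1
            peak = max(peak, active)
        # a real worker would issue its HTTP request here
        with lock:
            active -= 1

threads = [threading.Thread(target=job) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(peak <= 3)  # → True: never more than 3 jobs ran at once
```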

Total requests: 700

(screenshot: Go results for 700 requests)

Compared with the Python coroutines, it actually seems a bit slower...

Total requests: 7000

This run never finished... in any case, very slow.

Test 2

After discussing the results with colleagues that evening, and noticing another oddity, namely that coroutines are fast against some sites and slow against others, I put together a more systematic test plan:

  • Split the source into separate coroutine and thread test scripts
  • Run everything on a VPS, since the office network fluctuates a lot
  • Treat the coroutine semaphore and the thread count as controlled variables
  • Rewrite the Go demo as a producer/consumer model
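For reference, the producer/consumer shape planned for the Go rewrite looks like this with Python threads. This is only a sketch: run and handler are illustrative names, and a trivial function stands in for the real request logic.

```python
import queue
import threading

def run(jobs, handler, workers=4):
    q = queue.Queue()
    results, lock = [], threading.Lock()

    def consumer():
        while True:
            item = q.get()
            if item is None:  # sentinel: no more work
                return
            r = handler(item)
            with lock:
                results.append(r)

    threads = [threading.Thread(target=consumer) for _ in range(workers)]
    for t in threads:
        t.start()
    for job in jobs:      # producer side
        q.put(job)
    for _ in threads:     # one sentinel per consumer
        q.put(None)
    for t in threads:
        t.join()
    return results

print(sorted(run(range(5), lambda x: x * x)))  # → [0, 1, 4, 9, 16]
```

One sentinel per consumer lets every worker exit cleanly without a shared stop flag.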

Single-URL test

Local test

Using the same code as in Test 1, but against a single URL, github.com. Coroutines first, with concurrency capped at 200.

(screenshot: coroutine run, 1000 URLs)

1000 URLs with coroutines: 21s

Then the same with multithreading, 200 threads:

(screenshot: thread run, 1000 URLs)

1000 URLs with threads: 16s

Raising the total to 5000 URLs, coroutine concurrency still 200:

(screenshot: coroutine run, 5000 URLs)

5000 URLs with coroutines: 133s

(screenshot: thread run, 5000 URLs)

5000 URLs with threads: 66s

Remote test

To rule out interference from office-network jitter, I repeated the tests on a Tencent Cloud CentOS server (1 core, 1 GB RAM, 1 Mbps).

(screenshot: 1000-URL run)

The same 1000-URL test here took only 18s.

(screenshot: coroutine run)

Testing the same 1000 URLs with coroutines, however, took a surprising 38s. Suspecting the concurrency cap was the bottleneck, I raised the semaphore to 8000, but the run got even slower.

(screenshot: coroutine run with semaphore 8000)

Go test

go
package main

import (
    "crypto/tls"
    "fmt"
    "io/ioutil"
    "net/http"
    "sync"
    "time"
)

func Get2(url string) (content []byte, err error) {
    transCfg := &http.Transport{
        TLSClientConfig: &tls.Config{InsecureSkipVerify: true}, // disable verify
    }

    Client := &http.Client{
        Timeout:   100 * time.Second,
        Transport: transCfg,
    }
    req, err := http.NewRequest("GET", url, nil)

    if err != nil {
        return nil, err
    }
    resp, err := Client.Do(req)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    bytes, _ := ioutil.ReadAll(resp.Body)

    return bytes, nil
}

func main() {
    var s1 = [...]string{"https://github.com"}
    number := 1000 // multiplier
    fmt.Printf("Total requests %d\n", number*len(s1))
    wg := sync.WaitGroup{}
    t1 := time.Now() // get current time
    consumer := make(chan string, 20)

    // consumers
    for i := 0; i < 1000; i++ {
        go func() {
            for {
                url := <-consumer
                _, err := Get2(url)
                if err != nil {
                    fmt.Println(err)
                }
                wg.Done()
            }
        }()
    }

    // producers
    for i := 0; i < number; i++ {
        for _, v := range s1 {
            wg.Add(1)
            consumer <- v
        }
    }
    wg.Wait()
    elapsed := time.Since(t1)
    fmt.Println("elapsed: ", elapsed)

}

I don't understand why the same site performs so much worse here.

(screenshot: Go run results)

Multi-URL test

This test ran on the Vultr Debian server, with just two sites: url_list = ["http://github.com", "http://google.com"]

Coroutines and threads were tested in separate runs; results:

(screenshot: coroutine and thread runs)

Test URLs: ['http://github.com', 'http://google.com']
Total requests: 12000
Coroutine cost time 63.13658905029297

Test URLs: ['http://github.com', 'http://google.com']
Total requests: 12000
Thread cost time 143.3647768497467

The coroutines do somewhat better than the threads here.

Now adding a couple more sites:

python3 test.py
Test URLs: ['http://github.com', 'http://google.com', 'https://0x43434343.com/', 'https://lorexxar.cn/']
Total requests: 4000
Coroutine cost time 20.34762692451477

python3 test.py
Test URLs: ['http://github.com', 'http://google.com', 'https://0x43434343.com/', 'https://lorexxar.cn/']
Total requests: 4000
Thread cost time 58.68550777435303

Summary

At first I could not explain why the gap between coroutines and threads varies so much from site to site; the following is only a rough observation, not an authoritative conclusion.

If a site responds quickly, coroutines outperform threads; if a site responds slowly, threads do better.

The underlying mechanism of HTTP coroutines is I/O multiplexing; see https://www.zhihu.com/question/20511233
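I/O multiplexing means one thread waiting on many sockets at once. A minimal stdlib illustration with selectors and socket pairs, no real network involved:

```python
import selectors
import socket

sel = selectors.DefaultSelector()
pairs = [socket.socketpair() for _ in range(3)]
for i, (_w, r) in enumerate(pairs):
    r.setblocking(False)
    sel.register(r, selectors.EVENT_READ, data=i)

for w, _r in pairs:
    w.send(b"x")  # make every read end ready at once

# one select() call reports readiness for all three sockets
ready = sorted(key.data for key, _ in sel.select(timeout=1))
print(ready)  # → [0, 1, 2]

sel.close()
for w, r in pairs:
    w.close()
    r.close()
```

This is the same primitive asyncio's default event loop is built on.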

When sites respond quickly, coroutines add almost no overhead, since everything runs on a single thread, while multithreading carries extra resource cost. But when a site is slow, a coroutine can still get stuck during connection setup, and because the whole event loop runs on one thread, those stalls drag out the entire run; threads do not share that problem.
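A related detail: neither script above sets a per-request timeout (requests.get is called bare, and the aiohttp calls rely on the library default), so a single slow site can dominate the totals. On the asyncio side a stalled await can be bounded with asyncio.wait_for; a sketch with a simulated hanging request (slow_request and fetch_with_deadline are illustrative stand-ins):

```python
import asyncio

async def slow_request():
    await asyncio.sleep(10)  # simulate a site that hangs
    return "body"

async def fetch_with_deadline():
    try:
        # abandon the request once it exceeds the deadline
        return await asyncio.wait_for(slow_request(), timeout=0.1)
    except asyncio.TimeoutError:
        return None

print(asyncio.run(fetch_with_deadline()))  # → None
```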

In the end, weighing all of this, introducing coroutines into pocsuite3 did not feel mature enough yet, so the idea is shelved for now.
