Background

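While looking things up recently I came across this article and liked it, so I am writing this blog post to record what it covers.

The article mainly sets out to explain how Python's gevent is implemented. Before getting to that, it helps to cover some Python basics: threads (Python's threading module, for multithreading) and processes (Python's multiprocessing module).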

What’s the difference between concurrency and parallelism?

The main difference between concurrency and parallelism can be explained with the two diagrams below.

[Figure: concurrency]

[Figure: parallelism]

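  • concurrency: when there are several tasks, each one is split into smaller pieces. When Task A has to wait for some reason, execution switches over to Task B.
  • parallelism: a given process works on a given task from start to finish, without switching away from it.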
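The task switching described for concurrency can be sketched on a single thread with plain generators. This is only an illustration: `task` and `run_concurrently` are hypothetical names made up for this example, and each `yield` stands in for a point where a real task would be waiting.

```python
from collections import deque


def task(name, steps):
    """A task split into small pieces; each yield is a point where it would wait."""
    for i in range(steps):
        yield f"{name}{i}"


def run_concurrently(tasks):
    """Run all tasks on one thread, switching tasks every time one yields."""
    queue = deque(tasks)
    log = []
    while queue:
        current = queue.popleft()
        try:
            log.append(next(current))  # run one small piece of this task
        except StopIteration:
            continue  # this task is finished, do not reschedule it
        queue.append(current)  # put it back so the other tasks get a turn
    return log


log = run_concurrently([task("A", 3), task("B", 2)])
print(log)
```

The pieces of Task A and Task B end up interleaved in `log`, even though only one piece is ever running at a time.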

What’s a coroutine?

Put simply, a coroutine in Python lets the program itself decide the order of execution within a single thread, which is an effective way to do asynchronous I/O.
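As a rough sketch of that idea, the example below uses the standard library's asyncio rather than gevent. The `fetch` coroutine and its delays are made up for illustration, and `asyncio.sleep` stands in for a real I/O wait.

```python
import asyncio


async def fetch(name, delay, log):
    """Hypothetical I/O-bound job; the sleep stands in for a network wait."""
    log.append(f"{name} start")
    await asyncio.sleep(delay)  # hand control back to the event loop while waiting
    log.append(f"{name} done")
    return name


async def main():
    log = []
    # Both coroutines run on the same thread; gather returns results in call order.
    results = await asyncio.gather(
        fetch("slow", 0.2, log),
        fetch("fast", 0.05, log),
    )
    return log, results


log, results = asyncio.run(main())
print(log)
print(results)
```

While "slow" is waiting, the program switches to "fast" at the `await`, so the shorter job finishes first without any extra thread.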

What is a thread?

A thread is also called a lightweight process.

A thread is a sequence of instructions within a process and it behaves like “a process within a process”.

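What this definition is getting at is that a thread is a lot like a process. The difference is that a thread has no full Process Control Block of its own: the operating system tracks it with a much smaller per-thread record, and it shares most resources with the other threads of its process.

Each thread has its own: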

  • thread ID
  • program counter
  • register set
  • stack

Threads in the same process share:

  • code section
  • data section
  • any operating system resources which are available to the task
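A minimal sketch of the two lists above, using Python's threading module. The names `shared`, `lock`, and `worker` are made up for this example: the list is process data that every thread sees, while `local_value` is private to each thread.

```python
import threading

shared = []  # process-level data, visible to every thread
lock = threading.Lock()  # guards the shared list against concurrent appends


def worker(n):
    local_value = n * n  # local variable, private to this thread's stack
    with lock:
        # get_ident() returns this thread's ID (IDs may be reused after a thread exits)
        shared.append((threading.get_ident(), local_value))


threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

values = sorted(value for _, value in shared)
print(values)
```

All four threads write into the same list without any explicit message passing, which is exactly what sharing the data section means.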

What is a process?

In computing, a process is an instance of a computer program that is being executed.

Put simply, a process is a program that is currently running.

Most modern operating systems prevent direct communication between independent processes, providing strictly mediated and controlled inter-process communication (IPC).

Two independent processes generally cannot talk to each other directly; communication has to go through the operating system.

A process generally holds the following resources:
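To make the IPC point concrete, here is a small sketch in which a parent process talks to a child process through pipes provided by the operating system. The child program in `child_code` is made up for this example; it just upper-cases whatever it reads from standard input.

```python
import subprocess
import sys

# Source of a tiny child program: read stdin, write it back upper-cased.
child_code = (
    "import sys\n"
    "message = sys.stdin.read()\n"
    "sys.stdout.write(message.upper())\n"
)

# The parent cannot touch the child's memory; data travels over OS pipes.
result = subprocess.run(
    [sys.executable, "-c", child_code],
    input="hello from parent",
    capture_output=True,
    text=True,
    check=True,
)
reply = result.stdout
print(reply)
```

Nothing is shared between the two interpreters here. The only way the message gets across is through the stdin and stdout pipes that the operating system sets up.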

  • an image of the executable machine code associated with the program
  • memory, which includes:
    • executable code
    • process-specific data (input and output)
    • call stack that keeps track of active subroutines and/or other events
    • heap which holds intermediate computation data during run time
  • operating system descriptors of resources that are allocated to the process such as file descriptors (Unix/Linux) and handles (Windows), data sources and sinks
  • security attributes (process owner and set of permissions, e.g. allowable operations)
  • processor state (context) such as registers and physical memory addressing

What’s the difference between threads and processes?

When we launch a Python shell or execute a Python script, the operating system creates a process in response, and the primary thread of that process begins executing.

  • single processor: multithreading only means that we can switch between tasks; at any given instant the processor is still doing one thing. (The context switches are just fast enough to give the illusion that things run at the same time.)
  • multiprocessor: threads can truly run in parallel. Only with multiple CPU cores can several tasks be processed at the same instant.

What does that mean in the context of a Python application?

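CPython's Global Interpreter Lock (GIL) prevents parallel threads of execution on multiple cores.

The Global Interpreter Lock (GIL) is an important constraint in CPython. Only one thread in a process can execute Python bytecode at a time, so the threads created by a single process do not actually run Python code on several cores at once, even if the operating system schedules them on different cores.

For example, a CPU-bound Python application will perform badly if we attempt to implement it with multiple threads. (The extra threads bring no benefit, and the context switches waste resources.)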
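The sketch below runs the same CPU-bound loop once sequentially and once in two threads, timing both. The function `count_down` and the workload size `N` are made up for this example. Actual timings depend on the machine and the interpreter build, so no numbers are claimed here; on a standard CPython build the threaded run is usually not faster.

```python
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop; the running thread holds the GIL."""
    while n > 0:
        n -= 1
    return n


N = 2_000_000  # arbitrary workload size chosen for illustration

# Run the work twice, one call after the other.
start = time.perf_counter()
sequential_results = [count_down(N), count_down(N)]
sequential_seconds = time.perf_counter() - start

# Run the same two calls in two threads.
threaded_results = [None, None]


def worker(slot):
    threaded_results[slot] = count_down(N)


start = time.perf_counter()
threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded_seconds = time.perf_counter() - start

print(f"sequential: {sequential_seconds:.3f}s, threaded: {threaded_seconds:.3f}s")
```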

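If CPython has the GIL, why do we still use it?

I am not entirely sure what the original article is trying to say in this part, but here are the key points anyway. If you want true threading capabilities, Jython and IronPython are good options. The trade-off is that these two implementations do not necessarily support every Python package.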

So we cannot execute in parallel with Python?

Basically, if you want your program to run in parallel in Python, use multiple processes rather than multiple threads.
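One way to do this, sketched here with the standard library's `concurrent.futures.ProcessPoolExecutor` (the `multiprocessing` module works along the same lines). The function `sum_of_squares` and the sample inputs are made up for this example.

```python
from concurrent.futures import ProcessPoolExecutor


def sum_of_squares(n):
    """CPU-bound work; each call is executed inside a worker process."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    inputs = [10, 100, 1000]  # arbitrary sample inputs
    # Every worker process has its own interpreter and therefore its own GIL.
    with ProcessPoolExecutor(max_workers=2) as pool:
        results = list(pool.map(sum_of_squares, inputs))
    print(results)
```

The `if __name__ == "__main__":` guard matters because some platforms start worker processes by importing the main module again; without the guard, each worker would try to create its own pool.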

Reference