Background

I recently needed to build something like a cloud photo album, so I figured it was a good chance to organize my notes on working with S3 from Python with Boto3. In the past I mostly just copied related code off the internet and used it as-is; now that I have a bit of free time, I'm tidying it all up.

Installation & Setup

$ mkdir s3
$ cd s3
$ virtualenv .venv
$ source .venv/bin/activate
$ pip install boto3

Next, we create a new AWS user for S3:

(Screenshots: creating the new IAM user in the AWS console.)

Next, set up AWS credentials on your local machine, filling in the values for the user we just created.

$ mkdir -p ~/.aws
$ touch ~/.aws/credentials
$ touch ~/.aws/config

~/.aws/credentials

[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY

~/.aws/config

[default]
region = YOUR_PREFERRED_REGION
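
Before going further, it's worth a quick sanity check that boto3 actually picks up these credentials. A minimal sketch (it assumes the new user is allowed to call STS and list buckets):

import boto3

# prints the ARN of the IAM user the credentials belong to
print(boto3.client('sts').get_caller_identity()['Arn'])

# lists the buckets this user can see
for b in boto3.client('s3').list_buckets()['Buckets']:
    print(b['Name'])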

Client Versus Resource

Boto3 provides two different ways for users to access these APIs:

  • Client: low-level service access
  • Resource: higher-level object-oriented service access

import boto3

s3_client = boto3.client('s3')
# or
s3_resource = boto3.resource('s3')

Because the client interface is lower-level, you can get at more information, but in exchange you usually have to write more code. I basically always use the client interface to talk to S3.
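
To get a feel for the difference, here is the same bucket listing written both ways (a minimal sketch; the client returns plain dictionaries, the resource returns objects):

import boto3

# client: raw response dictionaries, closer to the underlying API
s3_client = boto3.client('s3')
for b in s3_client.list_buckets()['Buckets']:
    print(b['Name'])

# resource: iterate over Bucket objects with attributes
s3_resource = boto3.resource('s3')
for bucket in s3_resource.buckets.all():
    print(bucket.name)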

Advanced Configurations

ACL (Access Control Lists)

Access control lists (ACLs) help you manage who can access your objects. They are considered the legacy way of managing permissions in S3. Why should you know about them? Because if you have to manage access to individual objects, ACLs are what you use.

By default, an object uploaded to S3 is private. If you want to make it available to others, you can set the object's ACL to public at creation time. You can of course also make the whole bucket public:

# create_temp_file(size, name, content) and first_bucket are assumed to have been
# set up earlier (a helper that writes a small local file, and a Bucket resource)
second_file_name = create_temp_file(400, 'secondfile.txt', 's')
second_object = s3_resource.Object(first_bucket.name, second_file_name)
second_object.upload_file(second_file_name, ExtraArgs={
                          'ACL': 'public-read'})
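
Since I mostly use the client interface, here is a sketch of flipping an existing object between public and private with it (reusing the bucket and key names from above):

s3_client = boto3.client('s3')

# make the object publicly readable...
s3_client.put_object_acl(ACL='public-read',
                         Bucket=first_bucket.name, Key=second_file_name)

# ...and lock it down again
s3_client.put_object_acl(ACL='private',
                         Bucket=first_bucket.name, Key=second_file_name)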

Encryption

In S3 you can also protect your data with encryption. Here S3 encrypts server-side using the AES-256 algorithm:

third_file_name = create_temp_file(300, 'thirdfile.txt', 't')
third_object = s3_resource.Object(first_bucket_name, third_file_name)
third_object.upload_file(third_file_name, ExtraArgs={
                         'ServerSideEncryption': 'AES256'})
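
You can confirm the encryption took effect by reading the object's metadata back; a minimal check, reusing the names from above:

obj = s3_resource.Object(first_bucket_name, third_file_name)
print(obj.server_side_encryption)  # should print 'AES256'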

Storage

Every S3 object has an associated storage class. All of the available storage classes offer high durability; you choose how to store your objects based on your application's performance and access requirements.

Currently, the following storage classes are available in S3:

  • STANDARD: the default, for frequently accessed data
  • STANDARD_IA: for infrequently accessed data that still needs to be retrieved quickly when requested
  • ONEZONE_IA: like STANDARD_IA, but stores the data in one Availability Zone instead of three
  • REDUCED_REDUNDANCY: for frequently used noncritical data that is easily reproducible

# for example, combining encryption with the STANDARD_IA storage class
third_object.upload_file(third_file_name, ExtraArgs={
                         'ServerSideEncryption': 'AES256',
                         'StorageClass': 'STANDARD_IA'})
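
To change the storage class of an object that already exists, you copy the object over itself with the new class and then refresh the local metadata; a sketch, again reusing third_object from above:

third_object.copy_from(CopySource={'Bucket': first_bucket_name,
                                   'Key': third_file_name},
                       StorageClass='ONEZONE_IA')
third_object.reload()              # refresh the cached metadata
print(third_object.storage_class)  # should print 'ONEZONE_IA'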

Versioning

You should use versioning to keep a complete record of your objects over time. It also acts as a protection mechanism against accidental deletion. When you request a versioned object, Boto3 retrieves the latest version.

When you add a new version of an object, the total storage the object occupies is the sum of the sizes of its versions. So if you store a 1 GB object and create 10 versions, you pay for 10 GB of storage.
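
Versioning is switched on per bucket. A minimal sketch with the client interface (the bucket name is the example one used later in this post):

import boto3

s3_client = boto3.client('s3')

# enable versioning for the whole bucket; every overwrite of a key
# now creates a new version instead of replacing the object
s3_client.put_bucket_versioning(
    Bucket='taiker-s3-example',
    VersioningConfiguration={'Status': 'Enabled'})

print(s3_client.get_bucket_versioning(Bucket='taiker-s3-example')['Status'])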

Uploading a File

Here I'll document the uploading part in more detail, since it's what I use most often. Below is how I usually write it:

upload_file.py

import time
import boto3

def upload_to_aws(file, bucket):
    s3 = boto3.client('s3')

    # millisecond timestamp as the object key, under the public/ prefix
    # (the prefix must be part of the key, or the URL below would be wrong)
    filename = f'public/{int(time.time() * 1000)}.jpg'

    try:
        s3.upload_fileobj(file, bucket, filename)
        obj_url = "https://{0}.s3-{1}.amazonaws.com/{2}".format(
            bucket, "ap-northeast-1", filename
        )
        return 1, "upload successful", obj_url
    except FileNotFoundError:
        return -1, "The file was not found", None
    except Exception as e:
        return -1, "something went wrong: " + str(e), None

s = time.perf_counter()

with open("/Users/taiker/Desktop/test.jpg", "rb") as f:
    err, err_msg, obj_url = upload_to_aws(f, "taiker-s3-example")
    print(err, err_msg, obj_url)

elapsed = time.perf_counter() - s
print(f"{__file__} executed in {elapsed:0.2f} seconds.")

$ python3 upload_file.py
1 upload successful https://taiker-s3-example.s3-ap-northeast-1.amazonaws.com/public/1570525357741.jpg
upload_file.py executed in 6.85 seconds.

Async Uploading a File

I'd read a pile of articles about async a while back and wanted to find time to play with it myself, which is how this post came about. Below is the program above rewritten as an async version.

async_upload_file.py

import time
import boto3
import asyncio

async def upload_to_aws(file, bucket):

    s3 = boto3.client('s3')
    filename = f'public/{int(time.time() * 1000)}.jpg'
    await s3.upload_fileobj(file, bucket, filename)
    obj_url = "https://{0}.s3-{1}.amazonaws.com/{2}".format(
        bucket, "ap-northeast-1", filename
    )
    print(1, "Upload Successful", obj_url)


async def main():
    events = list()
    for i in range(1):
        with open(f"/Users/taiker/Desktop/test{i}.jpg", "rb") as f:
            await asyncio.gather(upload_to_aws(f, "taiker-s3-example"))

if __name__ == "__main__":

    s = time.perf_counter()

    asyncio.run(main())

    elapsed = time.perf_counter() - s
    print(f"{__file__} executed in {elapsed:0.2f} seconds.")

This is where I got stuck. Running it gives:

$ python3 async_upload_file.py
Traceback (most recent call last):
  File "async_upload_file.py", line 26, in <module>
    asyncio.run(main())
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/base_events.py", line 584, in run_until_complete
    return future.result()
  File "async_upload_file.py", line 20, in main
    await asyncio.gather(upload_to_aws(f, "taiker-s3-example"))
  File "async_upload_file.py", line 9, in upload_to_aws
    await s3.upload_fileobj(file, bucket, filename)
TypeError: object NoneType can't be used in 'await' expression

So I looked up what TypeError: object NoneType can't be used in 'await' expression means. It turns out the object returned by s3.upload_fileobj() is not awaitable: it's an ordinary blocking function that returns None.

An awaitable object must satisfy one of the following conditions:

  • A native coroutine object returned from a native coroutine function.

  • A generator-based coroutine object returned from a function decorated with types.coroutine().

  • An object with an __await__ method returning an iterator.
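
To make these conditions concrete: awaiting a native coroutine works, while awaiting the None returned by a plain function reproduces exactly the error above. A minimal demo:

import asyncio

async def coro():          # native coroutine function
    return 42

def plain():               # ordinary blocking function, implicitly returns None
    pass

async def main():
    print(await coro())    # fine: coro() returns a coroutine object
    await plain()          # TypeError: object NoneType can't be used in 'await' expression

asyncio.run(main())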

Well, that looked like a dead end, but a bit of googling turned up aioboto3, an asyncio wrapper around boto3 billed as an async AWS SDK for Python, so I immediately gave it a try.

Async AWS SDK for Python

Installation

$ pip install aioboto3

Async Uploading a File

So let's finish the part we failed at above, using aioboto3 to do the async upload, and compare it against the version that doesn't use async:

upload_file.py

import time
import boto3

def upload_to_aws(file, bucket):
    s3 = boto3.client('s3')

    filename = f'public/{int(time.time() * 1000)}.jpg'

    try:
        s3.upload_fileobj(file, bucket, filename)
        obj_url = "https://{0}.s3-{1}.amazonaws.com/{2}".format(
            bucket, "ap-northeast-1", filename
        )
        return 1, "upload successful", obj_url
    except FileNotFoundError:
        return -1, "The file was not found", None
    except Exception as e:
        return -1, "something wrong" + str(e), None

def main():
    for i in range(5):
        with open(f"/Users/taiker/Desktop/test{i}.jpg", "rb") as f:
            err, err_msg, obj_url = upload_to_aws(f, "taiker-s3-example")
            print(err, err_msg, obj_url)

if __name__ == "__main__":

    s = time.perf_counter()

    main()

    elapsed = time.perf_counter() - s
    print(f"{__file__} executed in {elapsed:0.2f} seconds.")

$ python3 upload_file.py
1 upload successful https://taiker-s3-example.s3-ap-northeast-1.amazonaws.com/public/1570533384638.jpg
1 upload successful https://taiker-s3-example.s3-ap-northeast-1.amazonaws.com/public/1570533392180.jpg
1 upload successful https://taiker-s3-example.s3-ap-northeast-1.amazonaws.com/public/1570533399733.jpg
1 upload successful https://taiker-s3-example.s3-ap-northeast-1.amazonaws.com/public/1570533406045.jpg
1 upload successful https://taiker-s3-example.s3-ap-northeast-1.amazonaws.com/public/1570533412706.jpg
upload_file.py executed in 34.97 seconds.

async_upload_file.py

from pprint import pprint
import time
import boto3
import asyncio
import aioboto3

async def upload_to_aws(s3, file_path, bucket):
    try:
        with open(file_path, "rb") as f:

            filename = f'public/{int(time.time() * 1000)}.jpg'
            await s3.upload_fileobj(f, bucket, filename)

            obj_url = "https://{0}.s3-{1}.amazonaws.com/{2}".format(
                bucket, "ap-northeast-1", filename
            )

            return 1, "upload successful", obj_url
    except Exception as e:
        return -1, "unable upload to s3" + str(e), None

async def main():
    events = list()
    # note: newer aioboto3 releases use an explicit session instead:
    # async with aioboto3.Session().client("s3") as s3:
    async with aioboto3.client("s3") as s3:
        for i in range(5):
            events.append(upload_to_aws(s3, f"/Users/taiker/Desktop/test{i}.jpg", "taiker-s3-example"))
        res = await asyncio.gather(*events)

    pprint(res)

if __name__ == "__main__":

    s = time.perf_counter()

    asyncio.run(main())

    elapsed = time.perf_counter() - s
    print(f"{__file__} executed in {elapsed:0.2f} seconds.")

$ python3 async_upload_file.py
[(1,
  'upload successful',
  'https://taiker-s3-example.s3-ap-northeast-1.amazonaws.com/public/1570533461856.jpg'),
 (1,
  'upload successful',
  'https://taiker-s3-example.s3-ap-northeast-1.amazonaws.com/public/1570533461870.jpg'),
 (1,
  'upload successful',
  'https://taiker-s3-example.s3-ap-northeast-1.amazonaws.com/public/1570533461872.jpg'),
 (1,
  'upload successful',
  'https://taiker-s3-example.s3-ap-northeast-1.amazonaws.com/public//1570533461874.jpg'),
 (1,
  'upload successful',
  'https://taiker-s3-example.s3-ap-northeast-1.amazonaws.com/public/1570533461876.jpg')]
async_upload_file.py executed in 8.28 seconds.

It ran without any real problems and performed about as expected: roughly 8.3 seconds for five concurrent uploads versus about 35 seconds serially. aioboto3 is well worth reaching for when you need asynchronous AWS operations.

Presigned URLs

Sometimes we want certain private objects to be temporarily accessible to people we've granted access. That's where presigned URLs come in. The code looks like this:

import logging
import boto3
from botocore.exceptions import ClientError
from pprint import pprint

def create_presigned_url(bucket_name, object_name, expiration=3600):
    """Generate a presigned URL to share an S3 object

    :param bucket_name: string
    :param object_name: string
    :param expiration: Time in seconds for the presigned URL to remain valid
    :return: Presigned URL as string. If error, returns None.
    """

    # Generate a presigned URL for the S3 object
    s3_client = boto3.client('s3')
    try:
        # optional: add 'ResponseContentType': "image/jpeg" to Params to force the content type
        response = s3_client.generate_presigned_url('get_object',
                                                    Params={'Bucket': bucket_name,
                                                            'Key': object_name},
                                                    ExpiresIn=expiration)
    except ClientError as e:
        logging.error(e)
        return None

    # The response contains the presigned URL
    return response

res = create_presigned_url("taiker-s3-example", "public/1570533230038.jpg")
pprint(res)
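
Anyone holding the URL can then fetch the object without AWS credentials until it expires; a quick check with the standard library (the key above is just one of the earlier example uploads):

import urllib.request

if res is not None:
    with urllib.request.urlopen(res) as resp:  # plain HTTP GET, no signing needed
        body = resp.read()
    print(f"downloaded {len(body)} bytes")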
