Use Amazon DynamoDB and Amazon S3 with gzip compression to maximize player data storage
2022-07-27 07:34:00 | Amazon Cloud Developer

Preface
In some traditional game architectures, MySQL stores the player save data, and sharding across databases and tables spreads the storage and performance pressure of a single database and table so that more players can be supported. As the data volume grows, the varchar type can no longer hold a single game field, and switching to blob columns is the cheapest change under this architecture, so some games adopted blob columns from the very beginning to store player quests, inventory items, and similar data.
Blob columns hit a bug in MySQL 5.6/5.7 (MySQL Bugs: #96466) that can, with some probability, crash the database cluster and cause data loss. Even in MySQL 8.0, because of design limitations in the engine itself, high-frequency updates on a single table above 20 GB hold back database performance, and the problem becomes more and more obvious as the table grows.
When the game business grows explosively, sharding a traditional relational database requires application changes and a certain amount of downtime, and once those expansions are done, shrinking again as the game winds down requires application changes too. This undoubtedly creates a lot of extra work for the development and operations teams.
Amazon DynamoDB is a very good fit for this scenario. At any stage of the business it can expand with zero downtime and auto scaling, all of it completely transparent to the application layer. In day-to-day operations, capacity can also be scaled up and down with the business load to further reduce cost.
MySQL Bugs: #96466:
https://bugs.mysql.com/bug.php?id=96466
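As a small illustration of that elasticity (not part of the original walkthrough), the sketch below creates a table in on-demand capacity mode and later switches it to provisioned capacity with boto3; the table name and capacity figures are placeholders.

import boto3

dynamodb_client = boto3.client('dynamodb')

# Create a table in on-demand (PAY_PER_REQUEST) mode: no capacity planning,
# DynamoDB scales reads and writes automatically and transparently to the app.
dynamodb_client.create_table(
    TableName='players-demo',   # hypothetical table name
    KeySchema=[{'AttributeName': 'username', 'KeyType': 'HASH'}],
    AttributeDefinitions=[{'AttributeName': 'username', 'AttributeType': 'S'}],
    BillingMode='PAY_PER_REQUEST',
)
dynamodb_client.get_waiter('table_exists').wait(TableName='players-demo')

# Later, switch to provisioned capacity sized for the current load,
# again without any downtime for the application.
dynamodb_client.update_table(
    TableName='players-demo',
    BillingMode='PROVISIONED',
    ProvisionedThroughput={'ReadCapacityUnits': 50, 'WriteCapacityUnits': 50},
)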
Summary
This article focuses on game scenarios and on Amazon DynamoDB's item size limit (each item must be smaller than 400 KB): how to store as much data as possible within that limit, and how to extend storage when the limit is exceeded. It shows how to combine Amazon DynamoDB with Amazon S3 to hold the large data attributes in a player save, how to avoid reading a stale save from Amazon S3 while new data is still being written there, and how gzip compression reduces data size and I/O overhead to improve performance.
Amazon DynamoDB:
https://docs.aws.amazon.com/zh_cn/amazondynamodb/latest/developerguide/ServiceQuotas.html#limits-items
Architecture diagram

The code
Goals
1. gzip-compress all data before saving it, and decompress it with gzip after reading it.
2. Storage in Amazon S3 and in the Amazon DynamoDB binary field is adaptive: if the compressed player data is larger than the configured threshold it is written to Amazon S3, otherwise it is stored directly in a field of the current database item (the two resulting item shapes are sketched after this list).
3. When an item is read from Amazon DynamoDB, the extracted field is parsed; if the string starts with s3://, the data is fetched from Amazon S3.
4. A read-lock field on each item records whether the save is currently being written to Amazon S3, so that readers can be blocked. Before every write to Amazon S3 the item's read_lock is set to True, and after the Amazon S3 write succeeds it is set back to False. A reader that finds read_lock set to True waits for an interval and retries until the number of retries exceeds the configured limit; after that timeout it assumes the writer may never succeed for some reason and resets read_lock to False itself.
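A minimal illustration of the two item shapes described in goals 2 and 4, assuming the players table keyed on username used throughout this article (the byte values are made-up placeholders):

# Small save: the gzip-compressed payload fits inside the item itself.
item_inline = {
    'username': 'player42',
    'inventory': b'\x1f\x8b...',  # gzip bytes of the player's JSON save
}

# Large save: the payload lives in S3; the item keeps read_lock plus a
# gzip-compressed "s3://bucket/prefix/key" pointer in the same field.
item_s3_pointer = {
    'username': 'player42',
    'read_lock': False,
    'inventory': b'\x1f\x8b...',  # gzip bytes of 's3://linyesh-user-data/<prefix>/player42'
}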
Step 1: initialize the environment parameters
from time import sleep
import boto3
import gzip
import random
import json
import hashlib
import logging

# Threshold for writing to S3: data larger than this is written to S3,
# otherwise it is stored in the database. Default: 350 KB.
UPLOAD_TO_S3_THRESHOLD_BYTES = 358400
# Target S3 bucket for player data
USER_DATA_BUCKET = 'linyesh-user-data'
# Maximum number of retries when an S3 read lock is encountered;
# beyond this limit the lock is cleared automatically
S3_READ_LOCK_RETRY_TIMES = 10
# Retry interval for read requests while the S3 read lock is held
S3_READ_RETRY_INTERVAL = 0.2

dynamodb = boto3.resource('dynamodb')
s3 = boto3.client('s3')

logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)
Parameter description
UPLOAD_TO_S3_THRESHOLD_BYTES: the maximum data length, in bytes, stored in the field. Because Amazon DynamoDB limits each item to 400 KB, some space must be reserved for the other fields besides the largest field in the save, so that the whole item stays under 400 KB.
USER_DATA_BUCKET: the Amazon S3 bucket used to store the large player field data once it exceeds the threshold. It must be created ahead of time; see Creating a bucket (https://docs.aws.amazon.com/zh_cn/AmazonS3/latest/userguide/create-bucket-overview.html).
S3_READ_LOCK_RETRY_TIMES: the number of read retries while the player's save on Amazon S3 is being written. When the item is in the read-locked state, the reading process waits for an interval and tries again.
S3_READ_RETRY_INTERVAL: the interval, in seconds, between read retries while the lock is held.
Note: S3_READ_LOCK_RETRY_TIMES multiplied by S3_READ_RETRY_INTERVAL must, in theory, be longer than the maximum time a save upload to Amazon S3 can take (with the defaults above that window is only 10 × 0.2 s = 2 s), because a reader that gives up and clears the lock while the upload is still in flight will read the old object. In practice, tune these two parameters to the likely save size; otherwise there is a high probability of dirty reads of the save.
Step 2: create the Amazon DynamoDB table
def create_tables():
    """
    Create the table.
    :return:
    """
    response = dynamodb.create_table(
        TableName='players',
        KeySchema=[
            {
                'AttributeName': 'username',
                'KeyType': 'HASH'
            }
        ],
        AttributeDefinitions=[
            {
                'AttributeName': 'username',
                'AttributeType': 'S'
            }
        ],
        ProvisionedThroughput={
            'ReadCapacityUnits': 5,
            'WriteCapacityUnits': 5
        }
    )
    # Wait until the table exists.
    response.wait_until_exists()
    # Print out some data about the table.
    logger.debug(response.item_count)
Step 3: write the helper logic
Exponential backoff function
def run_with_backoff(function, retries=5, **function_parameters):
    base_backoff = 0.1  # base 100 ms backoff
    max_backoff = 10    # sleep for a maximum of 10 seconds
    tries = 0
    while True:
        try:
            # The collected keyword arguments are passed to the wrapped function
            # as a single dict (upload_content_to_s3 below expects exactly that).
            return function(function_parameters)
        except (ConnectionError, TimeoutError):
            if tries >= retries:
                raise
            backoff = min(max_backoff, base_backoff * (pow(2, tries) + random.random()))
            logger.debug(f"sleeping for {backoff:.2f}s")
            sleep(backoff)
            tries += 1
Amazon S3 path check

def is_s3_path(content):
    return content.startswith('s3://')

Fetch an object from Amazon S3

def get_s3_object(key):
    response = s3.get_object(Bucket=USER_DATA_BUCKET, Key=s3_key_generator(key))
    return response['Body']

Check whether the size exceeds the threshold

def check_threshold(current_size):
    return current_size > UPLOAD_TO_S3_THRESHOLD_BYTES

Amazon S3 key generator
This function spreads player saves across different prefixes in the S3 bucket (the prefix is derived from an MD5 hash of the key), which helps improve Amazon S3 I/O performance.

def s3_key_generator(key):
    s3_prefix = hashlib.md5(key.encode('utf-8')).hexdigest()[:8]
    return s3_prefix + '/' + key

Upload content to Amazon S3

def upload_content_to_s3(obj_param):
    s3_key = s3_key_generator(obj_param['key'])
    try:
        response = s3.put_object(
            Body=obj_param['content_bytes'],
            Bucket=USER_DATA_BUCKET,
            Key=s3_key)
        return "s3://%s/%s" % (USER_DATA_BUCKET, s3_key)
    except Exception as e:
        logger.error(e)
        raise e
Step 4: write the main logic
Write a single item to the Amazon DynamoDB table
def put_item(load_data):
    gzip_data = gzip.compress(load_data)  # compress the data
    logger.debug('Compressed size %.2fKB, original size %.2fKB, compression ratio %.2f%%' % (
        len(gzip_data) / 1024.0,
        len(load_data) / 1024.0,
        100.0 * len(gzip_data) / len(load_data)))
    table = dynamodb.Table('players')
    player_username = 'player' + str(random.randint(1, 1000))
    if check_threshold(len(gzip_data)):
        try:
            # Set the read lock before writing to S3
            table.update_item(
                Key={
                    'username': player_username,
                },
                UpdateExpression="set read_lock = :read_lock",
                ExpressionAttributeValues={
                    ':read_lock': True,
                },
            )
            # Write the data to S3
            s3_path = run_with_backoff(upload_content_to_s3, key=player_username, content_bytes=gzip_data)
            # Release the read lock and store the S3 path of the data in the item
            response = table.put_item(
                Item={
                    'username': player_username,
                    'read_lock': False,
                    'inventory': gzip.compress(s3_path.encode(encoding='utf-8', errors='strict')),
                }
            )
            logger.debug('Successfully uploaded the record to S3, path: %s' % s3_path)
        except Exception as e:
            logger.debug('Saving the record failed')
            logger.error(e)
    else:
        response = table.put_item(
            Item={
                'username': player_username,
                'inventory': gzip_data,
            }
        )
        logger.debug('Successfully saved the record, username=%s' % player_username)
Read a player record in the database
def get_player_profile(uid):
    """
    Read a player record.
    :param uid: the player id
    :return:
    """
    table = dynamodb.Table('players')
    player_name = 'player' + str(uid)
    retry_count = 0
    while True:
        response = table.get_item(
            Key={
                'username': player_name,
            }
        )
        if 'Item' not in response:
            logger.error('Not Found')
            return {}
        item = response['Item']
        # Check the read lock; if the item is locked, wait for the configured
        # interval and re-read the record
        if 'read_lock' in item and item['read_lock']:
            retry_count += 1
            logger.info('Retry number %d' % retry_count)
            # If the record is still locked after the retry limit, clear the
            # read lock and fall through to read the record anyway
            if retry_count < S3_READ_LOCK_RETRY_TIMES:
                sleep(S3_READ_RETRY_INTERVAL)
                continue
            else:
                table.update_item(
                    Key={
                        'username': player_name,
                    },
                    UpdateExpression="set read_lock = :read_lock",
                    ExpressionAttributeValues={
                        ':read_lock': False,
                    },
                )
        inventory_bin = gzip.decompress(item['inventory'].value)  # decompress the data
        inventory_str = inventory_bin.decode("utf-8")
        if is_s3_path(inventory_str):
            player_data = gzip.decompress(get_s3_object(player_name).read())
            inventory_json = json.loads(player_data)
        else:
            inventory_json = json.loads(inventory_str)
        user_profile = {**response['Item'], **{'inventory': inventory_json}}
        return user_profile
Finally, write the test logic
Prepare several JSON files of different sizes and observe how each is written to the database.
if __name__ == '__main__':
    path_example = 'small.json'
    # path_example = '500kb.json'
    # path_example = '2MB.json'
    with open(path_example, 'r') as load_f:
        load_str = json.dumps(json.load(load_f))
        test_data = load_str.encode(encoding='utf-8', errors='strict')
        put_item(test_data)
    # player_profile = get_player_profile(238)
    # logger.info(player_profile)
If you need to test the read lock, you can set read_lock to True manually and then observe how the read logic behaves; a minimal way to do that is shown below.
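For example, the lock can be set by hand with the same update_item call the write path uses (the player name here is just a test placeholder):

# Manually lock one player's item so the retry/timeout path in
# get_player_profile can be observed.
table = dynamodb.Table('players')
table.update_item(
    Key={'username': 'player238'},   # any existing test player
    UpdateExpression='set read_lock = :read_lock',
    ExpressionAttributeValues={':read_lock': True},
)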
Limitations
For games where a single player's save is written with high concurrency, the design above does not consider concurrent writes once the data is stored in Amazon S3. If that scenario matters, some application logic or architecture adjustment is needed; one possible direction is sketched below.
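For instance, one possible adjustment (not covered in the original article) is to acquire read_lock with a conditional update, so that two writers cannot both believe they own the lock. A minimal sketch, assuming the players table from the earlier steps:

from botocore.exceptions import ClientError

def try_acquire_write_lock(table, username):
    """Attempt to take the lock atomically; returns False if another writer holds it."""
    try:
        table.update_item(
            Key={'username': username},
            UpdateExpression='set read_lock = :locked',
            # Only succeed if no lock exists yet or the lock is currently released.
            ConditionExpression='attribute_not_exists(read_lock) OR read_lock = :free',
            ExpressionAttributeValues={':locked': True, ':free': False},
        )
        return True
    except ClientError as e:
        if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
            return False   # another writer holds the lock; retry or queue the save
        raise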
Conclusion
In this test, JSON data compressed with gzip to a ratio of roughly 25%, so in theory a single item can hold a data item of up to about 1.6 MB. Even the occasional save whose compressed size exceeds 400 KB can still be stored: Amazon DynamoDB keeps only the metadata, while the large field data lives at a path on Amazon S3. gzip adds some extra compute and I/O overhead, but that cost falls mainly on the game servers, and for the database it actually reduces I/O cost.
In most scenarios, player data rarely exceeds 400 KB even without compression. In that case, it is recommended to benchmark the compression-enabled and compression-disabled variants against each other and decide which approach suits your game better; a rough way to measure the compression overhead is sketched below.
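As a starting point for such a comparison, a rough micro-benchmark of the gzip overhead on a representative save might look like the following (the file name is a placeholder):

import gzip
import json
import timeit

with open('small.json', 'r') as f:   # any representative player save
    payload = json.dumps(json.load(f)).encode('utf-8')

compressed = gzip.compress(payload)
print('original %.1f KB, compressed %.1f KB (%.0f%%)' % (
    len(payload) / 1024.0,
    len(compressed) / 1024.0,
    100.0 * len(compressed) / len(payload)))

# Average time for one compress + decompress round trip on the game-server side.
seconds = timeit.timeit(lambda: gzip.decompress(gzip.compress(payload)), number=100) / 100
print('compress + decompress: %.2f ms per save' % (seconds * 1000))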
Author of this article

Forestry
Amazon cloud technology solutions architect, responsible for consulting on and designing cloud computing solutions based on Amazon cloud technology. Has more than 14 years of R&D experience, has built apps with tens of millions of users, and has contributed to several open source projects on GitHub. Extensive hands-on experience in gaming, IoT, smart city, automotive, e-commerce, and other fields.

