Use Amazon DynamoDB and Amazon S3 with gzip compression to maximize player data storage
2022-07-27 07:34:00 | Amazon Cloud Developer

Preface
In some traditional game architectures, MySQL stores the player save data, and sharding across databases and tables spreads the storage and performance pressure of a single database and table so that more players can be supported. As the data volume grows, the varchar type can no longer hold a single game field, and switching to blob columns is the cheapest change under this architecture, so some games adopted blob columns from the very beginning to store player quests, inventory items, and similar data.
Blob columns hit a bug in MySQL 5.6/5.7 (MySQL Bugs: #96466) that can, with some probability, crash the database cluster and cause data loss. Even in MySQL 8.0, because of design limitations in the engine itself, high-frequency updates on a single table above 20 GB hold back database performance, and the problem becomes more and more obvious as the table grows.
When the game business grows explosively, sharding a traditional relational database requires application changes and a certain amount of downtime, and once those expansions are done, shrinking again as the game winds down requires application changes too. This undoubtedly creates a lot of extra work for the development and operations teams.
Amazon DynamoDB is a very good fit for this scenario. At any stage of the business it can expand with zero downtime and auto scaling, all of it completely transparent to the application layer. In day-to-day operations, capacity can also be scaled up and down with the business load to further reduce cost.
MySQL Bugs: #96466:
https://bugs.mysql.com/bug.php?id=96466
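As a small illustration of that elasticity (not part of the original walkthrough), the sketch below creates a table in on-demand capacity mode and later switches it to provisioned capacity with boto3; the table name and capacity figures are placeholders.

import boto3

dynamodb_client = boto3.client('dynamodb')

# Create a table in on-demand (PAY_PER_REQUEST) mode: no capacity planning,
# DynamoDB scales reads and writes automatically and transparently to the app.
dynamodb_client.create_table(
    TableName='players-demo',   # hypothetical table name
    KeySchema=[{'AttributeName': 'username', 'KeyType': 'HASH'}],
    AttributeDefinitions=[{'AttributeName': 'username', 'AttributeType': 'S'}],
    BillingMode='PAY_PER_REQUEST',
)
dynamodb_client.get_waiter('table_exists').wait(TableName='players-demo')

# Later, switch to provisioned capacity sized for the current load,
# again without any downtime for the application.
dynamodb_client.update_table(
    TableName='players-demo',
    BillingMode='PROVISIONED',
    ProvisionedThroughput={'ReadCapacityUnits': 50, 'WriteCapacityUnits': 50},
)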
Summary
This article focuses on game scenarios and on Amazon DynamoDB's item size limit (each item must be smaller than 400 KB): how to store as much data as possible within that limit, and how to extend storage when the limit is exceeded. It shows how to combine Amazon DynamoDB with Amazon S3 to hold the large data attributes in a player save, how to avoid reading a stale save from Amazon S3 while new data is still being written there, and how gzip compression reduces data size and I/O overhead to improve performance.
Amazon DynamoDB:
https://docs.aws.amazon.com/zh_cn/amazondynamodb/latest/developerguide/ServiceQuotas.html#limits-items
Architecture diagram

The code
Goals
1. gzip-compress all data before saving it, and decompress it with gzip after reading it.
2. Storage in Amazon S3 and in the Amazon DynamoDB binary field is adaptive: if the compressed player data is larger than the configured threshold it is written to Amazon S3, otherwise it is stored directly in a field of the current database item (the two resulting item shapes are sketched after this list).
3. When an item is read from Amazon DynamoDB, the extracted field is parsed; if the string starts with s3://, the data is fetched from Amazon S3.
4. A read-lock field on each item records whether the save is currently being written to Amazon S3, so that readers can be blocked. Before every write to Amazon S3 the item's read_lock is set to True, and after the Amazon S3 write succeeds it is set back to False. A reader that finds read_lock set to True waits for an interval and retries until the number of retries exceeds the configured limit; after that timeout it assumes the writer may never succeed for some reason and resets read_lock to False itself.
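A minimal illustration of the two item shapes described in goals 2 and 4, assuming the players table keyed on username used throughout this article (the byte values are made-up placeholders):

# Small save: the gzip-compressed payload fits inside the item itself.
item_inline = {
    'username': 'player42',
    'inventory': b'\x1f\x8b...',  # gzip bytes of the player's JSON save
}

# Large save: the payload lives in S3; the item keeps read_lock plus a
# gzip-compressed "s3://bucket/prefix/key" pointer in the same field.
item_s3_pointer = {
    'username': 'player42',
    'read_lock': False,
    'inventory': b'\x1f\x8b...',  # gzip bytes of 's3://linyesh-user-data/<prefix>/player42'
}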
Step 1: initialize the environment parameters
from time import sleep
import boto3
import gzip
import random
import json
import hashlib
import logging

# Threshold for writing to S3: data larger than this is written to S3,
# otherwise it is stored in the database. Default: 350 KB.
UPLOAD_TO_S3_THRESHOLD_BYTES = 358400
# Target S3 bucket for player data
USER_DATA_BUCKET = 'linyesh-user-data'
# Maximum number of retries when an S3 read lock is encountered;
# beyond this limit the lock is cleared automatically
S3_READ_LOCK_RETRY_TIMES = 10
# Retry interval for read requests while the S3 read lock is held
S3_READ_RETRY_INTERVAL = 0.2

dynamodb = boto3.resource('dynamodb')
s3 = boto3.client('s3')

logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)
Parameter description
UPLOAD_TO_S3_THRESHOLD_BYTES: the maximum data length, in bytes, stored in the field. Because Amazon DynamoDB limits each item to 400 KB, some space must be reserved for the other fields besides the largest field in the save, so that the whole item stays under 400 KB.
USER_DATA_BUCKET: the Amazon S3 bucket used to store the large player field data once it exceeds the threshold. It must be created ahead of time; see Creating a bucket (https://docs.aws.amazon.com/zh_cn/AmazonS3/latest/userguide/create-bucket-overview.html).
S3_READ_LOCK_RETRY_TIMES: the number of read retries while the player's save on Amazon S3 is being written. When the item is in the read-locked state, the reading process waits for an interval and tries again.
S3_READ_RETRY_INTERVAL: the interval, in seconds, between read retries while the lock is held.
Note: S3_READ_LOCK_RETRY_TIMES multiplied by S3_READ_RETRY_INTERVAL must, in theory, be longer than the maximum time a save upload to Amazon S3 can take (with the defaults above that window is only 10 × 0.2 s = 2 s), because a reader that gives up and clears the lock while the upload is still in flight will read the old object. In practice, tune these two parameters to the likely save size; otherwise there is a high probability of dirty reads of the save.
Step 2: create the Amazon DynamoDB table
def create_tables():
    """
    Create the table.
    :return:
    """
    response = dynamodb.create_table(
        TableName='players',
        KeySchema=[
            {
                'AttributeName': 'username',
                'KeyType': 'HASH'
            }
        ],
        AttributeDefinitions=[
            {
                'AttributeName': 'username',
                'AttributeType': 'S'
            }
        ],
        ProvisionedThroughput={
            'ReadCapacityUnits': 5,
            'WriteCapacityUnits': 5
        }
    )
    # Wait until the table exists.
    response.wait_until_exists()
    # Print out some data about the table.
    logger.debug(response.item_count)
Step 3: write the helper logic
Exponential backoff function
def run_with_backoff(function, retries=5, **function_parameters):
    base_backoff = 0.1  # base 100 ms backoff
    max_backoff = 10    # sleep for a maximum of 10 seconds
    tries = 0
    while True:
        try:
            # The collected keyword arguments are passed to the wrapped function
            # as a single dict (upload_content_to_s3 below expects exactly that).
            return function(function_parameters)
        except (ConnectionError, TimeoutError):
            if tries >= retries:
                raise
            backoff = min(max_backoff, base_backoff * (pow(2, tries) + random.random()))
            logger.debug(f"sleeping for {backoff:.2f}s")
            sleep(backoff)
            tries += 1
Amazon S3 path check

def is_s3_path(content):
    return content.startswith('s3://')

Fetch an object from Amazon S3

def get_s3_object(key):
    response = s3.get_object(Bucket=USER_DATA_BUCKET, Key=s3_key_generator(key))
    return response['Body']

Check whether the size exceeds the threshold

def check_threshold(current_size):
    return current_size > UPLOAD_TO_S3_THRESHOLD_BYTES

Amazon S3 key generator
This function spreads player saves across different prefixes in the S3 bucket (the prefix is derived from an MD5 hash of the key), which helps improve Amazon S3 I/O performance.

def s3_key_generator(key):
    s3_prefix = hashlib.md5(key.encode('utf-8')).hexdigest()[:8]
    return s3_prefix + '/' + key

Upload content to Amazon S3

def upload_content_to_s3(obj_param):
    s3_key = s3_key_generator(obj_param['key'])
    try:
        response = s3.put_object(
            Body=obj_param['content_bytes'],
            Bucket=USER_DATA_BUCKET,
            Key=s3_key)
        return "s3://%s/%s" % (USER_DATA_BUCKET, s3_key)
    except Exception as e:
        logger.error(e)
        raise e
Step 4: write the main logic
Write a single item to the Amazon DynamoDB table
def put_item(load_data):
    gzip_data = gzip.compress(load_data)  # compress the data
    logger.debug('Compressed size %.2fKB, original size %.2fKB, compression ratio %.2f%%' % (
        len(gzip_data) / 1024.0,
        len(load_data) / 1024.0,
        100.0 * len(gzip_data) / len(load_data)))
    table = dynamodb.Table('players')
    player_username = 'player' + str(random.randint(1, 1000))
    if check_threshold(len(gzip_data)):
        try:
            # Set the read lock before writing to S3
            table.update_item(
                Key={
                    'username': player_username,
                },
                UpdateExpression="set read_lock = :read_lock",
                ExpressionAttributeValues={
                    ':read_lock': True,
                },
            )
            # Write the data to S3
            s3_path = run_with_backoff(upload_content_to_s3, key=player_username, content_bytes=gzip_data)
            # Release the read lock and store the S3 path of the data in the item
            response = table.put_item(
                Item={
                    'username': player_username,
                    'read_lock': False,
                    'inventory': gzip.compress(s3_path.encode(encoding='utf-8', errors='strict')),
                }
            )
            logger.debug('Successfully uploaded the record to S3, path: %s' % s3_path)
        except Exception as e:
            logger.debug('Saving the record failed')
            logger.error(e)
    else:
        response = table.put_item(
            Item={
                'username': player_username,
                'inventory': gzip_data,
            }
        )
        logger.debug('Successfully saved the record, username=%s' % player_username)
Read a player record in the database
def get_player_profile(uid):
    """
    Read a player record.
    :param uid: the player id
    :return:
    """
    table = dynamodb.Table('players')
    player_name = 'player' + str(uid)
    retry_count = 0
    while True:
        response = table.get_item(
            Key={
                'username': player_name,
            }
        )
        if 'Item' not in response:
            logger.error('Not Found')
            return {}
        item = response['Item']
        # Check the read lock; if the item is locked, wait for the configured
        # interval and re-read the record
        if 'read_lock' in item and item['read_lock']:
            retry_count += 1
            logger.info('Retry number %d' % retry_count)
            # If the record is still locked after the retry limit, clear the
            # read lock and fall through to read the record anyway
            if retry_count < S3_READ_LOCK_RETRY_TIMES:
                sleep(S3_READ_RETRY_INTERVAL)
                continue
            else:
                table.update_item(
                    Key={
                        'username': player_name,
                    },
                    UpdateExpression="set read_lock = :read_lock",
                    ExpressionAttributeValues={
                        ':read_lock': False,
                    },
                )
        inventory_bin = gzip.decompress(item['inventory'].value)  # decompress the data
        inventory_str = inventory_bin.decode("utf-8")
        if is_s3_path(inventory_str):
            player_data = gzip.decompress(get_s3_object(player_name).read())
            inventory_json = json.loads(player_data)
        else:
            inventory_json = json.loads(inventory_str)
        user_profile = {**response['Item'], **{'inventory': inventory_json}}
        return user_profile
Finally, write the test logic
Prepare several JSON files of different sizes and observe how each is written to the database.
if __name__ == '__main__':
    path_example = 'small.json'
    # path_example = '500kb.json'
    # path_example = '2MB.json'
    with open(path_example, 'r') as load_f:
        load_str = json.dumps(json.load(load_f))
        test_data = load_str.encode(encoding='utf-8', errors='strict')
        put_item(test_data)
    # player_profile = get_player_profile(238)
    # logger.info(player_profile)
If you need to test the read lock, you can set read_lock to True manually and then observe how the read logic behaves; a minimal way to do that is shown below.
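For example, the lock can be set by hand with the same update_item call the write path uses (the player name here is just a test placeholder):

# Manually lock one player's item so the retry/timeout path in
# get_player_profile can be observed.
table = dynamodb.Table('players')
table.update_item(
    Key={'username': 'player238'},   # any existing test player
    UpdateExpression='set read_lock = :read_lock',
    ExpressionAttributeValues={':read_lock': True},
)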
Limitations
For games where a single player's save is written with high concurrency, the design above does not consider concurrent writes once the data is stored in Amazon S3. If that scenario matters, some application logic or architecture adjustment is needed; one possible direction is sketched below.
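For instance, one possible adjustment (not covered in the original article) is to acquire read_lock with a conditional update, so that two writers cannot both believe they own the lock. A minimal sketch, assuming the players table from the earlier steps:

from botocore.exceptions import ClientError

def try_acquire_write_lock(table, username):
    """Attempt to take the lock atomically; returns False if another writer holds it."""
    try:
        table.update_item(
            Key={'username': username},
            UpdateExpression='set read_lock = :locked',
            # Only succeed if no lock exists yet or the lock is currently released.
            ConditionExpression='attribute_not_exists(read_lock) OR read_lock = :free',
            ExpressionAttributeValues={':locked': True, ':free': False},
        )
        return True
    except ClientError as e:
        if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
            return False   # another writer holds the lock; retry or queue the save
        raise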
Conclusion
In this test, JSON data compressed with gzip to a ratio of roughly 25%, so in theory a single item can hold a data item of up to about 1.6 MB. Even the occasional save whose compressed size exceeds 400 KB can still be stored: Amazon DynamoDB keeps only the metadata, while the large field data lives at a path on Amazon S3. gzip adds some extra compute and I/O overhead, but that cost falls mainly on the game servers, and for the database it actually reduces I/O cost.
In most scenarios, player data rarely exceeds 400 KB even without compression. In that case, it is recommended to benchmark the compression-enabled and compression-disabled variants against each other and decide which approach suits your game better; a rough way to measure the compression overhead is sketched below.
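As a starting point for such a comparison, a rough micro-benchmark of the gzip overhead on a representative save might look like the following (the file name is a placeholder):

import gzip
import json
import timeit

with open('small.json', 'r') as f:   # any representative player save
    payload = json.dumps(json.load(f)).encode('utf-8')

compressed = gzip.compress(payload)
print('original %.1f KB, compressed %.1f KB (%.0f%%)' % (
    len(payload) / 1024.0,
    len(compressed) / 1024.0,
    100.0 * len(compressed) / len(payload)))

# Average time for one compress + decompress round trip on the game-server side.
seconds = timeit.timeit(lambda: gzip.decompress(gzip.compress(payload)), number=100) / 100
print('compress + decompress: %.2f ms per save' % (seconds * 1000))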
Author of this article

Forestry
Amazon cloud technology solutions architect, responsible for consulting on and designing cloud computing solutions based on Amazon cloud technology. Has more than 14 years of R&D experience, has built apps with tens of millions of users, and has contributed to several open source projects on GitHub. Extensive hands-on experience in gaming, IoT, smart city, automotive, e-commerce, and other fields.

