2022 Teddy Cup Data Mining Challenge, Problem C: approach and post-competition summary
2022-07-06 13:41:00 【zhugby】
The 10th "Teddy Cup" Data Mining Challenge, Problem C. Teammates: @Pluto_Ct, @Be Geek bacteria
This year's Problem C is titled "Analysis of the demand graph for peripheral tourism during the epidemic". It asks teams to analyze how tourism and tourist demand changed before and after COVID-19. The main objectives are:
- Build a text classification model for official account articles that labels each article as "relevant" or "irrelevant" according to how closely its content relates to culture and tourism;
- Extract tourism products from online travel agency (OTA) and user-generated content (UGC) data, then analyze and rank their popularity by year;
- Using the OTA and UGC data, run an association analysis on the extracted tourism products, find strong association patterns centered on scenic spots, hotels, and restaurants, build a local tourism graph on that basis, and analyze it visually;
- Using the local tourism graph as an analytical tool, analyze how tourism products in Maoming changed before and after COVID-19, and make suggestions.
Here, the "local tourism graph" is a concept defined by the competition: a general knowledge graph extended with requirements specific to the tourism industry. The data covers Maoming City, Guangdong Province, from 2018 to 2021 and includes official account articles, travelogues, and reviews of hotels, scenic spots, and restaurants.
Link to the competition questions on the official website :
Contents
1. Question 1: WeChat official account article classification
1.1 Problem analysis and approach
1.2 Data preprocessing
1.3 Thesaurus construction
1.4 Pseudo-label data preparation
1.5 BERT model training and prediction
1.6 Failed attempts during the competition
2. Question 2: Popularity analysis of peripheral tourism products
2.1 Problem analysis and approach
2.2 Data cleaning
2.2.1 Removing non-Maoming tourism products
2.2.2 Filtering repeated content in travelogues
2.2.3 Stop-word removal
2.2.4 Traditional Chinese conversion
2.3 Thesaurus preparation
2.4 Named entity recognition based on BiLSTM-CRF and LAC
2.4.1 Named entity recognition based on BiLSTM-CRF
2.4.2 Named entity extraction based on the LAC word segmentation tool
2.4.3 Merging and filtering named entities
2.5 Named entity classification based on marker-word matching and TextCNN
2.6 Entity alignment based on Single-Pass text clustering and geocoding
2.7 Entity coding and source tracing
2.8 Indicator system and popularity evaluation
3. Questions 3 & 4: Product association mining and local tourism graph construction
3.1 Problem analysis and approach
3.2 Association pattern definition and quantification
3.3 Construction and analysis of the tourism product association graph
3.4 Construction and analysis of the macro concept graph of tourism
3.5 Distribution of tourism products and suggestions
3.6 Analysis of tourism product trends before and after the epidemic, and suggestions
4. Summary and outlook
4.1 Summary of the solution
4.2 Shortcomings and future work
4.3 Competition summary and lessons
5. Closing remarks
1. Question 1: WeChat official account article classification
1.1 Problem analysis and approach
The task is to build a text classification model that labels the WeChat official account articles in Attachment 1 as "relevant" or "irrelevant" according to how closely their content relates to culture and tourism, and to save the results in the format of Table 1 as "result1.csv". Topics strongly related to culture and tourism include tourism, activities, festivals, local specialties, transportation, hotels, scenic areas, scenic spots, cultural and creative products, culture, rural tourism, homestays, holidays, vacations, tourists, fruit picking, flower viewing, spring outings, spring hikes, wellness, parks, coastal tours, resort vacations, farm stays, script murder games, travel, hiking, industrial tourism, routes, self-driving tours, group tours, travel guides, travelogues, car charters, glass plank roads, yachts, golf, hot springs, and so on.
The competition data is unlabeled and contains more than 6,000 articles. The problem statement supplies many culture-and-tourism topic words, which should be an important basis for classification. Manually labeling a training set was not feasible, so we expanded the topic words into a thesaurus and matched it against features extracted from the text.
First, we expanded the topic words given in the problem with synonyms to form a thesaurus. Next, we extracted keywords from the corpus with TextRank and TF-IDF separately and matched them against the thesaurus. In parallel, we applied rules that match the title and body directly. This gives 3 labels per article, and articles on which all 3 labels agree become pseudo-labeled data for model training. A classifier trained on a Chinese BERT pre-trained language model then decides the cases where the three methods disagree.
1.2 Data preprocessing
Exploration showed no exact duplicates in the data. After stripping spaces and special characters, we segmented the text with jieba and kept only nouns and verbs.
1.3 Thesaurus construction
We first screened the culture-and-tourism topic words given in the problem. For each remaining topic word we generated 5 synonyms with the Synonyms library, which gave a culture-and-tourism thesaurus of 143 topic words.
1.4 Pseudo-label data preparation
We used the following three methods to judge whether an article is related to culture and tourism:
- Match the title and body against the screened original topic words. If the title contains a topic word, or the body contains more than 20 topic words, the article is considered relevant; otherwise it is irrelevant;
- Extract the top 15 keywords of each article with TextRank and match them against the thesaurus. If any keyword matches, the article is considered relevant; otherwise it is irrelevant;
- Extract the top 15 keywords of each article with TF-IDF and match them against the thesaurus. If any keyword matches, the article is considered relevant; otherwise it is irrelevant.
Each article therefore receives 3 "related to culture and tourism or not" labels. When all three agree, we treat the classification as reliable and use the article as pseudo-labeled training data. When they disagree, the result is treated as doubtful and is decided later by the trained model.
For the pseudo-labeled corpus, we again extracted 50 keywords each with TextRank and TF-IDF, added the part-of-speech-filtered title tokens to the keywords, removed duplicates, and combined the result with the title as the model input.
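The three-way agreement described above can be sketched as follows. This is only an illustration, not our competition code: the function names are hypothetical, keyword extraction (TextRank, TF-IDF) is assumed to have been done elsewhere, and "more than 20 topic words" is read here as more than 20 topic-word occurrences in the body.

```python
from typing import Iterable, List, Optional, Set


def rule_label(title: str, body_tokens: Iterable[str], topic_words: Set[str],
               body_threshold: int = 20) -> int:
    """Method 1: relevant (1) if the title contains a topic word, or the body
    has more than `body_threshold` topic-word occurrences (assumed reading)."""
    if any(word in title for word in topic_words):
        return 1
    hits = sum(1 for token in body_tokens if token in topic_words)
    return 1 if hits > body_threshold else 0


def keyword_label(keywords: Iterable[str], thesaurus: Set[str]) -> int:
    """Methods 2 and 3: relevant (1) if any extracted keyword is in the thesaurus."""
    return 1 if any(keyword in thesaurus for keyword in keywords) else 0


def consensus(labels: List[int]) -> Optional[int]:
    """Return the shared label when all methods agree; otherwise None,
    meaning the article is left for the trained classifier."""
    return labels[0] if len(set(labels)) == 1 else None


# Toy example with made-up words and keywords.
topic_words = {"旅游", "景区"}
thesaurus = topic_words | {"游览"}
labels = [
    rule_label("茂名景区春季活动", ["景区", "开放"], topic_words),
    keyword_label(["景区", "活动"], thesaurus),   # stand-in for TextRank output
    keyword_label(["游览", "春季"], thesaurus),   # stand-in for TF-IDF output
]
pseudo_label = consensus(labels)
```

Articles for which `consensus` returns `None` are exactly the doubtful ones that the classifier in section 1.5 decides.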
1.5 BERT model training and prediction
We trained a classifier on a Chinese BERT pre-trained language model. Its accuracy reached 98.23% on the training set and 96.57% on the test set. We then used the trained model to classify the articles whose results were doubtful, merged the results, and saved them to result1.csv.
1.6 Failed attempts during the competition
Early on we tried LDA and simple rule matching on the sample data. Because the sample was so small, LDA could not separate out culture-and-tourism topics, while simple rules worked fairly well. Once the full data was released, however, its sources and formats were more varied than the sample, and the rules we had written no longer fit the data completely.
Later we found LOTClass, a classification algorithm that needs only label names and no labeled documents. Most of the reference material online targets English data, and because of time constraints our attempt to apply it to the Chinese data of this problem failed.
2. Question 2: Popularity analysis of peripheral tourism products
2.1 Problem analysis and approach
The task is to extract tourism product instances and other useful information from the OTA and UGC data in the attachments, covering scenic spots, hotels, internet-famous spots, homestays, specialty foods, rural tourism, and cultural and creative products, and to save the extracted products and their source corpus in the format of Table 2 as "result2-1.csv". Teams must then build a multi-dimensional popularity evaluation model, analyze and rank the popularity of the extracted products by year, and save the results in the format of Table 3 as "result2-2.csv".
To fully explore the current state of tourism development, we extracted tourism products from 4 data tables (travelogues, scenic spot reviews, hotel reviews, and restaurant reviews), built a popularity indicator system, and computed, ranked, and analyzed the popularity of each product by year.
First, we filtered the relevant entities from the CLUENER fine-grained named entity recognition corpus and trained a BiLSTM-CRF model, then ran named entity recognition (NER) on the 4 tables. Combining this with the NER built into the LAC word segmentation tool, we designed a screening and optimization method for recognizing tourism product entities. Second, we aligned entities (merging synonymous place names) using a TextCNN classification model, the Single-Pass clustering algorithm, and geocoding, which gave 631 valid entities with 1,014 surface forms in total. Third, we built a popularity evaluation system with three top-level indicators (participation popularity, response popularity, and publicity popularity) and computed the indicators. Finally, we used AHP-TOPSIS to compute the indicator weights and rank the products, giving the final popularity evaluation.
2.2 Data cleaning
Exploring the data revealed some anomalies. Cleaning focused on the following aspects.
2.2.1 Removing non-Maoming tourism products
The scenic spot review data contains some spots that are not in Maoming ("China Antarctic Great Wall Station" is in Antarctica, "Guangdong Ocean University" is in Zhanjiang, and many spots are named "Zhanjiang xxx").
Cleaning method: delete the rows of reviews whose scenic spot name contains "Zhanjiang", "Guangdong Ocean University", or "Antarctic Great Wall Station".
The travelogues and guides also mention some non-Maoming spots, but fewer.
Cleaning method: filter by frequency at a later stage (entities that appear in fewer than 3 articles are dropped).
2.2.2 Filtering repeated content in travelogues
The travelogues and guides contain a lot of repeated content such as links, which seriously hurts product extraction and popularity evaluation.
Cleaning method: because the structure is fairly uniform, filter with a regular expression that deletes each URL and the title line before it.
re.sub(r'^.*?\n\d+\-\d+\-\d+.*?\nhttp.*?\n|\n.*?\d+\-\d+\-\d+.*?\nhttp.*?\n|\d+\-\d+\-\d+.*?\nhttp.*?\n| ', '', x)
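As a small illustration of what this expression removes, here is the same pattern split by alternative and applied to a made-up travelogue. The sample text and the helper name `clean_travelogue` are hypothetical; the pattern itself is unchanged.

```python
import re

# The same expression as above, one alternative per line.
HEADER_PATTERN = re.compile(
    r'^.*?\n\d+\-\d+\-\d+.*?\nhttp.*?\n'   # at the very start: title line, date line, URL line
    r'|\n.*?\d+\-\d+\-\d+.*?\nhttp.*?\n'   # later on: a line containing a date, then a URL line
    r'|\d+\-\d+\-\d+.*?\nhttp.*?\n'        # a date up to the end of its line, then a URL line
    r'| '                                  # any remaining single space
)


def clean_travelogue(text: str) -> str:
    """Strip title/date/URL headers and spaces from one travelogue."""
    return HEADER_PATTERN.sub('', text)


# Made-up sample with two header blocks.
raw = (
    "茂名三日游\n2020-05-01 10:00\nhttp://example.com/a\n"
    "海滩很美 海鲜好吃。\n"
    "浪漫海岸游记 2021-03-02\nhttp://example.com/b\n"
    "人不多。"
)
cleaned = clean_travelogue(raw)
```

Note that the second alternative also consumes the newline before the header line, so the text on either side of a removed header is joined.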
2.2.3 Stop-word removal
We started from the stop-word list of the Machine Intelligence Laboratory of Sichuan University and supplemented it for the application scenario of this problem.
2.2.4 Traditional Chinese conversion
Some of the data is written in Traditional Chinese, which could affect later processing, so we used the OpenCC library to convert it to Simplified Chinese.
2.3 Thesaurus preparation
We crawled the main popular tourist attractions in Maoming from the Mafengwo website (200 entries in total) and combined them with the scenic spot names in the scenic spot review table from the problem data (where a name overlaps with one from Mafengwo, the longer one is taken as the standard name). This forms the "Maoming tourist attractions thesaurus", which we added to the user dictionary of the LAC word segmentation tool to improve word segmentation and tourism product recognition for Maoming.
Based on structural features of some restaurant, scenic spot, and hotel names, we also built a "classification feature thesaurus" to improve the accuracy of named entity classification.
2.4 Named entity recognition based on BiLSTM-CRF and LAC
2.4.1 Named entity recognition based on BiLSTM-CRF
From the CLUENER fine-grained named entity recognition corpus we selected the entities annotated with the "tourism" and "address" categories, trained an NER model for the tourism domain, and used the trained model to recognize named entities in the problem data.
2.4.2 Named entity extraction based on the LAC word segmentation tool
Baidu's open-source LAC word segmentation tool has some named entity recognition capability. We took the tokens tagged LOC or ORG as the result of named entity extraction.
2.4.3 Merging and filtering named entities
The two recognition methods each have advantages and disadvantages:
(1) The trained NER model targets the tourism domain and finds more tourism entities, but its output contains more invalid content and incomplete words, so it needs cleaning and screening;
(2) Baidu LAC is a word segmentation tool, so it takes the context and the whole article into account more fully, and the entities it extracts are cleaner and structurally more complete. However, LAC is not tailored to the culture and tourism industry and misses some entities.
Because each method finds tourism products that the other misses, we combined them with a set of rules to form "CombinedNER".
The specific process is as follows:
1. Add stop words: "government", "association", "Guangdong Province", "Maoming";
2. Where the results of the two methods overlap, keep the LAC NER result;
3. Filter the model's NER results as follows:
(1) Length filter: length greater than 2;
(2) Tag the model's results with LAC part-of-speech tagging. If a result consists of a single token that is a common word, or ends with a token tagged a/v/u etc., treat it as a common word or an incomplete structure and remove it;
(3) Compare with the LAC segmentation to check whether the model result truncates a word, and remove it if so, e.g. ' Trapped in mountain villages in western Guangdong ', ' Villages and villages ';
(4) Remove results that consist only of letters and digits;
(5) Filter results that contain special symbols (re.sub);
(6) Results that contain both "province" and "city" are taken to describe a geographic location and are removed;
(7) Results that contain "of" (的) are not entity names of tourism products and are removed;
(8) Results of 3-5 characters whose last or second-to-last character is "province", "city", "district", "county", or "committee" are not the tourism product entities we need and are removed;
(9) Results of 3-4 characters that end with "road", "station", or "town" are not the tourism product entities we need and are removed;
(10) Results that contain "National People's Congress", "group", "company", "expressway", and similar words are not the tourism product entities we need and are removed.
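The purely string-based rules above can be sketched as a single predicate. This is a minimal sketch under several assumptions: the Chinese markers (省, 市, 区, 县, 委, 路, 站, 镇, 人大, 集团, 公司, 高速) are back-translated from the rule descriptions, rule (5) is read as dropping any entity that contains a non-word symbol, and the LAC-dependent rules (2) and (3) are left out. The name `keep_entity` is hypothetical.

```python
import re

# Assumed Chinese originals of the markers named in the rules.
STOP_WORDS = {"政府", "协会", "广东省", "茂名"}
ADMIN_CHARS = "省市区县委"
PLACE_SUFFIXES = "路站镇"
BLOCKED_SUBSTRINGS = ("人大", "集团", "公司", "高速")


def keep_entity(name: str) -> bool:
    """Return True if a model-recognized entity survives the string-based rules."""
    if name in STOP_WORDS:                                # step 1: stop words
        return False
    if len(name) <= 2:                                    # rule (1): length > 2
        return False
    if re.fullmatch(r"[A-Za-z0-9]+", name):               # rule (4): only letters/digits
        return False
    if re.search(r"\W", name):                            # rule (5): special symbols (assumed reading)
        return False
    if "省" in name and "市" in name:                      # rule (6): geographic description
        return False
    if "的" in name:                                      # rule (7)
        return False
    if 3 <= len(name) <= 5 and (name[-1] in ADMIN_CHARS or name[-2] in ADMIN_CHARS):
        return False                                      # rule (8)
    if 3 <= len(name) <= 4 and name[-1] in PLACE_SUFFIXES:
        return False                                      # rule (9)
    if any(word in name for word in BLOCKED_SUBSTRINGS):  # rule (10)
        return False
    return True


candidates = ["浪漫海岸", "广东省茂名市", "美丽的海滩", "电白区", "茂名站", "abc123", "包茂高速", "放鸡岛"]
kept = [name for name in candidates if keep_entity(name)]
```

Ordering the cheap checks first keeps the predicate easy to extend when new noise patterns show up in the data.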
2.5 Named entity classification based on marker-word matching and TextCNN
To avoid grouping entities of different categories into one cluster, and to prepare for Question 3, we divided the named entities into three types: restaurants, scenic spots, and hotels.
Named entity classification: marker-word matching + TextCNN
(1) First, match against the feature words in the "classification feature thesaurus" that directly identify a restaurant, scenic spot, or hotel (for example, "restaurant" and "noodle shop" indicate restaurants; "museum" and "amusement park" indicate scenic spots; "hotel", "homestay", and "apartment" indicate hotels). If an entity matches, it is assigned to that class;
(2) Match against the Maoming tourist attractions thesaurus (whole-word match) to improve accuracy;
(3) Build a training corpus from online corpora combined with the review targets in the problem data, containing 23,571 names of scenic spots, restaurants, and hotels. Train a TextCNN model on this corpus; its accuracy on the training set reached 93.74%. Use the trained model to classify the remaining entities.
2.6 Entity alignment based on Single-Pass text clustering and geocoding
The same entity can appear under different wordings, with typos, or incompletely recognized. After classifying the named entities, we therefore clustered the entities within each class by text similarity. Named entities are usually short, and Single-Pass is a clustering algorithm suited to short text.
The preparation and main steps of named entity clustering are as follows:
(1) Train 200-dimensional word vectors with Word2Vec;
(2) Embedding: convert each word of an entity into its word vector and average the vectors to obtain the vector of the whole named entity. The cosine similarity of two such vectors measures the similarity of the two named entities;
(3) Single-Pass clustering: for each new named entity, compute its cosine similarity to every entity in each existing group. If the maximum is above the threshold, assign the entity to that group; otherwise it forms a new group on its own;
(4) Low-frequency entity filtering: if the total frequency of all the names in a group after clustering is less than 3 and the group contains no entity feature words, we assume the entity is not in Maoming or was poorly recognized, and drop it;
(5) Standard name selection: first, prefer an entity name contained in the "Maoming tourist attractions thesaurus"; second, the entity name with the highest count in the group; finally, the longest entity name.
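Steps (2) and (3) can be sketched as follows. This is a minimal sketch with made-up 2-dimensional vectors instead of the 200-dimensional Word2Vec vectors; the threshold value of 0.8 and the function names are assumptions for illustration only.

```python
import math
from typing import Dict, List, Sequence


def mean_vector(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average the word vectors of one entity into a single entity vector."""
    count = len(token_vectors)
    return [sum(component) / count for component in zip(*token_vectors)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def single_pass(names: Sequence[str], vectors: Dict[str, Sequence[float]],
                threshold: float = 0.8) -> List[List[str]]:
    """Assign each entity to the group holding its most similar member if that
    similarity is above `threshold`; otherwise start a new group."""
    clusters: List[List[str]] = []
    for name in names:
        best_index, best_similarity = -1, threshold
        for index, members in enumerate(clusters):
            similarity = max(cosine(vectors[name], vectors[m]) for m in members)
            if similarity > best_similarity:
                best_index, best_similarity = index, similarity
        if best_index >= 0:
            clusters[best_index].append(name)
        else:
            clusters.append([name])
    return clusters


# Made-up entity vectors for illustration.
toy_vectors = {"浪漫海岸": (1.0, 0.0), "浪漫海岸景区": (0.9, 0.1), "放鸡岛": (0.0, 1.0)}
groups = single_pass(list(toy_vectors), toy_vectors)
```

Because Single-Pass reads the entities once and in order, the resulting groups can depend on input order, which is one reason the geocoding step below is useful as a second check.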
Text similarity alone cannot capture every alternative wording of the same entity, so we used the Baidu Maps API to obtain latitude and longitude coordinates for further entity alignment, merging entities of the same class that share the same coordinates.
Using the place search function of the Baidu Maps API, we searched for each entity within Maoming City and retrieved its latitude and longitude.
Entities with the same coordinates are treated as the same entity and merged. If an entity still cannot be found after several searches with additional geographic qualifiers, we assume it is not in Maoming or that the name is too broad, and remove it.
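The coordinate-based merge can be sketched like this. The coordinates below are made-up placeholders rather than Baidu Maps results, the API call itself is not shown, and rounding to 6 decimal places before comparing is an assumption of this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude)


def merge_by_coordinates(coords: Dict[str, Optional[Coordinate]],
                         digits: int = 6) -> List[List[str]]:
    """Group entity names that share the same (rounded) coordinates.
    Names whose lookup returned None are dropped, mirroring the rule that
    entities not found within Maoming are removed."""
    groups: Dict[Coordinate, List[str]] = defaultdict(list)
    for name, point in coords.items():
        if point is None:
            continue
        key = (round(point[0], digits), round(point[1], digits))
        groups[key].append(name)
    return list(groups.values())


# Placeholder lookup results for entities of one class.
lookup = {
    "浪漫海岸": (21.5, 111.3),
    "浪漫海岸旅游度假区": (21.5, 111.3),
    "放鸡岛": (21.4, 111.2),
    "南极长城站": None,
}
merged = merge_by_coordinates(lookup)
```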
2.7 Entity coding and source tracing
In the end we extracted 631 valid entities with 1,014 surface forms. We assigned a code to each entity, with all surface forms of the same entity sharing one code, and traced each entity back to its sources. In total this gave 10,515 links between entities and travelogues (different entities in one article count separately; repeated mentions of the same entity in one article are not double counted) and 20,434 links between surface forms and travelogues or reviews (different surface forms count separately; repeated mentions of the same surface form in one article are not double counted).
Finally, we arranged the results in the format required by the problem and saved them to result2-1.csv.
We also plotted the tourism products on a map of Maoming City according to their latitude and longitude.
2.8 Indicator system and popularity evaluation
We built a popularity evaluation system with three top-level indicators (participation popularity, response popularity, and publicity popularity) and computed the indicators. We then used AHP-TOPSIS to compute the indicator weights and rank the products, and saved the final popularity evaluation to result2-2.csv.
First-level indicator | Second-level indicator | Quantification method | Calculation |
Participation popularity | Number of reviews | Frequency | Total number of mentions across the 4 travelogue and review tables (travelogues are dated by departure time) |
Participation popularity | Active days | Frequency | Total number of days with mentions across the 4 travelogue and review tables |
Participation popularity | Density of surrounding culture-and-tourism products | Distance computed from latitude and longitude obtained from Baidu Maps | Sum of the reciprocals of the distances to other tourism products; distances < 1 km are counted as 1 km |
Response popularity | Tourist impression | Sentiment analysis | Sentiment analysis of each review, or of the part of a travelogue that describes the product |
Response popularity | Repurchase popularity | Frequency | Number of reviews of the product that contain repurchase markers such as "come again next time", "stayed many times", "the second time" |
Publicity popularity | Official account | Frequency | Number of mentions in official account articles |
Publicity popularity | Travelogue introductions | Frequency | Number of mentions in travelogue introductions (by publication time) |
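The TOPSIS half of AHP-TOPSIS can be sketched as follows. This is a generic textbook TOPSIS, not our exact competition code: the weights are assumed to come from an AHP step that is not shown, every indicator is treated as a benefit criterion (larger is better), and the numbers are made up.

```python
import math
from typing import List, Sequence


def topsis(matrix: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Return a closeness score in [0, 1] for each row (product).
    All columns are treated as benefit criteria."""
    n_cols = len(weights)
    # Vector-normalize each column, then apply the indicator weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n_cols)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_cols)] for row in matrix]
    # Ideal and anti-ideal solutions per column.
    best = [max(column) for column in zip(*weighted)]
    worst = [min(column) for column in zip(*weighted)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        total = d_best + d_worst
        scores.append(d_worst / total if total else 0.0)
    return scores


# Made-up indicator values: participation, response, publicity.
indicator_matrix = [
    [120.0, 0.9, 30.0],   # product A
    [40.0, 0.6, 10.0],    # product B
    [80.0, 0.8, 20.0],    # product C
]
ahp_weights = [0.5, 0.3, 0.2]  # placeholder weights standing in for the AHP result
scores = topsis(indicator_matrix, ahp_weights)
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
```

A product that is best on every indicator sits exactly on the ideal solution and scores 1, and one that is worst on every indicator scores 0, so the scores are easy to read as a yearly ranking.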
3. Questions 3 & 4: Product association mining and local tourism graph construction
3.1 Problem analysis and approach
Construction and analysis of the local tourism graph
Using the provided OTA and UGC data, run an association analysis on the tourism products extracted in Question 2, find strong association patterns centered on scenic spots, hotels, and restaurants, and save the results in the format of Table 4 as "result3.csv". On this basis, build the local tourism graph and choose suitable methods for visual analysis. Teams are encouraged to explore and explain hidden association patterns among tourism products.
Analysis of changes in demand for tourism products before and after the epidemic
Based on historical data, and using the local tourism graph as an analytical tool, analyze how tourism products in Maoming changed before and after COVID-19, and write a letter of no more than 2 pages to the local tourism authority with policy suggestions for developing the tourism industry.
Questions 3 and 4 require an association analysis of the tourism products extracted in Question 2. Beyond the strong association patterns between products, we mined implicit relationships from several dimensions such as geographic location and review features, and quantified the strength of each association. On this basis we built the local tourism graph and presented and analyzed it visually from both macro and micro perspectives.
First, we defined the association patterns on top of the data preparation and entity extraction of the previous stage. After quantifying and standardizing each pattern, we exported the results and stored them in a Neo4j graph database. Finally, we carried out visual analysis from macro and micro perspectives.
3.2 Association pattern definition and quantification
On top of the data preparation and entity extraction of the previous stage, we defined eight association patterns based on factors such as the level of local economic development and geographic location. Three of them are implicit association patterns (marked * in the table).
Association pattern | Quantification (before standardization) | Remarks |
Co-occurrence | When the counts for A and B are at least 3: P(AB)/P(A) + P(AB)/P(B) | |
Nearest neighbor | 1 / geographic distance | A geographic distance < 500 m is counted as 500 m (same below) |
Radiation | Popularity of the scenic spot / (density of similar products nearby · geographic distance^2)^(1/3) | Only between a scenic spot and a hotel or restaurant |
Competition | 1 / (economic distance · geographic distance^2)^(1/3) | Economic distance = \|popularity of A - popularity of B\|; only among hotels or among restaurants |
Diversion | (popularity of scenic spot A + popularity of scenic spot B) / geographic distance between A and B | Between scenic spots |
Similar restaurant style * | Similarity of names, titles, and review keywords | Between restaurants |
Similar hotel style * | Similarity of names, room types, and review keywords | Between hotels |
Historical and cultural connection * | Number of occurrences in the history-and-culture and high-level concept corpus | Corpus built from Baidu Baike entries on Maoming's history and culture and on its scenic spots |
The indicators were quantified, standardized, and saved to result3.csv.
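The explicit formulas in the table can be written out directly. This is a minimal sketch before standardization, with several assumptions: distances are in metres, counts are numbers of documents, the "at least 3" condition is read as applying to the counts of A and B individually, and a competition pair with identical popularity returns infinity because the table does not say how a zero economic distance is handled. All function names are hypothetical.

```python
import math

DISTANCE_FLOOR_M = 500.0  # distances below 500 m are counted as 500 m


def co_occurrence(n_a: int, n_b: int, n_ab: int, min_count: int = 3) -> float:
    """P(AB)/P(A) + P(AB)/P(B); the document total cancels, leaving count ratios."""
    if n_a < min_count or n_b < min_count:
        return 0.0
    return n_ab / n_a + n_ab / n_b


def nearest_neighbor(distance_m: float) -> float:
    """1 / geographic distance, with the 500 m floor."""
    return 1.0 / max(distance_m, DISTANCE_FLOOR_M)


def radiation(spot_popularity: float, nearby_density: float, distance_m: float) -> float:
    """Scenic spot popularity / (nearby density * distance^2)^(1/3)."""
    distance = max(distance_m, DISTANCE_FLOOR_M)
    return spot_popularity / (nearby_density * distance ** 2) ** (1.0 / 3.0)


def competition(popularity_a: float, popularity_b: float, distance_m: float) -> float:
    """1 / (economic distance * distance^2)^(1/3), economic distance = |pop A - pop B|."""
    distance = max(distance_m, DISTANCE_FLOOR_M)
    economic_distance = abs(popularity_a - popularity_b)
    if economic_distance == 0:
        return math.inf  # assumption: the source does not specify this case
    return 1.0 / (economic_distance * distance ** 2) ** (1.0 / 3.0)


def diversion(popularity_a: float, popularity_b: float, distance_m: float) -> float:
    """(popularity of A + popularity of B) / geographic distance."""
    return (popularity_a + popularity_b) / max(distance_m, DISTANCE_FLOOR_M)
```

For example, two scenic spots 200 m apart are treated as 500 m apart, so their nearest-neighbor value is capped at 1/500 rather than growing without bound as the distance shrinks.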
3.3 Construction and analysis of the tourism product association graph
We stored the tourism products and their associations in a Neo4j graph database and generated, for Maoming City before and after the outbreak, the tourism product association graph and the macro concept graph of local tourism.
Blue nodes are scenic spots, yellow nodes are restaurants, and pink nodes are hotels. For restaurant and hotel style similarity, the top 50 associations by strength are shown; for each of the other association types, the top 200 are shown.
Before the epidemic, the associations among scenic spots were relatively closer. After the epidemic, the associations between scenic spots and restaurants or hotels increased. Tourism tends to shift from touring many scenic spots to visiting a few in depth and exploring the surrounding restaurants and other tourism products.
3.4 Construction and analysis of the macro concept graph of tourism
The example "local tourism graph" given in the problem contains many macro concepts, such as "travel guide" and "accommodation", that are not purely relationships between products. Question 4 also needs analytical tools for changes in the macro tourism market before and after the epidemic. We therefore defined upper-level concepts, looked for popular tourism products and high-frequency words in the related reviews, and built a macro concept graph of tourism.
For each category of tourism product, the graph shows the 5 most popular products and the 8 most frequent keywords. For travelogue introductions, it shows the 10 most frequent keywords. Associations with history and culture and with high-level concepts are drawn on the graph as well.
The macro concept graph shows that coastal attractions such as Romantic Coast were hit hard by the epidemic and lost popularity, that the demand for restaurant flavors and the popular restaurant products changed, and that the service-related evaluations also changed somewhat.
3.5 Distribution of tourism products and suggestions
Many of Maoming's popular scenic spots are concentrated on the coast. Coastal scenic spots are Maoming's most popular and iconic attractions, but they are also among the types of scenic spot most affected by the epidemic.
With epidemic control becoming routine, Maoming's tourism industry could support the operation of coastal scenic spots while stepping up the development of emerging attractions that combine historical culture with rural character, establishing new "growth poles" for the tourism industry.
3.6 Analysis of tourism product trends before and after the epidemic, and suggestions
Based on a comparative analysis of product association patterns, changes in destination popularity, and hot words in reviews, we make the following suggestions:
- Tourism is shifting from "long trips that cover everything" to "in-depth, all-round exploration". Maoming can focus on building a short-trip tourism market and launch short-trip products with new activities and new experiences to attract more visitors from the surrounding area;
- The epidemic has raised the share of local tourists in the tourism market. Maoming can increase local advertising, enrich the forms of tourism products, and offer residents more high-quality "good places to go close to home";
- Maoming can use its geographic features and cultural heritage to attract visitors online, and design supporting software services that cover the whole scenario and the whole process to stimulate consumption.
4. Summary and outlook
4.1 Summary of the solution
1. WeChat official account article classification: First, we screened the culture-and-tourism topic words given in the problem and expanded them with synonyms to form a thesaurus. Next, we extracted corpus keywords with TextRank and TF-IDF separately and matched them against the thesaurus, while also applying rules that match the title and body directly. This gives 3 labels per article, and articles with 3 consistent labels become pseudo-labeled training data. Finally, we trained a classifier on a Chinese BERT pre-trained language model to decide the cases where the three methods disagree. The approach classified the official account articles well.
2. Tourism product extraction and popularity analysis: First, we filtered the relevant entities from the CLUENER fine-grained named entity recognition corpus and trained a BiLSTM-CRF model, ran named entity recognition (NER) on the 4 tables (travelogue introductions, scenic spot reviews, hotel reviews, and restaurant reviews), and, together with the LAC library, designed a screening and optimization method for recognizing tourism product entities. Second, we merged synonymous place names using a TextCNN classification model, the Single-Pass clustering algorithm, and geocoding, which gave 631 valid entities with 1,014 surface forms in total. Third, we built a popularity evaluation system with three indicators (participation popularity, response popularity, and publicity popularity) and computed the indicators. Finally, we used AHP-TOPSIS to compute the indicator weights and rank the products, giving the final popularity evaluation.
3. Product association mining and local tourism graph construction: First, on top of the data preparation and entity extraction of the previous stage, we defined eight association patterns based on factors such as the level of local economic development and geographic location, three of which are implicit. Next, we quantified and standardized each pattern, stored the tourism products and associations in a Neo4j graph database, and generated the tourism product association graph and the macro concept graph of local tourism for Maoming before and after the epidemic. A comparative analysis of product association patterns, changes in destination popularity, and hot words in reviews showed that tourists are shifting from "long trips that cover everything" to "in-depth, all-round exploration", that the restaurant industry has gained a better development trend and marketing opportunity, and that co-occurrence among tourism products is becoming more pronounced. Finally, based on these findings, we wrote a letter to the Maoming tourism administration with policy suggestions for developing Maoming tourism.
4.2 Shortcomings and future work
(1) Model training still has room for improvement
The performance of the TextCNN and BERT models depends on parameter settings, and both could be tuned further.
(2) Rule design can be refined further
The culture and tourism industry has many domain-specific characteristics. A more detailed thesaurus and more detailed program rules could further improve tasks such as classification and named entity recognition.
(3) The state of local tourism in Maoming can be explored further through field visits
At this stage the conclusions rest mainly on mining and analyzing travelogue and review data. On-site investigation would give a better understanding of the current state and problems of local tourism development, make the research more practical, and lead to more targeted and feasible suggestions.
4.3 Competition summary and lessons
In this competition we spent most of our energy on improving the solution. Many new problems surfaced while implementing it, so we kept revising and expanding, and more than once we sighed that there was "more and more to do". Too much time went into writing code, overall time control was not good enough, and the paper was rushed at the end. For schedule control, it may work better to let the member writing the paper track what is already available and what is still missing, and drive the progress.
Improving skills takes long-term accumulation: learn more and practice more. I learned many new methods and skills in this competition, and I also found many areas where I still need to keep learning.
5. Closing remarks
I started writing this summary right after the competition ended. At that time I did not expect to have the honor of reaching the defense round. After many days of preparation, we had the chance to present our solution and conclusions more clearly and had some meaningful exchanges with the judges.
The competition was a great experience: teammates who each had their own strengths, a serious and responsible advisor, a solution we kept refining, and the review discussions after the competition ……
Keep trying in the future!