Data mining scenario - false invoice
2022-07-23 12:19:00 【Emperor Confucianism is supreme】
This post gives a brief introduction to the basic business; the details of taxation will be added later.
One. Business analysis
1. What is false issuance of VAT invoices
(1) Without any actual purchase or sale of goods, or provision or receipt of taxable services: issuing special VAT invoices for others or for oneself, having others issue them for oneself, or introducing others to issue them;
(2) Where goods were actually purchased or sold, or taxable services actually provided or received: issuing special VAT invoices with a false quantity or amount for others or for oneself, having others issue such invoices for oneself, or introducing others to issue them;
(3) Carrying out actual business activities, but having others issue special VAT invoices for oneself.
2. Business analysis
Data models and machine learning algorithms analyze taxpayer risk profiles and find, in batches, tax-risk enterprises suspected of falsely issuing VAT invoices. Qualitative and quantitative labels describe the salient characteristics of risky taxpayers and form a risk portrait, which helps tax officers discover and identify them. The system provides label-model management and a risk inventory, and supports several portrait modes, such as group portraits and individual portraits.
Data-model analysis and machine learning also make combined use of invoice relationships and other relationships. One example is cross-employment among an enterprise's three or four key personnel (such as the legal representative, financial officer and tax officer). These relationships help judge entire gangs that falsely issue invoices and find, in batches, abnormal enterprises suspected of falsely issuing VAT invoices. Qualitative and quantitative labels describe the salient characteristics of the group of suspected taxpayers, helping tax officers discover and identify them.
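The relationship analysis above can be sketched, under simplifying assumptions, as a grouping problem. Enterprises are linked when they share a key person (in any role, which also captures cross-employment) or a phone number. Each connected group with more than one enterprise becomes a candidate gang. The sketch uses a union-find structure, and the input schema (`persons`, `phones`) is hypothetical. A real system would also bring in invoice-flow relationships between the enterprises.

```python
from collections import defaultdict


def find_groups(enterprises):
    """Group enterprises linked by a shared person or phone number.

    enterprises: dict enterprise_id -> {"persons": set of person IDs,
                                        "phones": set of phone numbers}
    (hypothetical schema). Returns a list of sets with more than one member.
    """
    parent = {e: e for e in enterprises}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Key -> first enterprise seen with it; persons and phones are kept in
    # separate namespaces so a person ID can never collide with a phone number.
    first_owner = {}
    for ent, info in enterprises.items():
        keys = [("person", p) for p in info.get("persons", ())]
        keys += [("phone", t) for t in info.get("phones", ())]
        for key in keys:
            if key in first_owner:
                union(ent, first_owner[key])
            else:
                first_owner[key] = ent

    groups = defaultdict(set)
    for ent in enterprises:
        groups[find(ent)].add(ent)
    return [g for g in groups.values() if len(g) > 1]
```

For example, suppose A and B share a phone number and person p2 works at both A and C. Then A, B and C form one group, while an unrelated D is dropped.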
Three. Types and characteristics of false VAT invoicing
Here I mainly introduce the types I have analyzed:
1. Abscond-type false invoicing
Abscond-type false invoicing, also known as "brute-force false invoicing", means the perpetrator falsely issues invoices and then either files no tax return or files one without paying the tax. Such issuers usually wage a guerrilla war: they use one company for a round and then move on to another.
Its characteristics: the perpetrators usually register multiple companies, often with other people's ID cards, and then issue false invoices intensively.
2. Invoice-goods separation false invoicing
In invoice-goods separation, the issuer does have a real transaction behind the false invoice, but the real buyer does not need an invoice. The issuer must declare and pay tax on that transaction whether or not an invoice is issued. It therefore transfers the invoice for the real transaction to someone else who needs one. Through this kind of substitution and deception, the issuer avoids declaring tax separately on the falsely issued invoices. Meanwhile, the downstream party uses them for input-tax deduction and/or pre-tax deduction.
The typical pattern is: company A sells goods to Li Si; Li Si does not need an invoice, so company A issues the invoice to company B instead. To borrow a popular comedy sketch: I go to a restaurant and order a bowl of fried noodles. Without eating it, I swap it with the owner for a bowl of noodle soup. When the owner asks me to pay, I say I traded the fried noodles for the noodle soup, and since I never ate the fried noodles, I don't have to pay. Can I really eat noodles for free this way? (Ha-ha, just an example.)
3. Tax-preference false invoicing
Tax-preference false invoicing is false invoicing that exploits preferential tax policies, or special policies similar to them. Examples include deemed (assessed) taxation, fiscal subsidies, the bonded system and purchase invoices for agricultural products. Its characteristic is that these policies spare the perpetrator from declaring and paying tax in full on the falsely issued invoices.
Four. What are the characteristics of false VAT invoicing
Based on the types and characteristics of falsely issued special VAT invoices and the corresponding data, we can list the following characteristics:
(1) The company name is often changed around invoicing, and most such companies are commercial (trading) enterprises (existing enterprises changing their names);
(2) A large number of invoices are voided after being issued (this also involves other factors);
(3) Most of the company's special invoices are issued at the maximum amount, with full-amount invoices exceeding 90% (as management tightens, the invoice limits are being lowered);
(4) The registration information is identical: the enterprises' legal representatives, financial staff and tax personnel are mostly the same people;
(5) The names of the goods a trading company purchases deviate seriously from those of the goods it sells;
(6) The enterprise has applied many times to increase its invoice quota;
(7) There are a large number of red-ink ordinary invoices, or red-ink invoices are issued at will to offset blue-ink invoices of previous years (check whether the red-ink invoice falls within the current month or crosses months; red-ink invoices carry negative amounts);
(8) Capital or inventory turns over more than five times per month;
(9) The amount of VAT invoices issued rises suddenly within a certain period;
(10) The enterprise was established recently, mostly within half a year, yet its business scale has expanded rapidly;
(11) The registered address is usually a room on some floor of a residential building, obviously unsuitable for external business;
(12) The legal representative's registered residence is not local, and enterprise establishment by such legal representatives is abnormally concentrated;
(13) Production energy consumption, such as electricity, is seriously inconsistent with sales (to be determined);
(14) The registered capital is mostly only subscribed, or the paid-in capital is mostly low;
(15) Multiple enterprises share the same registered legal representative, and the mobile phone number in their tax registration information is also the same;
(16) Several enterprises register for tax, or are recognized as general taxpayers, consecutively or at the same time;
(17) The company belongs to an industry at high risk of false invoicing;
(18) The legal representative or financial officer once served as the legal representative or financial officer of an abnormal account, and the two hold each other's roles across enterprises;
(19) Many labor-service invoices are issued (this should be judged together with individual income tax payments);
(20) Invoices are issued at night (criminals are also "progressing" and making themselves look more like normal enterprises).
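Several of these characteristics are simple ratios over an enterprise's invoice records. The sketch below computes two of them: (3), the share of full-amount invoices checked against the 90% rule, and (20), the share of night-time invoices. It is a sketch under assumptions. The `(issue_time, amount)` record schema is hypothetical, and so is the `near_full` tolerance for counting an invoice as "full amount". The 22:00 to 06:00 night window is an assumption too, since the post does not define one.

```python
from datetime import datetime


def invoice_flags(invoices, max_amount, near_full=0.99, full_ratio_threshold=0.9,
                  night_start=22, night_end=6):
    """Compute indicators (3) and (20) for a single enterprise.

    invoices: list of (issue_time, amount) tuples (hypothetical schema).
    max_amount: the enterprise's per-invoice upper limit.
    near_full: an invoice counts as "full amount" if amount >= near_full * max_amount
               (assumed tolerance, since ceilings are often just below a round number).
    night_start / night_end: assumed night window, in 24-hour clock hours.
    """
    if not invoices:
        return {"full_ratio": 0.0, "night_ratio": 0.0, "full_amount_flag": False}
    n = len(invoices)
    # Indicator (3): share of invoices issued at (or very near) the ceiling.
    full = sum(1 for _, amount in invoices if amount >= near_full * max_amount)
    # Indicator (20): share of invoices issued inside the night window,
    # which wraps past midnight (e.g. 22:00 to 06:00).
    night = sum(1 for issued, _ in invoices
                if issued.hour >= night_start or issued.hour < night_end)
    full_ratio = full / n
    return {
        "full_ratio": full_ratio,
        "night_ratio": night / n,
        # Rule of thumb from characteristic (3): full-amount invoices above 90%.
        "full_amount_flag": full_ratio > full_ratio_threshold,
    }


if __name__ == "__main__":
    sample = [
        (datetime(2022, 7, 1, 23, 5), 99990.0),
        (datetime(2022, 7, 2, 10, 0), 99995.0),
        (datetime(2022, 7, 3, 2, 30), 100000.0),
        (datetime(2022, 7, 4, 14, 0), 20000.0),
    ]
    print(invoice_flags(sample, max_amount=100000.0))
```

Ratios like these serve as quantitative labels. As the notes on (3) and (20) say, they should be combined with other characteristics rather than used alone, because limits change and offenders adapt.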
Five. Building the algorithm model
Across the various tax-evasion cases, the most obvious and easiest thing to check is a mismatch between the goods on an enterprise's purchase invoices and those on its sales invoices. The algorithm model is therefore built for this scenario.
(1) Business understanding:
A normal enterprise carries out business and production activities, so it has purchase and sales records. It buys goods relevant to its business scope (the input set) and sells goods relevant to its business scope to the market (the output set). Seen this way, a normal enterprise's input set and output set are related. If an enterprise's inputs and outputs have no or little correlation, it is probably operating abnormally, and the invoices it issues are probably false. One example from taxation involves enterprises that falsely issue invoices or swap invoices. They obtain large numbers of special VAT invoices for tax-reduced or tax-exempt goods, or illegally issue such invoices to downstream parties, who deduct them to evade tax. Another example involves export tax rebate enterprises. Judging from the goods they purchase, the goods they should be exporting carry a different tax rate from the goods they declare, which lets them fraudulently obtain tax refunds and exemptions.
(2) Algorithm selection:
Word2Vec maps the goods an enterprise purchases and sells to semantic word vectors. On this basis, an improved similarity algorithm finds abnormal invoice-swapping enterprises. The approach models the correlation between an enterprise's purchased and sold commodity sets and scores each enterprise to judge whether its activity is reasonable. The higher an enterprise's score, the more normal it is; the lower the score, the more abnormal. The purchase and sales sets consist of the goods bought and sold together with their amounts, so the commodity is the smallest unit of both sets. We therefore start from the correlation between individual commodities and, based on it, derive the correlation between purchases and sales.
(3) Analysis:
Generally speaking, the goods a normal enterprise purchases and the goods it sells are closely connected. Based on this assumption, Word2Vec represents each commodity as an n-dimensional real vector, so that the correlation between vectors reflects the correlation between commodities. The original Word2Vec processes natural language and analyzes the correlation between words. Here, therefore, each commodity is treated as a word, and commodity sequences are constructed.
Each enterprise is treated as a sentence: its purchased and sold goods together form its commodity sequence. Once every enterprise's sequence is built, the sequences are fed to Word2Vec, which outputs an n-dimensional vector v for each commodity. Finally, the cosine similarity measures the correlation between two different commodities p and q:
$$\mathrm{sim}(p,q) = \cos(\mathbf{v}_p, \mathbf{v}_q) = \frac{\mathbf{v}_p \cdot \mathbf{v}_q}{\lVert \mathbf{v}_p \rVert \, \lVert \mathbf{v}_q \rVert}$$
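The sequence construction and the cosine measure can be sketched as follows. The `(enterprise_id, commodity_name)` record schema is hypothetical. Training uses the gensim library (version 4 or later) as an assumed choice of Word2Vec implementation, imported lazily so the rest of the sketch runs without it. The hyperparameters (`dim`, `window`, `seed`) are illustrative, not values from the post.

```python
import math


def build_sequences(records):
    """Group purchase/sale lines into one commodity 'sentence' per enterprise.

    records: iterable of (enterprise_id, commodity_name) pairs covering both the
    enterprise's purchases and its sales (hypothetical schema); order is kept.
    """
    sentences = {}
    for enterprise_id, commodity in records:
        sentences.setdefault(enterprise_id, []).append(commodity)
    return list(sentences.values())


def train_commodity_vectors(sentences, dim=100):
    """Train Word2Vec on the commodity sentences (assumes gensim >= 4 is installed)."""
    from gensim.models import Word2Vec  # optional dependency, imported lazily
    model = Word2Vec(sentences=sentences, vector_size=dim, window=5,
                     min_count=1, workers=1, seed=42)
    return {c: model.wv[c] for c in model.wv.index_to_key}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

In gensim, `window` bounds how far apart two words in the same sentence can be and still count as context. An enterprise's commodity list has no natural word order, so one design choice is to set `window` to at least the longest sequence length, letting every commodity in an enterprise share context.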
Once the correlation between commodities is known, it is used, together with the amounts involved, to measure the correlation between each enterprise's purchased and sold goods. Let G be the enterprise's input (purchase) set and X its output (sales) set. G and X are paired as follows. For each commodity p in G, find the most similar q in X, forming the set GX1 = {<p, q>}. For each commodity q in X, find the most similar p in G, forming GX2 = {<p, q>}. The union of GX1 and GX2 gives GX. The measure between G and X is then as follows:
where sim(p, q) is the correlation between the vector of purchased commodity p and the vector of sold commodity q, min is the smaller of p's purchase amount and q's sales amount, and max is the larger of the two.
This gives the correlation between each enterprise's purchased and sold commodity sets, which is used to judge whether the enterprise is abnormal. If sim(G, X) is below a given threshold, the enterprise is considered abnormal; otherwise it is considered normal. This correlation can also serve as a normality score for each enterprise.
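The pairing and scoring steps described above can be sketched as follows. The aggregation over GX is an assumption: the sketch takes the mean of sim(p, q) · min/max over all pairs in GX. The `purchases`/`sales` dictionaries (commodity to amount) and the `vectors` mapping are hypothetical inputs. The threshold is left as a parameter because the post does not give a value.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def purchase_sale_score(purchases, sales, vectors):
    """Score how well an enterprise's purchases G match its sales X.

    purchases, sales: dict commodity -> amount (hypothetical schema for G and X).
    vectors: dict commodity -> embedding vector (e.g. trained with Word2Vec).
    ASSUMPTION: score = mean over GX of sim(p, q) * min(amount) / max(amount).
    """
    if not purchases or not sales:
        return 0.0

    def most_similar(item, candidates):
        return max(candidates, key=lambda c: cosine(vectors[item], vectors[c]))

    # GX1: each purchased p paired with its most similar sold q.
    gx1 = {(p, most_similar(p, sales)) for p in purchases}
    # GX2: each sold q paired with its most similar purchased p.
    gx2 = {(most_similar(q, purchases), q) for q in sales}
    gx = gx1 | gx2  # the union GX

    total = 0.0
    for p, q in gx:
        small, large = sorted((purchases[p], sales[q]))
        amount_ratio = small / large if large > 0 else 0.0
        total += cosine(vectors[p], vectors[q]) * amount_ratio
    return total / len(gx)


def is_abnormal(score, threshold):
    """Flag an enterprise whose purchase/sale correlation is below the threshold."""
    return score < threshold
```

With toy two-dimensional vectors, an enterprise that buys steel and sells bolts (similar vectors, amounts 100 and 80) scores high. One that buys steel and sells tea (orthogonal vectors) scores 0 and would be flagged at any positive threshold. In practice, the threshold might be chosen from the score distribution across enterprises; that choice is an assumption here, not part of the original method.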