DGraph: A Large-Scale Dynamic Graph Dataset
2022-07-04 13:12:00 (Zhiyuan Community)
Recently, Yang Yang's research group at Zhejiang University (yangy.org) and Xinye Technology jointly released DGraph, a large-scale dynamic graph dataset. It is aimed at researchers working on graph neural networks, graph mining, social networks, and anomaly detection, and provides large-scale data from a real-world scenario. DGraph can serve both as benchmark data for validating the performance of graph models and as material for research such as user profiling and network analysis.
Dataset homepage: https://dgraph.xinye.com/
GitHub: https://github.com/DGraphXinye/
Related paper:
DGraph: A Large-Scale Financial Dataset for Graph Anomaly Detection. Xuanwen Huang, Yang Yang*, Yang Wang, Chunping Wang, Zhisheng Zhang, Jiarong Xu, and Lei Chen. Preprint, 2022. (http://yangy.org/works/dgraph/dgraph_2022.pdf)
Dataset description

The source data of DGraph is provided by Xinye Technology. DGraph is a directed, unweighted dynamic graph containing more than 3.7 million nodes and 4.3 million dynamic edges. As shown in the figure, nodes in DGraph represent financial lending users served by Xinye Technology, and a directed edge represents an emergency-contact relationship. Each node carries desensitized (anonymized) attribute features and a label indicating whether the user is a financial fraudster.
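To make this structure concrete, the following is a minimal Python sketch of a directed, unweighted dynamic graph stored as a timestamped edge list. It does not read the released DGraph files; the toy edges, the `(src, dst, t)` tuple layout, and the helper `build_adjacency` are hypothetical and only illustrate how each emergency-contact edge keeps its direction and time but no weight.

```python
from collections import defaultdict

# Hypothetical toy edges (src, dst, t): user `src` lists user `dst` as an
# emergency contact at integer time step `t`. Edges are directed and carry
# no weight; this is NOT the layout of the released DGraph files.
edges = [(0, 1, 3), (0, 2, 5), (2, 1, 7), (3, 0, 9)]


def build_adjacency(edge_list):
    """Build out- and in-neighbour lists, keeping each edge's timestamp."""
    out_adj = defaultdict(list)  # src -> [(dst, t), ...]
    in_adj = defaultdict(list)   # dst -> [(src, t), ...]
    for src, dst, t in edge_list:
        out_adj[src].append((dst, t))
        in_adj[dst].append((src, t))
    return out_adj, in_adj


out_adj, in_adj = build_adjacency(edges)
print("out-degree of node 0:", len(out_adj[0]))  # 2
print("in-degree of node 1:", len(in_adj[1]))    # 2
```

Storing both directions lets a model aggregate over either the contacts a user lists or the users who list that user; which direction is more informative is left to experiment.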
Data features
Real-world scenario
DGraph comes from a real financial business scenario, and its construction logic is close to industrial deployment, giving dataset users an opportunity to explore how graph models can be extended to the financial domain. Specifically, the ratio of abnormal to normal users in DGraph is about 1:100. This label imbalance matches real-world conditions and supports research on anomaly detection and imbalanced node classification.
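One common way to cope with a roughly 1:100 label ratio is to weight the loss by inverse class frequency. The Python sketch below uses the formula n_samples / (n_classes * count_c), the same one scikit-learn applies for `class_weight="balanced"`. The toy labels (0 = normal, 1 = fraud) and the helper names are hypothetical, and this is only one option among several (resampling and focal loss are others).

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_bce(probs, labels, weights, eps=1e-12):
    """Average binary cross-entropy, each sample scaled by its class weight."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += weights[y] * loss
    return total / len(labels)


# Hypothetical toy labels with the ~1:100 fraud-to-normal ratio described above.
labels = [0] * 100 + [1]
weights = balanced_class_weights(labels)
print(weights)  # the fraud weight is about 100x the normal weight
print(weighted_bce([0.5] * len(labels), labels, weights))
```

Because these weights average to 1 over the training labels, the overall loss stays on the same scale as the unweighted loss while each fraud sample counts about as much as 100 normal ones.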
Dynamic structure
The user relationships in DGraph are sampled from a business scenario spanning 27 months, and the network structure evolves over time, providing data for current research on dynamic graph models and graph mining.
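Because every edge carries a time, one simple way to feed such data to snapshot-based dynamic graph models is to cut cumulative snapshots at chosen time boundaries. The sketch below assumes integer timestamps and the same hypothetical `(src, dst, t)` edge tuples; it does not reflect DGraph's actual time units or file format, and `snapshot` and `cumulative_snapshots` are illustrative helpers.

```python
def snapshot(edges, t_max):
    """All edges observed up to and including time t_max."""
    return [(s, d, t) for s, d, t in edges if t <= t_max]


def cumulative_snapshots(edges, boundaries):
    """One cumulative snapshot per time boundary, in the order given."""
    return [snapshot(edges, b) for b in boundaries]


# Hypothetical toy edges (src, dst, t) with integer time steps.
edges = [(0, 1, 3), (0, 2, 5), (2, 1, 7), (3, 0, 9)]
boundaries = [0, 5, 10]
for b, snap in zip(boundaries, cumulative_snapshots(edges, boundaries)):
    print(f"t <= {b}: {len(snap)} edges")
```

Each call rescans every edge, which keeps the sketch short; for millions of edges one would instead sort the edges by time once and grow each snapshot incrementally.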
Large scale
DGraph contains 3.7 million desensitized real financial lending users and 4.3 million dynamic relationships. Its scale is about 17 times that of Elliptic, the largest dynamic graph dataset in the financial domain, so it supports research on and evaluation of large-scale graph models. In addition, about 60% of DGraph's nodes are "background nodes": nodes that are not targets of classification or analysis but actually exist and indirectly affect business logic. These nodes play an important role in maintaining the connectivity of the network and are widespread in industry. Handling background nodes sensibly can effectively save storage space and improve model efficiency in large-scale data scenarios. DGraph contains more than 2 million background nodes, enabling researchers to explore their properties.
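The connectivity role of background nodes can be illustrated with a small Python sketch that counts weakly connected components with a union-find structure, first on the full graph and then with background nodes dropped. The `BACKGROUND = -1` label marker, the toy chain graph, and the helper names are hypothetical; the point is only that background nodes can be excluded from the training targets while staying in the graph.

```python
BACKGROUND = -1  # hypothetical label marker for background nodes


def count_weak_components(num_nodes, edges, keep):
    """Weakly connected components among nodes with keep[u] == True."""
    parent = list(range(num_nodes))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for s, d in edges:
        if keep[s] and keep[d]:  # ignore direction; skip edges touching dropped nodes
            rs, rd = find(s), find(d)
            if rs != rd:
                parent[rs] = rd
    return len({find(u) for u in range(num_nodes) if keep[u]})


# Hypothetical toy chain 0 -> 1 -> 2 -> 3 -> 4, where nodes 1 and 3 are background.
labels = [0, BACKGROUND, 1, BACKGROUND, 0]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]

all_nodes = [True] * len(labels)
targets = [y != BACKGROUND for y in labels]  # nodes that receive a loss / metric

print(count_weak_components(len(labels), edges, all_nodes))     # 1
print(count_weak_components(len(labels), edges, targets))       # 3
print([i for i, is_target in enumerate(targets) if is_target])  # [0, 2, 4]
```

In a graph neural network pipeline, one option this suggests is to compute the loss only over the target nodes while still passing messages through background nodes, so the graph stays connected.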
Open-source community maintenance
Leaderboard
DGraph provides a performance leaderboard that users can submit to at any time and that is continually refreshed, tracking research progress on the latest graph models. The leaderboard uses a unified evaluation process, and all results are open and transparent.
Research results
DGraph has rich characteristics and supports graph research in multiple directions.
Algorithm contest
Built around DGraph, Xinye Technology held the 7th Xinye Technology Cup Algorithm Competition, whose task is consistent with fraudulent-user identification in DGraph. The competition is open to the public: universities in China and abroad, research institutes, and Internet companies can all register. The prize pool is generous, totaling 310,000 yuan.
Interested colleagues are welcome to visit the DGraph public data website and work together to provide rich application data for the field of artificial intelligence and to build an open digital ecosystem.