Learning notes of SAS programming and data mining business case 19
2022-07-05 20:51:00 【Full stack programmer webmaster】
These notes continue the study of 《SAS Programming and Data Mining Business Cases》 and focus on data-processing practice. They cover hash objects, user-defined formats, and regular expressions.
One: Hash objects
A hash object, also called a hash table, is a data structure that can be accessed directly by key value.
SAS provides two classes for working with hash tables: hash, which stores the data, and hiter, which traverses it. The hash class provides methods to find, add, modify, and delete entries; hiter provides positioning and traversal methods such as first and next.
Advantages: key lookups happen in memory, which improves performance;
hash tables can be used inside a DATA step to add, update, or delete entries dynamically;
a hash table locates data very quickly, which reduces the number of lookups.
Commonly used methods:
definekey: defines the key variable(s)
definedata: defines the data (value) variable(s)
definedone: completes the definition, after which data can be loaded
add: adds a key/data pair; if the key already exists in the hash table, the existing entry is left unchanged and a nonzero return code is returned
replace: if the key already exists in the hash table, replaces its data; otherwise adds the key/data pair
remove: removes a key/data pair
find: looks up the key; if it exists, writes the stored data values to the corresponding DATA step variables
check: looks up the key; if it exists, returns rc=0 without changing the values of the current variables
output: writes the hash table to a data set
clear: empties the hash table without deleting the object
equal: determines whether two hash objects are equal
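The difference between add, replace, check, and find is easiest to see on a tiny table. The following is a minimal sketch, not taken from the book: the variables code and price and the output data set work.prices are hypothetical names chosen for illustration.

```sas
data _null_;
  length code $ 4 price 8;              /* hypothetical key and data variables */
  declare hash h();
  rc = h.definekey('code');
  rc = h.definedata('code', 'price');
  rc = h.definedone();

  code = 'A001'; price = 10; rc = h.add();      /* new key: stored, rc = 0 */
  code = 'A001'; price = 99; rc = h.add();      /* key exists: entry unchanged, rc is nonzero */
  put 'second add: ' rc=;
  code = 'A001'; price = 12; rc = h.replace();  /* key exists: data overwritten */
  code = 'B002'; price = 20; rc = h.replace();  /* key absent: pair is added */

  code = 'A001'; price = .;
  rc = h.check();                               /* key found, price is left missing */
  put 'after check: ' rc= price=;
  rc = h.find();                                /* key found, price is filled from the table */
  put 'after find:  ' rc= price=;

  rc = h.remove(key: 'B002');                   /* delete one key/data pair */
  rc = h.output(dataset: 'work.prices');        /* write the remaining entries to a data set */
run;
```

Assigning each method call to rc keeps a failed add from being reported as an error in the log, so the return code can be inspected instead.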
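Example of the find method:
libname chapt12 'f:\data_model\book_data\chapt12';
data results;
  /* never executed; only adds the variables of participants to the PDV at compile time */
  if _n_ = 0 then set chapt12.participants;
  if _n_ = 1 then do;
    /* load participants into the hash table once */
    declare hash h(dataset: 'chapt12.participants');
    h.definekey('name');
    h.definedata('gender', 'treatment');
    h.definedone();
  end;
  set chapt12.weight;
  /* keep only the weight records whose name is found in the hash table */
  if h.find() = 0 then
    output;
run;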
Example of the hiter object:
data patients;
  length patient_id $ 16 discharge 8;
  input patient_id discharge:date9.;
  datalines;
smith-4123 15mar2004
hagen-2834 23apr2004
smith-2437 15jan2004
flinn-2940 12feb2004
;
run;
data _null_;
  if _n_ = 0 then set patients;
  /* ordered:"ascending" makes the iterator return entries in key order */
  declare hash ht(dataset: "patients", ordered: "ascending");
  ht.definekey("patient_id");
  ht.definedata("patient_id", "discharge");
  ht.definedone();
  declare hiter iter("ht");
  rc = iter.first();
  do while (rc = 0);
    put patient_id discharge:date9.;
    rc = iter.next();
  end;
run;
The statement declare hiter iter("ht"); defines an iterator iter for the hash table ht. The first method positions the iterator on the first entry of the hash table, and the next method then steps through the remaining entries; each entry is written to the log.
Business practice: merging two data sets:
data both1(drop=rc);
  declare hash plan();
  rc = plan.definekey('plan_id');
  rc = plan.definedata('plan_desc');
  rc = plan.definedone();
  /* load every plan into the hash table */
  do until (eof1);
    set chapt12.plans end = eof1;
    rc = plan.add();
  end;
  /* look up the plan description for every member */
  do until (eof2);
    set chapt12.members end = eof2;
    call missing(plan_desc);
    rc = plan.find();
    output;
  end;
  stop;
run;
The program above can be simplified to:
data both2(drop=rc);
  /* plan_desc is a description, so it is declared as character here */
  length plan_id 3 plan_desc $ 20;
  if _n_ = 1 then do;
    declare hash h(dataset: 'chapt12.plans');
    h.definekey('plan_id');
    h.definedata('plan_desc');
    h.definedone();
    call missing(plan_desc);
  end;
  set chapt12.members;
  rc = h.find();
run;
Two: Formats
Defining your own formats:
proc format;
  value $sex_fmt
    'F' = 'Female'
    'M' = 'Male'
    other = 'Unknown';
  value age_dur
    low-10 = "10 and under"
    11-13 = "11-13 years"
    14-<15 = "14-15"
    15-high = "15 and older";
run;
Application:
data test;
  set sashelp.class(keep=sex age);
  /* a format name must end with a period when it is passed to put() */
  x = put(sex, $sex_fmt.);
  y = put(age, age_dur.);
run;
Three: Regular expressions
/…/ Delimiters that mark the beginning and end of a regular expression
| Alternation ("or") between several alternatives
() Grouping; marks the beginning and end of a subexpression
. Any character except a line break
\w Any word character: digits, upper and lower case letters, and the underscore
\W Any non-word character
\s Any whitespace character, including spaces, tabs, newlines, carriage returns, full-width (Chinese) spaces, and so on
\S Any non-whitespace character
\d Any digit 0-9
\D Any non-digit character
[…] Any one of the characters listed inside the brackets
[^…] Any one character not listed inside the brackets
[a-z] Any character in the range a to z
[^a-z] Any character not in the range a to z
^ Matches the start of the input string
$ Matches the end of the input string
\b A word boundary (the start or end of a word)
\B A non-word boundary
* Matches zero or more times
+ Matches one or more times
? Matches zero times or once
{n} Matches exactly n times
{n,} Matches n or more times
{n,m} Matches n to m times
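A few of these metacharacters can be tried out directly with prxmatch, which returns 0 when nothing matches. This is a small sketch with made-up input lines; the variable names has_digits, starts_cat, and only_letters are hypothetical.

```sas
data _null_;
  length text $ 40;
  input text $char40.;
  /* \d{3}: three digits in a row anywhere in the string */
  has_digits   = prxmatch('/\d{3}/', text) > 0;
  /* ^ anchors at the start, | chooses between cat and dog, \b requires the word to end there */
  starts_cat   = prxmatch('/^(cat|dog)\b/', text) > 0;
  /* [a-z]+ lower case letters only; \s*$ allows the trailing blanks that pad the variable */
  only_letters = prxmatch('/^[a-z]+\s*$/', text) > 0;
  put text= has_digits= starts_cat= only_letters=;
  datalines;
cat 123
category 5
hello
;
run;
```

Character variables are padded with blanks to their declared length, which is why the last pattern ends in \s*$ rather than a bare $.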
Commonly used functions:
prxparse Compiles a regular expression and returns its pattern identifier
prxmatch Returns the position of the first match of the pattern
call prxsubstr Returns the start position and length of the match in the target string
prxposn Returns the text matched by a subexpression (capture group) of the regular expression
call prxposn Returns the start position and length of the text matched by a subexpression
call prxnext Returns the position and length of successive matches of the pattern in the target string
prxchange Replaces the text that matches the pattern (function form, returns the new string)
call prxchange Replaces the text that matches the pattern (call routine form)
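The function and call routine forms differ mainly in how results come back. The sketch below, using a made-up name value, shows the function form of prxchange together with call prxposn; the variables swapped, start, and len are hypothetical.

```sas
data _null_;
  length name swapped $ 32;
  name = 'Jones, Fred';

  /* function form: returns the changed string; $1 and $2 are the capture groups */
  swapped = prxchange('s/(\w+), (\w+)/$2 $1/', -1, name);
  put swapped=;

  /* call prxposn: after a successful match, returns where a capture group
     starts and how long it is, instead of the matched text itself */
  re = prxparse('/(\w+), (\w+)/');
  if prxmatch(re, name) then do;
    call prxposn(re, 2, start, len);
    first = substr(name, start, len);
    put start= len= first=;
  end;
run;
```

A times argument of -1 tells prxchange to replace every match rather than only the first one.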
eg1:
data _null_;
  /* Perl-style patterns are compiled with prxparse and searched with prxmatch */
  if _n_ = 1 then pattern_num = prxparse("/cat/");
  retain pattern_num;
  input string $30.;
  position = prxmatch(pattern_num, string);
  file print;
  put pattern_num= string= position=;
  datalines;
there is a cat in this line.
does not match cat
cat in the beginning
at the end, a cat
cat
;
run;
eg2: Data validation
data match_phone;
  set chapt12.phone_numbers;
  /* (ddd), an optional space, then ddd-dddd */
  if _n_ = 1 then pattern = prxparse("/\(\d\d\d\) ?\d\d\d-\d{4}/");
  retain pattern;
  if prxmatch(pattern, phone) gt 0 then output;
run;
Find the phone numbers that do not match the pattern:
data unmatch_phone;
  set chapt12.phone_numbers;
  where not prxmatch("/\(\d\d\d\) ?\d\d\d-\d{4}/", phone);
run;
eg3: Extract the string that matches a pattern
data extract;
  if _n_ = 1 then do;
    pattern = prxparse("/\(\d\d\d\) ?\d\d\d-\d{4}/");
    if missing(pattern) then do;
      put "error in compiling regular expression";
      stop;
    end;
  end;
  retain pattern;
  length number $ 15;
  input string $char80.;
  /* only the first match on each line is returned */
  call prxsubstr(pattern, string, start, length);
  if start gt 0 then do;
    number = substr(string, start, length);
    number = compress(number, " ");
    output;
  end;
  keep number;
  datalines;
this line does not have any phone numbers on it
this line does: (123)345-4567 la di la di la
also valid (123) 999-9999
two numbers here (333)444-5555 and (800)123-4567
;
run;
eg4: Extracting first and last names
data ReversedNames;
  input name & $32.;
  datalines;
Jones, Fred
Kavich, Kate
Turley, Ron
Dulix, Yolanda
;
run;
data FirstLastNames;
  length first last $ 16;
  keep first last;
  retain re;
  if _N_ = 1 then
    re = prxparse('/(\w+), (\w+)/');
  set ReversedNames;
  if prxmatch(re, name) then
    do;
      last = prxposn(re, 1, name);
      first = prxposn(re, 2, name);
    end;
run;
Note: 1 and 2 refer to the first and second groups in the regular expression.
eg5: Extracting names that fit a pattern
data old;
  input name $60.;
  datalines;
Judith S Reaveley
Ralph F. Morgan
Jess Ennis
Carol Echols
Kelly Hansen Huff
Judith
Nick
Jones
;
run;
data new;
  length first middle last $ 40;
  /* re1: first name, optional middle name, last name */
  re1 = prxparse('/(\S+)\s+([^\s]+\s+)?(\S+)/o');
  /* re2: experimental variant from the original notes, kept as written */
  re2 = prxparse('/(\S+)(\s+)([^\s]+\s+)(?)(\S+)/o');
  set old;
  id1 = prxmatch(re1, name);
  id2 = prxmatch(re2, name);
  if id1 then
    do;
      first = prxposn(re1, 1, name);
      middle = prxposn(re1, 2, name);
      last = prxposn(re1, 3, name);
    end;
  /* re1 defines only three groups, so group 4 is assumed to be meant for re2 */
  if id2 then test = prxposn(re2, 4, name);
  put test=;
run;
eg6: Return the positions of multiple matches
data _null_;
  expressionid = prxparse('/[crb]at/');
  text = 'the woods have a bat, cat, and a rat!';
  start = 1;
  stop = length(text);
  call prxnext(expressionid, start, stop, text, position, length);
  do while (position > 0);
    found = substr(text, position, length);
    put found= position= length=;
    call prxnext(expressionid, start, stop, text, position, length);
  end;
run;
Note: the first call prxnext returns one position. The loop then extracts the matching substring and calls prxnext again to get the position of the next match, until no further match is found.
eg7: Replacing text
data cat_and_mouse;
  input text $char40.;
  length new_text $ 80;
  if _n_ = 1 then match = prxparse("s/[Cc]at/mouse/");
  retain match;
  /* -1 replaces every match; trunc is set when new_text is too short for the result */
  call prxchange(match, -1, text, new_text, len, trunc, num);
  if trunc then put "note: new_text was truncated";
  datalines;
the Cat in the hat
there are two cat cats in this line
here is no replacement
;
run;
Copyright notice: This is an original blog article and may not be reproduced without the author's consent.
Publisher: Full stack programmer webmaster. When reprinting, please cite the source: https://javaforall.cn/117664.html Link to the original text: https://javaforall.cn