Using an external script in Kettle to clean, deduplicate and consolidate phone numbers
2022-07-27 16:28:00 【黑暗料理界的扛把子】
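The project needs to judge whether each phone number is plausible and to remove duplicates.
Today, while calling a Java script from the project for data cleaning, the program threw all sorts of errors as soon as a List<String> was created.
Advice from anyone more experienced is welcome.
I then followed 张小凡vip's Kettle case study 4, on processing data with a Java script, which solved the problem.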
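1. Phone cleaning
Phone numbers fall into two kinds: landlines and mobile numbers.
The cleaning rules are as follows:
Use a regular expression to extract all the digits; delete every 0 that precedes the first non-zero digit; then judge the remaining digits:
Fewer than 8 digits: delete;
8 digits: keep if it starts with 5, 6 or 8; otherwise it is an invalid landline, so delete it;
9 digits: if it starts with 1 and the second digit is 5, 6 or 8, it is a malformed local landline, so keep the last 8 digits; if it starts with 1 and the second digit is not 5, 6 or 8, it is a malformed mobile number, so delete it. If it does not start with 1, it is an irregular landline number, so prepend 0;
10 digits: if the 1st digit is 1 and the 2nd digit is 0 or 1, then a 3rd digit of 5, 6 or 8 means a malformed local landline (keep the last 8 digits), and any other 3rd digit means a malformed mobile number (delete); if the 1st digit is 1 and the 2nd digit is neither 0 nor 1, it is a malformed mobile number, so delete it. If the 1st digit is not 1, it is a landline number, so prepend 0;
11 digits: if the 1st digit is 1, digits 2 and 3 are 00, 10, 11 or 01, and the 4th digit is 5, 6 or 8, it is a local landline, so keep the last 8 digits; any other number starting with 1 is a mobile number. If the 1st digit is not 1, it is a non-local landline, so prepend 0;
More than 11 digits: if the 1st digit is 1, it is a malformed mobile number; if the 1st digit is not 1, it is a non-local landline, so prepend 0;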
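The normalization step and the 11-digit rule above can be sketched in isolation. This is only a minimal illustration, not the project's code: the class name `PhoneRuleSketch`, the methods `normalize` and `cleanElevenDigits`, and the sample numbers are all made up for this example.

```java
import java.util.regex.Pattern;

public class PhoneRuleSketch {
    // Hypothetical pattern for the 11-digit local-landline case:
    // 1, then two digits that are each 0 or 1, then 5/6/8, then seven more digits.
    private static final Pattern LOCAL_LANDLINE_11 = Pattern.compile("1[01][01][568]\\d{7}");

    // Keep digits only, then drop every leading zero.
    public static String normalize(String raw) {
        return raw.replaceAll("[^0-9]", "").replaceFirst("^0+", "");
    }

    // Apply the 11-digit rule to an already normalized 11-digit number.
    public static String cleanElevenDigits(String digits) {
        if (digits.charAt(0) != '1') {
            return "0" + digits;          // non-local landline: prepend 0
        }
        if (LOCAL_LANDLINE_11.matcher(digits).matches()) {
            return digits.substring(3);   // local landline: keep the last 8 digits
        }
        return digits;                    // ordinary mobile number, kept unchanged
    }

    public static void main(String[] args) {
        System.out.println(normalize("010-5812 3456"));
        System.out.println(cleanElevenDigits("11058123456"));
        System.out.println(cleanElevenDigits("13812345678"));
    }
}
```

Writing the 11-digit condition as a single pattern keeps the "digits 2 and 3 are 00, 10, 11 or 01, and the 4th digit is 5, 6 or 8" test in one place, which makes it harder to get the nesting wrong.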
The core Java code is as follows:
// Helper functions used for cleaning and deduplication
package cyt.com.dudu.cyt;
import java.util.ArrayList;
import java.util.List;
public class phoneClean {
    // True for the digits that start a local landline number: 5, 6 or 8
    private static boolean is568(char c) {
        return c == '5' || c == '6' || c == '8';
    }
    // True for 0 or 1
    private static boolean is01(char c) {
        return c == '0' || c == '1';
    }
    // Cleans a phone number; returns "空" (the "empty" marker) when it is invalid
    public static final String CleaningPhone(String sPhone) {
        try {
            // 1. An input already marked as empty stays empty
            if (sPhone.indexOf("空") != -1) {
                return "空";
            }
            // Remove spaces, hyphens and every other non-digit character
            String sPhone1 = sPhone.replaceAll("[^0-9]", "");
            // Remove all leading zeros
            while (sPhone1.startsWith("0")) {
                sPhone1 = sPhone1.substring(1);
            }
            if (sPhone1.startsWith("86")) {
                // Strip the country code 86; such numbers skip the length checks below
                sPhone1 = sPhone1.substring(2);
            } else {
                // No 86 prefix: judge the number by its length
                int phonenum = sPhone1.length();
                if (phonenum < 8) {
                    sPhone1 = "空";
                } else if (phonenum == 8) {
                    // 8 digits: a valid local landline starts with 5, 6 or 8
                    if (!is568(sPhone1.charAt(0))) {
                        sPhone1 = "空";
                    }
                } else if (phonenum == 9) {
                    if (sPhone1.charAt(0) == '1') {
                        // 1 followed by 5/6/8: keep the last 8 digits; otherwise a malformed mobile number
                        sPhone1 = is568(sPhone1.charAt(1)) ? sPhone1.substring(1) : "空";
                    } else {
                        // Irregular landline: prepend 0, as the rules above state
                        sPhone1 = "0" + sPhone1;
                    }
                } else if (phonenum == 10) {
                    if (sPhone1.charAt(0) == '1') {
                        // 1, then 0 or 1, then 5/6/8: local landline, keep the last 8 digits;
                        // any other 10-digit number starting with 1 is a malformed mobile number
                        boolean local = is01(sPhone1.charAt(1)) && is568(sPhone1.charAt(2));
                        sPhone1 = local ? sPhone1.substring(2) : "空";
                    } else {
                        // Landline: prepend 0
                        sPhone1 = "0" + sPhone1;
                    }
                } else if (phonenum == 11) {
                    if (sPhone1.charAt(0) == '1') {
                        // 1, then two digits that are 0 or 1, then 5/6/8: local landline
                        boolean local = is01(sPhone1.charAt(1)) && is01(sPhone1.charAt(2))
                                && is568(sPhone1.charAt(3));
                        if (local) {
                            sPhone1 = sPhone1.substring(3);
                        }
                        // Otherwise it is an 11-digit mobile number and is kept unchanged
                    } else {
                        // Non-local landline: prepend 0
                        sPhone1 = "0" + sPhone1;
                    }
                } else {
                    // More than 11 digits: malformed mobile number if it starts with 1,
                    // otherwise a non-local landline that gets a leading 0
                    sPhone1 = sPhone1.charAt(0) == '1' ? "空" : "0" + sPhone1;
                }
            }
            System.out.println("Cleaned phone number: " + sPhone1);
            return sPhone1;
        } catch (Exception e) {
            // Covers a null input, among other failures
            return "空";
        }
    }
}
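2. Phone deduplication
I originally wanted to deduplicate with an array and return a list. Kettle, however, reported strange array-related errors, so the method concatenates the result with a StringBuffer instead.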
public static final String toReenter(String phone1, String phone2, String phone3, String phone4, String phone5) {
    List<String> phoneListTest = new ArrayList<>();
    phoneListTest.add(phone1);
    phoneListTest.add(phone2);
    phoneListTest.add(phone3);
    phoneListTest.add(phone4);
    phoneListTest.add(phone5);
    // Numbers already emitted. They are compared as whole values, so a number that is
    // merely a substring of an earlier one is not dropped by mistake.
    List<String> seen = new ArrayList<>();
    StringBuffer phoneList = new StringBuffer();
    for (String phone : phoneListTest) {
        if (phone == null || seen.contains(phone)) {
            continue;
        }
        seen.add(phone);
        // Every entry, including the last one, is followed by a comma
        phoneList.append(phone).append(",");
    }
    return phoneList.toString();
}
3. Kettle implementation
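Kettle needs to check whether fields are null. Null fields have caused all sorts of problems before, so this time every field that contains a null value is checked and assigned a value directly.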
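The same null check could also be mirrored on the Java side, so that a missed null does not break the deduplication. The following is a minimal sketch under several assumptions: the class `PhoneDedupSketch` and the helper `joinDistinct` are hypothetical and not part of the original project, the code is assumed to be compiled as an external class like the author's, and unlike `toReenter` it drops the "空" marker and emits no trailing comma.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class PhoneDedupSketch {
    // Marker that the cleaning step returns for empty or invalid numbers
    static final String EMPTY = "空";

    // Hypothetical helper: joins the distinct, non-empty numbers in first-seen order.
    public static String joinDistinct(String... phones) {
        Set<String> unique = new LinkedHashSet<>();
        for (String phone : phones) {
            // Skip nulls, blank strings and the "empty" marker
            if (phone == null || phone.trim().isEmpty() || EMPTY.equals(phone.trim())) {
                continue;
            }
            unique.add(phone.trim());
        }
        return String.join(",", unique);
    }

    public static void main(String[] args) {
        System.out.println(joinDistinct("58123456", null, "13812345678", "58123456", "空"));
    }
}
```

A `LinkedHashSet` keeps insertion order and rejects duplicates by whole-value equality, so it avoids both a manual "already seen" search and any substring matching.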

The core functionality has been uploaded to my blog, where you can find it.