Code implementation of sorting and serializing cases in MapReduce

2022-06-10 16:00:00 【QYHuiiQ】

The case implemented here sorts student records by name first (lexicographic order); when names are duplicated, the records are then sorted by age (ascending).

Upload the original data file to HDFS

[[email protected] test_data]# hdfs dfs -mkdir /test_comparation_input
[[email protected] test_data]# hdfs dfs -put test_comparation.txt /test_comparation_input

The 5 rows of the original data are to be sorted lexicographically by the ASCII values of the names. For the duplicate name Bob, the two Bob records on rows 2 and 4 are sorted a second time by age.
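
Before moving to MapReduce, the intended ordering can be sketched in plain Java. This is only an illustration of the rule described above, not part of the job: the class name SortRuleSketch and the five sample rows are made up, since the real contents of test_comparation.txt are not shown here (only that rows 2 and 4 are both Bob).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: sorts made-up rows by name, then by age.
class SortRuleSketch {

    static final class Student {
        final String name;
        final int age;

        Student(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public String toString() {
            return name + "\t" + age;
        }
    }

    // Name first (String.compareTo, which matches ASCII order for ASCII names),
    // then age ascending as the tie-breaker.
    static final Comparator<Student> BY_NAME_THEN_AGE =
            Comparator.comparing((Student s) -> s.name)
                      .thenComparingInt(s -> s.age);

    // Returns the rows as "name<TAB>age" lines in sorted order.
    static List<String> sortedLines(List<Student> rows) {
        List<Student> copy = new ArrayList<>(rows);
        copy.sort(BY_NAME_THEN_AGE);
        List<String> lines = new ArrayList<>();
        for (Student s : copy) {
            lines.add(s.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up sample data with Bob on rows 2 and 4.
        List<Student> rows = new ArrayList<>();
        rows.add(new Student("Tom", 21));
        rows.add(new Student("Bob", 25));
        rows.add(new Student("Alice", 19));
        rows.add(new Student("Bob", 18));
        rows.add(new Student("Jack", 30));
        for (String line : sortedLines(rows)) {
            System.out.println(line);
        }
    }
}
```

With this sample the two Bob rows end up next to each other, with age 18 ahead of age 25. The MapReduce version below gets the same ordering by putting the rule into the key's compareTo().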

Create a new project:

Add the dependencies to pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>wyh.test</groupId>
    <artifactId>test_comparation</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
    </properties>

    <packaging>jar</packaging>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.5</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.5</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.7.5</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>2.7.5</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>RELEASE</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.1</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.3</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <minimizeJar>true</minimizeJar>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

</project>

Define a custom Bean class that implements sorting and serialization

package wyh.test.comparation;

import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public class ComparationBean implements WritableComparable<ComparationBean> {

    // Properties held by the Bean object
    private String studentName;
    private int age;

    public String getStudentName() {
        return studentName;
    }

    public void setStudentName(String studentName) {
        this.studentName = studentName;
    }

    @Override
    public String toString() {
        // Override the default format with our own toString() output
        return studentName + "\t" + age;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    // Define the sort order. We only need to define the comparison rule; the framework takes care of calling it
    @Override
    public int compareTo(ComparationBean comparationBean) {
        // Compare studentName first, using String.compareTo()
        int compareResult = this.studentName.compareTo(comparationBean.getStudentName());
        // A result of 0 means the names are identical, so fall back to comparing age
        if (compareResult == 0) {
            // Integer.compare avoids the overflow that subtraction can hit with extreme values
            return Integer.compare(this.age, comparationBean.getAge());
        }
        return compareResult;
    }

    // Serialization: write the fields to a byte stream
    @Override
    public void write(DataOutput dataOutput) throws IOException {
        // Serialize the first property; strings are written with writeUTF()
        dataOutput.writeUTF(studentName);
        // Serialize the second property; int values are written with writeInt()
        dataOutput.writeInt(age);
    }

    // Deserialization: read the fields back in the same order they were written
    @Override
    public void readFields(DataInput dataInput) throws IOException {
        // Assign the deserialized values to the member variables
        this.studentName = dataInput.readUTF();
        this.age = dataInput.readInt();
    }
}
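
To see what write() and readFields() do to the data, the same calls can be tried outside Hadoop with the standard java.io stream classes, which implement DataOutput and DataInput. The following is a minimal sketch under that assumption; RoundTripSketch and its two helper methods are hypothetical names and are not part of the project.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: round-trips a name and an age through the same
// writeUTF/writeInt and readUTF/readInt calls the Bean uses.
class RoundTripSketch {

    // Mirrors ComparationBean.write(): name via writeUTF, age via writeInt.
    static byte[] serialize(String studentName, int age) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(studentName);
            out.writeInt(age);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Mirrors ComparationBean.readFields(): fields are read in the order they were written.
    static String deserialize(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            String studentName = in.readUTF();
            int age = in.readInt();
            return studentName + "\t" + age;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = serialize("Bob", 18);
        // writeUTF stores a 2-byte length plus the encoded characters; writeInt stores 4 bytes.
        System.out.println(bytes.length + " bytes");
        System.out.println(deserialize(bytes));
    }
}
```

The point to take away is that the read order must match the write order exactly; swapping readUTF() and readInt() in readFields() would misread the bytes.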

Define a custom Mapper class

package wyh.test.comparation;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

// K2 is our custom Bean (because it has to support sorting and serialization); its properties are taken from V1.
public class ComparationMapper extends Mapper<LongWritable, Text, ComparationBean, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, ComparationBean, NullWritable>.Context context) throws IOException, InterruptedException {
        // Split the V1 line into its two fields and assign them to the custom Bean object
        String[] split = value.toString().split(",");
        ComparationBean comparationBean = new ComparationBean();
        comparationBean.setStudentName(split[0]);
        comparationBean.setAge(Integer.parseInt(split[1]));
        // Write K2 and V2 to the context object
        context.write(comparationBean, NullWritable.get());
    }
}
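
The mapper above assumes that every input line has exactly the form name,age; a blank or malformed line would make it throw. If the input might contain such lines (an assumption, since the data file itself is not shown), the parsing step could be made defensive. The sketch below is plain Java without Hadoop types; LineParseSketch and Parsed are hypothetical names.

```java
import java.util.Optional;

// Hypothetical sketch: parses one "name,age" line and reports malformed
// lines as an empty Optional instead of throwing.
class LineParseSketch {

    static final class Parsed {
        final String name;
        final int age;

        Parsed(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public String toString() {
            return name + "\t" + age;
        }
    }

    static Optional<Parsed> parse(String line) {
        String[] split = line.split(",");
        // Expect exactly two fields: name and age.
        if (split.length != 2) {
            return Optional.empty();
        }
        String name = split[0].trim();
        if (name.isEmpty()) {
            return Optional.empty();
        }
        try {
            return Optional.of(new Parsed(name, Integer.parseInt(split[1].trim())));
        } catch (NumberFormatException e) {
            // The age field is not a valid integer.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("Bob,18").isPresent());  // true
        System.out.println(parse("Bob,abc").isPresent()); // false
        System.out.println(parse("").isPresent());        // false
    }
}
```

In the mapper, an empty result would simply mean returning from map() without calling context.write(). Whether bad lines should be skipped or should fail the job is a design choice that depends on the data.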

Define a custom Reducer class

package wyh.test.comparation;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

// In this case reduce does not need to process K2 and V2 any further, so K2 and V2 are simply passed through as K3 and V3
public class ComparationReducer extends Reducer<ComparationBean, NullWritable, ComparationBean, NullWritable> {
    @Override
    protected void reduce(ComparationBean key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
        context.write(key, NullWritable.get());
    }
}
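
One detail worth knowing about this reducer: reduce() is called once per group of keys that compare equal, so if the input ever contained two rows with the same name and the same age, writing the key once would output that row only once. The sketch below simulates the grouping with a TreeMap in plain Java. GroupingSketch, Key and the sample rows are hypothetical; the article's sample data does not appear to contain such a duplicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: simulates how equal keys are grouped before reduce.
class GroupingSketch {

    static final class Key implements Comparable<Key> {
        final String name;
        final int age;

        Key(String name, int age) {
            this.name = name;
            this.age = age;
        }

        // Same rule as ComparationBean.compareTo(): name first, then age.
        @Override
        public int compareTo(Key other) {
            int byName = name.compareTo(other.name);
            return byName != 0 ? byName : Integer.compare(age, other.age);
        }

        @Override
        public String toString() {
            return name + "\t" + age;
        }
    }

    // Stand-in for the shuffle: keys that compare equal land in one group,
    // and the count plays the role of the values iterable.
    static TreeMap<Key, Integer> group(List<Key> mapOutput) {
        TreeMap<Key, Integer> groups = new TreeMap<>();
        for (Key key : mapOutput) {
            groups.merge(key, 1, Integer::sum);
        }
        return groups;
    }

    // What the reducer above does: one output line per group.
    static List<String> writeKeyOnce(TreeMap<Key, Integer> groups) {
        List<String> out = new ArrayList<>();
        for (Key key : groups.keySet()) {
            out.add(key.toString());
        }
        return out;
    }

    // Alternative: one output line per value, which keeps exact duplicates.
    static List<String> writeKeyPerValue(TreeMap<Key, Integer> groups) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Key, Integer> entry : groups.entrySet()) {
            for (int i = 0; i < entry.getValue(); i++) {
                out.add(entry.getKey().toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Key> mapOutput = new ArrayList<>();
        mapOutput.add(new Key("Bob", 18));
        mapOutput.add(new Key("Alice", 19));
        mapOutput.add(new Key("Bob", 18)); // exact duplicate of the first row
        TreeMap<Key, Integer> groups = group(mapOutput);
        System.out.println(writeKeyOnce(groups).size());     // 2
        System.out.println(writeKeyPerValue(groups).size()); // 3
    }
}
```

In the real reducer, the second variant would correspond to looping over values and calling context.write(key, NullWritable.get()) once per element. Which behavior is wanted depends on whether exact duplicates should survive.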

Define a custom JobMain class

package wyh.test.comparation;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ComparationJobMain extends Configured implements Tool {
    @Override
    public int run(String[] strings) throws Exception {
        // Create the job from the configuration supplied by ToolRunner
        Job job = Job.getInstance(super.getConf(), "test_comparation_job");
        job.setJarByClass(ComparationJobMain.class);
        // Input: read text lines from the HDFS input directory
        job.setInputFormatClass(TextInputFormat.class);
        TextInputFormat.addInputPath(job, new Path("hdfs://192.168.126.132:8020/test_comparation_input"));
        // Map phase: K2 is ComparationBean, V2 is NullWritable
        job.setMapperClass(ComparationMapper.class);
        job.setMapOutputKeyClass(ComparationBean.class);
        job.setMapOutputValueClass(NullWritable.class);
        // Reduce phase: K3 is ComparationBean, V3 is NullWritable
        job.setReducerClass(ComparationReducer.class);
        job.setOutputKeyClass(ComparationBean.class);
        job.setOutputValueClass(NullWritable.class);
        // Output: write text files to the HDFS output directory, which must not already exist
        job.setOutputFormatClass(TextOutputFormat.class);
        TextOutputFormat.setOutputPath(job, new Path("hdfs://192.168.126.132:8020/test_comparation_output"));
        // Submit the job and wait for it to finish
        boolean status = job.waitForCompletion(true);
        return status ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        // Launch the job
        int runStatus = ToolRunner.run(configuration, new ComparationJobMain(), args);
        System.exit(runStatus);
    }
}

Package the project

Run Maven clean, then package (from the command line: mvn clean package).

Upload the jar to the server

Run the jar

[[email protected] test_jar]# hadoop jar test_comparation-1.0-SNAPSHOT.jar wyh.test.comparation.ComparationJobMain

View the HDFS directory tree

View the output result file

[[email protected] test_jar]# hdfs dfs -cat /test_comparation_output/part-r-00000

You can see that the records are sorted by name first, and that the two records with the same name, Bob, are then sorted by age in ascending order.

This is a simple way to implement sorting and serialization in MapReduce.

Copyright notice
This article was written by [QYHuiiQ]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/161/202206101527452137.html
