Consistency test and evaluation method: kappa
2022-07-27 00:44:00 【Hello, tomorrow,,】
The kappa score measures the agreement between two sets of ratings. The quadratic weighted kappa is calculated from the scores assigned by human raters and the predicted scores. The metric ranges from -1 (complete disagreement between the raters) to 1 (complete agreement between the raters). κ is defined as:

kappa = 1 - (Σ w_ij * O_ij) / (Σ w_ij * E_ij)

where k is the number of categories, and O_ij and E_ij are the elements of the observed matrix and the expected matrix, respectively. The weight w_ij is calculated as:

w_ij = (i - j)^2 / (k - 1)^2
Because of the properties of Cohen's kappa, researchers must interpret this statistic carefully. For example, two pairs of raters can have the same percentage agreement but different marginal distributions of ratings, and this can greatly affect the kappa value. Another issue is the number of categories: as the number of categories increases, kappa tends to increase. In addition, kappa can be low even when the overall level of agreement is high and the individual ratings are accurate. All of these factors make kappa a quantity whose variability needs to be analyzed. The main reason for using the kappa metric here is that we cannot access the labels of the validation and test datasets: the kappa values on these datasets are obtained by submitting our model and code to the evaluation system on the Kaggle site. In addition, we cannot directly access the images in the test dataset.
In general, we analyze whether the number of identical ratings produced by the two raters differs from the number expected by chance: when the observed number of identical ratings is essentially equal to the chance expectation, kappa is close to 0, and the more the observed agreement exceeds the chance expectation, the closer kappa is to 1.
For example, the kappa between model A and a benchmark:

kappa = (P0 - Pe) / (n - Pe)
where P0 is the sum of the observed counts in the diagonal cells, Pe is the sum of the expected counts in the diagonal cells, and n is the total number of rated items.
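As a quick worked example with hypothetical counts: suppose two raters label n = 50 items and their confusion matrix is

            rater B = 0    rater B = 1
rater A = 0      20              5
rater A = 1      10             15

Then P0 = 20 + 15 = 35. The row totals are 25 and 25 and the column totals are 30 and 20, so the expected diagonal counts are 25*30/50 = 15 and 25*20/50 = 10, giving Pe = 25. Therefore kappa = (35 - 25) / (50 - 25) = 0.4.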
According to the calculation method, kappa is divided into simple kappa and weighted kappa, and weighted kappa is further divided into linear weighted kappa and quadratic weighted kappa.
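If scikit-learn is available, all three variants can be computed with sklearn.metrics.cohen_kappa_score by changing its weights argument; a minimal sketch (the ratings below are made up for illustration):

from sklearn.metrics import cohen_kappa_score

# Hypothetical ratings from a human rater and a model (5 classes, 0-4)
human = [0, 2, 4, 1, 3, 0, 2, 4, 1, 3]
model = [0, 2, 3, 1, 4, 0, 1, 4, 1, 3]

print(cohen_kappa_score(human, model))                       # simple (unweighted) kappa
print(cohen_kappa_score(human, model, weights="linear"))     # linear weighted kappa
print(cohen_kappa_score(human, model, weights="quadratic"))  # quadratic weighted kappa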
weighted kappa
The choice between linear and quadratic weighted kappa depends on how meaningful the differences between classes are in your dataset. For example, in fundus image recognition data, class=0 means healthy and class=4 means very severe, late-stage disease, so the penalty for predicting class=0 as class=4 should be far greater than the penalty for predicting class=0 as class=1; with quadratic weighting, the 0->4 penalty is 16 times the 0->1 penalty (with linear weighting it is only 4 times). The sketch below compares the two weighting schemes.
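As a rough illustration of the difference (using the five classes 0-4 from the fundus example; numpy is assumed), the two weight matrices can be printed directly:

import numpy as np

k = 5  # classes 0-4
i, j = np.indices((k, k))
linear_w = np.abs(i - j) / (k - 1)         # 0->4 penalty is 4x the 0->1 penalty
quadratic_w = (i - j) ** 2 / (k - 1) ** 2  # 0->4 penalty is 16x the 0->1 penalty

print(linear_w)
print(quadratic_w)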

Python code implementation:
#! /usr/bin/env python2.7
import numpy as np


def confusion_matrix(rater_a, rater_b, min_rating=None, max_rating=None):
    """
    Returns the confusion matrix between rater's ratings
    """
    assert(len(rater_a) == len(rater_b))
    if min_rating is None:
        min_rating = min(rater_a + rater_b)
    if max_rating is None:
        max_rating = max(rater_a + rater_b)
    num_ratings = int(max_rating - min_rating + 1)
    conf_mat = [[0 for i in range(num_ratings)]
                for j in range(num_ratings)]
    for a, b in zip(rater_a, rater_b):
        conf_mat[a - min_rating][b - min_rating] += 1
    return conf_mat

def histogram(ratings, min_rating=None, max_rating=None):
    """
    Returns the counts of each type of rating that a rater made
    """
    if min_rating is None:
        min_rating = min(ratings)
    if max_rating is None:
        max_rating = max(ratings)
    num_ratings = int(max_rating - min_rating + 1)
    hist_ratings = [0 for x in range(num_ratings)]
    for r in ratings:
        hist_ratings[r - min_rating] += 1
    return hist_ratings

def quadratic_weighted_kappa(rater_a, rater_b, min_rating=None, max_rating=None):
    """
    Calculates the quadratic weighted kappa
    quadratic_weighted_kappa calculates the quadratic weighted kappa
    value, which is a measure of inter-rater agreement between two raters
    that provide discrete numeric ratings. Potential values range from -1
    (representing complete disagreement) to 1 (representing complete
    agreement). A kappa value of 0 is expected if all agreement is due to
    chance.
    quadratic_weighted_kappa(rater_a, rater_b), where rater_a and rater_b
    each correspond to a list of integer ratings. These lists must have the
    same length.
    The ratings should be integers, and it is assumed that they contain
    the complete range of possible ratings.
    quadratic_weighted_kappa(X, min_rating, max_rating), where min_rating
    is the minimum possible rating, and max_rating is the maximum possible
    rating
    """
    rater_a = np.array(rater_a, dtype=int)
    rater_b = np.array(rater_b, dtype=int)
    assert(len(rater_a) == len(rater_b))
    if min_rating is None:
        min_rating = min(min(rater_a), min(rater_b))
    if max_rating is None:
        max_rating = max(max(rater_a), max(rater_b))
    conf_mat = confusion_matrix(rater_a, rater_b,
                                min_rating, max_rating)
    num_ratings = len(conf_mat)
    num_scored_items = float(len(rater_a))
    hist_rater_a = histogram(rater_a, min_rating, max_rating)
    hist_rater_b = histogram(rater_b, min_rating, max_rating)
    numerator = 0.0
    denominator = 0.0
    for i in range(num_ratings):
        for j in range(num_ratings):
            expected_count = (hist_rater_a[i] * hist_rater_b[j]
                              / num_scored_items)
            d = pow(i - j, 2.0) / pow(num_ratings - 1, 2.0)
            numerator += d * conf_mat[i][j] / num_scored_items
            denominator += d * expected_count / num_scored_items
    return 1.0 - numerator / denominator

def linear_weighted_kappa(rater_a, rater_b, min_rating=None, max_rating=None):
    """
    Calculates the linear weighted kappa
    linear_weighted_kappa calculates the linear weighted kappa
    value, which is a measure of inter-rater agreement between two raters
    that provide discrete numeric ratings. Potential values range from -1
    (representing complete disagreement) to 1 (representing complete
    agreement). A kappa value of 0 is expected if all agreement is due to
    chance.
    linear_weighted_kappa(rater_a, rater_b), where rater_a and rater_b
    each correspond to a list of integer ratings. These lists must have the
    same length.
    The ratings should be integers, and it is assumed that they contain
    the complete range of possible ratings.
    linear_weighted_kappa(X, min_rating, max_rating), where min_rating
    is the minimum possible rating, and max_rating is the maximum possible
    rating
    """
    assert(len(rater_a) == len(rater_b))
    if min_rating is None:
        min_rating = min(rater_a + rater_b)
    if max_rating is None:
        max_rating = max(rater_a + rater_b)
    conf_mat = confusion_matrix(rater_a, rater_b,
                                min_rating, max_rating)
    num_ratings = len(conf_mat)
    num_scored_items = float(len(rater_a))
    hist_rater_a = histogram(rater_a, min_rating, max_rating)
    hist_rater_b = histogram(rater_b, min_rating, max_rating)
    numerator = 0.0
    denominator = 0.0
    for i in range(num_ratings):
        for j in range(num_ratings):
            expected_count = (hist_rater_a[i] * hist_rater_b[j]
                              / num_scored_items)
            d = abs(i - j) / float(num_ratings - 1)
            numerator += d * conf_mat[i][j] / num_scored_items
            denominator += d * expected_count / num_scored_items
    return 1.0 - numerator / denominator

def kappa(rater_a, rater_b, min_rating=None, max_rating=None):
    """
    Calculates the kappa
    kappa calculates the kappa
    value, which is a measure of inter-rater agreement between two raters
    that provide discrete numeric ratings. Potential values range from -1
    (representing complete disagreement) to 1 (representing complete
    agreement). A kappa value of 0 is expected if all agreement is due to
    chance.
    kappa(rater_a, rater_b), where rater_a and rater_b
    each correspond to a list of integer ratings. These lists must have the
    same length.
    The ratings should be integers, and it is assumed that they contain
    the complete range of possible ratings.
    kappa(X, min_rating, max_rating), where min_rating
    is the minimum possible rating, and max_rating is the maximum possible
    rating
    """
    assert(len(rater_a) == len(rater_b))
    if min_rating is None:
        min_rating = min(rater_a + rater_b)
    if max_rating is None:
        max_rating = max(rater_a + rater_b)
    conf_mat = confusion_matrix(rater_a, rater_b,
                                min_rating, max_rating)
    num_ratings = len(conf_mat)
    num_scored_items = float(len(rater_a))
    hist_rater_a = histogram(rater_a, min_rating, max_rating)
    hist_rater_b = histogram(rater_b, min_rating, max_rating)
    numerator = 0.0
    denominator = 0.0
    for i in range(num_ratings):
        for j in range(num_ratings):
            expected_count = (hist_rater_a[i] * hist_rater_b[j]
                              / num_scored_items)
            if i == j:
                d = 0.0
            else:
                d = 1.0
            numerator += d * conf_mat[i][j] / num_scored_items
            denominator += d * expected_count / num_scored_items
    return 1.0 - numerator / denominator

def mean_quadratic_weighted_kappa(kappas, weights=None):
    """
    Calculates the mean of the quadratic
    weighted kappas after applying Fisher's r-to-z transform, which is
    approximately a variance-stabilizing transformation. This
    transformation is undefined if one of the kappas is 1.0, so all kappa
    values are capped in the range (-0.999, 0.999). The reverse
    transformation is then applied before returning the result.
    mean_quadratic_weighted_kappa(kappas), where kappas is a vector of
    kappa values
    mean_quadratic_weighted_kappa(kappas, weights), where weights is a vector
    of weights that is the same size as kappas. Weights are applied in the
    z-space
    """
    kappas = np.array(kappas, dtype=float)
    if weights is None:
        weights = np.ones(np.shape(kappas))
    else:
        weights = weights / np.mean(weights)
    # ensure that kappas are in the range [-.999, .999]
    kappas = np.array([min(x, .999) for x in kappas])
    kappas = np.array([max(x, -.999) for x in kappas])
    z = 0.5 * np.log((1 + kappas) / (1 - kappas)) * weights
    z = np.mean(z)
    return (np.exp(2 * z) - 1) / (np.exp(2 * z) + 1)

def weighted_mean_quadratic_weighted_kappa(solution, submission):
    predicted_score = submission[submission.columns[-1]].copy()
    predicted_score.name = "predicted_score"
    if predicted_score.index[0] == 0:
        predicted_score = predicted_score[:len(solution)]
        predicted_score.index = solution.index
    combined = solution.join(predicted_score, how="left")
    groups = combined.groupby(by="essay_set")
    kappas = [quadratic_weighted_kappa(group[1]["essay_score"], group[1]["predicted_score"]) for group in groups]
    # .irow() was removed from pandas; .iloc[0] is the equivalent
    weights = [group[1]["essay_weight"].iloc[0] for group in groups]
    return mean_quadratic_weighted_kappa(kappas, weights=weights)
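A quick sanity check of the functions above, using made-up ratings (perfect agreement should return 1.0):

if __name__ == "__main__":
    # Hypothetical ratings for illustration only
    human = [0, 2, 4, 1, 3, 0, 2, 4, 1, 3]
    model = [0, 2, 3, 1, 4, 0, 1, 4, 1, 3]
    print("simple kappa:      ", kappa(human, model))
    print("linear weighted:   ", linear_weighted_kappa(human, model))
    print("quadratic weighted:", quadratic_weighted_kappa(human, model))
    print("perfect agreement: ", quadratic_weighted_kappa(human, human))  # 1.0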