[Logistic classification] 개념

Peter Note

Web & LLM FullStacker, Application Architecter, KnowHow Dispenser and Bike Rider

Publication

[Logistic classification] 개념

Logistic Classification 강좌를 정리한다.

Regression vs Classification

- Regression: 숫자를 예측

- Classification: 정해진 카테고리를 정하는 것: Pass(1) 또는 Fail(0) 으로 판단한다.

ex) Spam Detection, Facebook feed, Credit Card Fraud Detection

Linear Regression에서 x값이 너무 커서 Output이 1보다 커지는 것을 방지하기 위해 H(x) 를 z로 놓고, g(z)의 결과가 0 ~ 1 사이에 수렴되는 것을 sigmoid(시그모이드) 함수라 하고, Logistic function 이라고도 한다.

수식은 다음과 같다.

New Cost function for Logistic Classification

기존 Linear Regression과 Sigmoid function을 적용했을 때의 모습

- 기울기가 평평해 지는 지점에서 멈춤. 그러나 sigmoid에서는 기존 cost function을 적용하면 울퉁 불퉁하므로 시작점에 따라서 종료점(최소) 구간이 틀려질 수 있다. 즉 training을 멈추게 된다.

따라서 Hypethesis를 변경했기 때문에 Cost function도 변경한다.

- y:1 일때 예측이 틀릴경우 즉 H(x) = 0이면 cost 는 무한대가 된다.

- y:0 일때 H(x) =0 이면 cost는 0이되고, H(x) = 1 이면 cost는 무한대가 된다.

즉 잘 못 예측되면 cost가 무한대로 커진다.

위의 수식을 y=0일때와 y=1일때 합치면 경사타고 내려가기인 그릇모양이 된다.

공식에 대한 구현 코드는 다음과 같다.

Logistic classifier를 Tensorflow로 구현하기

Y데이터 값으로 0 또는 1을 가지면 binary classification이다. None은 n개를 의미한다. (예제)

수식을 tensorflow 코드로 구현한다.

- placeholder를 만들고

- sigmoid (logistic) function 구현

- new cost function 구현

- GradientDescenOptimizer를 이용한 경사 내려가기 미분 구현

결과: 1=true, 0=false

예제

import tensorflow as tf

import numpy as np

tf.set_random_seed(777) # for reproducibility

xy = np.loadtxt('data-03-diabetes.csv', delimiter=',', dtype=np.float32)

x_data = xy[:, 0:-1]

y_data = xy[:, [-1]]

print(x_data.shape, y_data.shape)

# placeholders for a tensor that will be always fed.

X = tf.placeholder(tf.float32, shape=[None, 8])

Y = tf.placeholder(tf.float32, shape=[None, 1])

W = tf.Variable(tf.random_normal([8, 1]), name='weight')

b = tf.Variable(tf.random_normal([1]), name='bias')

# Hypothesis using sigmoid: tf.div(1., 1. + tf.exp(-tf.matmul(X, W)))

hypothesis = tf.sigmoid(tf.matmul(X, W) + b)

# cost/loss function

cost = -tf.reduce_mean(Y * tf.log(hypothesis) + (1 - Y) *

tf.log(1 - hypothesis))

train = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(cost)

# Accuracy computation

# True if hypothesis>0.5 else False

predicted = tf.cast(hypothesis > 0.5, dtype=tf.float32)

accuracy = tf.reduce_mean(tf.cast(tf.equal(predicted, Y), dtype=tf.float32))

# Launch graph

with tf.Session() as sess:

# Initialize TensorFlow variables

sess.run(tf.global_variables_initializer())

for step in range(10001):

cost_val, _ = sess.run([cost, train], feed_dict={X: x_data, Y: y_data})

if step % 200 == 0:

print(step, cost_val)

# Accuracy report

h, c, a = sess.run([hypothesis, predicted, accuracy],

feed_dict={X: x_data, Y: y_data})

print("\nHypothesis: ", h, "\nCorrect (Y): ", c, "\nAccuracy: ", a)

참조

- 김성훈교수님의 Logistic Classification 함수 강좌

- Github Logistic regression classification 예제, 파일 읽기 예제

'Deep Learning > NN by Sung Kim' 카테고리의 다른 글

[Multinomial Classification] Softmax 실습 (0)	2018.07.05
[Mutinomial Classification] 개념 (0)	2018.07.04
[Linear Regression] Multi Variable 개념 및 실습 (0)	2018.07.04
[Linear Regression] Tensorflow를 이용한 Minimized Cost 실습 (0)	2018.07.03
[Linear Regression] Minimize Cost (0)	2018.07.03

posted by Peter Note

AI Convergence

Publication

Tag

Category

Recent Post

[Logistic classification] 개념

'Deep Learning > NN by Sung Kim' 카테고리의 다른 글

티스토리툴바

AI Convergence

Publication

Tag

Search

Category

Recent Post

[Logistic classification] 개념

'Deep Learning > NN by Sung Kim' 카테고리의 다른 글

티스토리툴바