TensorMSA Guide – AutoEncoder

Use AutoEncoder for Anomaly Detection

Concept of This Algorithm

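This section provides an algorithm for applying an AutoEncoder to structured (tabular) data. An AutoEncoder is an unsupervised training algorithm: without any label values, an Encoder/Decoder model is trained to regenerate an output that matches its input. Anomaly detection is one way of solving a binary classification problem in which the class distribution is highly imbalanced. An AutoEncoder can be applied to it as follows: the model is trained only on the class that has abundant data, so that it learns to describe that data well and to reconstruct it accurately. At inference time the input is fed forward through the model, and the data reconstructed by the Decoder is compared with the input vector. If the difference between the two vectors exceeds a threshold, meaning the model could not reconstruct the input well, the input is judged to be anomalous.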
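The decision rule described above can be sketched in a few lines of plain Python. This is a minimal illustration, not TensorMSA's implementation: it assumes mean squared error as the measure of difference between the input and its reconstruction, and the names `reconstruction_error`, `is_anomaly`, and `THRESHOLD` as well as all sample values are hypothetical.

```python
def reconstruction_error(original, reconstructed):
    """Mean squared error between an input vector and its reconstruction."""
    if len(original) != len(reconstructed):
        raise ValueError("vectors must have the same length")
    squared = [(o - r) ** 2 for o, r in zip(original, reconstructed)]
    return sum(squared) / len(original)


def is_anomaly(original, reconstructed, threshold):
    """Flag the input as anomalous when it is reconstructed poorly."""
    return reconstruction_error(original, reconstructed) > threshold


# Hypothetical values: in practice the threshold would be chosen from the
# reconstruction errors observed on normal data.
THRESHOLD = 0.1

# A row the decoder reproduces closely, and one it reproduces badly.
normal_row, normal_out = [1.0, 0.0, 0.0, 1.0], [0.9, 0.1, 0.0, 0.9]
odd_row, odd_out = [0.0, 1.0, 1.0, 0.0], [0.8, 0.2, 0.1, 0.7]

print(is_anomaly(normal_row, normal_out, THRESHOLD))
print(is_anomaly(odd_row, odd_out, THRESHOLD))
```

Only the comparison step is shown here; the reconstructed vectors would come from the trained Decoder.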

APIs – Define Neural Network

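import requests
import json, os

# Assumption: host:port of the TensorMSA API server; adjust to your deployment.
url = 'localhost:8000'
nn_id = 'nn992798'

#### (1) Create the network ####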
resp = requests.post('http://' + url + '/api/v1/type/common/target/nninfo/nnid/' + nn_id + '/',
                     json={
                         "biz_cate": "MES",
                         "biz_sub_cate": "M60",
                         "nn_title" : "test",
                         "nn_desc": "test desc",
                         "use_flag" : "Y",
                         "dir": "autoencoder_csv",
                         "config": "N"
                     })
data = json.loads(resp.json())
print("evaluation result : {0}".format(data))

#### (2) Create a version ####
resp = requests.post('http://' + url + '/api/v1/type/common/target/nninfo/nnid/' + nn_id + '/version/',
                 json={
                     "nn_def_list_info_nn_id": "",
                     "nn_wf_ver_info": "test version info",
                     "condition": "1",
                     "active_flag": "Y"
                 })
data = json.loads(resp.json())
print("evaluation result : {0}".format(data))

Defining the neural network works the same way for every type of algorithm.
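Because the same two endpoints are used for every algorithm, the URL construction can be factored into small helpers. The sketch below only builds the strings used in the calls above; the helper names `nninfo_url` and `version_url` and the host `localhost:8000` are assumptions made for illustration.

```python
def nninfo_url(host, nn_id):
    """URL of the network-definition endpoint for the given network id."""
    return 'http://{0}/api/v1/type/common/target/nninfo/nnid/{1}/'.format(host, nn_id)


def version_url(host, nn_id):
    """URL of the version endpoint, nested under the network definition."""
    return nninfo_url(host, nn_id) + 'version/'


# Hypothetical host; replace with the address of your own server.
print(nninfo_url('localhost:8000', 'nn992798'))
print(version_url('localhost:8000', 'nn992798'))
```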

APIs – Define Graph Flow

# Instruct the server to build the Work Flow skeleton (forces creation of a predefined template).
resp = requests.post('http://' + url + '/api/v1/type/wf/target/init/mode/simple/' + nn_id +'/wfver/1/',
                     json={
                         "type": 'autoencoder_csv'
                     })
data = json.loads(resp.json())
print("evaluation result : {0}".format(data))

When the graph flow is created in simple mode, no additional settings are required.

APIs – Upload Train File

return_dict = {}
return_dict['test'] = open('../../data/seq2seq_mansearch_3.csv', 'rb')

resp = requests.post('http://' + url + '/api/v1/type/wf/state/framedata/src/local/form/raw/prg/source/nnid/'+nn_id+'/ver/1/node/datasrc/',
                     files = return_dict)

data = json.loads(resp.json())
print("evaluation result : {0}".format(data))

The data used for training can be uploaded from the local machine through the REST API.

APIs – Data Parameter Set Up

resp = requests.put('http://' + url + '/api/v1/type/wf/state/framedata/src/local/form/raw/prg/source/nnid/'+nn_id+'/ver/1/node/datasrc/',
                     json={
                         "type": "csv",
                         "source_server": "local",
                         "source_sql": "all",
                         "preprocess":  "none",
                     })

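Define the information needed to interpret the data (source information, connection information, and so on).
Because the file is local, no separate ETL settings are required.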

APIs – Data Feeder Parameter Set Up

resp = requests.post('http://' + url + '/api/v1/type/wf/state/pre/detail/feed/src/frame/net/autoencoder/nnid/'+nn_id+'/ver/1/node/feed_train/',
                     json={
                         "encode_column": ["PRODUCT_CD", "CUR_FAC_OP_CD", "MC_NO", "CAST_STR_NUM", "SM_STEEL_GRD", "CC_UNCOND_SF_MTH"],
                         "vocab_size": 10,
                         "preprocess": "frame",
                         "embed_type": "onehot"
                     })
data = json.loads(resp.json())
print("evaluation result : {0}".format(data))

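Define the information the feeder needs to interpret the data.
-encode_column : the columns of the Data Frame that are actually used in the analysis.
-vocab_size : categorical data is converted to vectors automatically; this is the maximum bucket size used for that conversion.
-embed_type : how categorical data is represented as a vector when it is embedded.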
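To make the three parameters concrete, here is a small sketch of one-hot encoding selected columns with a bounded bucket count. It is an illustration under stated assumptions, not TensorMSA's feeder: reserving index 0 as a shared bucket for values beyond `vocab_size` is an assumption, and the helper names and sample rows are hypothetical.

```python
def build_vocab(values, vocab_size):
    """Give the first (vocab_size - 1) distinct values indices 1, 2, ...

    Index 0 is kept as a shared bucket for any other value (assumption).
    """
    vocab = {}
    for value in values:
        if value not in vocab and len(vocab) < vocab_size - 1:
            vocab[value] = len(vocab) + 1
    return vocab


def one_hot(value, vocab, vocab_size):
    """One-hot vector of length vocab_size for a single categorical value."""
    vector = [0] * vocab_size
    vector[vocab.get(value, 0)] = 1
    return vector


def encode_row(row, encode_column, vocabs, vocab_size):
    """Concatenate the one-hot vectors of the selected columns only."""
    encoded = []
    for column in encode_column:
        encoded.extend(one_hot(row[column], vocabs[column], vocab_size))
    return encoded


# Hypothetical sample data; "OTHER" is not listed in encode_column and is ignored.
rows = [
    {"PRODUCT_CD": "A", "MC_NO": "M1", "OTHER": 5},
    {"PRODUCT_CD": "B", "MC_NO": "M1", "OTHER": 6},
    {"PRODUCT_CD": "C", "MC_NO": "M2", "OTHER": 7},
]
encode_column = ["PRODUCT_CD", "MC_NO"]
vocab_size = 3
vocabs = {c: build_vocab([r[c] for r in rows], vocab_size) for c in encode_column}

for row in rows:
    print(encode_row(row, encode_column, vocabs, vocab_size))
```

With this scheme each encoded row has `len(encode_column) * vocab_size` elements, which is the input width the network would see.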

APIs – Network Parameter Set Up

resp = requests.put('http://' + url + '/api/v1/type/wf/state/netconf/detail/autoencoder/nnid/' + nn_id + '/ver/1/node/netconf_node/',
                     json={
                        "learning_rate" : 0.01,
                        "iter" : 10,
                        "batch_size" : 10,
                        "examples_to_show" : 10,
                        "n_hidden" : [200, 100] 
                     })
data = json.loads(resp.json())
print("evaluation result : {0}".format(data))

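For an autoencoder, the hidden-layer sizes are considered important in addition to the basic hyperparameters, and they can be specified simply as an array. The length of the array is the depth of the hidden layers, and each value is the number of perceptrons in the corresponding layer.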
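As a rough illustration of how the `n_hidden` array translates into layers, the sketch below lists the weight-matrix shapes of a symmetric autoencoder. The mirrored decoder and the input width of 60 (6 encoded columns times a `vocab_size` of 10, assuming one-hot encoding) are assumptions; the source does not state how TensorMSA builds the decoder. The function name `layer_shapes` is hypothetical.

```python
def layer_shapes(input_dim, n_hidden):
    """Return (inputs, outputs) for each layer of a mirrored autoencoder."""
    encoder_sizes = [input_dim] + list(n_hidden)
    # Assumption: the decoder mirrors the encoder back to the input width.
    decoder_sizes = encoder_sizes[::-1]
    sizes = encoder_sizes + decoder_sizes[1:]
    return list(zip(sizes[:-1], sizes[1:]))


# Hypothetical input width: 6 one-hot encoded columns with 10 buckets each.
for shape in layer_shapes(60, [200, 100]):
    print(shape)
```

Under these assumptions `[200, 100]` gives two encoder layers, with the 100-unit layer acting as the compressed representation.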

APIs – Run Train

resp = requests.post('http://' + url + '/api/v1/type/runmanager/state/train/nnid/'+nn_id+'/ver/1/')
data = json.loads(resp.json())

print("evaluation result : {0}".format(data))

This runs the job according to the pipeline defined so far.