AI Convergence :: [Data2Vis] Automatically Generating Charts with tf-seq2seq

2018. 8. 21. 16:52 Deep Learning/Read Paper

[Data2Vis] Automatically Generating Charts with tf-seq2seq - 2

Having covered the concepts behind the Data2Vis paper, this post experiments with the steps needed to apply the approach to other problems.

Preparation

Runtime environment

Python v3.7

TensorFlow v1.9

Everything runs in an Anaconda-based environment.

Step-1) Model configuration

Model parameters defined in train_options.json

- Data2Vis is an encoder-decoder architecture with an attention mechanism.

- It uses a 2-layer bidirectional RNN encoder/decoder.

- LSTM performed better than GRU, so LSTM is used.

The model, which includes the loss (cost) and training functions, is AttentionSeq2Seq. Its configuration file is example_configs/nmt_large.yml.

- Data: specifies the location of the source and target data.

- Hypothesis/cost function: configures the encoder/decoder and sets the inference parameters. The encoder and decoder both use LSTMCell.

- Training function: uses the Adam optimizer.

	model: AttentionSeq2Seq
	model_params:
	  attention.class: seq2seq.decoders.attention.AttentionLayerBahdanau
	  attention.params:
	    num_units: 512
	  bridge.class: seq2seq.models.bridges.ZeroBridge
	  embedding.dim: 512
	  encoder.class: seq2seq.encoders.BidirectionalRNNEncoder
	  encoder.params:
	    rnn_cell:
	      cell_class: LSTMCell
	      cell_params:
	        num_units: 512
	      dropout_input_keep_prob: 0.5
	      dropout_output_keep_prob: 1.0
	      num_layers: 2
	  decoder.class: seq2seq.decoders.AttentionDecoder
	  decoder.params:
	    rnn_cell:
	      cell_class: LSTMCell
	      cell_params:
	        num_units: 512
	      dropout_input_keep_prob: 0.5
	      dropout_output_keep_prob: 1.0
	      num_layers: 4
	  optimizer.name: Adam
	  optimizer.params:
	    epsilon: 0.0000008
	  optimizer.learning_rate: 0.0001
	  source.max_seq_len: 500
	  source.reverse: false
	  target.max_seq_len: 500
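
The configuration above selects AttentionLayerBahdanau, that is, additive attention. As a rough illustration of what such a layer computes (this is not tf-seq2seq's implementation), the pure-Python sketch below scores each encoder output against a decoder state with v · tanh(W_q q + W_k k), applies a softmax over the encoder positions, and returns the weighted context vector. The function name, the argument names, and the toy matrices are all assumptions made for this example.

```python
import math


def additive_attention(query, keys, w_query, w_keys, v):
    """Return (weights, context) for additive (Bahdanau-style) attention.

    query:   decoder state, a list of floats
    keys:    encoder outputs, a list of equally sized vectors
    w_query: projection matrix for the query (units x len(query))
    w_keys:  projection matrix for each key (units x len(key))
    v:       scoring vector of length `units`
    """
    def matvec(matrix, vector):
        return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

    projected_query = matvec(w_query, query)
    scores = []
    for key in keys:
        projected_key = matvec(w_keys, key)
        hidden = [math.tanh(q + k) for q, k in zip(projected_query, projected_key)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))

    # Softmax over encoder positions, shifted by the max for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted sum of the encoder outputs.
    dim = len(keys[0])
    context = [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(dim)]
    return weights, context


# Toy inputs (hypothetical): identity projections and two encoder positions.
identity = [[1.0, 0.0], [0.0, 1.0]]
weights, context = additive_attention(
    query=[0.0, 0.0],
    keys=[[2.0, 0.0], [0.0, 0.0]],
    w_query=identity,
    w_keys=identity,
    v=[1.0, 1.0],
)
print(weights, context)
```

With these toy inputs the first encoder position gets the larger weight, because its projected key produces the larger score. In the real model the projections are learned and num_units (512 above) sets the size of the hidden scoring layer.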


Step-2) Data preprocessing

To train the model, the fields of each dataset are classified as numeric, string, temporal, ordinal, categorical, and so on. The corresponding (labeled) output for each dataset is a spec written in the Vega-Lite grammar.

- Among the sourcedata/*.sources and *.targets files, dev.sources and dev.targets show that each dataset index is matched to one Vega-Lite spec.

- Only the data field is left out of the Vega-Lite spec.

- Three kinds of sources and targets files are prepared in total:

+ dev

+ train

+ vocab

- Dataset field names are replaced with str<index> and num<index>.

- The preprocessing scripts are in utils/*.py.

- The raw data before preprocessing is in testdata/*.json, and a variety of Vega-Lite specs are in examples/*.json.
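
The str<index>/num<index> renaming described in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's utils code: the function names, the sample field names (Airline, Cost, State), and the rule "a field is numeric if its first non-null value is a number" are all hypothetical.

```python
import json


def build_field_map(rows):
    """Map each original field name to str<index> or num<index>."""
    mapping = {}
    counters = {"str": 0, "num": 0}
    for name in rows[0]:
        # Use the first non-null value of the column to decide its type.
        sample = next((r[name] for r in rows if r.get(name) is not None), None)
        is_number = isinstance(sample, (int, float)) and not isinstance(sample, bool)
        kind = "num" if is_number else "str"
        mapping[name] = "{}{}".format(kind, counters[kind])
        counters[kind] += 1
    return mapping


def forward_normalize(row, mapping):
    """Rename the fields of one row before it is fed to the model."""
    return {mapping[name]: value for name, value in row.items()}


def backward_normalize(spec_text, mapping):
    """Restore the original field names in a generated Vega-Lite spec."""
    spec = json.loads(spec_text)
    reverse = {short: name for name, short in mapping.items()}
    for channel in spec.get("encoding", {}).values():
        if channel.get("field") in reverse:
            channel["field"] = reverse[channel["field"]]
    return spec


# Hypothetical sample rows; the field names are made up for this example.
rows = [
    {"Airline": "AMERICAN AIRLINES", "Cost": None, "State": "Texas"},
    {"Airline": "US AIRWAYS*", "Cost": 140, "State": "North Carolina"},
]
mapping = build_field_map(rows)
source_line = json.dumps([forward_normalize(rows[0], mapping)])
generated = '{"encoding": {"x": {"field": "num0", "type": "quantitative"}}, "mark": "point"}'
restored = backward_normalize(generated, mapping)
print(source_line)
print(restored)
```

Replacing arbitrary column names with a small set of short tokens keeps the vocabulary the model has to learn small, and the reverse mapping lets the generated spec refer to the real columns again.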

//dev.sources

[{"num0": 0, "num1": null, "str0": "Small", "str1": "AMERICAN AIRLINES", "str2": "AUSTIN-BERGSTROM INTL", "str3": "Approach", "str4": "Day", "str5": "None", "str6": "Unknown bird - small", "num2": 0, "str7": "MD-80", "str8": "8/1/95 0:00", "str9": "Texas", "num3": 0}]

[{"num0": 0, "num1": 140, "str0": "Small", "str1": "US AIRWAYS*", "str2": "CHARLOTTE/DOUGLAS INTL ARPT", "str3": "Approach", "str4": "Day", "str5": "None", "str6": "European starling", "num2": 0, "str7": "B-737-300", "str8": "7/19/99 0:00", "str9": "North Carolina", "num3": 0}]

//dev.targets

{"encoding": {"y": {"field": "str0", "type": "nominal", "selected": true, "primitiveType": "string"}, "x": {"type": "quantitative", "field": "num2"}}, "mark": "point"}

{"encoding": {"y": {"field": "str3", "type": "nominal", "selected": true, "primitiveType": "string"}, "x": {"type": "quantitative", "field": "num0"}}, "mark": "tick"}

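Step-3) Creating the model

Once the model configuration and the source and target data for training are ready, create (train) the model.

- Change DATA_DIR to the location of your project directory.

- The options below are the parameters for running bin/train.py.

- The ckpt files are written to vizmodel, so try pointing --output_dir at a separate directory. (data2vis already ships with generated ckpt files.)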

	# export DATA_DIR=project-directory
	export DATA_DIR=.
	# To write checkpoints elsewhere, use: --output_dir $DATA_DIR/model_directory
	python -m bin.train \
	  --config_paths="
	    $DATA_DIR/example_configs/nmt_large.yml,
	    $DATA_DIR/example_configs/train_seq2seq.yml,
	    $DATA_DIR/example_configs/text_metrics_bpe.yml" \
	  --model_params "
	    vocab_source: $DATA_DIR/sourcedata/vocab.source
	    vocab_target: $DATA_DIR/sourcedata/vocab.target" \
	  --input_pipeline_train "
	    class: ParallelTextInputPipeline
	    params:
	      source_delimiter: ''
	      target_delimiter: ''
	      source_files:
	        - $DATA_DIR/sourcedata/train.sources
	      target_files:
	        - $DATA_DIR/sourcedata/train.targets" \
	  --input_pipeline_dev "
	    class: ParallelTextInputPipeline
	    params:
	      source_delimiter: ''
	      target_delimiter: ''
	      source_files:
	        - $DATA_DIR/sourcedata/dev.sources
	      target_files:
	        - $DATA_DIR/sourcedata/dev.targets" \
	  --batch_size 32 \
	  --train_steps 100000 \
	  --output_dir $DATA_DIR/vizmodel
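
The empty source_delimiter and target_delimiter in the command above mean that each line is split into individual characters rather than words, so the vocab files are character vocabularies. The sketch below shows one way such a vocabulary could be built. It assumes a "token, tab, count" line format, and whether the vocab files shipped with data2vis were produced exactly this way is an assumption. The function names are hypothetical.

```python
from collections import Counter


def build_char_vocab(lines):
    """Count every character in the given sequences, most frequent first."""
    counts = Counter()
    for line in lines:
        counts.update(line.rstrip("\n"))
    # Sort by descending count, then by character, for a deterministic order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def format_vocab(vocab):
    """Render the vocabulary as one 'token<TAB>count' entry per line."""
    return "\n".join("{}\t{}".format(token, count) for token, count in vocab)


# Hypothetical sample: two short target-like lines.
sample_lines = ['{"mark": "point"}', '{"mark": "tick"}']
print(format_vocab(build_char_vocab(sample_lines)))
```

A character vocabulary stays small and never meets an unknown token, at the cost of much longer sequences, which fits the max_seq_len of 500 set in the model configuration.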


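Step-4) Verifying inference

Data2Vis ships with a model already saved as ckpt files, and a web demo is available. webserver.py is a small Flask application that does the following:

- Clicking the Generate Example button on the web page loads a real dataset at random from examplesdata/*.json.

- The fields of the real data are renamed to str and num, the result is fed to inference, and a Vega-Lite spec comes back as output.

- The real data is mapped into the data field of the generated Vega-Lite spec to build the final spec, which is returned as the HTTP response.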

	# Imports this snippet needs. graph, hooks, loaded_checkpoint_path,
	# model_dir_input, destination_file, data_utils and app are defined
	# elsewhere in webserver.py (not shown here).
	import json
	from json import JSONDecodeError

	import tensorflow as tf
	from flask import jsonify


	def run_inference():
	    # tf.reset_default_graph()
	    with graph.as_default():
	        saver = tf.train.Saver()
	        checkpoint_path = loaded_checkpoint_path
	        if not checkpoint_path:
	            checkpoint_path = tf.train.latest_checkpoint(model_dir_input)

	        def session_init_op(_scaffold, sess):
	            saver.restore(sess, checkpoint_path)
	            tf.logging.info("Restored model from %s", checkpoint_path)

	        scaffold = tf.train.Scaffold(init_fn=session_init_op)
	        session_creator = tf.train.ChiefSessionCreator(scaffold=scaffold)
	        with tf.train.MonitoredSession(
	                session_creator=session_creator, hooks=hooks) as sess:
	            sess.run([])
	        # print(" ****** decoded string ", decoded_string)
	        # decoded_string is a module-level variable, presumably filled by the hooks.
	        return decoded_string


	@app.route("/examplesdata")
	def examplesdata():
	    source_data = data_utils.load_test_dataset()
	    f_names = data_utils.generate_field_types(source_data)
	    data_utils.forward_norm(source_data, destination_file, f_names)

	    print('1 >>>>')
	    print('source data: ', source_data)
	    run_inference()

	    # Perform post processing - backward normalization
	    # decoded_post_array = []
	    # for row in decoded_string:
	    #     decoded_post = data_utils.backward_norm(row, f_names)
	    #     decoded_post_array.append(decoded_post)

	    decoded_string_post = data_utils.backward_norm(decoded_string[0], f_names)
	    print('2 >>>>')
	    print('f_names: ', f_names)
	    print('decoded string post: ', decoded_string_post)

	    try:
	        vega_spec = json.loads(decoded_string_post)
	        vega_spec["data"] = {"values": source_data}
	        response_payload = {"vegaspec": vega_spec, "status": True}
	        print('3 >>>>')
	        print('response: ', response_payload)
	    except JSONDecodeError as e:
	        response_payload = {
	            "status": False,
	            "reason": "Model did not produce a valid vegalite JSON",
	            "vegaspec": decoded_string
	        }
	    return jsonify(response_payload)


Alternatively, inference can be run directly from the command console.

	python -m bin.infer \
	  --tasks "
	    - class: DecodeText
	      params:
	        delimiter: '' " \
	  --model_dir vizmodel \
	  --model_params "
	    inference.beam_search.beam_width: 2" \
	  --input_pipeline "
	    class: ParallelTextInputPipeline
	    params:
	      source_delimiter: ''
	      target_delimiter: ''
	      source_files:
	        - test.txt "
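
The inference.beam_search.beam_width: 2 setting in the command above asks the decoder to keep the two best partial outputs at every step instead of only the single best one. The toy sketch below illustrates that idea in pure Python. It is not tf-seq2seq's beam search (it has no length normalization, for example), and the function names, the toy probability table, and the "</s>" end token are assumptions made for this example.

```python
import math


def beam_search(next_log_probs, beam_width, max_len, end_token):
    """Return (tokens, log_probability) of the best sequence found.

    next_log_probs(prefix) must return {token: log_probability} for the next step.
    """
    beams = [((), 0.0)]  # each beam: (token tuple, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens and tokens[-1] == end_token:
                candidates.append((tokens, score))  # finished: carry over unchanged
                continue
            for token, log_prob in next_log_probs(tokens).items():
                candidates.append((tokens + (token,), score + log_prob))
        # Keep only the beam_width highest-scoring partial sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(tokens[-1] == end_token for tokens, _ in beams):
            break
    return beams[0]


def toy_model(prefix):
    """Hypothetical next-token distribution in which the greedy choice is not optimal."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.55, "y": 0.45},
        ("b",): {"x": 0.9, "y": 0.1},
    }
    probs = table.get(prefix, {"</s>": 1.0})
    return {token: math.log(p) for token, p in probs.items()}


greedy_tokens, greedy_score = beam_search(toy_model, 1, 3, "</s>")
beam_tokens, beam_score = beam_search(toy_model, 2, 3, "</s>")
print(greedy_tokens, math.exp(greedy_score))
print(beam_tokens, math.exp(beam_score))
```

In this toy table a width of 1 commits to "a" early and ends with a lower overall probability, while a width of 2 keeps "b" alive long enough to find the better sequence. A wider beam costs more computation per step, so a small width is a common compromise.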


<References>

- Google
tf-seq2seq tutorial
seq2seq NMT tutorial

- Data2Vis paper


posted by Peter Note
