Kedro? What's that? Is it something you eat???
This is my first time touching it, so I don't really know it either.
Since I didn't really get it, I worked through the tutorial following the docs and articles I found by googling, but it wouldn't run.
```
$ kedro run
2020-11-24 12:23:27,840 - root - INFO - ** Kedro project get_started
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/kedro/io/core.py", line 417, in parse_dataset_definition
    class_obj = next(obj for obj in trials if obj is not None)
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/kedro/io/core.py", line 149, in from_config
    config, load_version, save_version
  File "/opt/conda/lib/python3.7/site-packages/kedro/io/core.py", line 419, in parse_dataset_definition
    raise DataSetError("Class `{}` not found.".format(class_obj))
kedro.io.core.DataSetError: Class `pandas.CSVDataSet` not found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/bin/kedro", line 10, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.7/site-packages/kedro/cli/cli.py", line 638, in main
    ("Project specific commands", project_groups),
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/app/get_started/kedro_cli.py", line 278, in run
    pipeline_name=pipeline,
  File "/opt/conda/lib/python3.7/site-packages/kedro/context/context.py", line 482, in run
    save_version=save_version, journal=journal, load_versions=load_versions
  File "/opt/conda/lib/python3.7/site-packages/kedro/context/context.py", line 245, in _get_catalog
    conf_catalog, conf_creds, save_version, journal, load_versions
  File "/opt/conda/lib/python3.7/site-packages/kedro/context/context.py", line 269, in _create_catalog
    load_versions=load_versions,
  File "/opt/conda/lib/python3.7/site-packages/kedro/io/data_catalog.py", line 300, in from_config
    ds_name, ds_config, load_versions.get(ds_name), save_version
  File "/opt/conda/lib/python3.7/site-packages/kedro/io/core.py", line 154, in from_config
    "for DataSet `{}`:\n{}".format(name, str(ex))
kedro.io.core.DataSetError: An exception occurred when parsing config for DataSet `example_iris_data`: Class `pandas.CSVDataSet` not found.
```
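For context, the error is raised while Kedro parses the data catalog entry for `example_iris_data` and tries to resolve the string `pandas.CSVDataSet` to a class. The `get_started` starter declares it roughly like this in `conf/base/catalog.yml` (a sketch of the tutorial config, not copied from this project; the filepath matches the one used later in this post):

```yaml
# conf/base/catalog.yml -- the entry named in the DataSetError
example_iris_data:
  type: pandas.CSVDataSet          # resolved to kedro.extras.datasets.pandas.CSVDataSet at load time
  filepath: data/01_raw/iris.csv
```

So "Class `pandas.CSVDataSet` not found" does not necessarily mean the YAML is wrong; it can also mean the module that provides that class failed to import, which is exactly what the interactive session below reveals.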
I googled the error message, but couldn't make sense of it.
Since I had no idea what it meant, I decided to check whether things at least worked in interactive mode.
```
$ python
Python 3.7.4 (default, Aug 13 2019, 20:35:49)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from kedro.io import DataCatalog
>>> from kedro.extras.datasets.pandas import (
...     CSVDataSet,
...     SQLTableDataSet,
...     SQLQueryDataSet,
...     ParquetDataSet,
... )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.7/site-packages/kedro/extras/datasets/pandas/__init__.py", line 35, in <module>
    from .gbq_dataset import GBQTableDataSet  # NOQA
  File "/opt/conda/lib/python3.7/site-packages/kedro/extras/datasets/pandas/gbq_dataset.py", line 37, in <module>
    from google.cloud import bigquery
  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/__init__.py", line 35, in <module>
    from google.cloud.bigquery.client import Client
  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/client.py", line 57, in <module>
    from google.cloud.bigquery import _pandas_helpers
  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 25, in <module>
    from google.cloud import bigquery_storage_v1beta1
  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery_storage_v1beta1/__init__.py", line 25, in <module>
    from google.cloud.bigquery_storage_v1beta1 import types
  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery_storage_v1beta1/types.py", line 23, in <module>
    from google.cloud.bigquery_storage_v1beta1.proto import arrow_pb2
  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery_storage_v1beta1/proto/arrow_pb2.py", line 20, in <module>
    create_key=_descriptor._internal_create_key,
AttributeError: module 'google.protobuf.descriptor' has no attribute '_internal_create_key'
>>> from kedro.extras.datasets.pandas import CSVDataSet
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.7/site-packages/kedro/extras/datasets/pandas/__init__.py", line 35, in <module>
    from .gbq_dataset import GBQTableDataSet  # NOQA
  File "/opt/conda/lib/python3.7/site-packages/kedro/extras/datasets/pandas/gbq_dataset.py", line 37, in <module>
    from google.cloud import bigquery
  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/__init__.py", line 35, in <module>
    from google.cloud.bigquery.client import Client
  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/client.py", line 57, in <module>
    from google.cloud.bigquery import _pandas_helpers
  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 25, in <module>
    from google.cloud import bigquery_storage_v1beta1
  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery_storage_v1beta1/__init__.py", line 25, in <module>
    from google.cloud.bigquery_storage_v1beta1 import types
  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery_storage_v1beta1/types.py", line 23, in <module>
    from google.cloud.bigquery_storage_v1beta1.proto import arrow_pb2
  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery_storage_v1beta1/proto/arrow_pb2.py", line 20, in <module>
    create_key=_descriptor._internal_create_key,
AttributeError: module 'google.protobuf.descriptor' has no attribute '_internal_create_key'
>>>
```
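The traceback bottoms out in a generated `*_pb2.py` file referencing `descriptor._internal_create_key`. My understanding (an assumption based on this traceback, worth verifying against the protobuf release notes) is that this attribute only exists in reasonably recent protobuf runtimes, so generated code from a newer protoc crashes on an older runtime exactly like this. A small defensive check along those lines; `protobuf_is_compatible` is a hypothetical helper name, not part of any library:

```python
import importlib

def protobuf_is_compatible() -> bool:
    """Return True if the installed protobuf runtime exposes the
    `_internal_create_key` attribute that newer generated code relies on.
    Returns False when protobuf is missing or too old."""
    try:
        descriptor = importlib.import_module("google.protobuf.descriptor")
    except ImportError:
        return False
    return hasattr(descriptor, "_internal_create_key")

print(protobuf_is_compatible())
```

On the broken environment above this would print `False`, which points at the runtime being older than the code generated against it.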
A different error this time??? What gives???
Googling turned up the article below.
Let's try it:
```
$ pip install --upgrade protobuf
```
Trying again:
```
$ python
Python 3.7.4 (default, Aug 13 2019, 20:35:49)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from kedro.extras.datasets.pandas import CSVDataSet
>>> df = CSVDataSet(filepath="./data/01_raw/iris.csv"),
>>> df
(<kedro.extras.datasets.pandas.csv_dataset.CSVDataSet object at 0x7f791f4371d0>,)
```
Huh? The error is gone.
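One small aside about the session above: there is a stray trailing comma after the `CSVDataSet(...)` call, which is why `df` printed as a one-element tuple `(<...CSVDataSet object...>,)` rather than as the dataset itself. This is a plain-Python gotcha, easy to reproduce without Kedro:

```python
# A trailing comma after an expression builds a singleton tuple.
x = object(),   # comma: x is a 1-element tuple wrapping the object
y = object()    # no comma: y is the object itself

assert isinstance(x, tuple) and len(x) == 1
assert not isinstance(y, tuple)
```

Harmless here, since the import succeeding was the thing being tested, but worth dropping the comma if you actually want to call `df.load()` afterwards.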
```
$ kedro run
2020-11-24 12:40:35,101 - root - INFO - ** Kedro project get_started
2020-11-24 12:40:35,337 - kedro.io.data_catalog - INFO - Loading data from `example_iris_data` (CSVDataSet)...
2020-11-24 12:40:35,347 - kedro.io.data_catalog - INFO - Loading data from `params:example_test_data_ratio` (MemoryDataSet)...
2020-11-24 12:40:35,350 - kedro.pipeline.node - INFO - Running node: split_data([example_iris_data,params:example_test_data_ratio]) -> [example_test_x,example_test_y,example_train_x,example_train_y]
2020-11-24 12:40:35,374 - kedro.io.data_catalog - INFO - Saving data to `example_train_x` (MemoryDataSet)...
2020-11-24 12:40:35,376 - kedro.io.data_catalog - INFO - Saving data to `example_train_y` (MemoryDataSet)...
2020-11-24 12:40:35,378 - kedro.io.data_catalog - INFO - Saving data to `example_test_x` (MemoryDataSet)...
2020-11-24 12:40:35,381 - kedro.io.data_catalog - INFO - Saving data to `example_test_y` (MemoryDataSet)...
2020-11-24 12:40:35,383 - kedro.runner.sequential_runner - INFO - Completed 1 out of 4 tasks
2020-11-24 12:40:35,386 - kedro.io.data_catalog - INFO - Loading data from `example_train_x` (MemoryDataSet)...
2020-11-24 12:40:35,388 - kedro.io.data_catalog - INFO - Loading data from `example_train_y` (MemoryDataSet)...
2020-11-24 12:40:35,391 - kedro.io.data_catalog - INFO - Loading data from `parameters` (MemoryDataSet)...
2020-11-24 12:40:35,393 - kedro.pipeline.node - INFO - Running node: train_model([example_train_x,example_train_y,parameters]) -> [example_model]
2020-11-24 12:40:36,164 - kedro.io.data_catalog - INFO - Saving data to `example_model` (MemoryDataSet)...
2020-11-24 12:40:36,169 - kedro.runner.sequential_runner - INFO - Completed 2 out of 4 tasks
2020-11-24 12:40:36,172 - kedro.io.data_catalog - INFO - Loading data from `example_model` (MemoryDataSet)...
2020-11-24 12:40:36,175 - kedro.io.data_catalog - INFO - Loading data from `example_test_x` (MemoryDataSet)...
2020-11-24 12:40:36,178 - kedro.pipeline.node - INFO - Running node: predict([example_model,example_test_x]) -> [example_predictions]
2020-11-24 12:40:36,197 - kedro.io.data_catalog - INFO - Saving data to `example_predictions` (MemoryDataSet)...
2020-11-24 12:40:36,200 - kedro.runner.sequential_runner - INFO - Completed 3 out of 4 tasks
2020-11-24 12:40:36,204 - kedro.io.data_catalog - INFO - Loading data from `example_predictions` (MemoryDataSet)...
2020-11-24 12:40:36,206 - kedro.io.data_catalog - INFO - Loading data from `example_test_y` (MemoryDataSet)...
2020-11-24 12:40:36,208 - kedro.pipeline.node - INFO - Running node: report_accuracy([example_predictions,example_test_y]) -> None
2020-11-24 12:40:36,210 - iris_test.pipelines.data_science.nodes - INFO - Model accuracy on test set: 96.67%
2020-11-24 12:40:36,213 - kedro.runner.sequential_runner - INFO - Completed 4 out of 4 tasks
2020-11-24 12:40:36,214 - kedro.runner.sequential_runner - INFO - Pipeline execution completed successfully.
```
This worked too. So what was that all about? Putting the tracebacks together: the outdated protobuf runtime broke the `google.cloud.bigquery` import, which made `kedro.extras.datasets.pandas` fail to import as a whole, so all Kedro could report was that `pandas.CSVDataSet` was "not found". The original error message never mentioned protobuf at all, which is why googling it went nowhere.