Draco
A collection of end-to-end solutions for machine learning problems commonly found in monitoring wind energy production systems.
https://github.com/sintel-dev/Draco
Category: Renewable Energy
Sub Category: Wind Energy
Keywords
classification machine-learning time-series
Keywords from Contributors
generative-adversarial-network gan anomaly-detection benchmarking orion signals unsupervised-learning generative-models synthetic-data tabular-data
Last synced: about 1 hour ago
Repository metadata
State space and deep generative models for time series.
- Host: GitHub
- URL: https://github.com/sintel-dev/Draco
- Owner: sintel-dev
- License: mit
- Created: 2018-09-27T14:50:42.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2023-07-31T15:42:39.000Z (over 1 year ago)
- Last Synced: 2025-04-10T22:35:05.499Z (17 days ago)
- Topics: classification, machine-learning, time-series
- Language: Jupyter Notebook
- Homepage: https://sintel-dev.github.io/Draco
- Size: 15 MB
- Stars: 53
- Watchers: 7
- Forks: 19
- Open Issues: 6
- Releases: 3
Metadata Files:
- Readme: README.md
- Changelog: HISTORY.md
- Contributing: CONTRIBUTING.rst
- License: LICENSE
README.md
Draco
- License: MIT
- Documentation: https://sintel-dev.github.io/Draco
- Homepage: https://github.com/sintel-dev/Draco
Overview
The Draco project is a collection of end-to-end solutions for machine learning problems
commonly found in time series monitoring systems. Most tasks utilize sensor data
emanating from monitoring systems. We utilize the foundational innovations developed for
the automation of machine learning at the Data to AI Lab at MIT.
The salient aspects of this customized project are:
- A set of ready-to-use, well-tested pipelines for different machine learning tasks. These are vetted through testing across multiple publicly available datasets for the same task.
- An easy interface to specify the task and pipeline, and to generate and summarize results.
- A production-ready, deployable pipeline.
- An easy interface to tune pipelines using the Bayesian Tuning and Bandits library.
- A community-oriented infrastructure to incorporate new pipelines.
- A robust continuous integration and testing infrastructure.
- A learning database recording all past outcomes: tasks, pipelines, outcomes.
Resources
Install
Requirements
Draco has been developed on, and runs on, Python 3.6, 3.7 and 3.8.
Also, although it is not strictly required, using a virtualenv is highly recommended
to avoid interfering with other software installed on the system where you run Draco.
Download and Install
Draco can be installed locally using pip with
the following command:
pip install draco-ml
This will pull and install the latest stable release from PyPI.
If you want to install from source or contribute to the project please read the
Contributing Guide.
Data Format
The minimum input expected by the Draco system consists of the following two elements,
which need to be passed as pandas.DataFrame objects:
Target Times
A table containing the specification of the problem that we are solving, which has three
columns:
- turbine_id: Unique identifier of the turbine which this label corresponds to.
- cutoff_time: Time associated with this target.
- target: The value that we want to predict. This can either be a numerical value or a categorical label. This column can also be skipped when preparing data that will be used only to make predictions and not to fit any pipeline.
| | turbine_id | cutoff_time | target |
|---|---|---|---|
0 | T1 | 2001-01-02 00:00:00 | 0 |
1 | T1 | 2001-01-03 00:00:00 | 1 |
2 | T2 | 2001-01-04 00:00:00 | 0 |
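As a sketch, the example target_times table above can be built with plain pandas; note the cutoff_time values are parsed into real datetimes, which the pipelines expect:

```python
import pandas as pd

# Build the example target_times table from the specification above:
# one row per (turbine, cutoff time) pair, plus the label to predict.
target_times = pd.DataFrame({
    'turbine_id': ['T1', 'T1', 'T2'],
    'cutoff_time': pd.to_datetime([
        '2001-01-02 00:00:00',
        '2001-01-03 00:00:00',
        '2001-01-04 00:00:00',
    ]),
    'target': [0, 1, 0],
})

print(target_times.dtypes)
```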
Readings
A table containing the signal data from the different sensors, with the following columns:
- turbine_id: Unique identifier of the turbine which this reading comes from.
- signal_id: Unique identifier of the signal which this reading comes from.
- timestamp (datetime): Time when the reading took place, as a datetime.
- value (float): Numeric value of this reading.
| | turbine_id | signal_id | timestamp | value |
|---|---|---|---|---|
0 | T1 | S1 | 2001-01-01 00:00:00 | 1 |
1 | T1 | S1 | 2001-01-01 12:00:00 | 2 |
2 | T1 | S1 | 2001-01-02 00:00:00 | 3 |
3 | T1 | S1 | 2001-01-02 12:00:00 | 4 |
4 | T1 | S1 | 2001-01-03 00:00:00 | 5 |
5 | T1 | S1 | 2001-01-03 12:00:00 | 6 |
6 | T1 | S2 | 2001-01-01 00:00:00 | 7 |
7 | T1 | S2 | 2001-01-01 12:00:00 | 8 |
8 | T1 | S2 | 2001-01-02 00:00:00 | 9 |
9 | T1 | S2 | 2001-01-02 12:00:00 | 10 |
10 | T1 | S2 | 2001-01-03 00:00:00 | 11 |
11 | T1 | S2 | 2001-01-03 12:00:00 | 12 |
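To make the relationship between the two tables concrete, here is a small sketch that builds the readings table above and keeps only the readings for turbine T1 taken before one of the cutoff times. Selecting readings by cutoff time is, conceptually, what the pipelines do internally; the filtering code here is only an illustration, not Draco's actual implementation:

```python
import pandas as pd

# Example readings table: one row per (turbine, signal, timestamp) reading.
readings = pd.DataFrame({
    'turbine_id': ['T1'] * 12,
    'signal_id': ['S1'] * 6 + ['S2'] * 6,
    'timestamp': pd.to_datetime([
        '2001-01-01 00:00:00', '2001-01-01 12:00:00',
        '2001-01-02 00:00:00', '2001-01-02 12:00:00',
        '2001-01-03 00:00:00', '2001-01-03 12:00:00',
    ] * 2),
    'value': [float(v) for v in range(1, 13)],
})

# Illustrative only: keep the readings for turbine T1 that happened
# strictly before a given cutoff time.
cutoff = pd.Timestamp('2001-01-02 00:00:00')
mask = (readings['turbine_id'] == 'T1') & (readings['timestamp'] < cutoff)
before_cutoff = readings[mask]
print(before_cutoff)
```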
Turbines
Optionally, a third table can be added containing metadata about the turbines.
The only requirement for this table is to have a turbine_id field, and it can have
an arbitrary number of additional fields.
| | turbine_id | manufacturer | ... | ... | ... |
|---|---|---|---|---|---|
0 | T1 | Siemens | ... | ... | ... |
1 | T2 | Siemens | ... | ... | ... |
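When turbine metadata is available, it can be attached to the readings with a regular pandas merge on the turbine_id field. This is a generic pandas sketch with made-up values, not a Draco API call:

```python
import pandas as pd

# Hypothetical turbines metadata table with the required turbine_id field.
turbines = pd.DataFrame({
    'turbine_id': ['T1', 'T2'],
    'manufacturer': ['Siemens', 'Siemens'],
})

# A couple of example readings to merge the metadata onto.
readings = pd.DataFrame({
    'turbine_id': ['T1', 'T2'],
    'signal_id': ['S1', 'S1'],
    'timestamp': pd.to_datetime(['2001-01-01', '2001-01-01']),
    'value': [1.0, 7.0],
})

# A left merge keeps every reading and attaches the turbine metadata.
enriched = readings.merge(turbines, on='turbine_id', how='left')
print(enriched.columns.tolist())
```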
CSV Format
Apart from the in-memory data format explained above, which is limited by the memory
capacity of the system where it is run, Draco is also prepared to
load and work with data stored as a collection of CSV files, drastically increasing the amount
of data which it can work with. Further details about this format can be found in the
project documentation site.
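Draco's own on-disk CSV format is described in the project documentation; as a generic illustration of the idea (not Draco's loader), a directory of CSV files can be read piece by piece with plain pandas, so only one file needs to be parsed at a time. The file names here are hypothetical:

```python
import glob
import os
import tempfile

import pandas as pd

# Write two small hypothetical CSV files into a temporary directory.
tmpdir = tempfile.mkdtemp()
pd.DataFrame({'turbine_id': ['T1'], 'value': [1.0]}).to_csv(
    os.path.join(tmpdir, 'readings-01.csv'), index=False)
pd.DataFrame({'turbine_id': ['T1'], 'value': [2.0]}).to_csv(
    os.path.join(tmpdir, 'readings-02.csv'), index=False)

# Read the collection back file by file and concatenate the pieces.
paths = sorted(glob.glob(os.path.join(tmpdir, '*.csv')))
readings = pd.concat([pd.read_csv(path) for path in paths], ignore_index=True)
print(len(readings))
```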
Quickstart
In this example we will load some demo data and classify it using a Draco Pipeline.
1. Load and split the demo data
The first step is to load the demo data.
For this, we will import and call the draco.demo.load_demo
function without any arguments:
from draco.demo import load_demo
target_times, readings = load_demo()
The returned objects are:

- target_times: A pandas.DataFrame with the target_times table data:

      turbine_id cutoff_time  target
  0        T001  2013-01-12       0
  1        T001  2013-01-13       0
  2        T001  2013-01-14       0
  3        T001  2013-01-15       1
  4        T001  2013-01-16       0

- readings: A pandas.DataFrame containing the time series data in the format explained above:

      turbine_id signal_id  timestamp  value
  0        T001       S01 2013-01-10  323.0
  1        T001       S02 2013-01-10  320.0
  2        T001       S03 2013-01-10  284.0
  3        T001       S04 2013-01-10  348.0
  4        T001       S05 2013-01-10  273.0
Once we have loaded the target_times, and before proceeding to training any Machine Learning
Pipeline, we will split them into two partitions, for training and testing.
In this case, we will split them using the train_test_split function from scikit-learn,
but it can be done with any other suitable tool.
from sklearn.model_selection import train_test_split
train, test = train_test_split(target_times, test_size=0.25, random_state=0)
Notice how we are only splitting the target_times data and not the readings.
This is because the pipelines will later on take care of selecting the parts of the
readings table needed for the training, based on the information found inside
the train and test inputs.
Additionally, if we want to calculate a goodness-of-fit score later on, we can separate the
testing target values from the test table by popping them from it:
test_targets = test.pop('target')
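pandas.DataFrame.pop removes the column from the frame in place and returns it as a Series, so after this call the test table no longer contains the target column. A tiny standalone sketch of that behavior, using made-up values:

```python
import pandas as pd

# Minimal frame standing in for the test partition of target_times.
test = pd.DataFrame({
    'turbine_id': ['T1', 'T2'],
    'cutoff_time': pd.to_datetime(['2001-01-02', '2001-01-04']),
    'target': [0, 1],
})

# pop() returns the column as a Series and drops it from the frame in place.
test_targets = test.pop('target')
print(test_targets.tolist())   # the extracted labels
print(test.columns.tolist())   # 'target' is gone
```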
2. Exploring the available Pipelines
Once we have the data ready, we need to find a suitable pipeline.
The list of available Draco Pipelines can be obtained using the draco.get_pipelines
function.
from draco import get_pipelines
pipelines = get_pipelines()
The returned pipelines variable will be a list containing the names of all the pipelines
available in the Draco system:
['lstm',
'lstm_with_unstack',
'double_lstm',
'double_lstm_with_unstack']
For the rest of this tutorial, we will select and use the pipeline
lstm_with_unstack as our template.
pipeline_name = 'lstm_with_unstack'
3. Fitting the Pipeline
Once we have loaded the data and selected the pipeline that we will use, we have to
fit it.
For this, we will create an instance of a DracoPipeline object, passing the name
of the pipeline that we want to use:
from draco.pipeline import DracoPipeline
pipeline = DracoPipeline(pipeline_name)
And then we can directly fit it to our data by calling its fit method and passing in the
training target_times and the complete readings table:
pipeline.fit(train, readings)
4. Make predictions
After fitting the pipeline, we are ready to make predictions on new data by calling the
pipeline.predict method, passing the testing target_times and, again, the complete
readings table:
predictions = pipeline.predict(test, readings)
5. Evaluate the goodness-of-fit
Finally, after making predictions, we can evaluate how good the predictions were
using any suitable metric.
from sklearn.metrics import f1_score
f1_score(test_targets, predictions)
What's next?
For more details about Draco and all its possibilities and features, please check the
project documentation site.
Also, do not forget to have a look at the tutorials!
Owner metadata
- Name: The Signal Intelligence Project
- Login: sintel-dev
- Email: [email protected]
- Kind: organization
- Description: Systems and tools to design, develop and deploy AI applications on top of signals.
- Website: https://dai.lids.mit.edu/
- Location:
- Twitter:
- Company:
- Icon url: https://avatars.githubusercontent.com/u/13336772?v=4
- Repositories: 11
- Last synced at: 2023-03-06T08:20:19.533Z
- Profile URL: https://github.com/sintel-dev
GitHub Events
Total
- Watch event: 1
- Fork event: 1
Last Year
- Watch event: 1
- Fork event: 1
Committers metadata
Last synced: 6 days ago
Total Commits: 236
Total Committers: 7
Avg Commits per committer: 33.714
Development Distribution Score (DDS): 0.597
Commits in past year: 10
Committers in past year: 1
Avg Commits per committer in past year: 10.0
Development Distribution Score (DDS) in past year: 0.0
| Name | Email | Commits |
|---|---|---|
Carles Sala | c****s@p****m | 95 |
Plamen Valentinov Kolev | p****r@g****m | 90 |
sarahmish | s****h@g****m | 31 |
joanvaquer | j****4@g****m | 14 |
Kalyan Veeramachaneni | k****u@g****m | 4 |
Fletcher | O****6@e****l | 1 |
Jones | Z****7@e****l | 1 |
Committer domains:
- gmx.com: 1
- pythiac.com: 1
Issue and Pull Request metadata
Last synced: 1 day ago
Total issues: 23
Total pull requests: 54
Average time to close issues: 2 months
Average time to close pull requests: 9 days
Total issue authors: 4
Total pull request authors: 7
Average comments per issue: 0.26
Average comments per pull request: 0.35
Merged pull request: 53
Bot issues: 0
Bot pull requests: 0
Past year issues: 0
Past year pull requests: 0
Past year average time to close issues: N/A
Past year average time to close pull requests: N/A
Past year issue authors: 0
Past year pull request authors: 0
Past year average comments per issue: 0
Past year average comments per pull request: 0
Past year merged pull request: 0
Past year bot issues: 0
Past year bot pull requests: 0
Top Issue Authors
- csala (12)
- sarahmish (6)
- pvk-developer (4)
- fletcherbranch (1)
Top Pull Request Authors
- csala (19)
- sarahmish (14)
- pvk-developer (14)
- kveerama (4)
- fletcherbranch (1)
- robertfjones (1)
- joanvaquer (1)
Top Issue Labels
- enhancement (9)
- housekeeping (2)
- bug (2)
Top Pull Request Labels
Package metadata
- Total packages: 1
- Total downloads:
  - pypi: 394 last-month
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 10
- Total maintainers: 3
pypi.org: draco-ml
AutoML for Time Series.
- Homepage: https://github.com/sintel-dev/Draco
- Documentation: https://draco-ml.readthedocs.io/
- Licenses: MIT license
- Latest release: 0.3.0 (published over 1 year ago)
- Last Synced: 2025-04-26T14:02:19.243Z (1 day ago)
- Versions: 10
- Dependent Packages: 0
- Dependent Repositories: 1
- Downloads: 394 Last month
- Rankings:
- Dependent packages count: 7.31%
- Forks count: 8.924%
- Stargazers count: 9.499%
- Average: 14.787%
- Dependent repos count: 22.088%
- Downloads: 26.116%
- Maintainers (3)
Dependencies
- baytune >=0.4.0,<0.5
- dask >=2.6.0,<3
- fsspec >=0.8.5,<0.9
- mlblocks >=0.4.0,<0.5
- mlprimitives >=0.3.2,<0.4
- numpy >=1.16.0,<1.21.0
- pandas >=1,<2
- partd >=1.1.0,<2
- pymongo >=3.7.2,<4
- scikit-learn >=0.21
- scipy >=1.0.1,<2
- tabulate >=0.8.3,<0.9
- tensorflow >=2,<2.3
- tqdm <4.50.0,>=4.36.1
- xlsxwriter >=1.3.6,<1.4
- actions/checkout v2 composite
- actions/setup-python v1 composite
- peaceiris/actions-gh-pages v3 composite
- actions/checkout v1 composite
- actions/setup-python v2 composite
- python 3.7 build
Score: 12.00989959824547