<h1>Preprocessing of the Aconity data</h1>
<p>First, the Aconity data must be processed into a form that the machine learning model can ingest easily.</p>
<p>This involves:</p>
<ol>
<li>Reading and parsing the Aconity data files</li>
<li>Calibrating the data to real units</li>
<li>Masking and labelling the data</li>
<li>Reconstructing a time column from the indices</li>
<li>Saving the processed data for later use in the experiment</li>
</ol>

In [1]:
from MTPy.meltpool_tomography import MeltpoolTomography
from dask.distributed import Client, LocalCluster
import pickle
from tqdm.auto import tqdm

In [2]:
# Prepare a dask cluster and client
cluster = LocalCluster(n_workers=12, threads_per_worker=1)
client = Client(cluster)
client

Client
Connection method: Cluster object
Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:34089/status

Cluster info
Workers: 12, Total threads: 12, Total memory: 15.27 GiB
Status: running, Using processes: True

Scheduler info
Comm: tcp://127.0.0.1:43953
Started: Just now

Workers (12 in total)
Each worker: Total threads: 1, Memory: 1.27 GiB, Local directory: /tmp/dask-scratch-space/worker-*
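
<p>The cluster summary above shows 15.27 GiB of total memory shared between 12 workers, or about 1.27 GiB each. The sketch below reproduces that arithmetic so the effect of changing <code>n_workers</code> can be estimated before starting a cluster. The helper names are hypothetical, and the even split is an assumption that happens to match the output above; Dask's actual limit depends on its configuration.</p>

```python
def memory_per_worker_gib(total_memory_gib, n_workers):
    """Estimate the memory available to each worker, assuming an even split."""
    return total_memory_gib / n_workers


def max_workers_for(total_memory_gib, min_memory_per_worker_gib):
    """Largest worker count that still leaves each worker the requested memory."""
    return int(total_memory_gib // min_memory_per_worker_gib)


# Values taken from the cluster summary in this notebook
per_worker = memory_per_worker_gib(15.27, 12)

# Hypothetical requirement: at least 2 GiB per worker
workers_at_2_gib = max_workers_for(15.27, 2.0)

print(f"{per_worker:.2f} GiB per worker with 12 workers")
print(f"{workers_at_2_gib} workers if each needs at least 2 GiB")
```

<p>Fewer workers leave more memory per worker, which may help if individual partitions are large.</p>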


In [3]:
# Create an MTPy object with the client
mpt = MeltpoolTomography(client=client)

In [None]:
# Read the Aconity data
mpt.read_layers("data/pyrometry_data")

In [None]:
# Examine the raw data
raw_plot = mpt.scatter2d()
raw_plot

In [2]:
# Import the pre-prepared masks
with open("data/masks2.pkl", "rb") as f:
    masks = pickle.load(f)

In [None]:
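# Apply the masks: rotate the data by -45, apply the rectangular masks,
# then rotate the data back to its original orientation
mpt.rotate_xy(-45.0)
mpt.mask_xyrectangles(masks)
mpt.rotate_xy(45.0)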
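
<p>The rotate, mask, rotate-back pattern can be illustrated outside of MTPy. The following is a minimal sketch in plain Python and is not MTPy's implementation. The helper names, the <code>(x_min, x_max, y_min, y_max)</code> rectangle format, the reading of the angle as degrees, and the choice to keep points inside the rectangles are all assumptions made for illustration; the structure of <code>masks2.pkl</code> is not shown in this notebook.</p>

```python
import math


def rotate_xy(points, angle_deg):
    """Rotate (x, y) points counter-clockwise about the origin by angle_deg."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [(x * cos_t - y * sin_t, x * sin_t + y * cos_t) for x, y in points]


def inside_any_rectangle(point, rectangles):
    """Return True if the point lies inside at least one axis-aligned rectangle."""
    x, y = point
    return any(
        x_min <= x <= x_max and y_min <= y <= y_max
        for x_min, x_max, y_min, y_max in rectangles
    )


def mask_rotated(points, rectangles, angle_deg):
    """Rotate the points, keep those inside the rectangles, then rotate back."""
    rotated = rotate_xy(points, angle_deg)
    kept = [p for p in rotated if inside_any_rectangle(p, rectangles)]
    return rotate_xy(kept, -angle_deg)


# Hypothetical data: two points and one rectangle defined in the rotated frame
points = [(1.0, 1.0), (5.0, 0.0)]
rectangles = [(1.0, 2.0, -0.5, 0.5)]

kept = mask_rotated(points, rectangles, -45.0)
print(kept)
```

<p>Rotating first means the masks can stay simple axis-aligned rectangles even when the regions of interest sit diagonally in the original coordinates.</p>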

In [None]:
# Compare the raw and masked data
mpt.scatter2d() + raw_plot

In [9]:
# Add a time column to the data
delta_time = 0.0001  # Seconds between samples (10 kHz sampling rate)
mpt.data["time"] = mpt.data.index * delta_time
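
<p>The time reconstruction is the sample index multiplied by the sampling period. The sketch below shows the same idea in plain Python with a hypothetical helper, <code>reconstruct_time</code>. It assumes the index counts samples consecutively at a fixed rate; if the index restarts for each layer file (not verified here), the resulting time is relative to the start of each layer rather than the whole build.</p>

```python
def reconstruct_time(indices, sampling_rate_hz=10_000):
    """Convert integer sample indices to elapsed time in seconds.

    Dividing by the integer rate avoids accumulating the small rounding
    error of a period such as 0.0001, which has no exact binary
    floating-point representation.
    """
    return [index / sampling_rate_hz for index in indices]


# One second of samples at 10 kHz, plus the first sample of the next second
times = reconstruct_time(range(10_001))

print(times[0], times[1], times[-1])
```

<p>Multiplying by <code>delta_time</code>, as in the cell above, gives the same values up to floating-point rounding.</p>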

In [None]:
# Save data for use in the experiment
mpt.save("checkpoint.mtp")

In [7]:
mpt.data

Dask DataFrame Structure (npartitions=46)
column,x,y,z,t,sample,time
dtype,float64,float64,float64,float64,int64,float64


In [8]:
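# Find the distinct sample numbers present in the data
samples = mpt.data["sample"].unique().compute()

# Lazily build one parquet write task per sample
write_tasks = (
    mpt.data.loc[mpt.data["sample"] == sample_num]
    .drop("sample", axis=1)
    .astype("float32")
    .repartition(
        partition_size="10MB"
    )  # Break up into small chunks ingestible by an ML model
    .to_parquet(
        f"sample_X/{sample_num}",
        compression="lz4",
        write_metadata_file=True,
        compute=False,  # Return a lazy task instead of writing immediately
    )
    for sample_num in samples
)

# Run the write tasks one sample at a time, with a progress bar
for write_task in tqdm(write_tasks, total=len(samples)):
    write_task.compute()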

  0%|          | 0/81 [00:00<?, ?it/s]

Task exception was never retrieved
future: <Task finished name='Task-1318357' coro=<Client._gather.<locals>.wait() done, defined at /home/cianh/Programming/Git_Projects/Aconity_ML_Test/.venv/lib/python3.11/site-packages/distributed/client.py:2212> exception=AllExit()>
Traceback (most recent call last):
  File "/home/cianh/Programming/Git_Projects/Aconity_ML_Test/.venv/lib/python3.11/site-packages/distributed/client.py", line 2221, in wait
    raise AllExit()
distributed.client.AllExit
Task exception was never retrieved
future: <Task finished name='Task-1318370' coro=<Client._gather.<locals>.wait() done, defined at /home/cianh/Programming/Git_Projects/Aconity_ML_Test/.venv/lib/python3.11/site-packages/distributed/client.py:2212> exception=AllExit()>
Traceback (most recent call last):
  File "/home/cianh/Programming/Git_Projects/Aconity_ML_Test/.venv/lib/python3.11/site-packages/distributed/client.py", line 2221, in wait
    raise AllExit()
distributed.client.AllExit
Task exception was n

KeyboardInterrupt: 
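
<p>To give a sense of what the <code>partition_size="10MB"</code> setting in the export loop means in rows, the sketch below works through the arithmetic. It assumes "10MB" means 10 × 10<sup>6</sup> bytes and that five columns remain after dropping <code>sample</code> (x, y, z, t, time). The helper <code>rows_per_partition</code> is hypothetical, and the real partition sizes will differ somewhat because Dask measures memory usage itself, including the index.</p>

```python
def rows_per_partition(partition_bytes, n_columns, bytes_per_value, index_bytes=0):
    """Estimate how many rows fit in a partition of the given size."""
    bytes_per_row = n_columns * bytes_per_value + index_bytes
    return partition_bytes // bytes_per_row


PARTITION_BYTES = 10 * 10**6  # "10MB", assuming decimal megabytes
N_COLUMNS = 5  # x, y, z, t, time (after dropping "sample")

rows_float32 = rows_per_partition(PARTITION_BYTES, N_COLUMNS, bytes_per_value=4)
rows_float64 = rows_per_partition(PARTITION_BYTES, N_COLUMNS, bytes_per_value=8)

print(f"float32: about {rows_float32:,} rows per partition")
print(f"float64: about {rows_float64:,} rows per partition")
```

<p>Casting to <code>float32</code> halves the bytes per row, so each partition holds roughly twice as many rows. The trade-off is precision: <code>float32</code> carries about seven significant decimal digits, which may matter for the <code>time</code> column on long builds.</p>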