diff --git a/README.md b/README.md
index d873ed0..52ab99a 100644
--- a/README.md
+++ b/README.md
@@ -6,13 +6,32 @@ This is a testing sandbox for developing various methods of injecting symbolic k
 
 ## Experiment 1 - Semantically weighted loss
 
+### Planning
+
 - Simple semantic loss based on intuitively noticeable properties of numeric characters
 - Makes use of manually made reward matrices to weight rewards depending on how "close" the model was
 - Very rudimentary example of semantic loss
-- Appeared to work perfectly
 
-## Experiment 2 - Semantic loss function
+### Results
+
+- Training loss
+![Training loss plot for experiment 1](./results/Experiment1/train_loss.png)
+- Validation loss
+![Validation loss plot for experiment 1](./results/Experiment1/val_loss.png)
+- Test loss
+![Test loss plot for experiment 1](./results/Experiment1/test_loss.png)
+
+### Conclusions
+
+- Seems to have worked
+  - Clear improvement in training rate once semantics were added
+  - The similarity cross-entropy in particular shows clear signs in the validation loss of following a complementary-CDF-shaped curve similar to that of the normal cross-entropy loss, but training faster
+- Interestingly, the "garbage" cross-entropy also produced a very good result! This is likely because it has not been normalised to 1, so it may simply be amplifying the gradient by random amounts at all times, in effect acting as a fuzzy gradient booster.
+- I would consider this experiment a success, with some interesting open questions remaining that are worth further examination
+
+## Experiment 2 - Semantic loss functions derived from qualitative dataset characteristics
+
+### Planning
 
 - Makes use of known physics equations that partially describe the problem to guide the model
 - Reduces the need for the model to learn known physics, allowing it to focus on learning the unknown physics
 - Should accelerate training
@@ -25,3 +44,63 @@
 - [Molecular Properties](https://www.kaggle.com/datasets/burakhmmtgl/predict-molecular-properties)
 - [Nuclear Binding Energy](https://www.kaggle.com/datasets/iitm21f1003401/nuclear-binding-energy)
 - [Body Fat Prediction](https://www.kaggle.com/datasets/fedesoriano/body-fat-prediction-dataset)
+- Decided to use the Molecular Properties dataset as it is quite familiar to me
+- Training with semantics added to the relationship between molecular energy and differential electronegativity
+  - Semantics being injected:
+    - These values should be positively correlated
+    - These values should be weighted towards a high r^2, with an adaptive penalty
+  - Multiple attempts were carried out:
+    - Simple penalties. Variations tested include:
+      ```math
+      Loss = (\mathrm{Softplus}(-m) + 1) \cdot \mathrm{SmoothL1Loss}
+      ```
+      ```math
+      Loss = (\mathrm{ReLU}(-m) + 1) \cdot \mathrm{SmoothL1Loss}
+      ```
+      ```math
+      Loss = \left( \frac{1}{\mathrm{sech}(|r|)} + 1 \right) \cdot \mathrm{SmoothL1Loss}
+      ```
+      ```math
+      Loss = (r^2 + 1) \cdot \mathrm{SmoothL1Loss}
+      ```
+    - Adaptive, self-training penalties tuned by various methods; the best method found was optimisation by a random forest regressor. The tunable variants include:
+      ```math
+      Loss = (\mathrm{Softplus}(-\alpha m) + 1) \cdot \mathrm{SmoothL1Loss}
+      ```
+      ```math
+      Loss = (\mathrm{ReLU}(-\alpha m) + 1) \cdot \mathrm{SmoothL1Loss}
+      ```
+      ```math
+      Loss = \left( \frac{1}{\mathrm{sech}(\alpha |r|)} + 1 \right) \cdot \mathrm{SmoothL1Loss}
+      ```
+      ```math
+      Loss = (\alpha r^2 + 1) \cdot \mathrm{SmoothL1Loss}
+      ```
+    - The final adaptive semantic loss function tested combined the two penalties (a runnable sketch is given after this list):
+      ```math
+      Loss = (\alpha r^2 + 1) \cdot \left( \tfrac{1}{\beta} \log\left(1 + e^{-\beta \gamma m}\right) + 1 \right) \cdot \mathrm{SmoothL1Loss}
+      ```
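+
+To make the final variant concrete, the following is a minimal PyTorch sketch of it, not code from the original runs. It assumes `m` is the least-squares slope and `r` the Pearson correlation between the model's predicted energies and the differential electronegativity within a batch, and that `alpha`, `beta` and `gamma` are supplied externally (e.g. by the random-forest tuner); the function and argument names are hypothetical.
+
+```python
+import torch
+import torch.nn.functional as F
+
+def adaptive_semantic_loss(pred, target, x, alpha, beta, gamma):
+    """Sketch of Loss = (alpha*r^2 + 1) * (softplus_beta(-gamma*m) + 1) * SmoothL1.
+
+    pred, target: model outputs and labels, shape [batch].
+    x: the quantity the semantics relate pred to (differential electronegativity).
+    alpha, beta, gamma: penalty strengths, assumed tuned externally.
+    """
+    # Centre both batch quantities for the linear fit.
+    xc = x - x.mean()
+    yc = pred - pred.mean()
+
+    # Least-squares slope m and Pearson correlation r of pred against x.
+    eps = 1e-8
+    cov = (xc * yc).sum()
+    m = cov / (xc.pow(2).sum() + eps)
+    r = cov / (xc.norm() * yc.norm() + eps)
+
+    # Penalty factors; gradients flow through m and r, so the semantic
+    # penalty also shapes the model, not just the base regression loss.
+    corr_factor = alpha * r.pow(2) + 1.0
+    # F.softplus(z, beta=b) computes (1/b) * log(1 + exp(b*z)).
+    slope_factor = F.softplus(-gamma * m, beta=beta) + 1.0
+
+    return corr_factor * slope_factor * F.smooth_l1_loss(pred, target)
+```
+
+Setting `alpha = 0`, `beta = 1`, `gamma = 1` recovers the simple Softplus variant listed first above.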
+
+### Results
+
+- Training loss
+![Training loss plot for experiment 2](./results/Experiment2/train_loss.png)
+- Validation loss
+![Validation loss plot for experiment 2](./results/Experiment2/val_loss.png)
+- Test loss
+![Test loss plot for experiment 2](./results/Experiment2/test_loss.png)
+
+### Conclusions
+
+- The method did not appear to work well, because:
+  - The simple loss functions tested were likely suboptimal for effectively influencing the model
+  - The guessed parameters in the simple functions would need to be optimised, which essentially turns this into a hyperparameter-optimisation problem and defeats the purpose of semantic loss
+  - The adaptive, ML-based loss functions do not appear to converge quickly enough to train the model faster than the normal loss functions do
+- For these reasons, I would consider this experiment a failure
+
+## Experiment 3 - Physics-informed semantic loss functions
+
+### Planning
+
+- Attempt a more mathematically rigorous, formalised, and literature-based approach to semantic loss functions.
+- Although understudied, semantic loss has a substantial body of theoretical maths exploring the concept; the next step is to work out how to put that theory into code.
diff --git a/results/Experiment1/test_loss.png b/results/Experiment1/test_loss.png
new file mode 100644
index 0000000..f92db90
Binary files /dev/null and b/results/Experiment1/test_loss.png differ
diff --git a/results/Experiment1/train_loss.png b/results/Experiment1/train_loss.png
new file mode 100644
index 0000000..ef138c7
Binary files /dev/null and b/results/Experiment1/train_loss.png differ
diff --git a/results/Experiment1/val_loss.png b/results/Experiment1/val_loss.png
new file mode 100644
index 0000000..ab531a2
Binary files /dev/null and b/results/Experiment1/val_loss.png differ
diff --git a/results/Experiment2/test_loss.png b/results/Experiment2/test_loss.png
new file mode 100644
index 0000000..d637a73
Binary files /dev/null and b/results/Experiment2/test_loss.png differ
diff --git a/results/Experiment2/train_loss.png b/results/Experiment2/train_loss.png
new file mode 100644
index 0000000..75e7665
Binary files /dev/null and b/results/Experiment2/train_loss.png differ
diff --git a/results/Experiment2/val_loss.png b/results/Experiment2/val_loss.png
new file mode 100644
index 0000000..425de14
Binary files /dev/null and b/results/Experiment2/val_loss.png differ
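
As a possible starting point for the Experiment 3 plan above (putting the literature's theoretical semantic loss into code), here is a hedged sketch of the semantic loss of Xu et al. (2018), "A Semantic Loss Function for Deep Learning with Symbolic Knowledge": the loss of a constraint under predicted probabilities is the negative log-probability that a random assignment sampled from those probabilities satisfies the constraint. The exactly-one constraint used below is only an illustrative special case; nothing in this sketch is taken from the repository.

```python
import torch

def exactly_one_semantic_loss(probs, eps=1e-8):
    """Semantic loss (Xu et al., 2018) for the "exactly one true" constraint.

    probs: independent Bernoulli probabilities, shape [batch, n].
    Returns -log P(exactly one variable is true), averaged over the batch.
    """
    # P(exactly one true) = sum_i p_i * prod_{j != i} (1 - p_j).
    # Compute prod_j (1 - p_j) once in log space, then divide out (1 - p_i).
    log_not = torch.log1p(-probs.clamp(max=1.0 - eps))   # log(1 - p_i)
    log_all_false = log_not.sum(dim=-1, keepdim=True)    # log prod_j (1 - p_j)
    log_terms = torch.log(probs.clamp(min=eps)) + log_all_false - log_not
    sat_logprob = torch.logsumexp(log_terms, dim=-1)     # log P(constraint holds)
    return -sat_logprob.mean()
```

Minimising a term like this alongside the ordinary task loss pushes the model towards outputs that satisfy the symbolic constraint without hard-coding any particular satisfying assignment.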