Updated readme to use it effectively as a lab journal

2024-06-18 14:09:31 +01:00
parent b113c21862
commit cb74559222
7 changed files with 81 additions and 2 deletions

@@ -6,13 +6,32 @@ This is a testing sandbox for developing various methods of injecting symbolic k
## Experiment 1 - Semantically weighted loss
### Planning
- Simple semantic loss based on intuitively noticeable properties of numeric characters
- Makes use of manually made reward matrices to weight rewards depending on how "close" the model was
- Very rudimentary example of semantic loss
- Appeared to work perfectly
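
A minimal sketch of what such a reward-matrix-weighted cross entropy could look like in PyTorch is below. The 10×10 matrix here is a made-up distance kernel standing in for the manually made reward matrices, and the function name is purely illustrative:

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-in for a manually made reward matrix: reward[i, j] scores
# how "close" predicted digit j is to true digit i (1.0 on the diagonal).
digits = torch.arange(10, dtype=torch.float32)
reward_matrix = torch.exp(-torch.abs(digits[:, None] - digits[None, :]) / 3.0)

def similarity_weighted_ce(logits, targets, reward=reward_matrix):
    """Cross entropy discounted by the expected reward of the prediction,
    so near-miss digits are penalised less than wildly wrong ones."""
    log_probs = F.log_softmax(logits, dim=-1)             # (batch, 10)
    probs = log_probs.exp()
    expected_reward = (probs * reward[targets]).sum(-1)   # (batch,)
    ce = F.nll_loss(log_probs, targets, reduction="none")
    return (ce * (1.0 - expected_reward)).mean()

# Usage with random logits and digit labels.
loss = similarity_weighted_ce(torch.randn(32, 10), torch.randint(0, 10, (32,)))
```
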
### Results
- Training loss
![Training loss plot for experiment 1](./results/Experiment1/train_loss.png)
- Validation loss
![Validation loss plot for experiment 1](./results/Experiment1/val_loss.png)
- Test loss
![Test loss plot for experiment 1](./results/Experiment1/test_loss.png)
### Conclusions
- Seems to have worked
- Clear improvement in training rate with semantics added
- The similarity cross entropy in particular shows clear signs in the validation loss of following a similar complementary-CDF-shaped curve to the normal cross-entropy loss, but descending faster
- Interestingly, the "garbage" cross entropy also seems to have produced a very good result! This is likely because it hasn't been normalized to 1, so it may simply be amplifying the gradient by random amounts at all times, essentially acting as a fuzzy gradient booster
- I would consider this experiment a success, with some interesting open questions remaining worth further examination
## Experiment 2 - Dataset qualitative characteristic derived semantic loss functions
### Planning
- Makes use of known physics equations that partially describe the problem to guide the model
- Reduces the need for the model to learn known physics, allowing it to focus on learning the unknown physics
- Should accelerate training
@@ -25,3 +44,63 @@ This is a testing sandbox for developing various methods of injecting symbolic k
- [Molecular Properties](https://www.kaggle.com/datasets/burakhmmtgl/predict-molecular-properties)
- [Nuclear Binding Energy](https://www.kaggle.com/datasets/iitm21f1003401/nuclear-binding-energy)
- [Body Fat Prediction](https://www.kaggle.com/datasets/fedesoriano/body-fat-prediction-dataset)
- Decided to use the Molecular Properties dataset as it is quite familiar to me
- Training with semantics added to the relationship between molecular energy and differential electronegativity
- Semantics being injected are:
- These values should be positively correlated
- These values should be weighted towards a high r^2 with an adaptive penalty
- Multiple attempts carried out (a PyTorch sketch of two of these variants is given after this list):
- Simple penalties. Variations tested include:
```math
\mathrm{Loss} = \left( \mathrm{Softplus}(-m) + 1 \right) \cdot \mathrm{SmoothL1Loss}
```
```math
\mathrm{Loss} = \left( \mathrm{ReLU}(-m) + 1 \right) \cdot \mathrm{SmoothL1Loss}
```
```math
\mathrm{Loss} = \left( \frac{1}{\mathrm{sech}(|r|)} + 1 \right) \cdot \mathrm{SmoothL1Loss}
```
```math
\mathrm{Loss} = \left( r^2 + 1 \right) \cdot \mathrm{SmoothL1Loss}
```
- Adaptive, self-training penalties tuned by various methods. The best method found was optimisation by a random forest regressor. These tunable variants include:
```math
\mathrm{Loss} = \left( \mathrm{Softplus}(-\alpha m) + 1 \right) \cdot \mathrm{SmoothL1Loss}
```
```math
\mathrm{Loss} = \left( \mathrm{ReLU}(-\alpha m) + 1 \right) \cdot \mathrm{SmoothL1Loss}
```
```math
\mathrm{Loss} = \left( \frac{1}{\mathrm{sech}(\alpha |r|)} + 1 \right) \cdot \mathrm{SmoothL1Loss}
```
```math
\mathrm{Loss} = \left( \alpha r^2 + 1 \right) \cdot \mathrm{SmoothL1Loss}
```
- The final adaptive semantic loss function tested was the following:
```math
\mathrm{Loss} = \left( \alpha r^2 + 1 \right) \cdot \left( \frac{1}{\beta} \log\left( 1 + e^{\beta \gamma (-m)} \right) + 1 \right) \cdot \mathrm{SmoothL1Loss}
```
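
A rough PyTorch sketch of the simple Softplus variant and the final adaptive variant is given below, assuming m is the in-batch least-squares slope and r the Pearson correlation between the predicted molecular energies and the differential electronegativity feature. The helper names and the default values of α, β, γ are placeholders for illustration, not the actual implementation:

```python
import torch
import torch.nn.functional as F

def slope_and_corr(x, y, eps=1e-12):
    """In-batch least-squares slope m and Pearson correlation r of y against x."""
    x_c, y_c = x - x.mean(), y - y.mean()
    cov = (x_c * y_c).sum()
    m = cov / (x_c * x_c).sum().clamp_min(eps)
    r = cov / (x_c.norm() * y_c.norm()).clamp_min(eps)
    return m, r

def softplus_penalty_loss(pred_energy, true_energy, delta_en):
    """Simple variant: Loss = (Softplus(-m) + 1) * SmoothL1Loss.
    Penalises a negative in-batch slope between predicted energy and
    differential electronegativity, i.e. pushes towards positive correlation."""
    m, _ = slope_and_corr(delta_en, pred_energy)
    return (F.softplus(-m) + 1.0) * F.smooth_l1_loss(pred_energy, true_energy)

def combined_adaptive_loss(pred_energy, true_energy, delta_en,
                           alpha=1.0, beta=1.0, gamma=1.0):
    """Final variant: (alpha*r^2 + 1) * ((1/beta)*log(1 + exp(beta*gamma*(-m))) + 1) * SmoothL1Loss.
    alpha, beta and gamma are the externally tuned penalty parameters."""
    m, r = slope_and_corr(delta_en, pred_energy)
    slope_term = torch.log1p(torch.exp(beta * gamma * (-m))) / beta + 1.0
    return (alpha * r ** 2 + 1.0) * slope_term * F.smooth_l1_loss(pred_energy, true_energy)
```

In the runs above, α, β and γ were tuned externally rather than left at fixed defaults, with the random forest regressor giving the best results.
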
### Results
- Training loss
![Training loss plot for experiment 2](./results/Experiment2/train_loss.png)
- Validation loss
![Validation loss plot for experiment 2](./results/Experiment2/val_loss.png)
- Test loss
![Test loss plot for experiment 2](./results/Experiment2/test_loss.png)
### Conclusions
- The method did not appear to work well, because:
- The simple loss functions tested were likely suboptimal for effectively influencing the model
- The guessed parameters in the simple functions need to be optimised, which essentially turns this into a hyperparameter optimisation problem and defeats the purpose of semantic loss
- The adaptive, ML-based loss functions do not appear to converge quickly enough to train the model faster than the normal loss functions
- For these reasons, I would consider this experiment a failure
## Experiment 3 - Physics informed semantic loss functions
### Planning
- Attempt to use a more mathematically rigorous, formalised, and literature-based approach to semantic loss functions
- Although understudied, semantic loss has a substantial body of theoretical maths exploring the concept; the challenge is figuring out how to put this into code (see the sketch below)
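
As one possible starting point from that literature, the semantic loss of Xu et al. (ICML 2018) scores a propositional constraint by the negative log of its weighted model count under the network's output probabilities. Below is a minimal sketch for the "exactly one output is true" constraint, purely to illustrate how such a formula becomes code; nothing here is implemented in this repo yet:

```python
import torch

def exactly_one_semantic_loss(probs, eps=1e-12):
    """Semantic loss for the constraint 'exactly one of n outputs is true':
    -log( sum_i p_i * prod_{j != i} (1 - p_j) ), averaged over the batch."""
    # probs: (batch, n) independent Bernoulli probabilities in (0, 1).
    log_not = torch.log(1.0 - probs + eps)                # log(1 - p_j)
    log_all_not = log_not.sum(dim=-1, keepdim=True)       # sum_j log(1 - p_j)
    # log( p_i * prod_{j != i} (1 - p_j) ) for each i, kept in log space for stability.
    log_terms = torch.log(probs + eps) + log_all_not - log_not
    log_wmc = torch.logsumexp(log_terms, dim=-1)          # log weighted model count
    return -log_wmc.mean()

# Usage: sigmoid outputs for a batch of 4 examples with 10 candidate labels.
loss = exactly_one_semantic_loss(torch.sigmoid(torch.randn(4, 10)))
```
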

Six binary image files added (not shown): the training, validation and test loss plots for Experiments 1 and 2 referenced above.