There haven’t been many updates lately because I’ve been working on a paper and presentation… and I’m happy to announce that both were accepted for SPIE’s Smart Structures and NDE for Industry 4.0! So if you’ve got time to kill in March, drop by and listen to me blather for 20 minutes about a Big Data problem and how an Actor-based architecture makes it easier to handle. Hope to see you there!
[Update Saturday, 17-Mar-18 19:10:33 UTC]: here’s the presentation + some bonus content if you missed it.
I’ve also added initial support for oversampling to correct class imbalances, based on the SMOTE (Synthetic Minority Over-sampling TEchnique) algorithm, which may be of interest. Myriad was written to help with Region Of Interest (ROI) detection applications, which often have many more negative (i.e. not ROI) samples than positive (ROI) ones. The Myriad toolset currently uses RUS (Random UnderSampling), which reduces the imbalance by randomly discarding members of the majority class. SMOTE takes the opposite approach and oversamples: it generates synthetic members of the minority class, each one interpolated between a real minority sample and one of its nearest minority-class neighbours.
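For the curious, here’s a rough, standalone Python sketch of both approaches. It is not Myriad’s implementation or API: the function names (`random_undersample`, `k_nearest`, `smote`) are hypothetical, samples are assumed to be tuples of floats, and neighbours are found by brute-force Euclidean distance. The SMOTE part picks its base samples at random, a common simplification of the original algorithm, which cycles through every minority sample in turn.

```python
import math
import random


def random_undersample(majority, n_keep, seed=None):
    """RUS: keep a random subset of n_keep majority-class samples, discard the rest."""
    return random.Random(seed).sample(majority, n_keep)


def k_nearest(sample, candidates, k):
    """Return the k candidates closest to sample by Euclidean distance (brute force)."""
    return sorted(candidates, key=lambda c: math.dist(sample, c))[:k]


def smote(minority, n_synthetic, k=5, seed=None):
    """Generate n_synthetic synthetic minority samples, SMOTE-style.

    Each new sample lies at a random point on the line segment between a
    randomly chosen minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Neighbours are searched among the *other* minority samples only.
        others = minority[:i] + minority[i + 1:]
        neighbour = rng.choice(k_nearest(base, others, k))
        gap = rng.random()  # uniform in [0.0, 1.0)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy example: 3 ROI (minority) samples vs. 12 non-ROI (majority) samples.
roi = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
non_roi = [(5.0 + i, 5.0) for i in range(12)]

# RUS: throw away non-ROI samples until we have 3 vs. 3.
balanced_by_rus = random_undersample(non_roi, len(roi), seed=1)
# SMOTE: synthesize 9 extra ROI samples so we have 12 vs. 12.
balanced_by_smote = roi + smote(roi, len(non_roi) - len(roi), k=2, seed=1)
```

The trade-off is visible even at this scale: RUS ends up training on 6 samples out of 15, while SMOTE keeps every real sample and pads the minority class with points that stay inside the region the real ROI samples span.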
I haven’t fully integrated SMOTE into the toolset yet, but it’s available for the DIY-er crowd. My game plan is to make it an option for cross-validation in Trainer. Intuitively, it makes sense to me to apply SMOTE after we’ve split the data into training and testing subsets, i.e. to apply SMOTE to the training set rather than to the original dataset. Otherwise our synthetic samples would end up in the testing set, and we’d be testing models on their ability to classify a mix of real and “fake” data, when what we really want is to test their ability to classify real data alone.
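Here’s a rough sketch of that ordering, assuming a two-class problem where each sample is a tuple of features. The helper names (`train_test_split`, `oversample_minority`, `random_duplicate`) are hypothetical and not part of Myriad. To keep it short, `random_duplicate` simply repeats random minority samples as a stand-in for SMOTE; any function with the same `(minority, n)` signature, such as a SMOTE implementation, could be passed in its place.

```python
import random


def train_test_split(samples, labels, test_fraction=0.25, seed=None):
    """Randomly split parallel samples/labels lists into (train, test) pairs."""
    rng = random.Random(seed)
    order = list(range(len(samples)))
    rng.shuffle(order)
    n_test = int(len(order) * test_fraction)

    def take(indices):
        return [samples[i] for i in indices], [labels[i] for i in indices]

    return take(order[n_test:]), take(order[:n_test])


def oversample_minority(samples, labels, minority_label, oversampler):
    """Balance a two-class training set by appending synthetic minority samples.

    `oversampler(minority, n)` must return n new minority samples.
    """
    minority = [s for s, y in zip(samples, labels) if y == minority_label]
    n_needed = (len(samples) - len(minority)) - len(minority)
    if not minority or n_needed <= 0:
        return samples, labels
    synthetic = oversampler(minority, n_needed)
    return samples + synthetic, labels + [minority_label] * len(synthetic)


def random_duplicate(minority, n, seed=0):
    """Stand-in oversampler (NOT SMOTE): repeat randomly chosen minority samples."""
    rng = random.Random(seed)
    return [rng.choice(minority) for _ in range(n)]


# Toy data: 10 ROI samples (label 1) and 90 non-ROI samples (label 0).
samples = [(float(i),) for i in range(100)]
labels = [1 if i < 10 else 0 for i in range(100)]

# 1. Split first, so the test set contains only real samples...
(train_x, train_y), (test_x, test_y) = train_test_split(samples, labels, seed=7)
# 2. ...then oversample the training set alone.
train_x, train_y = oversample_minority(train_x, train_y, 1, random_duplicate)
```

With cross-validation the same rule would presumably apply per fold: oversample each fold’s training portion and evaluate on the untouched held-out fold.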