As previously promised I’ve updated the Myriad samples with a concurrent demo that more or less follows the same workflow as the 60 day technical review video. Included in the project for your convenience is a pre-trained machine learning model that’s learned to recognize indications of structural damage in C-scan maps. Although I originally trained it on ultrasonic sensor data, it did show some promise when I tried it out with microwave and X-ray data.
You’d most likely want to train it a bit more on data representative of your inspection to get the best results.
It’s a little amazing to me that ~250 lines of Java gets you a concurrent, fault-tolerant damage detection app, but that’s very much courtesy of the excellent Akka framework. Well worth a look for your next project!
As promised, I’ve put together a short video on how to build your own cluster for sensor data analysis with Myriad. Myriad uses Akka’s Remoting feature to (hopefully) make it relatively straightforward to link up several computers in a DIY processing pipeline. If you’re using the GUI tools I wrote for NASA, you just start the GUI on each machine, then point the next machine in the processing pipeline to the remote machine and you’re good to go.
The main use cases I see for this feature are for handling big datasets and for resource-intensive ROI code. If you’ve got enough data to analyze that just reading it might take up all your available RAM, or if your ROI code needs all the CPU/GPU/RAM it can get, you can split the processing up among multiple systems. Have one machine responsible for reading the data and sending subsets to an analysis machine, which is then free to use all its resources for your ROI code. The analysis machine does its work and sends the results to another machine for reporting and compiling the results, and so on.
Experience test-driving NDIToolbox in the field (or the depot / hangar, to be more accurate) showed me that there is a ton of NDE sensor data out there and that analyzing it manually can take forever and cost a fortune. I’d experimented with algorithms to automatically flag possible indications of damage in the data while working on NDIToolbox and a project for the Air Force, but I’d never really gone beyond the proof-of-concept stage. Until recently I didn’t have a good handle on how to make the analysis multi-processor and/or distributed, either – sitting in a depot for an hour waiting for a file to load taught me that single-threaded analysis isn’t feasible.
Enter Akka. I’ve written code in Spark and Storm, but Akka seems to impose fewer restrictions on development. Throw in some handmade pyramid, sliding window, and convolution implementations; add Apache Mahout for machine learning; and a Fault-Tolerant NDE Data Reduction Framework is born.
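Myriad itself is written in Java, but the sliding-window idea translates to a few lines of Python. The function names and parameters below are purely illustrative, not Myriad’s actual API: slide a fixed-size window across a 2D scan and flag windows whose mean amplitude exceeds a threshold.

```python
import numpy as np

def sliding_windows(scan, window=8, step=4):
    """Yield (row, col, tile) for each window-sized tile of a 2D scan."""
    rows, cols = scan.shape
    for r in range(0, rows - window + 1, step):
        for c in range(0, cols - window + 1, step):
            yield r, c, scan[r:r + window, c:c + window]

def flag_windows(scan, threshold, window=8, step=4):
    """Return top-left corners of windows whose mean amplitude exceeds threshold."""
    return [(r, c) for r, c, tile in sliding_windows(scan, window, step)
            if tile.mean() > threshold]
```

A real detector would hand each tile to a classifier instead of a simple threshold, but the traversal pattern is the same.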
Two months or so into the project, and there’s a rather long demonstration of “Myriad” in action available, in which we train a machine learning model to automatically detect indications of damage in ultrasonic sensor data. I hope to have a few more demos soon on calling external apps and building a Myriad P2P cluster; stay tuned!
If you haven’t updated NDIToolbox since last time, it’s worth doing it now. Here’s where we are today.
- Better support for UTWin data files, including preliminary support for compressed waveforms. That last one’s still highly experimental but let me know if it works for you; I don’t have access to a lot of sample data files for testing.
- Squashed bugs, including better handling of memory errors when running a plugin.
- (Developers) A new report module which provides a quick-and-easy way of generating simple PDF reports.
The source code has already been updated; binaries will follow shortly. I’ll have more to say about the report module in a later post.
Development on TRI’s nondestructive evaluation data analysis software NDIToolbox has slowed of late as we’ve gotten closer to our goal for functionality and as we get ready to do an honest-to-goodness field test later this year on a QA line. Nevertheless I’m still plugging away at it whenever I get the chance, and today I’ve got the latest and greatest available with two new features: support for multiple datasets in Winspect data files and a new “batch mode.”
The batch mode feature lets you run an NDIToolbox plugin on a set of input files, optionally spawning multiple processes to speed things up. If you have a ton of data files and you’re doing the same number crunching over and over, just point NDIToolbox to the files and the plugin and let it do the work for you. You don’t have to convert your data files to HDF5 before using batch mode; as long as the file formats are supported by NDIToolbox, it’ll fetch the data and run the plugin automatically. More info on batch mode is available here, in my mirror of the NDIToolbox docs. If you’re going to use batch mode’s multiprocessing, be sure to read up on the requirements (basically, avoid really huge data files).
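Under the hood, the multiprocessing side of batch mode amounts to fanning file names out to a pool of worker processes. Here’s a minimal sketch of that pattern – the `analyze_file` stand-in is hypothetical; NDIToolbox handles the actual file reading and plugin execution itself:

```python
import multiprocessing

def analyze_file(fname):
    # Stand-in for "read this data file and run the plugin on it";
    # here we just return the filename length as a placeholder result.
    return fname, len(fname)

def batch_run(filenames, processes=2):
    """Run the analysis over a set of files using a pool of worker processes."""
    with multiprocessing.Pool(processes) as pool:
        return dict(pool.map(analyze_file, filenames))
```

Each worker gets one file at a time, so memory use scales with the size of the largest single file times the number of workers – which is why really huge data files and multiprocessing don’t mix well.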
As usual, I’d recommend using the conventional Python version of NDIToolbox if you can. If you’re on Windows and don’t want to install Python (or you want to run from a thumb drive), the Downloads section of NDIToolbox’s Bitbucket page has a Windows installer and a compiled version available, no Python required.
If you’re writing a plugin, there’s one additional step required to support the new batch mode. Since more than a few nondestructive testing system file formats, like UTWin’s CSC or WinSpect’s SDT, can have multiple datasets in a single file, batch mode will send your plugin a dict of all the datasets it finds in a given input file. So you’ll need a bit of code to see whether you’ve been passed a single dataset (conventional user interface) or a container full of datasets (batch mode). There are a few ways to do this, but one of the most straightforward is to look for a “keys” attribute, like so.
    if hasattr(self._data, "keys"):
        # Dict of data provided - batch mode
        for dataset in self._data:
            # Execute plugin on every dataset
            self._data[dataset] = your_analysis_function(self._data[dataset])
            # You could alternatively execute on one particular type of data
            # e.g.
            # if dataset == "waveform":
            #     self._data = your_analysis_function(self._data[dataset])
    else:
        # A single dataset was provided
        self._data = your_analysis_function(self._data)
You could also just check whether you were passed an actual dict, courtesy of isinstance(). I’d recommend against doing that for now, though – better to assume it’s an associative container of some sort than to hard-wire an expectation of an actual dict.
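If you ever do want an explicit check, Python’s collections.abc.Mapping captures “associative container of some sort” without hard-wiring dict. A sketch – `run_plugin` here is a hypothetical helper, not part of the NDIToolbox API:

```python
from collections.abc import Mapping

def run_plugin(data, analysis_function):
    """Apply analysis_function to one dataset, or to every dataset in a mapping."""
    if isinstance(data, Mapping):
        # Batch mode: any dict-like container of named datasets
        return {name: analysis_function(d) for name, d in data.items()}
    # Conventional UI: a single dataset
    return analysis_function(data)
```

Because Mapping is an abstract base class, this keeps working if the container is ever swapped for an OrderedDict or another dict-like type.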
Press the Scan button and you can take a point cloud scan, e.g.
Here’s the same scan as an interpolated wireframe if you’re having trouble making Abe out:
I haven’t had the chance to do much NDIToolbox work in the past month or so while I’ve been working on another project in the lab – it did involve lasers and a chance to play with C++ after many years’ absence, so I’m not complaining. I did just push out an update this week that might be of interest if you’ve been running into memory problems. Hopefully this version’s a little more thoughtful when it comes to releasing memory it no longer needs.
Also in this version, I’ve added preliminary support for ultrasonic gate functions in the MegaPlot presentation. The functionality’s always been there but I’ve had it disabled until now while I was working out how to apply gates to three-dimensional data; I’m not 100% satisfied with the implementation but thought I’d enable it and come back to it later.
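For the curious, one common way to implement a gate on three-dimensional ultrasonic data (indexed x, y, time) is to keep only the samples inside the time gate and collapse them to a C-scan of peak amplitude. The following is just a sketch of that idea, not NDIToolbox’s actual implementation:

```python
import numpy as np

def gate_peak_cscan(volume, gate_start, gate_stop):
    """Peak amplitude within a time gate, per (x, y) position.

    volume: 3D array indexed (x, y, time); gate bounds are sample indices.
    Returns a 2D C-scan of the gated peak amplitudes.
    """
    gated = volume[:, :, gate_start:gate_stop]  # keep samples inside the gate
    return np.abs(gated).max(axis=2)            # collapse time axis to the peak
```

Other gate functions (time-of-flight, mean amplitude, first threshold crossing) follow the same slice-then-reduce shape; only the reduction changes.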
Update Wed Dec 26 12:23:44 CST 2012: managed to sneak some more work in on the project before the end of the year. It’s not in the documentation yet, but I’ve added support for exporting slices of a data file. Handy if you’re only interested in a subset of a much larger data file.
Finally, if you’ve ever wanted to just see screenshots and read about all of this NDIToolbox stuff instead of having to download everything, I’ve put a mirror of the current documentation up on the site. Have a look at the Quick Start for a primer on what NDIToolbox does, and the Plugins page to find out about…plugins. Developers might also be interested in how to write plugins. Sample plugin code is available that demonstrates how to write a server-based plugin, and how to combine Python with Java or C++.
NDIToolbox has been able to generate B-scans from ultrasonic data for a while now, but you had to know how to take slices of data. I just added a switch in the Megaplot presentation that will now do it for you automatically. Here’s what I’m talking about:
For comparison, here’s the usual Megaplot presentation:
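If you’d rather take the slices yourself, a B-scan is just a 2D slice of the 3D volume. A quick NumPy sketch, assuming a (row, column, time) axis order – your data’s axis order may differ:

```python
import numpy as np

def bscan_horizontal(volume, row):
    """B-scan along one scan row: a (column, time) slice."""
    return volume[row, :, :]

def bscan_vertical(volume, col):
    """B-scan along one scan column: a (row, time) slice."""
    return volume[:, col, :]
```

The new Megaplot switch does exactly this kind of slicing for you, so this is only needed if you want the slices in your own code.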
Also available for testing is a new NDIToolbox installer for the Windows binary distribution. Tested to work under Windows 7 and 8. Available from the NDIToolbox Downloads page. As always I recommend downloading the Python version rather than the precompiled binary since it’s so much easier to keep up to date, but it might come in handy if you don’t want to install a bunch of dependencies and just want to get started right away.
Another update to NDIToolbox today: I’ve just added the ability to import data from a couple of ultrasonic NDT systems. These imports are still a little flaky, because a) we haven’t finalized the HDF5 format we’ll be using in NDIToolbox and b) proprietary binary file formats are what they are. For what it’s worth, I’ve used them on the data I could get my hands on from some immersion tank scans done here @ TRI World HQ and elsewhere, and they will at least let you display your data, so it’s a start. Hopefully I’ll be able to improve their functionality and add a few more importers as the project goes on.
Other recent but decidedly less interesting changes:
- Support for manual garbage collection – if you’re playing with large data and you get warnings about being out of memory, you can opt to clear some out to keep working. I’ll be implementing HDF5 slicing at some point so this is a temporary work-around.
- Fixed a bug in plugins: plugin support folders can now contain Python modules.
- All data retrieval functions now in a separate module (models/datio.py) so your code can use them directly.
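The manual garbage collection option above boils down to dropping references to data you no longer need and asking Python’s collector to do a pass. A sketch of the idea – not NDIToolbox’s actual code:

```python
import gc

def free_dataset(container, name):
    """Drop a dataset reference and force a collection pass.

    Returns the number of unreachable objects the collector found.
    """
    container.pop(name, None)  # release our reference to the data
    return gc.collect()        # ask CPython to reclaim cycles immediately
```

CPython frees most objects as soon as their reference count hits zero; the explicit gc.collect() mainly helps with reference cycles, and with returning large blocks sooner rather than at the collector’s leisure.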
As always, the source code is up on Bitbucket and a Windows binary is available as well. These changes will also find their way into NDIToolbox Labs – we’re still plugging away on integrating the Automated Data Analysis (ADA) Toolkit into Labs but making progress.