Career Profile

I am a PhD physicist who has spent the last 7 years as a client-facing applied data scientist across a range of industrial domains. Problem solving and the development of analytics software for large-scale deployment are at the core of my skill set. Given my academic and industrial experience, my domain expertise is diverse, including but not limited to aviation, agriculture, transportation, power generation, oil and gas, and manufacturing. My technical expertise is currently focused on machine learning methods for multivariate time series, though I continually work toward proficiency in the many other corners of the broad and deep field of data science. Commensurate with technical acumen, I understand the vital need for clear communication in cultivating a productive, ongoing relationship with industrial clients. Solving a technical problem is often only half of the battle.


Experience

Senior Staff Data Scientist

11-2016 to Present
General Electric Digital, San Ramon, CA

As a Senior Staff Data Scientist, I focus on proof-of-concept analytics development in direct collaboration with clients. My role offers a mixture of engagements of varying duration. The shorter projects are typically evaluations of the feasibility of solving a client's challenge with the available data; these provide rapidly growing and diverse exposure to the variety of challenges our clients face and the aspirations they have. The longer projects require more robust analytic development and follow more closely the experimentation, development, and deployment process.

During this period I have developed 9 solutions to problems in 8 industrial domains: pneumonia risk ranking for beef cattle (4 months); root cause analysis of hypoxic pilots in Navy F/A-18 fighter jets (6 months); anomaly detection and isolation in Army transport vehicles (1.5 months); anomaly detection in food packaging manufacturing (4 months); anomaly/theft detection in multivariate event data for a large oil refinery (5 months); root cause analysis for generator control units on Navy F/A-18 fighter jets (3 months); maintenance optimization for a US nationwide nuclear power producer (12 months); edge-based/on-premise fault mitigation analytics for different aspects of microchip manufacturing (pump health and robot arm calibration) (1.5 years); and an efficiency maximization tool for a very large solar field (7 months).

1) One of the leading killers of beef cattle in the United States is bovine pneumonia, typically acquired during extended and cramped transportation. Early identification and treatment of the ailment is vital to optimal feed-to-beef conversion. A further benefit of identifying sick animals is the targeted application of antibiotics, which not only reduces cost but avoids unnecessary dosing. A rolling-window ensemble binary classifier (random forest) produced a temporal probabilistic ranking that identifies cattle at risk. This daily list can help cowboys pick out at-risk animals from the sometimes thousands they are responsible for.
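As an illustration of the approach, the sketch below trains a rolling-window random forest on synthetic stand-in data; the feature schema (feed intake, water visits), window length and label are hypothetical placeholders rather than the production pipeline.

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    n_animals, n_days, WINDOW = 50, 30, 5

    # Synthetic stand-in for per-animal daily features (hypothetical schema).
    df = pd.DataFrame({
        "animal_id": np.repeat(np.arange(n_animals), n_days),
        "day": np.tile(np.arange(n_days), n_animals),
        "feed_intake": rng.normal(10, 2, n_animals * n_days),
        "water_visits": rng.poisson(4, n_animals * n_days),
        "sick_soon": rng.binomial(1, 0.05, n_animals * n_days),
    })

    X, y = [], []
    for _, g in df.sort_values("day").groupby("animal_id"):
        feats = g[["feed_intake", "water_visits"]].to_numpy()
        labels = g["sick_soon"].to_numpy()
        for t in range(WINDOW, n_days):
            X.append(feats[t - WINDOW:t].ravel())  # trailing window of behavior
            y.append(labels[t])

    clf = RandomForestClassifier(n_estimators=300, class_weight="balanced")
    clf.fit(np.array(X), np.array(y))
    risk = clf.predict_proba(np.array(X))[:, 1]    # daily risk scores for ranking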

2) Navy pilots have recently been experiencing potentially life-threatening side effects from flying F/A-18 fighter jets. The cause is unclear, but given the hypoxic and hyperoxic symptoms, it must relate to either the quality or the quantity of available breathing air. As part of a team of 3, we produced a collection of potential causes for subject matter experts in the Navy to evaluate. Our approaches were a mixture of standard but targeted hypothesis testing between healthy and unhealthy populations; partial dependence plots built on extensive feature creation; topic modeling and text summarization of pilot descriptions (LDA/Gensim); and targeted modeling of the ventilation system comparing upstream and downstream predictions from shallow neural networks. We also attempted various methods of supervised classification, investigating the driving factors during an event with local interpretable model-agnostic explanations (LIME) to deduce potential causes. However, since the onset of symptoms could range from immediate to a few days after the incident, the supervised classifiers did not generalize. The validity of our recommendations is being explored by Navy personnel.
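For the LIME step, a minimal sketch using the open source lime package is below; the classifier, feature names and data are synthetic placeholders, not actual flight data.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from lime.lime_tabular import LimeTabularExplainer

    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 4))
    y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)  # synthetic "event" label
    clf = RandomForestClassifier(n_estimators=200).fit(X, y)

    explainer = LimeTabularExplainer(
        X,
        feature_names=["cabin_pressure", "obogs_flow", "altitude_rate", "o2_conc"],
        class_names=["healthy", "event"],
        mode="classification")
    exp = explainer.explain_instance(X[0], clf.predict_proba, num_features=4)
    print(exp.as_list())  # locally weighted drivers of this single prediction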

3) Army transport vehicles are very expensive to operate and maintain. To reduce cost and ensure asset performance, the Army wants to move away from scheduled maintenance toward a more proactive and optimal approach to maintaining and upgrading these assets. This work leverages the success of the unsupervised anomaly detection framework we developed for commercial jet engines, described below. A per-vehicle, temporally resolved anomaly score and a feature ranking are provided as outputs. This work is slated to be part of a larger fleet-wide monitoring and planning tool.

4) A manufacturer of plastic sheet and containers for food products needed to reduce scrap. As part of a team of 2, over 4 months we focused on their plastic sheet production line to identify leading indicators of tearing and hole formation. The process starts with a particular recipe of plastic ingredients, which works its way through heating and stretching stages to finally wind around a spool; the client wanted to reduce sheet breakage mid-spool. Given information about the ingredients, spool speeds/torques, oven temperatures and cycles, and known breakages, we applied an information-based approach leveraging Graphical Granger causality. An unsupervised approach was chosen because breakages, while financially significant, are limited in number. We did use the breakage records to evaluate our performance: had the analytic been employed retroactively, we would have caught roughly 70% of breakages with ~40% false positives, although the separation of false positives from close calls was difficult to agree upon. The approach offers anomaly scores per input channel, which allows some loose root cause identification. One of the common causes of breakage was simply a slight mismatch in consecutive roller speeds.
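A minimal sketch of the Lasso-Granger idea underlying Graphical Granger causality is below: each channel is regressed on lagged values of all channels, and the causal graph is read off the nonzero coefficients. The data, lag depth and planted dependency are synthetic.

    import numpy as np
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(2)
    T, n_ch, LAG = 600, 4, 3
    X = rng.normal(size=(T, n_ch))
    X[1:, 1] += 0.8 * X[:-1, 0]     # plant a known lagged dependency: ch0 -> ch1

    # Lagged design matrix: columns are each channel at lags 1..LAG.
    Z = np.hstack([X[LAG - k:T - k] for k in range(1, LAG + 1)])

    graph = np.zeros((n_ch, n_ch))
    for i in range(n_ch):
        coef = LassoCV(cv=5).fit(Z, X[LAG:, i]).coef_.reshape(LAG, n_ch)
        graph[:, i] = np.abs(coef).sum(axis=0)   # influence of each channel on i
    print(np.round(graph, 2))  # entry (j, i): evidence that channel j drives channel i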

5) A large and somewhat antiquated oil refinery needed a better accounting system to ensure that the amount produced agreed with the amount sold. A natural secondary outcome was the ability to monitor for leakage or theft along the production and delivery network. The data consisted primarily of storage tank levels, product type, density, pressure and temperature, delivery schedules, and some flow measurements. The classically effective approach for this type of work is data validation and reconciliation, but this requires a thorough set of flow measurements and an operational state model that ultimately could not be fulfilled. In lieu of this, we used the sensor event data (10 minute cadence over 5 years for ~10 sensors) to build 2 main analytics over multiple interconnected storage tanks and delivery lines. The first analytic determined the operational phase more accurately (e.g. idle, tank being filled, tank being emptied). We used a combination of two methods: a simple variable-window rolling slope change detection, which was possible because the cycles and rates of filling and discharging were fairly regular, and an open source algorithm called Greedy Gaussian Segmentation (i.e. multivariate change point detection). Both worked well, but the slope detection was less computationally intensive. The second analytic performed multivariate anomaly detection using a serial approach. Hotelling's T2 did a fine job and was very low noise, but does not provide a multi-channel output; a stacked denoising autoencoder had slightly higher noise but did offer a multichannel anomaly score. The final solution was to monitor the low-noise Hotelling score and, if a threshold was breached, look into the details of the autoencoder output. The two approaches were mostly in agreement. This project is in the process of being productionized. I led the data science portion of this project and directed the efforts of 3 others across 3 continents.
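A minimal sketch of the Hotelling T2 monitor is below, assuming a slice of healthy multivariate sensor data for calibration; the data and the alert quantile are illustrative choices.

    import numpy as np

    rng = np.random.default_rng(3)
    train = rng.normal(size=(2000, 6))              # healthy reference window
    live = rng.normal(size=(500, 6))
    live[250:] += 1.5                               # synthetic injected shift

    mu = train.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(train, rowvar=False))

    def t2(X):
        d = X - mu
        return np.einsum("ij,jk,ik->i", d, S_inv, d)   # T2 score per time step

    threshold = np.quantile(t2(train), 0.999)       # set on healthy data only
    alarms = np.where(t2(live) > threshold)[0]      # breach -> inspect autoencoder output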

6) Due to our successes and experience in the previous Navy engagement (described above), we were awarded another contract to perform a root cause investigation and identify mitigation opportunities for faults in the generator control unit (GCU) of F/A-18 fighter jets. Simply put, these assets provide stable electrical power to the jet. We had access primarily to 2 forms of data: usage (e.g. 10 Hz full-flight sensor measurements) and maintenance records, both from the Navy and from our GE depot (GE is the OEM). In terms of analytics this was not a complex project requiring state-of-the-art data science techniques; the challenge was in data connectivity. Naturally, the Navy wants a holistic view of an asset in order to be efficiently proactive with maintenance for fleet readiness. This requires reliably stitching these disparate data sets together over the life of the asset. Once the multi-flight usage and maintenance history is available, many relatively simple but highly valuable things can be done. For example, a multi-flight history of the asset allows the application of well established health indicators (e.g. percentage of time used above some temperature in some mode). The maintenance history allows a few key things: 1) a clear story around parts replaced over time (e.g. avoiding repeatedly replacing the same part), 2) a measure of maintenance effectiveness both within and across repair sites, and 3) improved triage (if part X fails, then the maintenance SME should typically consider inspection or replacement of parts Y and Z). While analytics were not central, the last point around triage did make use of a combination of text mining and clustering. This was a 3 month engagement that I led and managed primarily alone; it is currently in the proposal process (in conjunction with the Navy) for a phase 2, the goal of which would be to automate the results from phase 1 and extend them to asset types beyond the F/A-18 GCU.
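As an illustration of the triage step, the sketch below clusters free-text maintenance write-ups with TF-IDF and k-means so that related failure modes surface together; the records shown are invented placeholders.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    records = [
        "replaced voltage regulator after intermittent output fault",
        "gcu output unstable, regulator and wiring harness inspected",
        "oil temperature sensor replaced during scheduled check",
        "generator output dropout in flight, regulator swapped",
    ]
    X = TfidfVectorizer(stop_words="english").fit_transform(records)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    for text, lab in zip(records, labels):
        print(lab, text)  # records in the same cluster suggest related failure modes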

7) Nuclear power production facilities organize their assets into functional equipment groups, which help in creating the complex preventive maintenance schedules required by the NRC and other regulatory entities. Our client is interested in identifying preventive maintenance tasks whose schedules are unnecessarily frequent and expensive. The current practice is manual: an inspection SME picks various assets and submits recommendations. As there are tens of thousands of assets, the client wants a method to find these cost saving opportunities automatically. The maintenance architecture and scheduling of nuclear power production facilities is very complex in that one must consider not only asset reliability but also safety, regulation, and the interrelatedness of assets (bringing one asset to the shop requires shutting down others). The solution to this challenge is a combination of opportunity ranking, asset reliability determination, careful filtering, and schedule optimization. At the time of the project, the only data available was maintenance and organizational records (unfortunately no sensor data, and therefore no asset degradation analytics). The ranking is performed by relatively simple logic that captures inspection SME work processes. The reliability portion is more challenging due to limited fault data. The typical approach to this type of problem (given only maintenance records to determine the MTBF) is to fit a Weibull regression on true asset faults (corrective maintenance) and then find the balance between corrective and preventive maintenance using the cost per unit time. However, given the lack of true fault events, we relied on the more general reliability growth approach, which has less restrictive assumptions and allows for mixed assets and failure modes, and thus aggregation. We are currently (July 2019) beginning phase 2 of this effort, which will solidify the ranking and reliability portions into our product. Assuming we continue to meet our milestones, phase 3 will extend this framework to run as a probabilistic classifier (based on historically approved and denied manual recommendations) so that the rule-based logic and its relation to maintenance extension approval/denial will be captured. Finally, phase 4 will develop and implement the schedule optimization, probably using a standard tool such as linear programming. I am the lead data scientist on this effort, directing two others and working closely with other disciplines to productionize our work and integrate it with the client's existing tools. This is to be used across many sites.
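For reference, a minimal sketch of the classical Weibull piece described above is below, fit on synthetic times between corrective maintenance events; all parameters are illustrative.

    from scipy.stats import weibull_min
    from scipy.special import gamma

    # Synthetic times between corrective maintenance events, in days.
    tbf = weibull_min.rvs(1.8, scale=400, size=60, random_state=5)

    shape, loc, scale = weibull_min.fit(tbf, floc=0)   # location fixed at zero
    mtbf = scale * gamma(1 + 1 / shape)
    print(f"shape={shape:.2f}, scale={scale:.0f}, MTBF ~ {mtbf:.0f} days")
    # shape > 1 indicates wear-out, the regime where trading preventive vs.
    # corrective maintenance cost per unit time becomes meaningful.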

8) The avoidance of unexpected downtime is particularly important for microchip manufacturers. With the help of one other data scientist, I developed two analytics to mitigate faults during production. The first monitors and alerts on poor performance of a fleet of vacuum pumps. As these are common, highly vibratory assets outfitted with accelerometers, we were able to leverage and adapt an existing approach based on Fourier decomposition. With our analytic in place, planned maintenance can be better scheduled. The second task was to develop an analytic to identify when certain stepper motors ("robot arms") start to become uncalibrated or misaligned. During the manufacturing process, these arms pass the product around within an extremely clean environment; when they become misaligned, the product is dropped and destroyed, creating both a loss of physical ingredients and a significant loss of time due to cleanup. The analytic used spatial information from accelerometers to identify slight deviations in motion and potential misalignment. Healthy operation data was used to train a Random Encoder/Decoder Forest, which worked extremely well for this somewhat oscillatory data. Both solutions were implemented on-premise in Python with significant collaboration with both data scientists and engineers on the client side.
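A minimal sketch of a Fourier-decomposition health feature for a vibrating pump is below; the sampling rate, running frequency and band widths are illustrative assumptions, not the deployed analytic.

    import numpy as np

    FS = 5000.0                          # accelerometer sampling rate (Hz), assumed
    t = np.arange(0, 2.0, 1 / FS)
    run_hz = 50.0                        # assumed pump running frequency
    signal = np.sin(2 * np.pi * run_hz * t) + 0.3 * np.sin(2 * np.pi * 3 * run_hz * t)
    signal += 0.1 * np.random.default_rng(6).normal(size=t.size)

    spec = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(t.size, 1 / FS)

    def band_energy(f0, width=2.0):
        mask = (freqs > f0 - width) & (freqs < f0 + width)
        return spec[mask].sum()

    # Ratio of harmonic energy to fundamental energy as a simple degradation index.
    health = band_energy(3 * run_hz) / band_energy(run_hz)
    print(f"harmonic/fundamental energy ratio: {health:.3f}")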

9) Saudi Arabia is both a perfect and a difficult place to have a 32 square kilometer solar field. There is obviously an abundance of sun, but there is also a lot of sand and wind, which is problematic for solar panels. I am developing a regression-based analytic using XGBoost to alert on underperforming panels. The solution will allow maintenance personnel to target broken or fouled panels for replacement or refurbishing, replacing the current method of continuous, manual, field-wide inspection. This project is currently underway.
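A minimal sketch of the regression-based alert is below: an XGBoost model predicts expected output from ambient conditions, and large shortfalls are flagged. The features (irradiance, module temperature), data and threshold are assumptions for illustration.

    import numpy as np
    from xgboost import XGBRegressor

    rng = np.random.default_rng(7)
    n = 5000
    irradiance = rng.uniform(200, 1000, n)
    temp = rng.uniform(15, 45, n)
    power = 0.18 * irradiance - 0.3 * (temp - 25) + rng.normal(0, 5, n)

    features = np.column_stack([irradiance, temp])
    model = XGBRegressor(n_estimators=300, max_depth=4).fit(features, power)

    expected = model.predict(features)
    shortfall = expected - power
    flagged = np.where(shortfall > 3 * shortfall.std())[0]  # panels to inspect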

Further, I evaluated the feasibility of analytic creation for 3 other potential projects (RFIs): shipping port crane failure prediction (1 month), passenger train door failure risk ranking (2 weeks) and the identification of operator training opportunities for manufacturing in the skin care sector (1 week).

Additional duties include junior staff mentoring, the development of reusable analytic tools (e.g. a feature creation tool kit for industrial assets) and consultancy for GE Digital product development (Asset Performance Management - APM and Operational Performance Management - OPM).

Interim Manager of Data Science Services Team

02-2016 to 11-2016
General Electric Digital, San Ramon, CA

Due to an unexpected departure in leadership, I was offered the opportunity to lead my colleagues in an acting capacity until a replacement could be recruited. I set the vision and strategy for team success and orchestrated data science project execution and employee career development for a team of 18. Working with different groups within the company (cyber security and finance), I was able to provide our team with Amazon Web Services access for project execution for the first time.

Staff Data Scientist

06-2013 to 02-2016
General Electric Digital, San Ramon, CA

During this introductory period to industry, I contributed heavily to fault detection and isolation and remaining useful life prediction for commercial jet engines. By design, engine faults in the commercial aviation industry are rare; applicable analytic techniques are therefore primarily unsupervised, semi-supervised or transfer-based in nature.

The faults that do occur are typically mundane, yet important for maintenance scheduling rather than posing risk of cataclysmic failure. Identifying these faults using semi-supervised learning and change point detection provides lead time for maintenance planning. Concept drift, class imbalance and asset healing from partial maintenance activities are continual challenges during analytic development. Portions of this work went on to be used in the GE Aviation jet engine monitoring analytics suite, and the work was also captured in an internal GE report.
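As an illustration of the change point piece, the sketch below uses the open source ruptures package as a stand-in for the in-house detector; the signal and penalty are synthetic and illustrative.

    import numpy as np
    import ruptures as rpt

    rng = np.random.default_rng(8)
    # Synthetic health signal with a distributional shift partway through.
    signal = np.concatenate([rng.normal(0, 1, 300), rng.normal(2.5, 1, 200)])

    algo = rpt.Pelt(model="rbf").fit(signal.reshape(-1, 1))
    breakpoints = algo.predict(pen=10)   # indices where the distribution shifts
    print(breakpoints)                   # a detected shift buys maintenance lead time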

Methods for engine maintenance planning were also developed for new product introduction, using transfer learning and domain adaptation from a similar fleet of engines to new engine test data. We leveraged approaches both requiring no labeled events (MMDE) and boosting on the few labeled events as they become available (TrAdaBoost). This combination allows maintenance planning to bridge the gap during the transition from old product to new.
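A simplified sketch of the TrAdaBoost reweighting core is below (the final hypothesis voting is omitted); the source and target data are synthetic, and this illustrates the idea rather than the production implementation.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(12)
    Xs = rng.normal(0.0, 1.0, (500, 3))             # mature-fleet data, plentiful
    ys = (Xs[:, 0] > 0).astype(int)
    Xt = rng.normal(0.5, 1.0, (40, 3))              # new-engine test data, scarce
    yt = ((Xt[:, 0] + Xt[:, 1]) > 0.5).astype(int)

    X, y = np.vstack([Xs, Xt]), np.concatenate([ys, yt])
    w, n_src, n_iter = np.ones(len(y)), len(ys), 20
    beta_src = 1 / (1 + np.sqrt(2 * np.log(n_src) / n_iter))

    for _ in range(n_iter):
        clf = DecisionTreeClassifier(max_depth=2).fit(X, y, sample_weight=w)
        miss = (clf.predict(X) != y).astype(float)
        err = np.clip((w[n_src:] * miss[n_src:]).sum() / w[n_src:].sum(), 1e-6, 0.49)
        beta_t = err / (1 - err)
        w[:n_src] *= beta_src ** miss[:n_src]       # fade out misleading source points
        w[n_src:] *= beta_t ** -miss[n_src:]        # emphasize missed target points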

Because industrial equipment is, by design, rarely allowed to fail, unsupervised anomaly detection with feature importance is a highly valuable tool to have available. As a team of 3, we developed a reusable framework to do this relatively easily. The approach combines preprocessing, model training, validation and testing into one pipeline. The result is an anomaly score as a function of time and a breakdown of the contributing features at each time step. Not knowing the signature of the anomalies required an ensemble of techniques that look for various types of anomalies independently. These include a cosine distance to a learned manifold (matrix sketching), a temporally based distance (Dirichlet process Gaussian mixture model over a rolling temporal window of the time series covariance), a non-distance-based isolation method (temporal Isolation Forest) and an information-based approach (temporally changing Graphical Granger causality). All of them (except the Isolation Forest) return a score which can be broken down into contributing portions, allowing not only detection but isolation, offering some insight into potential causality. This is highly valuable for maintenance planning and proactive fault avoidance.
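As an illustration of one ensemble member, the sketch below runs a temporal Isolation Forest by scoring rolling windows of a multivariate series rather than single points; the window length and data are illustrative.

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(9)
    X = rng.normal(size=(1000, 4))
    X[700:720] += 4                      # synthetic multivariate anomaly

    W = 20                               # window length (illustrative)
    windows = np.stack([X[i:i + W].ravel() for i in range(len(X) - W)])
    scores = -IsolationForest(random_state=0).fit(windows).score_samples(windows)
    # High scores mark anomalous windows; the other ensemble members (matrix
    # sketching, DPGMM, Granger) supply the per-channel attribution.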

In addition to the work in commercial aviation, I was a leading member of a team of 6 that developed an early warning solution for power generating turbines and heat exchangers for a large chemical corporation. Our work augmented legacy GE monitoring and reactive analytics with predictive capability: the output of the autoassociative legacy analytic acted as an input to a predictive ARIMAX model. This work was part of a larger offering making the analytic available to their fleet of industrial equipment in general.
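A minimal sketch of the predictive layer is below, using statsmodels' SARIMAX with the legacy analytic's output as the exogenous input; the model orders and data are illustrative stand-ins.

    import numpy as np
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    rng = np.random.default_rng(10)
    legacy_score = rng.normal(size=500).cumsum() * 0.01      # stand-in legacy output
    target = 0.5 * legacy_score + rng.normal(0, 0.1, 500)    # monitored health metric

    model = SARIMAX(target[:450], exog=legacy_score[:450], order=(1, 0, 1)).fit(disp=False)
    forecast = model.forecast(steps=50, exog=legacy_score[450:])  # early-warning horizon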

Postdoctoral Scientist

07-2009 to 08-2012
Lawrence Livermore National Laboratory (LLNL), Livermore, CA

My postdoctoral appointment allowed me to both lead and participate in multiple projects as a physicist using world class x-ray sources (LCLS at Stanford, DESY in Hamburg and OMEGA in New York). The applications of the work were diverse and included materials science (ultrafast degradation physics in carbon), astrophysics (development of a stellar temperature diagnostic), fundamental atomic physics (free-electron-laser induced photoionization and excitation in argon), high energy density physics and biology (protein crystallography). This work resulted in numerous peer reviewed journal articles (e.g. three in Nature and one in Physical Review Letters).

Visiting Graduate Student / Research Assistant

12-2005 to 12-2008
MIT Plasma Science and Fusion Center, Cambridge, MA

My PhD project concerned the development and implementation of a diagnostic technique for measuring plasma flows related to the efficient operation of the experimental magnetic confinement fusion device known as a tokamak. This work resulted in one publication in the peer reviewed journal Physics of Plasmas.

Visiting Graduate Student / Research Assistant

12-2002 - 12-2005
Lawrence Livermore National Laboratory (LLNL), Livermore, CA

The research assistant portion of this role involved hands-on experimental work at the LLNL electron beam ion trap. The purpose was fundamental studies in atomic physics, primarily applicable to astrophysical diagnostics. One main publication captured this work in the peer reviewed Astrophysical Journal.


Projects

This section contains relevant topics of interest which do not yet fit into my professional daily tasks.

NLP Topics for Time Series Analysis?

Always of interest to me is finding alternative uses for mature and proven analytics, which often means taking an approach from one field and applying it in another. For example, I am currently exploring a method to use distributed word representations in a time series setting. Time series can be converted to text using SAX (symbolic aggregate approximation), which opens up the world of NLP and text mining techniques for consideration. The challenge is to adapt this historically univariate approach to a multivariate setting. One way is to capture the interactions of the signals using covariance or an empirical mutual information (although this is nontrivial) and then apply something like word2vec or GloVe to each univariate value. Another approach is to learn the word vectors across sensors for a given time point. The order of sensors matters in this case, but as long as one is consistent between training and testing, there should be no problem; one could also imagine shuffling the column order and using some ensemble of the results (the same random seed should be used between training and testing). The value is that one can leverage an unlabeled data set (which is typically easy to obtain) for pretraining a supervised model that has many fewer labels. As mentioned several times above, this is often the setting for high-value asset failure, but it is applicable in general for detecting rare events. This is a work in progress.
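A minimal sketch of the univariate version of this idea is below: discretize a z-normalized series into SAX symbols, then learn distributed representations of symbol n-grams with gensim. The alphabet size, n-gram length and sentence chunking are illustrative choices, not settled design decisions.

    import numpy as np
    from gensim.models import Word2Vec

    rng = np.random.default_rng(11)
    series = np.sin(np.linspace(0, 60, 3000)) + 0.2 * rng.normal(size=3000)

    z = (series - series.mean()) / series.std()     # z-normalize
    breakpoints = [-0.67, 0.0, 0.67]                # 4-symbol SAX alphabet (quartiles)
    symbols = np.array(list("abcd"))[np.searchsorted(breakpoints, z)]

    # Treat sliding 3-grams of symbols as "words" and fixed chunks as "sentences".
    words = ["".join(symbols[i:i + 3]) for i in range(len(symbols) - 3)]
    sentences = [words[i:i + 50] for i in range(0, len(words), 50)]

    model = Word2Vec(sentences, vector_size=16, window=5, min_count=1, epochs=10)
    probe = model.wv.index_to_key[0]                # any in-vocabulary "shape"
    print(probe, model.wv.most_similar(probe)[:3])  # nearby shapes in embedding space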

GPU based Computer Build

I recently assembled a modest GPU workstation from scratch to experiment with deep learning technology. It houses an NVIDIA GeForce GTX Titan 12 GB video card, an Intel Core i7-6700K 4.0 GHz quad-core processor, 32 GB of RAM and a 500 GB solid state drive, all running Ubuntu 16.04. A full list of the workable build can be found here. So far it works quite well.


Honors and Awards

GE Crotonville invitation for Persuasive Leadership Skills (upcoming - March 2020)

Aviation Digital Recognition (1 of ~50 in all of GE) (2018)

Graduate of GE Crotonville Leadership Development Course (2015)

Executive Equity Award (2015)

Numerous (~10) internal GE "Over and Above" recognitions (2013 to present)


Patents

(Submitted - May 2020) Methodology for Data Driven Identification of Optimized Asset Strategy Opportunities Across Functional Location Hierarchy

(Submitted - May 2020) Methodology for the Development and Deployment of Digital Asset Strategies

(Filed - App # : 20200210859) Predicting Fatigue of an Asset that Heals

A Framework for Unsupervised Anomaly Detection on Industrial Time Series Data (US 15/474,563)

Creating Predictive Damage Models by Transductive Transfer Learning (US20170300605A1)


Publications

A. Graf, “The Development of Benchmark Data Sets for Industrial Assets”, Internal GE White Paper (2015)

A. Graf, “A Visible Spectral Survey in the Alcator C-Mod Tokamak”, Canadian Journal of Physics 89:(5), p. 615 (2011)

A. Graf, “Measurement and Modeling of Na-like Fe XVI Inner-shell Satellites Between 14.5 Å and 18 Å”, Astrophysical Journal, 695, 818 (2009)

A. Graf, “Multichannel Doppler Transmission Grating Spectrometer at the Alcator C-Mod Tokamak”, Review of Scientific Instruments, 79, 10F544 (2008)

A. Graf, “Spectroscopy on Magnetically Confined Plasmas using Electron Beam Ion Trap Spectrometers”, Canadian Journal of Physics 86, 307 (2008)

A. Graf, “Lifetime of the 1s2p ^1P_1 Excited Level in Fe^24+”, Spectral Line Shapes: Volume 12, 16th ICSLS, AIP Conf. Proc. CP645, edited by C. A. Back (American Institute of Physics, New York, 2002), p. 74-78

Co-Author Publications

T. Burian et al., “Subthreshold erosion of an organic polymer induced by multiple shots of a high-fluence x-ray free-electron laser”, Physical Review Applied (accepted, 2020)

W. Wierzchowski et al., "Synchrotron topographic evaluation of strain around craters generated by irradiation with x-ray pulses from free electron lasers with different intensities", Nuclear Instruments and Methods in Physics Research Section B, 364, p.20-26 (2015)

A. Levy et al., “The creation of large-volume, gradient-free warm dense matter with an x-ray free-electron laser”, Physics of Plasmas, 22 (3), 030703 (2015)

M. Hunter et al., “Fixed-target protein serial crystallography with an x-ray free-electron laser”, Scientific Reports (Nature) 4, 6026 (2014)

D. Garvey et al., “Development of Next Generation Anomaly Detection & Isolation for GE90 Engines” GE Report GRC286 (2014)

M. Frank et al. “Femtosecond X-ray diffraction from two-dimensional protein crystals”, IUCrJ, v.1, pt.2, 95 (2014)

C. Weninger et al., “Stimulated Electronic X-ray Raman Scattering”, Physical Review Letters, 111, 233902 (2013)

J. Rudolph et al., “X-ray resonant photoexcitation: line widths and energies of K-alpha transitions in highly charged Fe ions”, Physical Review Letters, 111, 103002 (2013)

S. Bernitt et al., “Tackling the Astrophysical Fe XVII Emission Problem with a Free-Electron X-ray Laser”, Nature Dec. (2012)

S. Hau-Riege et al., “Ultrafast Disintegration of X-ray-Heated Solids”, Physical Review Letters 108, 217402 (2012)

N. Rohringer et al., “First Realization of an Atomic Inner-shell X-ray Laser at 1.46 nm Wavelength”, Nature, 481, 7382, p. 488 (2012)

J. Gaudin et al., "Amorphous to Crystalline Phase Transition in Carbon Induced by Intense Femtosecond X-ray Free-Electron Laser Pulses", Physical Review B, 86 024103 (2012)

J. Clementson et al., “Atomic data for the ITER Core Imaging X-ray Spectrometer”, Proc. of the 39th European Physical Society Conference on Plasma Physics (2012)

J. R. Crespo et al., “Photoionizing Trapped Highly Charged Ions with Synchrotron Radiation”, Proc. for Atomic Processes in Plasmas (2011)

J. Dunn et al., “Spectroscopic Studies of Hard X-ray Free-Electron Laser-heated Foils at 10^16 W/cm^2 Irradiances”, SPIE Proc. X-ray Lasers and Coherent X-ray Sources (2011)

K. Chu et al., “In-plane Rotation Classification for Coherent X-ray Imaging of Single Biomolecules”, Optics Express, 19, 12, 11691 (2011)

F. Graziani et al., “Large-scale Molecular Dynamics of Dense Plasmas: The Cimarron Project” (High Energy Density Physics June issue 2011)

S. Hau-Riege et al., “Interaction of Low-Z Inorganic Solids with Short X-ray Pulses at the LCLS Free-Electron Laser”, Optics Express 18, 23 p. 23933 (2010)

B. Labombard et al., “Critical Gradients and Plasma Flows in the Edge Plasma of Alcator C-Mod”, Physics of Plasmas, 15, 056106 (2008)

G. V. Brown et al., “Simulating Cometary and Stellar X-ray Emission in the Laboratory Using Microcalorimeters and an Electron Beam Ion Trap”, 14th APS Atomic Processes in Plasmas, AIP Conf. Proc. CP730, edited by J. Cohen, S. Mazavet, and D. Kilcrease 730, 203 (2004)