
Bad Seed is an AI-based autonomous scheduler used at Brookhaven National Laboratory (BNL) in the National Synchrotron Light Source II. As with most of the technology used at BNL, there is no such thing as “optimized enough”. The only things holding back science are time and resources, so by working on this project I hoped to ease some of those limitations.

The Problem

Beamlines generate an enormous amount of data, and many samples need to be remeasured to get a clear reading. Originally, clearer data was gathered by measuring every sample 100 times.

The Goal

Get an algorithm to learn when to remeasure unclear samples (a.k.a. bad seeds), reducing the time spent on remeasurement.

The Solution

I used reward schedules drawn from human psychology in a custom Python TensorFlow environment, paired with an Advantage Actor-Critic (A2C) agent, to find bad seeds.

Why It Is Important

I was able to reduce the number of measurements needed by a factor of ten. This means the beamline scientist does not have to spend as much time glued to the beamline, checking how clear the rings are on a sample, and can dedicate that time to other work.

My Role

I was the front runner on this project, and was mentored extensively by Daniel Olds and Joshua Lynch. I created my own environment and led the research in terms of determining the reward system, structuring the goals for the algorithm to achieve, and designing an associated interface. This research was later incorporated into larger projects at the beamlines, such as “Gamifying the Beamline”.

Want to take a deeper dive into the research?

A deeper look at the problem...

At the beamlines, samples frequently need to be remeasured to make sure we get a clear reading on them. But even a single sample reading generates a huge amount of data! How can we know which samples need to be remeasured?

The original answer:
Just measure all of them 100 times!

Well... that wastes a lot of time.

The Process...

The goal is to get an agent (the part of the program that makes decisions) to learn the rules of a particular environment. The environment rewards or penalizes the agent for each decision. For example, one requirement for our environment is that the agent should not pick the same sample over and over again. It might look something like this:
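As a rough illustration of the reward-and-penalty idea, here is a minimal sketch in plain Python. It is not the project's TensorFlow environment: the class name `ToySampleEnv`, the reward values, and the assumption that the bad seeds are known up front are all hypothetical choices made only to show the mechanics.

```python
class ToySampleEnv:
    """Hypothetical toy environment; not the project's TensorFlow environment."""

    def __init__(self, n_samples, bad_seeds, max_steps=20):
        self.n_samples = n_samples
        # Assumed ground truth for illustration: indices of the unclear samples.
        self.bad_seeds = set(bad_seeds)
        self.max_steps = max_steps
        self.reset()

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.steps = 0
        self.last_action = None
        self.visits = [0] * self.n_samples
        return tuple(self.visits)

    def step(self, action):
        """Measure one sample and return (observation, reward, done)."""
        if not 0 <= action < self.n_samples:
            raise ValueError(f"action must be in [0, {self.n_samples})")
        if action == self.last_action:
            reward = -1.0   # penalty: same sample picked twice in a row
        elif action in self.bad_seeds:
            reward = 1.0    # reward: remeasured a sample that needed it
        else:
            reward = -0.1   # small cost: spent time on an already clear sample
        self.visits[action] += 1
        self.last_action = action
        self.steps += 1
        done = self.steps >= self.max_steps
        return tuple(self.visits), reward, done


if __name__ == "__main__":
    env = ToySampleEnv(n_samples=4, bad_seeds=[2])
    env.reset()
    for action in (2, 2, 0):
        _, reward, _ = env.step(action)
        print(action, reward)
```

In this sketch the observation is just the per-sample visit count, and the repeat penalty is deliberately larger than the reward so that hammering a single bad seed never pays off. A real environment would derive the reward from measured data quality instead of a known list of bad seeds.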


In the actual coding environment, it looks more like this:

Picking an Agent:

Based on earlier experiments with CartPole and other basic reinforcement learning problems, the two agents that seemed most promising were the Advantage Actor-Critic (A2C) and Double Deep Q-Learning (DDQN) algorithms. After testing on environments similar to the one described above, A2C seemed like the best choice because it learned faster.
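For readers unfamiliar with A2C, the sketch below shows the quantity that gives the algorithm its name: the advantage, meaning how much better an action turned out than the critic expected. This is a generic textbook form using full-episode discounted returns, not the project's code; the function names, the discount factor `gamma`, and the example numbers are assumptions for illustration.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(rewards, values, gamma=0.99):
    """Advantage A_t = G_t - V(s_t), with V(s_t) supplied by the critic."""
    returns = discounted_returns(rewards, gamma)
    return [g - v for g, v in zip(returns, values)]


if __name__ == "__main__":
    rewards = [1.0, 0.0, 2.0]        # made-up episode rewards
    critic_values = [1.0, 1.0, 1.0]  # made-up critic estimates
    print(advantages(rewards, critic_values, gamma=0.5))
```

The actor is nudged toward actions with positive advantage and away from those with negative advantage, while the critic is trained to make its value estimates match the observed returns. Practical implementations often use bootstrapped n-step returns instead of full-episode returns.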


Environment Goals: