February 12, 2020
It was the middle of the night, and a NIF Target Chamber diagnostic tool was not responding to computer commands. Suspecting a software glitch, the control room paged the on-duty software engineer, Lyle Beaulac.
The software, as it turned out, was working as designed. But Beaulac’s sleuthing to resolve the issue is one example of the ever-challenging, yet always rewarding, task of keeping one of the world’s largest scientific control systems in tune.
They may not always get the spotlight, but these control systems workers stand proudly alongside scientists and researchers. They ensure that the National Ignition Facility can carry out its vital mission of helping keep the nation’s nuclear stockpile safe, secure, and reliable, and that its systems are primed to keep advancing the boundaries of high energy density (HED) science, including the pursuit of ignition.
Sometimes that means challenging themselves to find a way to make it work.
“We’re not allowed to say, ‘I don’t know,’ ” says software architect Chris Estes. “The key is having good tools to do your job, coming up with creative solutions, and having the tenacity to not give up. I like to say there’s a lot of people here who are smarter than me, but I don’t know if there’s many who are more stubborn.”
Keeping NIF humming is no easy task, considering that the control systems run on about 4 million lines of constantly evolving computer code orchestrating about 66,000 control elements, such as motors, diodes, and cameras. The systems also control more than 2,000 front-end and embedded computer processors.
The standards needed to make the world’s largest and most energetic laser work precisely on cue are extremely high: the systems have to deliver 192 laser beams to a target within a 50-micron tolerance and with a timing accuracy of 30 picoseconds (trillionths of a second). Operations such as laser beam pulse shaping, alignment, and energy tuning require the kind of autonomous controls that run NIF.
That keeps the dedicated NIF&PS control systems teams constantly innovating, improving, and testing.
“It requires all of that orchestration to keep NIF fully pumping away on all cylinders on a 365-day basis,” says control systems lead Gordon Brunton.
When there’s a problem to be solved, the teams stand ready, even in the wee hours.
Eye of a Mechanic
Beaulac was on call the night the Target Chamber diagnostic would not move into its proper position despite repeated commands sent from the Control Room. After determining the problem wasn’t due to a software error, he turned into a detective, diligently following the trail along the diagnostic’s hardware all the way back to the Target Chamber.
That’s when his experience in auto mechanics and robotics came into play: The device intended to move the diagnostic into proper position employs a clutch that is similar in nature to those used in manual-transmission vehicles.
“The motor was spinning, but the clutch was slipping,” Beaulac says. “This just shows how we need to know about more than just software. We need to have a broad base of knowledge about cross-system analytics.”
Mikhail Fedorov, Integrated Computer Control System (ICCS) group leader, praises Beaulac’s intuition and mechanical know-how.
“I didn’t even know this had a clutch,” Fedorov says. “But Lyle knew.”
In NIF’s early days, shot cycles ran more manually and were tracked on spreadsheets. But Brunton says that as the system scaled up, the control systems became more automated to increase efficiency and allow NIF to raise the number of experimental shots to about 400 per year.
Now, there are about 2 million automated operations for every 4- to 8-hour shot cycle. The system is designed to monitor itself as much as possible.
“There’s a lot of checks and balances as we go through a sequence,” Brunton says. “The software is continuously checking whether a device is in the right position.”
The result is a “more robust” NIF, he says. “Operations have this cadence where they fire back to back to back, any time of the day. They’re firing off a very regular, well-planned series of shots. Now it’s a humming machine.”
Needing a Human Touch
While the systems are designed to monitor themselves as much as possible, they still require a human touch. Control systems software is updated about once a quarter. Development of each release starts about six months before implementation, allowing time for extensive simulations in controlled test-bed environments separated from NIF’s operational systems. Further software changes follow as NIF experiments open up new scientific realms to explore.
The software team processes about 2,000 unique software change requests per year: about 40 percent for problem fixes or scheduled maintenance, 30 percent for upgrades, and 30 percent for new capabilities. That means each software developer is responsible for maintaining or upgrading about 120,000 lines of code per year.
“We are constantly adding new target diagnostics, because that’s what we do,” Brunton says. “It’s all about the Target Chamber and collection of data associated with the targets to increase our learning and understanding of physics.”
Dry runs in offline test-beds help catch software glitches before they can delay or disrupt shots, says Suzanna Townsend, software quality assurance lead.
The testing process is “the final gauntlet the software has to go through before we release it to operations,” Townsend says. “We also run a minimum of 10 different types of regression test shots manually to make sure we replicate what the operators do. The majority of testing and configuration management takes place behind the scenes.”
In many cases, they can’t rely on standard operating manuals or techniques refined in other facilities.
“There’s some technology in NIF that doesn’t exist in any other place in the world,” Estes says.
For example, NIF uses cameras connected by high-speed FireWire cables that had to be custom-built to span longer distances and withstand more rigorous conditions. “They didn’t exist in the real world, so we invented them here,” says Russell Johnson, Fabrication Technology Team lead.
That also means Johnson’s team has to be ready to fix the cables, even if it’s midnight or beyond.
Team members often go above and beyond the call of duty to make sure the work comes together properly.
When smoke from Northern California wildfires caused unhealthy air conditions throughout Livermore in November 2018, the Lab sent most employees home for their safety. A crucial ICCS software patch was being installed, so Townsend and Software Configuration Team lead Dan Koning stayed an additional five hours to make sure everything worked.
Townsend praised Koning for choosing to stay “even though virtually everyone else had left the building. That was an extra special effort he put in.”
For Paul Zapata, Controls Hardware System manager and Electronics Tech Team lead, going the extra mile is not unusual.
“What appears to be above and beyond is routine for us,” he says. “To look at anything we do as above and beyond is difficult because it’s not. It’s just what we do.”
The systems extend far beyond software. In addition to the front-end and embedded processors, there are more than 100 virtual servers and 350 unique points of control, such as motors, cameras, and photodiodes. There are also 18 systems with unique requirements and constraints.
When a glitch surfaces, it’s easy to perceive it as a computer bug “because our software is controlling everything,” says Fedorov. Working on the control systems teams “requires knowledge of all systems, not just software,” he says. “We are the first line of defense.”
For example, Karl Wilhelmsen, an electrical engineer who’s also versed in robotics, recalled one 2 a.m. call about problems with an alignment camera cooling system. He was able to track down the source.
“It was the water line to one of the beam alignment cameras in the middle of the chamber,” he says. He advised crews to shut off the water supply and turn off the camera so it wouldn’t overheat.
Each Problem Is Unique
Estes says finding a software glitch can take weeks of “banging your head against the wall” looking for something as simple as a minor code tweak. Because of his proven problem-solving abilities, Estes has come to expect calls at home from the Control Room. But the nature of those calls has shifted to more than just software.
“It went from, ‘Chris, you need to come fix your software,’ to ‘Chris, we know it’s not the software, but we need your help fixing and diagnosing the hardware problem,’” Estes says.
Shortly after Jeremy Dixon joined the Lab about five years ago, he was asked to troubleshoot a lift used to bring workers inside the Target Chamber.
“When it broke down, it left people stuck in the Target Chamber, which is not a good place to be stuck,” says Dixon, now Controls Engineering and Maintenance Group leader. “It was a common failure that the team had been battling for many years.”
Dixon brought a fresh perspective to the problem and made a key discovery — a burned-out connector on the lift. After it was replaced, the constant lift failures “haven’t returned and the team has felt much better about entering the Target Chamber,” he says.
Project Preparation Time
Dixon is now responsible for all NIF industrial control systems, including devices that monitor factors such as hardware temperature within a beamline, the pressure of argon gas in ducts, and the Safety Interlock System (SIS) equipment that controls badges and safety systems.
Depending on the project, he says, “we spend weeks to months to sometimes years working on a project before we deploy it in the facility.”
He spoke during a break in one of those rare but extraordinarily long days, running from 5:30 a.m. to about midnight, for a scheduled SIS software upgrade. The upgrade took place during a scheduled long maintenance shutdown of NIF.
“We spent months preparing for this one day,” he says.
Dixon’s team installed a new Precision Diagnostic Sensor Package, designed to better understand laser performance, on one of four NIF beamlines. The work entailed two crews working staggered double shifts to get the job done within a 15-day maintenance window.
“It was really exciting to finally see that coming together,” he says.
“There’s a certain amount of pressure on the team because they want to get the job done correctly,” Dixon says. “If anybody fails, we all fail.”
But knowing NIF’s mission for the nation makes all the challenges and hard work rewarding, especially “once you start seeing some of the experimental data come through and you hear all of the success that we’re having,” he says.
And that’s what he tells potential Lab employees during recruiting events.
“It’s so much fun being able to tell potential new employees about the work we do here and the exciting things that we get to do that really go to improving the nation overall,” he says. “Here, the harder we work, the more we push, the better the nation does.”
“There’s a sense of pride in participating in the mission,” adds Robert Mich, the Maintenance and Field Services Team lead.
Helping Cutting-Edge Science
Johnson, the Fabrication Tech Team lead, enjoys the challenge of his job, which he compares to the way scientists tackle problems.
“I can imagine the process of interpreting shot data to try to find that needle in a haystack to help their science move along; it is the same process with us,” he says. “When a system goes down, it’s the same thing. We’re picking through that system, trying to understand the data and the failures to understand what caused the problem.”
“I’ve always loved it because they’re doing cutting-edge science,” says Johnson. “It’s like I’m watching some documentary on the Discovery Channel every day as you’re working with these scientists and engineers, battling to try to figure out how to make this big project keep clicking along.”