Why Astros' sophisticated database would be worth hacking

By JOHNETTE HOWARD
June 20, 2015, 7:00 PM

Speculation continues about the true motive of the St. Louis Cardinals employees being investigated for allegedly hacking into the Houston Astros' database. But there are numerous practical reasons that secretly accessing such a treasure of information would be desirable to a rival big league club, even one as good as the Cardinals have been.

Was the hacking episode motivated by a personal vendetta against Jeff Luhnow, the polarizing former Cardinals executive who left to become then-division rival Houston's general manager, as federal law enforcement suggested? Or will the investigation also show the Cardinals on an illegal search for a competitive advantage? Either way, what unique information or useful secrets might be found in the Astros' Ground Control system?

The details of exactly what's contained in proprietary team databases are fascinating and show just how sophisticated, essential and downright granular big data crunching has become. It also explains why computer analysis often surpasses baseball's old reliance on the eyeball test or human intuition as a tool.

Databases such as Ground Control work mostly by mining advanced stats and the same kinds of in-house scouting information that a lot of teams compile. Then -- this is where the magic lies -- in-house algorithms and analytic models that can predict success are developed.

Recognizing links between various information points, spotting trends or crunching an astonishing array of data could tell the organization where to put its emphases, which players to pass on or to chase and how much they're worth paying going forward. Sometimes the research folks are tasked with a specific request: Identify, say, the best left-handed relievers in the market. Then they go to work. The human element on the back end is having people smart or confident enough to jump on the insights created.

Better yet, systems like Ground Control do all of this quickly, in real time, by constantly processing and updating an immense volume of information on every player and metric on the team's radar. And all of this resides in a database that team employees can access 24/7 from their smartphones or laptops from anywhere in the world.

Luhnow's first experience with such a system came with the Cardinals, who in 2003 hired the former McKinsey consultant and tech company executive, a man with no baseball experience. While in St. Louis, Luhnow oversaw the development of the Redbird database.

Other clubs have confidential systems, too. But the Cardinals' and Astros' successes under Luhnow were especially swift and uncanny. No other team in baseball had more draft picks make the majors than St. Louis did in Luhnow's seven years as farm director. Sixteen of the 25 players on the Cardinals' 2013 World Series team had been drafted during Luhnow's tenure.

The Astros were a 106-loss laughingstock when they hired Luhnow after the 2011 season. It was controversial when he stripped the roster and payroll down until the franchise was akin to a car on cinderblocks.

Now look: Just four years later, the Astros have undergone a remarkable revival and lead the AL West.

To illustrate just how wide-ranging the available information is, and how it all works in practice, Luhnow described to Bloomberg last year a made-up example of a college prospect the Astros might be considering.

"Let's say he's played two summers in a wood-bat league," Luhnow began. "He's got hundreds of Division I at-bats with a composite bat but against a wide variety of competition. You've got scouts' input on his potential. Your video analyst says his swing is in the top quartile of swings he's seen that lead to success in the major leagues. Your area scout says his character is in the top 10 percent of players. But he's a C-minus student. Not academic, doesn't learn well. Your doctor says he's got a slightly above-average risk of sustaining an injury.

"I've just given you nine pieces of information. How do you weight them?

"I can't do that in my mind. It's overload for any human being."

Even saying "top quartile of swings" three times fast is hard.

The info gathering doesn't stop there. Anyone hacking into a database like Ground Control could also find the detailed physical, psychological and statistical profiles compiled for the amateur draft, notations on trade talks and analytics-based wish lists on whom to pluck off the waiver wire or promote from the minors.

The analysis also reaches down onto the field, guiding strategies such as which defensive shifts to play (the Astros often employ them more than any other team). The Astros even look at what science says the team's pitching coach should add or subtract from a certain pitcher's repertoire.

In short, analytic systems can connect the dots and provide insights where none were noticed before. They pan the enormous river of information for gold, unearth trends and calculate probabilities. Often, the conclusions reached can buck years of conventional baseball wisdom.

Sig Mejdal, Houston's director of decision sciences, helps the Astros reach those conclusions. Luhnow hired the former NASA engineer in St. Louis and brought him along to Houston. Mejdal gave Bloomberg another concrete example of Ground Control's power.

When the Astros plucked Colorado's Collin McHugh off the waiver wire after the 2013 season despite his career 8.94 ERA, the move might've surprised some folks. But today's major league stadiums are wired with systems such as PITCHf/x and TrackMan that use Doppler radar to track the ball in three dimensions. For every pitch thrown in every game, teams now know the location, acceleration, movement, velocity and the axis of rotation of the ball. The Astros grabbed McHugh because they saw that while his sinker didn't play well at Coors Field, he had a superior curveball that rotated about 2,000 times a minute, or 500 times more than an average curve spins.

It was the baseball equivalent of noticing a needle in the data haystack.

Once he was in Houston, the coaches told McHugh to change his arsenal by throwing that terrific curve more and replacing the sinker with a high fastball.

The result? His ERA nosedived to 2.73 in his first season with the Astros.

"If you believe, as we do, that this data has predictive ability, then you're in an arms race to learn it and take advantage of it," Mejdal said.

What if you're a franchise that's not as good at the data crunching?

How much is that kind of info worth to you?

"Clearly it could be worth millions and millions of dollars and a significant competitive edge to the team doing the hacking," said a former big league GM who worked for one of the first teams to embrace baseball's Information Age.

"Houston has a lot of good scouts. So suddenly maybe you get information that your weaker evaluator didn't see. Maybe the trade information they hacked into could make you aware of players you didn't know were available, but Houston did. Now you make a call, maybe execute a trade. Or you pass on someone in the draft."

And that's just a start.

"There's also all the proprietary info in those databases," the ex-GM continued. "All the medical records, lots of other confidential information. Things like any positive drug tests prior to the announcement of baseball's drug testing. You might've done more background homework in terms of a player's senior year that nobody else knows. Maybe got arrested or thrown out of school or [in a] fight. Most organizations do homework on every single player that includes psychological tests they consent to. And how much training room treatment they get, not just injury histories. Only your trainer knows that. Sometimes we traded a guy because our trainer would say, 'This guy is ready to blow out.' Vision tests are big, too.

"If you have someone with bad depth perception, which means he doesn't hit a curveball, that can de-value a player. Or it can value a player who has the depth perception the level of Barry Bonds. Again, that's not public information. But it can help you mitigate risk, and that's important with the money these teams are spending on players."

It's been known for a while that teams have become pretty good at predicting when a player will physically break down, as the Astros did after choosing Brady Aiken with the No. 1 overall pick in the draft. They later gave him a low-ball offer to sign because his post-draft physical revealed he had an unusually small ulnar collateral ligament in his pitching elbow. And that led to a public skirmish with Aiken's agent, Casey Close. But Luhnow didn't apologize or back down.

"And I was there in April when Aiken blew his elbow just 10 or so pitches into his first start at the IMG Academy," said a National League scout. "The Astros were right in that case.

"In my experience, the psychological coaches have been right quite a bit at predicting things, too."

No wonder the temptation to get a better look at what Luhnow is doing -- or even just give him a comeuppance by hacking the Ground Control database he was so fond of talking about -- was apparently too delicious to resist.

If you're going to hack, hack the best.

"It's like you want to get their magic formula," the ex-GM said, "and it's highly guarded for the same reason Coke doesn't want RC Cola to know its formula."

Many folks in the IT world reacted to Monday's news that the Astros had been hacked with little surprise. The only real surprise, they said, is that cyberespionage took so long to surface in sports.

On Tuesday, an IT expert who worked with Major League Baseball in the years leading up to the Astros' discovery last year that they had been hacked said MLB teams' commitment to IT and security varies wildly.

"It all depends on what they want to invest in and how paranoid they are," the IT expert said. "Most teams don't want to spend the money on IT. The Yankees and San Francisco have the best IT departments. Boston is pretty good. But when I went to see the Baltimore Orioles in 2012, they had one IT guy and he had [computer] servers I hadn't seen since 2000. I was like 'Whoa, dude, really?' It was the equivalent of chicken wire and duct tape.

"Even Detroit by their 2012 World Series had crap. And the Colorado Rockies? The last time they were in the World Series and the press couldn't get wireless in the press box, their IT guy said it was because the broadcast blimp overhead was jamming his space and eating up their bandwidth. That was how he tried to save face for building a bad network."

Droll pause.

"Blame the blimp," the IT expert repeated, breaking into a laugh.

Neither the FBI nor the Justice Department has said if the Astros' proprietary information reached the upper levels of Cardinals team management, let alone informed any of the team's decisions.

But if it did, baseball is expected to punish the Cardinals harshly in addition to whatever criminal charges may result.

Luhnow now says he uses a pen and paper to write down the trade conversations he has.

Yankees general manager Brian Cashman admitted this week that his team was among the many that have re-examined their security protocols since last year, when the Astros' stolen info was first leaked online.

So what did the Yankees change?

"I don't want to go through it," Cashman said. "I don't trust the people out there who are smarter than us."