You probably saw a number of clever commercials throughout the NCAA Tournament about the partnership between Google Cloud and the NCAA. We did some digging to see what this is all about.
What is this?
The NCAA is handing over all of its data to Google to parse and sort through in order to make something useful out of it. On the surface, this is a really good thing. Some organizations, like StatGeek, have tried to do this on a smaller scale, but we don’t have all the data the NCAA has and our focus is on just one sport, basketball.
When do they start?
They’ve already started. If you go here, you can see some of their work already. Right now, it looks polished and is geared mostly towards fans. For instance, teams with cat mascots cause the most upsets. That’s just an observation; the cat mascots haven’t CAUSED anything, but it’s a cool thing to talk about if you have a college basketball podcast and you’re helping someone fill out their bracket. They had some more interesting stats about the number of expected possessions or three-pointers, but that’s fairly basic stuff to calculate. My hope is this morphs into something that could actually be useful to coaches.
What’s the downside?
The downside is Google is huge and makes its money outside of this project. So, while it made this look awesome during the NCAA tournament, what does this look like by the time next season starts? Further, does the focus stay on Division I men, as it always has, or do we see more data on the women or lower divisions? They did calculate strength of schedule for all divisions, which is unique, but it may not be telling in D2 or D3, where most teams play heavily regional schedules that could skew their win/loss record relative to how good they actually are.
Another thing is there are a ton of other sports with data. You better believe come September we’ll see ads telling us that left-handed quarterbacks throw for an average of 8.6 yards on second down in the rain (totally made that up). But again, it’s a random factoid that really tells you nothing about any of these teams.
The biggest problem with big data is you don’t know how they’re weighting data from different eras. Clearly, teams shoot a lot more three-pointers now than they used to, but if you’re aggregating decades’ worth of data, you may come up with an observation built on data from a bygone era that isn’t applicable now.
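To make the era problem concrete, here is a minimal sketch of how much the answer can move depending on the weighting. Everything in it is my own assumption for illustration: the three-point rates are made-up placeholder numbers (not real NCAA data), the half-life weighting is just one possible scheme, and the function names are hypothetical. We have no idea what Google actually does.

```python
# Made-up three-point attempt rates by season, for illustration only.
# These are NOT real NCAA numbers.
seasons = [
    (1990, 0.15),
    (2000, 0.30),
    (2010, 0.33),
    (2018, 0.38),
]


def plain_average(rows):
    """Treat every season the same, no matter how old it is."""
    return sum(rate for _, rate in rows) / len(rows)


def recency_weighted_average(rows, current_year, half_life=5.0):
    """Weight each season by 0.5 ** (age / half_life).

    A season that is `half_life` years old counts half as much as the
    current one. The half-life value is an arbitrary assumption.
    """
    weights = [0.5 ** ((current_year - year) / half_life) for year, _ in rows]
    weighted_sum = sum(w * rate for w, (_, rate) in zip(weights, rows))
    return weighted_sum / sum(weights)


plain = plain_average(seasons)
weighted = recency_weighted_average(seasons, current_year=2018)

print(f"plain average:            {plain:.3f}")
print(f"recency-weighted average: {weighted:.3f}")
```

With these placeholder numbers, the plain average gets dragged down by the older seasons, while the recency-weighted version lands much closer to the most recent one. Same data, two different "facts," which is exactly why it matters that we don’t know how the eras are being weighted.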
Where do we go from here?
I wonder if they’re going to take this to the next level. I looked up one of the Trinity (TX) men’s teams from when I was a student there. They had the cumulative season stats, but only in the form of a picture of a form that was submitted at the end of the season. Will this information be digitized and used? I hope so, but we have no indication of that.
Also, will Google start to compile advanced stats or do any of the things that Lineup can do, so we know the most efficient lineup of all time? I have my doubts, but I’ll let you guys know if we start to get a few google.com email addresses as subscribers.