K12 school districts need local data warehouses

In Utah, the state office maintains a data warehouse that enables record exchange between districts and smaller systems, as well as populating a longitudinal database for analysis. Both efforts still feel like they are in the early days, especially the longitudinal database. As far data warehousing for K12 education, the state is the only game in town and their DW efforts are almost entirely shaped by legal and logistical necessity.

This leaves a gap that districts must fill. How can districts structure and store data for their own analyses? Currently, district personnel make do with pulling reports from their SIS (student information system) or exporting tables from the SERF disk, an ACCESS database which is USOE’s method of returning accountability datasets to the district level. These sources are clunky, often redundant, and slow.

In an era of real-time analyses, machine learning, and Big Data, Utah K12 districts are still struggling to compile data that is months old. It’s 2015, shouldn’t we be able to predict which students will likely not graduate?  Which students need multi-year reading interventions as they transition from elementary to secondary? We have all of the data, but we don’t have a way to put all the pieces together consistently and automatically. That’s where a data warehouse comes in.

Three reasons K12 districts need local data warehouses:

Reason #1: We are data rich, but information poor.

Knowing what works in your district shouldn’t be as much of a guessing game as it is. For example, our CTE department asked me to compile a list of students who had completed multiple CTE credits and then compare that to other populations in terms of other academic achievements, such as AP scores and pass rates.  The query to gather the CTE credits per students had to be broken up into school-by-school, year-by-year pieces and even then each query took nearly seven hours to run. We have all of the data needed to explore many questions about program effectiveness, but the data is stored in such a hard-to-get way that most people ignore it.


Reason #2: The SIS system is not built for analysis

As I said previously, the SIS is slooowww. Queries across the district take upwards of a day to run. It has to do with the way the data is stored. The tables in the SIS database aren’t set up to allow the data to be easily analyzed. They’re set up to allow the data to be easily entered. That’s a key difference.  An SIS is an operational system: the focus is updating and recording the data. In a data warehouse, the focus would be on compiling and analyzing the data, which would involve a different data structure. Another way to think of the difference? An SIS deals with one transaction at a time (student marked absent, one set of scores added, a class dropped). A data warehouse would almost never deal with one student at a time. Instead the focus would always be counting absences, grades, or changes for large groups across time.

As Ralph Kimball and Margy Ross (gurus of data warehousing) stated, “Simply speaking, the operational systems are where you put the data in, and the [Data Warehouse] system is where you get the data out.”

K12 districts need a system that allows them to easily get the data out.

#3 There’s no pre-packaged solution.

New start-ups like Mastery Connect, Amplify, and Imagine Learning all promise high-level, fast reporting. To be sure, these applications are better than anything that we’ve had before. Nevertheless, the reports you can pull from these vendor systems are narrow, incomplete, and inflexible. And it’s not their fault. How could they build a system that anticipated the needs of every school district? They simply can’t. They can build least-common-denominator reports and that’s all.

To keep up with evolving reporting and analysis needs, K12 districts need their own data warehouses, built from the ground up, using tools like SQL Server, Cognos, or Pentaho.


Leave a Reply