The Democratic National Committee Ups Its Data Game for Election 2020
November 3 is here, bringing anticipation and trepidation not only in the US but throughout the world. The proverbial elephant and donkey have been battling for the hearts, minds and votes of the American people for what seems like years, all leading up to this day. If you believe, as I do, that data changes everything, then surely that applies here.
Although the causes are complicated, some aspects of the previous Democratic campaign may have been subpar because data was missing or erroneous. Famously, the Democratic National Committee (DNC) castigated the data operation it inherited that cycle as “crumbling”, “bankrupt”, and “crashing”.
Getting access to voter files and campaign activity proved challenging at best. Without this data available whenever a staffer needed it, results were bound to suffer. In a year when every vote mattered, it could easily be said this was a difference maker. No doubt usage demands had grown over time, as had the sophistication of that usage. What had worked so well for Obama just 4 years prior now looked dated.
Technology selection was partly blamed.
Data professionals know about obsolescence. Data platforms have a lifespan. They demand strong maintenance and occasional augmentation and replacement. Today that manifests as a move to the cloud, new approaches to managing the higher volumes of data being ingested from new sources and at new levels of granularity, and the need for all data to be under management to support precision in artificial intelligence.
What we do know about the DNC is that in 2018-19 it moved to the Google BigQuery cloud data warehouse. Adopting BigQuery meant moving to the cloud, which relieves the DNC of much of its maintenance burden and gives it access to resources that scale on demand. A move like this was necessary.
The cloud offers opportunities to differentiate and innovate at a much more rapid pace than ever before. Further, the cloud has been a disruptive technology: cloud storage enables rapid server deployment and offers greater scalability than on-premises deployments.
Google BigQuery has a distinctive approach to cloud analytic databases, with an ecosystem of products for data ingestion and manipulation.
The back end is abstracted. BigQuery acts as a front end to all the Google Cloud storage needed, with all data replicated geographically and Google managing where queries execute. The customer can choose the jurisdictions of their storage according to their safe harbor and cross-border restrictions, although this matters less for a domestic user like the DNC.
There are some query language limitations that can be worked around, such as a time delay before data loaded via streaming can be modified and some limits on concurrent updates. Additionally, data can only be extracted within Google Cloud, which adds steps should large amounts of BigQuery data be needed elsewhere.
BigQuery is a hands-off database without performance artifacts to build and manage. Defragmentation and system tuning are not required. DNC Data Scientists will see it as a plus that a Database Administrator is not required, while others would like some knobs to turn for performance should they see something the database does not, which is very possible.
It is truly serverless. Google Cloud manages the servers with no involvement from the customer, dynamically allocating storage and compute resources. The customer does not define the nodes or capacity of a BigQuery instance. Provisioning of compute resources is particularly fast and seamless.
The ecosystem consists of data preparation and data visualization tools, alternative storage formats suited to larger amounts of still-important but less active data, and a growing library of automated data and artificial intelligence pipelines. The included batch ingest is an increasingly common way to manage data loading.
Google Marketing Platform data (including the former DoubleClick), Salesforce.com, AccuWeather, Dow Jones, and 70+ other public data sets can be included in a BigQuery dataset, automating parts of the pipeline and eliminating complex data ingest for this data. Additionally, the out-of-the-box geospatial analytics are impressive, with a new GEOGRAPHY data type, new geography functions, and BigQuery GeoViz. This opens up possibilities for the DNC to connect locally.
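To make the geospatial point concrete, here is a minimal sketch of the kind of query the GEOGRAPHY type allows: finding records within a given radius of a point. It is an illustration only and not anything the DNC is known to run. The table name, the `voter_id` and `location` columns, and the `build_nearby_query` helper are all hypothetical; `ST_GEOGPOINT`, `ST_DWITHIN`, and `ST_DISTANCE` are standard BigQuery geography functions. The sketch only builds the SQL text; running it would require a BigQuery client and supplying the `@lon`, `@lat`, and `@radius_m` query parameters.

```python
def build_nearby_query(table: str) -> str:
    """Build BigQuery Standard SQL that finds rows near a point.

    Assumes (hypothetically) that `table` has a `voter_id` column and a
    `location` column of type GEOGRAPHY. The point and radius are left as
    named query parameters (@lon, @lat, @radius_m) to be bound at run time.
    """
    return f"""
SELECT
  voter_id,
  -- Distance in meters between each row's location and the query point.
  ST_DISTANCE(location, ST_GEOGPOINT(@lon, @lat)) AS meters_away
FROM `{table}`
-- Keep only rows within @radius_m meters of the point.
-- Note that ST_GEOGPOINT takes longitude first, then latitude.
WHERE ST_DWITHIN(location, ST_GEOGPOINT(@lon, @lat), @radius_m)
ORDER BY meters_away
""".strip()


# Hypothetical table name, used purely for illustration.
query = build_nearby_query("my_project.campaign.voters")
print(query)
```

Leaving the coordinates and radius as query parameters, rather than formatting them into the string, keeps the SQL reusable across locations.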
The world of data is rapidly changing. Data is the prime foundational component of any meaningful initiative. Managing and evaluating this prime asset is a continual effort in competitive situations such as the one the DNC finds itself in today.
Google BigQuery has a distinctive approach that the DNC hopes will lead to victory and change this month.