Data is core to any business’s digital journey; we have all known that for some time. What is more pressing is how data professionals organise themselves to keep that data journey moving forward efficiently, especially when the volume of information keeps increasing.
With this growth, the mission becomes how to transform this data into trustworthy metrics that the business can confidently use to guide decisions.
In pursuit of this mission, analytics practitioners like me have been battling an ever-expanding toolkit in recent years. The number of ETL (Extract, Transform and Load) tools and modelling software on the market has grown colossally and is evolving rapidly. Tooling aside, however, you cannot build a high-performing analytics team without mixing in other key ingredients, such as effective ways of working and a strong team dynamic. Below, we unpick some of these constituent parts of a high-performing analytics team.
Ways of working: Treating analytics output like software engineering
Like the main code base of any application, business analytics code is highly collaborative. Here at OnBuy we have developed a first-party product experimentation pipeline, customer lifetime value models and machine learning pipelines for auto-categorisation of products. All of these had significant input from domain experts across the organisation and have been through several iterations. Making sure projects like these are versioned, documented and well tested strengthens the analytical assets we contribute to the company's objectives. These assets deserve the same treatment our counterparts in software engineering give their code.
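As a minimal sketch of what "well tested" can mean here (the orders model and its order_id column are hypothetical), an assertion-style SQL check lives in version control alongside the model it protects and fails if it returns any rows:

```sql
-- Hypothetical assertion-style test: duplicate order IDs.
-- The check passes when the query returns zero rows.
select
    order_id,
    count(*) as occurrences
from orders
group by order_id
having count(*) > 1
```

Because the check is just a query in the repository, it is reviewed, versioned and run automatically on every change, much as an engineer would treat a unit test.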
Flexible tooling: Leveraging on-demand analytics
In the information revolution, we can now harness the power of cloud data storage solutions like Google BigQuery, Amazon Redshift and Snowflake. These are often cheap and scalable, letting us tap into large amounts of compute with little overhead. On-demand cloud data products are fast becoming the most valuable players in the modern data stack.
With this highly scalable storage, landing data in its rawest form from third-party applications is now commonplace. Open-source, self-hosted tools like Airbyte give us immediate access to a catalogue of source connectors for the third parties we work with. The challenge then shifts to: how do we create models that make sense of all this raw data, joining, deduplicating, filtering and enriching it for downstream modelling of our business metrics?
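As a hedged illustration of that step (the table and column names here are hypothetical), deduplicating a raw landed table often comes down to keeping only the most recent record per key:

```sql
-- Keep the latest record per customer from a hypothetical raw landed table
with ranked as (
    select
        *,
        row_number() over (
            partition by customer_id
            order by updated_at desc
        ) as row_num
    from raw_crm_customers
)

select *
from ranked
where row_num = 1
```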
Focusing on the T in ETL
Historically, the analytics workflow has been somewhat broken. Working in isolation, analysts can build up knowledge silos and, more often than they would like, repeat analyses others have already completed. The nuances of each dataset must be re-learned, and the organisation pays the price as insight generation slows.
dbt Labs, previously Fishtown Analytics of Philadelphia, set out to support us analysts with these issues, and they have done just that. Armed with familiar Structured Query Language (SQL) skills, analysts can now contribute to business modelling without needing low-level programming knowledge. dbt, coupled with continuous integration tools, lets us automate away many of the problems mentioned above, improving the shareability of our work and the quality of our dataset documentation.
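To make that concrete, here is a minimal sketch of a dbt model (the source, table and column names are hypothetical): plain SQL plus dbt's source() and ref() functions, which is all dbt needs to build a documented, testable dependency graph.

```sql
-- models/staging/stg_orders.sql (hypothetical)
-- Rename, cast and lightly clean a raw source table into a staging model.
with source as (

    select * from {{ source('shop', 'orders') }}

),

renamed as (

    select
        id as order_id,
        customer_id,
        cast(total_pence as numeric) / 100.0 as total_gbp,
        created_at as ordered_at
    from source

)

select * from renamed
```

Downstream models then select from it with {{ ref('stg_orders') }}, and a continuous integration job can run `dbt build` against every pull request to compile, test and document the project before it ships.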
Available as both an open-source command-line tool and a cloud-hosted service, dbt is now used by more than 9,000 companies worldwide and has built a Slack community of over 36,000 data professionals. This adoption rate clearly shows the gap in the analytics workflow that the team set out to plug.
Team dynamic: Growing new subdisciplines
dbt has helped fuel the rise of a valuable new sub-discipline in the data industry: analytics engineering. Bridging the gap between data engineers and data scientists, analytics engineers focus not only on moving data into accessible formats but also on combining raw sources into clean data products for downstream consumers. This sub-discipline boosts team cooperation, as it lets analysts work on core problems without having to navigate a minefield of dirty datasets.
Putting it all together
At OnBuy, we are constantly evolving, and that is certainly evident in our data platform journey. We have embraced cloud data storage solutions and embedded automation tools like dbt (the Data Build Tool) into our architecture. This has enabled analysts across the business to enrich our application data and provide value quickly. As the platform matures, we look to weave the output of our analytics platform back into our marketplace, helping us manage relationships with thousands of retailers and maximise the ecommerce performance of our growing catalogue of 35 million product listings.