Review




Summary

From the README:

✔ An open-source, CLI tool and Python library for data quality testing
✔ Compatible with the Soda Checks Language (SodaCL)
✔ Enables data quality testing both in and out of your data pipelines and development workflows
✔ Integrated to allow a Soda scan in a data pipeline, or programmatic scans on a time-based schedule

Soda Core is a free, open-source, command-line tool and Python library that enables you to use the Soda Checks Language to turn user-defined input into aggregated SQL queries.

When it runs a scan on a dataset, Soda Core executes the checks to find invalid, missing, or unexpected data. When your Soda Checks fail, they surface the data that you defined as bad-quality.
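To make the "aggregated SQL queries" idea concrete, here is a minimal, hypothetical sketch in Python using the standard-library sqlite3 module, not Soda Core itself. The check list borrows SodaCL-style metric names (`row_count`, `missing_count`, `min`), but `metric_to_sql`, `build_scan_query`, and `run_scan` are illustrative helpers invented for this sketch, not Soda's API, and Soda Core's real query generation is far more involved. The pattern it shows: several checks compile into one SELECT of aggregates, so the table is scanned once, and each aggregate is then compared against its threshold.

```python
import operator
import sqlite3

# Hypothetical check list loosely modeled on SodaCL metric names.
# Each entry: (metric expression, comparison operator, threshold).
CHECKS = [
    ("row_count", ">", 0),
    ("missing_count(email)", "=", 0),
    ("min(age)", ">=", 0),
]

OPS = {">": operator.gt, ">=": operator.ge, "=": operator.eq,
       "<": operator.lt, "<=": operator.le}


def metric_to_sql(metric: str) -> str:
    """Translate one metric expression into a SQL aggregate (illustrative subset only)."""
    if metric == "row_count":
        return "COUNT(*)"
    name, _, rest = metric.partition("(")
    column = rest.rstrip(")")
    if name == "missing_count":
        # COUNT(column) skips NULLs, so the difference is the number of NULL rows.
        return f"COUNT(*) - COUNT({column})"
    if name == "min":
        return f"MIN({column})"
    raise ValueError(f"unsupported metric: {metric}")


def build_scan_query(table: str, checks) -> str:
    """Combine every metric into ONE aggregated SELECT so the table is read once."""
    aggregates = ", ".join(metric_to_sql(metric) for metric, _, _ in checks)
    return f"SELECT {aggregates} FROM {table}"


def run_scan(conn, table: str, checks) -> dict:
    """Execute the combined query, then evaluate each metric against its threshold."""
    values = conn.execute(build_scan_query(table, checks)).fetchone()
    results = {}
    for (metric, op, threshold), value in zip(checks, values):
        # A NULL aggregate (e.g. MIN on an empty table) is treated as a failure.
        passed = value is not None and OPS[op](value, threshold)
        results[metric] = ("pass" if passed else "fail", value)
    return results


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, email TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "a@example.com", 34), (2, None, 29), (3, "c@example.com", -1)],
    )
    print(build_scan_query("customers", CHECKS))
    for metric, (status, value) in run_scan(conn, "customers", CHECKS).items():
        print(f"{status.upper():4} {metric} = {value}")
```

Against the three sample rows, `row_count > 0` passes while `missing_count(email) = 0` and `min(age) >= 0` fail, which is the kind of result a failed check uses to surface data you defined as bad-quality.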

Pros

Cons

In my haste to get something working, I went to the Soda Core docs, which led me to a quickstart that included a cloud-account creation step. I skipped that step, then misread a subsequent error as meaning that the quickstart would not work without a cloud account. That's not the case, though! Without an account you still get all of the soda-core functionality; you only receive errors for the extended functionality that soda-library provides. So use soda-core as much as you like, and when you're ready for soda-library features, such as the time-windowed features, you'll need to set up an account. Free accounts currently last only 45 days, after which you'll need to sign up for a paid plan.

When I raised this with the team, both their business-development contact and their PM reached out to clarify, which was super helpful, proactive, and informative.

The response on GitHub was also quick and helpful. I created a Postgres quickstart that avoids the cloud-account creation step and runs everything in a single script. It's available at https://github.com/sodadata/soda-core/blob/main/examples/postgres_example.md.

Finally, for an issue I'd called out, it was clear that they have an internal Jira process for tracking issues and making sure bugs are followed up on.

All in all, this is a helpful tool, and I'm going to stay tuned to Soda. Where Soda excels is customer support and engagement: they clearly want to build a winning product, and I've observed that motivation across multiple teams. That will lead to long-term success and reliable support for enterprise customers.