In this blog post, we’ll walk through how data science and data engineering complement each other. We’ll also delineate a third category: data analysis. We’ll explore how both data engineering and data science should be marshaled to make better decisions. Organizations often struggle to strike the right balance between engineering, … [Read more...] about Reframing “Data Engineering vs Data Science”
Data Science
Data Engineering Glossary
If you’re new to data engineering or practice a related field, such as data science or business intelligence, we thought it might be helpful to have a handy list of commonly used terms to help you get up to speed. This data engineering glossary is by no means exhaustive, but should provide some foundational context and … [Read more...] about Data Engineering Glossary
Clustering Analysis for Market Segmentation: Making It (Actually) Useful
In this blog post, we provide some tips on bridging the gap between clustering analysis and real-life business value. This post will be most useful to machine learning practitioners who want their output to resonate with non-technical audiences (in this author's opinion, this should be a goal for everyone building machine learning … [Read more...] about Clustering Analysis for Market Segmentation: Making It (Actually) Useful
Why the Notebook Interface is Preferred by Data Engineers and Data Scientists
This blog post takes a look at how the popular notebook interface has gained traction as the go-to front-end for data professionals and data companies alike. Notebooks can make data teams more productive by hiding cumbersome configurations. How will the notebook evolve to meet the demands of the tech world? DATA PEOPLE LOVE NOTEBOOKS If … [Read more...] about Why the Notebook Interface is Preferred by Data Engineers and Data Scientists
Technical Tutorial: Random Forest Models With Python and Spark ML
In our first code-centric blog post, we provide a step-by-step introduction to Spark’s machine learning library. This post is intended for a more technical audience that has a solid grasp of Python, understands the basics of machine learning, and has an interest in learning about Spark’s machine learning capabilities. OVERVIEW – PREDICTING HOME … [Read more...] about Technical Tutorial: Random Forest Models With Python and Spark ML