Category: Data Modeling
-
Automation of Data Wrangling
Ask any data scientist how they spend most of their time and they will tell you “understanding the data, and then cleaning and organizing that data into a usable format,” or just plain “data wrangling.” The bottom line is that most of the data wrangling problem is caused by the lack of metadata management, or just…
-
Data Modeling in the Big Data Era: HDFS (Part 2)
From discussions that spun off from the last article on “Data Modeling in the Big Data Era,” it became apparent that a discussion of the Hadoop Distributed File System (HDFS) was warranted, as this is basically the physical implementation of any Hive or Impala model, and design considerations here also impact a few security concerns.…
-
Data Modeling in the Big Data Era
As technologies have evolved over the years, data modeling has become less and less important as a fundamental skill set. It’s impossible to say how many times the phrase has been uttered: “we no longer need data models because now we have _____________ (fill in the blank).” Everything from a data lake, to a NoSQL database, to…
-
Anscombe Quartet
Anscombe’s quartet actually has nothing to do with music, even though I associate the word quartet with music. This particular quartet refers to four datasets with very similar descriptive statistics. When these data are plotted, you will see that they are obviously very different datasets. The idea was developed by…
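The claim in the teaser above is easy to check for yourself. The following is a minimal sketch, not code from the original post: it hard-codes the commonly published values of the quartet and computes a handful of descriptive statistics for each dataset. The names `QUARTET`, `pearson`, `summarize`, and `SUMMARY` are made up for this illustration.

```python
from math import sqrt
from statistics import mean, variance

# Commonly published values of the quartet; datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed by hand to avoid version-specific helpers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(xs, ys):
    """Return the descriptive statistics usually quoted for the quartet."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "var_x": variance(xs),  # sample variance (n - 1 denominator)
        "var_y": variance(ys),
        "corr": pearson(xs, ys),
    }


SUMMARY = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

if __name__ == "__main__":
    for name, stats in SUMMARY.items():
        print(name, {key: round(value, 3) for key, value in stats.items()})
```

Running the script prints one row per dataset. The rows come out nearly identical, which is exactly why the summary numbers alone cannot tell the four datasets apart and a plot is needed.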