Category: Data Science
-
Automated Data Wrangling 2
As discussed in the original post, we are trying to answer the question, “Can we automate the data wrangling process, and more specifically, can data and words be associated dynamically through the use of ontologies?” On the surface, this appears to be an extremely complicated problem, and it is. However, NASA has done the same…
-
The Next Giant Leap in Data Management: Automated Integration
This article is short and to the point, and it proposes an answer to one question: what is the next giant leap in analytics? I have argued and debated endlessly for the last 20 years about the necessity of data models. These models provide the structures for enforcing data integrity and result in critical artifact byproducts.…
-
Logistic Regression Case Study: The Challenger
As most of us know, Challenger is the name of one of NASA’s space shuttle orbiters, which experienced a catastrophic failure on January 28, 1986, resulting in the deaths of all seven on board: five astronauts and two payload specialists. It was determined that the catastrophic failure resulted after all five O-ring seals in its…
-
Graphs, what are they, and can they help us associate Words with Data?
A network is a collection of things, like computers, that interact with one another. This can be a physical network with routers, switches, and hubs, or the World Wide Web, where documents are linked to one another. While the documents themselves might be interesting, a vast amount of information can be extracted by…
-
Logistic Regression: Modeling an Expert
MIT, Johns Hopkins, Stanford, Harvard, and several other prominent universities provide training through Massive Open Online Course (MOOC) sites. If you haven’t tried one yet, you need to check them out. The courses are taught by some brilliant professors and provide excellent training tools for employees, as well as sources of well-documented methodologies for…
-
Anscombe Quartet
Anscombe’s quartet has nothing to do with music, although I associate the word quartet with music whenever I hear it. This particular quartet refers to four datasets with very similar descriptive statistics. When these data are plotted, you will see that they are obviously very different datasets. The idea was developed by…
-
Predictive Analytics Platform for Overcoming Memory Limitations
IF you’re interested, with emphasis on the IF, I provide an easy-to-follow, step-by-step guide for building a predictive analytics or data profiling platform that uses Hadoop and Spark to overcome some of the memory limitations that come with using a desktop version of R for extremely large data sets. Also, it provides notes that…