Tag: Data Science
-
The Next Giant Leap in Data Management: Automated Integration
This article is short and to the point, and it proposes an answer to one question: what is the next giant leap in analytics? I have argued and debated endlessly for the last 20 years about the necessity of data models. These models provide the structures for enforcing data integrity, and result in critical artifact byproducts.…
-
Anscombe Quartet
Anscombe’s quartet actually has nothing to do with music, but when I hear the word quartet I associate it with music. However, this particular quartet refers to four datasets with very similar descriptive statistics. When these data are plotted, you will see that they are obviously very different datasets. The idea was developed by…
-
Machine Readable Ontologies
Ontologies have been the topic of several posts recently. Reading through them again, I realize that someone unfamiliar with ontologies will most likely still find the subject somewhat elusive, and for them a high-level understanding of ontologies is enough. For those people, hopefully, it is interesting how the ontology was used to…
-
Natural Language Processing and Sentiment Analysis
Why is there no “Boring” button for posts on Facebook, Twitter, LinkedIn, and most other social media sites? It would make sentiment analysis so much easier. In my opinion, it is to maintain some semblance of decorum and civility, as well as to spare people’s feelings. It’s bad enough when you get 1,000 views,…
-
NLP – Resume Analysis in R
Natural Language Processing (NLP) is a field of study involving the interaction between computers and human spoken and written language, which implies both understanding and communicating. This can become very complicated, as you might have already guessed, and many people have simplified it to the point of being about data-driven word clouds, or…
-
Predictive Analytics Platform for Overcoming Memory Limitations
IF you’re interested, with emphasis on the IF, I provide an easy-to-follow, step-by-step guide for building a predictive analytics or data profiling platform using Hadoop and Spark to overcome some of the memory limitations that come with using a desktop version of R for extremely large data sets. Also, it provides notes that…