Tag: Python
-
Automated Data Wrangling 2
As discussed in the original post, we are trying to answer the question, “Can we automate the data wrangling process, and more specifically, can data and words be associated dynamically through the use of ontologies?” On the surface, this appears to be an extremely complicated problem, and it is. However, NASA has done the same…
-
Automation of Data Wrangling
Ask any data scientist how they spend most of their time and they will tell you “understanding the data, and then cleaning and organizing that data into a usable format,” or just plain “data wrangling.” The bottom line is that most of the data wrangling problem is caused by the lack of metadata management, or just…
-
HDF Portable File System
While doing some deep learning neural network analysis for image recognition, I ran across data that was provided in a file structure called HDF5. It was easy to use and, as I discovered, the file system worked across many platforms and was easily accessible using libraries and packages provided in both R and Python. This…
-
Latent Dirichlet Allocation (LDA): Topic Models
Latent Dirichlet allocation (LDA) is an unsupervised learning topic model, similar to k-means clustering, and one of its applications is to discover common themes, or topics, that might occur across a collection of documents. In a nutshell, the distribution of words characterizes a topic, and these latent, or undiscovered, topics are represented as random mixtures…
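To make the idea concrete, here is a minimal sketch of the generative story behind LDA in its standard formulation: each topic is a probability distribution over words, and each document draws its own random mixture over topics from a Dirichlet distribution. The two topics, their words, and the `alpha` value are hypothetical and chosen only for illustration. This code generates a toy document and does not fit a model to real text.

```python
import random

# Hypothetical topics: each one is a probability distribution over words.
TOPICS = {
    "space": {"shuttle": 0.5, "orbit": 0.3, "launch": 0.2},
    "data": {"table": 0.4, "metadata": 0.4, "schema": 0.2},
}


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, n_words, rng):
    """Generate one toy document: pick a topic mixture, then pick words."""
    names = list(topics)
    # theta is this document's random mixture over the topics.
    theta = sample_dirichlet([alpha] * len(names), rng)
    words = []
    for _ in range(n_words):
        # Choose a topic for this word position according to theta ...
        topic = rng.choices(names, weights=theta)[0]
        # ... then choose a word from that topic's word distribution.
        vocab = list(topics[topic])
        weights = list(topics[topic].values())
        words.append(rng.choices(vocab, weights=weights)[0])
    return theta, words


rng = random.Random(0)  # fixed seed so the sketch is repeatable
theta, words = generate_document(TOPICS, alpha=0.5, n_words=10, rng=rng)
print("topic mixture:", theta)
print("document:", " ".join(words))
```

Fitting LDA runs this story in reverse: given only the words, it estimates the topic mixtures and word distributions that most plausibly produced them.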
-
Logistic Regression Case Study: The Challenger
As most of us know, Challenger is the name of one of NASA’s space shuttle orbiters that experienced a catastrophic failure on January 28, 1986, resulting in the deaths of all seven on board: five astronauts and two payload specialists. It was determined that the catastrophic failure resulted after all five O-ring seals in its…
-
Graphs, what are they, and can they help us associate Words with Data?
A network is a collection of things, like computers, that interact with one another. This can be a physical network with routers, switches, and hubs, as well as the World Wide Web where documents are linked to one another. While the documents themselves might be interesting, a vast amount of information can be extracted by…
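As a small illustration of documents linked to one another, the sketch below stores a link graph as a dictionary of adjacency lists and counts how many incoming links each document receives. The document names and links are hypothetical, and in-degree is only one simple measure that can be read off such a graph.

```python
from collections import Counter

# Hypothetical link graph: each document maps to the documents it links to.
LINKS = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about", "post-1"],
    "post-1": ["home", "blog"],
}


def in_degree(links):
    """Count the incoming links for every document in the graph."""
    counts = Counter({node: 0 for node in links})
    for targets in links.values():
        for target in targets:
            counts[target] += 1
    return dict(counts)


degrees = in_degree(LINKS)
most_linked = max(degrees, key=degrees.get)
print(degrees)
print("most linked document:", most_linked)
```

The same dictionary-of-lists shape works for any set of things and relationships, which is what makes graphs a candidate for associating words with data.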
-
Natural Language Processing and Sentiment Analysis
Why is it that there is no “Boring” button for posts on Facebook, Twitter, LinkedIn, and most other social media sites? It would make sentiment analysis so much easier. In my opinion, it is to maintain some semblance of decorum and civility, as well as to spare people’s feelings. It’s bad enough when you get 1,000 views,…
-
Predictive Analytics Platform for Overcoming Memory Limitations
IF you’re interested, with emphasis on the IF, I provide an easy-to-follow, step-by-step guide for building a predictive analytics, or data profiling, platform utilizing Hadoop and Spark to overcome some of the memory limitations that come with using a desktop version of R for extremely large data sets. Also, it provides notes that…