Retrieval-based Deep Learning with TensorFlow v1.0+ and Python

In this post I will cover code from a GitHub repo that I forked (detailed in this post) that trains a machine learning model on IRC chat logs (the Ubuntu Dialogue Corpus) to select the correct response from a set of candidate responses, given a context. The code was created last year with

Cleaning Transcript Data with Python

Analyzing text data or using it to train machine learning models often requires a lot of data. People usually look to Wikipedia for large amounts of text, but scholars occasionally use less traditional sources, like movie reviews for sentence-level sentiment analysis or Ubuntu IRC chat

How-to: Scrape Data with Python’s BeautifulSoup

In this post I'll show how to scrape semi-structured data from a target webpage with Python's BeautifulSoup module. BeautifulSoup is indeed beautiful. It is the go-to package for scraping data and working with HTML. We'll also use requests to grab the HTML from the target URL. The page I use in my example should

How to set up IPython for Python 2.7 and 3+ kernels

IPython is an interactive notebook that you access from your browser. It is extremely useful because it is designed with code sharing in mind, supports up to 49 different languages/versions of languages, and organizes code into "cells", blocks that are run one at a time (or all at once). Another reason to use IPython

How to Scrape Data from Webpages with Python’s Scrapy

In this post I'll show how to gather unstructured information from webpages using Python's open-source web crawling framework, Scrapy. Web crawlers have been around since the conception of the internet; in fact, Google started out by visiting links from Stanford's homepage until all 10 million of them had been explored. In the

Statistical Programming with R and Python

R and Python are two popular languages for data analysis. In this post, I will cover some libraries, packages and resources that will help you quickly become proficient with these statistical and scripting languages. This post is intended for the beginner to intermediate level, though you may find some