This lesson is still being designed and assembled (Pre-Alpha version)

Introduction

Overview

Teaching: 0 min
Exercises: 0 min
Questions
  • What is text mining?

Objectives
  • Gain a basic understanding of what text mining is.

  • Learn some text and data mining terminology.

Introduction

Welcome to this hands-on lesson to learn some text and data mining skills. We will first run through some of the basics that you will need when exploring and analysing text.

What is Text Mining?

Text mining refers to different methods used for analysing text automatically.

Terminology

To start with, here is some basic terminology that will be used in this lesson:

Token: a single word, letter, number or punctuation mark.

String: a sequence of characters, which can include letters, numbers, punctuation and spaces.

Integer: a positive or negative whole number without a decimal point.

Stop words: generally the most common words in a language (e.g. “the”, “of”, “and”), which are sometimes filtered out during text analysis in order to focus on the vocabulary that conveys more of the content of a piece of text.

Document: a single file containing some text.

Corpus: a collection of documents.
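
The terms above can be illustrated with a short Python sketch. This is only a minimal example under stated assumptions: the stop word list, the document names and the helper functions `tokenize` and `remove_stop_words` are made up for illustration and are not part of any library or of this lesson's later material. Real projects usually rely on a dedicated text mining library and a much longer stop word list.

```python
import re

# A tiny, hand-picked stop word list (an assumption, for illustration only).
STOP_WORDS = {"the", "of", "and", "a", "is", "on"}


def tokenize(text):
    """Split a string into lower-cased word, number and punctuation tokens."""
    # \w+ matches a run of letters/digits; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text.lower())


def remove_stop_words(tokens):
    """Return the tokens that are not in the stop word list."""
    return [token for token in tokens if token not in STOP_WORDS]


# A corpus is a collection of documents. Here each document is held as a
# string, keyed by a hypothetical file name.
corpus = {
    "doc1.txt": "The cat sat on the mat.",
    "doc2.txt": "The history of the cat and the dog.",
}

for name, document in corpus.items():
    tokens = tokenize(document)          # a list of tokens
    content = remove_stop_words(tokens)  # tokens left after filtering
    n_tokens = len(tokens)               # an integer
    print(name, n_tokens, content)
```

Note that in this sketch punctuation marks count as tokens, matching the definition above, so the full stop at the end of each sentence survives stop word removal.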

Questions:

  • How many people have used text mining for their work before?
  • Who wants to use it in future?
  • What types of text analyses would you want to do?

Key Points

  • Text mining refers to different methods used for analysing text automatically.