Notes on Data Mining and Text Mining

Well, now we are touching on some really complicated digital tools. Data mining and text mining are both pretty new fields of investigation, and historians have done only a little with these tools so far. Let's start with some definitions.

The Wikipedia entry on data mining is always a good place to start, and from it I've come up with a simplified definition: the "process of discovering patterns in large sets of numerical data." Going a bit further, we might say it is "the computer-assisted process of digging through and analyzing enormous sets of data and then extracting the meaning of the data." The wiki entry has a great list of tools (both paid and free) and references. Also take a look at Doug Alexander, Data Mining. From that, you might conclude that this goes hand in hand with data visualization, and you would be right.
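To make "discovering patterns" a little more concrete, here is a toy sketch in Python. It is only an illustration and nowhere near a real data mining tool: the two columns of yearly figures are invented, the names wheat_price and emigrants are hypothetical, and the Pearson correlation is just one simple measure I picked as a stand-in for pattern finding.

```python
from math import sqrt

# Hypothetical yearly figures, invented purely for illustration.
wheat_price = [10.0, 12.0, 15.0, 11.0, 18.0, 20.0]
emigrants = [200, 260, 330, 240, 400, 450]


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much the two series move together around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each series varies on its own.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(wheat_price, emigrants)
print(f"correlation: {r:.2f}")
```

A value near 1 means the two series rise and fall together, a value near -1 means they move in opposite directions, and a value near 0 means no linear pattern. Real data mining software looks for far subtler patterns across far more columns, but the basic idea of letting the computer measure a relationship is the same.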

Text mining is similar to data mining, except that the source being analyzed is text and not numerical data. The computer analysis helps to "find relationships and patterns in a set of textual data." The Wikipedia entry on text mining is good, and there is a ten-year-old definition that is still useful: Marti Hearst, 2003, What Is Text Mining?
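As a toy illustration of finding a pattern in text, the Python sketch below counts which words appear most often in a short passage. Everything in it is an assumption made for the example: the sample sentence is invented, the STOP_WORDS list is a tiny hypothetical one, and simple word counting is only the most basic step that real text mining tools build on.

```python
import re
from collections import Counter

# Invented sample passage, standing in for a real historical source.
text = (
    "The railroad reached the town in 1869. "
    "After the railroad came, the town grew quickly."
)

# A tiny, hypothetical list of common words to ignore.
STOP_WORDS = {"the", "in", "after", "a", "of", "and"}


def word_counts(passage):
    """Count each word in the passage, skipping numbers and stop words."""
    # Lowercase the text and keep only runs of letters.
    words = re.findall(r"[a-z]+", passage.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


counts = word_counts(text)
print(counts.most_common(2))
```

Run on thousands of letters or newspaper pages instead of two sentences, counts like these begin to show which topics dominate a collection and how that changes over time.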

Both data mining and text mining, done seriously, require sophisticated software algorithms that are beyond what we can do in this course.

Beyond those simple definitions, go next to:

For data mining, you must start with a data set. Now we have already covered some in the unit on data visualizations, but I also found:

Also to explore: