What is unstructured information?

Figure: An example of unstructured information: digital text about the coronavirus outbreak in a web page.

Unstructured information

Unstructured information is “information that either does not have a pre-defined data model or is not organized in a pre-defined manner” (Wikipedia). In practice, that is all the information you as a human can make sense of by reading a text, listening to a podcast, watching a video, looking at an image, or creating a 3D model. Calling it unstructured means that nothing in the file itself tells you whether it represents a person, a piece of equipment, or a specific place. That is why we often use metadata such as labels and attributes to describe what kind of document or picture it is, or what a video or audio file is about.
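As a rough illustration of the difference between content and metadata, here is a minimal Python sketch. The `Document` class, its `labels` and `attributes` fields, and the sample text are all hypothetical, invented only to show that the raw content carries no data model while the metadata describes it.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical model: an unstructured payload plus descriptive metadata."""
    content: str                                              # raw text, no built-in data model
    labels: list[str] = field(default_factory=list)           # free-form tags (assumed)
    attributes: dict[str, str] = field(default_factory=dict)  # key/value metadata (assumed)


article = Document(
    content="Stockholm reported new figures on the outbreak today.",
    labels=["news", "health"],
    attributes={"source": "web page", "language": "en"},
)

# Nothing in `content` says that "Stockholm" is a city; only the
# metadata tells software what kind of document this is.
print(article.labels)
print(article.attributes["source"])
```

Note that the metadata describes the document as a whole; it still says nothing about what individual words inside the text refer to.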

Structured information

The opposite is structured information: a database or spreadsheet has a table or field definition saying, for example, that the word “Stockholm” in a given column refers to a city. We can therefore know, or at least assume, that every value in that attribute, field, or column represents a city.

What computer software needs

When we build software to sort data, automate tasks, or visualize a dataset, the information usually has to be in structured form. Structure is what lets us assign values to a particular axis in a graph or a location on a map, or do math on them and get a meaningful result.
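A minimal sketch of why structure matters to software, assuming a small hypothetical table with a `city` column and a numeric `count` column (the rows and numbers are made up). Because every value in `city` is known to be a city and every `count` is an integer, the code can aggregate and sort without interpreting any text.

```python
from collections import Counter

# Hypothetical structured rows: the field definitions tell us what each value means.
rows = [
    {"city": "Stockholm", "count": 120},
    {"city": "Gothenburg", "count": 45},
    {"city": "Stockholm", "count": 30},
]

# Sum the counts per city; this only works because "city" and "count" are defined fields.
totals = Counter()
for row in rows:
    totals[row["city"]] += row["count"]

# Sort cities by their total, largest first, e.g. to feed a bar chart or a map.
for city, total in totals.most_common():
    print(city, total)
```

The same numbers buried in free text would first have to be found and labeled before this kind of calculation is possible.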

It is much more challenging to build software that automates or visualizes unstructured data, because we first need to process the dataset to find the structure in it before software can do anything useful with it. Traditionally this has been done manually, for example by labeling parts of a picture or the names of cities in a text so they can be visualized or sorted. New machine learning algorithms and various semantic technologies now make it possible to process texts, images, video, and audio automatically and label the information in each file.
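To make "finding the structure" concrete, here is a deliberately simple Python sketch that labels city names in text by looking them up in a hand-built list (a gazetteer). This is a swapped-in toy technique, not the machine learning approach described above; the `CITIES` set and the `label_cities` function are hypothetical.

```python
import re

# Hypothetical gazetteer: a hand-built list of known city names.
# Real systems would typically use a trained named-entity recognition model instead.
CITIES = {"Stockholm", "Gothenburg", "Malmö"}


def label_cities(text: str) -> list[tuple[str, int, int]]:
    """Return (city, start, end) character spans for each known city in `text`."""
    spans = []
    # \w+ matches Unicode word characters in Python 3, so "Malmö" is one token.
    for match in re.finditer(r"\w+", text):
        if match.group() in CITIES:
            spans.append((match.group(), match.start(), match.end()))
    return spans


text = "Flights from Stockholm to Malmö were cancelled."
print(label_cities(text))
```

The output turns free text into structured records (a city name plus its position), which could then be sorted or placed on a map. A word-by-word lookup like this misses multi-word names and ambiguous words, which is one reason learned models are used in practice.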

Applying structure and then sharing it

The key to managing unstructured information efficiently is to apply structure to it, either manually or automatically with software that can recognize words in speech and identify what these words represent. Combining manual and automatic methods can be very effective, because human input lets the models continuously learn and improve.
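One way to picture combining manual and automatic labeling is the toy loop below, sketched in Python under stated assumptions. The `CityLabeler` class and its `suggest`/`confirm` methods are hypothetical, and "learning" here is just adding confirmed words to a vocabulary, a simple stand-in for retraining a real model on human corrections.

```python
import re


class CityLabeler:
    """Hypothetical toy labeler whose vocabulary grows from human corrections."""

    def __init__(self, known: set[str]) -> None:
        self.known = set(known)

    def suggest(self, text: str) -> list[str]:
        # Automatic step: propose every word that matches a known city.
        return [word for word in re.findall(r"\w+", text) if word in self.known]

    def confirm(self, word: str) -> None:
        # Manual step: an analyst marks a missed word as a city,
        # so future suggestions include it.
        self.known.add(word)


labeler = CityLabeler({"Stockholm"})
text = "Analysts in Stockholm and Uppsala met today."
print(labeler.suggest(text))  # ['Stockholm']
labeler.confirm("Uppsala")    # human feedback
print(labeler.suggest(text))  # ['Stockholm', 'Uppsala']
```

The design choice is that the software does the bulk of the labeling while a person only corrects what it misses, and each correction is kept so the same mistake is not repeated.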

At Parsd we believe we can empower analysts by combining the latest software with shared domain-specific knowledge structures that other analysts covering the same domain can reuse.

Alexandra Kafka Larsson, Founder and CEO