Workshop: Media Types and File Sharing

This workshop will cover different types of files and media. We will look at audio, video and image files, and how they are made, learn how to convert between them and how to find them online. We will use the command line to make tiny media libraries, and share them with one another!

Intro: file formats

At the most basic level, information on computers is represented by strings of ones and zeroes. The ones and zeroes are grouped together into 'files' (containers for information), and those files arrive in a 'format'. To differentiate the ones and zeroes that make up an image, from those that make up text, audio or video files, computers need a lot of structuring information. You can think of a 'file format' as a set of standardised rules for how the computer should read and display the information it finds in a file, and how it should structure that information for files that can be read in the same way.

^ an image displayed in 'hex code'

Many of the distinctions between different kinds of media file formats are based on mathematical differences in how the information in them is compressed. For example, the differences between the audio formats .mp3, .wav and .ogg, or the image formats .jpg, .png and .bmp, are mostly to do with how much space they take up on a computer. There are also political and legal differences -- for example, the differences between the Microsoft Word Binary File Format (.doc), the OpenDocument Format (.odt), and the Office Open XML format (.docx) are that the first file format belongs to Microsoft, the second was explicitly intended to be an open standard, and the third became the standard interchange format with heavy influence from Microsoft.

^ A protest in Bangalore, India, against the ECMA's adoption of the .docx standard. The banner in the background reads "Liberate Your Documents"

File formats are not naturally occurring phenomena -- there are large bureaucratic bodies dedicated to the design, development and standardisation of different file formats. Historically, there have been huge debates about file formats -- particularly around questions of proprietary file formats and licensing.

Recognising file formats

File names give you core information about the files themselves. The part you are interested in is the end, or extension. Normally the letters and numbers in a file's extension relate to a longer sequence of words that describe the process of formatting that file -- so ".pdf" stands for "portable document format", and ".jpeg" stands for "joint photographic experts group", the name of the group that came up with the standard for JPEG formatting.

Plain text file formats

A file is referred to as "plain text" if its contents are just characters (letters, numbers, punctuation, symbols), with no information about how those characters should be displayed graphically.

  • Almost all code files are plain text (.py, .js, .c, .json)
  • The .txt file format is plain text -- so are formats like .md and .html, which are called 'markup'
  • There are some plain text data formats like .csv, which represent information like spreadsheets

Not all files that contain text are 'plain text' files! Word documents (.docx), for example, are not plain text.

Binary file formats

A 'binary' file is any non-text file. To view the file in the way it was intended to be displayed, it would need to be opened in software that is designed to read that kind of file.

For example, to view a .jpg file, one needs to open it in an image viewer, a web browser, or another type of software designed to open images.

However -- all binary files, just like all text files, are still composed of strings of binary data. Unlike a text file, the characters that make up a binary file rarely form a good representation of the file's contents. This doesn't mean we can't still look at them!
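As a rough sketch of what "looking at" binary data can mean on the command line: the od tool prints each byte of a file as a hexadecimal number. od is an assumption here (it is not one of this workshop's commands, though it normally ships with macOS and Linux). The file name tiny.jpg is made up, and the file holds only the four bytes that a typical JPEG begins with, not a real image.

```shell
# Work in a throwaway folder so nothing on your computer is touched.
scratch=$(mktemp -d)
cd "$scratch"

# Write four bytes by hand: FF D8 FF E0, the opening bytes of a typical JPEG file.
# (printf takes them here as octal escapes: 377 330 377 340.)
printf '\377\330\377\340' > tiny.jpg

# Print the bytes as hexadecimal numbers, one byte per column.
hexview=$(od -A x -t x1 tiny.jpg)
echo "$hexview"
```

Running the same od command on a real .jpg will print far more lines, but the first few bytes should look similar.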

Activity 1: Glitch a JPG

For the first activity, we are going to use a text editor. On a Mac, the best text editor to use is TextEdit, and on Windows it is Notepad. Both of these should come installed by default.

Note: some more advanced text editors like Visual Studio Code will display images as images rather than as their text. This is probably useful if you're using images in a code project (like a website!) but not at all helpful for us today.

Step 1: Find a JPG file on your computer. If you can't find one, then feel free to use this example image of a Pallas' Cat from the Wikipedia page Pallas' Cat. Right click to save the file -- you can save it anywhere, but the easiest place to find it is your Downloads folder.

Step 2: Get some info about the file! One really helpful thing a computer can do is tell you loads of information about the files. On a Mac, the way to do this is to right click the file and select Get Info. On Windows, right click and select Properties. Look for a file that's between 800kB and 1.5MB. This will work with other files, but files in this range tend to work best!

Step 3: We will now open this file using a text editor. First of all, make a copy of the file if you'd like to keep it! Then, to open the copy:

On Mac:
This time, right click the file again and select Open With, scroll to the end and click 'Other'

When the selection of applications opens, navigate to the bottom and change the drop-down to All Applications

In the list of applications, scroll down until you reach TextEdit, and click to open the file. If it's greyed out, make sure you've completed the previous step.

You should see a text file full of characters appear!

On Windows:
Right click on the file and choose "Open with" > "Choose another app". Scroll down and select Notepad. You should see a text file full of characters appear!

Step 4: Scroll through the file to get a sense of the shape of it. Avoid editing characters in the first section -- this often contains information about the file, rather than the contents, and changing it can stop the file from opening at all.

Once you've scrolled down a way, try deleting a couple of characters -- save the text file, and see if any changes happen to the image. You will notice that information about the image tends to be organised in left-to-right rows -- edits earlier in the file will affect higher up rows. Here's a set of consecutive edits made to a JPEG as an example:

Demo 1: find and copy

This session will build on the command line commands from last week's session. If these are unfamiliar or you need more practice, a cheatsheet is available at the end of this page, and the full session is linked here.

1. find

The first command we will look at is the find command. Like the name suggests, this is used for finding things, and can be a useful and precise search tool.

Let's open up a command line and try and use it. First of all, I am going to navigate to the desktop using cd:
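A minimal sketch of that first step, assuming your Desktop folder lives inside your home folder (the usual place on a Mac; other systems may differ, so the example falls back to the home folder if there is no Desktop):

```shell
# Go to the Desktop folder inside your home folder (~).
# If this computer has no Desktop folder, go to the home folder instead.
cd ~/Desktop 2>/dev/null || cd ~

# Print the current location to check where we ended up.
pwd
```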

Next, I want to try and find a file. I'm looking for an image called "shrimp.jpg" -- I can use the find tool to search for it. Notice that it found it even though it's not in the same folder! That's because find searches subfolders of the folder you are in.
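Here is a sketch of that search that is safe to run anywhere. It assumes a made-up practice folder rather than a real Desktop, and the folder names photos and sea are invented for the example:

```shell
# Build a throwaway practice folder with shrimp.jpg hidden two folders deep.
practice=$(mktemp -d)
mkdir -p "$practice/photos/sea"
touch "$practice/photos/sea/shrimp.jpg"
cd "$practice"

# Search the current folder (.) and all of its subfolders
# for a file whose name is exactly shrimp.jpg.
result=$(find . -name "shrimp.jpg")
echo "$result"
```

The dot after find means "start searching from the folder I am in". The path that gets printed shows which subfolders the file was found in.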

What if we just want to find jpg files? We can do that too, using the wildcard symbol *. * often stands in in computing for "anything you like". Here we use it to say, we want all the things that end in .jpg:
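A small sketch of the wildcard search, again using an invented practice folder (the file and folder names are placeholders, and the files are empty stand-ins rather than real images):

```shell
# Build a throwaway practice folder with two .jpg files and one .txt file.
practice=$(mktemp -d)
cd "$practice"
mkdir holiday
touch shrimp.jpg notes.txt holiday/beach.jpg

# List everything whose name ends in .jpg, in this folder and its subfolders.
# The quotation marks stop the shell from expanding the * itself,
# so that find receives the pattern exactly as written.
jpgs=$(find . -name "*.jpg" | sort)
echo "$jpgs"
```

notes.txt is left out of the results because its name does not match the pattern.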

You'll notice that I had to specify I wanted to find by name! That's because find can be used to find all sorts of things. We can also try finding things that are larger than a certain size! Here's the command to list files bigger than 100MB -- notice the list is a lot shorter, and just includes things like screen recordings and photoshop files!
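To keep a practice run quick, this sketch searches for files over 100 kilobytes rather than 100 megabytes. The file names are invented, and the files are just filled with zeroes to give them a size:

```shell
# Build a throwaway practice folder with one small and one larger file.
practice=$(mktemp -d)
cd "$practice"
head -c 10 /dev/zero > small.txt            # 10 bytes
head -c 200000 /dev/zero > recording.mov    # 200,000 bytes

# List files bigger than 100 kilobytes: +100k means "more than 100k".
big=$(find . -type f -size +100k)
echo "$big"

# For the 100MB search described above, swap the k for an M:
#   find . -size +100M
```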

There's a good guide here to a list of common use cases for the command.

2. copy

The second command we're going to look at is the copy command. This is used for making a copy of a file in one location, and storing the duplicate in another. This can be really useful if we don't want to touch the original files!

Say I want to make a copy of some of the jpeg files from before in a new folder. First of all, let's list them! Maybe I only want ones less than 300kB in size:
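A sketch of that listing, assuming an invented practice folder with two stand-in files (they are named .jpg but are just filled with zeroes to give them a size):

```shell
# Build a throwaway practice folder with one small and one large "jpg".
practice=$(mktemp -d)
cd "$practice"
head -c 1000 /dev/zero > thumbnail.jpg     # about 1kB
head -c 400000 /dev/zero > poster.jpg      # about 400kB

# Combine two tests: the name ends in .jpg AND the size is under 300k.
# A minus sign before the number means "less than".
small_jpgs=$(find . -name "*.jpg" -size -300k)
echo "$small_jpgs"
```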

The copy command takes the form:
cp /path/to/a.txt /other/path

This means -- copy the file a.txt into the new location /other/path. In this case, I'm going to make a folder on my Desktop called lores-jpeg where I'm going to keep all these images.

In order to make sure the file paths are preserved, I'm going to wrap all the paths in quotation marks, or use the tab key:
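As a sketch of why the quotation marks matter, this example copies a file whose path contains spaces. The folder "holiday pics" and the file "pink shrimp.jpg" are invented for the example; lores-jpeg is the folder name from above.

```shell
# Build a throwaway practice folder containing a path with spaces in it.
practice=$(mktemp -d)
cd "$practice"
mkdir "holiday pics" lores-jpeg
touch "holiday pics/pink shrimp.jpg"

# The quotation marks keep the path in one piece, even though it contains spaces.
# Without them, cp would treat each word as a separate file name.
cp "holiday pics/pink shrimp.jpg" lores-jpeg/

# Check that the copy arrived and that the original is still in place.
ls lores-jpeg "holiday pics"
```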

Note that without much extra effort, we could make a tool that would do this for all these files automatically. In this instance, we can use a bash script!

This is just for demo purposes to show you how these things can become useful: it's not within the scope of today's workshop to show you this in detail!

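# Find every .jpg under 300kB (skipping copies already in lores-jpeg/) and copy each one.
# IFS= and -r make read keep each file path exactly as find printed it.
find . -name "*.jpg" -size -300k ! -path "./lores-jpeg/*" | while IFS= read -r i; do
  cp "$i" lores-jpeg/
done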

This code runs the find command, reads its output line by line, and then runs the copy command on each line in turn!

Activity 2: Files Treasure Hunt

Using the find command, try and find at least one of each of the following types of file on your computer. Use the cp command to copy each of these files into one folder, with all the types in it. Don't use mv, because that will remove the files from their original location!

  • .txt
  • .docx
  • .png
  • .jpg
  • .gif
  • .pdf
  • .epub
  • .mp4
  • .mp3

Example for finding jpg files:

$ mkdir my-file-storage
$ cd my-file-storage
$ find ~/Desktop -name "*.jpg"
$ cp ~/a/b.jpg ./

Challenge: rare formats. Try to find as many file types as you can on your computer that don't appear on this list! See Wikipedia for a full list of file formats.

Demo 2: move

The next command we will look at is mv, or move. One way to think about move is like copy, but the file is removed from its original location. We want to be a little more careful with this command, as we need to make sure we're still able to find the files that we are moving.

Like copy, move takes the following form:

mv path/to/file.txt new/path

This will move the file "file.txt" into the folder "new/path"

Let's try it on our jpgs -- I made a new folder called 'shrimps' and moved 'shrimp.jpg' inside of it.
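A sketch of that move, assuming an invented practice folder and an empty stand-in file named shrimp.jpg:

```shell
# Build a throwaway practice folder with an empty shrimp.jpg in it.
practice=$(mktemp -d)
cd "$practice"
touch shrimp.jpg

# Make the new folder, then move the image inside it.
mkdir shrimps
mv shrimp.jpg shrimps/

# The file now lives in shrimps/ and is gone from its old location.
ls shrimps
```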

A really common use of the move command is renaming things. Say we want to rename this image, which is currently called pic040-medium.jpg, to something more catchy.

I'm going to rename it like this:
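A sketch of a rename with mv. The new name shrimp-portrait.jpg is invented for this example (any name would do), and the file is an empty stand-in inside a practice folder:

```shell
# Build a throwaway practice folder with an empty stand-in image.
practice=$(mktemp -d)
cd "$practice"
touch pic040-medium.jpg

# "Moving" a file to a new name in the same folder renames it.
# shrimp-portrait.jpg is a made-up name; pick whatever you like.
mv pic040-medium.jpg shrimp-portrait.jpg

ls
```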

Activity 3: Gallery/Mixtape

For the third activity, we are going to use the find, mv and cp commands to create a mixtape or gallery for someone. This can be anyone you like, they don't need to be in class! Your mixtape or gallery (or anthology if you like) should be a collection of files united by a theme, and arranged and named with some kind of intention. It's totally up to you what that is!

To do this, we will need to find some files. There are lots of places on the internet to find different kinds of media -- here are a few sources I like. Some are very broad, others are quite specific!

Writing a README

An important part of sharing files with other people is explaining what the files are for, what they contain, and how to read, use and open them. This is considered good practice in a lot of disciplines, but is particularly important when writing code, as the interaction of files can be complex. Within software development, there's a convention to title these files 'README' -- the name itself works as an instruction to whoever opens the folder.

README files are always written in a plaintext format, so no special software is required to read them. Normally this is either a .txt ("text file") or .md ("markdown file").

Task: Write a short README file in txt format for your mixtape / gallery. Instruct the listener / viewer what software they should open the files in, and feel free to include artistic notes, thoughts, etc.
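One possible way to start a README from the command line is with the echo commands from last week. This is only a sketch: the gallery title, the wording and the file name shrimp.jpg are all invented, and it runs in a practice folder rather than your real mixtape folder.

```shell
# Work in a throwaway practice folder.
practice=$(mktemp -d)
cd "$practice"

# > creates README.txt (or overwrites it if it already exists);
# >> adds each further line to the end of the file.
echo "SEA CREATURES: a tiny gallery" > README.txt
echo "Open the .jpg files in any image viewer or web browser." >> README.txt
echo "Start with shrimp.jpg and work through in alphabetical order." >> README.txt

# Print the finished README to check it.
cat README.txt
```

You can equally write the README in TextEdit or Notepad; what matters is that it is saved as plain text.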

Tools and Further Reading

There's a nice list of common file formats here.

Reading

File Browser Commands

Show hidden files:

  • on a Mac: ⌘ + shift + .
  • on Windows: "View" > "Hidden Items"

Show path bar:

  • on a Mac: "View" > "Show Path Bar"
  • on Windows: shown by default

List of Command Line Commands

In this session, we are going to use the following commands:

command                    name      meaning
mv a.txt b/                "move"    move the file "a.txt" into the folder "b"
mv a/ b/                   "move"    rename the folder "a" to "b" (if "b" already exists, "a" is moved inside it)
mv a.txt b.txt             "move"    rename the file "a.txt" to "b.txt"
cp a.txt b/                "copy"    copy the file "a.txt" into the folder "b"
cp ~/Downloads/c.mp3 ./    "copy"    copy the file "c.mp3" from "Downloads" into the current folder
find . -name "*.txt"       "find"    find all files ending in .txt inside the current folder and its subfolders

Commands we learned last week:

command                        name                        meaning
pwd                            "print working directory"   print out a path to my current location
ls                             "list"                      list the files and folders in the current location
cd                             "change directory"          move to a different folder
cd ..                          "change directory"          move UP one folder
cd ~                           "change directory"          navigate to the Home directory
touch file.txt                 "touch"                     make a new, empty file called file.txt
cat file.txt                   "concatenate"               print out the contents of a file called file.txt
mkdir my-folder                "make directory"            make a new folder called 'my-folder'
echo "some text"               "echo"                      write out the text "some text"
echo "some text" > file.txt    "write"                     overwrite the contents of file.txt with "some text"
echo "some text" >> file.txt   "append"                    add the text "some text" to the end of file.txt

Neat tricks:

  • Type cd (cd and then a space), and then drag and drop the folder you want to navigate to onto the command line. Press enter and it will take you there
  • Use the Tab key to autocomplete folder and file paths
  • Use the up and down arrow keys to find and select commands you already ran
  • on a Mac: hold down the 'option' key to place your cursor anywhere on the command line

Watch out for:

  • make sure you use a space between a command and the thing you are using it on. For example, cd.. won't work, but cd .. will!
  • if you get a message that says 'command not found', double check that it's spelt right and that you used spaces in the correct place!
  • for the same reason that spaces are used to separate out commands, they will cause issues if you use them in the names of files and folders! The simplest fix is to avoid spaces in names completely -- use a dash or underscore instead!