Alternative Solutions to this (my) problem?

BrilliantFocus · Nov 3, 2014

Hi guys,

Background: I have 5500 files that did not come with extensions (.jpg, .pdf, .docx, etc.) and I would like to add extensions to the end of the base filenames for ease of use, if nothing else.

For my purposes, I ran the file command in Bash to identify each file's type, then cut and sorted the output, which gave me the following list of distinct results.

ASCII text

Audio file with ID3 version 2.3.0, contains

Composite Document File V2 Document, corrupt

Composite Document File V2 Document, Little Endian, Os

Composite Document File V2 Document, Little Endian, Os

Composite Document File V2 Document, No summary info

data

GIF image data, version 89a, 2416 x 877

GIF image data, version 89a, 2416 x 908

GIF image data, version 89a, 2431 x 3168

GIF image data, version 89a, 2435 x 3168

HTML document, ISO-8859 text, with CRLF line terminators

HTML document, UTF-8 Unicode text, with very long lines

ISO Media, Apple QuickTime movie

ISO Media, MPEG v4 system, version 2

JPEG image data, EXIF standard

JPEG image data, EXIF standard 2.21

JPEG image data, EXIF standard, comment

JPEG image data, JFIF standard 1.00

JPEG image data, JFIF standard 1.00, comment

JPEG image data, JFIF standard 1.01

Macromedia Flash Video

Microsoft ASF

Microsoft Excel 2007+

Microsoft PowerPoint 2007+

Microsoft Word 2007+

OpenDocument Spreadsheet

OpenDocument Text

PDF document, version 1.0

PDF document, version 1.2

PDF document, version 1.3

PDF document, version 1.4

PDF document, version 1.5

PDF document, version 1.6

PDF document, version 1.7

PNG image data, 1186 x 383, 8-bit/color RGBA, non-interlaced

PNG image data, 1268 x 1649, 8-bit/color RGBA, non-interlaced

PNG image data, 1275 x 1750, 8-bit/color RGBA, non-interlaced

RAR archive data, v1d, os

Rich Text Format data, version 1, ANSI

Rich Text Format data, version 1, unknown character set

TIFF image data, little-endian

UTF-8 Unicode text

UTF-8 Unicode text, with very long lines

Zip archive data, at least v1.0 to extract

Zip archive data, at least v2.0 to extract
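A list like the one above could be produced roughly as follows. This is only a sketch: the truncated entries (for example the ones ending in "Os" or "contains") suggest the output was cut on the colon, but that is a guess, and the function name distinct_types and its directory argument are made up for illustration.

```shell
#!/usr/bin/env bash
# Print the distinct type descriptions reported by `file` for every entry
# in a directory. Assumes the file names themselves contain no colons.
distinct_types() {
    # $1: directory holding the extensionless files (hypothetical argument)
    ( cd "$1" && file ./* ) |
        cut -d: -f2 |        # keep only the text between the 1st and 2nd colon
        sed 's/^ *//' |      # drop the padding `file` puts after the name
        sort -u              # collapse duplicates
}
```

Cutting on the colon is what shortens descriptions such as "Little Endian, Os: Windows ..." to "Little Endian, Os".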

I then pasted the filename at the start of each line, like so:

File1 File1: PDF document, version 1.0
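One way to get lines in that shape without pasting by hand is to duplicate the name in front of the first colon. A minimal sketch, assuming the input is plain `file` output and that no file name contains a colon; the function name prefix_names is hypothetical.

```shell
#!/usr/bin/env bash
# Read `file` output on stdin and repeat the file name in front of it,
# so "File1: PDF document" becomes "File1 File1: PDF document".
prefix_names() {
    sed 's/^\([^:]*\):/\1 \1:/'
}
```

Usage would be along the lines of `file * | prefix_names > FILENAME`.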

And then I used a sed command to turn the description printed by the file command into an extension, like so:

sed 's|:||g;s|ASCII text|.txt|g;s|Audio file with ID3 version 2.3.0, contains|.mp3|g;s|Composite Document.*$|.docx|g;s|data||g;s|GIF.*$|.gif|g;s|HTML.*$|.html|g;s|ISO Media, Apple QuickTime movie|.mov|g;s|ISO Media, MPEG.*$|.mpeg|g;s|JPEG.*$|.jpg|g;s|Macromedia Flash Video|.flv|g;s|Microsoft ASF|.asf|g;s|Microsoft Excel 2007+|.xlsx|g;s|Microsoft PowerPoint 2007+|.pptx|g;s|Microsoft Word 2007+|.docx|g;s|OpenDocument Spreadsheet|.ods|g;s|OpenDocument Text|.odt|g;s|PDF.*$|.pdf|g;s|PNG.*$|.png|g;s|RAR archive data, v1d, os|.rar|g;s|Rich Text Format.*$|.rtf|g;s|TIFF image data, little-endian|.tiff|g;s|UTF-8 Unicode.*$|.txt|g;s|Zip.*$|.zip|g;s| \.|.|g' FILENAME

And then ran a while loop in order to change the file names like so:

while read -r src dst ; do mv -- "$src" "$dst" ; done < FILENAME

-------------

I understand that I was able to accomplish what I wanted in the end, but there HAS to be a more elegant solution to this problem, right? I have spent hours googling and reading man pages for possible candidates, but have not discovered anything that works the way I would like it to automatically.

Am I missing something? Searching for the wrong keywords? Or does automatic extension addition not appear to be a tool that is commonly-enough needed to be present in common libraries?

Thanks!

drmike · Nov 3, 2014

See nothing wrong with your approach.

Extensions are a human-friendly convenience, or should be these days.

Most software will open up whatever it is given, read the header info, and determine whether it is a known filetype, regardless of whatever extension the filename carries.

There is probably some BASH wizard out there or another tool to wrap. But why bother? What you created appears to conceptually work, so run with it.
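For what it's worth, one way such a wrapper could look is to ask file for a MIME type instead of the free-form description and map that through a small table. This is a rough sketch, not a tested tool: the function names ext_for_mime and add_extensions are made up, the table only covers types mentioned in this thread, and the MIME string file reports for a given format can differ between versions (legacy Office files in particular may come back as something other than application/msword and would then be skipped).

```shell
#!/usr/bin/env bash
# Map a MIME type (as printed by `file -b --mime-type`) to an extension.
# Illustrative table only; unknown types return non-zero and print nothing.
ext_for_mime() {
    case "$1" in
        text/plain)                echo txt ;;
        text/html)                 echo html ;;
        text/rtf|application/rtf)  echo rtf ;;
        application/pdf)           echo pdf ;;
        image/jpeg)                echo jpg ;;
        image/png)                 echo png ;;
        image/gif)                 echo gif ;;
        image/tiff)                echo tiff ;;
        audio/mpeg)                echo mp3 ;;
        video/quicktime)           echo mov ;;
        video/mp4)                 echo mp4 ;;
        video/x-flv)               echo flv ;;
        application/zip)           echo zip ;;
        application/x-rar|application/vnd.rar) echo rar ;;
        application/msword)        echo doc ;;
        application/vnd.openxmlformats-officedocument.wordprocessingml.document) echo docx ;;
        application/vnd.openxmlformats-officedocument.spreadsheetml.sheet) echo xlsx ;;
        application/vnd.openxmlformats-officedocument.presentationml.presentation) echo pptx ;;
        application/vnd.oasis.opendocument.text) echo odt ;;
        application/vnd.oasis.opendocument.spreadsheet) echo ods ;;
        *) return 1 ;;
    esac
}

# Dry run: print the mv command for each regular file in a directory
# whose type is in the table. Remove the `echo` to actually rename.
add_extensions() {
    # $1: directory holding the extensionless files (hypothetical argument)
    local f mime ext
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        mime=$(file -b --mime-type "$f")
        if ext=$(ext_for_mime "$mime"); then
            echo mv -n -- "$f" "$f.$ext"
        fi
    done
}
```

Printing the commands first makes it easy to eyeball the result for 5500 files before anything is renamed, and files with an unrecognised type are simply left alone.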

zzrok · Nov 3, 2014

You are the first person I have ever encountered with this problem. I'm guessing the market for such a tool is pretty small, so one probably hasn't been made.

BrilliantFocus · Nov 3, 2014

Thanks to @drmike and @zzrok for the quick follow-up.

Turns out that I have verification that I am unique (or at least have unique issues), after all.

Cheers!

drmike · Nov 3, 2014

Big question is how did this mess-up with mislabeled / unlabeled files originate?

BrilliantFocus · Nov 4, 2014

I was asked to consult on an e-learning platform instance that automatically hashes all file names upon upload. They experienced some "unusual" activity on the platform in the past few days, and all of the symbolic links from realfilename.ext to the hashes were erased. Well, at least, I couldn't find them, even though I know what a handful of the files are supposed to be called.

So, I was left with the files in hashed form, but no way to convert them back to their original filename.ext format. What makes matters worse is that their most recent backup was taken over a year ago. I was able to re-acquire the symbolic links for files more than a year old and, with some more bash tooling, restore those documents to their original filename.ext names in place of the hashed ones.

Applying the .ext was requested as part of the RFP, as the users of the system are not very computer-literate and the School wanted to restore as closely to original as possible.

Thanks again!
