OK, so there's always the programmatic approach. And for that I turn to Python. Let's see what the state of the art holds for us. (Hint: It's a bumpy ride.)
First, a couple of constraints. I develop on a Windows 10 machine, largely because that's the same computer all my other goodies are on. Yes, Linux might be better for development, but not for desktop use. (Cue old debate.)
Second, because I am indeed living in this century, I prefer to use Python 3, the version that broke backwards compatibility. It's been around for 8 years, so it is not exactly new.
What is metadata?
Metadata is simply a list of strings stored in an image file. These strings are carried along with the image, and can identify the author, camera characteristics, copyright information, and so on. There are two main metadata standards.
Exif, the Exchangeable image file format, has been around since 1995. It works with media such as WAV sound files, TIFF images, and JPG.
IPTC, defined by the International Press Telecommunications Council, is designed to standardise data for news gathering and journalism. There are two main parts, IPTC Core and IPTC Extension.
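To make "stored in an image file" a little more concrete, here is a minimal sketch of where these blocks usually live inside a JPEG. It assumes the common layout: Exif in an APP1 segment that begins with "Exif" and two null bytes, and IPTC inside a Photoshop resource block in an APP13 segment. The function name find_metadata_segments is my own invention, and the scan ignores edge cases such as padding bytes, so treat it as an illustration rather than a parser.

```python
import struct

def find_metadata_segments(data):
    """Return the kinds of metadata segments found in JPEG bytes.

    Walks the marker segments that precede the compressed image data and
    reports those that look like Exif (APP1) or Photoshop/IPTC (APP13).
    """
    if data[:2] != b'\xff\xd8':          # every JPEG starts with the SOI marker
        raise ValueError('not a JPEG stream')
    found = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:            # lost sync; give up rather than guess
            break
        marker = data[pos + 1]
        if marker == 0xDA:               # start of scan: image data follows
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack('>H', data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b'Exif\x00\x00'):
            found.append('Exif')
        elif marker == 0xED and payload.startswith(b'Photoshop 3.0\x00'):
            found.append('Photoshop/IPTC')
        pos += 2 + length
    return found
```

Calling it on the bytes of a file, for example find_metadata_segments(open(fn, 'rb').read()), gives a quick indication of which standards a given image actually carries.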
The remainder of this article will investigate methods of reading this data, in Python.
Take a PIL
When we think of images and Python we think first of the Python Imaging Library (PIL). Or, rather, its more current fork, named Pillow.
You can install this useful library using the simple mechanism of typing at the command line:
pip install Pillow

This works across platforms.
In fact, if pip fails, I usually give up right away. Not because there aren't other ways to install. But if pip fails, it is a good indication that the library is not well maintained. As we shall see.
In any case, here's my test code. It relies on the fact I have defined a path to a good test file.
fn = 'path/to/some/file/tester.jpg'

def test_PIL():
    # test PIL
    from PIL import Image
    from PIL.ExifTags import TAGS
    print( '\n<< Test of PIL >> \n' )
    img = Image.open(fn)
    info = img._getexif()
    for k, v in info.items():
        nice = TAGS.get(k, k)
        print( '%s (%s) = %s' % (nice, k, v) )
Interrogating the image for Exif information returns a dictionary. We can iterate over this to see all the meta-tags. In this case a useful TAGS dictionary converts the numeric keys to English equivalents. So, instead of wondering what tag 315 means, we know that it is "artist".
Unfortunately, with my test data I noticed problems. (My programme output is at the bottom of this post, for convenience.) First, the "copyright" field contained scrambled text. Second, the "comment" field did not show up at all. This could perhaps be because Pillow reports only Exif and not IPTC. In any case, it is insufficient and unreliable.
Some dead ends
At this point I did a web search and came up with several likely candidates. But they soon proved frustrating.
The library pyexiv2 is deprecated in favour of GExiv2, which is part of the GNOME stack and hence comes without a Windows installer or any easy way to compile.
IPTCInfo is recommended in certain blog articles, like this one, which is already outdated though only four years old.
The automatic install for IPTCInfo failed, so I checked and discovered that the last code update was back in 2011. As a single module, it was easy enough to install manually. But then I discovered that it was not at all Python 3 compatible. My attempts to change the code manually ended in failure.
A piece of the Piexif
Piexif has been tested across platforms and has no dependencies. The documentation is a bit terse, but helpfully indicates that the main "load" function returns several dictionaries, plus a byte dump that forms a thumbnail. I wrote my code to avoid this.
def test_piexif():
    # test Piexif
    import piexif
    print( '\n<< Test of Piexif >>' )
    data = piexif.load(fn)
    for key in ['Exif', '0th', '1st', 'GPS', 'Interop']:
        subdata = data[key]
        print( '\n%s:' % key )
        for k, v in subdata.items():
            print( '%s = %s' % (k, v) )
I really don't know what "0th" and "1st" mean as dictionary names, but it does appear that I get out all of the meta tags I expect. In particular, the tag marked 37510 contains my comment.
Like PIL, this library has a dictionary to map the obscure codes to names. I thought I should interrogate this.
def test_piexif_inspect():
    # display all metadata names
    import piexif
    print( '\n<< Inspect piexif >>\n' )
    info = piexif.ImageIFD.__dict__
    l = ['%s = %s' % (v, k) for k, v in info.items()]
    l.sort()
    for item in l:
        print(item)
The result is missing a mapping for tag 37510, the very one I want to use!
OK, not such a big deal in this case. But what if I start using other tags and have to decipher the codes manually? Rather annoying.
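One possible reason for the gap: in the piexif output, tag 37510 sits under the 'Exif' key, whereas the inspection above only walks ImageIFD. A lookup keyed by group as well as by number avoids that kind of miss. The sketch below is an illustration only, not piexif's own API. SAMPLE_TABLES, tag_name and describe are hypothetical names, and the table holds just a handful of tags taken from my test output. If your installed piexif exposes an equivalent per-group table (it appears to, as piexif.TAGS, but treat that as an assumption to verify), it could be passed in instead.

```python
# Hypothetical sample table: a few tags from the test output, grouped the same
# way piexif.load() groups its result ('0th', 'Exif', and so on).
SAMPLE_TABLES = {
    '0th': {
        271: {'name': 'Make'},
        272: {'name': 'Model'},
        315: {'name': 'Artist'},
        33432: {'name': 'Copyright'},
    },
    'Exif': {
        36867: {'name': 'DateTimeOriginal'},
        37510: {'name': 'UserComment'},
    },
}

def tag_name(tables, group, tag):
    """Return the readable name for a tag, or the number as text if unknown."""
    entry = tables.get(group, {}).get(tag)
    return entry['name'] if entry else str(tag)

def describe(data, tables):
    """Yield 'group name (number) = value' lines for a dict of per-group dicts."""
    for group, subdata in data.items():
        if not isinstance(subdata, dict):
            continue  # skip non-dict entries such as a raw thumbnail
        for tag, value in subdata.items():
            yield '%s %s (%s) = %s' % (group, tag_name(tables, group, tag), tag, value)
```

With this shape, unknown tags fall back to their number instead of raising an error, which is the same behaviour as the TAGS.get(k, k) idiom in the PIL test.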
You will also notice an odd encoding problem. Rather than contain my comment as is, the tag reads...
b'ASCII\x00\x00\x00MY TEST COMMENT!'

The b marks the value as a bytes object rather than a text string. The smart thing to do is decode this to proper text, but then we have the prefix cruft: an eight-byte character-code header that Exif places in front of the comment.
The following will do the trick, but I am again disliking the arbitrary nature of this decoding.
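def test_piexif_use():
    import piexif
    print( '\n<< Usage of piexif >>' )
    data = piexif.load(fn)
    exif = data['Exif']
    # the default must be bytes so that .decode() still works if the tag is absent
    comment = exif.get(37510, b'').decode('UTF-8')
    # drop the 8-byte character-code prefix (here b'ASCII\x00\x00\x00')
    comment = comment[8:]
    print( comment )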
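A slightly less arbitrary approach is to read the eight-byte prefix and pick the decoder from it. The following is a minimal sketch under stated assumptions: decode_user_comment is a hypothetical helper of mine, the JIS and UNICODE encodings are guesses at what writers commonly use (the byte order for UNICODE in particular varies), and anything unrecognised falls back to UTF-8.

```python
# Character-code prefixes that can lead an Exif UserComment value.
# The encodings chosen for JIS and UNICODE are assumptions, not guarantees.
_CHARSETS = {
    b'ASCII\x00\x00\x00': 'ascii',
    b'JIS\x00\x00\x00\x00\x00': 'shift_jis',
    b'UNICODE\x00': 'utf-16-be',
}

def decode_user_comment(raw, fallback='utf-8'):
    """Decode raw UserComment bytes to text using the 8-byte prefix."""
    if len(raw) < 8:
        # Too short to carry a prefix; decode whatever is there.
        return raw.decode(fallback, errors='replace')
    code, body = raw[:8], raw[8:]
    encoding = _CHARSETS.get(code, fallback)
    # Undecodable bytes are replaced rather than raising, and padding is trimmed.
    return body.decode(encoding, errors='replace').rstrip('\x00')
```

In the usage test this would replace the decode-and-slice pair with a single call such as decode_user_comment(exif.get(37510, b'')).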
Try exifread
Finally, I stumbled upon the library exifread.
Here again is my test script. As before, I skip past some tags that are going to be long boring byte strings. And I progress in sorted order, just for convenience.
def test_exifread():
    import exifread
    print( '\n<< Test of exifread >>\n' )
    with open(fn, 'rb') as f:
        exif = exifread.process_file(f)
    for k in sorted(exif.keys()):
        if k not in ['JPEGThumbnail', 'TIFFThumbnail', 'Filename', 'EXIF MakerNote']:
            print( '%s = %s' % (k, exif[k]) )
The result? All of the tags I expect are present, in human-readable encoding. It seems that this obscure project is the winner. Some of the more popular libraries need to do some catching up!
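The keys that come back carry a group prefix ("EXIF", "Image", "Thumbnail"), so it can be handy to split them into nested dictionaries. Here is a minimal sketch, assuming only that the result behaves like a mapping from such keys to printable values. The name group_tags is hypothetical, and a plain dict stands in for the library's return value.

```python
from collections import defaultdict

# Keys skipped in the test script because they hold long byte strings.
SKIP = ('JPEGThumbnail', 'TIFFThumbnail', 'Filename', 'EXIF MakerNote')

def group_tags(tags, skip=SKIP):
    """Split 'Group Name' keys into {group: {name: text}}."""
    grouped = defaultdict(dict)
    for key in sorted(tags):
        if key in skip:
            continue
        group, _, name = key.partition(' ')
        if not name:
            # No prefix present: file the tag under an empty group name.
            group, name = '', group
        grouped[group][name] = str(tags[key])
    return dict(grouped)
```

For example, group_tags(exif)['EXIF'].get('UserComment') would fetch the comment without any manual decoding.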
Though, one big limitation exists even here. This library does not support editing the tags. For that, you will need to use one of the previous choices and work around the cruft.
Nonetheless, I hope this article saves you the time I unfortunately spent.
Output
Here follows my test output, for reference:
<< Test of PIL >>

ExifVersion (36864) = b'0230'
ShutterSpeedValue (37377) = (9965784, 1000000)
ExifImageWidth (40962) = 600
DateTimeOriginal (36867) = 2011:06:09 01:20:59
DateTimeDigitized (36868) = 2011:06:09 01:20:59
MaxApertureValue (37381) = (0, 256)
SceneCaptureType (41990) = 0
MeteringMode (37383) = 5
LightSource (37384) = 0
Flash (37385) = 24
FocalLength (37386) = (77, 1)
CFAPattern (41730) = b'\x02\x00\x02\x00\x00\x01\x01\x02'
Make (271) = OLYMPUS IMAGING CORP.
Model (272) = E-P1
Orientation (274) = 1
ExifImageHeight (40963) = 600
Contrast (41992) = 0
Copyright (33432) = Robin Parmar mar
ExposureBiasValue (37380) = (-3, 10)
XResolution (282) = (720000, 10000)
YResolution (283) = (720000, 10000)
ExposureTime (33434) = (1, 1000)
DigitalZoomRatio (41988) = (100, 100)
FocalLengthIn35mmFilm (41989) = 116
ExposureProgram (34850) = 3
ColorSpace (40961) = 65535
BodySerialNumber (42033) = H52502123
ResolutionUnit (296) = 2
WhiteBalance (41987) = 0
GainControl (41991) = 1
Software (305) = Adobe Photoshop CS5 Windows
DateTime (306) = 2011:08:22 21:39:05
LensMake (42035) = Pentax
LensModel (42036) = smc Pentax F A77 Limited
Saturation (41993) = 0
Artist (315) = Robin Parmar
Sharpness (41994) = 0
FileSource (41728) = b'\x03'
CustomRendered (41985) = 0
ExposureMode (41986) = 1
ExifOffset (34665) = 268
ISOSpeedRatings (34855) = 200

<< Test of Piexif >>

Exif:
36864 = b'0230'
37377 = (9965784, 1000000)
40962 = 600
36867 = b'2011:06:09 01:20:59'
36868 = b'2011:06:09 01:20:59'
37381 = (0, 256)
37510 = b'ASCII\x00\x00\x00MY TEST COMMENT!'
37383 = 5
37384 = 0
37385 = 24
37386 = (77, 1)
41988 = (100, 100)
41986 = 1
40963 = 600
37380 = (-3, 10)
41730 = b'\x02\x00\x02\x00\x00\x01\x01\x02'
33434 = (1, 1000)
41728 = b'\x03'
41989 = 116
34850 = 3
42033 = b'H52502123'
40961 = 65535
41990 = 0
34855 = 200
41987 = 0
41991 = 1
41992 = 0
42035 = b'Pentax'
42036 = b'smc Pentax F A77 Limited'
41993 = 0
41994 = 0
41985 = 0

0th:
283 = (720000, 10000)
296 = 2
34665 = 11444
306 = b'2011:08:22 21:39:05'
270 = b''
271 = b'OLYMPUS IMAGING CORP.'
272 = b'E-P1'
305 = b'Adobe Photoshop CS5 Windows'
274 = 1
33432 = b'Robin Parmar'
282 = (720000, 10000)
315 = b'Robin Parmar'

1st:
513 = 878
514 = 10416
259 = 6
296 = 2
282 = (72, 1)
283 = (72, 1)

GPS:

Interop:

<< Test of exifread >>

EXIF BodySerialNumber = H52502123
EXIF CVAPattern = [2, 0, 2, 0, 0, 1, 1, 2]
EXIF ColorSpace = Uncalibrated
EXIF Contrast = Normal
EXIF CustomRendered = Normal
EXIF DateTimeDigitized = 2011:06:09 01:20:59
EXIF DateTimeOriginal = 2011:06:09 01:20:59
EXIF DigitalZoomRatio = 1
EXIF ExifImageLength = 600
EXIF ExifImageWidth = 600
EXIF ExifVersion = 0230
EXIF ExposureBiasValue = -3/10
EXIF ExposureMode = Manual Exposure
EXIF ExposureProgram = Aperture Priority
EXIF ExposureTime = 1/1000
EXIF FileSource = Digital Camera
EXIF Flash = Flash did not fire, auto mode
EXIF FocalLength = 77
EXIF FocalLengthIn35mmFilm = 116
EXIF GainControl = Low gain up
EXIF ISOSpeedRatings = 200
EXIF LensMake = Pentax
EXIF LensModel = smc Pentax F A77 Limited
EXIF LightSource = Unknown
EXIF MaxApertureValue = 0
EXIF MeteringMode = Pattern
EXIF Saturation = Normal
EXIF SceneCaptureType = Standard
EXIF Sharpness = Normal
EXIF ShutterSpeedValue = 1245723/125000
EXIF UserComment = MY TEST COMMENT!
EXIF WhiteBalance = Auto
Image Artist = Robin Parmar
Image Copyright = Robin Parmar
Image DateTime = 2011:08:22 21:39:05
Image ExifOffset = 11444
Image ImageDescription =
Image Make = OLYMPUS IMAGING CORP.
Image Model = E-P1
Image Orientation = Horizontal (normal)
Image ResolutionUnit = Pixels/Inch
Image Software = Adobe Photoshop CS5 Windows
Image XResolution = 72
Image YResolution = 72
Thumbnail Compression = JPEG (old-style)
Thumbnail JPEGInterchangeFormat = 878
Thumbnail JPEGInterchangeFormatLength = 10416
Thumbnail ResolutionUnit = Pixels/Inch
Thumbnail XResolution = 72
Thumbnail YResolution = 72
Thanks for writing this! When I needed this I relied on ImageMagick's "identify" command-line tool - I call it with subprocess from Python and parse its output. It's definitely surprising how painful basic image information is with Python in the modern era.
Ah yes, a good method as well. I was going to add an example like that using ExifTool, the Perl library.
My next article will be quite positive. I think that the ecosystem around Python leaves me rather spoiled, so when I find a gap, I am amazed.
Code has been added here for your convenience:
https://gist.github.com/robinparmar/2e19037e728b6783769598c9e62f4f3b
You can find GExiv2 for Windows in this package:
https://wiki.gnome.org/action/show/Projects/PyGObject
I've not tried it, only the old version (pyexiv2).
Thanks for that info, which I am sure will help some readers.
"The result is missing a mapping for tag 37510, the very one I want to use!"
Checking _exif.py in piexif-1.0.12, that field is defined:
37510: {'name': 'UserComment', 'type': TYPES.Ascii},