In this Twitter thread, Kristina Spurgin describes a pattern of errors in MARC records that I’ve been running into lately as well: a lowercase L in place of a one in dates, like “l905”. I’d also been assuming it was an artifact of OCR.
Her coworker explained that older typewriters often had no key for the digit one, and even after they did, typists had strong muscle memory for typing a lowercase l (ell) instead of 1.
Now, if I can only figure out why this title started with “0il”…
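If you want to hunt for these in your own data, here is a minimal sketch. It assumes you only run it over date-bearing subfields (such as 260 $c or 264 $c), because "l" is perfectly legitimate in ordinary text. The function name is hypothetical, and the rule that any four-character run of digits and "l" (containing at least one digit) is a mistyped year is a heuristic, not a standard.

```python
import re

# Four-character tokens made only of digits and lowercase "l", e.g. "l905".
YEARISH = re.compile(r"\b[0-9l]{4}\b")


def fix_ell_years(text: str) -> str:
    """Replace typewriter-style "l" with "1" inside year-like tokens."""

    def repl(match: re.Match) -> str:
        token = match.group(0)
        # Only touch tokens that mix "l" with real digits.
        if "l" in token and any(ch.isdigit() for ch in token):
            return token.replace("l", "1")
        return token

    return YEARISH.sub(repl, text)


print(fix_ell_years("New York : Macmillan, l905."))
```

Anything it changes is still worth eyeballing before saving, especially in older records where other OCR-style substitutions (like that "0il") may be lurking nearby.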
It turns out there is a field specifically for imprint statements for films, with dedicated subfields for producing company and releasing company. And it goes waaaay back, to AACR1, pre-1976.
Before 1976, AACR apparently instructed catalogers to create an imprint statement for films that is totally different from the one for books. And that went into MARC 261. In addition to subfields for producing and releasing company, it also has one for “contractual producer,” a category I’m not sure I understand (since I haven’t yet gotten my hands on a pre-1976 copy of AACR). The subfields are also arranged differently from those in 260: for example, the place of production, release, etc. is 261 $f, as opposed to 260 $a. Additionally, the subfields for date and place are repeatable, in case multiple producing companies, releasing companies, and contractual producers are recorded (which makes translating that data a total nightmare). Here’s the actual field definition:
El Quijote universal : 150 traducciones en el IV Centenario de la muerte de Miguel de Cervantes / editado por José Manuel Lucía Megías. (OCLC #985966961)
The OCLC Bib Formats documentation says for field 041:
For works in multiple languages, the codes for the languages are recorded in the order of their predominance. If predominance cannot be determined, record the codes in English alphabetical order. If the code mul (Multiple languages) is recorded in Lang (meaning the item is multilingual with no predominant language), the code for the title (or the first title, if there are more than one) and the code mul are recorded. Alternatively, any number of specific language codes may be recorded in repeating occurrences of subfield ǂa.
The first option is used for this book:
041 1_ ǂa spa ǂa mul ǂh spa
So where to draw the line? How many languages are too many to list? This decision may vary by institution; for example, the National Library of Medicine used the mul code for titles in more than six languages. I don’t know whether my library has a limit on how many languages we will list, but with translations in 150 languages, mul seems far more reasonable!
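As a rough sketch of that decision, the functions below pick the 041 $a codes given a list of language codes already sorted by predominance and a local cutoff. The names and the `max_listed` parameter are hypothetical; the default of six just mirrors the NLM practice mentioned above, and your institution's policy may differ. The first indicator is "1" when the item is or includes a translation, as in the El Quijote record.

```python
def language_codes_for_041(codes, title_code, max_listed=6):
    """Return the 041 $a codes for a multilingual item.

    codes: MARC language codes, already ordered by predominance.
    title_code: code for the language of the (first) title.
    max_listed: hypothetical local cutoff; above it, fall back to "mul".
    """
    # Drop duplicates while keeping the predominance order.
    unique = list(dict.fromkeys(codes))
    if len(unique) <= max_listed:
        return unique
    # Too many to list: record the title language plus "mul".
    return [title_code, "mul"]


def format_041(a_codes, original=None, translation=True):
    """Render a 041 line in the same display style used in this post."""
    ind1 = "1" if translation else "0"
    parts = [f"ǂa {code}" for code in a_codes]
    if original:
        parts.append(f"ǂh {original}")
    return f"041 {ind1}_ " + " ".join(parts)


codes = ["spa", "eng", "fre", "ger", "ita", "por", "cat"]
print(format_041(language_codes_for_041(codes, "spa"), original="spa"))
```

With seven languages and a cutoff of six, this prints the same 041 shown for the El Quijote record; a list of three languages would be spelled out in full instead.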
While cleaning up some records today, I noticed a few videos whose titles appeared in the catalog as:
I wondered if this was some sort of intense art film with no title, or if someone was intentionally or unintentionally messing with catalogers by giving their film a title more often seen as a GMD. (Was this a film about the history of the GMD??)
I clicked through and found that it was a perfectly normal film, with the title:
$100 a day : justice and reparation in California's
And this had somehow ended up in the MARC record as:
245 00 ǂ1 00 a day : justice and reparation in California's
The dollar sign is a particularly dangerous character to have in your data if you’re not careful with your processing. In many programming and scripting languages, such as shell and Perl, it marks a variable, so “$100” may behave unpredictably or be silently mangled. It’s also a common convention for representing the subfield delimiter in a MARC record, so:
245 00 $a $100 a day
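might have looked like it contained an empty subfield $a, and been “cleaned up” in text to form:
245 00 $100 a day
which a conversion tool would then read as a subfield $1 containing “00 a day”.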
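Here is a minimal sketch of how that failure can happen. It assumes a naive converter that splits mnemonic text on "$" and a hypothetical "cleanup" step that drops empty subfields; neither is any particular tool's real behavior. In an actual ISO 2709 record the subfield delimiter is the control character hex 1F, so a literal "$" in the data is harmless there.

```python
SUBFIELD_DELIMITER = "\x1f"  # ISO 2709 / MARC21 subfield delimiter (hex 1F)


def parse_subfields(data, delimiter):
    """Split field data into (code, value) pairs on the given delimiter."""
    pairs = []
    for chunk in data.split(delimiter)[1:]:
        # The character right after each delimiter is taken as the code.
        pairs.append((chunk[:1], chunk[1:].strip()))
    return pairs


def drop_empty(pairs):
    """Hypothetical cleanup step: discard subfields with no value."""
    return [(code, value) for code, value in pairs if value]


# "$" used as both the delimiter and a data character: $100 becomes $1 + "00 a day".
print(drop_empty(parse_subfields("$a $100 a day", "$")))

# With the real delimiter, the dollar sign survives as data.
print(parse_subfields(SUBFIELD_DELIMITER + "a$100 a day", SUBFIELD_DELIMITER))
```

The first call yields a lone subfield 1 holding "00 a day", just like the broken 245; the second keeps "$100 a day" intact in subfield a. Tools that work in mnemonic text generally need some escape for a literal dollar sign for exactly this reason.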
Did you know that ISO 2709 (the standard of which MARC21 is an instance) is fairly general? It allows for:
tags that include letters as well as numbers (though they must still be three characters long, like MARC21’s numeric tags)
up to nine indicators (MARC21 always has two)
subfield codes of length up to nine (MARC21 always sets this length to two, because it counts the delimiter plus a one-character code, as in “ǂb”)
The indicator count and subfield code length are encoded in every MARC record’s leader in positions 10 and 11; note that the MARC21 specification says these should always be “2” and “2”.
I revisited this standard recently to find out why some vendor records were causing trouble; their leaders had position 11 set to “0”, indicating that the subfield code length was zero:
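To see why those two leader positions matter, here is a small sketch of how a generic ISO 2709 reader might use them. The function names are hypothetical, and treating any code length below 2 as unusable is this sketch's own choice for MARC21 data, since MARC21 subfields always carry a one-character code after the delimiter.

```python
SUBFIELD_DELIMITER = "\x1f"  # ISO 2709 subfield delimiter (hex 1F)


def leader_params(leader: str) -> tuple[int, int]:
    """Read indicator count (leader/10) and subfield code length (leader/11)."""
    if len(leader) != 24:
        raise ValueError(f"leader must be 24 characters, got {len(leader)}")
    return int(leader[10]), int(leader[11])


def split_subfields(data: str, code_length: int) -> list[tuple[str, str]]:
    """Split field data into (code, value) pairs using the leader's code length.

    The length counts the delimiter itself, so MARC21's "2" means one
    delimiter followed by a one-character code such as "a".
    """
    if code_length < 2:
        # MARC21 subfields always need a code after the delimiter,
        # so this sketch refuses to guess.
        raise ValueError(f"subfield code length {code_length} leaves no room for a code")
    n = code_length - 1  # number of code characters after the delimiter
    return [(chunk[:n], chunk[n:]) for chunk in data.split(SUBFIELD_DELIMITER)[1:]]


field = "\x1fa$100 a day :\x1fbjustice and reparation"
print(leader_params("01710nam a2200385 a 4500"))  # a well-formed MARC21 leader
print(split_subfields(field, 2))
```

A reader that trusts the leader of a record with position 11 set to “0” has no way to find the subfield codes, which is roughly the kind of trouble a “0” there can cause.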
01710nam a2000385 a 4500
though the record was full of subfield codes of length 2 (like ǂa).
This was easy enough to fix in batch (that leader position should always be “2”, so I just overwrote what was there), and I’ve contacted the vendor to let them know about the strangeness in their records. Hopefully they’ll be fixed on the vendor’s site before they cause any more trouble!
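A batch fix along those lines might look like the sketch below. It assumes a file of raw ISO 2709 records concatenated with nothing between them (each ending in the record terminator, hex 1D); the function names and file paths are hypothetical. Because it overwrites two bytes in place, the record length in leader positions 0 to 4 stays correct.

```python
RECORD_TERMINATOR = b"\x1d"  # ISO 2709 record terminator (hex 1D)


def fix_leader_bytes(record: bytes) -> bytes:
    """Force leader/10 and leader/11 to "2" in one raw MARC21 record."""
    if len(record) < 24:
        raise ValueError("record is shorter than a 24-byte leader")
    # Same number of bytes in and out, so the record length is unchanged.
    return record[:10] + b"22" + record[12:]


def fix_file(in_path: str, out_path: str) -> int:
    """Rewrite every record in in_path to out_path; return the record count."""
    with open(in_path, "rb") as f:
        data = f.read()
    # Splitting leaves an empty chunk after the final terminator; skip it.
    records = [r for r in data.split(RECORD_TERMINATOR) if r]
    with open(out_path, "wb") as out:
        for record in records:
            out.write(fix_leader_bytes(record) + RECORD_TERMINATOR)
    return len(records)


print(fix_leader_bytes(b"01710nam a2000385 a 4500")[:24])
```

Overwriting both positions, rather than only 11, is a deliberate shortcut: MARC21 requires “2” in each, so there is nothing worth preserving there.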
Oxidative stress and biomaterials / edited by Thomas Dziubla and D. Allan Butterfield. (OCLC #938383040)
We found brief copy for this title in OCLC, and upgraded it to RDA. One change we made was converting the 260 field to a pair of 264s.
The 260 field may be used to record places, dates, and parties responsible for the publication, distribution, and manufacture of the title, as well as copyright information. This field can still be used in RDA records, though several RDA elements share the same subfields; the RDA Toolkit’s MARC Bibliographic to RDA Mapping says that ǂa and ǂb are for publication and distribution information, and ǂe and ǂf are for production and manufacture information.
The 264 field may also be used to record the data above, but in a more granular way: the second indicator specifies whether the field records production (0), publication (1), distribution (2), or manufacture (3), or a copyright date (4).
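A rough first pass at that conversion could be scripted as below, as a sketch only. It assumes 260 ǂa/ǂb are publication data (a script cannot tell a publisher from a distributor), treats a ǂc starting with “c” or “©” as a copyright date, and, as one common choice, supplies the bracketed year as the publication date. The function name and the sample data are hypothetical, and the output still needs checking against the piece in hand.

```python
import re

# "c2016" or "©2016" at the start of 260 $c marks a copyright date.
COPYRIGHT_DATE = re.compile(r"^(?:c|©)\s*(\d{4})")

# 260 manufacture subfields (place, name, date) map onto 264 _3 $a, $b, $c.
MANUFACTURE_MAP = {"e": "a", "f": "b", "g": "c"}


def convert_260(subfields):
    """Turn 260 (code, value) pairs into a list of (tag, indicators, subfields)."""
    publication, manufacture = [], []
    copyright_year = None
    for code, value in subfields:
        if code in ("a", "b"):
            publication.append((code, value))
        elif code == "c":
            match = COPYRIGHT_DATE.match(value.strip())
            if match:
                copyright_year = match.group(1)
                # Publication date inferred from the copyright date.
                publication.append(("c", f"[{copyright_year}]"))
            else:
                publication.append(("c", value))
        elif code in MANUFACTURE_MAP:
            # 260 manufacture data is wrapped in parentheses; drop them.
            manufacture.append((MANUFACTURE_MAP[code], value.strip("() ")))
    fields = [("264", " 1", publication)]
    if manufacture:
        fields.append(("264", " 3", manufacture))
    if copyright_year:
        fields.append(("264", " 4", [("c", f"©{copyright_year}")]))
    return fields


# Hypothetical brief-record 260.
for field in convert_260([("a", "London :"), ("b", "Example Press,"), ("c", "c2016.")]):
    print(field)
```

Production (0) and distribution (2) fields never come out of this sketch; deciding those is exactly the part that needs a human looking at the item.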
While upgrading records to RDA, I always convert existing 260s to 264s to record the most specific information I can while I have the piece in hand, using multiple fields if needed: