XML, making everything just a little bit harder.

Published at
7/8/2021
Categories
python
xml
standards
Author
anthonyb

So here's a fun exercise in XML, standards, and data catalogs.

I'm working on ingesting a bunch of records from a variety of data catalogs, of a variety of types. The one I'm looking at now uses OAI-PMH. Fortunately there's a nice little Python library called Sickle that abstracts most of the pain away. Until you're dealing with non-Dublin Core datasets.

Sickle does allow you to plug in a parser for other types (oh hi XPath, I haven't missed you at all).

The dataset I'm using uses the ANZLIC profile for OAI-PMH (fun side note: the official repo for the info had a broken link to the standard, because bitrot comes even for ISO committees). Its catchier name is "AS/NZS ISO 19115.1:2015 Metadata".

So I need to write a custom parser for this.

Then I hit this, when looking for keywords in a data record.

<gmd:descriptiveKeywords>
  <gmd:MD_Keywords>
    <gmd:keyword>
      <gco:CharacterString>040104</gco:CharacterString>
    </gmd:keyword>
    <gmd:thesaurusName>
      <gmd:CI_Citation>
        <gmd:title>
          <gco:CharacterString>Australian and New Zealand Standard Research Classification</gco:CharacterString>
        </gmd:title>
        <gmd:alternateTitle>
          <gco:CharacterString>ANZSRC</gco:CharacterString>
        </gmd:alternateTitle>
        <gmd:date>
          <gmd:CI_Date>
            <gmd:date>
              <gco:Date>2008</gco:Date>
            </gmd:date>
            <gmd:dateType>
              <gmd:CI_DateTypeCode codeList="http://asdd.ga.gov.au/asdd/profileinfo/gmxCodelists.xml#CI_DateTypeCode" codeListValue="creation">creation</gmd:CI_DateTypeCode>
            </gmd:dateType>
          </gmd:CI_Date>
        </gmd:date>
      </gmd:CI_Citation>
    </gmd:thesaurusName>
  </gmd:MD_Keywords>
</gmd:descriptiveKeywords>

What on earth is this nonsense? We can ignore the date type code bit (fun side note: the asdd.ga.gov.au domain no longer exists. Lucky it's not important; I reckon I can interpret "2008" as a date without an XSD file).
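For what it's worth, pulling the keyword and its thesaurus out of a fragment like the one above can be sketched with nothing but the standard library. This is a rough sketch, not the parser I ended up plugging into Sickle. The namespace URIs are an assumption (the usual ISO 19139 `gmd` and `gco` namespaces), since the fragment doesn't show its `xmlns` declarations, and `extract_keywords` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs: the fragment in the post does not show its xmlns
# declarations, so these are the commonly used ISO 19139 ones.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Trimmed copy of the record fragment, with the assumed namespaces declared.
SAMPLE = """
<gmd:descriptiveKeywords xmlns:gmd="http://www.isotc211.org/2005/gmd"
                         xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:MD_Keywords>
    <gmd:keyword>
      <gco:CharacterString>040104</gco:CharacterString>
    </gmd:keyword>
    <gmd:thesaurusName>
      <gmd:CI_Citation>
        <gmd:title>
          <gco:CharacterString>Australian and New Zealand Standard Research Classification</gco:CharacterString>
        </gmd:title>
        <gmd:alternateTitle>
          <gco:CharacterString>ANZSRC</gco:CharacterString>
        </gmd:alternateTitle>
      </gmd:CI_Citation>
    </gmd:thesaurusName>
  </gmd:MD_Keywords>
</gmd:descriptiveKeywords>
"""


def extract_keywords(element):
    """Return (thesaurus short name, keyword) pairs found under `element`."""
    results = []
    # iter() also matches `element` itself, so this works on a whole record too.
    for block in element.iter("{%s}MD_Keywords" % NS["gmd"]):
        thesaurus = block.findtext(
            "gmd:thesaurusName/gmd:CI_Citation/gmd:alternateTitle/gco:CharacterString",
            default=None,
            namespaces=NS,
        )
        for kw in block.findall("gmd:keyword/gco:CharacterString", NS):
            results.append((thesaurus, (kw.text or "").strip()))
    return results


print(extract_keywords(ET.fromstring(SAMPLE)))
```

Keeping the thesaurus name next to each keyword matters here, because the bare code means nothing until you know which classification it belongs to.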

So obviously 040104 is a reference to something.

A bunch of googling and staring at the wall finally led me to the Australian Bureau of Statistics, in particular to standard 1297.0, "Australian and New Zealand Standard Research Classification (2008)", marked there as the current version.

From there, you can go to the downloads tab and find the table that maps the 2008 codes to the 2020 codes. In a 1.5 MB Excel file. So I exported the relevant bits of the 2008 codes, threw them into a small SQLite db, and ended up with 040104,Climate Change Processes.
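The lookup side of that is tiny. Here's a minimal sketch using an in-memory database so it runs anywhere; the table name `anzsrc_2008`, the column names, and the `keyword_label` helper are all hypothetical, and the only row is the one mentioned above.

```python
import sqlite3

# In-memory database for illustration; a real one would be loaded from the
# rows exported out of the spreadsheet.
conn = sqlite3.connect(":memory:")

# Codes are stored as TEXT so the leading zero in "040104" survives.
conn.execute(
    "CREATE TABLE anzsrc_2008 (code TEXT PRIMARY KEY, name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO anzsrc_2008 (code, name) VALUES (?, ?)",
    [("040104", "Climate Change Processes")],
)
conn.commit()


def keyword_label(conn, code):
    """Return the readable name for a code, or the code itself if unknown."""
    row = conn.execute(
        "SELECT name FROM anzsrc_2008 WHERE code = ?", (code,)
    ).fetchone()
    return row[0] if row else code


print(keyword_label(conn, "040104"))
```

Falling back to the raw code for anything not in the table means an unmapped keyword still gets ingested rather than silently dropped.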

So all that XML up above? It could have just been

<keyword>Climate Change Processes</keyword>

But nooooo.
