arc2ogm - Initial Thoughts

The Struggle

One of the things I’ve struggled with as a hobbyist mapper is finding and keeping track of GIS data sources on the internet. Because ESRI is such a giant, many state GIS departments end up using ArcGIS. There are a large number of ArcGIS REST Services out there, but they are not well advertised, and the interface for browsing them isn’t very… mappy.

The Inspiration

I recently did some work for NYU on their Spatial Data Repository, which is built on GeoBlacklight. GeoBlacklight is backed by Solr and ingests documents that follow the OpenGeoMetadata Aardvark Schema. The documents describe mapping resources of various kinds.

A minimal Aardvark file looks something like this:

{
  "id": "foo-trails",
  "dct_title_s": "Trails",
  "gbl_resourceClass_sm": [
    "Web services"
  ],
  "dct_accessRights_s": "Public",
  "gbl_mdModified_dt": "2025-07-09T00:57:51Z",
  "gbl_mdVersion_s": "Aardvark",
  "schema_provider_s": "Testing",
  "dct_references_s": "{\"urn:x-esri:serviceType:ArcGIS#FeatureLayer\":\"https://services6.arcgis.com/9QlSLDqa0P1cHLhu/ArcGIS/rest/services/WRD_WMA_Public/FeatureServer/13\"}",
  "locn_geometry": "ENVELOPE(-85.41306473369147, -81.12480564035837, 34.975019430744, 30.786567131737005)",
  "gbl_resourceType_sm": [
    "Line data"
  ]
}

Ingesting that file produces GeoBlacklight search results that look like this:

geoblacklight-results.png

And a record page that looks like this:

geoblacklight-record.png

It’s a nice visual way to browse and filter map data in order to find what you’re looking for. Imagine a GeoBlacklight server that cataloged all the public ArcGIS REST Servers out there.

The Idea

My goal is to create a tool (tentatively called arc2ogm) that you can point at a collection of ArcGIS REST Service URLs, and it will crawl them and turn each compatible resource into an Aardvark document. Then I’d stand up a GeoBlacklight instance and ingest all those documents for others to browse and search.

The cost of the server is likely not something I can justify right now, but at least the tool and the resulting data could be useful to others.
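
To make the crawl step concrete, here is a rough sketch of how walking a services directory might look. It relies on the fact that ArcGIS REST endpoints return JSON when `f=json` is appended, with `folders` and `services` at the catalog level and `layers` at the service level. The function names and the example.com URLs are made up for illustration, and this only handles Feature Services.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Fetch an ArcGIS REST endpoint as JSON (performs a network request)."""
    with urlopen(f"{url}?f=json", timeout=30) as resp:
        return json.load(resp)


def iter_feature_layers(root_url, fetch=fetch_json):
    """Yield (layer_url, layer_info) for every FeatureServer layer under root_url.

    `fetch` is injectable so the walk can be exercised without a network.
    """
    root_url = root_url.rstrip("/")
    catalog = fetch(root_url)

    # Folders are listed by name; each one is a catalog of its own.
    for folder in catalog.get("folders", []):
        yield from iter_feature_layers(f"{root_url}/{folder}", fetch)

    for service in catalog.get("services", []):
        if service.get("type") != "FeatureServer":
            continue  # other service types are out of scope for this sketch
        # Inside a folder, service names come back as "Folder/Name".
        name = service["name"].split("/")[-1]
        service_url = f"{root_url}/{name}/FeatureServer"
        for layer in fetch(service_url).get("layers", []):
            yield f"{service_url}/{layer['id']}", layer
```

Each yielded pair would then be handed to whatever builds the Aardvark document for that layer.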

The Minimal Viable Project

The Aardvark schema only has a few required fields, though a few more make the record much more usable in GeoBlacklight:

Required Fields

Title

This is the name of the resource, which should be pretty straightforward. One challenge is that layers are often named somewhat generically because their publishers only think about them in the context of a single county or state.

Resource Class

This field will likely be hard-coded because we’re always talking to a “web service.” There might be some subtlety here moving forward though.

Access Rights

This will likely always be Public since we’re crawling publicly available resources.

ID

This is meant to be a unique identifier, so I will need a strategy for generating it. I suspect the value should be derived from the layer’s URL (assuming the URL never changes), but this requires further thought.
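
One possible strategy, sketched here purely as an assumption, is to hash a normalized form of the layer URL. The `arc2ogm` prefix, the 16-character truncation and the normalization rules are all arbitrary choices for illustration.

```python
import hashlib
from urllib.parse import urlsplit


def make_id(layer_url, prefix="arc2ogm"):
    """Derive a stable identifier from a layer URL.

    The scheme is dropped, the host is lowercased and a trailing slash is
    removed, so trivially different spellings of a URL share one ID.
    """
    parts = urlsplit(layer_url.strip())
    normalized = f"{parts.netloc.lower()}{parts.path.rstrip('/')}"
    digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:16]
    return f"{prefix}-{digest}"
```

The obvious weakness is the one already noted: if a publisher moves or renames a service, the ID changes with it.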

Modified

This will just be a timestamp for when the URL was crawled.
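
The sample record above uses a UTC timestamp like `2025-07-09T00:57:51Z`. A small helper for producing that format could look like this (the function name is hypothetical):

```python
from datetime import datetime, timezone


def crawl_timestamp(now=None):
    """Format a crawl time as UTC with a trailing Z.

    `now` should be a timezone-aware datetime; it defaults to the current time.
    """
    now = now or datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```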

Metadata Version

This will be hardcoded to Aardvark.
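
Since the required list is so short, a crawler could cheaply check each generated record before writing it out. This is only a sketch: the field names are taken from the sample record above, and the helper name is made up.

```python
# Aardvark field names for the six required fields described above.
REQUIRED_FIELDS = (
    "id",
    "dct_title_s",
    "gbl_resourceClass_sm",
    "dct_accessRights_s",
    "gbl_mdModified_dt",
    "gbl_mdVersion_s",
)


def missing_required(record):
    """Return the required field names that are absent or empty in record."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "", [])]
```

An empty list back means the record has everything it strictly needs, even if it is not yet very useful in GeoBlacklight.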

Helpful Fields

Creator, Publisher, Provider

One of these fields will likely be necessary to identify (and make filterable) the source of the data. It could also identify the title or owner of the ArcGIS REST Server being crawled.

Resource Type

There’s a canned collection of values for this field, and for the most part, it will just map to whether it contains point, line or polygon data. As I move beyond Feature Services, I assume additional mappings will be necessary.
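
For Feature Service layers, that mapping could be a simple lookup on the `geometryType` value in the layer’s JSON. The sketch below is an assumption about how I might wire it up; in particular, folding multipoint layers into "Point data" is my own guess, and unknown geometry types simply produce no value.

```python
# ArcGIS geometryType values mapped to Aardvark resource type values.
GEOMETRY_TO_RESOURCE_TYPE = {
    "esriGeometryPoint": "Point data",
    "esriGeometryMultipoint": "Point data",  # assumption: treat multipoint as point
    "esriGeometryPolyline": "Line data",
    "esriGeometryPolygon": "Polygon data",
}


def resource_types(layer_info):
    """Return a list of resource types for a layer, or an empty list if unknown."""
    value = GEOMETRY_TO_RESOURCE_TYPE.get(layer_info.get("geometryType"))
    return [value] if value else []
```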

References

This field is a collection of URLs related to the data, such as where you might download or access it. There are four ESRI-specific reference types: DynamicMapLayer, FeatureLayer, ImageMapLayer and TiledMapLayer.

The value here makes it possible to display the “Open in ArcGIS Online” link and “Web services” button. You could plug this value into QGIS to load that data on the fly.
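
In the sample record above, the references value is a JSON object serialized into a string, keyed by a URI for the reference type. A sketch of building it follows; the FeatureLayer key is copied from that sample, the other three follow the same pattern, and the helper name is hypothetical.

```python
import json

# Reference keys for the four ESRI-specific reference types.
ESRI_REFERENCE_KEYS = {
    "DynamicMapLayer": "urn:x-esri:serviceType:ArcGIS#DynamicMapLayer",
    "FeatureLayer": "urn:x-esri:serviceType:ArcGIS#FeatureLayer",
    "ImageMapLayer": "urn:x-esri:serviceType:ArcGIS#ImageMapLayer",
    "TiledMapLayer": "urn:x-esri:serviceType:ArcGIS#TiledMapLayer",
}


def references(layer_url, kind="FeatureLayer"):
    """Serialize a single-entry references object as a compact JSON string."""
    return json.dumps({ESRI_REFERENCE_KEYS[kind]: layer_url}, separators=(",", ":"))
```

The result is what would go into `dct_references_s`; note that it is a string containing JSON, not a nested object.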

Geometry

This field represents the extent of the data and powers the ability to search by the pannable/zoomable map in GeoBlacklight. This feels like a high-priority field to me.
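
The sample record stores the extent as `ENVELOPE(west, east, north, south)`. Assuming a layer extent is already in longitude/latitude degrees and uses ArcGIS-style `xmin`/`ymin`/`xmax`/`ymax` keys, the conversion could be sketched like this (the function name is made up, and extents that cross the antimeridian are not handled):

```python
def envelope(extent):
    """Format a lon/lat extent dict as ENVELOPE(west, east, north, south)."""
    west, south = extent["xmin"], extent["ymin"]
    east, north = extent["xmax"], extent["ymax"]
    # Reject values that cannot be degrees, e.g. an extent still in a projected CRS.
    if not (-180 <= west <= east <= 180 and -90 <= south <= north <= 90):
        raise ValueError(f"extent is not a valid lon/lat box: {extent!r}")
    return f"ENVELOPE({west}, {east}, {north}, {south})"
```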

From my initial development, there will be a challenge here as the Spatial Reference Identifiers (SRIDs) used by some layers aren’t available in pyproj. From a cursory investigation they do show up in QGIS, but I suspect it’s less than trivial to bring QGIS’ data sources into a standalone CLI.
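
One thing worth trying before reaching for QGIS: an ArcGIS `spatialReference` object can carry a `wkid`, a `latestWkid` and sometimes a `wkt`, and pyproj accepts authority strings such as `EPSG:3857` as well as `ESRI:`-prefixed codes. I have not confirmed that this covers the layers that failed for me, so treat the sketch below as an assumption. The function names are hypothetical, and `resolve_crs` needs pyproj installed.

```python
def crs_candidates(spatial_reference):
    """List CRS strings to try, most specific first, without duplicates."""
    candidates = []
    latest = spatial_reference.get("latestWkid")
    wkid = spatial_reference.get("wkid")
    if latest:
        candidates.append(f"EPSG:{latest}")
    if wkid:
        candidates.append(f"EPSG:{wkid}")
        candidates.append(f"ESRI:{wkid}")  # ESRI-only codes live under this authority
    if spatial_reference.get("wkt"):
        candidates.append(spatial_reference["wkt"])
    return list(dict.fromkeys(candidates))  # de-duplicate, keeping order


def resolve_crs(spatial_reference):
    """Return the first candidate pyproj can parse, or None if none work."""
    from pyproj import CRS
    from pyproj.exceptions import CRSError

    for candidate in crs_candidates(spatial_reference):
        try:
            return CRS.from_user_input(candidate)
        except CRSError:
            continue
    return None
```

If `resolve_crs` still returns `None` for a layer, that layer would simply be written without a geometry field.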