After developing custom connectors for several InRiver projects, I wanted to build my own open-source inbound connector to demonstrate how to make a simple yet powerful integration for loading large XML documents with product data into iPMC.
In past InRiver and iPMC projects, I built customized integrations for specific customers. However, they all needed to import or export data in XML or JSON formats, which in hindsight could have been done in a simpler and more generic way. I therefore recently decided to create a generic inbound connector, starting from scratch and incorporating some of the ideas and lessons I took from those projects.
The solution is open source and available at this GitHub repository.
Some of the ideas were not entirely new. Here is an article I once wrote about similar ideas for an efficient outbound connector.
Setting a goal
For this project, I wanted to create a connector that is:
- Generic
- Simple
- Configurable
- Efficient
I guess everybody would like their connectors to meet all four of those goals. But it is not easy to make a connector that can be applied to a variety of XML documents while being configurable (rather than hardcoded) and also well optimized for performance.
I also considered this project a personal exercise in optimizing .NET applications, using profiler tools and high-performance optimization strategies.
Solution outline
My inbound connector implements an inbound data extension class that supports adding and updating InRiver entities from an XML document using XPath parsing.
As opposed to other schema-based connectors, this one does not require the inbound data to match the iPMC data model. The inbound XML data can be structured as a list of products with nested lists of child items, or as one long flat list of both products and items. The connector is then configured to know which entities and fields to parse.
To make this connector configurable, I created an XML mapping document in which all entities, fields, field sets and links are mapped to the inbound XML data by specifying names, XPath expressions and settings. Configuring a new field mapping is easy and only involves adding an XML element to an entity mapping element.
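To illustrate the idea of a mapping document, here is a minimal sketch in Python rather than the connector's own .NET code. The mapping format shown (the `mapping`, `entity` and `field` elements and their `type`, `name` and `xpath` attributes) is invented for this example and is not the connector's actual schema, and the standard library's ElementTree, which only supports a subset of XPath, stands in for the .NET XPath classes.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping document: element and attribute names are
# assumptions made for this sketch, not the connector's real format.
MAPPING_XML = """
<mapping>
  <entity type="Product" xpath="product">
    <field name="ProductNumber" xpath="number"/>
    <field name="ProductName" xpath="name"/>
  </entity>
  <entity type="Item" xpath="product/items/item">
    <field name="ItemNumber" xpath="sku"/>
    <field name="ItemSize" xpath="size"/>
  </entity>
</mapping>
"""

# Sample inbound data: products with nested lists of child items.
INBOUND_XML = """
<catalog>
  <product>
    <number>P-100</number>
    <name>Jacket</name>
    <items>
      <item><sku>P-100-S</sku><size>S</size></item>
      <item><sku>P-100-M</sku><size>M</size></item>
    </items>
  </product>
</catalog>
"""


def load_mapping(mapping_xml):
    """Read the mapping document into a list of entity mappings."""
    root = ET.fromstring(mapping_xml)
    return [
        {
            "type": entity.get("type"),
            "xpath": entity.get("xpath"),
            # Field name -> XPath relative to the entity node.
            "fields": {f.get("name"): f.get("xpath") for f in entity.findall("field")},
        }
        for entity in root.findall("entity")
    ]


def parse_entities(inbound_xml, mapping):
    """Apply every entity mapping to the inbound document."""
    root = ET.fromstring(inbound_xml)
    result = []
    for entity_mapping in mapping:
        for node in root.findall(entity_mapping["xpath"]):
            values = {
                name: node.findtext(path)
                for name, path in entity_mapping["fields"].items()
            }
            result.append((entity_mapping["type"], values))
    return result
```

In this sketch, supporting a new field means adding one more `field` element to the relevant `entity` element; the parsing code stays untouched.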
Configuring all of the parsing and mapping in an XML document makes it easy to update the data model without having to develop and deploy a code change. And if the time comes when standard parsing is not enough, a developer can extend the connector by adding a new field type definition, a field parser class and a new field mapping. That’s it.
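The extension steps described above can be sketched roughly as follows, again in Python rather than .NET. All names here (`FIELD_PARSERS`, the parser classes, the `WeightInGrams` field type) are hypothetical and only illustrate the pattern of registering a parser class under a field type name that a mapping can then refer to.

```python
from decimal import Decimal


class StringParser:
    def parse(self, raw):
        return raw.strip()


class IntegerParser:
    def parse(self, raw):
        return int(raw.strip())


# Field type definitions: type name -> parser class (hypothetical registry).
FIELD_PARSERS = {
    "String": StringParser,
    "Integer": IntegerParser,
}


class WeightInGramsParser:
    """Hypothetical custom parser: turns '1.5 kg' or '250 g' into grams."""

    def parse(self, raw):
        number, unit = raw.strip().split()
        factor = {"g": 1, "kg": 1000}[unit]
        return int(Decimal(number) * factor)


# Extending the connector: register the new field type and its parser.
FIELD_PARSERS["WeightInGrams"] = WeightInGramsParser


def parse_field(field_type, raw):
    """Look up the parser for a mapped field type and parse the raw text."""
    return FIELD_PARSERS[field_type]().parse(raw)
```

A field mapping would then only need to name the new type, for example a mapping entry whose type is `WeightInGrams`, for the raw XML text to pass through the custom parser.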
As mentioned, I decided to base the parsing entirely on XPath. I know that the solution could have been even faster (and much less flexible) if I had worked with XmlDocument and hardcoded all of the parsing. However, even though it parses all of the data at initialization, the XPathDocument class is really fast when looking up data with XPath expressions afterwards.
While developing and optimizing the solution, I also made the following choices:
- Overriding the Entity.GetField method with an optimized version
- Opting for dictionaries over lists
- Caching of each and every XPath expression and field parser instance
- Caching of certain model and data instances
- Parsing and importing one entity at a time, using enumerable methods
- Thoroughly measuring and investigating call time and call counts, using JetBrains dotTrace
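A few of the choices listed above can be sketched together in a short Python example: dictionary lookups for field mappings, one cached parser instance per field type, and a generator that yields one entity at a time. This is an illustration under assumptions, not the connector's code. The names `FIELD_MAP`, `get_parser` and `iter_entities` are invented, and because ElementTree has no public compiled-expression type, XPath expression caching (which in .NET could be done by keeping compiled `XPathExpression` objects in a dictionary) is not shown.

```python
import xml.etree.ElementTree as ET

INBOUND_XML = """<catalog>
  <product><number>P-100</number><stock>12</stock></product>
  <product><number>P-200</number><stock>7</stock></product>
</catalog>"""


class StringParser:
    def parse(self, raw):
        return raw.strip()


class IntegerParser:
    instances = 0  # counts constructions, to make the caching visible

    def __init__(self):
        IntegerParser.instances += 1

    def parse(self, raw):
        return int(raw)


PARSER_CLASSES = {"String": StringParser, "Integer": IntegerParser}

# Dictionary instead of a list: field lookups by name are O(1).
# Field name -> (relative XPath, field type); hypothetical mapping.
FIELD_MAP = {
    "ProductNumber": ("number", "String"),
    "Stock": ("stock", "Integer"),
}

_parser_cache = {}


def get_parser(field_type):
    """Return a cached parser instance, creating it only on first use."""
    parser = _parser_cache.get(field_type)
    if parser is None:
        parser = PARSER_CLASSES[field_type]()
        _parser_cache[field_type] = parser
    return parser


def iter_entities(root):
    """Yield one parsed entity at a time instead of building a full list."""
    for node in root.iterfind("product"):
        yield {
            name: get_parser(field_type).parse(node.findtext(path))
            for name, (path, field_type) in FIELD_MAP.items()
        }
```

A caller would loop over `iter_entities(ET.fromstring(INBOUND_XML))` and import each entity as it arrives, so only one parsed entity needs to be held at a time, and each parser class is instantiated once no matter how many entities are processed.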