Syncing Data Via File Protocols

Overview

This document explains how retailers provide data files that feed product, order, and customer information to CitrusAd. First, it presents the data structure and format for each type of data. Then, it describes the protocols and constraints for integrating retailer systems with CitrusAd.

In this section, we describe the protocols that CitrusAd supports and how to name files so that CitrusAd can download data files automatically from retailers' servers.

Protocols

CitrusAd supports several ways of retrieving data files from retailers. Retailers place the data files on their own server, and CitrusAd downloads them via one of the standard protocols.

Currently, CitrusAd supports downloading data files via the following protocols: FTP, FTPS, SFTP, SCP, HTTP, HTTPS, and AWS S3. If you want to supply data files via a protocol that is not yet supported, contact our support team.

In general, retailers need to provide the protocol, host, port, and file path of each data file so that CitrusAd can download it. When authentication is required to download data files, each retailer also needs to provide CitrusAd with credentials (e.g. a username and password) for the retailer's system.
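As a sketch of the details a retailer typically hands over, the Python dataclass below records them and runs basic sanity checks before they are shared. The class name `FeedSource`, its fields, and the checks are hypothetical illustrations, not a CitrusAd API or form.

```python
from dataclasses import dataclass
from typing import Optional

# Protocols this document lists as supported (lower-cased; "s3" stands for AWS S3).
SUPPORTED_PROTOCOLS = {"ftp", "ftps", "sftp", "scp", "http", "https", "s3"}


@dataclass
class FeedSource:
    """Hypothetical record of the connection details a retailer shares with CitrusAd."""
    protocol: str
    host: str
    port: Optional[int]
    file_path: str
    username: Optional[str] = None
    password: Optional[str] = None

    def validate(self) -> None:
        """Raise ValueError if a required detail is missing or obviously wrong."""
        if self.protocol.lower() not in SUPPORTED_PROTOCOLS:
            raise ValueError(f"unsupported protocol: {self.protocol}")
        if not self.host:
            raise ValueError("host is required")
        if self.port is not None and not 0 < self.port < 65536:
            raise ValueError(f"invalid port: {self.port}")
        if not self.file_path:
            raise ValueError("file path is required")
        if self.password and not self.username:
            raise ValueError("a password was given without a username")

    def location(self) -> str:
        """URL-style summary that deliberately leaves out the credentials."""
        port = f":{self.port}" if self.port is not None else ""
        path = self.file_path if self.file_path.startswith("/") else "/" + self.file_path
        return f"{self.protocol.lower()}://{self.host}{port}{path}"
```

Keeping credentials out of `location()` is a deliberate choice in this sketch: the summary can be shared freely, while the password travels through a separate, secure channel.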

When retailers use SFTP, CitrusAd supports two types of authentication: a username and password, or a public key. For public-key authentication, CitrusAd provides retailers with its public key, which the retailer installs on their SFTP server.
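On an OpenSSH-based SFTP server, installing CitrusAd's key usually means appending it to the `authorized_keys` file of the account CitrusAd logs in as. The helper below is a minimal sketch under that assumption; the function name `install_public_key` is hypothetical, and other SFTP servers may manage keys differently, so check your server's documentation.

```python
import os
from pathlib import Path


def install_public_key(public_key: str, home: Path) -> Path:
    """Append a single-line OpenSSH public key to <home>/.ssh/authorized_keys.

    Hypothetical helper: does nothing if the key is already present, and
    tightens permissions because sshd (with StrictModes) rejects key files
    and directories that other users can write to.
    """
    key = public_key.strip()
    if not key or "\n" in key:
        raise ValueError("expected exactly one public key line")
    ssh_dir = home / ".ssh"
    ssh_dir.mkdir(mode=0o700, exist_ok=True)
    os.chmod(ssh_dir, 0o700)  # mkdir's mode is subject to the umask, so set it explicitly
    auth_keys = ssh_dir / "authorized_keys"
    content = auth_keys.read_text() if auth_keys.exists() else ""
    if key not in (line.strip() for line in content.splitlines()):
        with auth_keys.open("a") as fh:
            # Start on a new line if the existing file lacks a trailing newline.
            if content and not content.endswith("\n"):
                fh.write("\n")
            fh.write(key + "\n")
    os.chmod(auth_keys, 0o600)
    return auth_keys
```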

Data Compression and Encryption

Retailers may want to compress and encrypt data files before uploading them to their servers for syncing. When a data file is both compressed and encrypted, CitrusAd assumes that it was compressed first and then encrypted.

When a data file is encrypted, CitrusAd decrypts it after downloading it. Currently, we support decryption of files encrypted with PGP programs. More information about PGP cryptography can be found at https://tools.ietf.org/html/rfc4880. If retailers choose PGP encryption, CitrusAd provides a PGP public key; retailers use this public key to encrypt data files before uploading them to their servers, so that only CitrusAd can decrypt them.
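The order matters: compress first, then encrypt. The sketch below gzips a file with Python's standard library and then builds a GnuPG command line that encrypts the compressed file to a recipient key. It assumes GnuPG is installed and that the public key provided by CitrusAd has already been imported with `gpg --import`; the function names and the recipient value are hypothetical.

```python
import gzip
import shutil
from pathlib import Path
from typing import List


def compress_gzip(src: Path) -> Path:
    """Step 1: gzip-compress the data file, writing <name>.gz next to it."""
    dst = src.with_name(src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def gpg_encrypt_command(src: Path, recipient: str) -> List[str]:
    """Step 2: argv that encrypts `src` to the recipient's public key with GnuPG.

    Assumes CitrusAd's public key was imported beforehand (gpg --import) and that
    `recipient` identifies it. In --batch mode gpg may refuse a key it does not
    trust, so the imported key may need to be certified first (gpg --lsign-key).
    """
    dst = src.with_name(src.name + ".gpg")
    return ["gpg", "--batch", "--yes", "--encrypt",
            "--recipient", recipient, "--output", str(dst), str(src)]
```

Running it would look like `subprocess.run(gpg_encrypt_command(compress_gzip(path), recipient), check=True)`, producing a `.gz.gpg` file to upload. Compressing before encrypting is also the efficient order, because encrypted output looks random and barely compresses.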

When a data file is compressed, we decompress it before processing. Currently, we support two compression formats: zip and gzip. If retailers want to use another compression format, contact our support team.
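Before uploading, a retailer might want to confirm that a compressed file really is zip or gzip and opens cleanly. The sketch below detects the format from its leading "magic" bytes and decompresses it with Python's standard library. The function name is hypothetical, and treating a zip archive as holding exactly one data file is an assumption, not a documented CitrusAd rule.

```python
import gzip
import io
import zipfile

GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of every gzip stream
ZIP_MAGIC = b"PK\x03\x04"  # local file header that starts a non-empty zip archive


def decompress(data: bytes) -> bytes:
    """Return the payload of gzip or zip data; other data is returned unchanged.

    Assumes a zip archive holds exactly one data file (directory entries ignored).
    """
    if data.startswith(GZIP_MAGIC):
        return gzip.decompress(data)
    if data.startswith(ZIP_MAGIC):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            files = [n for n in archive.namelist() if not n.endswith("/")]
            if len(files) != 1:
                raise ValueError(f"expected one file in the zip, found {len(files)}")
            return archive.read(files[0])
    return data
```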

File Naming

As stated above, retailers need to provide the protocol, host, port, and file path of a data file so that CitrusAd can download and process it. CitrusAd downloads data files on a daily basis. Retailers can choose the daily download time that suits them, so that the data file is ready on the server when CitrusAd downloads it each day.

By default, retailers provide an explicit file name that CitrusAd downloads every day. This is the simplest way to specify a target file: CitrusAd uses the same file name to retrieve the data file from the retailer's server each day.

When retailers use FTP, FTPS, or SFTP to communicate with CitrusAd, we support additional options for specifying which file CitrusAd should download. These options, called target file modes, are ROLLING_EARLIEST, ROLLING_EARLIEST_24_HOURS, ROLLING_LATEST, and ROLLING_LATEST_24_HOURS.

When retailers choose one of these modes, they need to provide CitrusAd with a template for data file names. The template contains the special string “{*}”, which acts as a wildcard. Every day, CitrusAd matches the file names on the retailer's server against the template to choose the target file to download.

For example, consider the template “CitrusAdCatalogProduct_AU_{*}.txt”. This template requires matching file names to start with the prefix “CitrusAdCatalogProduct_AU_” and end with the suffix “.txt”, so the following file names match it:

CitrusAdCatalogProduct_AU_20190315.txt

CitrusAdCatalogProduct_AU_20190314.txt

CitrusAdCatalogProduct_AU_20190312.txt
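One way to reason about template matching is to translate the template into a regular expression: the text around “{*}” is matched literally, and “{*}” matches any run of characters. The sketch below does this, assuming that “{*}” may also match an empty string (the document does not say); the helper names are hypothetical. The example uses the template “CitrusAdCatalogProduct_AU_{*}.txt”, which is the prefix the example file names above actually share.

```python
import re


def template_to_regex(template: str) -> re.Pattern:
    """Compile a file-name template into a regex.

    Text around "{*}" is matched literally (so "." in ".txt" is a real dot);
    each "{*}" is assumed to match any run of characters, including none.
    """
    literal_parts = template.split("{*}")
    return re.compile(".*".join(re.escape(part) for part in literal_parts))


def matches_template(template: str, file_name: str) -> bool:
    """True if the whole file name fits the template, not just a prefix of it."""
    return template_to_regex(template).fullmatch(file_name) is not None
```

Escaping the literal parts matters: left unescaped, the “.” in “.txt” would match any character, and `fullmatch` (rather than `match`) stops names such as “...txt.gz” from slipping through.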

To avoid downloading data files that retailers are still uploading, we only download files whose last modification was more than one minute before we access the server. Although several file names may match the template, we download and process only one file at a time. The target file mode determines which of the candidate files is chosen; the modes are described below.
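The one-minute rule amounts to comparing each file's last-modified time with the time of access. A minimal sketch, assuming both times are in the same time zone; the function name is hypothetical, and the strict "more than one minute" comparison follows the wording above.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

SETTLE_TIME = timedelta(minutes=1)  # files modified more recently may still be uploading


def settled_files(entries: Iterable[Tuple[str, datetime]], now: datetime) -> List[str]:
    """Names of files whose last modification is more than SETTLE_TIME before `now`."""
    return [name for name, modified in entries if now - modified > SETTLE_TIME]
```

In practice, a file written less than a minute before CitrusAd accesses the server is not considered on that run.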

Rolling_earliest

In this target file mode, we filter files by matching their names against the template. We then sort the matching files by name in ascending order and choose the first one.

For example, if the template is “CitrusAdCatalogProduct_AU_{*}.txt” and the files matching it are as follows, the file “CitrusAdCatalogProduct_AU_20190312.txt” is chosen for download in this mode:

CitrusAdCatalogProduct_AU_20190312.txt

CitrusAdCatalogProduct_AU_20190313.txt

CitrusAdCatalogProduct_AU_20190314.txt
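Once the names have been filtered by the template, ROLLING_EARLIEST reduces to taking the smallest name. A minimal sketch, assuming plain string comparison of file names (the function name is hypothetical):

```python
from typing import List, Optional


def rolling_earliest(matched_names: List[str]) -> Optional[str]:
    """ROLLING_EARLIEST: first name in ascending order, or None if nothing matched."""
    return min(matched_names) if matched_names else None
```

Because the choice is based on names rather than timestamps, ascending name order matches chronological order only when the variable part is zero-padded with the most significant field first, as in YYYYMMDD. With a DDMMYYYY stamp, for example, “CitrusAdCatalogProduct_AU_15032019.txt” sorts before “CitrusAdCatalogProduct_AU_16022019.txt” even though it is the later file.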

Rolling_earliest_24_hours

In this target file mode, we first filter files by matching their names against the template. We then keep only the files modified within the last 24 hours. Finally, we sort the remaining files by name in ascending order and choose the first one.

For example, assume that the current time is 2019-03-15 10:30:07 and the template is “CitrusAdCatalogProduct_AU_{*}.txt”. If the files matching the template are those in the table below, the file “CitrusAdCatalogProduct_AU_20190314.txt” is chosen for download in this mode, because only the files for 20190314 and 20190315 were modified within the last 24 hours.

Example files with their last-modified times:

File name                                 Last modified
CitrusAdCatalogProduct_AU_20190312.txt    2019-03-13 15:35:11
CitrusAdCatalogProduct_AU_20190313.txt    2019-03-13 15:35:08
CitrusAdCatalogProduct_AU_20190314.txt    2019-03-14 15:35:10
CitrusAdCatalogProduct_AU_20190315.txt    2019-03-15 10:05:07
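The example can be reproduced with a short sketch that applies the 24-hour window, together with the one-minute rule, to names already filtered by the template. Whether a file modified exactly 24 hours ago is included is not specified, so the inclusive boundary here is an assumption, as is the use of naive datetimes in a single time zone; the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


def rolling_earliest_24_hours(matched: Dict[str, datetime], now: datetime) -> Optional[str]:
    """ROLLING_EARLIEST_24_HOURS over {name: last_modified} for names matching the template.

    Keeps files modified within the 24 hours before `now` (boundary assumed
    inclusive) but more than one minute ago, then returns the smallest name.
    """
    recent = [
        name
        for name, modified in matched.items()
        if timedelta(minutes=1) < now - modified <= timedelta(hours=24)
    ]
    return min(recent) if recent else None
```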

Rolling_latest

In this target file mode, we filter files by matching their names against the template. We then sort the matching files by name in descending order and choose the first one.

For example, if the template is “CitrusAdCatalogProduct_AU_{*}.txt” and the files matching it are as follows, the file “CitrusAdCatalogProduct_AU_20190314.txt” is chosen for download in this mode:

CitrusAdCatalogProduct_AU_20190314.txt

CitrusAdCatalogProduct_AU_20190313.txt

CitrusAdCatalogProduct_AU_20190312.txt

This mode is the same as Rolling_earliest, except that files are sorted by name in descending rather than ascending order.
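Once the names have been filtered by the template, ROLLING_LATEST reduces to taking the largest name. A minimal sketch, assuming plain string comparison of file names (the function name is hypothetical):

```python
from typing import List, Optional


def rolling_latest(matched_names: List[str]) -> Optional[str]:
    """ROLLING_LATEST: first name in descending order, or None if nothing matched."""
    return max(matched_names) if matched_names else None
```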

Rolling_latest_24_hours

In this target file mode, we first filter files by matching their names against the template. We then keep only the files modified within the last 24 hours. Finally, we sort the remaining files by name in descending order and choose the first one.

For example, assume that the current time is 2019-03-15 10:30:07 and the template is “CitrusAdCatalogProduct_AU_{*}.txt”. If the files matching the template are those in the table below, the file “CitrusAdCatalogProduct_AU_20190315.txt” is chosen for download in this mode.

This mode is the same as Rolling_earliest_24_hours, except that files are sorted by name in descending rather than ascending order.

Example files with their last-modified times:

File name                                 Last modified
CitrusAdCatalogProduct_AU_20190312.txt    2019-03-13 15:35:11
CitrusAdCatalogProduct_AU_20190313.txt    2019-03-13 15:35:08
CitrusAdCatalogProduct_AU_20190314.txt    2019-03-14 15:35:10
CitrusAdCatalogProduct_AU_20190315.txt    2019-03-15 10:05:07
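Putting the pieces of this section together, the sketch below chooses a target file from a directory listing for any of the four modes: template match, then the one-minute rule, then the optional 24-hour window, then name order. It assumes that “{*}” matches any (possibly empty) run of characters, that the 24-hour boundary is inclusive, and that timestamps share one time zone; `select_target_file` and the `MODES` table are hypothetical names, not part of CitrusAd.

```python
import re
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Mode name -> (pick the largest name?, restrict to the last 24 hours?)
MODES = {
    "ROLLING_EARLIEST": (False, False),
    "ROLLING_EARLIEST_24_HOURS": (False, True),
    "ROLLING_LATEST": (True, False),
    "ROLLING_LATEST_24_HOURS": (True, True),
}


def select_target_file(files: Dict[str, datetime], template: str, mode: str,
                       now: datetime) -> Optional[str]:
    """Choose the single file to download from a {name: last_modified} listing."""
    if mode not in MODES:
        raise ValueError(f"unknown target file mode: {mode}")
    descending, last_24_hours = MODES[mode]
    # "{*}" becomes ".*"; everything else in the template is matched literally.
    pattern = re.compile(".*".join(re.escape(p) for p in template.split("{*}")))
    candidates: List[str] = []
    for name, modified in files.items():
        age = now - modified
        if not pattern.fullmatch(name):
            continue  # name does not fit the template
        if age <= timedelta(minutes=1):
            continue  # modified too recently; may still be uploading
        if last_24_hours and age > timedelta(hours=24):
            continue  # outside the 24-hour window
        candidates.append(name)
    if not candidates:
        return None
    return max(candidates) if descending else min(candidates)
```

With the example listing and a current time of 2019-03-15 10:30:07, this returns the 20190315 file for ROLLING_LATEST_24_HOURS and the 20190314 file for ROLLING_EARLIEST_24_HOURS, matching the two examples in this section.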

Conclusion

We have presented how catalog product, customer, and order data can be downloaded from retailer servers and processed in CitrusAd. While the TSV format is supported for all data types, the XML format is supported only for product data. CitrusAd supports several protocols for retrieving data files from retailer servers, including FTP, FTPS, SFTP, SCP, HTTP, HTTPS, and AWS S3.

We have also described how retailers can secure data files and speed up downloads, since CitrusAd can decrypt and decompress data files from retailers. To simplify supplying and downloading files, we provide several options for naming files on the server. We have described these options with examples so that retailers can choose the one that best suits their needs.

Currently, the XML format is supported only for product data. CitrusAd will extend the system to support XML for other data types, as well as new protocols and formats, as retailers require them.

You should now be ready to review how to sync catalog, customer, and order data.