A simplified proof-of-concept project for managing ads over-delivery utilizing Spring Cloud and Apache Kafka. Based on the following article:
- Java 8
- Maven
- Apache Kafka. See the section with binary downloads and pick the recommended stable version. At the time of writing, the version used is: kafka_2.11-0.10.1.0
- A modified version of JSON Data Generator, originally provided by [ACES, Inc.](http://acesinc.net/)
```shell
tar -xvzf kafka_2.11-0.10.1.0.tgz
```
```shell
kafka-install-dir/bin/zookeeper-server-start.sh kafka-install-dir/config/zookeeper.properties
kafka-install-dir/bin/kafka-server-start.sh kafka-install-dir/config/server.properties
```
Clone/fork the modified version of JSON Data Generator and follow the instructions provided here
Clone or fork the repo:
```shell
git clone git@github.com:kmandalas/overdelivery-mgmt
cd overdelivery-mgmt
```
Create all the topics required by the examples:
```shell
kafka-install-dir/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic ad-insertion-input
kafka-install-dir/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 2 --topic predicted-spend-output
kafka-install-dir/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 2 --topic impressions
```
Run each service in a different console/terminal. The recommended order is the following:
```shell
# Run each of the following blocks in its own terminal, in this order:

cd <dir>/config/
mvn spring-boot:run

cd <dir>/registry/
mvn spring-boot:run

cd <dir>/gateway/
mvn spring-boot:run

cd <dir>/budget-service/
mvn spring-boot:run

cd <dir>/inventory-service/
mvn spring-boot:run

cd <dir>/predicted-spend-consumer/
mvn spring-boot:run -Drun.arguments="--partition-no=0"

cd <dir>/predicted-spend-consumer/
mvn spring-boot:run -Drun.arguments="--partition-no=1"

cd <dir>/impressions-consumer/
mvn spring-boot:run -Drun.arguments="--partition-no=0"

cd <dir>/impressions-consumer/
mvn spring-boot:run -Drun.arguments="--partition-no=1"

cd <dir>/spend-aggregator/
mvn spring-boot:run
```
Once you have started all the services, check the Eureka dashboard.
Keep in mind that the Service Discovery mechanism needs some time after all applications start up. A service is not available for discovery by clients until the instance, the Eureka server, and the client all have the same metadata in their local caches, so it can take up to 3 heartbeats. The default heartbeat period is 30 seconds, so allow up to about 90 seconds.
If everything started OK, you should have a view similar to the following one:
For this paradigm, two (2) streams of events are needed:
- insertions
- impressions
When a new ad spot (i.e. an opportunity to display an ad) appears on a "website", the "frontend" sends an ad request to the ads inventory. The ads inventory then decides whether to show ads for advertiser X based on their remaining budget. If budget is still available, the ads inventory makes an ad insertion (i.e. an ad entry that's embedded in a user's app) to the frontend. After the user views the ad, an impression event is sent to the spend system.
In our case, we simulate the ad requests sent by a website/frontend with a stream of events generated by the JSON Data Generator and sent via HTTP POSTs to the inventory-service.
This microservice checks if `actual_spend + inflight_spend > daily_budget` and, if this is false, it sends a message to the `ad-insertion-input` Kafka topic. The message looks like `{key: adgroupId, value: inflight_spend}`, where:
- `adgroupId` = the id of the group of ads under the same budget constraint.
- `inflight_spend` = `price * impression_rate * action_rate`

The configuration for both `impression_rate` and `action_rate` is in `inventory-service.yml`.
For starters, the selected values are global and the same for all advertisers:
- 0.5 (i.e. 50%) for the impression rate
- 1 (i.e. 100%) for the action rate, since we assume that all advertisers pay by impression
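As a concrete worked example of the formulas above (the `price` value below is hypothetical; the rates are the defaults from `inventory-service.yml`):

```java
public class InflightSpendExample {

    // inflight_spend = price * impression_rate * action_rate
    static double inflightSpend(double price, double impressionRate, double actionRate) {
        return price * impressionRate * actionRate;
    }

    // The check performed by the inventory-service: the insertion is rejected
    // when actual_spend + inflight_spend > daily_budget
    static boolean overBudget(double actualSpend, double inflightSpend, double dailyBudget) {
        return actualSpend + inflightSpend > dailyBudget;
    }

    public static void main(String[] args) {
        // Hypothetical price of 2.0; impression_rate = 0.5, action_rate = 1
        double spend = inflightSpend(2.0, 0.5, 1.0); // 2.0 * 0.5 * 1 = 1.0
        System.out.println("inflight_spend = " + spend);
        // With actual_spend = 9.5 and daily_budget = 10.0: 9.5 + 1.0 > 10.0 -> over budget
        System.out.println("over budget? " + overBudget(9.5, spend, 10.0));
    }
}
```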
A Kafka Streams application, `spend-aggregator`, using a tumbling window of 10 seconds, acts as the "Spend aggregator" and sends the sums to the `predicted-spend-output` Kafka topic.
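The windowing logic can be illustrated in plain Java (a simplified sketch of the idea only — the actual spend-aggregator uses the Kafka Streams DSL; the timestamps, keys, and spend values below are made up):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TumblingWindowSketch {

    static final long WINDOW_MS = 10_000; // 10-second tumbling window

    // Sums values per (adgroupId, window start) bucket, mimicking what the
    // spend-aggregator emits to the predicted-spend-output topic.
    static Map<String, Double> aggregate(long[] timestamps, String[] keys, double[] values) {
        Map<String, Double> sums = new LinkedHashMap<>();
        for (int i = 0; i < timestamps.length; i++) {
            // Tumbling windows are aligned, non-overlapping intervals of WINDOW_MS
            long windowStart = (timestamps[i] / WINDOW_MS) * WINDOW_MS;
            sums.merge(keys[i] + "@" + windowStart, values[i], Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        long[] ts = {1_000, 4_000, 12_000};   // first two fall in window [0, 10000)
        String[] keys = {"101", "101", "101"};
        double[] spends = {1.0, 0.5, 2.0};
        System.out.println(aggregate(ts, keys, spends));
        // prints {101@0=1.5, 101@10000=2.0}
    }
}
```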
A Kafka consumer, `predicted-spend-consumer`, consumes the messages of the `predicted-spend-output` topic and updates the `inflight_spend` accordingly.
All budget retrieval and update actions are done via the budget-service. An H2 database is used, with a single table keeping the data for this simple scenario. The data ownership belongs to the budget-service; in this way it is not possible to bypass the API and access the persisted data directly. The Kafka consumers, the inventory-service, etc. all interact with the budget-service using Feign in order to retrieve/update data.
In order to access the in-memory database, view the schema, etc., go to your H2 console. As the JDBC URL, enter: `jdbc:h2:mem:budget-db`
You may find some sample configuration files for the JSON Data Generator within the `streaming-workflows` folder. Have in mind that the values in these sample files are over-simplistic. Even in a near real-world simulation scenario, multiple streams may need to be started with different `eventFrequency` / `varyEventFrequency` / `varyRepeatFrequency` parameters, etc.
In order to start the streams, first copy the JSON config files to the JSON generator conf directory:
```shell
cp streaming-workflows/* <dir>/json-data-generator-1.2.0/conf
```
Then, begin by starting the insertions event stream:
```shell
java -jar json-data-generator-1.3.1-SNAPSHOT.jar insertions-config.json
```
Continue by initiating the impressions event stream:
```shell
java -jar json-data-generator-1.3.1-SNAPSHOT.jar impressions-config.json
```
This time the generator sends the messages directly to the `impressions` Kafka topic. Since this topic is partitioned by `adGroupId` and the number of consumers is equal to the number of partitions (in our scenario this number is 2), we guarantee that there will be no concurrent modifications of the budget of a single advertiser.
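The key-to-partition mapping behind this guarantee can be sketched as follows (a simplification: Kafka's `DefaultPartitioner` actually applies murmur2 to the serialized key bytes, but the invariant is the same — equal keys always land on the same partition, so a single consumer owns each advertiser):

```java
public class PartitionSketch {

    // Maps an adGroupId to one of the topic's partitions. We use hashCode()
    // here purely to illustrate the "same key -> same partition" guarantee;
    // the real Kafka producer hashes the key bytes with murmur2.
    static int partitionFor(String adGroupId, int numPartitions) {
        return (adGroupId.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 2; // the impressions topic has 2 partitions
        // Every impression of advertiser 101 is routed to the same partition,
        // so exactly one impressions-consumer ever updates that budget row.
        System.out.println("adGroupId 101 -> partition " + partitionFor("101", partitions));
        System.out.println("adGroupId 102 -> partition " + partitionFor("102", partitions));
    }
}
```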
An `impressions-consumer` instance consumes the messages of the `impressions` topic and updates the `actual_spend` accordingly (by calling the budget-service via Feign).
A single HTML page is used to display a "live" dual-series chart for a single advertiser. This mini Spring Boot web app resides in the gateway module. It uses WebSockets to periodically send the client pairs of actual spend and in-flight spend for a single advertiser (for the moment, the one with `adGroupId: 101`).
In order to view the page, navigate your browser to:
Todo:
- Enable Hystrix for inventory-service and budget-service
- Enable Turbine with the monitoring app
- Add integration tests for inventory-service and budget-service
- Enable security with OAuth2
- Enable the aodm-common module in order to avoid duplicating shared DTOs among the microservices
- Enrich the README.md with information about the selected Kafka partitioning scheme and provide the list of known issues and limitations of the current project as-is
I try to add things and improve this project during my... "free" time. I would greatly appreciate your help. Feel free to contact me with any questions, corrections, and suggestions.