Keep your data pipelines safe while changing configurations and the entire setup around them.
Playing around with a docker-hosted Apache NiFi and testing out its many capabilities and 288 different processors (as of version 1.14) is only fun as long as you don't have to restart your docker containers and, worst case, lose the process groups you created. Maybe you were clever enough to create and download templates as safety copies. Maybe you even saved the process groups to NiFi's registry, but is your registry fully persistent? Getting a data pipeline exactly right can take long hours, so the worst thing that can happen is to lose it and have to painstakingly recreate it afterwards.
Instead of having to rebuild processors, reconnect the registry or reconfigure entire process groups after every single docker restart, there is an easy way to never have to worry about it again.
The following two chapters will enable you to set it up once and then forget about it. The only time you will need to revisit this article is when you create a new setup altogether.
Maybe you also came here because you wanted to get to know Apache NiFi and do it right from the get-go, in which case: don’t worry. All steps are explained in detail, no previous knowledge is needed.
Creating a fully persisted NiFi service includes persisting the Apache NiFi registry, NiFi’s very own version control system for process groups. We will start with the NiFi registry as it is a quick setup and then move on to NiFi itself, which is a bit more tricky but not impossible to persist, even when hosted as a docker container.
We will use the following docker images, which are the latest at the time of writing. Feel free to use different or newer ones, but be aware that their behavior may have changed since these versions:
- Zookeeper: bitnami/zookeeper:3.7.0
- NiFi registry: apache/nifi-registry:1.15.0
- NiFi: apache/nifi:1.14.0
While our core application is NiFi, we will include the NiFi registry to have detailed version control of our flows available. Zookeeper has been a fundamental part of a NiFi cluster since 1.x, enabling distributed coordination and communication within our cluster in case we want to scale up.
Note: The images are specified by their exact version and will be downloaded automatically if you use the provided docker-compose.yml file. The entire file can be found here or at the end of this article.
Preparation
This article is aimed at Windows users. The steps also work on other systems, though your mileage may vary here and there.
- Download and install docker from here.
- Open your favorite command line tool and navigate to a fitting place.
- Create an empty directory.
- Copy the docker-compose.yml file — manually or with curl as shown below.
- To start the docker services, run docker-compose up.
- To stop the docker services, you can exit the process by pressing CTRL+C.
Something like the following wall of text will appear and continue running until you stop the services by pressing CTRL+C.
It may take about a minute until the NiFi WebUI is live.
Once the services are running, you can access the NiFi registry at http://localhost:18080/nifi-registry/ and NiFi at http://localhost:8091/nifi/ in your browser.
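If you would rather not refresh the browser until the WebUI answers, the waiting can be scripted. The following is a minimal Python sketch, not part of the original setup: the function name `wait_for_url` and its default timeout and interval are assumptions chosen for illustration, and it simply treats the first HTTP 200 response as "live".

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_url(url: str, timeout: float = 120.0, interval: float = 5.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Connection refused, reset or timed out: the service is still starting.
            pass
        time.sleep(interval)
    return False
```

Calling `wait_for_url("http://localhost:8091/nifi/")` from a second terminal blocks until the page responds or two minutes have passed, and returns `True` or `False` accordingly.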
With the applications set up, let’s get to work!
Persisting the NiFi registry
Within NiFi registry, we want to persist the buckets we create and any flows which they include.
This can be done by mounting two volumes to the local machine — flow_storage and database.
flow_storage includes the buckets and the flows packed within them. Buckets and flows are identified by UUIDs (universally unique identifiers), just as NiFi uses UUIDs to identify processors, process groups and controller services, among other elements. The directory structure with a bucket inside would look like this:
The database directory consists of a single file:
This is a single file which we really shouldn’t lose. Without the database file, we won’t be able to load any of our buckets and flows from the flow_storage directory.
If you used the provided docker-compose.yml file you will see that the two volumes of the registry service are already mounted to directories on your local machine:
This means that any files the docker container creates in those directories are read from and written to your local machine. It won’t matter whether the docker container restarts or gets completely recreated: the files are persistently stored outside of its influence.
This is it — we are already done with the registry! Any changes to flows or newly created buckets will reappear whenever we restart the docker container.
You can create a bucket in the registry by clicking on the Settings button (a wrench symbol) in the top right corner of the browser window, then on new bucket. Enter any name you want and click on create.
You should now be able to see a similar file structure to the one shown in the early part of this chapter in your local directory /nifi_registry/flow_storage/....
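To confirm from the command line that the bucket really landed on your local disk, you can look for UUID-named directories below the mounted flow_storage directory. This is a small Python sketch under the assumption stated earlier, namely that buckets and flows are stored as directories named after their UUIDs; the helper names `is_uuid` and `find_uuid_dirs` are made up for this example.

```python
import uuid
from pathlib import Path
from typing import List


def is_uuid(name: str) -> bool:
    """Return True if `name` is a UUID in its canonical dashed form."""
    try:
        return str(uuid.UUID(name)) == name.lower()
    except ValueError:
        return False


def find_uuid_dirs(root: Path) -> List[Path]:
    """Return every directory below `root` whose name is a UUID, sorted by path."""
    return sorted(p for p in root.rglob("*") if p.is_dir() and is_uuid(p.name))
```

Running `find_uuid_dirs(Path("nifi_registry/flow_storage"))` from the directory that holds the docker-compose.yml file should list at least one directory once a bucket exists. An empty list suggests that the volume is not mounted where you expect it.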
Persisting NiFi
To persist NiFi’s process groups, processors, connections and controller services, we need to use a one-time trick. After that, no matter how many times we restart docker, even with --force-recreate, NiFi will always come back with all of our elements intact. This step needs your input and is not automated from the start.
NiFi stores the information we need to persist, the flow elements among others, in the conf directory. But if we mount that directory before the first startup of our docker container, we will soon realize the nifi container won't start. It will instead repeatedly complain and try to fix the issue without getting anywhere:
So here is the trick summarized for all those who cannot wait, details follow below:
- Start NiFi’s docker container without mounting the conf directory.
- Copy the conf directory out of the running container to a local conf directory.
- Stop the docker container.
- Mount the local conf directory and restart the docker container.
Now to the juicy details…
Step 1
Reminder: If you want to follow along, you can get the docker-compose.yml file from here or at the end of this article.
Let’s start the docker containers by executing docker-compose up from within our newly created directory. We will have to wait a bit until NiFi has completely spun up: it may take a minute or so until the election cycle is completed and we can access the WebUI under http://localhost:8091/nifi/.
To test things out, let’s create some sample processors, a registry client, a controller service or whatever else we might want to persist. A single processor or process group will suffice to see the desired result. You can drag and drop elements from the bar at the top of the browser window.
Step 2
After starting the docker container and while it is still running, we copy the conf directory from the container to our local machine with the following two commands.
Execute them from a new terminal window (so the running containers are not interrupted), but in the same directory in which the docker-compose.yml file is saved, to guarantee that the directory gets copied to the correct place.
You can get the CONTAINER ID (in my case 7554d9c68c8f) from the output of docker ps.
Don’t forget to replace the container id in the docker cp-command with your own!
Afterwards, we can check whether we have the /nifi_persistence_test/nifi/conf/ directory including the necessary files inside of it on our local machine.
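The copy and the follow-up check can also be scripted. The Python sketch below wraps the docker cp call in `subprocess` and then reports which expected files are missing from the copied directory. The function names are hypothetical, the container path is taken from the volume mount used in this article, and the expected file names nifi.properties and flow.xml.gz are an assumption about what a NiFi 1.14 conf directory contains.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence

# Path of the conf directory inside the container, as used in the volume mount.
CONTAINER_CONF = "/opt/nifi/nifi-current/conf"


def docker_cp_command(container_id: str, target_dir: str = "./nifi") -> List[str]:
    """Build the `docker cp` call that copies conf out of the container."""
    return ["docker", "cp", f"{container_id}:{CONTAINER_CONF}", target_dir]


def copy_conf(container_id: str, target_dir: str = "./nifi") -> None:
    """Copy the conf directory from the running container into `target_dir`."""
    # docker cp places conf/ inside target_dir only if target_dir already exists.
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(docker_cp_command(container_id, target_dir), check=True)


def missing_conf_files(
    conf_dir: Path,
    expected: Sequence[str] = ("nifi.properties", "flow.xml.gz"),
) -> List[str]:
    """Return the names from `expected` that are not files in `conf_dir`."""
    return [name for name in expected if not (conf_dir / name).is_file()]
```

With the container still running, `copy_conf("7554d9c68c8f")` followed by `missing_conf_files(Path("nifi/conf"))` should return an empty list. Replace the container id with your own, exactly as for the manual command.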
Step 3
We can now stop the docker containers by pressing CTRL+C in the command line window in which we started them with docker-compose up.
Step 4
In the docker-compose.yml look for the commented line in which the volume gets mounted (- ./nifi/conf:/opt/nifi/nifi-current/conf) and uncomment it. Note: be careful to adhere to the same indentation as the lines above it.
Uncommenting the line will mount the directory conf of the docker container to our local directory.
Now let’s start our NiFi service again with docker-compose up --force-recreate.
We use the flag --force-recreate to make sure that our containers pick up our latest changes to the docker-compose.yml file. It does not influence the data stored in our mounted volumes in any way. This will also start our services - with all of our elements, controller services, registry buckets and connections intact and persisted.
Note: if you haven’t created sample processors before copying the conf directory to your local machine, don't worry. You can just create them now in NiFi and restart the docker containers to test whether the persistence worked.
Basically anything we create from now on will still be there after we restart the docker containers.
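One way to convince yourself that the flow survived a restart is to count the elements stored in the flow file before and after. The sketch below reads a gzipped flow XML file with the Python standard library. It assumes that the flow lives in conf/flow.xml.gz and that processors and process groups appear as `processor` and `processGroup` elements, which matches my understanding of NiFi 1.14 but is not something this article's setup guarantees. The function name is made up for this example.

```python
import gzip
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict


def count_flow_elements(flow_file: Path) -> Dict[str, int]:
    """Count processor and process group elements in a gzipped flow XML file."""
    with gzip.open(flow_file, "rb") as handle:
        root = ET.parse(handle).getroot()
    return {
        # Element names are an assumption about the flow file's XML layout.
        "processors": sum(1 for _ in root.iter("processor")),
        "process_groups": sum(1 for _ in root.iter("processGroup")),
    }
```

Run `count_flow_elements(Path("nifi/conf/flow.xml.gz"))` once before stopping the containers and once after docker-compose up --force-recreate. Identical counts indicate that the mounted conf directory is doing its job.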
Closing thought
Having an Apache NiFi application running as a fully persisted docker service brings many advantages, the least of which is being able to continue our work after recreating or restarting our docker services. It is also the ideal setup to get to know NiFi, NiFi’s registry and any other connecting services a bit better than before. Even if you decide to change the setup or add new services and connect them differently, you won’t ever have to recreate those data pipelines!
As always, we’re never done learning. Find out more about …
- Apache NiFi’s purpose
- Apache NiFi registry’s purpose
- Getting started with docker compose
- UUIDs and whether you can assign one to every grain of sand on earth
- How to create UUIDs in Python
Follow me on Medium for more articles about data engineering tools as well as software and personal development!
And, as promised, here is the docker-compose file:
Host a fully persisted Apache NiFi service with docker was originally published in Towards Data Science on Medium.