31. 12. 2024 Damiano Chini Automation, Development, DevOps

Maintaining forks of upstream projects without Git

When you adopt an open-source project that you do not own, you may need to modify parts of it to meet your specific requirements. Keep in mind, however, that the upstream project will continue to evolve, and its updates can conflict with your changes in any file that both you and the upstream project have altered.

Typically, you can manage this process using Git. You would create a fork of the upstream repository, make your modifications on your fork, and when it’s time to update your fork with the latest version of the upstream repository, you would perform a merge and resolve any conflicts in the modified files.

However, there are instances where your requirements may prevent you from forking the upstream Git project. For example, you might need to make changes directly to artifacts such as tar.gz or RPM files, particularly if you want to avoid or cannot repeat the upstream build processes.

This situation arises with certain NetEye components, where the NetEye CI re-packages some upstream RPM files. In these cases, we take the original RPM, apply our customizations, and then rebuild the RPM.

Since we cannot rely on Git’s functionalities to manage our forks in this context, we have developed a custom solution that allows us to easily and safely maintain the modifications made to the upstream project.

Desired features

To efficiently and safely manage our forks while integrating updates from the upstream project, we aimed to implement the following features:

  • The Continuous Integration (CI) process should operate without requiring any manual intervention if a forked file remains unchanged by the upstream project.
    • This approach ensures that the majority of updates proceed smoothly, allowing developers to focus on their work without unnecessary interruptions.
  • The CI process should halt and prompt the developer for action if a forked file has been modified by the upstream project.
    • In this scenario, manual intervention is essential, as it allows the developer to carefully assess how the upstream changes should be integrated into the fork.

Solution

As an example, suppose we want to customize the configuration file that the upstream project installs at /etc/cool_project.conf.

Our primary objective is to maintain a Git repository that tracks the following files:

  • Customized Forked File: We’ll refer to this as cool_project.conf.customized.
    • This is the version that will be deployed in production at the path /etc/cool_project.conf.
  • Latest Upstream Version: This will be named cool_project.conf.orig.
    • This file holds the unmodified upstream content from the last version we integrated. Comparing it with the file shipped in a new upstream release reveals any changes the upstream project made to the configuration file.
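Tracking both files side by side also makes it easy to see exactly which customizations the repository carries. A minimal sketch in Python, assuming both files are plain text; the helper name customization_diff is hypothetical and not part of the NetEye CI:

```python
import difflib
from pathlib import Path


def customization_diff(orig: Path, customized: Path) -> str:
    """Return a unified diff from the tracked upstream file to our customized one."""
    return "".join(
        difflib.unified_diff(
            orig.read_text().splitlines(keepends=True),
            customized.read_text().splitlines(keepends=True),
            fromfile=orig.name,
            tofile=customized.name,
        )
    )
```

Calling customization_diff(Path("cool_project.conf.orig"), Path("cool_project.conf.customized")) returns an empty string when nothing has been customized, and otherwise the lines that would need to be re-applied on top of a new upstream version.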

At this stage, we can utilize the following pseudo-code during the build process to implement our two desired features:

new_cool_project_version = "1.2.3-4"
# download upstream artifact (e.g. tar.gz or rpm)
new_cool_project_artifact = get_artifact(new_cool_project_version)
# unpack the artifact in a temporary folder
unpacked_artifact_folder = "/tmp/my_unpacked_artifact_folder"
unpack_artifact(new_cool_project_artifact, unpacked_artifact_folder)
forked_file_path = unpacked_artifact_folder + "/etc/cool_project.conf"

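# compare the content of the new upstream file with the tracked original
if (read_file(forked_file_path) != read_file("cool_project.conf.orig")) {
    # upstream changed the forked file: stop and ask the developer to act
    print_diff(forked_file_path, "cool_project.conf.orig")
    print("Please do the following: port the changes to cool_project.conf.customized, update cool_project.conf.orig and relaunch the build")
    exit 1
}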

# install the customized file at its final path inside the unpacked artifact
copy("cool_project.conf.customized", forked_file_path)
# reconstruct the artifact that now will contain our customized file
repack(unpacked_artifact_folder)
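
The check at the core of the pseudo-code above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the actual NetEye CI code: the function apply_customization and the exception UpstreamChangedError are hypothetical names, the forked file is assumed to be plain text, and downloading, unpacking and repacking are left out because they depend on the artifact format.

```python
import difflib
import filecmp
import shutil
from pathlib import Path


class UpstreamChangedError(Exception):
    """Raised when upstream modified a file that we maintain a fork of."""


def apply_customization(unpacked_file: Path, orig: Path, customized: Path) -> None:
    """Overwrite unpacked_file with customized, but only if upstream left it unchanged."""
    # shallow=False forces a comparison of the file contents, not just metadata.
    if not filecmp.cmp(unpacked_file, orig, shallow=False):
        diff = difflib.unified_diff(
            orig.read_text().splitlines(keepends=True),
            unpacked_file.read_text().splitlines(keepends=True),
            fromfile=str(orig),
            tofile=str(unpacked_file),
        )
        raise UpstreamChangedError(
            "Upstream changed the forked file:\n"
            + "".join(diff)
            + f"\nPort the changes to {customized}, update {orig} and relaunch the build."
        )
    # Upstream did not touch the file: install our customized version.
    shutil.copyfile(customized, unpacked_file)
```

A build script would call apply_customization once per forked file after unpacking the artifact and let an uncaught UpstreamChangedError fail the job, which gives the "halt and prompt the developer" behavior described earlier.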

Conclusion

In this blog post, we explored a low-effort method that enables you to efficiently and safely maintain a fork of an external project. This is particularly useful when you need to work directly with artifacts such as tar.gz or RPM files, where the conventional method of forking a Git project may not be applicable.

Based on our experience, this approach offers a straightforward way to keep your forks synchronized with the upstream project. It minimizes the risk of “losing” changes from upstream while keeping manual intervention to a minimum during updates. This method has proven especially effective for forks of configuration files, which are typically limited in number and infrequently updated by upstream projects.
