Managing Large Debian Repositories with Pulp

Published at
11/22/2024
Categories
beginners
debian
tutorial
programming
Author
atixag

Pulp is a free, open-source platform for software repository management. You can fetch, upload, and distribute content from various sources. Repository versioning makes sure that nothing is lost as you can always roll back to previous versions. The pulp_deb plugin adds APT repository support.

Pulp has supported Debian content for a while, and ATIX expanded that support for use with Katello a few years ago. It works well for small to medium-sized repositories. For large repositories, however, performance is not ideal.

Challenge

Around 2019, ATIX consultants wanted to synchronize all of Debian Stretch and Ubuntu Xenial for a demo. Unfortunately, they found that the sync generally ran for about five hours, only to fail with a “Cannot allocate memory” error. What was going on?

To answer this question, they needed to take a closer look at the pulp_deb implementation. The code is organized into several steps. The implementation relies heavily on the python-debpkgr dependency, which in turn relies on deb822 from the python-debian library. python-debpkgr is mainly designed to take a pile of Debian packages and organize them into an APT repository. The structure of a Debian repository looks like this:

/dists/stretch/Release
/dists/stretch/main/binary-amd64/Packages
/dists/stretch/contrib/binary-amd64/Packages
/dists/stretch/non-free/binary-amd64/Packages
/pool/
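As a minimal sketch of how this layout follows from a release name, a list of components, and a list of architectures, the index paths can be generated like this. The function name index_paths is hypothetical and not part of pulp_deb or debpkgr.

```python
from itertools import product


def index_paths(release, components, architectures):
    """Return the Release path plus one Packages path per component/architecture pair."""
    paths = [f"/dists/{release}/Release"]
    for component, arch in product(components, architectures):
        paths.append(f"/dists/{release}/{component}/binary-{arch}/Packages")
    return paths


# Reproduce the index files from the listing above (the /pool/ tree holds the .deb files).
paths = index_paths("stretch", ["main", "contrib", "non-free"], ["amd64"])
for path in paths:
    print(path)
```

Every additional architecture adds one more Packages file per component, which is why the number of package lists grows quickly for large distributions.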

During a sync, we have the “MetadataStep,” which is provided with a list of releases, components, and packages (with metadata) from MongoDB. It then applies the following logic: for every combination of architecture, component, and release, a list of packages is generated. These lists contain the paths to the actual .deb package files on the disk. Finally, each list is passed to a debpkgr call as an argument.
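The grouping logic can be sketched roughly as follows. This is an illustration under assumptions, not the pulp_deb code: the record layout, the field names, and the function build_package_lists are all hypothetical, and the package entries are made-up placeholders.

```python
from collections import defaultdict

# Hypothetical records standing in for the package units read from the database.
packages = [
    {"release": "stretch", "component": "main", "architecture": "amd64",
     "path": "pool/main/f/foo/foo_1.0_amd64.deb"},
    {"release": "stretch", "component": "main", "architecture": "amd64",
     "path": "pool/main/b/bar/bar_2.0_amd64.deb"},
    {"release": "stretch", "component": "contrib", "architecture": "amd64",
     "path": "pool/contrib/b/baz/baz_0.1_amd64.deb"},
]


def build_package_lists(packages):
    """Group .deb paths by (release, component, architecture)."""
    lists = defaultdict(list)
    for pkg in packages:
        key = (pkg["release"], pkg["component"], pkg["architecture"])
        lists[key].append(pkg["path"])
    return dict(lists)


package_lists = build_package_lists(packages)
for key, paths in package_lists.items():
    # In the old implementation, each such list was handed to debpkgr.
    print(key, len(paths))
```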

As noted, debpkgr is mainly designed to take a pile of Debian packages and turn them into a repo. So, it does just that: each .deb file is accessed on the disk to extract the metadata debpkgr needs. Because the package lists overlap for different architectures, many of these .deb files are actually parsed multiple times.
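To illustrate the overlap: in APT repositories, architecture-independent packages (Architecture: all) are typically listed in the index of every concrete architecture. The sketch below counts how often each file would be opened if every list were processed independently. The package lists are made-up placeholders and parse_counts is a hypothetical helper; whether this was the only source of overlap in pulp_deb is an assumption.

```python
from collections import Counter

# Hypothetical per-architecture lists; the "_all" package appears in both.
package_lists = {
    ("stretch", "main", "amd64"): [
        "pool/main/f/foo/foo_1.0_amd64.deb",
        "pool/main/d/docs/docs_1.0_all.deb",
    ],
    ("stretch", "main", "i386"): [
        "pool/main/f/foo/foo_1.0_i386.deb",
        "pool/main/d/docs/docs_1.0_all.deb",
    ],
}


def parse_counts(package_lists):
    """Count how many times each .deb path occurs across all lists."""
    counts = Counter()
    for paths in package_lists.values():
        counts.update(paths)
    return counts


counts = parse_counts(package_lists)
repeated = {path: n for path, n in counts.items() if n > 1}
print(repeated)
```

With many architectures and tens of thousands of packages, this repeated disk access and parsing adds up.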

The solution

Our experts’ first thought was: maybe there’s a quick-and-dirty fix? However, they also considered a complete redesign of the way debpkgr works. Another alternative might be dropping debpkgr (from the MetadataStep) and implementing everything themselves.

The basic idea was to use information from MongoDB exclusively to create the repository structure. The old implementation already had to parse the metadata from MongoDB in order to generate the lists that were then passed to debpkgr. This essentially remained unchanged. What was new: our experts had to create the desired directory structure and build the symlinks to the actual .deb files themselves. They then needed the ability to write Packages and Release files. As one always does, they happened upon a few stumbling blocks:
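A rough sketch of that idea, assuming the metadata for each package is already available as a dictionary: create the directories, symlink each .deb into the pool, and write a Packages index without opening any .deb file. The helpers format_stanza and publish, the record layout, and all values are hypothetical; writing the Release file is left out.

```python
import os
import tempfile


def format_stanza(fields):
    """Render one package as a Packages-style paragraph of 'Key: value' lines."""
    return "".join(f"{key}: {value}\n" for key, value in fields.items())


def publish(root, release, component, arch, packages):
    """Create the index directory, symlink the .deb files, and write Packages."""
    index_dir = os.path.join(root, "dists", release, component, f"binary-{arch}")
    os.makedirs(index_dir, exist_ok=True)
    stanzas = []
    for pkg in packages:
        link = os.path.join(root, pkg["fields"]["Filename"])
        os.makedirs(os.path.dirname(link), exist_ok=True)
        os.symlink(pkg["storage_path"], link)  # point at the stored file, no copy
        stanzas.append(format_stanza(pkg["fields"]))
    index_path = os.path.join(index_dir, "Packages")
    with open(index_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(stanzas))  # paragraphs are separated by a blank line
    return index_path


# Demo with a placeholder file standing in for a stored .deb.
root = tempfile.mkdtemp()
stored = os.path.join(root, "storage", "abc123.deb")
os.makedirs(os.path.dirname(stored))
with open(stored, "wb") as handle:
    handle.write(b"placeholder")

packages = [{
    "storage_path": stored,
    "fields": {
        "Package": "foo",
        "Version": "1.0",
        "Architecture": "amd64",
        "Filename": "pool/main/f/foo/foo_1.0_amd64.deb",
        "SHA256": "0" * 64,  # placeholder, not a real digest
    },
}]
index_path = publish(root, "stretch", "main", "amd64", packages)
print(open(index_path, encoding="utf-8").read())
```

The point of the design is that everything written here comes from stored metadata, so the cost no longer depends on re-reading package files.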

  • debpkgr generates md5sum, sha1, and sha256 for metadata. The existing database model only stored sha256 hashes.

  • Actually using the metadata from the database revealed a bug.

  • User-defined metadata fields were not stored in the existing database model.
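Regarding the missing md5sum and sha1 values: the article does not say how this gap was closed. One possible approach, shown here purely as an assumption, is to compute all three digests in a single pass when a package is first stored, so they can be saved alongside the other metadata. The function file_checksums is hypothetical.

```python
import hashlib
import os
import tempfile


def file_checksums(path, chunk_size=1024 * 1024):
    """Compute md5, sha1, and sha256 of a file while reading it only once."""
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# Demo on a small temporary file standing in for a .deb package.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as handle:
    handle.write(b"hello")

checksums = file_checksums(demo_path)
os.remove(demo_path)
print(sorted(checksums))
```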

Our consultants came up with the following results:

  • Two major pull requests:

1. Ensure the db is used consistently by quba42 · Pull Request #61 · pulp/pulp_deb

2. MetadataStep performance by quba42 · Pull Request #57 · pulp/pulp_deb

  • An end to our memory problems

  • Syncs for medium-sized repositories (1500 packages) that are more than twice as fast

  • Syncing Ubuntu Xenial (main, restricted, universe, multiverse) for amd64 (53,837 packages) within 3h36m on the test system

What did everyone learn? It is important to know your tools! Furthermore, you have to take your time to plan the architecture and gain the required domain knowledge.
