Wrong Copying Progress and Long Disk Unmounting on Ubuntu
Sometimes I need to copy files from an internal SSD to an external hard drive. Recently I noticed strange behavior on Ubuntu 20.04. When I copy a fairly big file (around 2-4 GB), Nautilus shows an insane write speed and the progress bar reaches 100% almost instantly, but when I try to unmount the device, it freezes for several minutes. I tried different file managers such as Thunar from Xfce and midnight-commander, and even utilities like rsync and pv, and the result was the same. It’s inconvenient not to know how long I have to wait until all the data is transferred, so I looked for a solution to this problem.
Searching the Internet showed me that this problem is not specific to a file manager or Linux distribution. It is explained by the Linux virtual memory subsystem and its page cache mechanism.
All types of computer memory can be divided into two groups: volatile and non-volatile. Volatile memory is usually fast, but it has a limited size and requires power to maintain the stored information. In contrast, non-volatile storage such as an HDD or an optical disc retains the stored information even after power is removed, but it is slower than RAM. Because of this, Linux and other operating systems use different approaches for accessing different kinds of memory.
RAM is fast enough that a process accessing it just specifies an address and waits until the operation completes. For non-volatile storage, this approach would increase latency and make the system unresponsive, so it’s better to work asynchronously as much as possible. Linux stages disk writes in a cache in RAM and flushes them to disk asynchronously over time. This speeds up disk I/O, but it has some drawbacks.
When the cache is empty and a process executes a write operation, it receives almost instant feedback, but when the flush command is called, there is a long pause while the data is actually transferred from RAM to the disk. By default, Ubuntu lets dirty (not yet written) data fill up to 20% of available memory, which is why, when I have a lot of free RAM, the cache can hold the whole file. This leads to the problem described in the first paragraph.
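One way to watch this happen is to look at how much data is still waiting in the cache. The kernel reports it in the Dirty and Writeback fields of /proc/meminfo. Below is a minimal sketch; the helper name meminfo_kb is made up for illustration.

```shell
# Print the value (in kB) of one field from a meminfo-style file.
# meminfo_kb is a hypothetical helper; the file defaults to /proc/meminfo.
meminfo_kb() {
    field=$1
    file=${2:-/proc/meminfo}
    # Match the field name exactly, so "Writeback:" does not match "WritebackTmp:".
    awk -v f="$field:" '$1 == f { print $2 }' "$file"
}

# Example (run in a second terminal while a copy is in progress):
#   while sleep 1; do
#       echo "Dirty: $(meminfo_kb Dirty) kB, Writeback: $(meminfo_kb Writeback) kB"
#   done
```

During a large copy, Dirty typically grows quickly and then shrinks slowly while the data is actually written to the drive.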
There are several possible solutions.
Reduce cache size
There are some tunable settings that influence how the Linux kernel deals with the file system cache. All of them are connected with dirty data (or dirty memory) - data that is written into the cache but not saved on disk.
dirty_ratio - maximum percentage of dirty system memory
dirty_bytes - the same as dirty_ratio but specified in bytes
dirty_background_ratio - percentage of dirty system memory at which background writeback will start
dirty_background_bytes - the same as dirty_background_ratio but specified in bytes
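Before changing anything, it can help to see the current values. The settings listed above are exposed as files under /proc/sys/vm, so a short loop is enough. This is only a sketch; show_dirty_settings is a hypothetical helper name.

```shell
# Print the four writeback tunables from a sysctl directory.
# show_dirty_settings is a hypothetical helper; the directory defaults to /proc/sys/vm.
show_dirty_settings() {
    dir=${1:-/proc/sys/vm}
    for name in dirty_ratio dirty_bytes dirty_background_ratio dirty_background_bytes; do
        printf '%s = %s\n' "$name" "$(cat "$dir/$name")"
    done
}

# Usage:
#   show_dirty_settings
```

The ratio and bytes variants are alternatives: when one of a pair is set, the other reads as 0.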
If my computer has 32 GB of RAM and dirty_ratio is 20, the cache can grow to over 6 GB, which leads to a long unmounting time. It’s possible to reduce the cache size to 48 MB and ask the operating system to start writing to the device once the cache holds more than 16 MB of data, using these commands:
sudo bash -c 'echo $((16*1024*1024)) > /proc/sys/vm/dirty_background_bytes'
sudo bash -c 'echo $((48*1024*1024)) > /proc/sys/vm/dirty_bytes'
In this case, the copy progress shows the correct speed, and unmounting takes only a few seconds.
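A rough estimate shows why the limit matters so much: the worst-case wait at unmount is about the amount of dirty data divided by the write speed of the drive. The sketch below assumes a drive that sustains 30 MB/s, a figure picked only for illustration, and treats the 20% limit as a share of total RAM. The helper flush_seconds is hypothetical.

```shell
# Rough worst-case flush time: dirty data (MB) divided by drive write speed (MB/s).
# flush_seconds is a hypothetical helper; integer math, so results round down.
flush_seconds() {
    dirty_mb=$1
    speed_mb_s=$2
    echo $(( dirty_mb / speed_mb_s ))
}

default_limit_mb=$(( 32 * 1024 * 20 / 100 ))   # 20% of 32 GB, in MB
flush_seconds "$default_limit_mb" 30           # prints 218 (over three minutes)
flush_seconds 48 30                            # prints 1
```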
To keep these settings after a reboot, add the following lines to /etc/sysctl.conf:
vm.dirty_background_bytes = 16777216
vm.dirty_bytes = 50331648
These settings apply to all disks, both external and internal, which can reduce system performance.
Mount USB flash drive with “sync” or “flush” option
By default, Ubuntu mounts flash drives with the async option, which means that the cache and asynchronous writes are used. It’s possible to ask the kernel to write all data synchronously. Assume the flash drive is the /dev/sda1 device and it needs to be mounted on the /mnt directory. I can use a command like:
sudo mount /dev/sda1 /mnt -o rw,sync
This option can reduce write speed dramatically, and it can also shorten the lifetime of the device: the Linux kernel can’t reorder writes and has to write every sector in the order requested by the applications. On cheap USB drives that don’t reallocate sectors, the repeated writes to the file allocation table on (V)FAT or to the journal on a typical modern filesystem can kill the stick pretty fast.
On FAT filesystems it’s possible to use the flush option instead of sync. It asks the kernel to flush all writes as soon as the drive becomes idle, but it does not preserve the order of writes, so the kernel can optimize the write process.
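To confirm which of these options is actually in effect for a mount point, the option list can be read from /proc/mounts. This is a minimal sketch; mount_options and has_mount_option are hypothetical helper names, and the file argument exists only so the functions can be tried on sample data.

```shell
# Print the option list of a mount point from a /proc/mounts-style file.
# Fields in that file: device, mount point, filesystem type, options, dump, pass.
mount_options() {
    target=$1
    file=${2:-/proc/mounts}
    awk -v t="$target" '$2 == t { print $4 }' "$file"
}

# Succeed if the mount point (arg 1) is mounted with the option (arg 2).
# The optional third argument is the mounts file to read.
has_mount_option() {
    case ",$(mount_options "$1" "$3")," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   has_mount_option /mnt sync && echo "/mnt is mounted with sync"
```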
Use autofsync
On Ubuntu, it’s possible to intercept some system calls of a process and add custom behavior. This idea is used in a library called autofsync. It intercepts the write() call and performs a sync operation once a certain amount of data has been written to a file. The limit is adjusted at run time to keep sync durations around a predefined value. The goal is to express the writeback cache size limit in seconds rather than in bytes.
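The idea of a limit that follows sync duration can be sketched with simple arithmetic. The function below is not autofsync’s actual code: it assumes a plain proportional rule (if the last sync took twice the target time, halve the limit), and the name next_limit and the 1 MiB and 256 MiB bounds are arbitrary choices for illustration.

```shell
# Scale the byte limit so that the next sync should take about target_ms.
# next_limit is a hypothetical helper, not autofsync's real algorithm.
next_limit() {
    limit=$1        # current limit in bytes
    elapsed_ms=$2   # how long the last sync took
    target_ms=$3    # how long a sync should take
    min=$(( 1024 * 1024 ))          # assumed lower bound: 1 MiB
    max=$(( 256 * 1024 * 1024 ))    # assumed upper bound: 256 MiB

    # Guard against division by zero for very fast syncs.
    if [ "$elapsed_ms" -le 0 ]; then elapsed_ms=1; fi

    new=$(( limit * target_ms / elapsed_ms ))
    if [ "$new" -lt "$min" ]; then new=$min; fi
    if [ "$new" -gt "$max" ]; then new=$max; fi
    echo "$new"
}

# Usage: the last sync of 16 MiB took 2 s, the target is 1 s
#   next_limit $((16 * 1024 * 1024)) 2000 1000
```

With a rule like this, a slow flash drive ends up with a small limit and a fast disk with a large one, without any per-device configuration.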
The library should be attached to the process using LD_PRELOAD:
# Download and build library
git clone https://github.com/i-rinat/autofsync.git
cd autofsync
cmake .
make
# Start file manager
LD_PRELOAD=$PWD/autofsync.so mc
I prefer this method because it does not depend on the RAM size or the flash drive speed, and it is filesystem-independent. It changes the behavior of a particular process and does not affect the performance of the whole system.
Conclusion
Using external disks is a rare operation nowadays, but that doesn’t mean it should be inconvenient. Any of the solutions from this article can make the user experience much better.