<h1><a href="https://mjtsai.com/blog/2015/03/10/using-cp-to-copy-a-lot-of-files/">Using cp to Copy a Lot of Files</a></h1>
<p>mjtsai.com, March 10, 2015</p>
<p><a href="http://lists.gnu.org/archive/html/coreutils/2014-08/msg00012.html">Rasmus Borup Hansen</a> (via <a href="https://news.ycombinator.com/item?id=8305283">Hacker News</a>):</p>
<blockquote cite="http://lists.gnu.org/archive/html/coreutils/2014-08/msg00012.html">
<p>Having almost used up the capacity we decided to order another storage enclosure, copy the files from the old one to the new one, and then get the old one into a trustworthy state and use it to extend the total capacity. Normally I&rsquo;d have copied/moved the files at block-level (eg. using dd or pvmove), but suspecting bad blocks, I went for a file-level copy because then I&rsquo;d know which files contained the bad blocks. I browsed the net for other peoples&rsquo; experience with copying many files and quickly decided that cp would do the job nicely. Knowing that preserving the hardlinks would require bookkeeping of which files have already been copied I also ordered 8 GB more RAM for the server and configured more swap space.</p>
<p>[&#8230;]</p>
<p>After some days of copying the first real surprise came: I noticed that the copying had stopped, and cp did not make any system calls at all according to strace. Reading the source code revealed that cp keeps track of which files have been copied in a hash table that now and then has to be resized to avoid too many collisions. When the RAM has been used up, this becomes a slow operation.</p>
<p>Trusting that resizing the hash table would eventually finish, the cp command was allowed to continue, and after a while it started copying again. It stopped again and resized the hash table a couple of times, each taking more and more time. Finally, after 10 days of copying and hash table resizing, the new file system used as many blocks and inodes as the old one according to df, but to my surprise the cp command didn&rsquo;t exit. Looking at the source again, I found that cp disassembles its hash table data structures nicely after copying (the forget_all call). Since the virtual size of the cp process was now more than 17 GB and the server only had 10 GB of RAM, it did a lot of swapping.</p>
</blockquote>
<p>As far as I know, the Mac version of <code>cp</code> does not preserve hard links.</p>