Makefile Backup

make is a utility originally built for recompiling C source code. In the early days CPUs had limited power and compiling a large code base was time consuming, because the whole code base had to be recompiled every time. This is where make comes into play. make reads a file which contains a set of dependencies for a certain object file/target, and when any of the dependency files is modified or updated, make recompiles only that particular target. Thus only the files which were updated get compiled, reducing the total time and CPU usage. make does this by checking whether a dependency's modification time is newer than its target's; if it is, the required commands are executed.

The beauty and flexibility of a makefile come from its syntax:

target ... : prerequisites ...
          recipe
          ...

The target can be the name of a file to be created or an action to be carried out. Prerequisites are the list of files used as input to create the target. The recipe contains the set of rules for the creation of the target; it is basically some shell commands.
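As a minimal sketch (the file names here are purely illustrative), a rule to build an object file from a C source file looks like this:

main.o : main.c defs.h
	cc -c main.c

If main.c or defs.h has a modification time newer than main.o, make runs the recipe; otherwise it reports that main.o is up to date and does nothing.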

Other than compiling C, a makefile can be used to automate repetitive tasks and as a scripting language. It is better at things which would take heavy coding when done with normal shell scripting; it depends on the situation.
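For example, a tiny rule to automate a repetitive task might look like this (the paths are made up for illustration; .PHONY tells make that backup is an action to run, not a file to build):

.PHONY: backup
backup:
	rsync -a ~/notes/ /mnt/backup/notes/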

While trying to create an alternative to a cloud service, I stumbled upon a question: why am I using the cloud in the first place?

  1. Backup
  2. Access and Download files any time
  3. Sync between devices

This script is good for backup and archival of data in an encrypted format. There is a problem with encryption, though: it is designed to be random. Consider a situation where I have a folder containing many files with a total size of 100MB, and I save it to the cloud by creating an archive and encrypting it. Now if I update some files in the directory and re-sync to the remote, the new encrypted archive will be totally different from the previous one, and the whole file has to be uploaded again. We could instead encrypt all the files recursively, so that only the files which have changed need to be re-uploaded to the server, but there are problems here too: for one, the file names should be obfuscated, and we also need to track which files have changed over the course of time and encrypt only those files.
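The randomness is easy to verify: gpg generates a fresh random session key for every encryption, so encrypting the same archive twice produces two completely different ciphertexts (the recipient address and file names here are illustrative):

gpg -o one.gpg --encrypt --recipient alice@example.com data.tar.gz
gpg -o two.gpg --encrypt --recipient alice@example.com data.tar.gz
sha256sum one.gpg two.gpg    # the two hashes differ

Any delta-based sync tool therefore sees the updated archive as an entirely new file.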

#Encryption Setting
RECIPIENT=vishnudevtj@gmail.com

#Remote Setting
USER=root
HOST=192.168.1.111
ARCHIVE_DEST=/root/archive

#Local Variables
ZIP:=$(wildcard *.gz)  $(wildcard *.gz.gpg)
LIST:=$(filter-out archive Makefile FILE_LOG CHANGE_LOG $(ZIP),$(shell find . -type f -printf "%P\n" | sort ))
DIR:=$(shell find . -type d)
VPATH:=$(DIR)

TIMESTAMP=$(shell date '+%Y_%b_%d_%H_%M')

#upload Part

archive: $(LIST)
	@echo $(LIST) | tr ' ' '\n' > CHANGE_LOG
	tar cvfz $(TIMESTAMP).tar.gz $? CHANGE_LOG && rm CHANGE_LOG
	touch archive

sync: encrypt
	rsync -avz *.tar.gz.gpg $(USER)@$(HOST):$(ARCHIVE_DEST)

encrypt:
	gpg --encrypt-files --recipient $(RECIPIENT)  *.tar.gz

archive_full:
	@echo $(LIST) | tr ' ' '\n' > FILE_LOG
	tar cvfz  $(TIMESTAMP)_BKP.tar.gz $(LIST) FILE_LOG && rm FILE_LOG
#full: clean archive_full encrypt sync clean
#time: archive encrypt sync clean
and:
	@echo

#Recovery Part

list:
	ssh $(USER)@$(HOST) "ls $(ARCHIVE_DEST) | sort -M  "

recover: download_archive decrypt

decrypt:
	gpg --decrypt-files *.gpg

.ONESHELL:
SHELL=/bin/bash
download_archive:
	ssh $(USER)@$(HOST) "ls $(ARCHIVE_DEST) | sort -M  " > file_list
	select i in $$(cat file_list);\
	do break; done 
	LOWER=$$(grep -n $$i file_list  | cut -d":" -f1)
	UPPER=$$(cat file_list | head -n $$LOWER | sort -r | grep BKP -n | cut -d":" -f1 | head -n1)
	FILE_LIST=$$(sed -n "$$(( $$LOWER - $$UPPER + 1)),$$LOWER p" file_list | sed 's|.*|:$(ARCHIVE_DEST)/&|')
	rm file_list
	rsync -avz $(USER)@$(HOST)$$FILE_LIST .
.ONESHELL:
extract:
	tar xvf *BKP.tar.gz
	for i in $$( ls *.gz | sort -M | grep -v "BKP.tar.gz") ; do 
	if [ -f CHANGE_LOG ];then 
	mv CHANGE_LOG FILE_LOG; 
	fi 
	echo Extracting : $$i
	tar xvf $$i ;
	for j in $$(diff  FILE_LOG CHANGE_LOG | grep '<' | cut -d" " -f2);do
	mv $$j $$(dirname $$j)/_DELETED_$$(basename $$j);
	done
	done
	rm CHANGE_LOG FILE_LOG;

tools:
	apt install bleachbit gpg tar rsync
clean:
	bleachbit -s *.tar.gz 

This is the script file I created; let's break it down for analysis.

Initially we set some variables, such as the remote server settings and the recipient address with which the encryption is done. We can change these variables for different folders to get different results. The other variables are:

ZIP:=$(wildcard *.gz)  $(wildcard *.gz.gpg)
LIST:=$(filter-out archive Makefile FILE_LOG CHANGE_LOG $(ZIP),$(shell find . -type f -printf "%P\n" | sort ))
DIR:=$(shell find . -type d)
VPATH:=$(DIR)

TIMESTAMP=$(shell date '+%Y_%b_%d_%H_%M')

wildcard and shell are built-in make functions; $(..) returns the result of the function, just as it returns the output of a command in shell scripting. wildcard returns the files in the current directory which match the pattern, and the shell function executes a shell command. filter-out is another function: it removes the specified strings from a list. VPATH is a special variable in make which specifies the locations, other than the current directory, where dependencies should be searched for. ZIP contains all the files matching the patterns *.gz and *.gz.gpg, i.e. all the archive files.
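A quick sketch of how filter-out behaves, with illustrative values:

FILES := a.txt b.txt archive
KEEP  := $(filter-out archive,$(FILES))
show:
	@echo $(KEEP)    # prints: a.txt b.txt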

LIST contains all the files in the current directory. Some entries are removed from it because this list is used later in the script to create the archive, and we do not want it to contain our previously backed-up archives and the bookkeeping files; the filter-out function filters out these values. The shell function runs a find command which prints all the files in the directory recursively. Normally find prints files starting with ./, which we don't want, so we use the -printf option of find to print them without the ./ prefix; -printf is a nice feature of find which allows us to format the results in many useful ways. DIR contains the list of all directories, and TIMESTAMP is the time stamp used to name each archive.
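For instance, assuming a file notes/todo.txt exists under the current directory, the difference looks like this (%P is the path with the starting point stripped):

find . -type f                  # prints ./notes/todo.txt
find . -type f -printf "%P\n"   # prints notes/todo.txt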

archive: $(LIST)
	@echo $(LIST) | tr ' ' '\n' > CHANGE_LOG
	tar cvfz $(TIMESTAMP).tar.gz $? CHANGE_LOG && rm CHANGE_LOG
	touch archive

sync: encrypt
	rsync -avz *.tar.gz.gpg $(USER)@$(HOST):$(ARCHIVE_DEST)

encrypt:
	gpg --encrypt-files --recipient $(RECIPIENT)  *.tar.gz

archive_full:
	@echo $(LIST) | tr ' ' '\n' > FILE_LOG
	tar cvfz  $(TIMESTAMP)_BKP.tar.gz $(LIST) FILE_LOG && rm FILE_LOG

As I mentioned earlier, a make rule consists of the target, its dependencies and the commands. The first target is not an action, unlike all the others, because there exists a file named archive. This is a neat trick: the archive file is touched whenever the rule runs, and when we run make again it checks whether any of the files in LIST, which is all the files in the directory, have a modification time newer than that of the archive file, i.e. the time the archive rule was last run. There are many automatic variables in make which hold special run-time values; $? contains the list of prerequisites which have changed, so this list is passed to the tar command, which archives only the files that changed since the last archive. Usually make prints the command it is going to run; @ suppresses this behaviour. The rule also writes a list of the current files to a file named CHANGE_LOG and archives that file as well. There are other targets such as encrypt, which encrypts all the archives in the current folder, and archive_full, which archives all the files. The sync target syncs the encrypted archives to the specified destination. This is the archival part, which is responsible for the backup.
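A minimal sketch of the $? trick in isolation (the file names are illustrative): $? expands to only those prerequisites that are newer than the target.

stamp: a.txt b.txt
	@echo changed since last run: $?
	touch stamp

On the first run both a.txt and b.txt appear in $?; if you then touch only b.txt and run make again, the recipe prints just b.txt.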

list:
	ssh $(USER)@$(HOST) "ls $(ARCHIVE_DEST) | sort -M  "

recover: download_archive decrypt

decrypt:
	gpg --decrypt-files *.gpg

.ONESHELL:
SHELL=/bin/bash
download_archive:
	ssh $(USER)@$(HOST) "ls $(ARCHIVE_DEST) | sort -M  " > file_list
	select i in $$(cat file_list);\
	do break; done 
	LOWER=$$(grep -n $$i file_list  | cut -d":" -f1)
	UPPER=$$(cat file_list | head -n $$LOWER | sort -r | grep BKP -n | cut -d":" -f1 | head -n1)
	FILE_LIST=$$(sed -n "$$(( $$LOWER - $$UPPER + 1)),$$LOWER p" file_list | sed 's|.*|:$(ARCHIVE_DEST)/&|')
	rm file_list
	rsync -avz $(USER)@$(HOST)$$FILE_LIST .
.ONESHELL:
extract:
	tar xvf *BKP.tar.gz
	for i in $$( ls *.gz | sort -M | grep -v "BKP.tar.gz") ; do 
	if [ -f CHANGE_LOG ];then 
	mv CHANGE_LOG FILE_LOG; 
	fi 
	echo Extracting : $$i
	tar xvf $$i ;
	for j in $$(diff  FILE_LOG CHANGE_LOG | grep '<' | cut -d" " -f2);do
	mv $$j $$(dirname $$j)/_DELETED_$$(basename $$j);
	done
	done
	rm CHANGE_LOG FILE_LOG;

This part downloads the archives from the remote server, decrypts the files and extracts them. The list target prints the archives currently available on the remote server. download_archive lists all the available archive files and reads the name of one archive; this is done using bash's select builtin. There are two types of backup: an incremental one, which only contains the files that changed since the last time make archive was run, and a full backup, which is a snapshot of all the files at that point in time. When we select an incremental backup, the script also downloads all the preceding archives back to the last full snapshot. This is done using some UNIX shell wizardry.

The extract target extracts all the tar files. When each archive is created, a file listing the current directory structure is appended, and during recovery the archives are extracted one over another, like the layers of an onion. Sometimes files get deleted over the course of time, and when we extract the archives those files reappear. During extraction, if an archive's file list does not contain a file which is already present in the directory, it means the file was deleted; such files get renamed with a _DELETED_ prefix. This makes it simple to remove them: a single find command, find . -name "*DELETED*" -delete, will delete them all. It also enables easy recovery of deleted files.
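For reference, here is bash's select builtin on its own, with illustrative menu entries; run directly in bash a single $ is enough, while inside a make recipe every $ has to be doubled:

select i in 2018_Jan_01_BKP.tar.gz 2018_Jan_05.tar.gz; do
	echo "you picked: $i"
	break
done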

and:
	@echo
clean:
	bleachbit -s *.tar.gz 

The clean target securely shreds all the unencrypted archives, and the and target just prints a newline. It is just an empty target, but it lets me run make in a fun and readable way, like 'make archive and sync'. It just makes sense!

This approach has many advantages. First, it enables encrypted storage of files: the files are encrypted on our own computer with our own key. It reduces the space needed for the archive, because only the files that changed are uploaded, and best of all it uses rsync. We can also recover deleted files from previous archives. The best part, though, is the discovery of something new.