This HTML page is knitted from RMarkdown on the teaching server.
Linux is the most common platform for scientific computing and deployment of data science tools.
Open source and community support.
Things break; when they break on Linux, they are easy to fix.
Scalability: portable devices (Android, iOS), laptops, servers, clusters, and supercomputers.
Cost: it’s free!
Debian/Ubuntu is a popular choice for personal computers.
RHEL/CentOS is popular on servers. (In December 2020, Red Hat terminated the development of CentOS Linux distribution.)
The teaching server for this class runs CentOS 7. UCLA Hoffman2 cluster runs CentOS 6.10 (as of 2020-01-03).
MacOS was originally derived from Unix (its Darwin core is based on BSD), not from Linux. It is POSIX compliant, so most shell commands we review here apply to the MacOS terminal as well. Windows/DOS, unfortunately, is a totally different breed.
Show distribution/version on Linux:
cat /etc/*-release
CentOS Linux release 7.9.2009 (Core)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
CentOS Linux release 7.9.2009 (Core)
CentOS Linux release 7.9.2009 (Core)
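Scripts often need one field rather than the whole listing. A minimal sketch, assuming the systemd-style /etc/os-release file shown above is present (it is on CentOS 7 and most current distributions, but not on MacOS). The file consists of shell variable assignments, so it can be sourced directly. The function name distro_name is hypothetical.

```shell
#!/usr/bin/env bash
# distro_name is a hypothetical helper, not a standard command.
# It prints "NAME VERSION_ID", e.g. "CentOS Linux 7" on the teaching server.
distro_name() {
  if [ -r /etc/os-release ]; then
    # Source the file in a subshell so NAME, VERSION_ID, etc.
    # do not leak into the calling shell.
    ( . /etc/os-release && echo "$NAME $VERSION_ID" )
  else
    # Fallback for systems without the file (e.g. MacOS): kernel name.
    uname -s
  fi
}

distro_name
```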
Show distribution/version on MacOS:
sw_vers -productVersion
or
system_profiler SPSoftwareDataType
A shell translates commands to OS instructions.
Most commonly used shells include bash, csh, tcsh, zsh, etc.
Starting with MacOS v10.15, the default shell changed from bash to zsh.
Sometimes a command or script does not run simply because it was written for another shell.
We mostly use bash shell commands in this class.
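One common safeguard is to name the intended interpreter on the first line of a script (the shebang line). A minimal sketch; the script and the shells array are made up for illustration:

```shell
#!/usr/bin/env bash
# The shebang line above tells the OS to run this file with bash, even if
# the user's login shell is zsh or tcsh. It takes effect when the file is
# executed directly (chmod +x script.sh; ./script.sh), not when sourced.

# Arrays are a bash feature; a plain POSIX sh such as dash rejects this line.
shells=(bash csh tcsh zsh)

echo "Running under bash $BASH_VERSION"
echo "Number of shells listed: ${#shells[@]}"
```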
Determine your login shell (SHELL records the login shell, which can differ from the shell you are currently running):
echo $SHELL
/bin/bash
List available shells:
cat /etc/shells
/bin/sh
/bin/bash
/usr/bin/sh
/usr/bin/bash
Change to another shell:
exec bash -l
The -l option indicates it should be a login shell.
Change your login shell permanently:
chsh -s /bin/bash userid
Then log out and log back in.
We can navigate to previous/next commands with the up and down arrow keys. Separately, the pushd and popd commands maintain a stack of directories, so we can jump to another directory and later return to where we were.
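A minimal sketch of the directory stack in action; the two scratch directories created with mktemp stand in for real project folders:

```shell
#!/usr/bin/env bash
# Two scratch directories stand in for real project folders.
start_dir="$PWD"
dir_a="$(mktemp -d)"
dir_b="$(mktemp -d)"

pushd "$dir_a" > /dev/null   # remember start_dir, then cd into dir_a
pushd "$dir_b" > /dev/null   # remember dir_a, then cd into dir_b
dirs -v                      # show the stack; entry 0 is the current directory
popd > /dev/null             # back to dir_a
popd > /dev/null             # back to start_dir
echo "Back in: $PWD"

rmdir "$dir_a" "$dir_b"      # clean up the scratch directories
```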
By default, Bash provides the following standard completions for Linux users, which means fewer typing errors and less time spent typing:
Pathname completion.
Filename completion.
Variable name completion: echo $[TAB][TAB].
Username completion: cd ~[TAB][TAB].
Hostname completion: ssh huazhou@[TAB][TAB].
It can also be customized to auto-complete other things, such as options and command arguments. Google bash completion for more information.
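As a small illustration of customized completion, the bash builtin complete can attach a fixed word list to a command. A sketch only: runsim is a hypothetical function defined here for the example, and the argument values are made up.

```shell
#!/usr/bin/env bash
# runsim is a hypothetical command, defined here only for illustration.
runsim() { echo "simulating with: $*"; }

# Register a word list: typing "runsim " then [TAB][TAB] in an interactive
# bash session offers these three arguments.
complete -W "n=100 n=1000 n=10000" runsim

# Show the completion rule that was just registered.
complete -p runsim
```

To keep such a rule across sessions, the complete line would go in ~/.bashrc.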
man is man's best friend. Online help for shell commands: man [COMMANDNAME].
# display documentation for the ls command
man ls
LS(1) User Commands LS(1)
NAME
ls - list directory contents
SYNOPSIS
ls [OPTION]... [FILE]...
DESCRIPTION
List information about the FILEs (the current directory by default).
Sort entries alphabetically if none of -cftuvSUX nor --sort is speci‐
fied.
Mandatory arguments to long options are mandatory for short options
too.
-a, --all
do not ignore entries starting with .
-A, --almost-all
do not list implied . and ..
--author
with -l, print the author of each file
-b, --escape
print C-style escapes for nongraphic characters
--block-size=SIZE
scale sizes by SIZE before printing them; e.g., '--block-size=M'
prints sizes in units of 1,048,576 bytes; see SIZE format below
-B, --ignore-backups
do not list implied entries ending with ~
-c with -lt: sort by, and show, ctime (time of last modification of
file status information); with -l: show ctime and sort by name;
otherwise: sort by ctime, newest first
-C list entries by columns
--color[=WHEN]
colorize the output; WHEN can be 'never', 'auto', or 'always'
(the default); more info below
-d, --directory
list directories themselves, not their contents
-D, --dired
generate output designed for Emacs' dired mode
-f do not sort, enable -aU, disable -ls --color
-F, --classify
append indicator (one of */=>@|) to entries
--file-type
likewise, except do not append '*'
--format=WORD
across -x, commas -m, horizontal -x, long -l, single-column -1,
verbose -l, vertical -C
--full-time
like -l --time-style=full-iso
-g like -l, but do not list owner
--group-directories-first
group directories before files;
can be augmented with a --sort option, but any use of
--sort=none (-U) disables grouping
-G, --no-group
in a long listing, don't print group names
-h, --human-readable
with -l, print sizes in human readable format (e.g., 1K 234M 2G)
--si likewise, but use powers of 1000 not 1024
-H, --dereference-command-line
follow symbolic links listed on the command line
--dereference-command-line-symlink-to-dir
follow each command line symbolic link
that points to a directory
--hide=PATTERN
do not list implied entries matching shell PATTERN (overridden
by -a or -A)
--indicator-style=WORD
append indicator with style WORD to entry names: none (default),
slash (-p), file-type (--file-type), classify (-F)
-i, --inode
print the index number of each file
-I, --ignore=PATTERN
do not list implied entries matching shell PATTERN
-k, --kibibytes
default to 1024-byte blocks for disk usage
-l use a long listing format
-L, --dereference
when showing file information for a symbolic link, show informa‐
tion for the file the link references rather than for the link
itself
-m fill width with a comma separated list of entries
-n, --numeric-uid-gid
like -l, but list numeric user and group IDs
-N, --literal
print raw entry names (don't treat e.g. control characters spe‐
cially)
-o like -l, but do not list group information
-p, --indicator-style=slash
append / indicator to directories
-q, --hide-control-chars
print ? instead of nongraphic characters
--show-control-chars
show nongraphic characters as-is (the default, unless program is
'ls' and output is a terminal)
-Q, --quote-name
enclose entry names in double quotes
--quoting-style=WORD
use quoting style WORD for entry names: literal, locale, shell,
shell-always, c, escape
-r, --reverse
reverse order while sorting
-R, --recursive
list subdirectories recursively
-s, --size
print the allocated size of each file, in blocks
-S sort by file size
--sort=WORD
sort by WORD instead of name: none (-U), size (-S), time (-t),
version (-v), extension (-X)
--time=WORD
with -l, show time as WORD instead of default modification time:
atime or access or use (-u) ctime or status (-c); also use spec‐
ified time as sort key if --sort=time
--time-style=STYLE
with -l, show times using style STYLE: full-iso, long-iso, iso,
locale, or +FORMAT; FORMAT is interpreted like in 'date'; if
FORMAT is FORMAT1<newline>FORMAT2, then FORMAT1 applies to
non-recent files and FORMAT2 to recent files; if STYLE is pre‐
fixed with 'posix-', STYLE takes effect only outside the POSIX
locale
-t sort by modification time, newest first
-T, --tabsize=COLS
assume tab stops at each COLS instead of 8
-u with -lt: sort by, and show, access time; with -l: show access
time and sort by name; otherwise: sort by access time
-U do not sort; list entries in directory order
-v natural sort of (version) numbers within text
-w, --width=COLS
assume screen width instead of current value
-x list entries by lines instead of by columns
-X sort alphabetically by entry extension
-1 list one file per line
SELinux options:
--lcontext
Display security context. Enable -l. Lines will probably be
too wide for most displays.
-Z, --context
Display security context so it fits on most displays. Displays
only mode, user, group, security context and file name.
--scontext
Display only security context and file name.
--help display this help and exit
--version
output version information and exit
SIZE is an integer and optional unit (example: 10M is 10*1024*1024).
Units are K, M, G, T, P, E, Z, Y (powers of 1024) or KB, MB, ... (pow‐
ers of 1000).
Using color to distinguish file types is disabled both by default and
with --color=never. With --color=auto, ls emits color codes only when
standard output is connected to a terminal. The LS_COLORS environment
variable can change the settings. Use the dircolors command to set it.
Exit status:
0 if OK,
1 if minor problems (e.g., cannot access subdirectory),
2 if serious trouble (e.g., cannot access command-line argument).
GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
Report ls translation bugs to <http://translationproject.org/team/>
AUTHOR
Written by Richard M. Stallman and David MacKenzie.
COPYRIGHT
Copyright © 2013 Free Software Foundation, Inc. License GPLv3+: GNU
GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
SEE ALSO
The full documentation for ls is maintained as a Texinfo manual. If
the info and ls programs are properly installed at your site, the com‐
mand
info coreutils 'ls invocation'
should give you access to the complete manual.
GNU coreutils 8.22 November 2020 LS(1)
cat prints the contents of a file:
cat runSim.R
## parsing command arguments
for (arg in commandArgs(TRUE)) {
eval(parse(text=arg))
}
## check if a given integer is prime
isPrime = function(n) {
if (n <= 3) {
return (TRUE)
}
if (any((n %% 2:floor(sqrt(n))) == 0)) {
return (FALSE)
}
return (TRUE)
}
## estimate mean only using observation with prime indices
estMeanPrimes = function (x) {
n = length(x)
ind = sapply(1:n, isPrime)
return (mean(x[ind]))
}
# simulate data
x = rnorm(n)
# estimate mean
estMeanPrimes(x)
head prints the first 10 lines of a file:
head runSim.R
## parsing command arguments
for (arg in commandArgs(TRUE)) {
eval(parse(text=arg))
}
## check if a given integer is prime
isPrime = function(n) {
if (n <= 3) {
return (TRUE)
}
head -N prints the first N lines of a file (head -n N is equivalent):
head -15 runSim.R
## parsing command arguments
for (arg in commandArgs(TRUE)) {
eval(parse(text=arg))
}
## check if a given integer is prime
isPrime = function(n) {
if (n <= 3) {
return (TRUE)
}
if (any((n %% 2:floor(sqrt(n))) == 0)) {
return (FALSE)
}
return (TRUE)
}
tail prints the last 10 lines of a file:
tail runSim.R
n = length(x)
ind = sapply(1:n, isPrime)
return (mean(x[ind]))
}
# simulate data
x = rnorm(n)
# estimate mean
estMeanPrimes(x)
tail -N prints the last N lines of a file (tail -n N is equivalent):
tail -15 runSim.R
return (TRUE)
}
## estimate mean only using observation with prime indices
estMeanPrimes = function (x) {
n = length(x)
ind = sapply(1:n, isPrime)
return (mean(x[ind]))
}
# simulate data
x = rnorm(n)
# estimate mean
estMeanPrimes(x)
| (pipe) sends the output of one command as input to another command.
ls -l | head -5
total 5304
-rw-rw-r--. 1 huazhou huazhou 258 Jan 7 23:30 autoSim.R
-rw-rw-r--. 1 huazhou huazhou 110345 Jan 7 23:30 Emacs_Reference_Card.pdf
-rw-rw-r--. 1 huazhou huazhou 157353 Jan 7 23:30 IDRE_Winter_2019_Workshops.pdf
-rw-rw-r--. 1 huazhou huazhou 321281 Jan 7 23:30 key_authentication_1.png
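Pipes let small commands be chained into new tools. As a sketch, head and tail together print a range of lines from the middle of a file; the numbered scratch file is made up for the example:

```shell
#!/usr/bin/env bash
# A scratch file with lines "1" to "20" (temporary; removed on exit).
lines_file="$(mktemp)"
trap 'rm -f "$lines_file"' EXIT
seq 1 20 > "$lines_file"

# Lines 5 through 10: head keeps the first 10 lines, then
# tail -n +5 prints from line 5 of that stream onward.
head -n 10 "$lines_file" | tail -n +5
```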
> redirects the output of a command to a file, overwriting the file if it already exists.
>> appends the output of a command to a file.
< reads input from a file.
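A minimal sketch contrasting the three redirections on a temporary file (the file and its contents are made up):

```shell
#!/usr/bin/env bash
out_file="$(mktemp)"
trap 'rm -f "$out_file"' EXIT

echo "first run"  > "$out_file"   # > creates or overwrites the file
echo "second run" > "$out_file"   # overwritten: "first run" is gone
echo "third run" >> "$out_file"   # >> appends a line at the end

cat "$out_file"                   # second run, then third run
wc -l < "$out_file"               # < feeds the file to wc's standard input
```

Because wc reads from standard input here, it prints only the count, without the file name.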
Combinations of shell commands (grep, sed, awk, …), piping and redirection, and regular expressions allow us to pre-process and reformat huge text files efficiently.
See HW1.
less is more; more is less.
more browses a text file screen by screen (only downwards). Scroll down one page (paging) by pressing the spacebar; exit by pressing the q key.
less is also a pager, but has more functionality, e.g., scrolling upwards and downwards through the input.
less doesn't need to read the whole file before displaying it, so it opens large files faster than more.
grep
grep prints lines that match an expression:
Show lines that contain the string CentOS:
# quotes are optional here since the pattern has no spaces or shell special characters
grep 'CentOS' linux.Rmd
- RHEL/CentOS is popular on servers. (In December 2020, Red Hat terminated the development of CentOS Linux distribution.)
- The teaching server for this class runs CentOS 7. UCLA Hoffman2 cluster runs CentOS 6.10 (as of 2020-01-03).
- Show lines that contain string `CentOS`:
grep 'CentOS' linux.Rmd
grep 'CentOS' *.Rmd
grep -n 'CentOS' linux.Rmd
- Replace `CentOS` by `RHEL` in a text file:
sed 's/CentOS/RHEL/' linux.Rmd | grep RHEL
Search multiple text files:
grep 'CentOS' *.Rmd
- RHEL/CentOS is popular on servers. (In December 2020, Red Hat terminated the development of CentOS Linux distribution.)
- The teaching server for this class runs CentOS 7. UCLA Hoffman2 cluster runs CentOS 6.10 (as of 2020-01-03).
- Show lines that contain string `CentOS`:
grep 'CentOS' linux.Rmd
grep 'CentOS' *.Rmd
grep -n 'CentOS' linux.Rmd
- Replace `CentOS` by `RHEL` in a text file:
sed 's/CentOS/RHEL/' linux.Rmd | grep RHEL
Show matching line numbers:
grep -n 'CentOS' linux.Rmd
37:- RHEL/CentOS is popular on servers. (In December 2020, Red Hat terminated the development of CentOS Linux distribution.)
43:- The teaching server for this class runs CentOS 7. UCLA Hoffman2 cluster runs CentOS 6.10 (as of 2020-01-03).
333:- Show lines that contain string `CentOS`:
336: grep 'CentOS' linux.Rmd
341: grep 'CentOS' *.Rmd
346: grep -n 'CentOS' linux.Rmd
363:- Replace `CentOS` by `RHEL` in a text file:
365: sed 's/CentOS/RHEL/' linux.Rmd | grep RHEL
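A few other everyday grep options, sketched on a small made-up file: -c counts matching lines, -i ignores case, and -v inverts the match.

```shell
#!/usr/bin/env bash
notes="$(mktemp)"
trap 'rm -f "$notes"' EXIT
printf '%s\n' "CentOS 7 runs on the teaching server" \
              "centos written in lowercase" \
              "Ubuntu runs on my laptop" > "$notes"

grep -c 'CentOS' "$notes"    # -c counts matching lines: 1
grep -ic 'centos' "$notes"   # -i ignores case: 2
grep -v 'CentOS' "$notes"    # -v prints the lines that do NOT match
```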
Find all files in the current directory with the .png extension (the dot is escaped because an unescaped . matches any character):
ls | grep '\.png$'
key_authentication_1.png
key_authentication_2.png
linux_directory_structure.png
linux_filepermission_oct.png
linux_filepermission.png
redhat_kills_centos.png
Richard_Stallman_2013.png
screenshot_top.png
Find all directories in the current directory:
ls -al | grep '^d'
drwxrwxr-x. 2 huazhou huazhou 4096 Jan 12 21:12 .
drwxrwxr-x. 6 huazhou huazhou 102 Jan 11 19:17 ..
sed
sed is a stream editor.
Replace CentOS by RHEL in a text file (without the g flag, only the first match on each line is replaced):
sed 's/CentOS/RHEL/' linux.Rmd | grep RHEL
- RHEL/RHEL is popular on servers. (In December 2020, Red Hat terminated the development of CentOS Linux distribution.)
- The teaching server for this class runs RHEL 7. UCLA Hoffman2 cluster runs CentOS 6.10 (as of 2020-01-03).
- Show lines that contain string `RHEL`:
grep 'RHEL' linux.Rmd
grep 'RHEL' *.Rmd
grep -n 'RHEL' linux.Rmd
- Replace `RHEL` by `RHEL` in a text file:
sed 's/RHEL/RHEL/' linux.Rmd | grep RHEL
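The output above shows that only the first CentOS on each line was replaced. A minimal sketch on a made-up one-line file comparing the default with the g (global) flag; note that sed prints to standard output and leaves the input file unchanged:

```shell
#!/usr/bin/env bash
demo="$(mktemp)"
trap 'rm -f "$demo"' EXIT
echo "Server runs CentOS 7; Hoffman2 runs CentOS 6.10" > "$demo"

sed 's/CentOS/RHEL/' "$demo"    # only the first match on each line
sed 's/CentOS/RHEL/g' "$demo"   # g flag: every match on each line
```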
awk
awk is a filter and report writer.
First, let's display the first lines of the file /etc/passwd:
head /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
Each line contains seven fields separated by colons (:): (1) user name, (2) password (x means it is stored in /etc/shadow), (3) user ID, (4) group ID, (5) user ID info, (6) home directory, and (7) command shell.
Print sorted list of login names:
awk -F: '{ print $1 }' /etc/passwd | sort | head -10
adm
ajedward
alvinw
andyliugraduateschool
angelicaruiz
asburysean
avila.analissa
awollum
bin
brett.young
Print the number of lines in a file; NR is the number of records (lines) read so far, so in the END block it equals the total:
awk 'END { print NR }' /etc/passwd
71
or
wc -l /etc/passwd
71 /etc/passwd
or (not displaying file name)
wc -l < /etc/passwd
71
Print the entries of users with UID in the range 1000-1035:
awk -F: '{if ($3 >= 1000 && $3 <= 1035) print}' /etc/passwd
huazhou:x:1000:1001::/home/huazhou:/bin/bash
valentina214:x:1001:1003::/home/valentina214:/bin/bash
asburysean:x:1002:1004::/home/asburysean:/bin/bash
avila.analissa:x:1003:1005::/home/avila.analissa:/bin/bash
jihcai:x:1004:1006::/home/jihcai:/bin/bash
lchen121:x:1005:1007::/home/lchen121:/bin/bash
mia.chen1998:x:1006:1008::/home/mia.chen1998:/bin/bash
zding2016:x:1007:1009::/home/zding2016:/bin/bash
ajedward:x:1008:1010::/home/ajedward:/bin/bash
kgfeng:x:1009:1011::/home/kgfeng:/bin/bash
filipovicmilan271:x:1010:1012::/home/filipovicmilan271:/bin/bash
pdgeraldo:x:1011:1013::/home/pdgeraldo:/bin/bash
OLIVIAGOLSTON:x:1012:1014::/home/OLIVIAGOLSTON:/bin/bash
lhashemi:x:1013:1015::/home/lhashemi:/bin/bash
ionahu08:x:1014:1016::/home/ionahu08:/bin/bash
kmishimoto:x:1015:1017::/home/kmishimoto:/bin/bash
haoyunj:x:1016:1018::/home/haoyunj:/bin/bash
lnlesko:x:1017:1019::/home/lnlesko:/bin/bash
kmli:x:1018:1020::/home/kmli:/bin/bash
liqiao93:x:1019:1021::/home/liqiao93:/bin/bash
xinyang43:x:1020:1022::/home/xinyang43:/bin/bash
andyliugraduateschool:x:1021:1023::/home/andyliugraduateschool:/bin/bash
jpan1:x:1022:1024::/home/jpan1:/bin/bash
angelicaruiz:x:1023:1025::/home/angelicaruiz:/bin/bash
r1cardosf86:x:1024:1026::/home/r1cardosf86:/bin/bash
dshehtanian:x:1025:1027::/home/dshehtanian:/bin/bash
jtan.smeltzer:x:1026:1028::/home/jtan.smeltzer:/bin/bash
msoohoo1:x:1027:1029::/home/msoohoo1:/bin/bash
spendlove.sj:x:1028:1030::/home/spendlove.sj:/bin/bash
tibbe1td:x:1029:1031::/home/tibbe1td:/bin/bash
viviantruong:x:1030:1032::/home/viviantruong:/bin/bash
jdverajones:x:1031:1033::/home/jdverajones:/bin/bash
alvinw:x:1032:1034::/home/alvinw:/bin/bash
rlwilliams34:x:1033:1035::/home/rlwilliams34:/bin/bash
awollum:x:1034:1036::/home/awollum:/bin/bash
buwenson:x:1035:1037::/home/buwenson:/bin/bash
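The if statement above can also be written as an awk pattern, the more idiomatic form: the action in braces runs only on lines where the pattern before it is true. A minimal sketch on made-up passwd-style records (not the server's real accounts), printing only the login names:

```shell
#!/usr/bin/env bash
# Made-up passwd-style records, so the output does not depend on the machine.
sample="$(mktemp)"
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000::/home/alice:/bin/bash
bob:x:1036:1036::/home/bob:/bin/zsh
carol:x:1020:1020::/home/carol:/bin/bash
EOF

# pattern { action }: print field 1 only when field 3 is in 1000-1035.
awk -F: '$3 >= 1000 && $3 <= 1035 { print $1 }' "$sample"
```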
Print login names and login shells in comma-separated format:
awk -F: '{OFS = ","} {print $1, $7}' /etc/passwd
root,/bin/bash
bin,/sbin/nologin
daemon,/sbin/nologin
adm,/sbin/nologin
lp,/sbin/nologin
sync,/bin/sync
shutdown,/sbin/shutdown
halt,/sbin/halt
mail,/sbin/nologin
operator,/sbin/nologin
games,/sbin/nologin
ftp,/sbin/nologin
nobody,/sbin/nologin
systemd-network,/sbin/nologin
dbus,/sbin/nologin
polkitd,/sbin/nologin
ntp,/sbin/nologin
sshd,/sbin/nologin
postfix,/sbin/nologin
chrony,/sbin/nologin
huazhou,/bin/bash
tss,/sbin/nologin
rstudio-server,/bin/bash
shiny,/bin/sh
valentina214,/bin/bash
asburysean,/bin/bash
avila.analissa,/bin/bash
jihcai,/bin/bash
lchen121,/bin/bash
mia.chen1998,/bin/bash
zding2016,/bin/bash
ajedward,/bin/bash
kgfeng,/bin/bash
filipovicmilan271,/bin/bash
pdgeraldo,/bin/bash
OLIVIAGOLSTON,/bin/bash
lhashemi,/bin/bash
ionahu08,/bin/bash
kmishimoto,/bin/bash
haoyunj,/bin/bash
lnlesko,/bin/bash
kmli,/bin/bash
liqiao93,/bin/bash
xinyang43,/bin/bash
andyliugraduateschool,/bin/bash
jpan1,/bin/bash
angelicaruiz,/bin/bash
r1cardosf86,/bin/bash
dshehtanian,/bin/bash
jtan.smeltzer,/bin/bash
msoohoo1,/bin/bash
spendlove.sj,/bin/bash
tibbe1td,/bin/bash
viviantruong,/bin/bash
jdverajones,/bin/bash
alvinw,/bin/bash
rlwilliams34,/bin/bash
awollum,/bin/bash
buwenson,/bin/bash
yingyanwu,/bin/bash
wzdbruins318,/bin/bash
yuhaoyin,/bin/bash
brett.young,/bin/bash
zhaofan1996,/bin/bash
hunanzhou,/bin/bash
linyuzhou,/bin/bash
jyixzhou,/bin/bash
zianzhuang,/bin/bash
germc3,/bin/bash
elviscuihan,/bin/bash
shhe,/bin/bash
Print login names and mark those with UID >= 1000 as vip:
awk -F: -v status="" '{OFS = ","}
{if ($3 >= 1000) status="vip"; else status="regular"}
{print $1, status}' /etc/passwd
root,regular
bin,regular
daemon,regular
adm,regular
lp,regular
sync,regular
shutdown,regular
halt,regular
mail,regular
operator,regular
games,regular
ftp,regular
nobody,regular
systemd-network,regular
dbus,regular
polkitd,regular
ntp,regular
sshd,regular
postfix,regular
chrony,regular
huazhou,vip
tss,regular
rstudio-server,regular
shiny,regular
valentina214,vip
asburysean,vip
avila.analissa,vip
jihcai,vip
lchen121,vip
mia.chen1998,vip
zding2016,vip
ajedward,vip
kgfeng,vip
filipovicmilan271,vip
pdgeraldo,vip
OLIVIAGOLSTON,vip
lhashemi,vip
ionahu08,vip
kmishimoto,vip
haoyunj,vip
lnlesko,vip
kmli,vip
liqiao93,vip
xinyang43,vip
andyliugraduateschool,vip
jpan1,vip
angelicaruiz,vip
r1cardosf86,vip
dshehtanian,vip
jtan.smeltzer,vip
msoohoo1,vip
spendlove.sj,vip
tibbe1td,vip
viviantruong,vip
jdverajones,vip
alvinw,vip
rlwilliams34,vip
awollum,vip
buwenson,vip
yingyanwu,vip
wzdbruins318,vip
yuhaoyin,vip
brett.young,vip
zhaofan1996,vip
hunanzhou,vip
linyuzhou,vip
jyixzhou,vip
zianzhuang,vip
germc3,vip
elviscuihan,vip
shhe,vip
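awk can also aggregate. A sketch that counts how many accounts use each login shell with an associative array; the sample records are invented so the result is reproducible (pointing the same awk command at /etc/passwd gives the real counts):

```shell
#!/usr/bin/env bash
sample="$(mktemp)"
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
alice:x:1000:1000::/home/alice:/bin/bash
bob:x:1001:1001::/home/bob:/bin/zsh
EOF

# count[shell] is an awk associative array keyed by field 7.
# The END block runs once after all lines are read; for-in order
# is unspecified, so the result is piped through sort.
awk -F: '{ count[$7]++ } END { for (s in count) print s, count[s] }' "$sample" | sort
```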
Emacs is a powerful text editor with extensive support for many languages including R, LaTeX, python, and C/C++; however, it's not installed by default on many Linux distributions.
Basic survival commands:
emacs filename to open a file with emacs.
CTRL-x CTRL-f to open an existing or new file.
CTRL-x CTRL-s to save.
CTRL-x CTRL-w to save as.
CTRL-x CTRL-c to quit.
Google emacs cheatsheet.
C-<key> means hold the control key, and press <key>.
M-<key> means press the Esc key once, and press <key>.
Vi is ubiquitous (POSIX standard). Learn at least its basics; otherwise you can edit nothing on some clusters.
Basic survival commands:
vi filename to start editing a file.
vi is a modal editor: insert mode and normal mode. Pressing i switches from normal mode to insert mode. Pressing ESC switches from insert mode to normal mode.
:x<Return> quits vi and saves changes.
:q!<Return> quits vi without saving the latest changes.
:w<Return> saves changes.
:wq<Return> quits vi and saves changes.
Google vi cheatsheet.
Statisticians/data scientists write a lot of code. It is critical to adopt a good IDE that goes beyond code editing: syntax highlighting, executing code within the editor, debugging, profiling, version control, etc.
RStudio, Eclipse, Emacs, Matlab, Visual Studio, etc.
Ctrl+C cancels a non-responding or long-running program.
The OS runs processes on behalf of the user.
Each process has a process ID (PID), user name (UID), parent process ID (PPID), time and date the process started (STIME), running time (TIME), etc.
ps
PID TTY TIME CMD
6257 ? 00:00:08 rsession
8371 ? 00:00:00 R
8490 ? 00:00:00 sh
8491 ? 00:00:00 ps
All currently running processes:
ps -eaf
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Jan03 ? 00:00:49 /usr/lib/systemd/systemd --switched-root --system --deserialize 22
root 2 0 0 Jan03 ? 00:00:00 [kthreadd]
root 4 2 0 Jan03 ? 00:00:00 [kworker/0:0H]
root 5 2 0 Jan03 ? 00:00:03 [kworker/u8:0]
root 6 2 0 Jan03 ? 00:00:01 [ksoftirqd/0]
root 7 2 0 Jan03 ? 00:00:00 [migration/0]
root 8 2 0 Jan03 ? 00:00:00 [rcu_bh]
root 9 2 0 Jan03 ? 00:01:59 [rcu_sched]
root 10 2 0 Jan03 ? 00:00:00 [lru-add-drain]
root 11 2 0 Jan03 ? 00:00:03 [watchdog/0]
root 12 2 0 Jan03 ? 00:00:03 [watchdog/1]
root 13 2 0 Jan03 ? 00:00:00 [migration/1]
root 14 2 0 Jan03 ? 00:00:00 [ksoftirqd/1]
root 16 2 0 Jan03 ? 00:00:00 [kworker/1:0H]
root 17 2 0 Jan03 ? 00:00:03 [watchdog/2]
root 18 2 0 Jan03 ? 00:00:01 [migration/2]
root 19 2 0 Jan03 ? 00:00:02 [ksoftirqd/2]
root 21 2 0 Jan03 ? 00:00:00 [kworker/2:0H]
root 22 2 0 Jan03 ? 00:00:03 [watchdog/3]
root 23 2 0 Jan03 ? 00:00:01 [migration/3]
root 24 2 0 Jan03 ? 00:00:00 [ksoftirqd/3]
root 26 2 0 Jan03 ? 00:00:00 [kworker/3:0H]
root 28 2 0 Jan03 ? 00:00:00 [kdevtmpfs]
root 29 2 0 Jan03 ? 00:00:00 [netns]
root 30 2 0 Jan03 ? 00:00:00 [khungtaskd]
root 31 2 0 Jan03 ? 00:00:00 [writeback]
root 32 2 0 Jan03 ? 00:00:00 [kintegrityd]
root 33 2 0 Jan03 ? 00:00:00 [bioset]
root 34 2 0 Jan03 ? 00:00:00 [bioset]
root 35 2 0 Jan03 ? 00:00:00 [bioset]
root 36 2 0 Jan03 ? 00:00:00 [kblockd]
root 37 2 0 Jan03 ? 00:00:00 [md]
root 38 2 0 Jan03 ? 00:00:00 [edac-poller]
root 39 2 0 Jan03 ? 00:00:00 [watchdogd]
root 49 2 0 Jan03 ? 00:00:05 [kswapd0]
root 50 2 0 Jan03 ? 00:00:00 [ksmd]
root 51 2 0 Jan03 ? 00:00:05 [khugepaged]
root 52 2 0 Jan03 ? 00:00:00 [crypto]
root 60 2 0 Jan03 ? 00:00:00 [kthrotld]
root 61 2 0 Jan03 ? 00:00:00 [kmpath_rdacd]
root 62 2 0 Jan03 ? 00:00:00 [kaluad]
root 63 2 0 Jan03 ? 00:00:00 [kpsmoused]
root 65 2 0 Jan03 ? 00:00:00 [ipv6_addrconf]
root 78 2 0 Jan03 ? 00:00:00 [deferwq]
root 138 2 0 Jan03 ? 00:01:24 [kauditd]
root 192 2 0 Jan03 ? 00:00:00 [virtscsi-scan]
root 193 2 0 Jan03 ? 00:00:00 [scsi_eh_0]
root 195 2 0 Jan03 ? 00:00:00 [scsi_tmf_0]
root 198 2 0 Jan03 ? 00:00:08 [kworker/u8:2]
root 252 2 0 Jan03 ? 00:00:00 [bioset]
root 253 2 0 Jan03 ? 00:00:00 [xfsalloc]
root 254 2 0 Jan03 ? 00:00:00 [xfs_mru_cache]
root 255 2 0 Jan03 ? 00:00:00 [xfs-buf/sda2]
root 256 2 0 Jan03 ? 00:00:00 [xfs-data/sda2]
root 257 2 0 Jan03 ? 00:00:00 [xfs-conv/sda2]
root 258 2 0 Jan03 ? 00:00:00 [xfs-cil/sda2]
root 259 2 0 Jan03 ? 00:00:00 [xfs-reclaim/sda]
root 260 2 0 Jan03 ? 00:00:00 [xfs-log/sda2]
root 261 2 0 Jan03 ? 00:00:00 [xfs-eofblocks/s]
root 262 2 0 Jan03 ? 00:05:34 [xfsaild/sda2]
root 263 2 0 Jan03 ? 00:00:06 [kworker/0:1H]
root 326 1 0 Jan03 ? 00:07:24 /usr/lib/systemd/systemd-journald
root 354 1 0 Jan03 ? 00:00:00 /usr/lib/systemd/systemd-udevd
root 381 2 0 Jan03 ? 00:00:00 [hwrng]
root 437 2 0 Jan03 ? 00:00:00 [nfit]
root 456 1 0 Jan03 ? 00:02:04 /sbin/auditd
polkitd 495 1 0 Jan03 ? 00:00:02 /usr/lib/polkit-1/polkitd --no-debug
dbus 499 1 0 Jan03 ? 00:00:07 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
chrony 503 1 0 Jan03 ? 00:00:00 /usr/sbin/chronyd
root 504 1 0 Jan03 ? 00:00:00 /usr/sbin/acpid
root 528 1 0 Jan03 ttyS0 00:00:00 /sbin/agetty --keep-baud 115200,38400,9600 ttyS0 vt220
root 529 1 0 Jan03 tty1 00:00:00 /sbin/agetty --noclear tty1 linux
root 535 1 0 Jan03 ? 00:00:01 /usr/bin/python2 -Es /usr/sbin/firewalld --nofork --nopid
root 556 1 0 Jan03 ? 00:00:18 /usr/sbin/NetworkManager --no-daemon
root 616 2 0 Jan03 ? 00:00:00 [kworker/2:1H]
root 682 556 0 Jan03 ? 00:00:00 /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /var/run/dhclient-eth0.pid -lf /var/lib/NetworkManager/dhclient-4580a89d-062a-4ddc-a6fb-53bc2caf2784-eth0.lease -cf /var/lib/NetworkManager/dhclient-eth0.conf eth0
root 961 1 0 Jan03 ? 00:01:39 /usr/bin/python2 -Es /usr/sbin/tuned -l -P
root 962 1 0 Jan03 ? 00:03:15 /usr/bin/google_osconfig_agent
root 963 1 0 Jan03 ? 00:04:27 /usr/sbin/rsyslogd -n
root 970 1 0 Jan03 ? 00:02:40 /usr/bin/google_guest_agent
root 1320 1 0 Jan03 ? 00:00:05 /usr/libexec/postfix/master -w
postfix 1328 1320 0 Jan03 ? 00:00:01 qmgr -l -t unix -u
root 1341 1 0 Jan03 ? 00:00:05 /usr/lib/systemd/systemd-logind
root 1351 1 0 Jan03 ? 00:00:02 /usr/sbin/crond -n
root 1370 2 0 Jan03 ? 00:00:00 [kworker/3:1H]
root 1386 2 0 Jan03 ? 00:00:00 [kworker/1:1H]
root 1760 2 0 17:06 ? 00:00:00 [kworker/2:0]
root 4377 2 0 19:15 ? 00:00:00 [kworker/2:1]
root 4760 9430 0 19:46 ? 00:00:00 sshd: jdverajones [priv]
jdveraj+ 4767 4760 0 19:46 ? 00:00:00 sshd: jdverajones@pts/0
jdveraj+ 4768 4767 0 19:46 pts/0 00:00:00 -bash
root 5488 9430 0 20:20 ? 00:00:00 sshd: jdverajones [priv]
jdveraj+ 5493 5488 0 20:21 ? 00:00:00 sshd: jdverajones@pts/1
jdveraj+ 5494 5493 0 20:21 pts/1 00:00:00 -bash
postfix 6117 1320 0 20:51 ? 00:00:00 pickup -l -t unix -u
root 6182 2 0 Jan11 ? 00:00:00 [kworker/1:0]
root 6201 2 0 20:54 ? 00:00:00 [kworker/3:2]
huazhou 6257 14144 0 20:57 ? 00:00:08 /usr/lib/rstudio-server/bin/rsession -u huazhou --session-use-secure-cookies 0 --launcher-token B78F7F16 --r-restore-workspace 2 --r-run-rprofile 2
root 7030 2 0 21:00 ? 00:00:00 [kworker/3:1]
root 7126 2 0 21:01 ? 00:00:00 [kworker/0:1]
root 7790 2 0 01:04 ? 00:00:10 [kworker/0:2]
root 8005 2 0 21:10 ? 00:00:00 [kworker/3:0]
root 8366 9430 0 21:12 ? 00:00:00 sshd: [accepted]
sshd 8367 8366 0 21:12 ? 00:00:00 sshd: [net]
root 8368 9430 0 21:12 ? 00:00:00 sshd: root [priv]
sshd 8369 8368 0 21:12 ? 00:00:00 sshd: root [net]
huazhou 8371 6257 90 21:13 ? 00:00:00 /usr/lib64/R/bin/exec/R --slave --no-save --no-restore -e rmarkdown::render('/home/huazhou/ucla-biostat203b-2021winter.github.io/slides/02-linux/linux.Rmd',~+~~+~encoding~+~=~+~'UTF-8');
root 8428 9430 0 21:13 ? 00:00:00 sshd: unknown [priv]
sshd 8431 8428 0 21:13 ? 00:00:00 sshd: unknown [net]
huazhou 8492 8371 0 21:13 ? 00:00:00 sh -c 'bash' -c 'ps -eaf' 2>&1
huazhou 8493 8492 0 21:13 ? 00:00:00 ps -eaf
root 9430 1 0 Jan04 ? 00:01:23 /usr/sbin/sshd -D
rstudio+ 14144 1 0 Jan03 ? 00:05:08 /usr/lib/rstudio-server/bin/rserver
elviscu+ 15857 1 0 Jan05 ? 00:00:00 ssh-agent -s
elviscu+ 16141 1 0 04:44 ? 00:00:00 ssh-agent -s
elviscu+ 16153 1 0 04:44 ? 00:00:00 ssh-agent -s
andyliu+ 18007 1 0 Jan07 ? 00:00:00 ssh-agent
andyliu+ 18044 1 0 Jan07 ? 00:00:00 ssh-agent
root 26950 2 0 Jan11 ? 00:00:02 [kworker/1:1]
root 32087 1 0 Jan03 ? 00:00:00 /opt/shiny-server/ext/node/bin/shiny-server /opt/shiny-server/lib/main.js
All Python processes:
ps -eaf | grep python
root 535 1 0 Jan03 ? 00:00:01 /usr/bin/python2 -Es /usr/sbin/firewalld --nofork --nopid
root 961 1 0 Jan03 ? 00:01:39 /usr/bin/python2 -Es /usr/sbin/tuned -l -P
huazhou 8494 8371 0 21:13 ? 00:00:00 sh -c 'bash' -c 'ps -eaf | grep python' 2>&1
huazhou 8495 8494 0 21:13 ? 00:00:00 bash -c ps -eaf | grep python
huazhou 8497 8495 0 21:13 ? 00:00:00 grep python
Process with PID=1:
ps -fp 1
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Jan03 ? 00:00:49 /usr/lib/systemd/systemd --switched-root --system --deserialize 22
All processes owned by a user:
ps -fu huazhou
UID PID PPID C STIME TTY TIME CMD
huazhou 6257 14144 0 20:57 ? 00:00:08 /usr/lib/rstudio-server/bin/rsession -u huazhou --session-use-secure-cookies 0 --launcher-token B78F7F16 --r-restore-workspace 2 --r-run-rprofile 2
huazhou 8371 6257 93 21:13 ? 00:00:00 /usr/lib64/R/bin/exec/R --slave --no-save --no-restore -e rmarkdown::render('/home/huazhou/ucla-biostat203b-2021winter.github.io/slides/02-linux/linux.Rmd',~+~~+~encoding~+~=~+~'UTF-8');
huazhou 8500 8371 0 21:13 ? 00:00:00 sh -c 'bash' -c 'ps -fu huazhou' 2>&1
huazhou 8501 8500 0 21:13 ? 00:00:00 ps -fu huazhou
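Kill process with PID=1001 (kill sends the TERM signal by default):
kill 1001
Kill all R processes (-r interprets R as a regular expression, so every one of your processes whose name contains a capital R, such as R or Rscript, is killed):
killall -r R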
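The sketch below uses sleep as a hypothetical stand-in for a long-running R job: it starts the process in the background, finds its PID with $!, and stops it with kill. Bash reports a job terminated by a signal with exit status 128 plus the signal number.

```shell
#!/usr/bin/env bash
# sleep stands in for a long-running job such as an R simulation.
sleep 300 &
job_pid=$!                     # $! is the PID of the most recent background job
echo "started background job with PID $job_pid"

kill "$job_pid"                # sends SIGTERM, kill's default signal
wait "$job_pid"                # wait for the job to end and collect its status
status=$?
echo "exit status: $status"    # 128 + 15 (SIGTERM) = 143
```

If a process ignores TERM, kill -9 PID sends KILL, which cannot be caught; it is a last resort because the program gets no chance to clean up.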
top
top prints realtime process information (very useful).
top
Exit the top program by pressing the q key.
SSH (secure shell) is the dominant cryptographic network protocol for secure network connections over an insecure network.
On Linux or Mac terminal, access the teaching server by
ssh [USERNAME]@server.ucla-biostat-203b.com
Replace [USERNAME] above with your user name on the teaching server.
For Windows users, there are at least three ways: (1) (highly recommended) Git Bash, which is included in Git for Windows, (2) (not recommended) the PuTTY program (free), or (3) (may be overkill for this class) use WSL (Windows Subsystem for Linux) to install a full-fledged Linux system within Windows.
Key authentication is more secure than passwords. Most passwords are weak.
A script or program may need to SSH into other machines systematically.
Log into multiple machines using the same key.
Seamless use of many services: Git/GitHub, AWS or Google cloud service, parallel computing on multiple hosts, Travis CI (continuous integration) etc.
Many servers only allow key authentication and do not accept password authentication.
Public key. Put on the machine(s) you want to log in.
Private key. Put on your own computer. Consider this as the actual key in your pocket; never give private key to others. For fun: https://www.youtube.com/watch?v=S8K464ImU0c
Data encrypted with your public key can only be decrypted with your private key. (In SSH, the session traffic itself is encrypted with temporary session keys negotiated at connection time; your key pair is used to prove your identity.)
Messages signed with your private key (digital signatures) can be verified by anyone who has your public key; this is how the server authenticates you.
On Linux, Mac, or Windows Git Bash, to generate a key pair:
ssh-keygen -t rsa -f ~/.ssh/[KEY_FILENAME] -C [USERNAME]
[KEY_FILENAME] is the name that you want to use for your SSH key files. For example, a filename of id_rsa generates a private key file named id_rsa and a public key file named id_rsa.pub.
[USERNAME] is stored as the key's comment (-C), identifying the user for whom you will apply this SSH key.
Use an (optional) passphrase different from your password.
Set correct permissions on the .ssh folder and key files.
~/.ssh folder should be 700 (drwx------).
~/.ssh/id_rsa should be 600 (-rw-------).
~/.ssh/id_rsa.pub should be 644 (-rw-r--r--).
chmod 700 ~/.ssh
chmod 600 ~/.ssh/[KEY_FILENAME]
chmod 644 ~/.ssh/[KEY_FILENAME].pub
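Each octal digit covers owner, group, and others, and is the sum of read (4), write (2), and execute (1); so 700 is rwx------, 600 is rw-------, and 644 is rw-r--r--. A minimal sketch that checks this on a scratch directory and hypothetical key files, so the real ~/.ssh is never touched:

```shell
#!/usr/bin/env bash
# A scratch directory and empty files stand in for ~/.ssh and a key pair.
demo_ssh="$(mktemp -d)"
trap 'rm -rf "$demo_ssh"' EXIT
touch "$demo_ssh/demo_key" "$demo_ssh/demo_key.pub"

chmod 700 "$demo_ssh"
chmod 600 "$demo_ssh/demo_key"
chmod 644 "$demo_ssh/demo_key.pub"

# The first column of ls -l is the permission string.
ls -ld "$demo_ssh" "$demo_ssh/demo_key" "$demo_ssh/demo_key.pub"
```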
Note that Windows is different: it doesn't allow changing permissions this way.
Append the public key to the ~/.ssh/authorized_keys file of any Linux machine we want to SSH to, e.g.,
ssh-copy-id -i ~/.ssh/[KEY_FILENAME] [USERNAME]@server.ucla-biostat-203b.com
Make sure the permission of the authorized_keys file is 600 (-rw-------).
Test your new key.
ssh -i ~/.ssh/[KEY_FILENAME] [USERNAME]@server.ucla-biostat-203b.com
Now you don’t need a password each time you connect from your machine to the teaching server.
If you set a passphrase when generating keys, you’ll be prompted for it each time the private key is used. Avoid repeatedly entering the passphrase by using ssh-agent on Linux/Mac or Pageant on Windows.
The same key pair can be used between any two machines. We don’t need to regenerate keys for each new connection.
scp securely transfers files between machines using SSH.
## copy file from local to remote
scp [LOCALFILE] [USERNAME]@server.ucla-biostat-203b.com:/[PATH_TO_FOLDER]
## copy file from remote to local
scp [USERNAME]@server.ucla-biostat-203b.com:/[PATH_TO_FILE] [PATH_TO_LOCAL_FOLDER]
sftp is an interactive, FTP-like file transfer program that runs over SSH.
Globus is a GUI program for securely transferring files between machines. To use Globus, go to https://www.globus.org/ and log in through UCLA by selecting UCLA as your existing organizational login. Then download the Globus Connect Personal software and set up your laptop as an endpoint. Very detailed instructions can be found at https://www.hoffman2.idre.ucla.edu/file-transfer/globus/.
GUIs for Windows (WinSCP) or Mac (Cyberduck).
You can even use RStudio to upload files to a remote machine with RStudio Server installed.
(Preferred way) Use a version control system (git, svn, cvs, …) to sync project files between different machines and systems.
Windows uses a pair of CR and LF characters for line breaks.
Linux/Unix uses an LF character only.
MacOS X also uses a single LF character, but old Mac OS used a single CR character for line breaks.
If a text file is transferred in binary mode (bit by bit) between OSs, it can look like a mess on the other system.
Most transfer programs automatically switch to text mode when transferring text files and perform conversion of line breaks between different OSs; but I used to run into problems using WinSCP. Sometimes you have to tell WinSCP explicitly a text file is being transferred.
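To see the difference between the two conventions, the bash sketch below writes a two-line file with Windows-style CRLF endings, shows the raw bytes with od -c, and strips the CR characters with tr. The file names are hypothetical and are created in a temporary directory. If installed, dos2unix and unix2dos perform the same conversions.

```bash
# Sketch: inspect and strip Windows (CRLF) line endings with standard tools.
# File names are hypothetical and live in a temporary directory.
workdir=$(mktemp -d)
printf 'line1\r\nline2\r\n' > "$workdir/demo_crlf.txt"   # Windows style: CR LF
od -c "$workdir/demo_crlf.txt"                          # each line ends in \r \n

# Delete every CR byte to get Unix-style (LF only) line endings
tr -d '\r' < "$workdir/demo_crlf.txt" > "$workdir/demo_lf.txt"
od -c "$workdir/demo_lf.txt"                            # each line ends in \n only

wc -c < "$workdir/demo_crlf.txt"   # 14 bytes: 2 x (5 characters + CR + LF)
wc -c < "$workdir/demo_lf.txt"     # 12 bytes: the two CR bytes are gone
```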
Start R in interactive mode by typing R in the shell.
Then run R script by
source("script.R")
Demo script meanEst.R implements a (terrible) estimator of the mean \[
{\widehat \mu}_n = \frac{\sum_{i=1}^n x_i 1_{i \text{ is prime}}}{\sum_{i=1}^n 1_{i \text{ is prime}}}.
\]
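As a worked instance of the formula, take n = 10. The prime indices up to 10 are 2, 3, 5, and 7, so the indicator keeps only four of the ten observations:
\[
{\widehat \mu}_{10} = \frac{x_2 + x_3 + x_5 + x_7}{4}.
\]
By the prime number theorem, only about n / ln n of the n observations enter the estimate, which helps explain why it is called terrible.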
## check if a given integer is prime
isPrime = function(n) {
  if (n < 2) {  # 1 is not prime
    return(FALSE)
  }
  if (n <= 3) {
    return(TRUE)
  }
  if (any((n %% 2:floor(sqrt(n))) == 0)) {
    return(FALSE)
  }
  return(TRUE)
}
## estimate mean only using observations with prime indices
estMeanPrimes = function(x) {
  n = length(x)
  ind = sapply(1:n, isPrime)
  return(mean(x[ind]))
}
print(estMeanPrimes(rnorm(100000)))
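For comparison, the same trial-division idea can be sketched in bash: a number n is prime if it is at least 2 and no d with d*d <= n divides it, which mirrors the 2:floor(sqrt(n)) range in isPrime(). The helper name is_prime is hypothetical.

```bash
# Sketch: bash version of the trial-division test in isPrime().
# is_prime is a hypothetical helper; it returns 0 (success) when $1 is prime.
is_prime() {
  local n=$1 d
  if (( n < 2 )); then
    return 1                     # 0 and 1 are not prime
  fi
  # only divisors up to sqrt(n) need checking, as in 2:floor(sqrt(n))
  for (( d = 2; d * d <= n; d++ )); do
    if (( n % d == 0 )); then
      return 1
    fi
  done
  return 0
}

# Which indices among 1..20 would the estimator average over?
# prints: 2 3 5 7 11 13 17 19
for i in $(seq 1 20); do
  if is_prime "$i"; then
    printf '%s ' "$i"
  fi
done
echo
```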
To run your R code non-interactively, a.k.a. in batch mode, we have at least two options:
# default output to meanEst.Rout
R CMD BATCH meanEst.R
or
# output to stdout
Rscript meanEst.R
Batch calls are typically automated using a scripting language, e.g., Python, Perl, or shell script.
Specify arguments in R CMD BATCH:
R CMD BATCH '--args mu=1 sig=2 kap=3' script.R
Specify arguments in Rscript:
Rscript script.R mu=1 sig=2 kap=3
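The single quotes in the R CMD BATCH call matter because the shell splits unquoted words on spaces. The usage line of R CMD BATCH is [options] infile [outfile], so an unquoted mu=1 would presumably be read as the input file name. The bash sketch below uses a hypothetical helper, show_args, to print the arguments a program would actually receive in each case.

```bash
# Sketch: how the shell groups words into arguments.
# show_args is a hypothetical helper that lists the arguments it receives.
show_args() {
  echo "$# argument(s):"
  for a in "$@"; do
    printf '  [%s]\n' "$a"
  done
}

show_args '--args mu=1 sig=2 kap=3' script.R   # 2 arguments: the quoted string stays whole
show_args --args mu=1 sig=2 kap=3 script.R     # 5 arguments: the shell splits on spaces
```

Rscript does not need the quotes because it takes the script name first and passes everything after it through as arguments.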
Parse command line arguments in the R script using the magic formula
for (arg in commandArgs(TRUE)) {
  eval(parse(text=arg))
}
After running the above code, all command line arguments will be available in the global environment.
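Because the magic formula calls eval on every argument, any R code passed on the command line is executed, which is fine for your own simulations but worth keeping in mind. If a shell script should accept the same NAME=VALUE arguments, one bash analogue (not what R does internally) validates the name and assigns the value literally with printf -v. The helper name parse_kv_args is hypothetical.

```bash
# Sketch: a bash analogue of the R "magic formula" for NAME=VALUE arguments.
# parse_kv_args is a hypothetical helper. Values are assigned literally with
# printf -v, so nothing inside VALUE is executed (unlike eval).
parse_kv_args() {
  local _kv_arg _kv_name
  for _kv_arg in "$@"; do
    # accept only NAME=VALUE where NAME is a valid shell identifier
    if [[ $_kv_arg =~ ^[A-Za-z_][A-Za-z0-9_]*= ]]; then
      _kv_name=${_kv_arg%%=*}                       # text before the first "="
      printf -v "$_kv_name" '%s' "${_kv_arg#*=}"    # value after the first "="
    else
      echo "ignoring malformed argument: $_kv_arg" >&2
    fi
  done
}

# In a script run as `bash script.sh mu=1 sig=2 kap=3`:
parse_kv_args "$@"
```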
To understand the commandArgs part of the magic formula, start R with:
R '--args mu=1 sig=2 kap=3'
and then issue commands in R
commandArgs()
commandArgs(TRUE)
Understand the parse and eval parts of the magic formula:
rm(list=ls())
print(x)
Error in print(x): object 'x' not found
parse(text="x=3")
expression(x = 3)
eval(parse(text="x=3"))
print(x)
[1] 3
runSim.R has components: (1) command argument parser, (2) method implementation, (3) data generator with unspecified parameter n, and (4) estimation based on generated data.
## parsing command arguments
for (arg in commandArgs(TRUE)) {
  eval(parse(text=arg))
}
## check if a given integer is prime
isPrime = function(n) {
  if (n < 2) {  # 1 is not prime
    return(FALSE)
  }
  if (n <= 3) {
    return(TRUE)
  }
  if (any((n %% 2:floor(sqrt(n))) == 0)) {
    return(FALSE)
  }
  return(TRUE)
}
## estimate mean only using observations with prime indices
estMeanPrimes = function(x) {
  n = length(x)
  ind = sapply(1:n, isPrime)
  return(mean(x[ind]))
}
# simulate data
x = rnorm(n)
# estimate mean
estMeanPrimes(x)
Call runSim.R with sample size n=100:
R CMD BATCH '--args n=100' runSim.R
or
Rscript runSim.R n=100
[1] 0.3837608
Many statistical computing tasks take a long time: simulation, MCMC, etc. If we log out of Linux while the job is unfinished, the job is killed.
The nohup command in Linux runs program(s) immune to hangups and writes output to nohup.out by default. Logging out will not kill the process; we can log in later to check status and results.
nohup is part of the POSIX standard and thus available on Linux and MacOS.
Run runSim.R in the background and write output to nohup.out:
nohup Rscript runSim.R n=100 &
[1] -0.1170657
The & at the end of the command instructs Linux to run this command in the background, so we regain control of the terminal immediately.
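Once a job is in the background, the shell variable $! holds its process ID, which is handy for checking on it later. The bash sketch below uses a short sleep as a stand-in for a real job such as Rscript runSim.R n=100; the log file name job.log is hypothetical.

```bash
# Sketch: start a job with nohup in the background, record its PID, check on it.
# `sleep 1; echo finished` stands in for a real job; job.log is a hypothetical name.
nohup sh -c 'sleep 1; echo finished' < /dev/null > job.log 2>&1 &
pid=$!                                  # PID of the most recent background job
echo "started job $pid"

# kill -0 sends no signal; it only tests whether the process still exists
if kill -0 "$pid" 2>/dev/null; then
  echo "job $pid is still running"
fi

wait "$pid"                             # block until the job exits
cat job.log                             # the job's output
```

Note that wait only works for children of the current shell. After logging out and back in, check on the job with ps -p followed by its PID instead.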
screen is another popular utility, but it is not installed by default.
Typical workflow using screen:
Access remote server using ssh.
Start jobs in batch mode.
Detach jobs.
Exit from server, wait for jobs to finish.
Access remote server using ssh.
Re-attach jobs, check on progress, get results, etc.
R in conjunction with nohup (or screen) can be used to orchestrate a large simulation study.
It can be more elegant, transparent, and robust to parallelize jobs corresponding to different scenarios (e.g., different generative models) outside of the code used to do statistical computation.
We consider a simulation study in R but the same approach could be used with code written in Julia, Matlab, Python, etc.
Python in many ways makes a better glue.
Suppose we have runSim.R, which runs a simulation based on the command line argument n, and a set of n values that we want to use in our simulation study.
Option 1: manually call runSim.R for each setting.
Option 2 (smarter): automate calls using R and nohup.
Let’s demonstrate using the script autoSim.R
cat autoSim.R
# autoSim.R
nVals <- seq(100, 1000, by=100)
for (n in nVals) {
  oFile <- paste("n", n, ".txt", sep="")
  sysCall <- paste("nohup Rscript runSim.R n=", n, " > ", oFile, sep="")
  system(sysCall, wait = FALSE)
  print(paste("sysCall=", sysCall, sep=""))
}
Note that when we call a bash command using the system function in R, we set the optional argument wait=FALSE so that the jobs run in parallel.
Rscript autoSim.R
[1] "sysCall=nohup Rscript runSim.R n=100 > n100.txt"
[1] "sysCall=nohup Rscript runSim.R n=200 > n200.txt"
[1] "sysCall=nohup Rscript runSim.R n=300 > n300.txt"
[1] "sysCall=nohup Rscript runSim.R n=400 > n400.txt"
[1] "sysCall=nohup Rscript runSim.R n=500 > n500.txt"
[1] "sysCall=nohup Rscript runSim.R n=600 > n600.txt"
[1] "sysCall=nohup Rscript runSim.R n=700 > n700.txt"
[1] "sysCall=nohup Rscript runSim.R n=800 > n800.txt"
[1] "sysCall=nohup Rscript runSim.R n=900 > n900.txt"
[1] "sysCall=nohup Rscript runSim.R n=1000 > n1000.txt"
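Since shell script is also a natural glue language, the same launcher can be sketched directly in bash. Each nohup call ends with &, which plays the role of wait = FALSE: the loop moves on immediately and the jobs run in parallel. The helper name launch_sims and its optional command argument are hypothetical.

```bash
# Sketch: a bash counterpart of autoSim.R. launch_sims is a hypothetical helper;
# its optional first argument replaces Rscript (e.g. `echo` for a dry run).
launch_sims() {
  local cmd="${1:-Rscript}" n oFile
  for n in $(seq 100 100 1000); do
    oFile="n${n}.txt"
    # & returns immediately, like system(..., wait = FALSE)
    nohup "$cmd" runSim.R "n=${n}" > "$oFile" 2>&1 &
    echo "launched: $cmd runSim.R n=${n} > $oFile (PID $!)"
  done
}
```

Calling launch_sims starts the ten jobs with Rscript; launch_sims echo is a dry run that only writes each would-be argument list into n100.txt through n1000.txt. Unlike autoSim.R, this sketch also redirects stderr (2>&1), so error messages land in the same output file.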
Now we just need to write a script to collect results from the output files.
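One way to do the collection is sketched below in bash. It assumes that each output file nN.txt contains a line such as [1] 0.3837608, which is how Rscript prints the estimate from runSim.R, and it gathers the value from each file into a CSV sorted by n. The helper name collect_rows and the file name results.csv are hypothetical.

```bash
# Sketch: collect the estimate from each n<N>.txt file into one CSV.
# Assumes each file has a line like "[1] 0.3837608"; collect_rows and
# results.csv are hypothetical names.
collect_rows() {
  local f n est
  for f in n[0-9]*.txt; do
    [ -e "$f" ] || continue                 # skip if the glob matched nothing
    n=${f#n}                                # "n100.txt" -> "100.txt"
    n=${n%.txt}                             # "100.txt"  -> "100"
    # keep the value from the last line that starts with "[1]"
    est=$(awk '/^\[1\]/ { v = $2 } END { print v }' "$f")
    echo "${n},${est}"
  done
}

# header first, then rows sorted numerically by n (the glob order would be 100, 1000, 200, ...)
{ echo "n,estimate"; collect_rows | sort -t, -k1,1n; } > results.csv
cat results.csv
```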
Later we will learn how to coordinate large scale computation on UCLA Hoffman2 cluster, using Linux and R scripting.