Uncategorized · 2018. 11. 6. 20:11

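When setting up a Lustre system, Lustre-related kernel modules often remain loaded even after the Lustre drive has been unmounted. These leftover modules can block a reboot or shutdown, so the machine frequently has to be forced off with the power button.

In that case, running lustre_rmmod unloads (removes) all Lustre modules, after which the reboot or shutdown proceeds normally.

Reference: http://wiki.lustre.org/Starting_and_Stopping_LNet

As shown below, it may also be a good idea either to replace the reboot command outright or to create a shutdown_lustre command.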




[root@dasandata:~]# rm /usr/sbin/reboot
[root@dasandata:~]#
[root@dasandata:~]# cat << EOF > /usr/sbin/reboot
#!/bin/bash
echo "### Lustre Umounting..."
umount -l /lustre

echo "### All Lustre Module Removing..."
lustre_rmmod

echo "### Shutdown -r now..."
shutdown -r now
EOF
[root@dasandata:~]#
[root@dasandata:~]# chmod u+x /usr/sbin/reboot
[root@dasandata:~]#




[root@dasandata:~]# cat << EOF > /usr/sbin/shutdown_lustre
#!/bin/bash
echo "### Lustre Umounting..."
umount -l /lustre

echo "### All Lustre Module Removing..."
lustre_rmmod

echo "### Shutdown -h now..."
shutdown -h now
EOF
[root@dasandata:~]#
[root@dasandata:~]# chmod u+x /usr/sbin/shutdown_lustre
[root@dasandata:~]#
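
Since the two scripts differ only in the final shutdown flag, they could be folded into one wrapper that leaves /usr/sbin/reboot untouched. The following is a minimal sketch under that assumption, not a tested replacement. The function name lustre_poweroff and the DRY_RUN and LUSTRE_MOUNT variables are hypothetical names introduced here; umount, lustre_rmmod, and shutdown are the commands used in the post.

```shell
#!/bin/bash
# Sketch only: lustre_poweroff, DRY_RUN and LUSTRE_MOUNT are hypothetical names.

LUSTRE_MOUNT="${LUSTRE_MOUNT:-/lustre}"   # mount point used in the post

run() {
    # Print the command, and execute it only when DRY_RUN is not 1.
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

lustre_poweroff() {
    local mode="$1"   # -r (reboot) or -h (halt), passed on to shutdown
    case "$mode" in
        -r|-h) ;;
        *) echo "usage: lustre_poweroff -r|-h" >&2; return 2 ;;
    esac
    run umount -l "$LUSTRE_MOUNT"   # lazy unmount, as in the post
    run lustre_rmmod                # unload all Lustre modules
    run shutdown "$mode" now
}

# Example (prints the three commands without running them):
#   DRY_RUN=1 lustre_poweroff -r
```

Running it with DRY_RUN=1 first makes it possible to check the command sequence before relying on it on a production node.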




Posted by CheekyKite


Knowledge Sharing · 2018. 11. 6. 19:51

OpenHPC: two problems encountered on a stateful provisioning node


1. The NVIDIA driver fails to install.

(The same problem is expected for any other package that needs the kernel source.)

/usr/src is excluded in /etc/warewulf/vnfs.conf, so the installed OS has no kernel source and the kernel module cannot be built.


[root@c43 ~]# ll /lib/modules/3.10.0-862.14.4.el7.x86_64/

lrwxrwxrwx  1 root root     43 Nov  2 09:56 build -> /usr/src/kernels/3.10.0-862.14.4.el7.x86

lrwxrwxrwx  1 root root      5 Nov  2 09:56 source -> build
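
The listing shows build pointing into /usr/src/kernels, which the image excludes. A small check along the following lines could confirm whether that link actually resolves on a node. This is a sketch: check_kernel_build is a hypothetical helper name, and the second argument exists only so the function can be pointed at a directory other than /lib/modules.

```shell
#!/bin/bash
# Sketch only: check_kernel_build is a hypothetical helper name.
# $1: kernel version (default: the running kernel)
# $2: modules base directory (default: /lib/modules)
check_kernel_build() {
    local kver="${1:-$(uname -r)}"
    local base="${2:-/lib/modules}"
    local link="${base}/${kver}/build"

    # -e follows symlinks, so a dangling link fails this first test.
    if [ -e "$link" ]; then
        echo "ok: $link resolves"
    elif [ -L "$link" ]; then
        echo "dangling: $link -> $(readlink "$link")"
        return 1
    else
        echo "absent: $link"
        return 1
    fi
}

# Example:
#   check_kernel_build 3.10.0-862.14.4.el7.x86_64
```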


Solution a. Comment out exclude += /usr/src and rebuild the image.

Solution b. yum reinstall kernel-devel-3.xx.x....
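
Solution a could be scripted roughly as follows. This is a sketch that assumes the line appears in vnfs.conf exactly in the form exclude += /usr/src, possibly with extra whitespace; comment_out_usr_src is a hypothetical helper name, and the image still has to be rebuilt afterwards as described above.

```shell
#!/bin/bash
# Sketch only: comment_out_usr_src is a hypothetical helper name.
# $1: path to vnfs.conf (the post uses /etc/warewulf/vnfs.conf)
comment_out_usr_src() {
    local conf="$1"
    # Keep a .bak copy, then prefix the matching exclude line with "# ".
    # Lines that are already commented out do not match and stay unchanged.
    sed -i.bak \
        's|^\([[:space:]]*\)\(exclude[[:space:]]*+=[[:space:]]*/usr/src[[:space:]]*\)$|\1# \2|' \
        "$conf"
}

# Example:
#   comment_out_usr_src /etc/warewulf/vnfs.conf
```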



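2. eth0 is not brought up (ifup) automatically after boot.

The exact cause is unclear, but a comparison with another ifcfg file (ib0) showed that the "DEVTIMEOUT=5" setting was missing. Adding it resolved the problem.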


[root@c43 network-scripts]# 

[root@c43 network-scripts]# cat ifcfg-eth0

# This was created by the Warewulf bootstrap

DEVICE=eth0

BOOTPROTO=static

ONBOOT=yes

IPADDR=xx.xx.xx.x

NETMASK=255.255.255.0

GATEWAY=xx.xx.xx.x

HWADDR=xx:xx:xx:xx:xx:xx

[root@c43 network-scripts]#

[root@c43 network-scripts]# cat ifcfg-ib0

DEVICE=ib0

BOOTPROTO=static

IPADDR=xx.xx.xx.x

NETMASK=255.255.255.0

ONBOOT=yes

NM_CONTROLLED=no

DEVTIMEOUT=5

[root@c43 network-scripts]#

[root@c43 network-scripts]#

[root@c43 network-scripts]# echo "DEVTIMEOUT=5" >> ifcfg-eth0
[root@c43 network-scripts]#
[root@c43 network-scripts]# reboot
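
Because echo ... >> appends a new line every time it is run, a guarded variant may be safer when the fix is applied from a provisioning script. The following is a minimal sketch: ensure_devtimeout is a hypothetical helper name, and the default of 5 seconds is simply the value copied from ifcfg-ib0 above.

```shell
#!/bin/bash
# Sketch only: ensure_devtimeout is a hypothetical helper name.
# $1: ifcfg file, $2: timeout in seconds (default: 5, as in ifcfg-ib0)
ensure_devtimeout() {
    local file="$1" seconds="${2:-5}"

    if [ ! -f "$file" ]; then
        echo "no such file: $file" >&2
        return 1
    fi

    # Append the setting only when no DEVTIMEOUT line exists yet.
    if grep -q '^DEVTIMEOUT=' "$file"; then
        echo "already set: $(grep '^DEVTIMEOUT=' "$file")"
    else
        echo "DEVTIMEOUT=${seconds}" >> "$file"
        echo "added DEVTIMEOUT=${seconds}"
    fi
}

# Example:
#   ensure_devtimeout /etc/sysconfig/network-scripts/ifcfg-eth0
```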



Posted by CheekyKite


Knowledge Sharing · 2018. 11. 6. 19:38




CentOS 7.5: dependency error when installing NVIDIA driver 410



<problem>


[root@dasandata:~]# 

[root@dasandata:~]# curl  -L -o  cuda-repo-rhel7-8.0.61-1.x86_64.rpm \

>  http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-repo-rhel7-8.0.61-1.x86_64.rpm


[root@dasandata:~]# yum -y install cuda-repo-rhel7-8.0.61-1.x86_64.rpm

[root@dasandata:~]# 

[root@dasandata:~]# cat /etc/yum.repos.d/cuda.repo

[cuda]

name=cuda

baseurl=http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64

enabled=1

gpgcheck=1

gpgkey=http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/7fa2af80.pub

[root@dasandata:~]# 

[root@dasandata:~]# 

[root@dasandata:~]# 

[root@dasandata:~]# yum -y install cuda-9-0


--> Finished Dependency Resolution

Error: Package: 3:nvidia-driver-libs-410.72-1.el7.x86_64 (cuda)

           Requires: libglvnd-gles(x86-64) >= 0.2

Error: Package: 3:nvidia-driver-libs-410.72-1.el7.x86_64 (cuda)

           Requires: libglvnd-egl(x86-64) >= 0.2

Error: Package: 3:nvidia-driver-libs-410.72-1.el7.x86_64 (cuda)

           Requires: libglvnd-opengl(x86-64) >= 0.2

Error: Package: 3:nvidia-driver-libs-410.72-1.el7.x86_64 (cuda)

           Requires: libglvnd-glx(x86-64) >= 0.2

Error: Package: 3:nvidia-driver-libs-410.72-1.el7.x86_64 (cuda)

           Requires: libglvnd(x86-64) >= 0.2

 You could try using --skip-broken to work around the problem

 You could try running: rpm -Va --nofiles --nodigest






<resolve>

Finding the required package RPMs on https://rpmfind.net and installing them with yum resolved the problem.


http://rpmfind.net/linux/epel/6/x86_64/Packages/l/libglvnd-1.0.1-0.1.git5baa1e5.el6.x86_64.rpm

http://rpmfind.net/linux/epel/6/x86_64/Packages/l/libglvnd-gles-1.0.1-0.1.git5baa1e5.el6.x86_64.rpm

http://rpmfind.net/linux/epel/6/x86_64/Packages/l/libglvnd-egl-1.0.1-0.1.git5baa1e5.el6.x86_64.rpm

http://rpmfind.net/linux/epel/6/x86_64/Packages/l/libglvnd-opengl-1.0.1-0.1.git5baa1e5.el6.x86_64.rpm

http://rpmfind.net/linux/epel/6/x86_64/Packages/l/libglvnd-glx-1.0.1-0.1.git5baa1e5.el6.x86_64.rpm


yum -y install  http://rpmfind.net......
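
To see which packages need to be looked up, the unmet names can be pulled out of the yum error text. The sketch below assumes the error lines have the same "Requires: name(arch) >= version" shape as the output above; missing_requires is a hypothetical helper name, and 2>&1 is used in the example because it is not assumed here which stream yum writes these errors to.

```shell
#!/bin/bash
# Sketch only: missing_requires is a hypothetical helper name.
# Reads yum output on stdin and prints each unmet "Requires:" name once,
# with the "(x86-64)" suffix and the version constraint stripped.
missing_requires() {
    sed -n 's/^[[:space:]]*Requires:[[:space:]]*\([^ (]*\).*/\1/p' \
        | LC_ALL=C sort -u
}

# Example:
#   yum -y install cuda-9-0 2>&1 | missing_requires
```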



Posted by CheekyKite


Knowledge Sharing · 2018. 11. 6. 18:20


When wwmkchroot in OpenHPC fails with repeated "Trying other mirror" and FAILED messages, exporting YUM_MIRROR before running it was confirmed to let the command proceed.




[root@dasandata:~]# export CHROOT=/opt/ohpc/admin/images/centos7.5

[root@dasandata:~]# wwmkchroot centos-7 ${CHROOT}


Loaded plugins: fastestmirror, langpacks

Loading mirror speeds from cached hostfile

http://mirror.centos.org/centos-7/7/os/x86_64/repodata/repomd.xml: [Errno 14] curl#7 - "Failed to connect to 2605:9000:401:102::2: Network is unreachable"

Trying other mirror.

http://mirror.centos.org/centos-7/7/os/x86_64/repodata/repomd.xml: [Errno 14] curl#7 - "Failed to connect to 2605:9000:401:102::2: Network is unreachable"

Trying other mirror.


<omitted>


Downloading packages:

No Presto metadata available for os-base

bash-4.2.46-30.el7.x86_64.rpm  FAILED                                          

http://mirror.centos.org/centos-7/7/os/x86_64/Packages/bash-4.2.46-30.el7.x86_64.rpm: [Errno 12] Timeout on http://mirror.centos.org/centos-7/7/os/x86_64/Packages/bash-4.2.46-30.el7.x86_64.rpm: (28, 'Operation too slow. Less than 1000 bytes/sec transferred the last 30 seconds')

Trying other mirror.

bash-4.2.46-30.el7.x86_64.rpm  FAILED                                          

http://mirror.centos.org/centos-7/7/os/x86_64/Packages/bash-4.2.46-30.el7.x86_64.rpm: [Errno 14] curl#7 - "Failed to connect to 2401:78c0::2: Network is unreachable"

Trying other mirror.

bash-4.2.46-30.el7.x86_64.rpm  FAILED                                          

http://mirror.centos.org/centos-7/7/os/x86_64/Packages/bash-4.2.46-30.el7.x86_64.rpm: [Errno 14] curl#7 - "Failed to connect to 2401:78c0::2: Network is unreachable"

Trying other mirror.

bash-4.2.46-30.el7.x86_64.rpm  FAILED                                          

http://mirror.centos.org/centos-7/7/os/x86_64/Packages/bash-4.2.46-30.el7.x86_64.rpm: [Errno 12] Timeout on http://mirror.centos.org/centos-7/7/os/x86_64/Packages/bash-4.2.46-30.el7.x86_64.rpm: (28, 'Operation too slow. Less than 1000 bytes/sec transferred the last 30 seconds')

Trying other mirror.

bash-4.2.46-30.el7.x86_64.rpm  FAILED                                          

http://mirror.centos.org/centos-7/7/os/x86_64/Packages/bash-4.2.46-30.el7.x86_64.rpm: [Errno 14] curl#7 - "Failed to connect to 2401:78c0::2: Network is unreachable"

Trying other mirror.

bash-4.2.46-30.el7.x86_64.rpm  FAILED                                          

http://mirror.centos.org/centos-7/7/os/x86_64/Packages/bash-4.2.46-30.el7.x86_64.rpm: [Errno 14] curl#7 - "Failed to connect to 2401:78c0::2: Network is unreachable"

Trying other mirror.

bash-4.2.46-30.el7.x86_64.rpm  FAILED                                          

http://mirror.centos.org/centos-7/7/os/x86_64/Packages/bash-4.2.46-30.el7.x86_64.rpm: [Errno 12] Timeout on http://mirror.centos.org/centos-7/7/os/x86_64/Packages/bash-4.2.46-30.el7.x86_64.rpm: (28, 'Operation too slow. Less than 1000 bytes/sec transferred the last 30 seconds')

Trying other mirror.

bash-4.2.46-30.el7.x86_64.rpm  FAILED                                          

http://mirror.centos.org/centos-7/7/os/x86_64/Packages/bash-4.2.46-30.el7.x86_64.rpm: [Errno 14] curl#7 - "Failed to connect to 2401:78c0::2: Network is unreachable"

Trying other mirror.

bash-4.2.46-30.el7.x86_64.rpm  FAILED                                          

http://mirror.centos.org/centos-7/7/os/x86_64/Packages/bash-4.2.46-30.el7.x86_64.rpm: [Errno 14] curl#7 - "Failed to connect to 2401:78c0::2: Network is unreachable"

Trying other mirror.

bash-4.2.46-30.el7.x86_64.rpm  FAILED                                          

http://mirror.centos.org/centos-7/7/os/x86_64/Packages/bash-4.2.46-30.el7.x86_64.rpm: [Errno 12] Timeout on http://mirror.centos.org/centos-7/7/os/x86_64/Packages/bash-4.2.46-30.el7.x86_64.rpm: (28, 'Operation too slow. Less than 1000 bytes/sec transferred the last 30 seconds')

Trying other mirror.



Error downloading packages:

  bash-4.2.46-30.el7.x86_64: [Errno 256] No more mirrors to try.


ERROR: Failed to install packages

[root@dasandata:~]# 


[root@dasandata:~]# export YUM_MIRROR="http://ftp.kaist.ac.kr/CentOS/7.5.1804/os/\$basearch/"

[root@dasandata:~]# 
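
The backslash in \$basearch matters: it keeps the shell from expanding the variable, so the literal text $basearch reaches yum, which substitutes the architecture itself. A small helper can make that quoting harder to get wrong. This is only a sketch: centos_mirror_url is a hypothetical name, and the URL layout is taken from the KAIST mirror path used above.

```shell
#!/bin/bash
# Sketch only: centos_mirror_url is a hypothetical helper name.
# $1: mirror base URL, $2: CentOS release directory (e.g. 7.5.1804)
centos_mirror_url() {
    local base="${1%/}" release="$2"   # drop one trailing slash from the base
    # Single quotes keep $basearch literal so yum can expand it later.
    printf '%s/%s/os/$basearch/\n' "$base" "$release"
}

# Example, followed by the wwmkchroot call from the post:
#   export YUM_MIRROR="$(centos_mirror_url http://ftp.kaist.ac.kr/CentOS 7.5.1804)"
#   wwmkchroot centos-7 "${CHROOT}"
```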




Posted by CheekyKite
