http://www.mysqlkorea.co.kr
Data nodes fail intermittently while testing MySQL Cluster
Author: 정희성   Date: 10-07-21 16:20   Views: 7260
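1. All of the data nodes go down at the same time.
2. When a single data node goes down, it does not restart automatically, and bringing it back up manually takes close to 40 minutes.
  (Aren't data nodes supposed to come back up automatically? Does a restart normally take this long?)

I'd really appreciate any advice. T.T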
==========================================================================================
 
// Hardware: dual Intel Xeon CPUs with 10 GB of memory per server.
 
// Cluster layout:
 
[ndbd(NDB)]     4 node(s)
id=3    @XXX.XXX.52.211  (mysql-5.1.44 ndb-7.1.4, Nodegroup: 0, Master)
id=4    @XXX.XXX.52.212  (mysql-5.1.44 ndb-7.1.4, Nodegroup: 0)
id=7    @XXX.XXX.52.204  (mysql-5.1.44 ndb-7.1.4, Nodegroup: 0)
id=8    @XXX.XXX.52.205  (mysql-5.1.44 ndb-7.1.4, Nodegroup: 0)
[ndb_mgmd(MGM)] 2 node(s)
id=1    @XXX.XXX.52.207  (mysql-5.1.44 ndb-7.1.4)
id=2    @XXX.XXX.52.208  (mysql-5.1.44 ndb-7.1.4)
[mysqld(API)]   2 node(s)
id=5    @XXX.XXX.52.211  (mysql-5.1.44 ndb-7.1.4)
id=6    @XXX.XXX.52.212  (mysql-5.1.44 ndb-7.1.4)
 
 
// MGM config (config.ini):

[ndbd default]
NoOfReplicas=4
DataMemory=2048M
IndexMemory=320M
MaxNoOfOrderedIndexes = 10000
MaxNoOfUniqueHashIndexes = 15000
MaxNoOfAttributes = 20000
 
[tcp default]
portnumber=2202       
[ndb_mgmd]
Id=1
hostname=XXX.XXX.52.207         
datadir=/var/lib/mysql-cluster 
 
[ndb_mgmd]
Id=2
hostname=XXX.XXX.52.208        
datadir=/var/lib/mysql-cluster  
 
[ndbd]
Id=3
# (one [ndbd] section per data node)
hostname=XXX.XXX.52.211        
datadir=/usr/local/mysql/data  
MaxNoOfConcurrentOperations=500000
 
[ndbd]
Id=4
hostname=XXX.XXX.52.212         
datadir=/usr/local/mysql/data  
MaxNoOfConcurrentOperations=500000
 
[ndbd]
Id=7
# (one [ndbd] section per data node)
hostname=XXX.XXX.52.204         
datadir=/usr/local/mysql/data  
MaxNoOfConcurrentOperations=500000
 
[ndbd]
Id=8
hostname=XXX.XXX.52.205         
datadir=/usr/local/mysql/data  
MaxNoOfConcurrentOperations=500000
 
[mysqld]
nodeId=5
hostname=XXX.XXX.52.211 

[mysqld]
nodeId=6
hostname=XXX.XXX.52.212 
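Every data node in the `show` output above is in Nodegroup 0. In NDB the number of node groups is the number of data nodes divided by NoOfReplicas, and all nodes in one node group hold the same data. With NoOfReplicas=4 and four data nodes there is therefore a single node group, every node stores the whole data set, and the usable DataMemory for the cluster is one node's 2048M rather than four times that. The Python sketch below (the helper names `to_megabytes`, `parse_cluster_config` and `node_group_summary` are hypothetical) checks this from the config text. It avoids `configparser` because config.ini repeats section names such as `[ndbd]`, which `configparser` rejects in its default strict mode.

```python
def to_megabytes(value):
    """Convert an NDB size value such as '2048M' or '1G' to megabytes."""
    value = value.strip().lower()
    units = {"k": 1 / 1024, "m": 1.0, "g": 1024.0}
    if value and value[-1] in units:
        return float(value[:-1]) * units[value[-1]]
    return float(value) / (1024 * 1024)  # a bare number is taken as bytes


def parse_cluster_config(text):
    """Return a list of (section_name, {key: value}) pairs, keeping repeats."""
    sections = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = {}
            sections.append((line[1:-1].strip().lower(), current))
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip().lower()] = value.strip()
    return sections


def node_group_summary(text):
    """Summarize the replica / node-group layout described by config.ini text."""
    defaults, data_nodes = {}, []
    for name, params in parse_cluster_config(text):
        if name == "ndbd default":
            defaults.update(params)
        elif name == "ndbd":
            data_nodes.append(params)
    replicas = int(defaults["noofreplicas"])  # set explicitly in the post
    if len(data_nodes) % replicas:
        raise ValueError("data node count must be a multiple of NoOfReplicas")
    node_groups = len(data_nodes) // replicas
    per_node_mb = to_megabytes(defaults["datamemory"])
    return {
        "data_nodes": len(data_nodes),
        "replicas": replicas,
        "node_groups": node_groups,
        # nodes in the same node group hold identical data, so usable
        # capacity is DataMemory times the number of node groups
        "usable_data_memory_mb": per_node_mb * node_groups,
    }


# Trimmed copy of the config.ini posted above.
SAMPLE_CONFIG = """
[ndbd default]
NoOfReplicas=4
DataMemory=2048M
IndexMemory=320M

[ndbd]
Id=3
hostname=XXX.XXX.52.211

[ndbd]
Id=4
hostname=XXX.XXX.52.212

[ndbd]
Id=7
hostname=XXX.XXX.52.204

[ndbd]
Id=8
hostname=XXX.XXX.52.205
"""

print(node_group_summary(SAMPLE_CONFIG))
# → {'data_nodes': 4, 'replicas': 4, 'node_groups': 1, 'usable_data_memory_mb': 2048.0}
```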
 
// mgm log from when all the data nodes went down at once:
 
2010-07-20 14:52:08 [MgmtSrvr] INFO     -- Node 3: Local checkpoint 34 completed
2010-07-20 14:52:10 [MgmtSrvr] INFO     -- Node 3: Local checkpoint 35 started. Keep GCI = 17012 oldest restorable GCI = 17098
2010-07-20 14:57:18 [MgmtSrvr] WARNING  -- Node 4: Transporter to node 7 reported error 0x16: The send buffer was full, but sleeping for a while solved
2010-07-20 14:57:18 [MgmtSrvr] WARNING  -- Node 4: Transporter to node 7 reported error 0x16: The send buffer was full, but sleeping for a while solved - Repeated 2 times
2010-07-20 14:57:18 [MgmtSrvr] WARNING  -- Node 4: Transporter to node 3 reported error 0x16: The send buffer was full, but sleeping for a while solved
2010-07-20 14:57:18 [MgmtSrvr] WARNING  -- Node 4: Transporter to node 3 reported error 0x16: The send buffer was full, but sleeping for a while solved - Repeated 2 times
2010-07-20 14:57:18 [MgmtSrvr] WARNING  -- Node 7: Transporter to node 4 reported error 0x16: The send buffer was full, but sleeping for a while solved
2010-07-20 14:57:18 [MgmtSrvr] WARNING  -- Node 8: Transporter to node 7 reported error 0x16: The send buffer was full, but sleeping for a while solved
2010-07-20 14:57:18 [MgmtSrvr] WARNING  -- Node 7: Transporter to node 4 reported error 0x16: The send buffer was full, but sleeping for a while solved
2010-07-20 14:57:18 [MgmtSrvr] WARNING  -- Node 4: Transporter to node 8 reported error 0x16: The send buffer was full, but sleeping for a while solved
2010-07-20 14:57:18 [MgmtSrvr] WARNING  -- Node 8: Transporter to node 7 reported error 0x16: The send buffer was full, but sleeping for a while solved
2010-07-20 14:57:18 [MgmtSrvr] WARNING  -- Node 3: Transporter to node 8 reported error 0x16: The send buffer was full, but sleeping for a while solved
2010-07-20 14:57:18 [MgmtSrvr] WARNING  -- Node 8: Transporter to node 7 reported error 0x16: The send buffer was full, but sleeping for a while solved
2010-07-20 14:57:18 [MgmtSrvr] WARNING  -- Node 3: Transporter to node 8 reported error 0x16: The send buffer was full, but sleeping for a while solved
2010-07-20 14:57:18 [MgmtSrvr] WARNING  -- Node 3: Transporter to node 8 reported error 0x16: The send buffer was full, but sleeping for a while solved
2010-07-20 14:57:20 [MgmtSrvr] ALERT    -- Node 7: Forced node shutdown completed. Caused by error 2300: 'Generic error(Restart error). Temporary error, restart node'.
2010-07-20 14:57:20 [MgmtSrvr] ALERT    -- Node 4: Node 7 Disconnected
2010-07-20 14:57:20 [MgmtSrvr] ALERT    -- Node 1: Node 7 Disconnected
2010-07-20 14:57:21 [MgmtSrvr] ALERT    -- Node 8: Forced node shutdown completed. Caused by error 2300: 'Generic error(Restart error). Temporary error, restart node'.
2010-07-20 14:57:21 [MgmtSrvr] ALERT    -- Node 4: Node 8 Disconnected
2010-07-20 14:57:21 [MgmtSrvr] ALERT    -- Node 1: Node 8 Disconnected
2010-07-20 14:57:21 [MgmtSrvr] ALERT    -- Node 3: Forced node shutdown completed. Caused by error 2300: 'Generic error(Restart error). Temporary error, restart node'.
2010-07-20 14:57:21 [MgmtSrvr] ALERT    -- Node 4: Node 3 Disconnected
2010-07-20 14:57:21 [MgmtSrvr] INFO     -- Node 4: Communication to Node 3 closed
2010-07-20 14:57:21 [MgmtSrvr] INFO     -- Node 4: Communication to Node 7 closed
2010-07-20 14:57:21 [MgmtSrvr] INFO     -- Node 4: Communication to Node 8 closed
2010-07-20 14:57:21 [MgmtSrvr] ALERT    -- Node 4: Network partitioning - arbitration required
2010-07-20 14:57:21 [MgmtSrvr] INFO     -- Node 4: President restarts arbitration thread [state=7]
2010-07-20 14:57:21 [MgmtSrvr] ALERT    -- Node 1: Node 3 Disconnected
2010-07-20 14:57:21 [MgmtSrvr] ALERT    -- Node 4: Arbitration won - positive reply from node 1
2010-07-20 14:57:21 [MgmtSrvr] INFO     -- Node 4: GCP Take over started
2010-07-20 14:57:21 [MgmtSrvr] INFO     -- Node 4: Node 4 taking over as DICT master
2010-07-20 14:57:21 [MgmtSrvr] INFO     -- Node 4: GCP Take over completed
2010-07-20 14:57:21 [MgmtSrvr] INFO     -- Node 4: kk: 17293/17 2 0
2010-07-20 14:57:21 [MgmtSrvr] INFO     -- Node 4: LCP Take over started
2010-07-20 14:57:21 [MgmtSrvr] INFO     -- Node 4: ParticipatingDIH = 0000000000000010
2010-07-20 14:57:21 [MgmtSrvr] INFO     -- Node 4: ParticipatingLQH = 0000000000000010
2010-07-20 14:57:21 [MgmtSrvr] INFO     -- Node 4: m_LCP_COMPLETE_REP_Counter_DIH = [SignalCounter: m_count=0 0000000000000000]
2010-07-20 14:57:21 [MgmtSrvr] INFO     -- Node 4: m_LCP_COMPLETE_REP_Counter_LQH = [SignalCounter: m_count=1 0000000000000010]
2010-07-20 14:57:21 [MgmtSrvr] INFO     -- Node 4: m_LAST_LCP_FRAG_ORD = [SignalCounter: m_count=0 0000000000000000]
2010-07-20 14:57:21 [MgmtSrvr] INFO     -- Node 4: m_LCP_COMPLETE_REP_From_Master_Received = 0
2010-07-20 14:57:21 [MgmtSrvr] INFO     -- Node 4: LCP Take over completed (state = 5)
2010-07-20 14:57:21 [MgmtSrvr] INFO     -- Node 4: ParticipatingDIH = 0000000000000010
2010-07-20 14:57:21 [MgmtSrvr] INFO     -- Node 4: ParticipatingLQH = 0000000000000010
2010-07-20 14:57:21 [MgmtSrvr] INFO     -- Node 4: m_LCP_COMPLETE_REP_Counter_DIH = [SignalCounter: m_count=1 0000000000000010]
2010-07-20 14:57:21 [MgmtSrvr] INFO     -- Node 4: m_LCP_COMPLETE_REP_Counter_LQH = [SignalCounter: m_count=1 0000000000000010]
2010-07-20 14:57:21 [MgmtSrvr] INFO     -- Node 4: m_LAST_LCP_FRAG_ORD = [SignalCounter: m_count=1 0000000000000010]
2010-07-20 14:57:21 [MgmtSrvr] INFO     -- Node 4: m_LCP_COMPLETE_REP_From_Master_Received = 0
2010-07-20 14:57:22 [MgmtSrvr] INFO     -- Node 4: Started arbitrator node 1 [ticket=73090002b209c881]
2010-07-20 14:57:32 [MgmtSrvr] ALERT    -- Node 4: Forced node shutdown completed. Caused by error 2303: 'System error, node killed during node restart by other node(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2010-07-20 14:57:32 [MgmtSrvr] ALERT    -- Node 1: Node 4 Disconnected
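The mgm log shows a burst of "send buffer was full" warnings between data nodes at 14:57:18, followed within seconds by forced shutdowns with error 2300 on nodes 7, 8 and 3, and error 2303 on node 4. A small Python sketch like the one below can tally which node-to-node links hit a full send buffer and when each node shut down. `summarize_mgm_log` is a hypothetical helper, and its regular expressions are written against the message wording in this log only.

```python
import re
from collections import Counter

# Patterns written against the message wording in this particular mgm log.
SEND_BUFFER_FULL = re.compile(
    r"Node (\d+): Transporter to node (\d+) reported error 0x16: "
    r"The send buffer was full"
)
FORCED_SHUTDOWN = re.compile(
    r"^(\S+ \S+) .*?Node (\d+): Forced node shutdown completed\. "
    r"Caused by error (\d+)"
)


def summarize_mgm_log(lines):
    """Count send-buffer-full warning lines per (sender, receiver) link and
    list forced shutdowns as (timestamp, node_id, error_code)."""
    links = Counter()
    shutdowns = []
    for line in lines:
        m = SEND_BUFFER_FULL.search(line)
        if m:
            links[(int(m.group(1)), int(m.group(2)))] += 1
            continue
        m = FORCED_SHUTDOWN.search(line)
        if m:
            shutdowns.append((m.group(1), int(m.group(2)), int(m.group(3))))
    return links, shutdowns


# A few lines copied from the log above.
SAMPLE_LOG = [
    "2010-07-20 14:57:18 [MgmtSrvr] WARNING  -- Node 4: Transporter to node 7 "
    "reported error 0x16: The send buffer was full, but sleeping for a while solved",
    "2010-07-20 14:57:18 [MgmtSrvr] WARNING  -- Node 8: Transporter to node 7 "
    "reported error 0x16: The send buffer was full, but sleeping for a while solved",
    "2010-07-20 14:57:18 [MgmtSrvr] WARNING  -- Node 3: Transporter to node 8 "
    "reported error 0x16: The send buffer was full, but sleeping for a while solved",
    "2010-07-20 14:57:18 [MgmtSrvr] WARNING  -- Node 3: Transporter to node 8 "
    "reported error 0x16: The send buffer was full, but sleeping for a while solved",
    "2010-07-20 14:57:20 [MgmtSrvr] ALERT    -- Node 7: Forced node shutdown "
    "completed. Caused by error 2300: 'Generic error(Restart error). "
    "Temporary error, restart node'.",
    "2010-07-20 14:57:20 [MgmtSrvr] ALERT    -- Node 4: Node 7 Disconnected",
    "2010-07-20 14:57:32 [MgmtSrvr] ALERT    -- Node 4: Forced node shutdown "
    "completed. Caused by error 2303: 'System error, node killed during node "
    "restart by other node'.",
]

links, shutdowns = summarize_mgm_log(SAMPLE_LOG)
for (src, dst), n in sorted(links.items()):
    print(f"node {src} -> node {dst}: {n} send-buffer-full warning line(s)")
for ts, node, code in shutdowns:
    print(f"{ts} node {node} forced shutdown, error {code}")
```

Feeding it the whole log instead of the sample shows every data node reporting a full send buffer just before the first shutdown.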
 

 

 
민족
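Hello~ So you've set up the cluster with four replicas ㅡ0ㅡ;;

Your data set is small, but this looks like a fairly critical system.

Could you tell us under what conditions these symptoms occurred?

Also, is the config.ini above your complete configuration?

Please also let us know the cluster version.

It would also help if you could post the ndb out log.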
클러스터
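About the symptoms above:
The server has two databases, Web and WebStage.
As a test, I ran a command to copy every table in the Web DB into WebStage, and that is when the error occurred.
(It happened while copying all the tables to another database with the Copy Database to Different command in SQLYOG, a MySQL GUI tool.)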

The config.ini above is the complete file.
========================================
mgm out log
========================================
stop checker 0
==CONFIRMED==
Node 3 failed
Node 4 failed
Node 7 failed
Node 8 failed

=========================================
master Data Node  Error log
=========================================
100720 14:40:23 [Note] NDB Binlog: CREATE TABLE Event: REPL$WebStage/adminEventList
100720 14:40:24 [Note] NDB Binlog: CREATE TABLE Event: REPL$WebStage/adminMainTagList
100720 14:40:24 [Note] NDB Binlog: CREATE TABLE Event: REPL$WebStage/adminStatMember
100720 14:40:25 [Note] NDB Binlog: CREATE TABLE Event: REPL$WebStage/adminStatPagora
100720 14:40:26 [Note] NDB Binlog: CREATE TABLE Event: REPL$WebStage/adminStatPointUse
100720 14:40:27 [Note] NDB Binlog: CREATE TABLE Event: REPL$WebStage/adminStatTogme
100720 14:40:27 [Note] NDB Binlog: CREATE TABLE Event: REPL$WebStage/banner
100720 14:40:28 [Note] NDB Binlog: CREATE TABLE Event: REPL$WebStage/billing
100720 14:40:29 [Note] NDB Binlog: CREATE TABLE Event: REPL$WebStage/blogBlackList
100720 14:40:30 [Note] NDB Binlog: CREATE TABLE Event: REPL$WebStage/blogComment
100720 14:41:43 [Note] NDB Binlog: CREATE TABLE Event: REPL$WebStage/blogManage
100720 14:42:02 [Note] NDB Binlog: CREATE TABLE Event: REPL$WebStage/blogPost
100720 14:43:34 [Note] NDB Binlog: CREATE TABLE Event: REPL$WebStage/blogPostMeta
100720 14:54:35 [Note] NDB Binlog: CREATE TABLE Event: REPL$WebStage/notice
100720 14:55:04 [Note] NDB Binlog: CREATE TABLE Event: REPL$WebStage/noticeCategory
100720 14:55:05 [Note] NDB Binlog: CREATE TABLE Event: REPL$WebStage/noticeCategoryComment
100720 14:55:06 [Note] NDB Binlog: CREATE TABLE Event: REPL$WebStage/pointLog
100720 14:57:17 [ERROR] Got error 4028 when reading table './Web/em_smt_tran'
100720 14:57:17 [ERROR] Got error 4028 when reading table './Web/em_mmt_tran'
100720 14:57:17 [ERROR] Got error 4028 when reading table './Web/em_smt_tran'
100720 14:57:17 [ERROR] /usr/local/mysql/bin/mysqld: Sort aborted
100720 14:57:17 [ERROR] /usr/local/mysql/bin/mysqld: Sort aborted
100720 14:57:17 [ERROR] /usr/local/mysql/bin/mysqld: Sort aborted
100720 14:57:17 [ERROR] Got error 4028 when reading table './Web/blogPost'
100720 14:57:17 [ERROR] Got error 4010 when reading table './Web/users'
100720 14:57:28 [ERROR] Got error 4002 when reading table './Web/em_smt_tran'
100720 14:57:28 [Note] NDB Binlog: Node: 3, down, Subscriber bitmask 00
100720 14:57:28 [Note] NDB Binlog: Node: 4, down, Subscriber bitmask 00
100720 14:57:28 [Note] NDB Binlog: Node: 7, down, Subscriber bitmask 00
100720 14:57:28 [Note] NDB Binlog: Node: 8, down, Subscriber bitmask 00
100720 14:57:28 [ERROR] /usr/local/mysql/bin/mysqld: Sort aborted
100720 14:57:28 [Note] NDB Binlog: cluster failure for ./mysql/ndb_schema at epoch 17294/0.
100720 14:57:28 [ERROR] /usr/local/mysql/bin/mysqld: Sort aborted
100720 14:57:28 [ERROR] Got error 4028 when reading table './Web/togMyLine'
100720 14:57:28 [ERROR] Got error 157 when reading table './Web/users'
100720 14:57:28 [ERROR] /usr/local/mysql/bin/mysqld: Sort aborted
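The mysqld log shows reads on tables in the `Web` database failing at the same moment the data nodes went down (errors 4028, 4010, 4002 and 157, plus several "Sort aborted" lines), which fits the copy job and ordinary reads both losing their data nodes. The sketch below only tallies those error codes per table; it does not interpret them, and their meanings should be checked against the NDB error-code reference. `tally_read_errors` is a hypothetical helper.

```python
import re
from collections import Counter, defaultdict

READ_ERROR = re.compile(r"Got error (\d+) when reading table '([^']+)'")


def tally_read_errors(lines):
    """Return (count per error code, sorted tables per code, 'Sort aborted' count)."""
    codes = Counter()
    tables = defaultdict(set)
    sort_aborted = 0
    for line in lines:
        m = READ_ERROR.search(line)
        if m:
            code = int(m.group(1))
            codes[code] += 1
            tables[code].add(m.group(2))
        elif "Sort aborted" in line:
            sort_aborted += 1
    return codes, {c: sorted(t) for c, t in tables.items()}, sort_aborted


# Lines copied from the mysqld error log above.
SAMPLE_LINES = [
    "100720 14:57:17 [ERROR] Got error 4028 when reading table './Web/em_smt_tran'",
    "100720 14:57:17 [ERROR] Got error 4028 when reading table './Web/em_mmt_tran'",
    "100720 14:57:17 [ERROR] Got error 4028 when reading table './Web/em_smt_tran'",
    "100720 14:57:17 [ERROR] /usr/local/mysql/bin/mysqld: Sort aborted",
    "100720 14:57:17 [ERROR] Got error 4028 when reading table './Web/blogPost'",
    "100720 14:57:17 [ERROR] Got error 4010 when reading table './Web/users'",
    "100720 14:57:28 [ERROR] Got error 4002 when reading table './Web/em_smt_tran'",
    "100720 14:57:28 [ERROR] Got error 157 when reading table './Web/users'",
]

codes, tables, sort_aborted = tally_read_errors(SAMPLE_LINES)
for code, n in codes.most_common():
    print(code, n, tables[code])
print("Sort aborted:", sort_aborted)
```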
민족
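Please post the contents of ndb_nodeid_error.log.

Please also confirm the cluster version.

It looks as though the current NDB configuration values are too small, so the nodes ran short of resources and produced the symptoms above.

First, please send the error log and the version information.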
클러스터
We are running mysql-cluster-gpl-7.1.4b-linux-x86_64-glibc23 on 64-bit CentOS 5.3.

=================================================================================
Contents of ndb_nodeid_error.log:
=================================================================================

Current byte-offset of file-pointer is: 1067


Time: Tuesday 20 July 2010 - 14:57:15
Status: Temporary error, restart node
Message: Generic error (Restart error)
Error: 2300
Error data: Out of SendBufferMemory in sendSignal
Error object:
Program: /usr/local/mysql/bin/ndbmtd
Pid: 27805 thr: 2
Version: mysql-5.1.44 ndb-7.1.4b
Trace: /usr/local/mysql/data/ndb_3_trace.log.1 /usr/local/mysql/data/ndb_3_trace.log.1_t1 /usr/local/mysql/data/ndb_3_trace.log.1_t2 /usr/local/mysql/data/ndb_3_trace.log.1_t3
***EOM***

Time: Tuesday 20 July 2010 - 14:57:15
Status: Permanent error, external action needed
Message: Signal lost, out of send buffer memory, please increase SendBufferMemory or lower the load (Resource configuration error)
Error: 6052
Error data: Remote node id 8.
Error object: TransporterCallback.cpp
Program: /usr/local/mysql/bin/ndbmtd
Pid: 27805 thr: 3
Version: mysql-5.1.44 ndb-7.1.4b
Trace: /usr/local/mysql/data/ndb_3_trace.log.1 /usr/local/mysql/data/ndb_3_trace.log.1_t1 /usr/local/mysql/data/n
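This data node error log is the decisive piece: node 3's ndbmtd recorded error 2300 with "Out of SendBufferMemory in sendSignal", and error 6052, "Signal lost, out of send buffer memory, please increase SendBufferMemory or lower the load", for remote node 8. When several nodes are involved it helps to pull these fields out of each node's ndb_<id>_error.log. The sketch below (`parse_error_log` and `send_buffer_entries` are hypothetical helpers) assumes the layout shown here, "Key: value" lines with each entry closed by ***EOM***, and tolerates a final entry that is cut off before its terminator, as the second entry above is.

```python
def parse_error_log(text):
    """Split an NDB data node error log into a list of {field: value} dicts."""
    entries, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if line == "***EOM***":  # end-of-message marker closes one entry
            if current:
                entries.append(current)
            current = {}
        elif ": " in line:
            key, value = line.split(": ", 1)
            current[key] = value
    if current:  # the last entry may be truncated before ***EOM***
        entries.append(current)
    return entries


def send_buffer_entries(entries):
    """Keep entries whose message or error data mentions SendBufferMemory."""
    return [
        e for e in entries
        if "SendBufferMemory" in e.get("Message", "") + e.get("Error data", "")
    ]


# Trimmed copy of the error log above (Trace lines omitted).
SAMPLE = """
Time: Tuesday 20 July 2010 - 14:57:15
Status: Temporary error, restart node
Message: Generic error (Restart error)
Error: 2300
Error data: Out of SendBufferMemory in sendSignal
Error object:
Program: /usr/local/mysql/bin/ndbmtd
Pid: 27805 thr: 2
***EOM***

Time: Tuesday 20 July 2010 - 14:57:15
Status: Permanent error, external action needed
Message: Signal lost, out of send buffer memory, please increase SendBufferMemory or lower the load (Resource configuration error)
Error: 6052
Error data: Remote node id 8.
Error object: TransporterCallback.cpp
Program: /usr/local/mysql/bin/ndbmtd
Pid: 27805 thr: 3
"""

for entry in send_buffer_entries(parse_error_log(SAMPLE)):
    print(entry["Error"], entry["Error data"])
```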
민족
Add SendBufferMemory=32M to your config.ini file and restart the cluster.

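It looks like a large load arrived all at once and the nodes died because NDB ran out of memory; the error log points specifically at send buffer memory.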
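The reply does not say which section the setting belongs in. Assuming it goes in `[tcp default]`, where the MySQL Cluster documentation lists SendBufferMemory among the TCP connection parameters so that it applies to every TCP link, the change is one extra line under the existing `portnumber=2202`. The Python helper below (`set_in_section` is a hypothetical name) makes that edit on the config text; 32M is the value suggested in the reply, not a figure derived from this workload.

```python
def set_in_section(text, section, key, value):
    """Set key=value in the first [section] of config.ini text.

    Replaces an existing key (case-insensitive) or appends it at the end of
    that section; adds the section at the end of the file if it is missing.
    """
    lines = text.splitlines()
    header = "[" + section.lower() + "]"
    start = next(
        (i for i, line in enumerate(lines) if line.strip().lower() == header),
        None,
    )
    if start is None:
        return "\n".join(lines + ["[" + section + "]", f"{key}={value}"]) + "\n"
    # The section ends at the next [header] line or at end of file.
    end = next(
        (j for j in range(start + 1, len(lines)) if lines[j].strip().startswith("[")),
        len(lines),
    )
    for k in range(start + 1, end):
        name, sep, _ = lines[k].partition("=")
        if sep and name.strip().lower() == key.lower():
            lines[k] = f"{key}={value}"  # update the existing setting in place
            break
    else:
        lines.insert(end, f"{key}={value}")  # append as the section's last line
    return "\n".join(lines) + "\n"


# Fragment of the posted config.ini around the [tcp default] section.
SAMPLE_CONFIG = "[tcp default]\nportnumber=2202\n[ndb_mgmd]\nId=1\n"
UPDATED = set_in_section(SAMPLE_CONFIG, "tcp default", "SendBufferMemory", "32M")
print(UPDATED)
```

Running it a second time with a different value replaces the line rather than adding a duplicate.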
All content on the MySQL Korea site is owned by (주)상상이비즈; unauthorized reproduction is prohibited.
Copyright ⓒ ssebiz All Rights Reserved.