root/JSOC/doc/whattodo_sum_partn_on_off.txt
Revision: 1.1
Committed: Wed Jan 9 20:15:42 2013 UTC by production
Content type: text/plain
Branch: MAIN
CVS Tags: NetDRMS_Ver_8-0, NetDRMS_Ver_8-8, Ver_8-5, NetDRMS_Ver_8-1, Ver_LATEST, NetDRMS_Ver_LATEST, NetDRMS_Ver_8-12, NetDRMS_Ver_8-10, NetDRMS_Ver_8-11, NetDRMS_Ver_9-1, NetDRMS_Ver_9-0, NetDRMS_Ver_9-3, NetDRMS_Ver_9-2, NetDRMS_Ver_9-5, NetDRMS_Ver_9-4, NetDRMS_Ver_8-2, NetDRMS_Ver_8-3, NetDRMS_Ver_9-41, Ver_9-41, Ver_DRMSLATEST, NetDRMS_Ver_8-4, NetDRMS_Ver_8-5, NetDRMS_Ver_8-6, Ver_8-8, NetDRMS_Ver_8-7, Ver_8-2, Ver_9-3, Ver_8-0, Ver_8-1, Ver_8-6, Ver_8-7, Ver_8-4, Ver_8-11, Ver_9-1, Ver_8-3, Ver_9-5, Ver_9-4, Ver_8-10, Ver_9-2, Ver_8-12, Ver_9-0, HEAD
Log Message:
initial

/home/production/cvs/JSOC/doc/whattodo_sum_partn_on_off.txt

If a file server is down, its /SUM partitions should be taken
offline so that SUMS does not try to allocate storage from them
and then hang on a mkdir that it cannot complete.

Here are the /SUM partitions on each file server:

d02: /SUM0 through /SUM20s
d03: /SUM30s
d04: /SUM40s

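The listing above can be expressed as a small lookup, which helps avoid
mistakes such as treating /SUM3 as a d03 partition. This is only a sketch:
the function name sum_server_for is hypothetical, and it assumes that
"/SUM20s" means /SUM20 through /SUM29 (and likewise for the 30s and 40s).

```shell
# Hypothetical helper (not part of SUMS): map a /SUM partition name to
# its file server, following the listing above.
# Assumption: "/SUM20s" means /SUM20../SUM29, "/SUM30s" means /SUM30../SUM39, etc.
sum_server_for() {
    n=${1#/SUM}                      # strip the "/SUM" prefix, leaving the number
    case $n in
        ''|*[!0-9]*) echo unknown; return 1 ;;   # not a plain number
    esac
    if [ "$n" -le 29 ]; then
        echo d02                     # /SUM0 through the /SUM20s
    elif [ "$n" -le 39 ]; then
        echo d03                     # the /SUM30s
    elif [ "$n" -le 49 ]; then
        echo d04                     # the /SUM40s
    else
        echo unknown; return 1
    fi
}
```

For example, sum_server_for /SUM3 prints d02, while sum_server_for /SUM30
prints d03.
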
The way to take a /SUM partition offline is to set its
pds_set_num = -1. This is done in the sum_partn_avail db table,
which looks like:

 partn_name |  total_bytes   |  avail_bytes  | pds_set_num | pds_set_prime
------------+----------------+---------------+-------------+---------------
 /SUM37     | 25000000000000 | 1349566595072 |           0 |             0
 /SUM4      | 33000000000000 | 1600150855680 |           0 |             0
 /SUM2      | 33000000000000 | 1599634276352 |           0 |             0
 /SUM20     | 33000000000000 | 1599828013056 |           0 |             0
 /SUM21     | 33000000000000 | 1600223768576 |           0 |             0
 /SUM22     | 33000000000000 | 1599709511680 |           0 |             0
 /SUM3      | 33000000000000 | 1599469428736 |           0 |             0
 [etc.]

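Since pds_set_num = -1 is what marks a partition offline, a query on that
column shows which partitions are currently out of service. The sketch
below is an illustration, not an existing SUMS tool: the function names are
hypothetical, and the connection parameters (host hmidb, port 5434,
database jsoc_sums) are the ones used in the psql session in this note.

```shell
# Hypothetical helpers (not part of SUMS).

# Print the SQL that lists partitions currently marked offline.
sums_offline_sql() {
    printf '%s\n' "select partn_name from sum_partn_avail where pds_set_num = -1 order by partn_name;"
}

# Run the query. -A (unaligned) and -t (tuples only) make psql print
# one partition name per line with no header or footer.
sums_list_offline() {
    psql -h hmidb -p 5434 jsoc_sums -At -c "$(sums_offline_sql)"
}
```

Running sums_list_offline as user production before and after a change is
a quick way to confirm that only the intended partitions were affected.
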
Do this as user production on a machine with psql (e.g. n02),
adjusting for the file server (d03 in this example) that is to be taken offline:

> psql -h hmidb -p 5434 jsoc_sums
jsoc_sums=> select * from sum_partn_avail where partn_name like '/SUM3%';
 partn_name |  total_bytes   |  avail_bytes  | pds_set_num | pds_set_prime
------------+----------------+---------------+-------------+---------------
 /SUM3      | 33000000000000 | 1599825543168 |           0 |             0
 /SUM30     | 22000000000000 | 1199689433088 |           0 |             0
 /SUM31     | 22000000000000 | 1199430434816 |           0 |             0
 /SUM32     | 22000000000000 |  747345281024 |           0 |             0
 /SUM33     | 22000000000000 |  786056609792 |           0 |             0
 /SUM34     | 25000000000000 | 1349826641920 |           0 |             0
 /SUM35     | 25000000000000 | 1349841321984 |           0 |             0
 /SUM36     | 25000000000000 | 1349845516288 |           0 |             0
 /SUM37     | 25000000000000 | 1349560303616 |           0 |             0
(9 rows)

jsoc_sums=> update sum_partn_avail set pds_set_num=-1 where
jsoc_sums-> partn_name in ('/SUM30', '/SUM31', '/SUM32', '/SUM33',
jsoc_sums(> '/SUM34', '/SUM35', '/SUM36', '/SUM37');

[Notice that we did not update '/SUM3', which matches the like pattern
but is not on the d03 file server.]

jsoc_sums=> \q

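The pattern '/SUM3%' also matches /SUM3, which is why the update above
lists the partitions by name. One possible alternative, shown here only as
a sketch, is to match with a PostgreSQL regular expression (the ~ operator)
that requires exactly one digit after the tens digit. The function name is
hypothetical, and note that such a pattern would also match /SUM38 and
/SUM39 if those rows exist in the table.

```shell
# Hypothetical helper (not part of SUMS): build an update statement for
# one "decade" of partitions, e.g. 3 for /SUM30../SUM39.
sums_set_decade_sql() {
    decade=$1   # tens digit of the partition number
    value=$2    # -1 = offline, 0 = back online (values from the text above)
    case $decade in
        [1-9]) ;;   # 0 is rejected: single-digit partitions have no tens digit
        *) echo "decade must be a digit from 1 to 9" >&2; return 2 ;;
    esac
    case $value in
        -1|0) ;;
        *) echo "value must be -1 or 0" >&2; return 2 ;;
    esac
    # '^/SUM3[0-9]$' matches /SUM30../SUM39 but not /SUM3.
    printf "update sum_partn_avail set pds_set_num=%s where partn_name ~ '^/SUM%s[0-9]\$';\n" "$value" "$decade"
}
```

The printed statement can be pasted into the psql session, ideally after
running a select with the same where clause to check which rows it matches.
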
Now force all the sum_svc processes to reread the updated sum_partn_avail table:

> sumrepartn

When d03 is back online, rerun the update command shown above with
pds_set_num=0, then run again:

> sumrepartn

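The offline and online steps differ only in the value written to
pds_set_num, so the whole cycle can be sketched as one wrapper. This is an
illustration under stated assumptions, not an existing SUMS script: the
names sums_run and sums_partn_state and the DRY_RUN variable are
hypothetical, and the sketch defaults to printing the commands rather than
running them.

```shell
# Hypothetical wrapper around the manual steps above (not part of SUMS).
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

sums_run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Usage: sums_partn_state offline|online /SUM30 /SUM31 ...
sums_partn_state() {
    case $1 in
        offline) value=-1 ;;   # take the partitions out of service
        online)  value=0 ;;    # restore them, as described above
        *) echo "usage: sums_partn_state offline|online PARTITION..." >&2; return 2 ;;
    esac
    shift
    [ $# -gt 0 ] || { echo "no partitions given" >&2; return 2; }
    list=
    for p in "$@"; do
        case $p in
            /SUM[0-9]|/SUM[0-9][0-9]) ;;   # accept only /SUMn or /SUMnn
            *) echo "unexpected partition name: $p" >&2; return 2 ;;
        esac
        list="${list:+$list, }'$p'"        # build a quoted, comma-separated list
    done
    sql="update sum_partn_avail set pds_set_num=$value where partn_name in ($list);"
    # Update the table, then make the sum_svc processes reread it.
    sums_run psql -h hmidb -p 5434 jsoc_sums -c "$sql" && sums_run sumrepartn
}
```

For example, sums_partn_state offline /SUM30 /SUM31 prints the psql and
sumrepartn commands it would run; setting DRY_RUN=0 would execute them.
Listing the partitions explicitly keeps /SUM3 out of a d03 change.
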