root/JSOC/doc/whattodo_sum_partn_on_off.txt
Revision: 1.1
Committed: Wed Jan 9 20:15:42 2013 UTC (10 years, 8 months ago) by production
Content type: text/plain
Branch: MAIN
CVS Tags: NetDRMS_Ver_8-0, NetDRMS_Ver_8-8, Ver_8-5, NetDRMS_Ver_8-1, Ver_LATEST, NetDRMS_Ver_LATEST, NetDRMS_Ver_8-12, NetDRMS_Ver_8-10, NetDRMS_Ver_8-11, NetDRMS_Ver_9-1, NetDRMS_Ver_9-0, NetDRMS_Ver_9-3, NetDRMS_Ver_9-2, NetDRMS_Ver_9-5, NetDRMS_Ver_9-4, NetDRMS_Ver_8-2, NetDRMS_Ver_8-3, NetDRMS_Ver_9-41, Ver_9-41, Ver_DRMSLATEST, NetDRMS_Ver_8-4, NetDRMS_Ver_8-5, NetDRMS_Ver_8-6, Ver_8-8, NetDRMS_Ver_8-7, Ver_8-2, Ver_9-3, Ver_8-0, Ver_8-1, Ver_8-6, Ver_8-7, Ver_8-4, Ver_8-11, Ver_9-1, Ver_8-3, Ver_9-5, Ver_9-4, Ver_8-10, Ver_9-2, Ver_8-12, Ver_9-0, HEAD
Log Message:
initial

/home/production/cvs/JSOC/doc/whattodo_sum_partn_on_off.txt

If a file server is down, its /SUM partitions should be taken
offline so that SUMS does not try to allocate storage from them
and then hang on a mkdir that it cannot complete.

Here are the /SUM partitions on each file server:

d02: /SUM0 through /SUM20s
d03: /SUM30s
d04: /SUM40s
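
The layout above can be sketched as a small lookup. This is only an
illustration: server_for_partn is a hypothetical helper, and it assumes
that "/SUM30s" means /SUM30 through /SUM39 (likewise for the other
decades) and that every partition numbered below 30 is on d02.

```shell
# Hypothetical helper: print the file server that holds a /SUM partition.
# Assumption: "/SUM30s" means /SUM30../SUM39, and similarly for /SUM20s, /SUM40s.
server_for_partn() {
    case $1 in
        /SUM[0-9]|/SUM[0-2][0-9]) echo d02 ;;   # /SUM0../SUM9 and /SUM10../SUM29
        /SUM3[0-9])               echo d03 ;;   # /SUM30../SUM39
        /SUM4[0-9])               echo d04 ;;   # /SUM40../SUM49
        *) echo unknown; return 1 ;;            # anything else is not covered here
    esac
}

server_for_partn /SUM3     # single-digit name, so d02 rather than d03
server_for_partn /SUM37
```

Note that /SUM3 and /SUM37 land on different servers even though both
names start with "/SUM3".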

The way to take a /SUM partition offline is to set its
pds_set_num to -1. This is done in the sum_partn_avail db table,
which looks like:

 partn_name |  total_bytes   |  avail_bytes  | pds_set_num | pds_set_prime
------------+----------------+---------------+-------------+---------------
 /SUM37     | 25000000000000 | 1349566595072 |           0 |             0
 /SUM4      | 33000000000000 | 1600150855680 |           0 |             0
 /SUM2      | 33000000000000 | 1599634276352 |           0 |             0
 /SUM20     | 33000000000000 | 1599828013056 |           0 |             0
 /SUM21     | 33000000000000 | 1600223768576 |           0 |             0
 /SUM22     | 33000000000000 | 1599709511680 |           0 |             0
 /SUM3      | 33000000000000 | 1599469428736 |           0 |             0
[etc.]

Do this as user production on a machine with psql (e.g. n02),
adjusting for the file server to be taken offline (d03 in this example):

> psql -h hmidb -p 5434 jsoc_sums
jsoc_sums=> select * from sum_partn_avail where partn_name like '/SUM3%';
 partn_name |  total_bytes   |  avail_bytes  | pds_set_num | pds_set_prime
------------+----------------+---------------+-------------+---------------
 /SUM3      | 33000000000000 | 1599825543168 |           0 |             0
 /SUM30     | 22000000000000 | 1199689433088 |           0 |             0
 /SUM31     | 22000000000000 | 1199430434816 |           0 |             0
 /SUM32     | 22000000000000 |  747345281024 |           0 |             0
 /SUM33     | 22000000000000 |  786056609792 |           0 |             0
 /SUM34     | 25000000000000 | 1349826641920 |           0 |             0
 /SUM35     | 25000000000000 | 1349841321984 |           0 |             0
 /SUM36     | 25000000000000 | 1349845516288 |           0 |             0
 /SUM37     | 25000000000000 | 1349560303616 |           0 |             0
(9 rows)

jsoc_sums=> update sum_partn_avail set pds_set_num=-1 where
jsoc_sums-> partn_name in ('/SUM30', '/SUM31', '/SUM32', '/SUM33',
jsoc_sums(> '/SUM34', '/SUM35', '/SUM36', '/SUM37');

[Notice that we did not update '/SUM3', which matches the '/SUM3%'
pattern but is not on the d03 file server.]

jsoc_sums=> \q
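
As a possible alternative to typing the partition list by hand, the
update above can be generated from a pattern. The sketch below is not
part of the original procedure: partn_update_sql is a hypothetical
helper, and it assumes a server's partitions are exactly the two-digit
names in one decade (true for d03 and d04 as listed, not for d02). It
relies on the SQL LIKE wildcard "_", which matches exactly one
character, so '/SUM3_' excludes /SUM3.

```shell
# Hypothetical helper: print the UPDATE for one file server's partitions.
#   $1 = decade digit of the partitions (3 for d03, 4 for d04)
#   $2 = new pds_set_num (-1 = offline, 0 = back in service)
partn_update_sql() {
    # In SQL LIKE, "_" matches exactly one character, so '/SUM3_' matches
    # /SUM30../SUM39 but not /SUM3, which is on d02.
    printf "update sum_partn_avail set pds_set_num=%s where partn_name like '/SUM%s_';\n" "$2" "$1"
}

# Print the statement for review before running anything.
partn_update_sql 3 -1

# The reviewed statement could then be passed to psql non-interactively:
#   psql -h hmidb -p 5434 jsoc_sums -c "$(partn_update_sql 3 -1)"
```

Running the select with the same pattern first is a cheap way to confirm
that it matches only the intended partitions.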

Now force all the sum_svc processes to reread the updated sum_partn_avail table:

> sumrepartn

When d03 is back online, rerun the update command shown above with
pds_set_num=0 and run sumrepartn again:

> sumrepartn
63