/home/production/cvs/JSOC/doc/whattodo_sum_partn_on_off.txt

If a file server is down, its /SUM partitions should be taken
offline so that SUMS does not try to allocate storage from them
and then hang on a mkdir that it cannot complete.

Here are the /SUM partitions on each file server:

d02: /SUM0 through /SUM20s
d03: /SUM30s
d04: /SUM40s

The way to take a /SUM partition offline is to set its
pds_set_num = -1. This is done in the sum_partn_avail db table,
which looks like:

 partn_name |  total_bytes   |  avail_bytes  | pds_set_num | pds_set_prime
------------+----------------+---------------+-------------+---------------
 /SUM37     | 25000000000000 | 1349566595072 |           0 |             0
 /SUM4      | 33000000000000 | 1600150855680 |           0 |             0
 /SUM2      | 33000000000000 | 1599634276352 |           0 |             0
 /SUM20     | 33000000000000 | 1599828013056 |           0 |             0
 /SUM21     | 33000000000000 | 1600223768576 |           0 |             0
 /SUM22     | 33000000000000 | 1599709511680 |           0 |             0
 /SUM3      | 33000000000000 | 1599469428736 |           0 |             0
[etc.]

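To see which partitions are currently marked offline, the psql output
can be filtered for rows whose pds_set_num is -1. What follows is only
a sketch: the function name offline_partns is made up for this note and
is not part of SUMS, and it assumes the default aligned psql output
format (five columns separated by "|") like the table shown above.

```shell
# Hypothetical filter, not part of SUMS: read a psql-style table of
# sum_partn_avail on stdin and print the partn_name of every row whose
# pds_set_num (4th column) is -1.
offline_partns() {
    awk -F'|' 'NF >= 5 {
        name = $1; num = $4
        # Strip the padding that psql adds around each value.
        gsub(/[ \t]/, "", name)
        gsub(/[ \t]/, "", num)
        # The header row has "pds_set_num" here, so it is skipped too.
        if (num == "-1") print name
    }'
}
```

A possible use, with the same connection settings as in this note:

> psql -h hmidb -p 5434 jsoc_sums -c 'select * from sum_partn_avail' | offline_partns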
Do this as user production on a machine that has psql (e.g. n02),
adjusting the partition names for whichever file server is being
taken offline (d03 in this example):

> psql -h hmidb -p 5434 jsoc_sums
jsoc_sums=> select * from sum_partn_avail where partn_name like '/SUM3%';
 partn_name |  total_bytes   |  avail_bytes  | pds_set_num | pds_set_prime
------------+----------------+---------------+-------------+---------------
 /SUM3      | 33000000000000 | 1599825543168 |           0 |             0
 /SUM30     | 22000000000000 | 1199689433088 |           0 |             0
 /SUM31     | 22000000000000 | 1199430434816 |           0 |             0
 /SUM32     | 22000000000000 |  747345281024 |           0 |             0
 /SUM33     | 22000000000000 |  786056609792 |           0 |             0
 /SUM34     | 25000000000000 | 1349826641920 |           0 |             0
 /SUM35     | 25000000000000 | 1349841321984 |           0 |             0
 /SUM36     | 25000000000000 | 1349845516288 |           0 |             0
 /SUM37     | 25000000000000 | 1349560303616 |           0 |             0
(9 rows)

jsoc_sums=> update sum_partn_avail set pds_set_num=-1 where
jsoc_sums-> partn_name in ('/SUM30', '/SUM31', '/SUM32', '/SUM33',
jsoc_sums(> '/SUM34', '/SUM35', '/SUM36', '/SUM37');

[Notice that we did not set '/SUM3'. It matches the '/SUM3%' pattern
in the select, but it is not on the d03 file server.]

jsoc_sums=> \q

Now force all the sum_svc processes to reread the updated sum_partn_avail table:

> sumrepartn

When d03 is back in service, rerun the update command shown above with
pds_set_num=0, then run again:

> sumrepartn
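
The offline and online updates differ only in the pds_set_num value, so
the statement can be generated from an explicit list of partition names
instead of being retyped. This is a sketch under stated assumptions: the
function name sum_partn_update_sql is made up for this note and is not
part of SUMS, and it only accepts the two values used here (-1 and 0).
It prints the statement without running it, so it can be checked first.

```shell
# Hypothetical helper, not part of SUMS: print (do not run) the UPDATE
# statement for sum_partn_avail.
# usage: sum_partn_update_sql PDS_SET_NUM PARTN_NAME...
sum_partn_update_sql() {
    num=$1
    case $num in
        -1 | 0) ;;
        *) echo "pds_set_num must be -1 (offline) or 0 (online)" >&2; return 1 ;;
    esac
    shift
    list=""
    for p in "$@"; do
        # Accept only names of the form /SUM<digits>. Wildcards such as
        # /SUM3% are rejected, so /SUM3 cannot be picked up by accident.
        case $p in
            /SUM | /SUM*[!0-9]*) echo "bad partition name: $p" >&2; return 1 ;;
            /SUM*) ;;
            *) echo "bad partition name: $p" >&2; return 1 ;;
        esac
        list="${list:+$list, }'$p'"
    done
    [ -n "$list" ] || { echo "no partitions given" >&2; return 1; }
    printf "update sum_partn_avail set pds_set_num=%s where partn_name in (%s);\n" "$num" "$list"
}
```

For example, to take the d03 partitions offline and then bring them back:

> sum_partn_update_sql -1 /SUM30 /SUM31 /SUM32 /SUM33 /SUM34 /SUM35 /SUM36 /SUM37 | psql -h hmidb -p 5434 jsoc_sums
> sum_partn_update_sql 0 /SUM30 /SUM31 /SUM32 /SUM33 /SUM34 /SUM35 /SUM36 /SUM37 | psql -h hmidb -p 5434 jsoc_sums

Each update still has to be followed by sumrepartn.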