Revision: 1.10
Committed: Tue May 22 19:10:45 2012 UTC (11 years, 4 months ago) by production
Content type: text/plain
Branch: MAIN
/home/production/cvs/JSOC/doc/whattodolev0.txt          25Nov2008

------------------------------------------------
WARNING!! Some of this is outdated. 3Jun2010
Please see more recent what*.txt files, e.g.
whattodo_start_stop_lev1_0_sums.txt
------------------------------------------------

------------------------------------------------------
Running Datacapture & Pipeline Backend lev0 Processing
------------------------------------------------------


NOTE: For now, this is all done from the xim w/s (Jim's office)

Datacapture:
--------------------------

NOTE: IMPORTANT: Please keep in mind that each data capture machine has its
own independent /home/production.

FORMERLY: 1. The Datacapture system for aia/hmi is by convention dcs0/dcs1
respectively. If the spare dcs2 is to be put in place, it is renamed dcs0
or dcs1, and the original machine is renamed dcs2.

1. The datacapture machine serving for AIA or HMI is determined by
the entries in:

  /home/production/cvs/JSOC/proj/datacapture/scripts/dcstab.txt

This is edited or listed by the program:

  /home/production/cvs/JSOC/proj/datacapture/scripts> dcstab.pl -h
  Display or change the datacapture system assignment file.
  Usage: dcstab [-h][-l][-e]
  -h = print this help message
  -l = list the current file contents
  -e = edit with vi the current file contents

For dcs3 the dcstab.txt would look like:
  AIA=dcs3
  HMI=dcs3
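
A shell sketch of how an instrument's assigned host can be read out of
dcstab.txt (the AIA=/HMI= line format is taken from the example above;
the helper name itself is hypothetical, and dcstab.pl remains the real
tool for editing the file):

```shell
#!/bin/sh
# dcs_host: print the datacapture host assigned to an instrument
# (AIA or HMI) by reading KEY=value lines from a dcstab.txt-style file.
# Hypothetical helper name; the file format matches the example above.
dcs_host() {
    inst=$1      # AIA or HMI
    tabfile=$2   # path to dcstab.txt
    # Take the last matching line so a later entry overrides an earlier one.
    sed -n "s/^${inst}=//p" "$tabfile" | tail -1
}
```

For example, `dcs_host HMI .../dcstab.txt` would print `dcs3` with the
file contents shown above.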


1a. The spare dcs2 normally serves as a backup destination of the postgres
running on dcs0 and dcs1. You should see this postgres cron job on dcs0
and dcs1, respectively:

  0,20,40 * * * * /var/lib/pgsql/rsync_pg_dcs0_to_dcs2.pl
  0,20,40 * * * * /var/lib/pgsql/rsync_pg_dcs1_to_dcs2.pl

For this to work, this must be done on dcs0, dcs1 and dcs2, as user
postgres, after any reboot:

  > ssh-agent | head -2 > /var/lib/pgsql/ssh-agent.env
  > chmod 600 /var/lib/pgsql/ssh-agent.env
  > source /var/lib/pgsql/ssh-agent.env
  > ssh-add
  (The password is the same as production's)
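
The first three of those commands can be collected into one small
function. The env-file path is a parameter, and the agent command is
parameterized only so the sketch can be exercised without a real
ssh-agent; that parameter is an assumption of this sketch, not part of
the documented procedure:

```shell
#!/bin/sh
# write_agent_env: start an ssh-agent and save its environment lines to
# a file with safe permissions, as in the per-reboot procedure above.
# $1 = env file path; $2 (optional) = command that prints the agent
# environment, defaulting to ssh-agent (parameterized only for testing).
write_agent_env() {
    envfile=$1
    agent_cmd=${2:-ssh-agent}
    $agent_cmd | head -2 > "$envfile"
    chmod 600 "$envfile"
    # ssh-add must still be run interactively afterwards, since it
    # prompts for the key's passphrase:
    #   . "$envfile" && ssh-add
}
```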

2. Login as user production via j0. (password is on Jim's whiteboard).

3. The Postgres must be running and is started automatically on boot:

#######OLD#########################
#> ps -ef |grep pg
#postgres  4631  1  0 Mar11 ?  00:06:21 /usr/bin/postmaster -D /var/lib/pgsql/data
###################################

  dcs0:/home/production> px postgres
  postgres  6545  1  0 May04 ?  00:09:50 /usr/local/pgsql-8.4/bin/postgres -D /var/lib/pgsql/dcs0_data

4. The root of the datacapture tree is /home/production/cvs/JSOC.
The production user runs as user id 388.

5. The sum_svc is normally running:

  > ps -ef |grep sum_svc
  388  26958  1  0 Jun09 pts/0  00:00:54 sum_svc jsocdc

Note the SUMS database is jsocdc. This is a separate DB on each dcs.

6. To start/restart the sum_svc and related programs (e.g. tape_svc) do:

  > sum_start_dc
  sum_start at 2008.06.16_13:32:23
  ** NOTE: "soc_pipe_scp jsocdc" still running
  Do you want me to do a sum_stop followed by a sum_start for you (y or n):

You would normally answer 'y' here.

7. To run the datacapture gui that will display the data, mark it for archive,
optionally extract lev0 and send it on to the pipeline backend, do this:

  > cd /home/production/cvs/JSOC/proj/datacapture/scripts
  > ./socdc

All you would normally do is hit "Start Instances for HMI" (or AIA) for
whichever datacapture machine you are on.

8. To optionally extract lev0 do this:

  > touch /usr/local/logs/soc/LEV0FILEON

To stop lev0:

  > /bin/rm /usr/local/logs/soc/LEV0FILEON

The last 100 images for each VC are kept in /tmp/jim.
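
Since lev0 extraction is toggled purely by the presence of that flag
file, the on/off/status operations can be wrapped in one small function
(the wrapper name is hypothetical; the flag directory is a parameter
only so the sketch can be pointed somewhere else for testing):

```shell
#!/bin/sh
# lev0_flag: manage the LEV0FILEON flag file that turns lev0 extraction
# on and off. $1 = on|off|status, $2 (optional) = flag directory,
# defaulting to the documented /usr/local/logs/soc.
lev0_flag() {
    action=$1
    dir=${2:-/usr/local/logs/soc}
    flag="$dir/LEV0FILEON"
    case $action in
        on)     touch "$flag" ;;
        off)    /bin/rm -f "$flag" ;;
        status) if [ -e "$flag" ]; then echo "lev0 ON"; else echo "lev0 OFF"; fi ;;
        *)      echo "usage: lev0_flag on|off|status [dir]" >&2; return 1 ;;
    esac
}
```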

NOTE: If you turn lev0 on, processing becomes sensitive to the data
content, and you may see things like this, in which case you have to
restart socdc:

ingest_tlm: /home/production/cvs/EGSE/src/libhmicomp.d/decompress.c:1385: decompress_undotransform: Assertion `N>=(6) && N<=(16)' failed.
kill: no process ID specified

9. The datacapture machines automatically copy DDS input data to the
pipeline backend on /dds/socdc living on d01. This is done by the program:

  > ps -ef |grep soc_pipe_scp
  388  21529  21479  0 Jun09 pts/0  00:00:13 soc_pipe_scp /dds/soc2pipe/hmi /dds/socdc/hmi d01i 30

This requires that an ssh-agent be running. If you reboot a dcs machine do:

  > ssh-agent | head -2 > /var/tmp/ssh-agent.env
  > chmod 600 /var/tmp/ssh-agent.env
  > source /var/tmp/ssh-agent.env
  > ssh-add     (or for sonar: ssh-add /home/production/.ssh/id_rsa)
  (The password is written on my whiteboard)

NOTE: on some machines you may have to put the user name in
/etc/ssh/allowed_users

NOTE: cron jobs use this /var/tmp/ssh-agent.env file

If you want another window to use the ssh-agent that is already running do:
  > source /var/tmp/ssh-agent.env

NOTE: on any one machine for user production there should be just one
ssh-agent running.


If you see that a dcs has asked for a password, the ssh-agent has failed.
You can probably find an error msg on d01 like 'invalid user production'.
You should exit the socdc. Make sure there is no soc_pipe_scp still running.
Restart the socdc.

If you find that there is a hostname for production that is not in the
/home/production/.ssh/authorized_keys file, then do this on the host that
you want to add:

Pick up the entry in /home/production/.ssh/id_rsa.pub
and put it in this file on the host that you want to have access to
(make sure that it's all one line):

  /home/production/.ssh/authorized_keys

NOTE: DO NOT do an ssh-keygen or you will have to update all the hosts'
authorized_keys with the new public key you just generated.

If not already active, then do what's shown above for the ssh-agent.


10. There should be a cron job running that will archive to the T50 tapes.
Note the names are asymmetric for dcs0 and dcs1:

  30 0-23 * * * /home/production/cvs/jsoc/scripts/tapearc_do

  00 0-23 * * * /home/production/cvs/jsoc/scripts/tapearc_do_dcs1

In the beginning of the world, before any sum_start_dc, the T50 should have
a supply of blank tapes in its active slots (1-24). A cleaning tape must
be in slot 25. The imp/exp slots (26-30) must be vacant.
To see the contents of the T50 before startup do:

  > mtx -f /dev/t50 status

Whenever sum_start_dc is called, all the tapes are inventoried and added
to the SUMS database if necessary.
When a tape is written full by the tapearc_do cron job, the t50view
display (see 11. and 12. below) 'Imp/Exp' button will increment its
count. Tapes should be exported before the count gets above 5.
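
The mtx status listing can also be summarized from the command line.
This sketch counts occupied storage slots, assuming the usual mtx output
format with one "Storage Element N:Full" line per occupied slot; the
exact format can vary between mtx versions:

```shell
#!/bin/sh
# count_full_slots: read `mtx -f /dev/t50 status` output on stdin and
# print how many storage slots currently hold a tape.
count_full_slots() {
    grep -c 'Storage Element [0-9]*:Full'
}
# Typical use:
#   mtx -f /dev/t50 status | count_full_slots
```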

11. The t50view program should be running to display/control the
tape operations:

  > t50view -i jsocdc

The -i means interactive mode, which will allow you to change tapes.

12. Every 2 days, inspect the t50 display for the button on the top row
called 'Imp/Exp'. If it is non-0 (and yellow), then some full tapes can be
exported from the T50 and new tapes put in for further archiving.

Hit the 'Imp/Exp' button.
Follow explicitly all the directions.
The blank L4 tapes are in the tape room in the computer room.

When the tape drive needs cleaning, hit the "Start Cleaning" button on
the t50view gui.

13. There should be a cron job running as user production on both dcs0 and
dcs1 that will set the Offsite_Ack field in the sum_main DB table.
  20 0 * * * /home/production/tape_verify/scripts/set_sum_main_offsite_ack.pl

Where:
#/home/production/tape_verify/scripts/set_sum_main_offsite_ack.pl
#
#This reads the .ver files produced by Tim's
#/home/production/tape_verify/scripts/run_remote_tape_verify.pl
#A .ver file looks like:
## Offsite verify offhost:dds/off2ds/HMI_2008.06.11_01:12:27.ver
## Tape 0=success 0=dcs0(aia)
#000684L4 0 1
#000701L4 0 1
##END
#For each tape that has been verified successfully, this program
#sets the Offsite_Ack to 'Y' in the sum_main for all entries
#with Arch_Tape = the given tape id.
#
#The machine names where AIA and HMI processing live
#are found in dcstab.txt which must be on either dcs0 or dcs1

14. Other background info is in:

http://hmi.stanford.edu/development/JSOC_Documents/Data_Capture_Documents/DataCapture.html

***************************dcs3*********************************************
NOTE: dcs3 (i.e. offsite datacapture machine shipped to Goddard Nov 2008)

At Goddard the dcs3 host name will be changed. See the following for
how to accommodate this:

  /home/production/cvs/JSOC/doc/dcs3_name_change.txt

This cron job must be run to clean out the /dds/soc2pipe/[aia,hmi]:
  0,5,10,15,20,25,30,35,40,45,50,55 * * * *
  /home/production/cvs/JSOC/proj/datacapture/scripts/rm_soc2pipe.pl
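
A cleanup like rm_soc2pipe.pl can be approximated with find. This sketch
deletes files older than a given age from a directory; the 60-minute
default is an assumption for illustration, not the script's actual
policy:

```shell
#!/bin/sh
# clean_old_files: delete regular files older than $2 minutes (default
# 60) under directory $1, e.g. /dds/soc2pipe/hmi. A sketch of what a
# rm_soc2pipe.pl-style cleanup does; the real policy may differ.
clean_old_files() {
    dir=$1
    age_min=${2:-60}
    find "$dir" -type f -mmin +"$age_min" -print -delete
}
```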

Also on dcs3, the offsite_ack check and safe tape check are not done in:
  /home/production/cvs/JSOC/base/sums/libs/pg/SUMLIB_RmDo.pgc

Also on dcs3, because there is no pipeline backend, no .arc file is
ever made for the DDS.
***************************dcs3*********************************************

Level 0 Backend:
--------------------------

!!Make sure to run Phil's watchlev0 script in the background on cl1n001:
  /home/production/cvs/JSOC/base/sums/scripts/get_dcs_times.csh

1. As mentioned above, the datacapture machines automatically copy DDS input
data to the pipeline backend on /dds/socdc living on d01.

2. The lev0 code runs as ingest_lev0 on the cluster machine cl1n001,
which has d01:/dds mounted. cl1n001 can be accessed through j1.

3. All 4 instances of ingest_lev0 for the 4 VCs are controlled by
  /home/production/cvs/JSOC/proj/lev0/apps/doingestlev0.pl

If you want to start afresh, kill any ingest_lev0 running (will later be
automated). Then do:

  > cd /home/production/cvs/JSOC/proj/lev0/apps
  > doingestlev0.pl     (actually a link to start_lev0.pl)

You will see 4 instances started and the log file names can be seen.
You will be advised that to cleanly stop the lev0 processing, run:

  > stop_lev0.pl

It may take a while for all the ingest_lev0 processes to get to a point
where they can stop cleanly.
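
If you want to wait for that clean stop from a script, a small polling
loop works. The check command and timeout are parameters so the sketch
stays generic; in practice the check might be something like
`pgrep -u production ingest_lev0`, which is an assumption, not something
the stop script documents:

```shell
#!/bin/sh
# wait_until_gone: poll until a check command stops succeeding, or give
# up after a timeout. $1 = command whose success means "still running",
# $2 = max seconds to wait, $3 (optional) = seconds between polls.
wait_until_gone() {
    check=$1
    timeout=$2
    interval=${3:-1}
    waited=0
    while sh -c "$check" >/dev/null 2>&1; do
        [ "$waited" -ge "$timeout" ] && return 1   # gave up waiting
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0                                       # processes are gone
}
# e.g.: stop_lev0.pl; wait_until_gone "pgrep -u production ingest_lev0" 600 10
```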

For now, every hour, the ingest_lev0 processes are automatically restarted.


4. The output is for the series:

  hmi.tlmd
  hmi.lev0d
  aia.tlmd
  aia.lev0d

#It is all saved in DRMS and archived.
Only the tlmd is archived. (see below if you want to change the
archiving status of a dataseries)

5. If something in the backend goes down such that you can't run
ingest_lev0, then you may want to start this cron job that will
periodically clean out the /dds/socdc dir of the files that are
coming in from the datacapture systems.

  > crontab -l
  # DO NOT EDIT THIS FILE - edit the master and reinstall.
  # (/tmp/crontab.XXXXVnxDO9 installed on Mon Jun 16 16:38:46 2008)
  # (Cron version V5.0 -- $Id: whattodolev0.txt,v 1.9 2010/12/17 18:34:28 production Exp $)
  #0,20,40 * * * * /home/jim/cvs/jsoc/scripts/pipefe_rm

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Starting and stopping SUMS on d02:

Login as production on d02
  sum_start_d02

(if sums is already running it will ask you if you want to halt it.
you normally say 'y'.)

  sum_stop_d02
if you just want to stop sums.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

SUMS archiving:

Currently SUMS is archiving continuously. The script is:

  /home/production/cvs/JSOC/base/sums/scripts/tape_do_0.pl (and _1, _2, _3)

To halt it do:

  touch /usr/local/logs/tapearc/TAPEARC_ABORT[0,1,2]

Try to keep it running, as there is still much to be archived.
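
Since each archiver instance watches for its own abort flag, halting
(or re-enabling) all of them can be wrapped in one loop. The flag
directory is a parameter here only so the sketch can be tested outside
/usr/local/logs/tapearc; the flag names and numbers come from the text
above:

```shell
#!/bin/sh
# tapearc_abort: touch the TAPEARC_ABORT<n> flag files that halt the
# tape_do_*.pl archiver instances; "clear" removes them again.
# $1 = set|clear, $2 (optional) = flag dir, defaulting to the documented
# /usr/local/logs/tapearc. Flag numbers 0-2 are as given in the text.
tapearc_abort() {
    action=$1
    dir=${2:-/usr/local/logs/tapearc}
    for n in 0 1 2; do
        case $action in
            set)   touch "$dir/TAPEARC_ABORT$n" ;;
            clear) rm -f "$dir/TAPEARC_ABORT$n" ;;
        esac
    done
}
```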

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Change archiving status of a dataseries:

  > psql -h hmidb jsoc

  jsoc=> update hmi.drms_series set archive=0 where seriesname='hmi.lev0c';
  UPDATE 1
  jsoc=> \q

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

The modified dcs reboot procedure is in ~kehcheng/dcs.reboot.notes.