Author |
Message |
rsivananda Beginner
Joined: 11 Aug 2004 Posts: 30 Topics: 10
|
Posted: Mon Jun 11, 2007 4:40 am Post subject: Huge Files Mapping |
|
|
Hi,
I have an input file of SWIFT messages.
The input file has around 1.5 million messages, and the output file has about the same number.
The job is to match each input message with its corresponding output message across these huge files.
Could anyone help me out with the best possible way of doing this?
A sample message is given below. There are 1.5 million messages like it in each file, to be matched based on certain criteria.
HEADERS F01XXXXXXXXXXXXXXXXXXXXXX
O103XXXXXXXXXXXXXXXXXXXXXXXXXX
119:STP
:20: ABC111111111
:23B:CRED
:32A:07186546463626
:33B:USD100
:50K:ABCVELOPMENT
INTERNATIONAL CORP
:53A:BS12345
:54A:IR12345
:57A:B12345
:59: /200064 333.00
XXXXXXXXXXXX
:71A:ABC
:71F:USD100
Say the filter criteria are the amount and the text in field 32A.
The programming language used is PL/1.
Any help to reduce the effort is greatly appreciated.
Thanks
Siva |
|
CICS Guy Intermediate
Joined: 30 Apr 2007 Posts: 292 Topics: 3
|
Posted: Mon Jun 11, 2007 4:58 am Post subject: Re: Huge Files Mapping |
|
|
rsivananda wrote: | The job is to match the input message with the corresponding out message with such huge files
Code: | HEADERS F01XXXXXXXXXXXXXXXXXXXXXX
O103XXXXXXXXXXXXXXXXXXXXXXXXXX
119:STP
:20: ABC111111111
:23B:CRED
:32A:07186546463626
:33B:USD100
:50K:ABCVELOPMENT
INTERNATIONAL CORP
:53A:BS12345
:54A:IR12345
:57A:B12345
:59: /200064 333.00
XXXXXXXXXXXX
:71A:ABC
:71F:USD100 |
| Which ones are the 'input message' and which ones are the 'corresponding out message'? |
|
prino Banned
Joined: 01 Feb 2007 Posts: 45 Topics: 5 Location: Oostende
|
Posted: Mon Jun 11, 2007 6:04 am Post subject: |
|
|
Convert both files to one SWIFT message per record with fixed positions for the tags (so insert plenty of blanks, x'00', x'ff' or whatever), making sure you add dummy tags for those that are missing. Then sort both files on the required tag and start reading both, matching data.
Robert |
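A minimal sketch of the flattening step suggested above, written in Python rather than PL/1 for brevity. The tag list and slot widths in `LAYOUT` are hypothetical; a real layout would cover every tag needed and use widths large enough for the longest value. Missing tags are padded with blanks so every tag lands at a fixed offset.

```python
import re

# Hypothetical layout: (tag, slot width). Both are assumptions for illustration.
LAYOUT = [("20", 16), ("21", 16), ("32A", 24)]

# A tag line looks like ":32A:050103USD88644,47"
TAG_LINE = re.compile(r"^:(\d{2}[A-Z]?):(.*)$")

def flatten(message: str) -> str:
    """Return one fixed-length record with each tag at a fixed offset."""
    fields = {}
    current = None
    for line in message.splitlines():
        m = TAG_LINE.match(line)
        if m:
            current = m.group(1)
            fields[current] = m.group(2).strip()
        elif current is not None and not line.startswith(("-}", "{")):
            # Continuation line: append it to the tag it belongs to.
            fields[current] += " " + line.strip()
    # Pad (or truncate) each value to its slot; absent tags become blanks.
    return "".join(fields.get(tag, "").ljust(width)[:width] for tag, width in LAYOUT)
```

With every record in this shape, both files can be sorted on a fixed column range before the matching pass.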
|
Phantom Data Mgmt Moderator

Joined: 07 Jan 2003 Posts: 1056 Topics: 91 Location: The Blue Planet
|
Posted: Mon Jun 11, 2007 9:08 am Post subject: |
|
|
rsivananda,
You have been on this board for nearly 3 years, yet you did not follow any of the rules.
1. Please provide complete information. Please don't make us guess: what are the DCB parameters of your files?
2. Make use of BB tags (CODE - /CODE). They format your data and make it easy to read.
3. Give us proper samples of the input and output files. You are talking about 3 files, but you have given an example of only one. We have no clue whether it is an input or an output file.
4. If you are OK with solutions involving Sort, then please check this:
http://www.mvsforums.com/helpboards/viewtopic.php?t=5399
Most of the time, matching data using utilities will be much faster and more efficient than using programming languages.
Thanks,
Phantom |
|
rsivananda Beginner
Joined: 11 Aug 2004 Posts: 30 Topics: 10
|
Posted: Mon Jun 11, 2007 10:56 am Post subject: |
|
|
Sorry about not posting the DCBs; that was a miss.
The files are VB files with an LRECL of 32000.
The reason I gave only one file format is that the infile and outfile look the same except for some tags which might change.
I am trying to put these messages into one-liners and then see if I can sort them on the tags I need and do a compare.
Thanks for the hints.
Siva |
|
ChrisR Beginner
Joined: 10 Jun 2007 Posts: 5 Topics: 1
|
Posted: Mon Jun 11, 2007 12:07 pm Post subject: |
|
|
This is also my problem (see the posting on Hash and Data Compression above yours: http://www.mvsforums.com/helpboards/viewtopic.php?t=8559 ).
We had figured the solution was to pass the data to be matched to a hash routine and generate an index of the much shorter hash keys. A matching hash key does not guarantee a matching record, but it would reduce the compares to be made by many millions. We just need to figure out how to invoke one of the many hash routines embedded in IBM software.
Chris |
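A rough sketch of the hash-index idea, in Python for illustration. CRC-32 from the standard `zlib` module stands in for whatever hash routine is actually available on the mainframe (that choice is an assumption), and the key strings are made up. Because equal hashes do not guarantee equal records, each candidate is confirmed with a full compare.

```python
import zlib
from collections import defaultdict

def build_hash_index(keys):
    """Map the short hash of each key to the positions where it occurs."""
    index = defaultdict(list)
    for pos, key in enumerate(keys):
        index[zlib.crc32(key.encode("utf-8"))].append(pos)
    return index

def find_matches(index, keys, probe):
    """Return positions whose full key equals probe.

    The hash only narrows the candidates; the equality test
    rules out collisions.
    """
    h = zlib.crc32(probe.encode("utf-8"))
    return [pos for pos in index.get(h, []) if keys[pos] == probe]
```

Only records that share a hash are ever compared in full, which is where the saving in compares comes from.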
|
CICS Guy Intermediate
Joined: 30 Apr 2007 Posts: 292 Topics: 3
|
Posted: Mon Jun 11, 2007 4:04 pm Post subject: Re: Huge Files Mapping |
|
|
I ask again: CICS Guy wrote: | Which ones are the 'input message' and which ones are the 'corresponding out message'? |
|
|
rsivananda Beginner
Joined: 11 Aug 2004 Posts: 30 Topics: 10
|
Posted: Wed Jun 13, 2007 2:42 am Post subject: |
|
|
Hi everyone,
I am back again with my problem.
Here are the IN and OUT messages, respectively, as they look in the files.
The files are VB files with an LRECL of 32000.
{1:F01XXXXXXXXXXXXXXXXX}{2:O202XXXXXXXXXXXXXXXXXXXXXX}
{4:
:20: XXXXXXXXXXXXXXXXXXX
:21: ABCABC
:32A:050103USD88644,47
:57A:XXXXXXXXXXX
:58A:XXXXXXXXXXXXXXXX
XXXXXXXXXX
:72: XXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXX
/CHGS/USD210,86/
-}
{1:F01XXXXXXXXXXXXXXXXX}{2:O202XXXXXXXXXXXXXXXXXXXXXX}{3:{XXXXXXXXXXXXXX}}
{4:
:20: XXXXXXXXXXXXX
:21: ABCABC
:32A:050103USD88644,47
:52A:XXXXXXXXXXXXX
:57A:XXXXXXXXXXXX
:58A:XXXXXXXXXX
:72: XXXXXXXXXXXXXXXXXXXX
//VALUE XXXXXXXXXXXXX
//REF XXXXXXXXXXXX
-}
So the daunting task is that there are about 1.5M messages in the IN file and about 1.3M in the OUT file.
Now my task is to map each IN message to its corresponding OUT message based on certain references, as below:
1. Field :32A:, which has the date in the first 6 bytes followed by the amount
2. Field :21:, which has the reference number
I tried putting them on one line to sort them and then compare. However, since the one-liners are fixed length, I couldn't do it with a simple sort.
Can anyone suggest a better way of doing this while I try programming it in PL/1?
Thanks
Siva |
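To make the two criteria concrete, here is a small Python sketch that pulls a match key out of one message. It assumes :32A: is laid out as in the samples above (6-digit date, 3-letter currency code, amount with a decimal comma) and that :21: holds the reference on a single line; the function name `match_key` is hypothetical.

```python
import re
from decimal import Decimal

# :32A:050103USD88644,47 -> date, currency, amount (decimal comma)
FIELD_32A = re.compile(r"^:32A:(\d{6})([A-Z]{3})([\d,]+)[ \t]*$", re.MULTILINE)
# :21: ABCABC -> reference, surrounding blanks dropped
FIELD_21 = re.compile(r"^:21:[ \t]*(\S.*?)[ \t]*$", re.MULTILINE)

def match_key(message):
    """Return (reference, date, currency, amount), or None if a field is missing."""
    m21 = FIELD_21.search(message)
    m32 = FIELD_32A.search(message)
    if not (m21 and m32):
        return None
    date, currency, amount = m32.groups()
    # Normalise the decimal comma so amounts compare numerically.
    return (m21.group(1), date, currency, Decimal(amount.replace(",", ".")))
```

Applied to the IN and OUT samples above, both messages yield the same key, so the differing tags elsewhere in the message do not affect the match.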
|
bauer Intermediate
Joined: 10 Oct 2003 Posts: 317 Topics: 50 Location: Germany
|
Posted: Wed Jun 13, 2007 3:57 am Post subject: |
|
|
This is input ????
Code: |
{1:F01XXXXXXXXXXXXXXXXX}{2:O202XXXXXXXXXXXXXXXXXXXXXX}
{4:
:20: XXXXXXXXXXXXXXXXXXX
:21: ABCABC
:32A:050103USD88644,47
:57A:XXXXXXXXXXX
:58A:XXXXXXXXXXXXXXXX
XXXXXXXXXX
:72: XXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXX
/CHGS/USD210,86/
-}
|
and this output ????
Code: |
{1:F01XXXXXXXXXXXXXXXXX}{2:O202XXXXXXXXXXXXXXXXXXXXXX}{3:{XXXXXXXXXXXXXX}}
{4:
:20: XXXXXXXXXXXXX
:21: ABCABC
:32A:050103USD88644,47
:52A:XXXXXXXXXXXXX
:57A:XXXXXXXXXXXX
:58A:XXXXXXXXXX
:72: XXXXXXXXXXXXXXXXXXXX
//VALUE XXXXXXXXXXXXX
//REF XXXXXXXXXXXX
-}
|
|
|
rsivananda Beginner
Joined: 11 Aug 2004 Posts: 30 Topics: 10
|
Posted: Wed Jun 13, 2007 4:36 am Post subject: |
|
|
Yes, bauer.
The input shown above is just one message, and we have 1.5M such messages.
Each message begins with {1: and ends with -}
I need to find the message in the outfile that corresponds to each message in the infile.
Please let me know if you need more details.
Siva |
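Given that each message begins with {1: and ends with -}, the physical lines can be regrouped into logical messages with a small generator. This is only a sketch in Python; it assumes no physical line holds more than one message and that the -} terminator starts its own line, as in the samples.

```python
def iter_messages(lines):
    """Group physical lines into logical messages.

    A message starts at a line beginning with '{1:' and ends at a
    line beginning with '-}'. An unfinished message is dropped if a
    new '{1:' line arrives before its terminator.
    """
    buf = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("{1:"):
            buf = [line]
        elif buf:
            buf.append(line)
        if buf and line.startswith("-}"):
            yield "\n".join(buf)
            buf = []
```

Because it is a generator, it can be fed an open file directly and never holds more than one message at a time.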
|
CICS Guy Intermediate
Joined: 30 Apr 2007 Posts: 292 Topics: 3
|
Posted: Wed Jun 13, 2007 4:52 am Post subject: |
|
|
Finally....
Two files containing logical records that can span physical records. Or can multiple logical records also share a physical record?
Find matching logical records based upon one or more keys that float somewhere in the logical record.
Is that anywhere near what you need? |
|
dbzTHEdinosauer Supermod
Joined: 20 Oct 2006 Posts: 1411 Topics: 26 Location: germany
|
Posted: Wed Jun 13, 2007 5:06 am Post subject: |
|
|
everyone,
Please refer to this link, which will give you a general understanding (confusion??) of the SWIFT message architecture.
It is a PDF. It defines the SWIFT Monetary Core Formats; I am not sure which MT type we are playing with here, but it does not actually matter.
SWIFT has been undergoing changes continuously for the last 10 years. Formats are changing, and data centers that went the cheap route to implement SWIFT originally are caught in a lack-of-forethought trap of their own making, similar to the challenges of EDI. I can only guess, but I imagine that the results of his match/merge will (or should) show the messages to which the OP's system provided no response, as well as those that did invoke a response.
The OP insists that tags 21 and 32A will always be present.
These are variable-length files, meaning the tags cannot be expected to be in the same place for any two records (of either file).
I think what the OP wants is to find the 21 and 32A tags of each record and sort the files on these identified keys. Keep in mind that the length of the data associated with the 21 and 32A tags is variable.
The OP has two files:
- input file: SWIFT messages to the OP's data center
- output file: the OP data center's responses
I would imagine that the OP needs to:
- sort/reformat each file (put a copy of the 21 and 32A tags in front of each record)
- match the two sorted files and generate some kind of report
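The two steps above amount to a classic sort/merge match. A minimal Python sketch follows; it assumes the keys have already been extracted from each message, sorts in memory purely for illustration (at this volume the sort would normally be left to the sort utility), and uses a hypothetical function name `merge_match`.

```python
def merge_match(in_keys, out_keys):
    """Walk two sorted key lists in step.

    Returns (matched, only_in, only_out). A repeated key is paired
    one-for-one, so a duplicate on one side stays unmatched.
    """
    a, b = sorted(in_keys), sorted(out_keys)
    i = j = 0
    matched, only_in, only_out = [], [], []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            matched.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            only_in.append(a[i])   # input message with no response
            i += 1
        else:
            only_out.append(b[j])  # response with no input message
            j += 1
    only_in.extend(a[i:])
    only_out.extend(b[j:])
    return matched, only_in, only_out
```

The three lists map directly onto a report: matched pairs, input messages with no response, and responses with no input.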
Prino has obviously encountered this situation before and has suggested a method whereby the two files are normalized (given a fixed structure) to simplify the SORTs and then the matching logic in PL/1.
I am not familiar with PL/1 and do not know its limitations when it comes to parsing undefined structures.
_________________
Dick Brenholtz
American living in Varel, Germany
Last edited by dbzTHEdinosauer on Wed Jun 13, 2007 5:32 am; edited 1 time in total |
|
dbzTHEdinosauer Supermod
Joined: 20 Oct 2006 Posts: 1411 Topics: 26 Location: germany
|
Posted: Wed Jun 13, 2007 5:30 am Post subject: |
|
|
Possibly the PARSE function of sort can be used to generate sorted files, if the OP does not want to 'restructure' his files and is willing to deal with the necessary parsing logic in a PL/1 report program.
_________________
Dick Brenholtz
American living in Varel, Germany |
|
semigeezer Supermod
Joined: 03 Jan 2003 Posts: 1014 Topics: 13 Location: Atlantis
|
Posted: Wed Jun 13, 2007 4:12 pm Post subject: |
|
|
First impression:
Read and reformat each record (or set, as it were) into a single data structure of variable length. Link these data structures into a balanced binary tree and then read the output file, searching the tree for each record. It should be very fast IF the whole thing can fit in storage. I'd think that it can (say each record is 200 bytes; 1.5 million records = 300 MB, plus a few more for overhead). If not, just store the keys and record offsets in the tree. A balanced tree search takes O(log2 n) time, so you have at most about log2(1500000), or 21, comparisons per output record. |
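A sketch of the keys-and-offsets variant in Python. Python's standard library has no balanced binary tree, so this swaps in a sorted list searched with `bisect`, which gives the same O(log2 n) lookup; the key strings and offsets are made up for illustration.

```python
import bisect
import math

def build_index(keyed_offsets):
    """Sort (key, offset) pairs once so lookups can use binary search."""
    return sorted(keyed_offsets)

def lookup(index, key):
    """Return the offset stored for key, or None if the key is absent."""
    # (key,) sorts just before any (key, offset) pair with the same key.
    i = bisect.bisect_left(index, (key,))
    if i < len(index) and index[i][0] == key:
        return index[i][1]
    return None

def max_comparisons(n):
    """Upper bound on probes for a binary search over n keys."""
    return math.ceil(math.log2(n))
```

For 1.5 million keys, `max_comparisons(1_500_000)` gives 21, matching the estimate above.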
|