[arrl-odv:21371] Reducing duplicate submissions to LoTW

It occurs to me that we could treat files with duplicates differently from others. If we screened a file for dupes and found some number of duplicated QSOs, then we could defer processing of that file for some period of time. It's pretty hard to have zero dupes for logs like this, but if we put some tolerance on it, we could probably capture all the egregious violations and allow through those logs that, for example, were submitted on a day and then had additional QSOs on the same date (since date is a usual way to select which QSOs to upload). There may be other ways, but I daresay that putting logs with high duplicates on a "slow track" would result in an instant reduction in such submissions.

Of course, better feedback to a user that his log has been received but not processed would also help close the loop, which is another substantial part of improvements we could make. I've suggested this before but haven't heard any recent discussion of it.

On to New Orleans and further discussions!

73, Greg, K0GW

On Mon, Dec 31, 2012 at 12:47 PM, Tom Frenaye <frenaye@pcnet.com> wrote:
At 01:27 PM 12/31/2012, Jim Weaver K8JE wrote:
Please define "duplicates" in relation to your message -- e.g. a duplicate of a Q previously entered via LoTW, a duplicate of a Q previously entered by paper QSL, or both?
Hi Jim -
These are duplicates of previously submitted electronic QSOs on LoTW. The usual problem is someone who just sends his whole log each time. It can also be someone who makes a correction in his log (one QSO) and then sends it in again.
This is not a duplicate in the way you might think of it in a contest log. These are repeats of the same QSO at the same date/time in a second LoTW file.
They do have easy ways to handle files that are exact copies of previous files. It's the ones that are just a little different that require the extra work. For example, a second file may just have one QSO record repeated, so one record will be added to LoTW but all of the other records have to be checked.
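The per-record check Tom describes might be sketched roughly as follows. The field names, the matching key (call, date/time, band, mode), and the in-memory set are illustrative assumptions for this sketch, not LoTW's actual schema or matching rules:

```python
# Sketch of per-record duplicate screening for an uploaded log.
# The matching key and field names below are assumptions, not LoTW's schema.

def screen_upload(new_records, existing_keys):
    """Split an upload into records to add and duplicates to skip."""
    to_add, dupes = [], []
    for rec in new_records:
        key = (rec["call"], rec["qso_datetime"], rec["band"], rec["mode"])
        if key in existing_keys:
            dupes.append(rec)        # same QSO already on file: skip it
        else:
            existing_keys.add(key)   # new QSO: accept and remember it
            to_add.append(rec)
    return to_add, dupes
```

Greg's "slow track" idea above could hang off the same pass: if len(dupes) / len(new_records) exceeds some tolerance, the whole file could be deferred rather than processed immediately.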
This may be better explained in person than by me in e-mail.... hope I haven't caused more confusion.
Tom
-----Original Message-----
From: arrl-odv-bounces@reflector.arrl.org [mailto:arrl-odv-bounces@reflector.arrl.org] On Behalf Of Tom Frenaye
Sent: 27 December, 2012 12:11 PM
To: arrl-odv
Subject: [arrl-odv:21347] Re: LoTW
At 11:06 AM 12/27/2012, Kermit Carlson wrote:
I do realize that the hardware upgrade will help the throughput, but using these numbers - 2250 files with approximately 300 QSOs each in 24 hours - that works out to an average processing time of 128 milliseconds per QSO record, which seems high. Hopefully the hardware will provide an improved situation.
I'm told that 80% of the QSOs being processed are duplicates, so there is a lot of extra work being done to deal with dupes. Mike Keane says that about 800k QSO records are being processed daily. That translates to almost 25m records a month, but December will end with only about 7.2m QSO records added to the database for the month.
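A quick back-of-envelope check shows these figures are self-consistent. The values are taken from Kermit's and Mike Keane's numbers above; the 31-day month is an assumption:

```python
# Sanity-check the throughput figures quoted in this thread.
files_per_day = 2250          # uploaded files per day (Kermit's figure)
qsos_per_file = 300           # approximate QSOs per file
ms_per_day = 24 * 60 * 60 * 1000

ms_per_qso = ms_per_day / (files_per_day * qsos_per_file)
print(f"{ms_per_qso:.0f} ms per QSO record")               # 128 ms

processed_per_month = 800_000 * 31                         # Mike Keane's daily rate
added_in_december = 7_200_000
dup_share = 1 - added_in_december / processed_per_month
print(f"{dup_share:.0%} of processed records not added")   # 71%
```

The ~71% share implied by the monthly totals is in the same ballpark as the "80% duplicates" estimate.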
This is primarily because many people are submitting logs that include both new and previously submitted QSOs, just to be sure... (or submitting logs a second or third time)
It reminds me of the "insurance QSOs" that people make with DXpeditions. Good for the individual, bad for everyone else as throughput and efficiency suffer.
I know there is a lot of thinking going on by IT about how to minimize the impact of the duplicates, and how to educate users so the impact is less.
Mike Keane's analysis was that the primary way to reduce the time it takes to process one record is to speed up writing a new record to the database. That time is constrained by the magnetic disk drives we are using. Upgrading to solid state disk drives will provide a significant increase in speed. (He said roughly 25x faster: >100k IOPS vs 4k IOPS - input/output operations per second - see http://en.wikipedia.org/wiki/IOPS )
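As a rough lower bound on what those IOPS figures mean per write (assuming, for illustration only, a single random I/O per record write; a real database write involves more than one):

```python
# Minimum time per random write at the quoted IOPS figures.
hdd_iops = 4_000       # magnetic disk, per the message above
ssd_iops = 100_000     # solid state, per the message above

print(f"HDD: {1000 / hdd_iops} ms per I/O")    # 0.25 ms
print(f"SSD: {1000 / ssd_iops} ms per I/O")    # 0.01 ms
print(f"speedup: {ssd_iops // hdd_iops}x")     # 25x
```

Even at one I/O per record, 0.25 ms is far below the observed 128 ms per record, so per-record overhead beyond the raw disk write (duplicate checking included) evidently dominates.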
The hardware upgrade should provide a fairly immediate impact, but the impact of duplicate records will need work in the future. They're working hard to get the new drives on line as soon as possible.
Barry - correct me if I'm wrong ...
Tom
===== e-mail: k1ki@arrl.org ARRL New England Division Director http://www.arrl.org/ Tom Frenaye, K1KI, P O Box J, West Suffield CT 06093 Phone: 860-668-5444
_______________________________________________ arrl-odv mailing list arrl-odv@reflector.arrl.org http://reflector.arrl.org/mailman/listinfo/arrl-odv
-- 73, Greg Widin, K0GW ARRL Dakota Division Director ARRL--of, by and for the Radio Amateur

I'm noting everything Greg is posting here. I should also comment that we might want to think about a LoTW tutorial for users, perhaps in an interactive online format, to cut down on some of the frustrations and trim the size of some uploaded logs.

73

David A. Norris, K5UZ
Director, Delta Division

Sent from my iPhone

On Dec 31, 2012, at 1:44 PM, G Widin <gpwidin@comcast.net> wrote: