It occurs to me that we could treat files containing duplicates differently from others.  If we screened each file for dupes and found more than some number of duplicated QSOs, we could defer processing of that file for a while.  It's pretty hard to have zero dupes in logs like these, but with some tolerance we could probably catch all the egregious violations and still let through logs that, for example, were submitted on a given day and then had more QSOs added on the same date (since date is a common way to select which QSOs to upload).  There may be other approaches, but I daresay that putting logs with many duplicates on a "slow track" would produce an instant reduction in such submissions.
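
A minimal sketch of the "slow track" screening idea, for illustration only. It assumes a QSO counts as a duplicate when the same (call, band, mode, date/time) key is already stored; LoTW's real matching rules aren't described in this thread. The names duplicate_fraction, choose_track and the 25% DUPE_TOLERANCE are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical QSO key: (callsign, band, mode, UTC date/time).
# LoTW's actual duplicate-matching rules are an assumption here.
QsoKey = Tuple[str, str, str, str]

# Hypothetical tolerance: a small overlap (e.g. re-uploading one day's QSOs)
# passes normally; heavily duplicated files are deferred.
DUPE_TOLERANCE = 0.25


def duplicate_fraction(file_qsos: List[QsoKey], already_stored: Set[QsoKey]) -> float:
    """Fraction of the records in an uploaded file that are already stored."""
    if not file_qsos:
        return 0.0
    dupes = sum(1 for q in file_qsos if q in already_stored)
    return dupes / len(file_qsos)


def choose_track(file_qsos: List[QsoKey], already_stored: Set[QsoKey],
                 tolerance: float = DUPE_TOLERANCE) -> str:
    """Route a file to the 'slow' queue if its dupe fraction exceeds the tolerance."""
    if duplicate_fraction(file_qsos, already_stored) > tolerance:
        return "slow"
    return "normal"
```

Using a fraction rather than a raw count means a large log with a handful of same-day repeats isn't penalized, while a whole-log resubmission (nearly 100% dupes) is.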

Of course, better feedback telling a user that his log has been received but not yet processed would also help close the loop, and that is another substantial improvement we could make.  I've suggested this before but haven't heard any recent discussion of it.

On to New Orleans and further discussions!
73,
        Greg, K0GW


On Mon, Dec 31, 2012 at 12:47 PM, Tom Frenaye <frenaye@pcnet.com> wrote:
At 01:27 PM 12/31/2012, Jim Weaver K8JE wrote:

>Please define "duplicates" in relation to your message -- e.g. duplicate of
>a Q previously entered via LOTW, duplicate of Q previously entered by paper
>QSL or either/both?

Hi Jim -

These are duplicates of previously submitted electronic QSOs on LoTW.  The usual problem is someone who just sends his whole log each time.
It can also be someone who makes a correction in his log (one QSO) and then sends it in again.

This is not a duplicate in the way you might think of it in a contest log.  These are repeats of the same QSO at the same date/time in a second LoTW file.

They do have easy ways to handle files that are exact copies of previous files.  It's the ones that are just a little different that require the extra work.
For example, a second file may repeat every QSO record but one, so only one record will be added to LoTW, yet all of the other records still have to be checked.
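
To illustrate why exact copies are cheap but near-copies are not, here is a hypothetical sketch: an exact copy can be rejected with a single file-hash lookup, while a file that differs by even one record forces a per-record check. It assumes one record per line and uses SHA-256 via Python's hashlib; how LoTW actually detects repeated files isn't stated here, and process_upload is an illustrative name.

```python
import hashlib
from typing import List, Set


def process_upload(data: bytes, seen_files: Set[str], stored: Set[str]) -> List[str]:
    """Return the records from an upload that are new (hypothetical one-record-per-line format)."""
    digest = hashlib.sha256(data).hexdigest()
    if digest in seen_files:
        # Exact copy of an earlier file: rejected with one hash lookup.
        return []
    seen_files.add(digest)

    # Near-copy: every record has to be looked up individually,
    # even if only one of them turns out to be new.
    new_records = []
    for record in data.decode().splitlines():
        if record.strip() and record not in stored:
            stored.add(record)
            new_records.append(record)
    return new_records
```

Resending a whole log with one QSO appended takes the slow path: N lookups to add a single record.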

This may be better explained in person than by me in e-mail....   hope I haven't caused more confusion.

    Tom




>-----Original Message-----
>From: arrl-odv-bounces@reflector.arrl.org
>[mailto:arrl-odv-bounces@reflector.arrl.org] On Behalf Of Tom Frenaye
>Sent: 27 December, 2012 12:11 PM
>To: arrl-odv
>Subject: [arrl-odv:21347] Re: LoTW
>
>At 11:06 AM 12/27/2012, Kermit Carlson wrote:
>
>>          I do realize that the hardware upgrade will
>>help the throughput, but using these numbers -
>>2250 files with 300 QSO (approximately) each
>>in 24 hours is the same as an average processing
>>time of 128 milliseconds per QSO record, which
>>would seem high.....hopefully the hardware will
>>provide an improved situation.
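
A quick check of the per-record figure above, using only the numbers quoted (2,250 files of about 300 QSOs each per 24 hours):

```python
files_per_day = 2250
qsos_per_file = 300                  # approximate, as quoted
ms_per_day = 24 * 60 * 60 * 1000     # 86,400,000 ms in 24 hours

qsos_per_day = files_per_day * qsos_per_file   # 675,000 QSO records
ms_per_qso = ms_per_day / qsos_per_day         # average time per record
print(f"{qsos_per_day:,} QSOs/day -> {ms_per_qso:.0f} ms per QSO")
```

This confirms the 128 ms average, which treats the server as handling one record at a time around the clock.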
>
>
>I'm told that 80% of the QSOs being processed are duplicates, so there is a
>lot of extra work being done to deal with dupes.    Mike Keane says that
>about 800k QSO records are being processed daily.  That translates to almost
>25m records a month, but December will end with the number of QSO records
>added to the database at only about 7.2m for the month.
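
Working the quoted December figures through (800k records processed per day, about 7.2m added for the month) shows how few processed records were new. Both inputs are approximate, so the implied share is only a rough cross-check on the 80% duplicate estimate, not an exact match:

```python
daily_processed = 800_000
days_in_december = 31
monthly_processed = daily_processed * days_in_december  # 24,800,000 ("almost 25m")

monthly_added = 7_200_000  # approximate December total, as quoted
new_fraction = monthly_added / monthly_processed
print(f"processed ~{monthly_processed / 1e6:.1f}m, added ~{monthly_added / 1e6:.1f}m, "
      f"new ~{new_fraction:.0%}, repeats ~{1 - new_fraction:.0%}")
```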
>
>This is primarily because many people are submitting logs that include new
>and previous QSOs, just to be sure...  (or submitting logs a second or third
>time)
>
>It reminds me of the "insurance QSOs" that people make with DXpeditions.
>Good for the individual, bad for everyone else as the throughput and
>efficiency suffers.
>
>I know there is a lot of thinking going on by IT about how to minimize the
>impact of the duplicates, and how to educate users so the impact is less.
>
>Mike Keane's analysis was that the primary way to reduce the time it
>took to process one record was to speed up the time it takes to write a
>new record to the database.   That time is constrained by the magnetic disk
>drives we are using.   Upgrading to solid state disk drives will provide a
>significant increase in speed.   (He said roughly 25x faster - >100k IOPS vs
>4k IOPS - input/output operations per second - see
>http://en.wikipedia.org/wiki/IOPS )
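
The "roughly 25x" follows directly from the IOPS ratio if disk I/O is the only bottleneck. In this sketch, ios_per_record (how many disk operations one new record needs) is a hypothetical parameter. Its actual value isn't given, but it cancels out of the speedup:

```python
hdd_iops = 4_000     # magnetic drives, as quoted
ssd_iops = 100_000   # solid state drives, as quoted (">100k")


def max_records_per_second(iops: int, ios_per_record: int) -> float:
    """Upper bound on record writes per second if disk I/O is the only limit."""
    return iops / ios_per_record


speedup = ssd_iops / hdd_iops  # 25.0, independent of ios_per_record
```

Any CPU or database overhead that isn't disk-bound would make the real-world gain smaller than this bound.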
>
>The hardware upgrade should provide a fairly immediate impact, but the
>impact of duplicate records will need work in the future.    They're working
>hard to get the new drives on line as soon as possible.
>
>Barry - correct me if I'm wrong ...
>
>  Tom
>
>
>=====
>e-mail: k1ki@arrl.org   ARRL New England Division Director
>http://www.arrl.org/
>Tom Frenaye, K1KI, P O Box J, West Suffield CT 06093 Phone: 860-668-5444
>
>
>_______________________________________________
>arrl-odv mailing list
>arrl-odv@reflector.arrl.org
>http://reflector.arrl.org/mailman/listinfo/arrl-odv

--
73,
Greg Widin, K0GW
ARRL Dakota Division Director
ARRL--of, by and for the Radio Amateur