[arrl-odv:34445] Recent LoTW outage

Members are asking me about it. I would appreciate it if there were something I could tell them as to root cause. I would also like to know the details.
Thanks
Ria
N2RJ

The problem is software performance due to a design that was never intended for this workload.
Software performance is the problem: Michael Keane wrote that the use of 256-bit encryption keys embedded in the database field causes LoTW to store 10x as much data as needed. The security model ignores improvements over the last 20+ years.
Workload is the root cause.
--
Mickey Baker, N4MB
Director, Southeastern Division
ARRL, The National Association for Amateur Radio®
Phone (561) 320-2775
Email: n4mb@arrl.org
ARRL Customer Service: (860) 594-0355
From: arrl-odv <arrl-odv-bounces@reflector.arrl.org> on behalf of Jairam, Ria, N2RJ (Dir, HD) <n2rj@arrl.org>
Date: Wednesday, January 4, 2023 at 1:01 PM
To: arrl-odv@reflector.arrl.org <arrl-odv@reflector.arrl.org>
Subject: [arrl-odv:34445] Recent LoTW outage

Mickey is correct.
The root cause is still a mystery to us. There are a ridiculous number of people requesting downloads of their confirmations. Either a website or a new release of software is killing us. We are being careful about broadcasting our concerns as we don't want a run on the server.
The uploads are a single-threaded process, one log at a time. The downloads, which include basic user queries, run in parallel and are unrestricted. Most times, basically ALL the time, this is not an issue. The past few days we've been blasted by download requests. One European user made over 1800 different download requests of his confirmations. I suspect something like HRD must have new functionality that is using the LoTW API, blissfully ignorant that their users are now killing the system.
It is all hands on deck right now. And again, we are being very careful about the messaging until we've figured this out.
David
From: arrl-odv <arrl-odv-bounces@reflector.arrl.org> On Behalf Of Baker, Mickey, N4MB (Dir, SE)
Sent: Wednesday, January 4, 2023 2:09 PM
To: Jairam, Ria, N2RJ (Dir, HD) <n2rj@arrl.org>; arrl-odv@reflector.arrl.org
Subject: [arrl-odv:34446] Re: Recent LoTW outage
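
For illustration only: a minimal sketch of what restricting the parallel downloads per user could look like, using a token bucket. Nothing here reflects how LoTW is actually implemented. The class names, the callsign used as the key, and the capacity and refill values are all assumptions made for the example.

```python
class TokenBucket:
    """Permit bursts of up to `capacity` requests, refilled at `rate` per second."""

    def __init__(self, capacity, rate, now):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity  # start full so ordinary users are never delayed
        self.updated = now

    def allow(self, now):
        # Credit tokens for the time elapsed since the last request, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class DownloadLimiter:
    """One bucket per user. Keying on callsign is a hypothetical choice."""

    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.buckets = {}

    def allow(self, user, now):
        # Create the user's bucket on first sight, then ask it for a token.
        bucket = self.buckets.get(user)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.rate, now)
            self.buckets[user] = bucket
        return bucket.allow(now)


# 1800 requests arriving at the same instant from one (made-up) callsign:
limiter = DownloadLimiter(capacity=5, rate=1.0)
allowed = sum(limiter.allow("N0CALL", now=0.0) for _ in range(1800))
print(allowed)  # 5
```

The clock is passed in as `now` so the behavior is easy to check by hand. A real service would more likely read `time.monotonic()` and return an error such as HTTP 429 for rejected requests.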

The large number of LoTW downloads might be caused by hams getting ADIF files to submit their CQ Marathon yearly totals at the end of the year. That award is a big thing here on the West Coast. The other cause may be related to the Crozet FT8WW operation, but that is a maybe.
73; Mike W7VO
On 01/04/2023 1:26 PM Minster, David NA2AA (CEO) <dminster@arrl.org> wrote:
_______________________________________________ arrl-odv mailing list arrl-odv@reflector.arrl.org https://reflector.arrl.org/mailman/listinfo/arrl-odv

<appeal from authority>As a skeptic in general, and having extensive experience as an SRE for high-volume web applications,</appeal from authority> I didn’t buy the explanation that it was just load. There had to be a reason for the sudden change in behavior.
Today it appears that Bob Naumann has indeed found, or at least communicated, the root cause: a rogue application that kept hammering us and causing the application to do some compute- or database-intensive actions. At the very least, it is a step on the way to finding the root cause.
On a side note, I’m concerned that applications are now developing features like real-time or periodic automatic uploads and downloads to LoTW that add transactional load. This is why many public-facing SaaS (software as a service) offerings have limits and API keys to track abusive traffic and shut it down. But instead this could be an opportunity to add an API that allows single-QSO uploads, much like Club Log. We can even add an HMAC if we insist on digitally signing every QSO.
On a side note, it is thanks to AA6YQ that I found this. Irony. Behold:
________________________________
From: arrl-odv <arrl-odv-bounces@reflector.arrl.org> on behalf of Michael Ritz <w7vo@comcast.net>
Sent: Wednesday, January 4, 2023 3:08:19 PM
To: arrl-odv@reflector.arrl.org <arrl-odv@reflector.arrl.org>
Subject: [arrl-odv:34451] Re: Recent LoTW outage
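
As a rough sketch of the HMAC idea mentioned above, the following shows how a single QSO record could be tagged and verified with Python's standard `hmac` module. This is an assumption-laden illustration, not an existing LoTW or Club Log API: the function names, the ADIF-style length-prefixed serialization, the sample QSO, and the demo key are all hypothetical.

```python
import hashlib
import hmac


def canonical_qso(qso):
    """Serialize a QSO deterministically so both sides hash identical bytes.

    Field names are upper-cased and sorted; each value is length-prefixed
    (loosely modeled on ADIF's <NAME:length>value layout) so that field
    boundaries are unambiguous.
    """
    items = sorted((name.upper(), value.strip()) for name, value in qso.items())
    return "".join(f"<{name}:{len(value)}>{value}" for name, value in items).encode("utf-8")


def sign_qso(qso, key):
    """Return a hex HMAC-SHA256 tag over the canonical form of the QSO."""
    return hmac.new(key, canonical_qso(qso), hashlib.sha256).hexdigest()


def verify_qso(qso, key, tag):
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign_qso(qso, key), tag)


# Hypothetical single-QSO upload payload and per-user secret.
qso = {"call": "N0CALL", "band": "20M", "mode": "FT8",
       "qso_date": "20230104", "time_on": "1801"}
key = b"demo-secret-issued-with-an-api-key"

tag = sign_qso(qso, key)
print(verify_qso(qso, key, tag))                     # True
print(verify_qso({**qso, "band": "40M"}, key, tag))  # False
```

One design point to keep in mind: an HMAC relies on a secret shared by client and server, so it shows that a record came from someone holding the key and was not altered, but it is not a public-key digital signature.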

And it looks like we’ve found our smoking gun.
________________________________
From: arrl-odv <arrl-odv-bounces@reflector.arrl.org> on behalf of Jairam, Ria, N2RJ (Dir, HD) <n2rj@arrl.org>
Sent: Saturday, January 7, 2023 7:22:43 PM
To: arrl-odv@reflector.arrl.org <arrl-odv@reflector.arrl.org>; Ritz, Mike, W7VO, (Dir, NW) <w7vo@comcast.net>
Subject: [arrl-odv:34460] Re: Recent LoTW outage

participants (4)
- Baker, Mickey, N4MB (Dir, SE)
- Jairam, Ria, N2RJ (Dir, HD)
- Michael Ritz
- Minster, David NA2AA (CEO)