[arrl-odv:31409] LOTW Broke?

Third time this morning I have seen this error. Is it just me?

Mark, HDX

[Attached screenshot of the LoTW error: image.png]

I just jumped on to see if I could replicate. I went through the ARRL network and a cellular network, and went into logs and awards. No errors.

I will forward this on to the team in case this is the tip of the iceberg.

David

It's backlogged due to all of the downloads from the CQWW contest this past weekend. It was running 10 hours behind earlier this morning.

Rick

One of the issues David Minster hit on during the A&F meeting corresponds to some posts I made years ago in the LoTW online group.

There is no reason this system should NOT be running on a cloud-based virtualized server. Even if portions of the design are single-threaded, the ability to add processing power (memory, CPU, I/O bandwidth) on the fly, in anticipation of these peak workloads, is essential, and it CAN BE DONE QUICKLY without a total system rewrite. System capacity could then likely be bumped up 4x-10x around contest weekends without purchasing gear. Sticking it on a hardware server in a data center may be prudent, but it is not smart for a workload that varies like this. Corporations do this routinely for workloads such as month-end and year-end closings, trial balances, etc.

I have the skills needed to design a plan to do this and will offer, again, to DONATE time to analyze our current state and develop a confidential plan.

One of the reasons I ran for this position is that the prior director was clueless about League IT challenges. I know this is one of many challenges, but it is "low-hanging fruit" and can be addressed before a design for LoTW 2.0 can be agreed upon.

73,
Mickey N4MB
-- “Ends and beginnings—there are no such things. There are only middles.” Robert Frost
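For illustration, a minimal sketch of the on-the-fly resizing Mickey describes, assuming (hypothetically) that the application were hosted on an AWS EC2 virtual machine. The instance ID and instance types below are placeholders, not anything ARRL actually runs.

# Hypothetical sketch: bump a cloud VM to a larger size ahead of a contest
# weekend, then shrink it back afterward. Assumes AWS EC2 + boto3; the
# instance ID and instance types are placeholders, not ARRL's real setup.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

def resize_instance(instance_id: str, new_type: str) -> None:
    """Stop, resize, and restart an EBS-backed EC2 instance."""
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # Change the instance type while the VM is stopped.
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": new_type},
    )

    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])

# Friday before CQWW: scale up. After the post-contest backlog drains: scale back.
resize_instance("i-0123456789abcdef0", "m5.4xlarge")
# resize_instance("i-0123456789abcdef0", "m5.xlarge")

The tradeoff is the brief stop/start window during the resize; a managed or auto-scaled service avoids even that, which is where the thread goes next.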

When I did this sort of thing (elections news coverage) we scaled up the day before and set thresholds and automatic dynamic scaling. Right now I actually do things like this for Black Friday and Cyber Monday, when billions of credit card swipes take place, but using a private cloud. However, we needn't go that far, YET.

An easy win would be to put the database on a managed database service like AWS RDS. The web server isn't usually the bottleneck; the database is. The application server isn't even to blame, because in our app we aren't doing intensive compute. We are mostly doing database inserts and selects.

A managed database handles capacity and scaling completely transparently, and you don't even have to think about it. Amazon just scales the database using its theoretically infinite resources.

Hopefully our attitude about cloud computing can change. When I mentioned cloud to Mike Keane in 2019, he said that we shouldn't bother because we had "sunk cost" in things like HVAC at HQ. Dave AA6YQ also said we should be investing in programmers and not hardware/cloud.

I did recommend a cloud database for LoTW in one of the IT infrastructure committee reports. Hopefully we can at least do that. I can almost guarantee that it will alleviate 90% of the post-contest crashiness.

73
Ria N2RJ
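A minimal sketch of the "easy win" Ria describes, assuming an RDS instance with the hypothetical identifier lotw-db. With a managed service, scaling up for a contest weekend is an API call rather than a hardware purchase.

# Hypothetical sketch: resizing a managed database ahead of a contest weekend.
# Assumes AWS RDS + boto3; "lotw-db" and the instance class are placeholders.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Move the database to a larger instance class for the weekend. RDS applies
# the change itself (a short restart/failover), with no new hardware involved.
rds.modify_db_instance(
    DBInstanceIdentifier="lotw-db",
    DBInstanceClass="db.r5.2xlarge",
    ApplyImmediately=True,
)

# Storage autoscaling can also be enabled so the volume grows on its own.
rds.modify_db_instance(
    DBInstanceIdentifier="lotw-db",
    MaxAllocatedStorage=1000,  # GiB ceiling for automatic storage growth
    ApplyImmediately=True,
)

The first call is the kind of change that could be scheduled the Thursday before a major contest and reversed the following week.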

Since you're going deep, the first step is to measure the resources on whatever platform it is today. I believe I've heard that it isn't virtualized, and that the database is (was?) an ancient database from SAP's R2 era that is no longer supported... I'd like to see the resource starvation that is causing 10-hour turnaround times.

I worked for the storage vendor where LoTW was platformed in December 2012 and communicated with Keane. The application had saturated the storage system, and he had ordered new servers and solid-state storage, although the storage system was BROKEN, causing I/O delays and several days' loss of uploads. It was not under contract for repair, but I was authorized to offer a no-charge repair. He didn't take me up on it. At the time, I told him that I thought this was a temporary fix and that he needed to take a closer look at the architecture so that all these lookups used a more efficient algorithm.

I'm not prepared to judge which resource is the constraint. I would suggest some simple measurements that show what's going on now, at the time we're seeing this, so we are able to make some valid assumptions. This is not a predictable "big shop" environment like a bank or a worldwide television network. We need to grab whatever data we can, when we can. If this isn't MySQL, Oracle, or MS SQL Server, we won't be able to simply purchase more database server resources, but an early step would be to get everything onto one of those databases... and I've seen issues like "scratchpad" or journaling I/O cause delays where fixing the DB server might not help.

What usually happens in small shops is that performance demand outstrips capacity and applications queue. Response times on those queues grow with the backlog (Little's Law), upstream requests time out, and then things fail. Where? That is what the measurements should tell us.

Can you send me or point me to the IT Committee reports?

Thanks,

Mickey Baker, N4MB
Palm Beach Gardens, FL

"The servant-leader is servant first… It begins with the natural feeling that one wants to serve, to serve first. Then conscious choice brings one to aspire to lead." Robert K. Greenleaf
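A back-of-the-envelope illustration of the Little's Law point above; every number in it is invented, purely to show how a saturated processing queue turns into 10-hour turnaround times.

# Hypothetical back-of-the-envelope for the queueing behavior described above
# (Little's Law: L = lambda * W). All numbers are made up for illustration.

backlog_logs = 180_000   # logs sitting in the processing queue (a guess)
service_rate = 5.0       # logs the cruncher finishes per second (a guess)

# With a saturated server, a newly uploaded log waits behind the whole
# backlog, so its time in system is roughly backlog / throughput.
wait_seconds = backlog_logs / service_rate
print(f"expected turnaround: {wait_seconds / 3600:.1f} hours")  # -> 10.0 hours

# The other direction: if contest-weekend arrivals spike to 15 logs/s while
# the server still drains only 5 logs/s, the queue grows by
# (15 - 5) * 3600 = 36,000 logs for every hour the spike lasts.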

Based on the architecture and the purpose of the app, without even looking, I can say fairly conclusively that it is the database. Ten-hour processing times signal bottlenecks in the database. We aren't doing intense computation (such as modeling, simulation, or rendering video/audio), so that rules compute out. We are simply matching records in a database.

I have seen this in similar apps. Database and query optimizations generally yielded the biggest performance improvements.

But sure, let's put an APM tool on it like New Relic or AppDynamics to diagnose where the bottleneck is, if there isn't one already. That will tell us conclusively. AppDynamics (now a Cisco product) is my personal favorite, as I used to speak at their conferences (AppJam) and was one of their early customers, but New Relic also works well; I've used both, so I can attest to their effectiveness. If we already have an APM, that should be the first place to look, so we don't keep guessing.

73
Ria, N2RJ
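To make the "matching records" point concrete, here is a purely hypothetical sketch; LoTW's real schema and database engine are not public, so the table, columns, and matching rule below are invented. The point is only that confirmation matching lives or dies on indexing.

# Purely hypothetical sketch of the record-matching workload described above.
# The table and column names are invented; LoTW's real schema is not public.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE qso (
        station_call TEXT,   -- uploading station
        worked_call  TEXT,   -- station worked
        band         TEXT,
        mode         TEXT,
        qso_time     TEXT    -- ISO-8601 UTC timestamp
    )
""")

# Without an index like this, every incoming QSO forces a full-table scan to
# find its potential match; with it, the lookup is a B-tree probe.
db.execute("""
    CREATE INDEX idx_qso_match
    ON qso (worked_call, station_call, band, mode, qso_time)
""")

def find_matches(upload: dict):
    """Find counterpart QSOs within a +/- 30 minute window of the upload."""
    return db.execute(
        """
        SELECT rowid FROM qso
        WHERE worked_call  = :station_call   -- they logged us...
          AND station_call = :worked_call    -- ...and we logged them
          AND band = :band AND mode = :mode
          AND qso_time BETWEEN datetime(:qso_time, '-30 minutes')
                           AND datetime(:qso_time, '+30 minutes')
        """,
        upload,
    ).fetchall()

An APM trace, as suggested above, is what would confirm whether the slow path really is a scan like this or something else entirely.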

Unless (until) we're ready for a rewrite of the app, we simply look at the behavior of the current app in its current state and "feed the beast" more resources. If we started today, we are likely two contest seasons away from a redesign and implementation.

I'm assuming there is some level of queue management for incoming uploads; I know that was the plan when I looked at this 8 years ago. I propose that we look at the current workload, find the bottleneck, virtualize what we have, and apply virtual resources as needed to avoid the problem we are having now, which is more than a simple upload issue, according to the screenshots submitted by W7VO. Something failed.

This is Linux-based? Simple top and gkrellm tools can tell us what we need to know without buying anything: which servers are saturated, and by what. I know AppDynamics is not inexpensive, but it is more targeted toward architectural improvements in complex environments.

I still don't know what the DB engine is or what the architecture is, so I can't begin to tell you what the specific problem is or how to fix it, but virtual infrastructure can fix a lot of bad design, up to the limits of the architecture.

Would you share whatever documentation you have on LoTW?

Thanks,

Mickey Baker, N4MB
Palm Beach Gardens, FL

"The servant-leader is servant first… It begins with the natural feeling that one wants to serve, to serve first. Then conscious choice brings one to aspire to lead." Robert K. Greenleaf
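In the spirit of "top without buying anything", a small sampling sketch of the kind of measurement suggested above. It assumes a Linux host with the third-party psutil package installed (pip install psutil); the interval and output format are arbitrary.

# Minimal resource sampler: run it across a contest upload window and watch
# which metric hits its ceiling first. Assumes psutil is installed.
import time
import psutil

def sample(interval_s: float = 5.0) -> None:
    """Print CPU, memory, and disk I/O deltas so saturation shows up over time."""
    prev_io = psutil.disk_io_counters()
    while True:
        time.sleep(interval_s)
        cpu = psutil.cpu_percent(interval=None)   # % CPU since last call
        mem = psutil.virtual_memory().percent     # % RAM in use
        io = psutil.disk_io_counters()
        read_mb = (io.read_bytes - prev_io.read_bytes) / 1e6
        write_mb = (io.write_bytes - prev_io.write_bytes) / 1e6
        prev_io = io
        print(f"cpu={cpu:5.1f}%  mem={mem:5.1f}%  "
              f"disk_read={read_mb:7.1f} MB  disk_write={write_mb:7.1f} MB")

if __name__ == "__main__":
    sample()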
Participants (5):
- k5ur@aol.com
- Mark J Tharp
- Mickey Baker
- Minster, David NA2AA (CEO)
- rjairam@gmail.com