Follow-up to Crashes with gpg-agent 2.1.18

classic Classic list List threaded Threaded
20 messages Options
Reply | Threaded
Open this post in threaded view
|

Follow-up to Crashes with gpg-agent 2.1.18

Shah, Amul

Please excuse any Outlook formatting oddities. I am re-sending this as it has not

shown up on the mailing list.

 

I think that this is a follow up to a thread from January (last email in thread

https://lists.gnupg.org/pipermail/gnupg-devel/2017-January/032516.html) because it

started with me seeing the message "Ohhhh jeeee: ... this is a bug

(sexp.c:1433:do_vsexp_sscan)" in the gpg-agent log. Now in trying to reproduce the

error, the gpg- agent hits a PKDECRYPT failure claiming that it cannot allocate

memory.

 

I tried debugging the error down into libgcrypt, but could not see where things

could go wrong unless the memory manager is not npthread safe. Since I made no

progress, I'm sending this mail along with a test case similar to the one from the

previous thread.

 

Note that we don't encounter this problem with older GnuPG versions, GnuPG 2.15

and earlier.

 

Could someone take a look at this or give me some pointers on how I can help

myself?

 

Thanks,

Amul

 

---- Host Information ----

Distributor ID: Debian

Description:    Debian GNU/Linux 9.0 (stretch)

Release:        9.0

Codename:       stretch

 

Package: gnupg2 Version: 2.1.18-6

Package: gnupg-agent    Version: 2.1.18-6

Package: libgcrypt20    Version: 1.7.6-1

Package: libgpgme11     Version: 1.8.0-3+b2

Package: libnpth0       Version: 1.3-1

Package: libassuan0     Version: 2.4.3-2

 

---- Error Information ----

2017-03-30 19:34:43 gpg-agent[745886] starting a new PIN Entry

2017-03-30 19:34:43 gpg-agent[745886] failed to build S-Exp (off=0): Cannot allocate memory

2017-03-30 19:34:43 gpg-agent[745886] failed to read the secret key

2017-03-30 19:34:43 gpg-agent[745886] command 'PKDECRYPT' failed: Cannot allocate memory <gcrypt>

2017-03-30 19:34:43 gpg-agent[745886] failed to build S-Exp (off=0): Cannot allocate memory

2017-03-30 19:34:43 gpg-agent[745886] failed to read the secret key

2017-03-30 19:34:44 gpg-agent[745886] handler 0x7ff368f81700 for fd 184 started

2017-03-30 19:34:44 gpg-agent[745886] Warning: using insecure memory!

2017-03-30 19:34:44 gpg-agent[745886] retrieving cache entry '29395A5EFE485B83873C2B5435FCB4DC482765FD' failed: Invalid cipher mode

--

2017-03-30 19:34:44 gpg-agent[745886] retrieving cache entry '29395A5EFE485B83873C2B5435FCB4DC482765FD' failed: Invalid cipher mode

2017-03-30 19:34:44 gpg-agent[745886] command 'PKDECRYPT' failed: Cannot allocate memory <gcrypt>

2017-03-30 19:34:44 gpg-agent[745886] retrieving cache entry '29395A5EFE485B83873C2B5435FCB4DC482765FD' failed: Invalid cipher mode

 

---- Test case ----

echo "# Setup agent config to avoid interactive password prompting"

echo "verbose" >> $GNUPGHOME/gpg-agent.conf

echo "log-file $GNUPGHOME/agent.log" >> $GNUPGHOME/gpg-agent.conf

echo "pinentry-program $GNUPGHOME/pinentry.csh" >> $GNUPGHOME/gpg-agent.conf

 

cat >$GNUPGHOME/pinentry.csh << EOF

#!/bin/tcsh

set word = ""

while (("\$word" != "getpin") && ("\$word" != "GETPIN"))  echo "OK"

set word = "\$<"

end

echo "D abc123"

echo "OK"

EOF

chmod a+rx $GNUPGHOME/pinentry.csh

 

yes | gpg2 --passphrase abc123 --quick-gen-key [hidden email]

 

echo # Encrypt a file

rm -f test.gpg

echo testme > test

gpg2 -e -r [hidden email] test

 

echo "# This had better work"

gpg2 -d test.gpg || exit 9

 

echo "# Hit the agent hard"

for i in $(seq 1 200); do (gpg2 -d test.gpg > test${i}.log 2>&1 & ); done wait gpg-connect-agent 'getinfo pid' /bye sleep 1 gpg-connect-agent killagent /bye grep 'fail' *.log grep -C1 --color -E "jee|memory" $GNUPGHOME/agent.log

 

echo "Doit!"

echo "rm -rf $GNUPGHOME"

gpg-connect-agent killagent /bye

 

---- Full test case output ----

# Do Setup

# Setup agent config to avoid interactive password prompting

gpg: keybox '/tmp/tmp.t40xj0I2lY/pubring.kbx' created We need to generate a lot of random bytes. It is a good idea to perform some other action (type on the keyboard, move the mouse, utilize the

disks) during the prime generation; this gives the random number generator a better chance to gain enough entropy.

We need to generate a lot of random bytes. It is a good idea to perform some other action (type on the keyboard, move the mouse, utilize the

disks) during the prime generation; this gives the random number generator a better chance to gain enough entropy.

gpg: /tmp/tmp.t40xj0I2lY/trustdb.gpg: trustdb created

gpg: key 4DB73400DF6568DF marked as ultimately trusted

gpg: directory '/tmp/tmp.t40xj0I2lY/openpgp-revocs.d' created

gpg: revocation certificate stored as '/tmp/tmp.t40xj0I2lY/openpgp-revocs.d/61F7B673B3ABCE0A308314AE4DB73400DF6568DF.rev'

public and secret key created and signed.

 

pub   rsa2048 2017-03-30 [SC] [expires: 2019-03-30]

      61F7B673B3ABCE0A308314AE4DB73400DF6568DF

      61F7B673B3ABCE0A308314AE4DB73400DF6568DF

uid                      [hidden email]

sub   rsa2048 2017-03-30 [E]

 

 

gpg: checking the trustdb

gpg: marginals needed: 3  completes needed: 1  trust model: pgp

gpg: depth: 0  valid:   1  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 1u

gpg: next trustdb check due at 2019-03-30 # This had better work

gpg: encrypted with 2048-bit RSA key, ID 5941F75AFA9801E6, created 2017-03-30

      "[hidden email]"

testme

# Hit the agent hard

D 745886

OK

OK closing connection

test25.log:gpg: public key decryption failed: Cannot allocate memory

test25.log:gpg: decryption failed: No secret key

test7.log:gpg: public key decryption failed: Cannot allocate memory

test7.log:gpg: decryption failed: No secret key

2017-03-30 19:34:43 gpg-agent[745886] starting a new PIN Entry

2017-03-30 19:34:43 gpg-agent[745886] failed to build S-Exp (off=0): Cannot allocate memory

2017-03-30 19:34:43 gpg-agent[745886] failed to read the secret key

2017-03-30 19:34:43 gpg-agent[745886] command 'PKDECRYPT' failed: Cannot allocate memory <gcrypt>

2017-03-30 19:34:43 gpg-agent[745886] failed to build S-Exp (off=0): Cannot allocate memory

2017-03-30 19:34:43 gpg-agent[745886] failed to read the secret key

2017-03-30 19:34:44 gpg-agent[745886] handler 0x7ff368f81700 for fd 184 started

2017-03-30 19:34:44 gpg-agent[745886] Warning: using insecure memory!

2017-03-30 19:34:44 gpg-agent[745886] retrieving cache entry '29395A5EFE485B83873C2B5435FCB4DC482765FD' failed: Invalid cipher mode

--

2017-03-30 19:34:44 gpg-agent[745886] retrieving cache entry '29395A5EFE485B83873C2B5435FCB4DC482765FD' failed: Invalid cipher mode

2017-03-30 19:34:44 gpg-agent[745886] command 'PKDECRYPT' failed: Cannot allocate memory <gcrypt>

2017-03-30 19:34:44 gpg-agent[745886] retrieving cache entry '29395A5EFE485B83873C2B5435FCB4DC482765FD' failed: Invalid cipher mode Doit!

rm -rf /tmp/tmp.t40xj0I2lY

OK closing connection

 

 

The information contained in this message is proprietary and/or confidential. If you are not the intended recipient, please: (i) delete the message and all copies; (ii) do not disclose, distribute or use the message in any manner; and (iii) notify the sender immediately. In addition, please be aware that any message addressed to our domain is subject to archiving and review by persons other than the intended recipient. Thank you.
_______________________________________________
Gnupg-devel mailing list
[hidden email]
http://lists.gnupg.org/mailman/listinfo/gnupg-devel
Reply | Threaded
Open this post in threaded view
|

RE: Follow-up to Crashes with gpg-agent 2.1.18

Shah, Amul
> From: Shah, Amul Sent: Wednesday, April 05, 2017 11:24 AM
> Subject: Follow-up to Crashes with gpg-agent 2.1.18
>
> I think that this is a follow up to a thread from January (last email in thread
> https://lists.gnupg.org/pipermail/gnupg-devel/2017-January/032516.html) because it
> started with me seeing the message "Ohhhh jeeee: ... this is a bug
> (sexp.c:1433:do_vsexp_sscan)" in the gpg-agent log. Now in trying to reproduce the
> error, the gpg- agent hits a PKDECRYPT failure claiming that it cannot allocate
> memory.
>
> I tried debugging the error down into libgcrypt, but could not see where things
> could go wrong unless the memory manager is not npthread safe. Since I made no
> progress, I'm sending this mail along with a test case similar to the one from the
> previous thread.
>
> Note that we don't encounter this problem with older GnuPG versions, GnuPG 2.15
> and earlier.
>
> Could someone take a look at this or give me some pointers on how I can help
> myself?

[amul:1] No hints or suggestions? I guess I'll file a bug report instead and downgrade my
Software.

Amul
The information contained in this message is proprietary and/or confidential. If you are not the intended recipient, please: (i) delete the message and all copies; (ii) do not disclose, distribute or use the message in any manner; and (iii) notify the sender immediately. In addition, please be aware that any message addressed to our domain is subject to archiving and review by persons other than the intended recipient. Thank you.

_______________________________________________
Gnupg-devel mailing list
[hidden email]
http://lists.gnupg.org/mailman/listinfo/gnupg-devel
Reply | Threaded
Open this post in threaded view
|

RE: Follow-up to Crashes with gpg-agent 2.1.18

Daniel Kahn Gillmor-7
On Mon 2017-04-10 16:10:20 +0000, Shah, Amul wrote:

>> From: Shah, Amul Sent: Wednesday, April 05, 2017 11:24 AM
>> Subject: Follow-up to Crashes with gpg-agent 2.1.18
>>
>> I think that this is a follow up to a thread from January (last email in thread
>> https://lists.gnupg.org/pipermail/gnupg-devel/2017-January/032516.html) because it
>> started with me seeing the message "Ohhhh jeeee: ... this is a bug
>> (sexp.c:1433:do_vsexp_sscan)" in the gpg-agent log. Now in trying to reproduce the
>> error, the gpg- agent hits a PKDECRYPT failure claiming that it cannot allocate
>> memory.
>>
>> I tried debugging the error down into libgcrypt, but could not see where things
>> could go wrong unless the memory manager is not npthread safe. Since I made no
>> progress, I'm sending this mail along with a test case similar to the one from the
>> previous thread.
>>
>> Note that we don't encounter this problem with older GnuPG versions, GnuPG 2.15
>> and earlier.
>>
>> Could someone take a look at this or give me some pointers on how I can help
>> myself?
>
> [amul:1] No hints or suggestions? I guess I'll file a bug report instead and downgrade my
> Software.

Could you try with the version in debian experimental (2.1.20)?  I
believe the "Ohhh jeeee" crash was fixed upstream by commit
e175152ef7515921635bf1e00383e812668d13fc

        --dkg

_______________________________________________
Gnupg-devel mailing list
[hidden email]
http://lists.gnupg.org/mailman/listinfo/gnupg-devel
Reply | Threaded
Open this post in threaded view
|

Follow-up to Crashes with gpg-agent 2.1.18

Shah, Amul
> From: Daniel Kahn Gillmor [mailto:[hidden email]] Sent: Monday, April 10, 2017 3:55 PM
>
> On Mon 2017-04-10 16:10:20 +0000, Shah, Amul wrote:
> >> From: Shah, Amul Sent: Wednesday, April 05, 2017 11:24 AM
> >> Subject: Follow-up to Crashes with gpg-agent 2.1.18
> >>
> >> I think that this is a follow up to a thread from January (last email
> >> in thread
> >>
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fli
> >> sts.gnupg.org%2Fpipermail%2Fgnupg-devel%2F2017-
> January%2F032516.html&
> >>
> data=01%7C01%7CAmul.Shah%40fisglobal.com%7Cdcd940ffdc2e4e47cf0608
> d480
> >>
> 4b87dc%7Ce3ff91d834c84b15a0b418910a6ac575%7C0&sdata=qqETArFrGjgm
> gOQFu
> >> 30JIawLjopx0VegGTzapNmtxco%3D&reserved=0) because it started with
> me seeing the message "Ohhhh jeeee: ... this is a bug
> (sexp.c:1433:do_vsexp_sscan)" in the gpg-agent log. Now in trying to
> reproduce the error, the gpg- agent hits a PKDECRYPT failure claiming that it
> cannot allocate memory.
> >>
> >> I tried debugging the error down into libgcrypt, but could not see
> >> where things could go wrong unless the memory manager is not
> npthread
> >> safe. Since I made no progress, I'm sending this mail along with a
> >> test case similar to the one from the previous thread.
> >>
> >> Note that we don't encounter this problem with older GnuPG versions,
> >> GnuPG 2.15 and earlier.
> >>
> >> Could someone take a look at this or give me some pointers on how I
> >> can help myself?
> >
> > [amul:1] No hints or suggestions? I guess I'll file a bug report
> > instead and downgrade my Software.
>
> Could you try with the version in debian experimental (2.1.20)?  I believe
> the "Ohhh jeeee" crash was fixed upstream by commit
> e175152ef7515921635bf1e00383e812668d13fc
>
>         --dkg

I didn't try 2.20 since I didn't see anything promising in git log. I will give it
a try since you asked. Next time I write back, I'll try to include some more
useful details like a stack trace of all threads at the point of failure.

Debian 9's gnupg-2.18-6 has that patch (see below). I thought you were the
one who put it in. :)

$ apt-get source gnupg
$ cd gnupg2-2.1.18
$ cat debian/patches/0017-agent-Fix-double-free.patch
From: Justus Winter <[hidden email]>
Date: Wed, 25 Jan 2017 13:51:57 +0100
Subject: agent: Fix double free.

* agent/cache.c (agent_store_cache_hit): Make sure the update is
atomic.
--
Previously, the function freed the last key, and duplicated the new
key after doing that.  There is a chance, however, that calling the
allocator surrenders control to a different thread, causing a double
free if a different thread also calls this function.

To make sure the update is atomic under the non-preemptive thread
model, we must make sure not to surrender control to a different
thread.  Therefore, we avoid calling the allocator during the
update.

Signed-off-by: Justus Winter <[hidden email]>
(cherry picked from commit e175152ef7515921635bf1e00383e812668d13fc)
....

Thanks,
Amul
The information contained in this message is proprietary and/or confidential. If you are not the intended recipient, please: (i) delete the message and all copies; (ii) do not disclose, distribute or use the message in any manner; and (iii) notify the sender immediately. In addition, please be aware that any message addressed to our domain is subject to archiving and review by persons other than the intended recipient. Thank you.

_______________________________________________
Gnupg-devel mailing list
[hidden email]
http://lists.gnupg.org/mailman/listinfo/gnupg-devel
Reply | Threaded
Open this post in threaded view
|

RE: Follow-up to Crashes with gpg-agent 2.1.18

Shah, Amul
> From: Shah, Amul Sent: Monday, April 10, 2017 4:32 PM
> > From: Daniel Kahn Gillmor [mailto:[hidden email]] Sent: Monday,
> > April 10, 2017 3:55 PM
> >
> > On Mon 2017-04-10 16:10:20 +0000, Shah, Amul wrote:
> > >> From: Shah, Amul Sent: Wednesday, April 05, 2017 11:24 AM
> > >> Subject: Follow-up to Crashes with gpg-agent 2.1.18
> > >>
> > >> I think that this is a follow up to a thread from January (last
> > >> email in thread https://lists.gnupg.org/pipermail/gnupg-devel/2017-January/032486.html)
> > because it started with me seeing the message "Ohhhh jeeee: ... this is a bug
> > (sexp.c:1433:do_vsexp_sscan)" in the gpg-agent log. Now in trying to
> > reproduce the error, the gpg- agent hits a PKDECRYPT failure claiming
> > that it cannot allocate memory.
> > >>
> > >> I tried debugging the error down into libgcrypt, but could not see
> > >> where things could go wrong unless the memory manager is not
> > npthread
> > >> safe. Since I made no progress, I'm sending this mail along with a
> > >> test case similar to the one from the previous thread.
> > >>
> > >> Note that we don't encounter this problem with older GnuPG
> > >> versions, GnuPG 2.15 and earlier.
> > >>
> > >> Could someone take a look at this or give me some pointers on how I
> > >> can help myself?
> > >
> > > [amul:1] No hints or suggestions? I guess I'll file a bug report
> > > instead and downgrade my Software.
> >
> > Could you try with the version in debian experimental (2.1.20)?  I
> > believe the "Ohhh jeeee" crash was fixed upstream by commit
> > e175152ef7515921635bf1e00383e812668d13fc
> >
> >         --dkg
>
> I didn't try 2.20 since I didn't see anything promising in git log. I will
> give it a try since you asked. Next time I write back, I'll try to include
> some more useful details like a stack trace of all threads at the point of
> failure.
>
> Debian 9's gnupg-2.18-6 has that patch (see below). I thought you were the
> one who put it in. :)
> $ apt-get source gnupg
> $ cd gnupg2-2.1.18
> $ cat debian/patches/0017-agent-Fix-double-free.patch
> From: Justus Winter <[hidden email]>
> Date: Wed, 25 Jan 2017 13:51:57 +0100
> Subject: agent: Fix double free.
[snip]
[amul]

No luck with GPG 2.1.20. I turned up the agent logging all the way
(--debug-level guru --debug-all) and repeated my testing. In the last
few lines we see the bug message.

  13817 2017-04-11 16:54:29 gpg-agent[973941] DBG: rsa_decrypt    => Success
  13818 2017-04-11 16:54:29 gpg-agent[973941] DBG: plain: [data="\x02\xc1-\xd1'\xaa\x13\x96`\x0ebH\x8d\xf0\x18\xc3q\xf6\xcaw\x1da\x8d\xe1\xcc\x9a\xc0\xe5\xfa\xf9\xa2\x14\xea!\x98\xaa1\xad\xe5A\xf4\xe1sXc\xb6b\x
  13819 2017-04-11 16:54:29 gpg-agent[973941] DBG: chan_9 -> [ 44 20 28 35 3a 76 61 6c 75 65 32 35 35 3a 02 c1 ...(261 byte(s) skipped) ]
  13820 2017-04-11 16:54:29 gpg-agent[973941] DBG: chan_9 -> OK
  13821 2017-04-11 16:54:29 gpg-agent[973941] DBG: chan_9 <- [eof]
  13822 2017-04-11 16:54:29 gpg-agent[973941] DBG: rsa_decrypt  res:+02c12dd127aa1396600e62488df018c371f6ca771d618de1cc9ac0e5faf9a214 \
  13823 2017-04-11 16:54:29 gpg-agent[973941] DBG:                   ea2198aa31ade541f4e1735863b6628d840e471f654163ed444097553755db79 \
  13824 2017-04-11 16:54:29 gpg-agent[973941] DBG:                   739f476f56fa57400333a7612198865275e41a7ab427434df30d931e272322c3 \
  13825 2017-04-11 16:54:29 gpg-agent[973941] DBG:                   a7c722a9b524c62ba90837f39627bce1afe26050e9a55e9d52af5135d876c9f3 \
  13826 2017-04-11 16:54:29 gpg-agent[973941] DBG:                   3489d0ec4295c515813f2cb866f1424010b9a24e5d0e7ee40ec833d75229488f \
  13827 2017-04-11 16:54:29 gpg-agent[973941] DBG:                   67473945b2484cb2e1448c9f278fff09bf3e5507cb1067a32d439ca82c229ac4 \
  13828 2017-04-11 16:54:29 gpg-agent[973941] DBG:                   b96c82419e1a0c74c8a64fd0f5cdade98464a5f4a95a3beb4f888a0009e61826 \
  13829 2017-04-11 16:54:29 gpg-agent[973941] DBG:                   d423ecd9250af4bdb50ebd83e64309228d3687d7f9e9919fa1ecd8a3431195
  13830 2017-04-11 16:54:29 gpg-agent[973941] DBG: rsa_decrypt    => Success
  13831 2017-04-11 16:54:29 gpg-agent[973941] DBG: plain: [data="\x02\xc1-\xd1'\xaa\x13\x96`\x0ebH\x8d\xf0\x18\xc3q\xf6\xcaw\x1da\x8d\xe1\xcc\x9a\xc0\xe5\xfa\xf9\xa2\x14\xea!\x98\xaa1\xad\xe5A\xf4\xe1sXc\xb6b\x
  13832 2017-04-11 16:54:29 gpg-agent[973941] DBG: chan_50 -> [ 44 20 28 35 3a 76 61 6c 75 65 32 35 35 3a 02 c1 ...(261 byte(s) skipped) ]
  13833 2017-04-11 16:54:29 gpg-agent[973941] DBG: chan_50 -> OK
  13834 2017-04-11 16:54:29 gpg-agent[973941] DBG: chan_50 <- [eof]
  13835 2017-04-11 16:54:29 gpg-agent[973941] DBG: rsa_decrypt  res: [out of core]
  13836 2017-04-11 16:54:29 gpg-agent[973941] Ohhhh jeeee: ... this is a bug (../../src/sexp.c:1433:do_vsexp_sscan)
(I can share the full log if that helps)

I drilled into libgcrypt and saw that there is a lock, use_m_guard, but
the threaded gpg-agent is not using it. Not using the memory manager's
thread lock looks like a problem. I don't see where the current code
would enable the memory guard (via GCRYCTL_ENABLE_M_GUARD).
Am I barking up the wrong tree?

Amul
The information contained in this message is proprietary and/or confidential. If you are not the intended recipient, please: (i) delete the message and all copies; (ii) do not disclose, distribute or use the message in any manner; and (iii) notify the sender immediately. In addition, please be aware that any message addressed to our domain is subject to archiving and review by persons other than the intended recipient. Thank you.

_______________________________________________
Gnupg-devel mailing list
[hidden email]
http://lists.gnupg.org/mailman/listinfo/gnupg-devel
Reply | Threaded
Open this post in threaded view
|

Re: Follow-up to Crashes with gpg-agent 2.1.18

Matthew Summers
In reply to this post by Shah, Amul
On Mon, Apr 10, 2017 at 3:31 PM, Shah, Amul <[hidden email]> wrote:
>
> > Could you try with the version in debian experimental (2.1.20)?  I believe
> > the "Ohhh jeeee" crash was fixed upstream by commit
> > e175152ef7515921635bf1e00383e812668d13fc


Hello,

I have been experiencing this problem for roughly a week and a half.
Although that specific error (do_vsexp_sscan) appears intermittently,
I can very reliably crash gpg-agent, cause a race that results in a
second pin-entry, and/or cause memory allocation errors.

The trick to reproducing this is to employ a high level of parallelism
when calling `gpg -d` against some small example cipher text.
I do this sort of thing, where the recipient is a new test id I
created, 4096 bit RSA:

$ echo "test" | gpg -aer [hidden email] -o gpg.txt
$ yes gpg.txt | head -100 | xargs -n 1 -P 5 gpg --decrypt -q  # 5
processes works reliably
$ yes gpg.txt | head -100 | xargs -n 1 -P 10 gpg --decrypt -q   # 10
processes fails 50% of the time
$ yes gpg.txt | head -100 | xargs -n 1 -P 100 gpg --decrypt -q  # 100
processes fails 100%

For versions, I started experiencing this problem with gnupg-2.1.18,
but only after upgrading to nettle-3.3 and glib-2.50.3 (gnutls-3.3.26
was consistent). I later upgrade gnupg to 2.1.20 and the issue
persisted. One interesting thing is that I was using 2.1.18 with great
success in handling highly parallel tasks until the time of these
upgrades. It is no small thing to track down every library that gnupg
utilizes for the agent, so any help here would be wonderful.

I am also happy to provide any further details regarding the test env
I am using here, which is based on Gentoo. I intend to continue
debugging as time permits.

Thanks!
Matt Summers

_______________________________________________
Gnupg-devel mailing list
[hidden email]
http://lists.gnupg.org/mailman/listinfo/gnupg-devel
Reply | Threaded
Open this post in threaded view
|

Re: Follow-up to Crashes with gpg-agent 2.1.18

NIIBE Yutaka
Hello,

Matthew Summers <[hidden email]> writes:
> The trick to reproducing this is to employ a high level of parallelism
> when calling `gpg -d` against some small example cipher text.
[...]
> For versions, I started experiencing this problem with gnupg-2.1.18,
> but only after upgrading to nettle-3.3 and glib-2.50.3 (gnutls-3.3.26
> was consistent). I later upgrade gnupg to 2.1.20 and the issue
> persisted. One interesting thing is that I was using 2.1.18 with great
> success in handling highly parallel tasks until the time of these
> upgrades. It is no small thing to track down every library that gnupg
> utilizes for the agent, so any help here would be wonderful.

Thank you for your information.

GnuPG doesn't use nettle and glib, so, I wonder if it's related.

Recently, after 2.1.20, I fixed a race in gpg-agent, which is
possible to cause double free (and crash).

    https://dev.gnupg.org/T3027

This may be related.
--

_______________________________________________
Gnupg-devel mailing list
[hidden email]
http://lists.gnupg.org/mailman/listinfo/gnupg-devel
Reply | Threaded
Open this post in threaded view
|

Re: Follow-up to Crashes with gpg-agent 2.1.18

Matthew Summers
On Thu, Apr 20, 2017 at 8:19 PM, NIIBE Yutaka <[hidden email]> wrote:

>
> Thank you for your information.
>
> GnuPG doesn't use nettle and glib, so, I wonder if it's related.
>
> Recently, after 2.1.20, I fixed a race in gpg-agent, which is
> possible to cause double free (and crash).
>
>     https://dev.gnupg.org/T3027
>
> This may be related.


That explanation in https://dev.gnupg.org/T3027#95879 sounds very
accurate based on what I have seen. I'll give the patch a shot!

The dependency chain for nettle is via gnutls, though I admit to not
attempting to identify what dependency that might be. As far as glib,
I am not sure why I thought that may be involved somehow ... maybe a
socket routine or something. I apologize for incompleteness here, I
have been time limited in debugging this issue as thoroughly as I
would like to.

Thanks!
MattS

_______________________________________________
Gnupg-devel mailing list
[hidden email]
http://lists.gnupg.org/mailman/listinfo/gnupg-devel
Reply | Threaded
Open this post in threaded view
|

Re: Follow-up to Crashes with gpg-agent 2.1.18

Matthew Summers
On Fri, Apr 21, 2017 at 10:27 AM, Matthew Summers
<[hidden email]> wrote:

> On Thu, Apr 20, 2017 at 8:19 PM, NIIBE Yutaka <[hidden email]> wrote:
>>
>> Thank you for your information.
>>
>> GnuPG doesn't use nettle and glib, so, I wonder if it's related.
>>
>> Recently, after 2.1.20, I fixed a race in gpg-agent, which is
>> possible to cause double free (and crash).
>>
>>     https://dev.gnupg.org/T3027
>>
>> This may be related.
>
>
> That explanation in https://dev.gnupg.org/T3027#95879 sounds very
> accurate based on what I have seen. I'll give the patch a shot!



Testing that patch in the ticket above, I am still encountering
failures with parallelism greater than the number of CPU "cores" on my
machine (8 cores). The behavior is somewhat interesting.

On occasions I can force an error that looks like:

gpg-agent[15942] failed to build S-Exp (off=0): Cannot allocate memory
gpg-agent[15942] failed to read the secret key
gpg-agent[15942] command 'PKDECRYPT' failed: Cannot allocate memory <gcrypt>
gpg-agent[15942] DBG: chan_16 -> ERR 16810070 Cannot allocate memory <gcrypt>

in this case, decryption totally fails. Whereas on other occasions I
just get a new pin-entry pop up, with decryption succeeding in the
end.

It seems like there is a race for the socket, somehow, but I am not
sure how to verify this.

_______________________________________________
Gnupg-devel mailing list
[hidden email]
http://lists.gnupg.org/mailman/listinfo/gnupg-devel
Reply | Threaded
Open this post in threaded view
|

Re: Follow-up to Crashes with gpg-agent 2.1.18

Matthew Summers
On Fri, Apr 21, 2017 at 11:55 AM, Matthew Summers
<[hidden email]> wrote:
> Testing that patch in the ticket above, I am still encountering
> failures

Hi there, any updates on this lately? I'd really love to test a patch
that would apply against the 2.1.20 release.

Thanks,
Matt

_______________________________________________
Gnupg-devel mailing list
[hidden email]
http://lists.gnupg.org/mailman/listinfo/gnupg-devel
Reply | Threaded
Open this post in threaded view
|

Re: Follow-up to Crashes with gpg-agent 2.1.18

Matthew Summers
WIth the latest release I still experience errors with parallel
decryption (10 in parallel) using a 4096 bit RSA key.

```
GPG fails with: gpg: decryption failed: No secret key
```

GPG-Agent displays the following error msgs (log level guru) for ~15 -
25% of the decrypt operations.

```
DBG: rsa_decrypt    => Cannot allocate memory
decryption failed: Cannot allocate memory
command 'PKDECRYPT' failed: Cannot allocate memory <gcrypt>
DBG: chan_19 -> ERR 16810070 Cannot allocate memory <gcrypt>
```

If I push the parallelism up to 50, I can basically destroy the agent,
98% of decrypt operations fail, where it crashes in a similar manner
as we see here:
https://lists.gnutls.org/pipermail/gnupg-users/2017-March/057940.html

gpg-agent[2191] command 'PKDECRYPT' failed: Cannot allocate memory <gcrypt>
gpg-agent[2191] DBG: chan_15 -> ERR 16810070 Cannot allocate memory <gcrypt>
gpg-agent[2191] DBG: chan_15 <- [eof]
gpg-agent[2191] DBG: rsa_decrypt  res: [out of core]
gpg-agent[2191] Ohhhh jeeee: ... this is a bug (sexp.c:1433:do_vsexp_sscan)

At the time of the error there is at least 10GB of free RAM available.
I have also increased the ulimits to the following (basically
unlimited for everything).
data seg size unlimited
max locked memory unlimted
max memory size unlimited
stack size unlimited
virtual memory unlimited

gpg (GnuPG) 2.1.21
libgcrypt 1.7.6
npth 1.4

It may be notable that when using a smartcard (yubikey4 or nitrokey),
it appears that all decrypt operations are serialized and we see no
failures. None.

We are testing in the following manner (default-recipient is a local
4096 rsa key)
```
echo test | gpg -aer -o gpg.asc
yes gpg.asc | head -n 50 | xargs -n 1 -P 50 gpg -qd
```
We change the number 50 in both places to alter the level of parallelism.


Any help here would be greatly appreciated.

Kind Regards,
Matt Summers

_______________________________________________
Gnupg-devel mailing list
[hidden email]
http://lists.gnupg.org/mailman/listinfo/gnupg-devel
Reply | Threaded
Open this post in threaded view
|

Re: Follow-up to Crashes with gpg-agent 2.1.18

NIIBE Yutaka
Hello,

Thank you for your keeping this issue.  We don't ignore problem, I work
for similar bug report of: https://dev.gnupg.org/T3027

For original report of crash of this thread, I think that it is the same
thing.

I have been trying fix the problem of crash itself, and I fixed several
bugs in gpg-agent/npth/libgcrypt.  I believe that I identified the cause
of crash, and proposed a particular fix for libgcrypt (in D431).

So, I think that I know some.  For crash itself, the problem will be
gone, by next release of nPth (for macOS) and libgcrypt.

For "Cannot allocate memory" error, we need another approach like
throttling, I think.  I mean, we need to regulate rate of accepting
requests.

Matthew Summers <[hidden email]> wrote:

> ```
> GPG fails with: gpg: decryption failed: No secret key
> ```
>
> GPG-Agent displays the following error msgs (log level guru) for ~15 -
> 25% of the decrypt operations.
>
> ```
> DBG: rsa_decrypt    => Cannot allocate memory
> decryption failed: Cannot allocate memory
> command 'PKDECRYPT' failed: Cannot allocate memory <gcrypt>
> DBG: chan_19 -> ERR 16810070 Cannot allocate memory <gcrypt>
> ```

For current implementation, it is (somehow) expected for many requests
(in the situation with no smartcard).

> At the time of the error there is at least 10GB of free RAM available.

The limitation comes from the fact we only have 32KB or 64KB for secure
memory; The region is mlock(2)-ed to avoid data transfer to swap
storage.  ... even if we have multiple giga bytes of memory.

> It may be notable that when using a smartcard (yubikey4 or nitrokey),
> it appears that all decrypt operations are serialized and we see no
> failures. None.

Yes.

Perhaps, it would be good for libgcrypt to have an API for an
application (in this case gpg-agent) showing how much secure memory is
used (and how much left), so that an application can throttle accepting
requests.

Or, more easier, we can introduce a limitation to gpg-agent, say, only
accepts some fixed number of multiple connections simultaneously.
--

_______________________________________________
Gnupg-devel mailing list
[hidden email]
http://lists.gnupg.org/mailman/listinfo/gnupg-devel
Reply | Threaded
Open this post in threaded view
|

Re: Follow-up to Crashes with gpg-agent 2.1.18

Daniel Kahn Gillmor-7
On Fri 2017-06-02 08:31:31 +0900, NIIBE Yutaka wrote:

>> At the time of the error there is at least 10GB of free RAM available.
>
> The limitation comes from the fact we only have 32KB or 64KB for secure
> memory; The region is mlock(2)-ed to avoid data transfer to swap
> storage.  ... even if we have multiple giga bytes of memory.

is this a reasonable conclusion today?  shouldn't swap be encrypted on
machines where it's a risk?  not to be leakage-nihilist or anything, but
hibernation itself can't be prevented at the application level, right?
So these "out of core" crashes just gratuitous outages?

> Perhaps, it would be good for libgcrypt to have an API for an
> application (in this case gpg-agent) showing how much secure memory is
> used (and how much left), so that an application can throttle accepting
> requests.
>
> Or, more easier, we can introduce a limitation to gpg-agent, say, only
> accepts some fixed number of multiple connections simultaneously.

what would gpg-agent do when the limit is reached?  it would be better
if a thread could perform a blocking call when asking for more core, so
that the thread requesting the limited resource was the thing blocked,
and other threads could go about their business.

     --dkg

_______________________________________________
Gnupg-devel mailing list
[hidden email]
http://lists.gnupg.org/mailman/listinfo/gnupg-devel
Reply | Threaded
Open this post in threaded view
|

Re: Follow-up to Crashes with gpg-agent 2.1.18

NIIBE Yutaka
Daniel Kahn Gillmor <[hidden email]> wrote:

> On Fri 2017-06-02 08:31:31 +0900, NIIBE Yutaka wrote:
>
>>> At the time of the error there is at least 10GB of free RAM available.
>>
>> The limitation comes from the fact we only have 32KB or 64KB for secure
>> memory; The region is mlock(2)-ed to avoid data transfer to swap
>> storage.  ... even if we have multiple giga bytes of memory.
>
> is this a reasonable conclusion today?  shouldn't swap be encrypted on
> machines where it's a risk?  not to be leakage-nihilist or anything, but
> hibernation itself can't be prevented at the application level, right?
> So these "out of core" crashes just gratuitous outages?

I described the current implementation, why it occurs.

> what would gpg-agent do when the limit is reached?  it would be better
> if a thread could perform a blocking call when asking for more core, so
> that the thread requesting the limited resource was the thing blocked,
> and other threads could go about their business.

I think that simpler is better, here, to avoid dead locks; I get half,
you get half, when both need more, both wait forever.

In my opinion, the current implementation of crypto with MPI could be
improved.  Currently, it allocates memory during its computation.  It
would be good if it can allocate all memory needed for the computation
for a particular crypto method, in advance, atomically (or block).
--

_______________________________________________
Gnupg-devel mailing list
[hidden email]
http://lists.gnupg.org/mailman/listinfo/gnupg-devel
Reply | Threaded
Open this post in threaded view
|

Re: Follow-up to Crashes with gpg-agent 2.1.18

Matthew Summers
In reply to this post by NIIBE Yutaka
On Thu, Jun 1, 2017 at 6:31 PM, NIIBE Yutaka <[hidden email]> wrote:
> The limitation comes from the fact we only have 32KB or 64KB for secure
> memory; The region is mlock(2)-ed to avoid data transfer to swap
> storage.  ... even if we have multiple giga bytes of memory.

Hi, huge thanks for the effort here! I can see a definite improvement
in stability with the newly released libs.

I ask a question of you:

This mlock(2)'d region, is this something that could be made a
run-time setting or even compile time changed in configure?  I would
be keenly interested to know if allocating a larger mlock(2)'d region
would address the problem I have.

I understand that it's often needed and desirable to restrict memory
regions to smaller sizes. It's probably true that for most users,
highly parallelized decryption operations are rare. However, it's not
difficult to demonstrate use cases where it's important to handle
highly parallelized requests to gpg-agent. At present, if I make too
many calls too fast to gpg-agent, I can crash it very easily.

Maybe I can figure out how to set this AC_DEFINE HAVE_BROKEN_MLOCK,
seen in acinclude.m4, Presumably, this disables the secure region.
I am already building with `--enable-large-secmem` so I assume it's at 64kb now.

Anyway, thanks again!
Matt

_______________________________________________
Gnupg-devel mailing list
[hidden email]
http://lists.gnupg.org/mailman/listinfo/gnupg-devel
Reply | Threaded
Open this post in threaded view
|

Re: Follow-up to Crashes with gpg-agent 2.1.18

Matthew Summers
On Sun, Jun 4, 2017 at 10:35 PM, Matthew Summers
<[hidden email]> wrote:
> `--enable-large-secmem` so I assume it's at 64kb now.

Ha, I wonder if this will take care of things

diff --git a/configure b/configure
index 61a111e..96850dd 100755
--- a/configure
+++ b/configure
@@ -5536,7 +5536,7 @@ fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $large_secmem" >&5
$as_echo "$large_secmem" >&6; }
if test "$large_secmem" = yes ; then
-   SECMEM_BUFFER_SIZE=65536
+   SECMEM_BUFFER_SIZE=262144
else
   SECMEM_BUFFER_SIZE=32768
fi

_______________________________________________
Gnupg-devel mailing list
[hidden email]
http://lists.gnupg.org/mailman/listinfo/gnupg-devel
Reply | Threaded
Open this post in threaded view
|

Re: Follow-up to Crashes with gpg-agent 2.1.18

Matthew Summers
On Sun, Jun 4, 2017 at 10:41 PM, Matthew Summers
<[hidden email]> wrote:

> On Sun, Jun 4, 2017 at 10:35 PM, Matthew Summers
> <[hidden email]> wrote:
>> `--enable-large-secmem` so I assume it's at 64kb now.
>
> Ha, I wonder if this will take care of things
>
> diff --git a/configure b/configure
> index 61a111e..96850dd 100755
> --- a/configure
> +++ b/configure
> @@ -5536,7 +5536,7 @@ fi
> { $as_echo "$as_me:${as_lineno-$LINENO}: result: $large_secmem" >&5
> $as_echo "$large_secmem" >&6; }
> if test "$large_secmem" = yes ; then
> -   SECMEM_BUFFER_SIZE=65536
> +   SECMEM_BUFFER_SIZE=262144
> else
>    SECMEM_BUFFER_SIZE=32768
> fi

Using SECMEM_BUFFER_SIZE=1048576 I can run 100 parallel decrypt
operations again without crashes!
(i.e. yes gpg.txt.asc | head -n 50 | xargs -n 1 -P 50 gpg -qd)

_______________________________________________
Gnupg-devel mailing list
[hidden email]
http://lists.gnupg.org/mailman/listinfo/gnupg-devel
Reply | Threaded
Open this post in threaded view
|

Re: Follow-up to Crashes with gpg-agent 2.1.18

Daniel Kahn Gillmor-7
In reply to this post by Matthew Summers
On Sun 2017-06-04 22:35:59 -0500, Matthew Summers wrote:
> It's probably true that for most users, highly parallelized decryption
> operations are rare. However, it's not difficult to demonstrate use
> cases where it's important to handle highly parallelized requests to
> gpg-agent.

It would really help this conversation to document a few of these
real-world use cases.  I appreciate that your demonstration scripts are
narrow and targeted because it helps to isolate the problem, but having
clear real-world cases will help in motivating the fix to actually
get deployed.

Where are highly-parallelized requests to gpg-agent likely to happen?

        --dkg

_______________________________________________
Gnupg-devel mailing list
[hidden email]
http://lists.gnupg.org/mailman/listinfo/gnupg-devel

signature.asc (847 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Follow-up to Crashes with gpg-agent 2.1.18

Matthew Summers
On Mon, Jun 5, 2017 at 3:40 PM, Daniel Kahn Gillmor
<[hidden email]> wrote:

> On Sun 2017-06-04 22:35:59 -0500, Matthew Summers wrote:
>> It's probably true that for most users, highly parallelized decryption
>> operations are rare. However, it's not difficult to demonstrate use
>> cases where it's important to handle highly parallelized requests to
>> gpg-agent.
>
> It would really help this conversation to document a few of these
> real-world use cases.  I appreciate that your demonstration scripts are
> narrow and targeted because it helps to isolate the problem, but having
> clear real-world cases will help in motivating the fix to actually
> get deployed.
>
> Where are highly-parallelized requests to gpg-agent likely to happen?

I am delighted to outline a scenario. First some background.

There is an open source config management and remote execution tool
written in python called Salt. It let's the user store information
formatted as yaml. It's useful to store secrets in these yaml files.
It's good to encrypt these secrets [1] when storing them, encryption
at rest being good and all that. This yaml can be rendered for runtime
use in the config management system.

So, for example, let's say we have secret data encrypted with the
entire Ops team as recipients, like an API token or something larger
like private TLS key material. Further, let's say we need to put this
secret onto 10000 systems in batches of 100.

Salt provides a handy mechanism to make such a thing simple, via ssh
even [2], if desired. With a few lines of code, our gpg'd secret, and
a small bit of setup, the we get to a point where the one-liner
`salt-ssh -E 'myhosts' state.apply update-special-secret` will do the
trick. That is, it will initiate 100 processes that each decrypt the
secret, render out the execution data, ssh to the remote host,
execute, and return relevant information to us, then pickup the next
batch, and so on.

The numbers here, of 100 or 1000, are totally arbitrary, but
sufficiently large to demonstrate the scale. On a high CPU count
system with lots of memory, it would not be terribly crazy to push
these numbers much higher. In practice, with the secmem size set to
64k, we would run into issues withe batch sizes of 10.

I hope this is clear and helps. I could probably put together a demo
of this, though I am not sure how to simulate 1000 hosts really nicely
(read cheaply and easily).  Ansible or Puppet can do similar things
too, using GPG to store/decrypt secrets data.

Please feel free to ask questions if what I have outlined above needs
further explanation.

Thanks again!
Matt S.

[1] https://docs.saltstack.com/en/latest/ref/renderers/all/salt.renderers.gpg.html
[2] https://docs.saltstack.com/en/latest/topics/ssh/#running-salt-ssh-as-non-root-user
[3] https://docs.saltstack.com/en/latest/topics/pillar/#the-pillar-get-function

_______________________________________________
Gnupg-devel mailing list
[hidden email]
http://lists.gnupg.org/mailman/listinfo/gnupg-devel
Reply | Threaded
Open this post in threaded view
|

RE: Follow-up to Crashes with gpg-agent 2.1.18

Shah, Amul
-----Original Message-----
From: Gnupg-devel [mailto:[hidden email]] On Behalf Of Matthew Summers Sent: Tuesday, June 06, 2017 1:43 AM

>On Mon, Jun 5, 2017 at 3:40 PM, Daniel Kahn Gillmor <[hidden email]> wrote:
>> On Sun 2017-06-04 22:35:59 -0500, Matthew Summers wrote:
>>> It's probably true that for most users, highly parallelized
>>> decryption operations are rare. However, it's not difficult to
>>> demonstrate use cases where it's important to handle highly
>>> parallelized requests to gpg-agent.
>>
>> It would really help this conversation to document a few of these
>> real-world use cases.  I appreciate that your demonstration scripts
>> are narrow and targeted because it helps to isolate the problem, but
>> having clear real-world cases will help in motivating the fix to
>> actually get deployed.
>>
>> Where are highly-parallelized requests to gpg-agent likely to happen?
>
[snip Matt's response and thanks to him for his investigation]

Certainly! Sorry for the late reply. GT.M (http://fis-gtm.com/) is a high performance cooperatively managed database engine with ACID transaction support and MUMPS language compiler and runtime. GT.M leverages GnuPG for both encryption and encryption key management (thank you for the tools and your hard work).

When I say that GT.M is a cooperatively managed database, I mean that GT.M processes cooperatively manage a shared memory cache. There is no primary daemon that starts and stops the database. Each process calls out to GnuPG for access to encrypted resources. During testing with GnuPG 2.1.18 and above, when our automated tests started more than 20 processes at once, we would face intermittent inexplicable failures in acquiring encryption keys. The example that I provided illustrates the bug with hundreds of process that we see with fewer processes.

At database startup, our customers typically start somewhere between 10 and hundreds of processes. One customer goes up into the tens of thousands by the time they reach a steady state. These are real world use cases (http://www.fisglobal.com/solutions/banking-and-wealth/services/database-engine/sample-users).

Please let me know if you have any questions or need more information from me.

Regards, Amul

PS: Please excuse the abominable reformatting by Outlook.


The information contained in this message is proprietary and/or confidential. If you are not the intended recipient, please: (i) delete the message and all copies; (ii) do not disclose, distribute or use the message in any manner; and (iii) notify the sender immediately. In addition, please be aware that any message addressed to our domain is subject to archiving and review by persons other than the intended recipient. Thank you.

_______________________________________________
Gnupg-devel mailing list
[hidden email]
http://lists.gnupg.org/mailman/listinfo/gnupg-devel