Allow identifying a long-running or stuck query and killing it

Markus Halttunen shared this question 6 years ago
Answered

Sometimes I run a query that doesn't use an index and it can take a very long time. I stop it in NoSQLBooster, but it doesn't kill the actual operation.


I need to use the following to find the query:


db.currentOp().inprog.forEach(
  function(op) {
    if(op.secs_running > 5 && op.ns != "local.oplog.rs" && op.ns != "local.replset.minvalid") printjson(op);
  }
)
And then kill the stuck operation manually. It would be great if NoSQLBooster could help me with this process.

Replies (14)

photo
1

After you press the stop button, NoSQLBooster will call db.killOp(). But MongoDB's killOp doesn't take effect immediately. if you call db.currentOp, you will find that the flag bit of killPending has flagged.

According to MongoDB online doc, currentOp.killPending Returns true if the operation is currently flagged for termination. When the operation encounters its next safe termination point, the operation will terminate.


BTW, NoSQLBooster has some build-in currentOp snippets. Enter "db.currentOp" to popup code-complete list. The following one is to return information on all active operations for db that have been running longer than 3 seconds.


/H

photo
1

Unfortunately it does not work at all like you described. I am not sure if the problem is related to the fact that I am running these queries on the secondaries.

When I press the stop button, the query stops immediately. But it keeps running on the server "forever". There is no "killPending" flagged. If I try to run db.killOp() in NoSQLBooster, it doesn't do anything. I just have to press the stop on that command. The only way to kill the slow operation is to SSH to the instance and run it there.

Please advice.

photo
1

Thank you for your feedback. We have worked out a new test build to support killOp with "secondary mode" connection please download and give it a try.

https://nosqlbooster.com/s3/download/releasesv4/nosqlbooster4mongo-4.5.5.dmg

photo
1

Calling killOp manually works now. Thank you!


Unfortunately pressing stop still does not kill the operation, so I have to do it manually which is inconvenient.

photo
1

Could you give me a sample script and some test data to recall the issue?

I tested the function with the following script which use $where function to simulate long-run query.

db.collection.find({}).limit(10).$where(`
    function(){
        sleep(10*1000); //simulate long run query
        
        return true;
    }

`)
And, Use the following code to find long-run query .

db.currentOp(
   {
     "active" : true,
     "secs_running" : { "$gt" : 3 },
     // "appName":/NoSQL/
   }
).inprog

photo
1

I used the same to find the long running query.


To get an actual long-running query, I simply ran the following query on a collection with millions of entries and no index on field foo:

db.players.find({"foo":"bar"})
I pressed the "Stop" button and waited at least 30 seconds and the query was still running. It stopped immediately when I called db.killOp myself.

Could you try to create a collection of (tens of) millions of entries and try the same?

photo
1

I use NoSQLBooster's test data generator to generate 10M records, query without field index.

Then I run MongoDB shell method "show log", it shows that the kill op has been called.

2018-06-06T15:53:50.961+0800 I COMMAND [conn25] going to kill op: 510852018-06-06T15:53:50.963+0800 I COMMAND [conn24] command demo.transaction appName: "NoSQLBoosterV4" command: getMore { getMore: 227254540936, collection: "transaction", batchSize: 21.0, $clusterTime: { clusterTime: Timestamp(1528271623, 1), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0.0 } }, $readPreference: { mode: "secondaryPreferred" }, $db: "demo" } originatingCommand: { find: "transaction", filter: { name: "xcvcvcx" }, limit: 21.0, batchSize: 0.0, $clusterTime: { c

/24iNO2sZ05trbHPNLTS9tbOxR856vbm7p

photo
1

I downloaded the binary again and tested multiple times. The on-going operation is never killed.

Here is the log:

2018-06-06T08:41:03.646+0000 I NETWORK   connection accepted from [redacted] #8769 (344 connections now open)
2018-06-06T08:41:03.761+0000 I NETWORK  [conn8769] received client metadata from [redacted] conn8769: { driver: { name: "nodejs", version: "3.0.2-2" }, os: { type: "Darwin", name: "darwin", architecture: "x64", version: "17.5.0" }, platform: "Node.js v8.2.1, LE, mongodb-core: 3.0.2", application: { name: "NoSQLBoosterV4" } }
2018-06-06T08:41:04.272+0000 I -        [conn8768] end connection [redacted] (344 connections now open)
2018-06-06T08:41:15.680+0000 I ASIO     [NetworkInterfaceASIO-RS-0] Connecting to [redacted]:27017
2018-06-06T08:41:15.684+0000 I ASIO     [NetworkInterfaceASIO-RS-0] Successfully connected to [redacted]:27017, took 4ms (3 connections now open to [redacted]:27017)
2018-06-06T08:42:15.680+0000 I ASIO     [NetworkInterfaceASIO-RS-0] Ending idle connection to host [redacted]:27017 because the pool meets constraints; 2 connections to that host remain open
2018-06-06T08:42:54.408+0000 I COMMAND  [conn8764] going to kill op: 1692617107
2018-06-06T08:42:54.412+0000 I COMMAND  [conn8760] command [redacted].players appName: "NoSQLBoosterV4" command: getMore { getMore: 138009004606, collection: "players", batchSize: 21.0 } originatingCommand: { find: "players", filter: { foo: "bar" }, limit: 21.0, batchSize: 0.0 } planSummary: COLLSCAN cursorid:138009004606 exception: operation was interrupted code:11601 numYields:7151 reslen:111 locks:{ Global: { acquireCount: { r: 14304 }, acquireWaitCount: { r: 5743 }, timeAcquiringMicros: { r: 15696924 } },
The operation at 08:42:54 (going to kill op) is me manually killing the op after it was not killed by NoSQLBooster after about 40 seconds.

photo
1

Thank you for your patience, I am sorry, I uploaded a wrong test build, please re-download and try it again If it still does not work, please give me your MongoDB log.

https://nosqlbooster.com/s3/download/releasesv4/nosqlbooster4mongo-4.5.5.dmg

photo
1

I tried again using the latest build you linked to and the end result is exactly the same as before? Are you sure you are sending the right build? Could you include a build number in the file name and the "About" dialog so we could make sure I'm using the right version?

I called "show log" and it provides nothing more than what I pasted above.

photo
1

Try start and stop script, focus to object tree then press CMD+SHIFT+ALT+F7 to show the developer tools windows. Can you see the killop log message in the console tab like the following picture? If read-preference mode is "primary", can this long-run query be terminated correctly?


And, What's your MongoDB server version and rs configuration? run db.version() and rs.conf()


/huM1SYIUMLWU+vG1x2Gf5Apz

photo
1

Version is 3.4.14

RsConf:


{
	"_id" : "[redacted]",
	"version" : 52,
	"protocolVersion" : 1,
	"members" : [
		{
			"_id" : 6,
			"host" : "[redacted]1:27017",
			"arbiterOnly" : false,
			"buildIndexes" : true,
			"hidden" : false,
			"priority" : 1,
			"tags" : {
				
			},
			"slaveDelay" : 0,
			"votes" : 1
		},
		{
			"_id" : 7,
			"host" : "[redacted]2:27017",
			"arbiterOnly" : false,
			"buildIndexes" : true,
			"hidden" : false,
			"priority" : 1,
			"tags" : {
				
			},
			"slaveDelay" : 0,
			"votes" : 1
		},
		{
			"_id" : 8,
			"host" : "[redacted]3:27017",
			"arbiterOnly" : false,
			"buildIndexes" : true,
			"hidden" : false,
			"priority" : 1,
			"tags" : {
				
			},
			"slaveDelay" : 0,
			"votes" : 1
		}
	],
	"settings" : {
		"chainingAllowed" : true,
		"heartbeatIntervalMillis" : 2000,
		"heartbeatTimeoutSecs" : 10,
		"electionTimeoutMillis" : 10000,
		"catchUpTimeoutMillis" : 2000,
		"getLastErrorModes" : {
			
		},
		"getLastErrorDefaults" : {
			"w" : 1,
			"wtimeout" : 0
		},
		"replicaSetId" : ObjectId("58b5354aa6c065d045edbf0a")
	}
}
There is no killop in the console. Only "connect..." (I needed to press ctrl+shift+alt+F7 even on a mac to open it)

Stopping did not work correctly on primary either.

photo
1

If possible, could you please give me a temporarily accessible Mongodb RS (MongoDB URI) to recall the issue locally?

I can't reproduce the issue on my local test servers. I guess the problem is related to the specific configuration of the replica set or permissions. To solve this problem, I need a test environment and detailed steps to reproduce it?

Thanks a lot.

photo
2

It's been two years, but the problem is still there. I'm using the version 6.1.8 now against MongoDB 4.0.9. Here are the steps to reproduce:

1. Run a query against a collection that has tens of millions of large entries. Make sure your query doesn't hit an index.

2. In another tab, run the currentOpActiveAndRunningLongerThan3Secs template to confirm that your query is visible

3. Stop the original query in the first tab

4. In the second tab, run the currentOpActiveAndRunningLongerThan3Secs again to confirm that the query is still there

5. Run db.killOp() using the operationId you got at 4

6. Run the currentOpActiveAndRunningLongerThan3Secs one more time to confirm that the query is gone


What should happen: In step 4 you should no longer see the query. You stopped it in NoSqlBooser so the query should have been killed.

What happens instead: In step 4 you still see the query and you have to kill it manually.


Are you 100% sure you can't reproduce this issue? Unfortunately I can't give you access to our MongoDB but please tell me what kind of logging I can enable for you to understand in the logs what is going wrong?

photo
1

Well... I have to second that, Stop does not killOp.

I've tested it multiple times with any permission against primary and secondary instances, it only affect the op locally, but it keeps running on the server.


We would really appreciate fixing this.

photo
1

Please give the build (6.2.1) a try. This version partially solves this problem.

Download page: https://nosqlbooster.com/downloads

photo
1

For transparency: It did not solve the problem for us. Please let us know whether it worked for you, Tomer.

photo
photo
1

Thank you for trying to figure out the problem!

Unfortunately I don't think I will be able to spend more time on this at this point. I will probably revive this topic when we've upgraded to a newer version of MongoDB.

Leave a Comment
 
Attach a file