Difference between revisions of "Helpful MySQL Queries"
(→The DATEADD feature is similar to the MySQL DATE_ADD feature.) |
(→Number of users with data:) |
||
(2 intermediate revisions by one user not shown) | |||
Line 80: | Line 80: | ||
; | ; | ||
</pre> | </pre> | ||
+ | |||
+ | To limit it to users less than 3 months or older than 3 months add | ||
+ | <pre> | ||
+ | HAVING MAX(tm.f_delivered) < UNIX_TIMESTAMP(DATE_ADD(CURDATE(), INTERVAL -3 MONTH)) | ||
+ | </pre> | ||
+ | underneath the GROUP BY clause. Use > for newer than and < for older than. | ||
+ | |||
+ | '''''*Warning: This makes the query much more intense. This means that adding the older than, or newer than part of the query will run for a significant amount of time. A large system could see this query take several hours.''''' | ||
===Number of Users in Exchange/O365 with data (Works up until 4.1.0.1, 4.2 will change things):=== | ===Number of Users in Exchange/O365 with data (Works up until 4.1.0.1, 4.2 will change things):=== | ||
Line 447: | Line 455: | ||
GO | GO | ||
</pre> | </pre> | ||
+ | |||
+ | To limit it to users less than 3 months or older than 3 months add | ||
+ | <pre> | ||
+ | HAVING DATEADD(S, MAX(f_stored), '1970-01-01') > '2017-02-01' | ||
+ | </pre> | ||
+ | underneath the GROUP BY clause. Use > for newer than and < for older than. | ||
===Number of Users in Exchange/O365 with data (Works up until 4.1.0.1, 4.2 will change things):=== | ===Number of Users in Exchange/O365 with data (Works up until 4.1.0.1, 4.2 will change things):=== |
Latest revision as of 18:14, 7 April 2017
Daniel has created a number of helpful queries:
All Queries are for Retain 3.x and newer. MySQL syntax. In most instances MSSQL will work fine with the same query.
Warning: Some queries can take a heavy performance toll on the SQL server, I’ve tried to create them to be as efficient as possible but some things just take lots of processing/disk I/O time. Don’t expect these to always come back right away. ALSO, do not run a query unless you understand what you’re doing. If a query goes cartesian on you it will bring the SQL server to it’s knees and will probably ruin anything else that’s trying to access that server.
Additional note: Because google thinks they’re awesome copy and paste straight from this doc may not work. Copy the query to a notepad, or notepad++ and change the ‘text’ to actual ‘ (AKA “tick” in some circles) marks. Quotes also work.
User Information:
[edit] To get User (displaying email) with Last date they were cached, Most recently archived, and the Item Store flag:
SELECT abook_id, f_email, from_unixtime(ts_cache) AS 'Last Cached From Mail Server', from_unixtime(ts_store) AS 'Most recently Accessed', from_unixtime(ts_item) AS 'Item Store Flag' FROM t_abook GROUP BY abook_id; SELECT f_first AS 'First Name', f_last AS 'Last Name', f_mailbox AS 'Mailbox ID', f_email AS 'Email Address' FROM t_abook INNER JOIN t_uuid_mapping ON t_abook.f_uid = t_uuid_mapping.f_uuid; SELECT COUNT(*) FROM t_abook INNER JOIN t_uuid_mapping ON t_abook.f_uid = t_uuid_mapping.f_uuid;
[edit] Get A list of All users with a count of how many messages each person has:
SELECT abook_id, f_email AS 'Email Address', COUNT(DISTINCT(tm.message_id)) AS 'Total Message Count' FROM t_message tm INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid GROUP BY abook_id;
- Note: The above query took 20ish minutes to run on a system with 8,000 users and 290 million messages. Number of users that had mail archived was actually nearly 20,000
Another thing of note is my use of abook_id in the SELECT and also the WHERE clause. I’ve found that not everyone has an email address. Sometimes Retain will archive something that doesn’t have an email attached to it. Hence “WHERE abook_id =” and “GROUP BY abook_id”
[edit] Total number of messages a particular user has:
SELECT COUNT(*) FROM t_message INNER JOIN t_uuid_mapping ON t_message.uuid_mapping_id = t_uuid_mapping.uid_mapping_id INNER JOIN t_abook ON t_abook.f_uid = t_uuid_mapping.f_uuid WHERE f_email = ‘user@place.net’;
IMPORTANT: It has been found that in Retain 4.0.3 that there are sometimes situations where there is data in the t_message table, but not in the t_message_attachments table. If you have nothing in the t_message_attachments table Retain will not be able to retrieve anything from disc.
[edit] A more accurate query for the number of messages Retain actually has would probably be:
SELECT COUNT(*) FROM t_message INNER JOIN t_uuid_mapping ON t_message.uuid_mapping_id = t_uuid_mapping.uid_mapping_id INNER JOIN t_abook ON t_abook.f_uid = t_uuid_mapping.f_uuid WHERE abook_id = {number};
[edit] Number of users with data:
Select count(*) AS 'Number of users with data' From ( SELECT COUNT(*) FROM t_message tm INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid INNER JOIN t_domains td ON ta.domain_id = td.domain_id GROUP BY abook_id ) AS t ;
To limit it to users less than 3 months or older than 3 months add
HAVING MAX(tm.f_delivered) < UNIX_TIMESTAMP(DATE_ADD(CURDATE(), INTERVAL -3 MONTH))
underneath the GROUP BY clause. Use > for newer than and < for older than.
*Warning: This makes the query much more intense. This means that adding the older than, or newer than part of the query will run for a significant amount of time. A large system could see this query take several hours.
[edit] Number of Users in Exchange/O365 with data (Works up until 4.1.0.1, 4.2 will change things):
Select count(*) AS 'Number of Exchange/O365 Users with archive data' From ( SELECT COUNT(*) FROM t_message tm INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid INNER JOIN t_domains td ON ta.domain_id = td.domain_id WHERE td.f_type = 1 GROUP BY abook_id ) AS t ;
[edit] Get total (possible) size of message store used by a user.
Size is in Bytes (This size may not be entirely accurate, this is a number that has been passed by the Email server to Retain):
SELECT SUM(f_size) AS 'Approximate Storage Use' FROM t_message_attachments tma INNER JOIN t_message tm ON tma.message_id = tm.message_id INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid WHERE f_email = 'admin@knobias.com';
- NOTE: The previous 2 queries could possibly be inaccurate if 2 users have the same Email address. The most accurate would probably be to use abook_id
In the WHERE clause you can use these fields that should be unique:
abook_id = f_uid = f_mailbox = althash =
Most Unique Example:
SELECT abook_id AS 'Retain Address ID', f_email AS 'Email Address', SUM(f_size) AS 'Approximate Storage Use' FROM t_message_attachments tma INNER JOIN t_message tm ON tma.message_id = tm.message_id INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid WHERE abook_id = {abook_id number};
[edit] User ID (abook_id), mailbox name, email address and most recently stored message date:
(Thanks Daron)
SELECT abook_id AS 'User ID', ta.f_mailbox Mailbox, ta.f_email AS 'Eamil address', from_unixtime(max(tm.f_stored)) AS 'Most recently archived message' from t_message tm inner join t_abook ta on tm.uuid_mapping_id = ta.f_uid GROUP BY abook_id;
[edit] Message counts and message store size:
Get A list of All users with a count of how many messages each person has along with the possible size that their archive would take on disk. This last number will not add up to the total disk use that Retain is using. Use it as a ballpark figure.
WARNING: The following query is VERY intense. On a system with 3,500 users with mail in Retain, and 41 million total messages, it took 1 hour 17 minutes to return. Use with caution.
SELECT f_email AS 'Email Address', FORMAT(COUNT(DISTINCT(tm.message_id)),0) AS 'Number of Messages', FORMAT(SUM(f_size),0) AS 'Approximate Storage Use (in Bytes)' FROM t_message_attachments tma INNER JOIN t_message tm ON tma.message_id = tm.message_id INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid GROUP BY abook_id ORDER BY tm.message_id;
By default it’ll organize the results by the email address. If you’d like to organize it by size of the mailbox, smallest first, change the ORDER BY clause to read as follows:
ORDER BY SUM(f_size);
If you’d like it in reverse order tack on a “DESC” like so:
ORDER BY SUM(f_size) DESC;
If you want to organize it by number of messages, fewest first, tack on:
COUNT(DISTINCT(tm.message_id));
If you want to run a less intensive version: find specific users that the customer is interested in. Get the address book id and nail it down to that. On the system mentioned above the user with the most messages took only 35 seconds to complete.
Example:
SELECT f_email AS 'Email Address', FORMAT(COUNT(DISTINCT(tm.message_id)),0) AS 'Number of Messages', FORMAT(SUM(f_size),0) AS 'Approximate Storage Use (in Bytes)' FROM t_message_attachments tma INNER JOIN t_message tm ON tma.message_id = tm.message_id INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid WHERE abook_id = 1585;
Other Variations
SELECT CONCAT(ta.f_first, ' ', ta.f_last, '(',f_mailbox, ')') AS 'Mailbox Name', f_email AS 'Email Address', FORMAT(COUNT(DISTINCT(tm.message_id)),0) AS 'Number of Messages', FORMAT(SUM(f_size),0) AS 'Approximate Storage Use (in Bytes)', FROM_UNIXTIME(MIN(tm.f_delivered)) AS 'Oldest Message', FROM_UNIXTIME(MAX(tm.f_delivered)) AS 'Newest Message' FROM t_message tm INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid GROUP BY abook_id ORDER BY tm.message_id;
Example Return
[edit] Find User name, and most recently archived message
Below is a way to see everyone that’s older than 3 months. Just switch the > sign to a < sign to do 3 months and newer.
If you don’t care about if it’s older or younger than 3 months (for licensing) then just take out the “HAVING” line
SELECT CONCAT(ta.f_first, ' ', ta.f_last, '(',f_mailbox, ')') AS 'Mailbox Name', f_email AS 'Email Address', FROM_UNIXTIME(MAX(tm.f_delivered)) AS 'Newest Message' FROM t_message_attachments tma INNER JOIN t_message tm ON tma.message_id = tm.message_id INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid GROUP BY abook_id HAVING MAX(tm.f_delivered) < UNIX_TIMESTAMP(DATE_ADD(CURDATE(), INTERVAL -3 MONTH)) ORDER BY MAX(tm.f_delivered) ;
Example Return
[edit] Get email, number of messages that user has, Post office or server subsection they’re in, and the date of the most recent message that was archived
SELECT f_email AS 'E-mail', COUNT(*) AS 'Number of Messages in Retain', f_name AS 'Post Office', MAX(FROM_UNIXTIME(f_stored)) AS 'Most Recent Message Archived' FROM t_message tm INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid INNER JOIN t_postoffices tp ON tp.postoffice_id = ta.postoffice_id GROUP BY abook_id;
WARNING: This query is VERY intense, do not run this without the understanding that this will take a significant amount of time and resources on SQL. Small test system with 96 users and 4,553,000 items in t_message took 1 minute 7 seconds to run. By way of comparison: A straight count of t_message took only 4 seconds.
[edit] Get all message data for one user for messages delivered to the mailbox before X date:
SELECT * FROM t_message tm INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid WHERE abook_id = 1 AND tm.f_delivered < UNIX_TIMESTAMP(DATE_ADD(CURDATE(), INTERVAL -20 YEAR)) ;
[edit] Getting Message data
SELECT f_name AS 'Attachment Name', hash AS "Hash ID" FROM t_document td INNER JOIN t_message_attachments tma ON td.document_id = tma.document_id WHERE message_id = 614645;
The above is an expanded version of the query in this KB: http://support2.gwava.com/kb/?View=entry&EntryID=2420
Besides the HASH ID it will also give you the name of the attachment.
Going in reverse order, if you have the hash ID then you can find the message ID(s) that it connects to, I also threw in the date that it was initially stored on disc (Human readable according to the date of the server you’re running this on.):
SELECT message_id, FROM_UNIXTIME(date) FROM t_message_attachments tma INNER JOIN t_document td ON td.document_id = tma.document_id WHERE hash = '{Your_hash}';
Example:
SELECT message_id AS 'Message ID', FROM_UNIXTIME(date) AS 'Date Stored' FROM t_message_attachments tma INNER JOIN t_document td ON td.document_id = tma.document_id WHERE hash = '3249153B5525F19A46D28D29A4BE381CD2E8E22B8510C970328CF54464FAFACC';
[edit] To find if a particular hash was indexed and the message id:
SELECT tm.message_id, f_indexed FROM t_message tm inner join t_message_attachments tma ON tm.message_id = tma.message_id INNER JOIN t_document td ON tma.document_id = td.document_id WHERE hash = '{hash_number}';
[edit] Check to see if you have any accidentally deleted documents (might take a while to run)
This can be a result of a problem with the database and the count will come up how many deleted documents you have. You’ll have to go through a clean up process to fix this.:
SELECT COUNT(*) FROM t_message_attachments tma LEFT JOIN t_document td ON td.document_id = tma.document_id WHERE td.document_id IS NULL ;
[edit] XML MERGE FILE CREATION:
Possible way to get the GUID’s out of a database so that you can merge them through the UI:
SELECT CONCAT('<MergeTask><sourceMailbox>', gw.f_uid, '</sourceMailbox><destinationMailbox>', ex.f_uid, '</destinationMailbox><removeOrphanedEntity>true</removeOrphanedEntity></MergeTask>' ) FROM t_abook gw, t_abook ex WHERE gw.domain_id = 2 and ex.domain_id = 3 and gw.f_email = ex.f_email ;
Note: you’ll probably need to figure out which domain is the old and which is the new so that you can appropriately assign the combination.
Copy and paste your results into an .xml file then tack on
<MergeTasks> <tasks>
To the top of the file and
</tasks> </MergeTasks>
To the bottom of the file
[edit] Instead of looking up the unix timestamps for colums like "f_delievered" etc you can use UNIX_TIMESTAMP('20100101') I.E. UNIX_TIMESTAMP('YYYYMMDD')
So this query works in MySQL:
SELECT COUNT(*) FROM t_message WHERE f_delivered < UNIX_TIMESTAMP('2009-10-24') ;
(You can include or exclude hyphens)
[edit] If you want a specific hour/min/sec you can use:
SELECT COUNT(*) FROM t_message WHERE f_delivered < UNIX_TIMESTAMP('2010-10-24 01:02:03') ;
[edit] As Requested in RET-7262
SELECT abook_id AS 'Retain ID', td.f_name AS 'Mail Server or Domain', CONCAT(f_first, ', ', f_last) AS 'First and Last Name', f_mailbox AS 'Mailbox ID', f_email AS 'Email Address', COUNT(*) AS 'Number of Messages' FROM t_message tm INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid INNER JOIN t_domains td ON ta.domain_id = td.domain_id WHERE f_stored < UNIX_TIMESTAMP(DATE_ADD(CURDATE(), INTERVAL -3 MONTH)) GROUP BY abook_id ORDER BY COUNT(*) ;
[edit] MSSQL specific stuff:
[edit] The DATEADD feature is similar to the MySQL DATE_ADD feature.
SELECT abook_id, MIN(f_first) AS 'First Name', MIN(f_last) AS 'Last Name', MIN(f_mailbox) AS 'Mailbox ID', MIN(f_email) AS 'Email Address', COUNT(*) AS 'Number of Messages' FROM t_message tm INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid GROUP BY abook_id HAVING DATEADD(S, MAX(f_stored), '1970-01-01') > '2017-02-01' ORDER BY COUNT(*) GO
[edit] Newer than 3 months:
SELECT ta.abook_id, MIN(ta.f_first) AS 'First Name', MIN(ta.f_last) AS 'Last Name', MIN(ta.f_mailbox) AS 'Mailbox ID', MIN(ta.f_email) AS 'Email Address', COUNT(*) AS 'Number of Messages' FROM t_message tm INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid GROUP BY abook_id HAVING DATEADD(S, MIN(ts_cache), '1970-01-01') > DATEADD(m, -3, GETDATE()) ORDER BY COUNT(*);
[edit] Number of users with data:
[edit] Total Users with data:
Select count(*) AS 'Number of users with data' From ( SELECT COUNT(*) Col1 FROM t_message tm INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid INNER JOIN t_domains td ON ta.domain_id = td.domain_id GROUP BY abook_id ) AS t GO
To limit it to users less than 3 months or older than 3 months add
HAVING DATEADD(S, MAX(f_stored), '1970-01-01') > '2017-02-01'
underneath the GROUP BY clause. Use > for newer than and < for older than.
[edit] Number of Users in Exchange/O365 with data (Works up until 4.1.0.1, 4.2 will change things):
Select count(*) AS 'Number of Exchange/O365 Users with archive data' From ( SELECT COUNT(*) Col1 FROM t_message tm INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid INNER JOIN t_domains td ON ta.domain_id = td.domain_id WHERE td.f_type = 1 GROUP BY abook_id ) AS t GO