Difference between revisions of "Helpful MySQL Queries"
(→Number of users with data:) |
|||
(7 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
Daniel has created a number of helpful queries: | Daniel has created a number of helpful queries: | ||
Line 36: | Line 10: | ||
User Information: | User Information: | ||
− | To get User (displaying email) with Last date they were cached, Most recently archived, and the Item Store flag: | + | ===To get User (displaying email) with Last date they were cached, Most recently archived, and the Item Store flag:=== |
− | < | + | |
+ | <pre> | ||
SELECT abook_id, f_email, from_unixtime(ts_cache) AS 'Last Cached From Mail Server', from_unixtime(ts_store) AS 'Most recently Accessed', from_unixtime(ts_item) AS 'Item Store Flag' | SELECT abook_id, f_email, from_unixtime(ts_cache) AS 'Last Cached From Mail Server', from_unixtime(ts_store) AS 'Most recently Accessed', from_unixtime(ts_item) AS 'Item Store Flag' | ||
FROM t_abook | FROM t_abook | ||
Line 49: | Line 24: | ||
FROM t_abook | FROM t_abook | ||
INNER JOIN t_uuid_mapping ON t_abook.f_uid = t_uuid_mapping.f_uuid; | INNER JOIN t_uuid_mapping ON t_abook.f_uid = t_uuid_mapping.f_uuid; | ||
− | </ | + | </pre> |
− | + | ||
− | + | ||
+ | ===Get A list of All users with a count of how many messages each person has:=== | ||
+ | <pre> | ||
SELECT abook_id, f_email AS 'Email Address', COUNT(DISTINCT(tm.message_id)) AS 'Total Message Count' | SELECT abook_id, f_email AS 'Email Address', COUNT(DISTINCT(tm.message_id)) AS 'Total Message Count' | ||
FROM t_message tm | FROM t_message tm | ||
Line 60: | Line 35: | ||
t_abook ta ON ta.f_uid = tum.f_uuid | t_abook ta ON ta.f_uid = tum.f_uuid | ||
GROUP BY abook_id; | GROUP BY abook_id; | ||
+ | </pre> | ||
*Note: The above query took 20ish minutes to run on a system with 8,000 users and 290 million messages. Number of users that had mail archived was actually nearly 20,000 | *Note: The above query took 20ish minutes to run on a system with 8,000 users and 290 million messages. Number of users that had mail archived was actually nearly 20,000 | ||
Another thing of note is my use of abook_id in the SELECT and also the WHERE clause. I’ve found that not everyone has an email address. Sometimes Retain will archive something that doesn’t have an email attached to it. Hence “WHERE abook_id =” and “GROUP BY abook_id” | Another thing of note is my use of abook_id in the SELECT and also the WHERE clause. I’ve found that not everyone has an email address. Sometimes Retain will archive something that doesn’t have an email attached to it. Hence “WHERE abook_id =” and “GROUP BY abook_id” | ||
− | Total number of messages a particular user has: | + | ===Total number of messages a particular user has:=== |
+ | |||
+ | <pre> | ||
SELECT COUNT(*) | SELECT COUNT(*) | ||
FROM t_message | FROM t_message | ||
Line 70: | Line 48: | ||
INNER JOIN t_abook ON t_abook.f_uid = t_uuid_mapping.f_uuid | INNER JOIN t_abook ON t_abook.f_uid = t_uuid_mapping.f_uuid | ||
WHERE f_email = ‘user@place.net’; | WHERE f_email = ‘user@place.net’; | ||
+ | </pre> | ||
IMPORTANT: It has been found that in Retain 4.0.3 that there are sometimes situations where there is data in the t_message table, but not in the t_message_attachments table. If you have nothing in the t_message_attachments table Retain will not be able to retrieve anything from disc. | IMPORTANT: It has been found that in Retain 4.0.3 that there are sometimes situations where there is data in the t_message table, but not in the t_message_attachments table. If you have nothing in the t_message_attachments table Retain will not be able to retrieve anything from disc. | ||
− | A more accurate query for the number of messages Retain actually has would probably be: | + | ====A more accurate query for the number of messages Retain actually has would probably be:==== |
+ | |||
+ | <pre> | ||
SELECT COUNT(*) | SELECT COUNT(*) | ||
FROM t_message | FROM t_message | ||
Line 79: | Line 60: | ||
INNER JOIN t_abook ON t_abook.f_uid = t_uuid_mapping.f_uuid | INNER JOIN t_abook ON t_abook.f_uid = t_uuid_mapping.f_uuid | ||
WHERE abook_id = {number}; | WHERE abook_id = {number}; | ||
+ | </pre> | ||
− | |||
+ | ===Number of users with data:=== | ||
+ | |||
+ | <pre> | ||
Select count(*) AS 'Number of users with data' | Select count(*) AS 'Number of users with data' | ||
From | From | ||
Line 95: | Line 79: | ||
AS t | AS t | ||
; | ; | ||
+ | </pre> | ||
+ | |||
+ | To limit it to users less than 3 months or older than 3 months add | ||
+ | <pre> | ||
+ | HAVING MAX(tm.f_delivered) < UNIX_TIMESTAMP(DATE_ADD(CURDATE(), INTERVAL -3 MONTH)) | ||
+ | </pre> | ||
+ | underneath the GROUP BY clause. Use > for newer than and < for older than. | ||
+ | |||
+ | '''''*Warning: This makes the query much more intense. This means that adding the older than, or newer than part of the query will run for a significant amount of time. A large system could see this query take several hours.''''' | ||
+ | |||
+ | ===Number of Users in Exchange/O365 with data (Works up until 4.1.0.1, 4.2 will change things):=== | ||
− | + | <pre> | |
Select count(*) AS 'Number of Exchange/O365 Users with archive data' | Select count(*) AS 'Number of Exchange/O365 Users with archive data' | ||
From | From | ||
Line 111: | Line 106: | ||
AS t | AS t | ||
; | ; | ||
+ | </pre> | ||
− | Get total (possible) size of message store used by a user. Size is in Bytes (This size may not be entirely accurate, this is a number that has been passed by the Email server to Retain): | + | ===Get total (possible) size of message store used by a user.=== |
+ | Size is in Bytes (This size may not be entirely accurate, this is a number that has been passed by the Email server to Retain): | ||
+ | <pre> | ||
SELECT SUM(f_size) AS 'Approximate Storage Use' | SELECT SUM(f_size) AS 'Approximate Storage Use' | ||
FROM t_message_attachments tma | FROM t_message_attachments tma | ||
Line 121: | Line 119: | ||
INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid | INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid | ||
WHERE f_email = 'admin@knobias.com'; | WHERE f_email = 'admin@knobias.com'; | ||
+ | </pre> | ||
*NOTE: The previous 2 queries could possibly be inaccurate if 2 users have the same Email address. The most accurate would probably be to use abook_id | *NOTE: The previous 2 queries could possibly be inaccurate if 2 users have the same Email address. The most accurate would probably be to use abook_id | ||
In the WHERE clause you can use these fields that should be unique: | In the WHERE clause you can use these fields that should be unique: | ||
+ | <pre> | ||
abook_id = | abook_id = | ||
f_uid = | f_uid = | ||
f_mailbox = | f_mailbox = | ||
althash = | althash = | ||
− | + | </pre> | |
Most Unique Example: | Most Unique Example: | ||
+ | <pre> | ||
SELECT abook_id AS 'Retain Address ID', f_email AS 'Email Address', SUM(f_size) AS 'Approximate Storage Use' | SELECT abook_id AS 'Retain Address ID', f_email AS 'Email Address', SUM(f_size) AS 'Approximate Storage Use' | ||
FROM t_message_attachments tma | FROM t_message_attachments tma | ||
Line 138: | Line 139: | ||
INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid | INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid | ||
WHERE abook_id = {abook_id number}; | WHERE abook_id = {abook_id number}; | ||
+ | </pre> | ||
− | User ID (abook_id), mailbox name, email address and most recently stored message date: | + | ===User ID (abook_id), mailbox name, email address and most recently stored message date:=== |
(Thanks Daron) | (Thanks Daron) | ||
+ | <pre> | ||
SELECT abook_id AS 'User ID', ta.f_mailbox Mailbox, ta.f_email AS 'Eamil address', from_unixtime(max(tm.f_stored)) AS 'Most recently archived message' | SELECT abook_id AS 'User ID', ta.f_mailbox Mailbox, ta.f_email AS 'Eamil address', from_unixtime(max(tm.f_stored)) AS 'Most recently archived message' | ||
from t_message tm | from t_message tm | ||
inner join t_abook ta on tm.uuid_mapping_id = ta.f_uid | inner join t_abook ta on tm.uuid_mapping_id = ta.f_uid | ||
GROUP BY abook_id; | GROUP BY abook_id; | ||
+ | </pre> | ||
− | Message counts and message store size: | + | ===Message counts and message store size:=== |
Get A list of All users with a count of how many messages each person has along with the possible size that their archive would take on disk. This last number will not add up to the total disk use that Retain is using. Use it as a ballpark figure. | Get A list of All users with a count of how many messages each person has along with the possible size that their archive would take on disk. This last number will not add up to the total disk use that Retain is using. Use it as a ballpark figure. | ||
WARNING: The following query is VERY intense. On a system with 3,500 users with mail in Retain, and 41 million total messages, it took 1 hour 17 minutes to return. Use with caution. | WARNING: The following query is VERY intense. On a system with 3,500 users with mail in Retain, and 41 million total messages, it took 1 hour 17 minutes to return. Use with caution. | ||
+ | <pre> | ||
SELECT | SELECT | ||
f_email AS 'Email Address', | f_email AS 'Email Address', | ||
Line 162: | Line 167: | ||
GROUP BY abook_id | GROUP BY abook_id | ||
ORDER BY tm.message_id; | ORDER BY tm.message_id; | ||
− | + | </pre> | |
By default it’ll organize the results by the email address. If you’d like to organize it by size of the mailbox, smallest first, change the ORDER BY clause to read as follows: | By default it’ll organize the results by the email address. If you’d like to organize it by size of the mailbox, smallest first, change the ORDER BY clause to read as follows: | ||
+ | |||
+ | <pre> | ||
ORDER BY SUM(f_size); | ORDER BY SUM(f_size); | ||
+ | </pre> | ||
If you’d like it in reverse order tack on a “DESC” like so: | If you’d like it in reverse order tack on a “DESC” like so: | ||
+ | <pre> | ||
ORDER BY SUM(f_size) DESC; | ORDER BY SUM(f_size) DESC; | ||
+ | </pre> | ||
If you want to organize it by number of messages, fewest first, tack on: | If you want to organize it by number of messages, fewest first, tack on: | ||
+ | <pre> | ||
COUNT(DISTINCT(tm.message_id)); | COUNT(DISTINCT(tm.message_id)); | ||
− | + | </pre> | |
If you want to run a less intensive version: find specific users that the customer is interested in. Get the address book id and nail it down to that. On the system mentioned above the user with the most messages took only 35 seconds to complete. | If you want to run a less intensive version: find specific users that the customer is interested in. Get the address book id and nail it down to that. On the system mentioned above the user with the most messages took only 35 seconds to complete. | ||
Example: | Example: | ||
− | + | <pre> | |
SELECT f_email AS 'Email Address', FORMAT(COUNT(DISTINCT(tm.message_id)),0) AS 'Number of Messages', FORMAT(SUM(f_size),0) AS 'Approximate Storage Use (in Bytes)' | SELECT f_email AS 'Email Address', FORMAT(COUNT(DISTINCT(tm.message_id)),0) AS 'Number of Messages', FORMAT(SUM(f_size),0) AS 'Approximate Storage Use (in Bytes)' | ||
FROM t_message_attachments tma | FROM t_message_attachments tma | ||
Line 183: | Line 194: | ||
INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid | INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid | ||
WHERE abook_id = 1585; | WHERE abook_id = 1585; | ||
+ | </pre> | ||
Other Variations | Other Variations | ||
− | + | <pre> | |
SELECT | SELECT | ||
CONCAT(ta.f_first, ' ', ta.f_last, '(',f_mailbox, ')') AS 'Mailbox Name', | CONCAT(ta.f_first, ' ', ta.f_last, '(',f_mailbox, ')') AS 'Mailbox Name', | ||
Line 198: | Line 210: | ||
GROUP BY abook_id | GROUP BY abook_id | ||
ORDER BY tm.message_id; | ORDER BY tm.message_id; | ||
+ | </pre> | ||
Example Return | Example Return | ||
− | Find User name, and most recently archived message | + | ===Find User name, and most recently archived message=== |
+ | Below is a way to see everyone that’s older than 3 months. Just switch the > sign to a < sign to do 3 months and newer. | ||
If you don’t care about if it’s older or younger than 3 months (for licensing) then just take out the “HAVING” line | If you don’t care about if it’s older or younger than 3 months (for licensing) then just take out the “HAVING” line | ||
+ | <pre> | ||
SELECT | SELECT | ||
CONCAT(ta.f_first, ' ', ta.f_last, '(',f_mailbox, ')') AS 'Mailbox Name', | CONCAT(ta.f_first, ' ', ta.f_last, '(',f_mailbox, ')') AS 'Mailbox Name', | ||
Line 218: | Line 233: | ||
ORDER BY MAX(tm.f_delivered) | ORDER BY MAX(tm.f_delivered) | ||
; | ; | ||
+ | </pre> | ||
Example Return | Example Return | ||
− | Get email, number of messages that user has, Post office or server subsection they’re in, and the date of the most recent message that was archived | + | ===Get email, number of messages that user has, Post office or server subsection they’re in, and the date of the most recent message that was archived=== |
+ | <pre> | ||
SELECT f_email AS 'E-mail', COUNT(*) AS 'Number of Messages in Retain', f_name AS 'Post Office', MAX(FROM_UNIXTIME(f_stored)) AS 'Most Recent Message Archived' | SELECT f_email AS 'E-mail', COUNT(*) AS 'Number of Messages in Retain', f_name AS 'Post Office', MAX(FROM_UNIXTIME(f_stored)) AS 'Most Recent Message Archived' | ||
FROM t_message tm | FROM t_message tm | ||
Line 230: | Line 247: | ||
INNER JOIN t_postoffices tp ON tp.postoffice_id = ta.postoffice_id | INNER JOIN t_postoffices tp ON tp.postoffice_id = ta.postoffice_id | ||
GROUP BY abook_id; | GROUP BY abook_id; | ||
+ | </pre> | ||
WARNING: This query is VERY intense, do not run this without the understanding that this will take a significant amount of time and resources on SQL. Small test system with 96 users and 4,553,000 items in t_message took 1 minute 7 seconds to run. By way of comparison: A straight count of t_message took only 4 seconds. | WARNING: This query is VERY intense, do not run this without the understanding that this will take a significant amount of time and resources on SQL. Small test system with 96 users and 4,553,000 items in t_message took 1 minute 7 seconds to run. By way of comparison: A straight count of t_message took only 4 seconds. | ||
− | Get all message data for one user for messages delivered to the mailbox before X date: | + | ===Get all message data for one user for messages delivered to the mailbox before X date:=== |
+ | <pre> | ||
SELECT * | SELECT * | ||
FROM t_message tm | FROM t_message tm | ||
Line 242: | Line 261: | ||
AND tm.f_delivered < UNIX_TIMESTAMP(DATE_ADD(CURDATE(), INTERVAL -20 YEAR)) | AND tm.f_delivered < UNIX_TIMESTAMP(DATE_ADD(CURDATE(), INTERVAL -20 YEAR)) | ||
; | ; | ||
+ | </pre> | ||
+ | ===Getting Message data=== | ||
− | + | <pre> | |
− | + | ||
SELECT f_name AS 'Attachment Name', hash AS "Hash ID" | SELECT f_name AS 'Attachment Name', hash AS "Hash ID" | ||
FROM t_document td | FROM t_document td | ||
INNER JOIN t_message_attachments tma ON td.document_id = tma.document_id | INNER JOIN t_message_attachments tma ON td.document_id = tma.document_id | ||
WHERE message_id = 614645; | WHERE message_id = 614645; | ||
+ | </pre> | ||
The above is an expanded version of the query in this KB: http://support2.gwava.com/kb/?View=entry&EntryID=2420 | The above is an expanded version of the query in this KB: http://support2.gwava.com/kb/?View=entry&EntryID=2420 | ||
Line 257: | Line 278: | ||
Going in reverse order, if you have the hash ID then you can find the message ID(s) that it connects to, I also threw in the date that it was initially stored on disc (Human readable according to the date of the server you’re running this on.): | Going in reverse order, if you have the hash ID then you can find the message ID(s) that it connects to, I also threw in the date that it was initially stored on disc (Human readable according to the date of the server you’re running this on.): | ||
+ | <pre> | ||
SELECT message_id, FROM_UNIXTIME(date) | SELECT message_id, FROM_UNIXTIME(date) | ||
FROM t_message_attachments tma | FROM t_message_attachments tma | ||
INNER JOIN t_document td ON td.document_id = tma.document_id | INNER JOIN t_document td ON td.document_id = tma.document_id | ||
WHERE hash = '{Your_hash}'; | WHERE hash = '{Your_hash}'; | ||
+ | </pre> | ||
+ | Example: | ||
− | + | <pre> | |
SELECT message_id AS 'Message ID', FROM_UNIXTIME(date) AS 'Date Stored' | SELECT message_id AS 'Message ID', FROM_UNIXTIME(date) AS 'Date Stored' | ||
FROM t_message_attachments tma | FROM t_message_attachments tma | ||
INNER JOIN t_document td ON td.document_id = tma.document_id | INNER JOIN t_document td ON td.document_id = tma.document_id | ||
WHERE hash = '3249153B5525F19A46D28D29A4BE381CD2E8E22B8510C970328CF54464FAFACC'; | WHERE hash = '3249153B5525F19A46D28D29A4BE381CD2E8E22B8510C970328CF54464FAFACC'; | ||
+ | </pre> | ||
+ | ===To find if a particular hash was indexed and the message id:=== | ||
− | + | <pre> | |
− | + | ||
SELECT | SELECT | ||
tm.message_id, f_indexed | tm.message_id, f_indexed | ||
Line 286: | Line 311: | ||
WHERE | WHERE | ||
hash = '{hash_number}'; | hash = '{hash_number}'; | ||
+ | </pre> | ||
− | Check to see if you have any accidentally deleted documents (might take a while to run) | + | ===Check to see if you have any accidentally deleted documents (might take a while to run)=== |
This can be a result of a problem with the database and the count will come up how many deleted documents you have. You’ll have to go through a clean up process to fix this.: | This can be a result of a problem with the database and the count will come up how many deleted documents you have. You’ll have to go through a clean up process to fix this.: | ||
+ | <pre> | ||
SELECT COUNT(*) | SELECT COUNT(*) | ||
FROM t_message_attachments tma | FROM t_message_attachments tma | ||
Line 296: | Line 323: | ||
td.document_id IS NULL | td.document_id IS NULL | ||
; | ; | ||
+ | </pre> | ||
− | + | ===XML MERGE FILE CREATION:=== | |
− | XML MERGE FILE CREATION: | + | |
Possible way to get the GUID’s out of a database so that you can merge them through the UI: | Possible way to get the GUID’s out of a database so that you can merge them through the UI: | ||
+ | <pre> | ||
SELECT | SELECT | ||
CONCAT('<MergeTask><sourceMailbox>', | CONCAT('<MergeTask><sourceMailbox>', | ||
Line 316: | Line 344: | ||
and gw.f_email = ex.f_email | and gw.f_email = ex.f_email | ||
; | ; | ||
+ | </pre> | ||
Note: you’ll probably need to figure out which domain is the old and which is the new so that you can appropriately assign the combination. | Note: you’ll probably need to figure out which domain is the old and which is the new so that you can appropriately assign the combination. | ||
Copy and paste your results into an .xml file then tack on | Copy and paste your results into an .xml file then tack on | ||
+ | |||
+ | <pre> | ||
<MergeTasks> | <MergeTasks> | ||
<tasks> | <tasks> | ||
+ | </pre> | ||
To the top of the file and | To the top of the file and | ||
+ | <pre> | ||
</tasks> | </tasks> | ||
</MergeTasks> | </MergeTasks> | ||
+ | </pre> | ||
To the bottom of the file | To the bottom of the file | ||
− | + | ===Instead of looking up the unix timestamps for colums like "f_delievered" etc you can use UNIX_TIMESTAMP('20100101') I.E. UNIX_TIMESTAMP('YYYYMMDD')=== | |
− | Instead of looking up the unix timestamps for colums like "f_delievered" etc you can use UNIX_TIMESTAMP('20100101') I.E. UNIX_TIMESTAMP('YYYYMMDD') | + | |
So this query works in MySQL: | So this query works in MySQL: | ||
+ | |||
+ | <pre> | ||
SELECT COUNT(*) FROM t_message | SELECT COUNT(*) FROM t_message | ||
WHERE f_delivered < | WHERE f_delivered < | ||
UNIX_TIMESTAMP('2009-10-24') | UNIX_TIMESTAMP('2009-10-24') | ||
; | ; | ||
+ | </pre> | ||
(You can include or exclude hyphens) | (You can include or exclude hyphens) | ||
− | If you want a specific hour/min/sec you can use: | + | ====If you want a specific hour/min/sec you can use:==== |
+ | <pre> | ||
SELECT COUNT(*) FROM t_message | SELECT COUNT(*) FROM t_message | ||
WHERE f_delivered < | WHERE f_delivered < | ||
UNIX_TIMESTAMP('2010-10-24 01:02:03') | UNIX_TIMESTAMP('2010-10-24 01:02:03') | ||
; | ; | ||
+ | </pre> | ||
− | As Requested in RET-7262 | + | ===As Requested in RET-7262=== |
+ | <pre> | ||
SELECT | SELECT | ||
abook_id AS 'Retain ID', | abook_id AS 'Retain ID', | ||
Line 366: | Line 405: | ||
ORDER BY COUNT(*) | ORDER BY COUNT(*) | ||
; | ; | ||
+ | </pre> | ||
+ | ===MSSQL specific stuff:=== | ||
− | + | ===The DATEADD feature is similar to the MySQL DATE_ADD feature.=== | |
− | + | ||
− | The DATEADD feature is similar to the MySQL DATE_ADD feature. | + | |
+ | <pre> | ||
SELECT abook_id, MIN(f_first) AS 'First Name', MIN(f_last) AS 'Last Name', MIN(f_mailbox) AS 'Mailbox ID', MIN(f_email) AS 'Email Address', COUNT(*) AS 'Number of Messages' | SELECT abook_id, MIN(f_first) AS 'First Name', MIN(f_last) AS 'Last Name', MIN(f_mailbox) AS 'Mailbox ID', MIN(f_email) AS 'Email Address', COUNT(*) AS 'Number of Messages' | ||
FROM t_message tm | FROM t_message tm | ||
Line 379: | Line 419: | ||
INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid | INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid | ||
GROUP BY abook_id | GROUP BY abook_id | ||
− | HAVING | + | HAVING DATEADD(S, MAX(f_stored), '1970-01-01') > '2017-02-01' |
ORDER BY COUNT(*) | ORDER BY COUNT(*) | ||
GO | GO | ||
+ | </pre> | ||
+ | ===Newer than 3 months:=== | ||
− | + | <pre> | |
− | + | ||
SELECT ta.abook_id, MIN(ta.f_first) AS 'First Name', MIN(ta.f_last) AS 'Last Name', MIN(ta.f_mailbox) AS 'Mailbox ID', MIN(ta.f_email) AS 'Email Address', COUNT(*) AS 'Number of Messages' | SELECT ta.abook_id, MIN(ta.f_first) AS 'First Name', MIN(ta.f_last) AS 'Last Name', MIN(ta.f_mailbox) AS 'Mailbox ID', MIN(ta.f_email) AS 'Email Address', COUNT(*) AS 'Number of Messages' | ||
FROM t_message tm | FROM t_message tm | ||
Line 393: | Line 434: | ||
HAVING DATEADD(S, MIN(ts_cache), '1970-01-01') > DATEADD(m, -3, GETDATE()) | HAVING DATEADD(S, MIN(ts_cache), '1970-01-01') > DATEADD(m, -3, GETDATE()) | ||
ORDER BY COUNT(*); | ORDER BY COUNT(*); | ||
+ | </pre> | ||
− | Number of users with data: | + | ===Number of users with data:=== |
− | Total Users with data: | + | ====Total Users with data:==== |
+ | <pre> | ||
Select count(*) | Select count(*) | ||
AS 'Number of users with data' | AS 'Number of users with data' | ||
Line 411: | Line 454: | ||
AS t | AS t | ||
GO | GO | ||
+ | </pre> | ||
+ | |||
+ | To limit it to users less than 3 months or older than 3 months add | ||
+ | <pre> | ||
+ | HAVING DATEADD(S, MAX(f_stored), '1970-01-01') > '2017-02-01' | ||
+ | </pre> | ||
+ | underneath the GROUP BY clause. Use > for newer than and < for older than. | ||
− | Number of Users in Exchange/O365 with data (Works up until 4.1.0.1, 4.2 will change things): | + | ===Number of Users in Exchange/O365 with data (Works up until 4.1.0.1, 4.2 will change things):=== |
+ | <pre> | ||
Select count(*) AS 'Number of Exchange/O365 Users with archive data' | Select count(*) AS 'Number of Exchange/O365 Users with archive data' | ||
From | From | ||
Line 428: | Line 479: | ||
AS t | AS t | ||
GO | GO | ||
+ | </pre> |
Latest revision as of 18:14, 7 April 2017
Daniel has created a number of helpful queries:
All Queries are for Retain 3.x and newer. MySQL syntax. In most instances MSSQL will work fine with the same query.
Warning: Some queries can take a heavy performance toll on the SQL server, I’ve tried to create them to be as efficient as possible but some things just take lots of processing/disk I/O time. Don’t expect these to always come back right away. ALSO, do not run a query unless you understand what you’re doing. If a query goes cartesian on you it will bring the SQL server to it’s knees and will probably ruin anything else that’s trying to access that server.
Additional note: Because google thinks they’re awesome copy and paste straight from this doc may not work. Copy the query to a notepad, or notepad++ and change the ‘text’ to actual ‘ (AKA “tick” in some circles) marks. Quotes also work.
User Information:
[edit] To get User (displaying email) with Last date they were cached, Most recently archived, and the Item Store flag:
SELECT abook_id, f_email, from_unixtime(ts_cache) AS 'Last Cached From Mail Server', from_unixtime(ts_store) AS 'Most recently Accessed', from_unixtime(ts_item) AS 'Item Store Flag' FROM t_abook GROUP BY abook_id; SELECT f_first AS 'First Name', f_last AS 'Last Name', f_mailbox AS 'Mailbox ID', f_email AS 'Email Address' FROM t_abook INNER JOIN t_uuid_mapping ON t_abook.f_uid = t_uuid_mapping.f_uuid; SELECT COUNT(*) FROM t_abook INNER JOIN t_uuid_mapping ON t_abook.f_uid = t_uuid_mapping.f_uuid;
[edit] Get A list of All users with a count of how many messages each person has:
SELECT abook_id, f_email AS 'Email Address', COUNT(DISTINCT(tm.message_id)) AS 'Total Message Count' FROM t_message tm INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid GROUP BY abook_id;
- Note: The above query took 20ish minutes to run on a system with 8,000 users and 290 million messages. Number of users that had mail archived was actually nearly 20,000
Another thing of note is my use of abook_id in the SELECT and also the WHERE clause. I’ve found that not everyone has an email address. Sometimes Retain will archive something that doesn’t have an email attached to it. Hence “WHERE abook_id =” and “GROUP BY abook_id”
[edit] Total number of messages a particular user has:
SELECT COUNT(*) FROM t_message INNER JOIN t_uuid_mapping ON t_message.uuid_mapping_id = t_uuid_mapping.uid_mapping_id INNER JOIN t_abook ON t_abook.f_uid = t_uuid_mapping.f_uuid WHERE f_email = ‘user@place.net’;
IMPORTANT: It has been found that in Retain 4.0.3 that there are sometimes situations where there is data in the t_message table, but not in the t_message_attachments table. If you have nothing in the t_message_attachments table Retain will not be able to retrieve anything from disc.
[edit] A more accurate query for the number of messages Retain actually has would probably be:
SELECT COUNT(*) FROM t_message INNER JOIN t_uuid_mapping ON t_message.uuid_mapping_id = t_uuid_mapping.uid_mapping_id INNER JOIN t_abook ON t_abook.f_uid = t_uuid_mapping.f_uuid WHERE abook_id = {number};
[edit] Number of users with data:
Select count(*) AS 'Number of users with data' From ( SELECT COUNT(*) FROM t_message tm INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid INNER JOIN t_domains td ON ta.domain_id = td.domain_id GROUP BY abook_id ) AS t ;
To limit it to users less than 3 months or older than 3 months add
HAVING MAX(tm.f_delivered) < UNIX_TIMESTAMP(DATE_ADD(CURDATE(), INTERVAL -3 MONTH))
underneath the GROUP BY clause. Use > for newer than and < for older than.
*Warning: This makes the query much more intense. This means that adding the older than, or newer than part of the query will run for a significant amount of time. A large system could see this query take several hours.
[edit] Number of Users in Exchange/O365 with data (Works up until 4.1.0.1, 4.2 will change things):
Select count(*) AS 'Number of Exchange/O365 Users with archive data' From ( SELECT COUNT(*) FROM t_message tm INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid INNER JOIN t_domains td ON ta.domain_id = td.domain_id WHERE td.f_type = 1 GROUP BY abook_id ) AS t ;
[edit] Get total (possible) size of message store used by a user.
Size is in Bytes (This size may not be entirely accurate, this is a number that has been passed by the Email server to Retain):
SELECT SUM(f_size) AS 'Approximate Storage Use' FROM t_message_attachments tma INNER JOIN t_message tm ON tma.message_id = tm.message_id INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid WHERE f_email = 'admin@knobias.com';
- NOTE: The previous 2 queries could possibly be inaccurate if 2 users have the same Email address. The most accurate would probably be to use abook_id
In the WHERE clause you can use these fields that should be unique:
abook_id = f_uid = f_mailbox = althash =
Most Unique Example:
SELECT abook_id AS 'Retain Address ID', f_email AS 'Email Address', SUM(f_size) AS 'Approximate Storage Use' FROM t_message_attachments tma INNER JOIN t_message tm ON tma.message_id = tm.message_id INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid WHERE abook_id = {abook_id number};
[edit] User ID (abook_id), mailbox name, email address and most recently stored message date:
(Thanks Daron)
SELECT abook_id AS 'User ID', ta.f_mailbox Mailbox, ta.f_email AS 'Eamil address', from_unixtime(max(tm.f_stored)) AS 'Most recently archived message' from t_message tm inner join t_abook ta on tm.uuid_mapping_id = ta.f_uid GROUP BY abook_id;
[edit] Message counts and message store size:
Get A list of All users with a count of how many messages each person has along with the possible size that their archive would take on disk. This last number will not add up to the total disk use that Retain is using. Use it as a ballpark figure.
WARNING: The following query is VERY intense. On a system with 3,500 users with mail in Retain, and 41 million total messages, it took 1 hour 17 minutes to return. Use with caution.
SELECT f_email AS 'Email Address', FORMAT(COUNT(DISTINCT(tm.message_id)),0) AS 'Number of Messages', FORMAT(SUM(f_size),0) AS 'Approximate Storage Use (in Bytes)' FROM t_message_attachments tma INNER JOIN t_message tm ON tma.message_id = tm.message_id INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid GROUP BY abook_id ORDER BY tm.message_id;
By default it’ll organize the results by the email address. If you’d like to organize it by size of the mailbox, smallest first, change the ORDER BY clause to read as follows:
ORDER BY SUM(f_size);
If you’d like it in reverse order tack on a “DESC” like so:
ORDER BY SUM(f_size) DESC;
If you want to organize it by number of messages, fewest first, tack on:
COUNT(DISTINCT(tm.message_id));
If you want to run a less intensive version: find specific users that the customer is interested in. Get the address book id and nail it down to that. On the system mentioned above the user with the most messages took only 35 seconds to complete.
Example:
SELECT f_email AS 'Email Address', FORMAT(COUNT(DISTINCT(tm.message_id)),0) AS 'Number of Messages', FORMAT(SUM(f_size),0) AS 'Approximate Storage Use (in Bytes)' FROM t_message_attachments tma INNER JOIN t_message tm ON tma.message_id = tm.message_id INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid WHERE abook_id = 1585;
Other Variations
SELECT CONCAT(ta.f_first, ' ', ta.f_last, '(',f_mailbox, ')') AS 'Mailbox Name', f_email AS 'Email Address', FORMAT(COUNT(DISTINCT(tm.message_id)),0) AS 'Number of Messages', FORMAT(SUM(f_size),0) AS 'Approximate Storage Use (in Bytes)', FROM_UNIXTIME(MIN(tm.f_delivered)) AS 'Oldest Message', FROM_UNIXTIME(MAX(tm.f_delivered)) AS 'Newest Message' FROM t_message tm INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid GROUP BY abook_id ORDER BY tm.message_id;
Example Return
[edit] Find User name, and most recently archived message
Below is a way to see everyone that’s older than 3 months. Just switch the > sign to a < sign to do 3 months and newer.
If you don’t care about if it’s older or younger than 3 months (for licensing) then just take out the “HAVING” line
SELECT CONCAT(ta.f_first, ' ', ta.f_last, '(',f_mailbox, ')') AS 'Mailbox Name', f_email AS 'Email Address', FROM_UNIXTIME(MAX(tm.f_delivered)) AS 'Newest Message' FROM t_message_attachments tma INNER JOIN t_message tm ON tma.message_id = tm.message_id INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid GROUP BY abook_id HAVING MAX(tm.f_delivered) < UNIX_TIMESTAMP(DATE_ADD(CURDATE(), INTERVAL -3 MONTH)) ORDER BY MAX(tm.f_delivered) ;
Example Return
[edit] Get email, number of messages that user has, Post office or server subsection they’re in, and the date of the most recent message that was archived
SELECT f_email AS 'E-mail', COUNT(*) AS 'Number of Messages in Retain', f_name AS 'Post Office', MAX(FROM_UNIXTIME(f_stored)) AS 'Most Recent Message Archived' FROM t_message tm INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid INNER JOIN t_postoffices tp ON tp.postoffice_id = ta.postoffice_id GROUP BY abook_id;
WARNING: This query is VERY intense, do not run this without the understanding that this will take a significant amount of time and resources on SQL. Small test system with 96 users and 4,553,000 items in t_message took 1 minute 7 seconds to run. By way of comparison: A straight count of t_message took only 4 seconds.
[edit] Get all message data for one user for messages delivered to the mailbox before X date:
SELECT * FROM t_message tm INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid WHERE abook_id = 1 AND tm.f_delivered < UNIX_TIMESTAMP(DATE_ADD(CURDATE(), INTERVAL -20 YEAR)) ;
[edit] Getting Message data
SELECT f_name AS 'Attachment Name', hash AS "Hash ID" FROM t_document td INNER JOIN t_message_attachments tma ON td.document_id = tma.document_id WHERE message_id = 614645;
The above is an expanded version of the query in this KB: http://support2.gwava.com/kb/?View=entry&EntryID=2420
Besides the HASH ID it will also give you the name of the attachment.
Going in reverse order, if you have the hash ID then you can find the message ID(s) that it connects to, I also threw in the date that it was initially stored on disc (Human readable according to the date of the server you’re running this on.):
SELECT message_id, FROM_UNIXTIME(date) FROM t_message_attachments tma INNER JOIN t_document td ON td.document_id = tma.document_id WHERE hash = '{Your_hash}';
Example:
SELECT message_id AS 'Message ID', FROM_UNIXTIME(date) AS 'Date Stored' FROM t_message_attachments tma INNER JOIN t_document td ON td.document_id = tma.document_id WHERE hash = '3249153B5525F19A46D28D29A4BE381CD2E8E22B8510C970328CF54464FAFACC';
[edit] To find if a particular hash was indexed and the message id:
SELECT tm.message_id, f_indexed FROM t_message tm inner join t_message_attachments tma ON tm.message_id = tma.message_id INNER JOIN t_document td ON tma.document_id = td.document_id WHERE hash = '{hash_number}';
[edit] Check to see if you have any accidentally deleted documents (might take a while to run)
This can be a result of a problem with the database and the count will come up how many deleted documents you have. You’ll have to go through a clean up process to fix this.:
SELECT COUNT(*) FROM t_message_attachments tma LEFT JOIN t_document td ON td.document_id = tma.document_id WHERE td.document_id IS NULL ;
[edit] XML MERGE FILE CREATION:
Possible way to get the GUID’s out of a database so that you can merge them through the UI:
SELECT CONCAT('<MergeTask><sourceMailbox>', gw.f_uid, '</sourceMailbox><destinationMailbox>', ex.f_uid, '</destinationMailbox><removeOrphanedEntity>true</removeOrphanedEntity></MergeTask>' ) FROM t_abook gw, t_abook ex WHERE gw.domain_id = 2 and ex.domain_id = 3 and gw.f_email = ex.f_email ;
Note: you’ll probably need to figure out which domain is the old and which is the new so that you can appropriately assign the combination.
Copy and paste your results into an .xml file then tack on
<MergeTasks> <tasks>
To the top of the file and
</tasks> </MergeTasks>
To the bottom of the file
[edit] Instead of looking up the unix timestamps for colums like "f_delievered" etc you can use UNIX_TIMESTAMP('20100101') I.E. UNIX_TIMESTAMP('YYYYMMDD')
So this query works in MySQL:
SELECT COUNT(*) FROM t_message WHERE f_delivered < UNIX_TIMESTAMP('2009-10-24') ;
(You can include or exclude hyphens)
[edit] If you want a specific hour/min/sec you can use:
SELECT COUNT(*) FROM t_message WHERE f_delivered < UNIX_TIMESTAMP('2010-10-24 01:02:03') ;
[edit] As Requested in RET-7262
SELECT abook_id AS 'Retain ID', td.f_name AS 'Mail Server or Domain', CONCAT(f_first, ', ', f_last) AS 'First and Last Name', f_mailbox AS 'Mailbox ID', f_email AS 'Email Address', COUNT(*) AS 'Number of Messages' FROM t_message tm INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid INNER JOIN t_domains td ON ta.domain_id = td.domain_id WHERE f_stored < UNIX_TIMESTAMP(DATE_ADD(CURDATE(), INTERVAL -3 MONTH)) GROUP BY abook_id ORDER BY COUNT(*) ;
[edit] MSSQL specific stuff:
[edit] The DATEADD feature is similar to the MySQL DATE_ADD feature.
SELECT abook_id, MIN(f_first) AS 'First Name', MIN(f_last) AS 'Last Name', MIN(f_mailbox) AS 'Mailbox ID', MIN(f_email) AS 'Email Address', COUNT(*) AS 'Number of Messages' FROM t_message tm INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid GROUP BY abook_id HAVING DATEADD(S, MAX(f_stored), '1970-01-01') > '2017-02-01' ORDER BY COUNT(*) GO
[edit] Newer than 3 months:
SELECT ta.abook_id, MIN(ta.f_first) AS 'First Name', MIN(ta.f_last) AS 'Last Name', MIN(ta.f_mailbox) AS 'Mailbox ID', MIN(ta.f_email) AS 'Email Address', COUNT(*) AS 'Number of Messages' FROM t_message tm INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid GROUP BY abook_id HAVING DATEADD(S, MIN(ts_cache), '1970-01-01') > DATEADD(m, -3, GETDATE()) ORDER BY COUNT(*);
[edit] Number of users with data:
[edit] Total Users with data:
Select count(*) AS 'Number of users with data' From ( SELECT COUNT(*) Col1 FROM t_message tm INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid INNER JOIN t_domains td ON ta.domain_id = td.domain_id GROUP BY abook_id ) AS t GO
To limit it to users less than 3 months or older than 3 months add
HAVING DATEADD(S, MAX(f_stored), '1970-01-01') > '2017-02-01'
underneath the GROUP BY clause. Use > for newer than and < for older than.
[edit] Number of Users in Exchange/O365 with data (Works up until 4.1.0.1, 4.2 will change things):
Select count(*) AS 'Number of Exchange/O365 Users with archive data' From ( SELECT COUNT(*) Col1 FROM t_message tm INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid INNER JOIN t_domains td ON ta.domain_id = td.domain_id WHERE td.f_type = 1 GROUP BY abook_id ) AS t GO