The Latest Login in 2020 - Problem

You're tasked with analyzing user login data for a busy web application. Given a Logins table that tracks every user login with timestamps, you need to find the most recent login for each user specifically during the year 2020.

The Logins table structure:

Column NameType
user_idint
time_stampdatetime

Important: Only include users who actually logged in during 2020 - ignore users who didn't log in that year at all.

Goal: Write a SQL query that returns each user's latest login timestamp from 2020. The results can be in any order.

Input & Output

example_1.sql โ€” Basic Example
$ Input: Logins table: | user_id | time_stamp | |---------|--------------------| | 6 | 2020-06-30 15:06:07| | 6 | 2021-04-21 14:03:06| | 6 | 2019-03-07 00:18:15| | 8 | 2020-02-01 05:10:53| | 8 | 2020-12-30 00:46:50| | 2 | 2020-01-16 02:49:50| | 2 | 2019-08-25 07:59:08|
โ€บ Output: | user_id | last_login | |---------|--------------------| | 6 | 2020-06-30 15:06:07| | 8 | 2020-12-30 00:46:50| | 2 | 2020-01-16 02:49:50|
๐Ÿ’ก Note: User 6 logged in twice in 2020 (June 30) and other years, so we take their latest 2020 login. User 8 had two 2020 logins, we take December 30. User 2 had one 2020 login in January. All other login years are ignored.
example_2.sql โ€” Single Login Per User
$ Input: Logins table: | user_id | time_stamp | |---------|--------------------| | 1 | 2020-03-15 10:30:00| | 2 | 2020-07-22 14:15:30| | 3 | 2020-11-05 09:45:12| | 1 | 2019-12-01 08:00:00| | 2 | 2021-01-10 16:30:00|
โ€บ Output: | user_id | last_login | |---------|--------------------| | 1 | 2020-03-15 10:30:00| | 2 | 2020-07-22 14:15:30| | 3 | 2020-11-05 09:45:12|
๐Ÿ’ก Note: Each user has exactly one login in 2020, so those are their latest (and only) 2020 logins. User 1's 2019 login and User 2's 2021 login are excluded.
example_3.sql โ€” No 2020 Logins Edge Case
$ Input: Logins table: | user_id | time_stamp | |---------|--------------------| | 1 | 2019-12-31 23:59:59| | 1 | 2021-01-01 00:00:01| | 2 | 2018-05-15 12:30:00| | 2 | 2022-03-10 14:20:15|
โ€บ Output: | user_id | last_login | |---------|------------| | | |
๐Ÿ’ก Note: No users logged in during 2020, so the result set is empty. User 1 logged in on Dec 31, 2019 and Jan 1, 2021, but neither is in 2020. User 2 logged in 2018 and 2022, also outside our target year.

Constraints

  • 1 โ‰ค Number of login records โ‰ค 104
  • Each (user_id, time_stamp) combination is unique
  • time_stamp is a valid datetime value
  • user_id is a positive integer

Visualization

Tap to expand
Login Data Processing PipelineAll Login RecordsUser 1: 2019, 2020, 2021User 2: 2020, 2020User 3: 2019, 2022User 4: 2020Filter: 2020 OnlyUser 1: 2020User 2: 2020, 2020User 4: 2020Group by User[User 1] [User 2] [User 4]2020-Mar 2020-Jan,Dec 2020-JunMAX per GroupUser 1: 2020-MarUser 2: 2020-DecUser 4: 2020-JunSQL: WHERE โ†’ GROUP BY โ†’ MAX()Single pass through data with efficient aggregationDatabase optimizer handles sorting and grouping๐Ÿ’ก Key: Use built-in aggregation instead of manual loopsTime: O(n) | Space: O(unique_users)
Understanding the Visualization
1
Filter by Year
Remove all login records that aren't from 2020, like filtering mail by postmark year
2
Group by User
Organize remaining records by user_id, like sorting mail into piles by recipient
3
Find Latest in Each Group
For each user's pile, pick the record with the maximum timestamp
4
Return Results
Output each user with their latest 2020 login timestamp
Key Takeaway
๐ŸŽฏ Key Insight: GROUP BY with MAX() leverages database optimization to process each record exactly once, eliminating the need for nested queries or multiple passes through the data.
Asked in
Amazon 45 Meta 35 Google 30 Microsoft 25
35.2K Views
High Frequency
~8 min Avg. Time
1.5K Likes
Ln 1, Col 1
Smart Actions
๐Ÿ’ก Explanation
AI Ready
๐Ÿ’ก Suggestion Tab to accept Esc to dismiss
// Output will appear here after running code
Code Editor Closed
Click the red button to reopen