
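Introduction

Hello, this is Yamaguchi (@yamaguchi_tk) from the SRE team.

This entry supplements the talk I gave at "JAWS-UG千葉支部オンライン#20 -AWSではじめるクラウドセキュリティ-" (hosted on connpass).

Security Lake reached GA on May 30, 2023, so this entry also covers what changed at GA.

We switched from operating on the Security Hub dashboard to operating on a QuickSight dashboard backed by Security Lake. In this entry I would like to share the issues that motivated the change and the concrete Security Lake implementation.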

Presentation slides

speakerdeck.com

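How Gunosy operates Security Hub

At Gunosy, the response flow for security vulnerabilities works as shown in the figure below.

For misconfiguration checks of our AWS environments, we split responsibilities with the development teams as follows, and we use Security Hub for scanning and triage.

Action	Responsibility	Services and tools used
Scan	Systematized and automated	Security Hub (+ AWS Config)
Triage	SRE team	Security Hub dashboard
Remediation	Each development team	IaC, AWS Management Console, etc.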

What is Security Hub?

In the words of the AWS service page (AWS Security Hub（統合されたセキュリティ & コンプライアンスセンター）| AWS), Security Hub is a service that provides:

Automated AWS security checks and centralized security alerts

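Challenges of operating Security Hub in a multi-account environment

We use the Security Hub dashboard for triage, but in a multi-account environment we ran into the following issues.

It cannot show finding counts broken down by Critical, High, Medium, and Low
It cannot list findings aggregated per account
It cannot list the resources and controls that have been exempted

With a single account this is tedious but probably not a serious problem. In a multi-account environment, however, it was a critical problem because we could not tell which account carried how much security risk.

As a result, the triage burden on the SREs grew so large that they could not spend time on the other work they are responsible for.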

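What we expected from Security Lake

Here is what we expected from Security Lake (plus QuickSight).

List findings aggregated per account and per severity
Pick a control and see which resources are vulnerable (drill-down)
Pick a resource and see the vulnerabilities reported against it (drill-up)

Everything except drill-down was impossible with the Security Hub dashboard.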
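
To make the first expectation concrete, here is a minimal Python sketch of the "per account, per severity" aggregation we wanted the dashboard to show. The sample rows, the account IDs, and the function name are hypothetical; the field names simply mirror the account_id and severity columns that appear later in this entry.

```python
from collections import Counter

# Hypothetical sample rows; only the two fields needed for the aggregation.
findings = [
    {"account_id": "111111111111", "severity": "Critical"},
    {"account_id": "111111111111", "severity": "High"},
    {"account_id": "111111111111", "severity": "High"},
    {"account_id": "222222222222", "severity": "Low"},
]

SEVERITIES = ["Critical", "High", "Medium", "Low"]


def count_by_account_and_severity(rows):
    """Return {account_id: {severity: count}} with every severity present."""
    counts = Counter((row["account_id"], row["severity"]) for row in rows)
    accounts = sorted({row["account_id"] for row in rows})
    return {
        account: {sev: counts[(account, sev)] for sev in SEVERITIES}
        for account in accounts
    }


summary = count_by_account_and_severity(findings)
for account, per_severity in summary.items():
    print(account, per_severity)
```

In our setup this kind of pivot is done by QuickSight rather than by code; the sketch only shows the shape of the result.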

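Setup and configuration

The implementation itself runs on our Data Lake platform, which also handles access control. For details, see Gunosyにおけるデータの民主化を促進するデータ基盤 - Speaker Deck.

Here I introduce the Glue table schema*1 and the Athena view definition, which should be usable regardless of how your own Data Lake platform is implemented.

Glue table definition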

[
  {
    "Name": "metadata",
    "Type": "struct<product:struct<version:string,feature:struct<uid:string,name:string>,uid:string,vendor_name:string,name:string>,version:string>",
    "Comment": "Metadata",
    "Parameters": {}
  },
  {
    "Name": "time",
    "Type": "bigint",
    "Comment": "Timestamp (UNIX time, milliseconds)",
    "Parameters": {}
  },
  {
    "Name": "severity",
    "Type": "string",
    "Comment": "Severity",
    "Parameters": {}
  },
  {
    "Name": "state",
    "Type": "string",
    "Comment": "State",
    "Parameters": {}
  },
  {
    "Name": "cloud",
    "Type": "struct<account_uid:string,region:string,provider:string>",
    "Comment": "Cloud information",
    "Parameters": {}
  },
  {
    "Name": "resources",
    "Type": "array<struct<type:string,uid:string,cloud_partition:string,region:string,details:string>>",
    "Comment": "Resources",
    "Parameters": {}
  },
  {
    "Name": "finding",
    "Type": "struct<created_time:bigint,uid:string,desc:string,title:string,modified_time:bigint,first_seen_time:bigint,last_seen_time:bigint,related_events:array<struct<product_uid:string,uid:string>>,types:array<string>,remediation:struct<desc:string,kb_articles:array<string>>,src_url:string>",
    "Comment": "Finding",
    "Parameters": {}
  },
  {
    "Name": "compliance",
    "Type": "struct<status:string>",
    "Comment": "Compliance status",
    "Parameters": {}
  },
  {
    "Name": "process",
    "Type": "struct<file:struct<type_id:int,name:string>,parent_process:struct<file:struct<type_id:int,name:string>>>",
    "Comment": "Process object",
    "Parameters": {}
  },
  {
    "Name": "class_name",
    "Type": "string",
    "Comment": "Event class name",
    "Parameters": {}
  },
  {
    "Name": "class_uid",
    "Type": "int",
    "Comment": "Event class ID",
    "Parameters": {}
  },
  {
    "Name": "category_name",
    "Type": "string",
    "Comment": "Event category name",
    "Parameters": {}
  },
  {
    "Name": "category_uid",
    "Type": "int",
    "Comment": "Event category ID",
    "Parameters": {}
  },
  {
    "Name": "activity_id",
    "Type": "int",
    "Comment": "Normalized activity ID",
    "Parameters": {}
  },
  {
    "Name": "activity_name",
    "Type": "string",
    "Comment": "Activity name",
    "Parameters": {}
  },
  {
    "Name": "type_name",
    "Type": "string",
    "Comment": "Event type name",
    "Parameters": {}
  },
  {
    "Name": "type_uid",
    "Type": "int",
    "Comment": "Event type ID",
    "Parameters": {}
  },
  {
    "Name": "state_id",
    "Type": "int",
    "Comment": "Normalized state ID",
    "Parameters": {}
  },
  {
    "Name": "severity_id",
    "Type": "int",
    "Comment": "Normalized severity ID",
    "Parameters": {}
  },
  {
    "Name": "unmapped",
    "Type": "map<string,string>",
    "Comment": "Additional information",
    "Parameters": {}
  },
  {
    "Name": "region",
    "Type": "string",
    "Comment": "Region",
    "PartitionKey": "Partition (0)"
  },
  {
    "Name": "account_id",
    "Type": "string",
    "Comment": "Account ID",
    "PartitionKey": "Partition (1)"
  },
  {
    "Name": "year",
    "Type": "string",
    "Comment": "year (YYYY)",
    "PartitionKey": "Partition (2)"
  },
  {
    "Name": "month",
    "Type": "string",
    "Comment": "month (MM)",
    "PartitionKey": "Partition (3)"
  },
  {
    "Name": "day",
    "Type": "string",
    "Comment": "day (DD)",
    "PartitionKey": "Partition (4)"
  }
]
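
If you want to create this table from code rather than by hand, note that the Glue table input takes regular columns and partition keys as separate lists. The following is a minimal Python sketch, under that assumption, of splitting a schema list like the one above into those two lists. The shortened excerpt and the function name split_schema are hypothetical, and the sketch stops before any actual Glue API call.

```python
import json
import re

# A shortened, hypothetical excerpt in the same shape as the schema JSON above.
schema_json = """
[
  {"Name": "severity", "Type": "string", "Comment": "Severity", "Parameters": {}},
  {"Name": "account_id", "Type": "string", "Comment": "Account ID", "PartitionKey": "Partition (1)"},
  {"Name": "region", "Type": "string", "Comment": "Region", "PartitionKey": "Partition (0)"}
]
"""


def split_schema(entries):
    """Split schema entries into regular columns and ordered partition keys."""
    columns, partitions = [], []
    for entry in entries:
        column = {
            "Name": entry["Name"],
            "Type": entry["Type"],
            "Comment": entry.get("Comment", ""),
        }
        marker = entry.get("PartitionKey")
        if marker is None:
            columns.append(column)
        else:
            # "Partition (1)" -> 1; the index decides the partition order.
            index = int(re.search(r"\((\d+)\)", marker).group(1))
            partitions.append((index, column))
    partition_keys = [col for _, col in sorted(partitions, key=lambda p: p[0])]
    return columns, partition_keys


columns, partition_keys = split_schema(json.loads(schema_json))
print([c["Name"] for c in columns])
print([p["Name"] for p in partition_keys])
```

Sorting by the index in "Partition (n)" keeps region, account_id, year, month, day in the order the table expects even if the JSON entries are shuffled.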

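Athena view definition

The view we actually run in production also joins an internally managed master table to convert AWS account IDs into our internal account names, so that we can filter by AWS account name in QuickSight.

During pre-GA testing, findings for deleted resources were sometimes emitted twice for the same control, so the view keeps only rows whose finding_modified_time falls within the last 2 days. Also during pre-GA testing, we observed that a finding kept being written to the date partition in which it was first emitted*2. After GA, findings are written to a date partition close to their update time (an update right around the date boundary can still land in an earlier date's partition).

Because we want only the data updated in the last 2 days, for the reason above, we restrict the date partitions to the last 3 days.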
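
As an illustration of that 3-day partition window, here is a minimal Python sketch that computes the year, month, and day partition values for today and the two preceding days. The function name recent_partitions and the fixed sample timestamp are hypothetical; the view itself does the same thing in SQL with date_format.

```python
from datetime import datetime, timedelta, timezone


def recent_partitions(now, days=3):
    """Return (year, month, day) partition strings for today and the preceding days."""
    partitions = []
    for offset in range(days):
        target = now - timedelta(days=offset)
        partitions.append(
            (target.strftime("%Y"), target.strftime("%m"), target.strftime("%d"))
        )
    return partitions


# Fixed timestamp so the result is reproducible; in practice this would be the current time.
now = datetime(2023, 6, 1, 9, 0, tzinfo=timezone.utc)
for year, month, day in recent_partitions(now):
    print(year, month, day)
```

The sample date sits on a month boundary on purpose: the three days span May and June, which is why year, month, and day have to be matched together for each day instead of filtering each column independently.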

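Replace ${security_hub_table} with your own table before using the view. We also filter by region because we mainly use the Tokyo region, so change that condition as needed too.

The variable is described below.

${security_hub_table}: the Glue table that holds the Security Hub data exported by Security Lake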
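
The ${security_hub_table} placeholder happens to use the same syntax as Python's string.Template, so one possible way to fill it in is sketched here. The shortened SQL string and the database and table names are hypothetical, and how you render the view in practice depends on your own tooling.

```python
from string import Template

# Shortened stand-in for the view definition; only the placeholder matters here.
view_sql = Template("select t1.finding.uid from ${security_hub_table} as t1")

# Hypothetical database and table names; replace them with your own Glue catalog names.
rendered = view_sql.substitute(
    security_hub_table="security_lake_db.security_hub_findings"
)
print(rendered)
```

substitute raises KeyError if a placeholder is left unfilled, which is a convenient guard against running the query with the literal ${security_hub_table} still in it.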

with resource_finding as (
    select
        t1.finding.uid as finding_uid
        , t1.region
        , t1.account_id
        , max_by(t1.metadata.product.version, t1.finding.modified_time) as metadata_product_version
        , max_by(t1.metadata.product.feature.uid, t1.finding.modified_time) as metadata_product_feature_uid
        , max_by(t1.metadata.product.feature.name, t1.finding.modified_time) as metadata_product_feature_name
        , max_by(t1.metadata.product.uid, t1.finding.modified_time) as metadata_product_uid
        , max_by(t1.metadata.product.vendor_name, t1.finding.modified_time) as metadata_product_vendor_name
        , max_by(t1.metadata.product.name, t1.finding.modified_time) as metadata_product_name
        , max_by(t1.metadata.version, t1.finding.modified_time) as metadata_version
        , max_by(t1."time", t1.finding.modified_time) as event_time
        , max_by(t1.severity, t1.finding.modified_time) as severity
        , max_by(t1.state, t1.finding.modified_time) as event_state
        , max_by(t1.cloud.account_uid, t1.finding.modified_time) as cloud_account_uid
        , max_by(t1.cloud.region, t1.finding.modified_time) as cloud_region
        , max_by(t1.cloud.provider, t1.finding.modified_time) as cloud_provider
        , max_by(resource.cloud_partition, t1.finding.modified_time) as resource_cloud_partition
        , max_by(resource.details, t1.finding.modified_time) as resource_details
        , max_by(resource.region, t1.finding.modified_time) as resource_region
        , max_by(resource.type, t1.finding.modified_time) as resource_type
        , max_by(resource.uid, t1.finding.modified_time) as resource_uid
        , max_by(t1.finding.created_time, t1.finding.modified_time) as finding_created_time
        , max_by(t1.finding.desc, t1.finding.modified_time) as finding_desc
        , max_by(t1.finding.title, t1.finding.modified_time) as finding_title
        , max_by(t1.finding.modified_time, t1.finding.modified_time) as finding_modified_time
        , max_by(t1.finding.first_seen_time, t1.finding.modified_time) as finding_first_seen_time
        , max_by(t1.finding.last_seen_time, t1.finding.modified_time) as finding_last_seen_time
        , max_by(t1.finding.related_events, t1.finding.modified_time) as finding_related_events -- array
        , max_by(t1.finding.types, t1.finding.modified_time) as finding_types -- array
        , max_by(t1.finding.remediation.desc, t1.finding.modified_time) as finding_remediation_desc
        , max_by(t1.finding.remediation.kb_articles, t1.finding.modified_time) as finding_remediation_kb_articles -- array
        , max_by(t1.finding.src_url, t1.finding.modified_time) as finding_src_url
        , max_by(t1.compliance.status, t1.finding.modified_time) as compliance_status -- key-value
        , max_by(t1.state_id, t1.finding.modified_time) as state_id
        , max_by(t1.severity_id, t1.finding.modified_time) as severity_id
        , max_by(t1.unmapped, t1.finding.modified_time) as unmapped -- map
    from
        ${security_hub_table} as t1
    cross join unnest(resources) as t (resource)
    where
        t1.region = 'ap-northeast-1'
        and
        (
            (
                t1.year = date_format(current_timestamp - interval '2' day, '%Y')
                and t1.month = date_format(current_timestamp - interval '2' day, '%m')
                and t1.day = date_format(current_timestamp - interval '2' day, '%d')
            )
            or
            (
                t1.year = date_format(current_timestamp - interval '1' day, '%Y')
                and t1.month = date_format(current_timestamp - interval '1' day, '%m')
                and t1.day = date_format(current_timestamp - interval '1' day, '%d')
            )
            or
            (
                t1.year = date_format(current_timestamp, '%Y')
                and t1.month = date_format(current_timestamp, '%m')
                and t1.day = date_format(current_timestamp, '%d')
            )
        )
    group by 1, 2, 3
)

select
    t1.finding_uid
    , t1.region
    , t1.account_id
    , t1.metadata_product_version
    , t1.metadata_product_feature_name
    , t1.metadata_product_uid
    , t1.metadata_product_vendor_name
    , t1.metadata_product_name
    , t1.metadata_version
    , from_unixtime(t1.event_time / 1000) as event_at
    , t1.severity
    , t1.event_state
    , t1.cloud_account_uid
    , t1.cloud_region
    , t1.cloud_provider
    , t1.resource_cloud_partition
    , t1.resource_details
    , t1.resource_region
    , t1.resource_type
    , t1.resource_uid
    , from_unixtime(t1.finding_created_time / 1000) as finding_created_at
    , t1.finding_desc
    , t1.finding_title
    , from_unixtime(t1.finding_modified_time / 1000) as finding_modified_at
    , from_unixtime(t1.finding_first_seen_time / 1000) as finding_first_seen_at
    , from_unixtime(t1.finding_last_seen_time / 1000) as finding_last_seen_at
    , array_join(t1.finding_types, ' ') as finding_type
    , t1.finding_remediation_desc
    , array_join(t1.finding_remediation_kb_articles, ' ') as finding_remediation_kb_article
    , t1.finding_src_url
    , t1.compliance_status -- key-value
    , t1.state_id
    , t1.severity_id
    , t1.unmapped -- map
    , regexp_replace(t1.unmapped['ProductFields'], '\\n?') as unmapped_product_fields
    , t1.unmapped['NetworkPath'] as unmapped_network_path
    , t1.unmapped['Compliance.SecurityControlId'] as unmapped_compliance_security_control_id
from resource_finding as t1
where
    (
        t1.compliance_status not in ('PASSED', 'NOT_AVAILABLE')
        or t1.compliance_status is null
    )
    and from_unixtime(t1.finding_modified_time / 1000) >= current_timestamp - interval '2' day
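
The max_by(..., t1.finding.modified_time) calls in the CTE keep, for each combination of finding_uid, region, and account_id, the values from the most recently modified record. As a rough illustration of that logic outside Athena, a Python sketch might look like the following. The sample records and the function name are made up, and the sketch ignores the unnest of resources.

```python
def latest_per_finding(rows):
    """Keep, for each (finding_uid, region, account_id), the row with the largest modified_time."""
    latest = {}
    for row in rows:
        key = (row["finding_uid"], row["region"], row["account_id"])
        current = latest.get(key)
        if current is None or row["modified_time"] > current["modified_time"]:
            latest[key] = row
    return list(latest.values())


# Hypothetical records: the same finding reported twice with different states.
rows = [
    {"finding_uid": "f-1", "region": "ap-northeast-1", "account_id": "111111111111",
     "modified_time": 1685577600000, "state": "New"},
    {"finding_uid": "f-1", "region": "ap-northeast-1", "account_id": "111111111111",
     "modified_time": 1685664000000, "state": "Resolved"},
    {"finding_uid": "f-2", "region": "ap-northeast-1", "account_id": "222222222222",
     "modified_time": 1685577600000, "state": "New"},
]

deduplicated = latest_per_finding(rows)
for row in deduplicated:
    print(row["finding_uid"], row["state"])
```

Collapsing to the latest record first, and only then filtering on compliance status and state, is what prevents an older record of an already resolved finding from showing up as an open one.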

QuickSight tips

Filtering out Resolved and Suppressed on event_state leaves only the findings that Security Hub has flagged. Filtering to Suppressed only gives you the exempted resources.

metadata_product_feature_name holds the name of the product that registered the finding in Security Hub, so to extract only Security Hub's own findings, filter on Security Hub.

Note that event_state values differ by product, so adjust the filters in QuickSight accordingly.
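
The same filters can be expressed in code. The sketch below applies them to a few hypothetical rows shaped like the view output; the function names and the "Other Product" value are invented for illustration, and the state names are the ones mentioned above.

```python
# Hypothetical rows shaped like the view output.
rows = [
    {"finding_uid": "f-1", "event_state": "New", "metadata_product_feature_name": "Security Hub"},
    {"finding_uid": "f-2", "event_state": "Resolved", "metadata_product_feature_name": "Security Hub"},
    {"finding_uid": "f-3", "event_state": "Suppressed", "metadata_product_feature_name": "Security Hub"},
    {"finding_uid": "f-4", "event_state": "New", "metadata_product_feature_name": "Other Product"},
]

EXCLUDED_STATES = {"Resolved", "Suppressed"}


def open_findings(rows, product="Security Hub"):
    """Findings from one product that are neither resolved nor suppressed."""
    return [
        row for row in rows
        if row["metadata_product_feature_name"] == product
        and row["event_state"] not in EXCLUDED_STATES
    ]


def suppressed_findings(rows, product="Security Hub"):
    """Findings from one product that were exempted (suppressed)."""
    return [
        row for row in rows
        if row["metadata_product_feature_name"] == product
        and row["event_state"] == "Suppressed"
    ]


print([row["finding_uid"] for row in open_findings(rows)])
print([row["finding_uid"] for row in suppressed_findings(rows)])
```

Because state values differ by product, EXCLUDED_STATES would need its own set of values for each product you include.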

QuickSight dashboard

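Operations

Since introducing Security Lake, we have felt a visible drop in the triage workload of the SRE team.

Compared with how we operated before:

Before

1. Triage on each account's Security Hub console (repeated for every account)
2. If there is a new finding, check the resource details
3. If suppression is needed, suppress the finding

After introducing Security Lake

1. Check the list on the QuickSight dashboard
2. If there are Critical or High findings, check the details on the Security Hub tab
3. If there is a new finding, log in to the account and check the details
4. If suppression is needed, suppress the finding

This may not look very different, but the steps from 2 onward only happen when a new finding appears, whereas the triage in step 1 must always be done to check whether there are any new findings.

Because the triage in step 1 became easier, the SRE team's burden in the security vulnerability response flow dropped dramatically.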

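Summary

Using Security Lake makes operating Security Hub in a multi-account environment dramatically easier. I hope this entry helps with your own Security Lake implementation.

Finally, here is what I want to emphasize.

Operations after adoption matter more than the adoption itself
- Do not stop at detection; operations include remediation
- Making operations easier requires continuous improvement
Security Lake lets you turn Security Hub data into a useful dashboard
- At first glance, it looks cheaper than using a third-party solution
- Security Hub can integrate AWS security services, which makes it even more convenient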

*1: Enabling Security Lake creates an official AWS sample schema.

*2: This behavior was a problem for us, so we had filed an improvement request.